├── FAQS ├── LICENSE ├── Makefile ├── README ├── dspl.hpp ├── graph.hpp ├── main.cpp └── utils.hpp /FAQS: -------------------------------------------------------------------------------- 1 | ***************** 2 | * miniVite FAQs * 3 | ***************** 4 | ---------------------------------------------------- 5 | FYI, typical "How to run" queries are addressed in Q5 6 | onward. 7 | 8 | Please send your suggestions for improving this FAQ 9 | to zsayanz at gmail dot com OR hala at pnnl dot gov. 10 | ---------------------------------------------------- 11 | 12 | ------------------------------------------------------------------------- 13 | Q1. What is graph community detection? 14 | ------------------------------------------------------------------------- 15 | 16 | A1. In most real-world graphs/networks, the nodes/vertices tend to be 17 | organized into tightly-knit modules known as communities or clusters, 18 | such that nodes within a community are more likely to be "related" to 19 | one another than they are to the rest of the network. The goodness of 20 | partitioning into communities is typically measured using a metric 21 | called modularity. Community detection is the method of identifying 22 | these clusters or communities in graphs. 23 | 24 | [References] 25 | 26 | Fortunato, Santo. "Community detection in graphs." Physics Reports 27 | 486.3-5 (2010): 75-174. https://arxiv.org/pdf/0906.0612.pdf 28 | 29 | -------------------------------------------------------------------------- 30 | Q2. What is miniVite? 31 | -------------------------------------------------------------------------- 32 | 33 | A2. miniVite is a distributed-memory code (or mini application) that 34 | performs partial graph community detection using the Louvain method. 35 | The Louvain method is a multi-phase, iterative heuristic that performs 36 | modularity optimization for graph community detection. miniVite only 37 | performs the first phase of the Louvain method. 38 | 39 | [Code] 40 | 41 | https://github.com/Exa-Graph/miniVite 42 | http://hpc.pnl.gov/people/hala/grappolo.html 43 | 44 | [References] 45 | 46 | [Louvain method] Blondel, Vincent D., et al. "Fast unfolding of 47 | communities in large networks." Journal of Statistical Mechanics: 48 | Theory and Experiment 2008.10 (2008): P10008. 49 | 50 | [miniVite] Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, 51 | Gebremedhin AH. miniVite: A Graph Analytics Benchmarking Tool for 52 | Massively Parallel Systems. 53 | 54 | --------------------------------------------------------------------------- 55 | Q3. What is the parent application of miniVite? How are they different? 56 | --------------------------------------------------------------------------- 57 | 58 | A3. miniVite is derived from Vite, which implements the multi-phase 59 | Louvain method. Apart from a parallel baseline version, Vite provides 60 | a number of heuristics (such as early termination, threshold cycling and 61 | incomplete coloring) that can improve the scalability and quality of 62 | community detection. In contrast, miniVite just provides a parallel 63 | baseline version, and has options to select different MPI communication 64 | methods (such as send/recv, collectives and RMA) for one of the most 65 | communication-intensive portions of the code. miniVite also includes an 66 | in-memory random geometric graph generator, making it convenient for 67 | users to run miniVite without any external files.
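(For example, "mpiexec -n 2 ./miniVite -n 100" runs miniVite on a generated graph with 100 vertices and no input file; see the "EXECUTING THE PROGRAM" section of the README for the full list of options. The binary name and path depend on your build.)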
Vite can also convert 68 | graphs from different native formats (like matrix market, SNAP, edge 69 | list, DIMACS, etc) to the binary format that both Vite and miniVite 70 | requires. 71 | 72 | [Code] 73 | 74 | http://hpc.pnl.gov/people/hala/grappolo.html 75 | 76 | [References] 77 | 78 | Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Lu H, Chavarria-Miranda D, 79 | Khan A, Gebremedhin A. Distributed louvain algorithm for graph community 80 | detection. In 2018 IEEE International Parallel and Distributed Processing 81 | Symposium (IPDPS) 2018 May 21 (pp. 885-895). IEEE. 82 | 83 | Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Gebremedhin AH. 84 | Scalable Distributed Memory Community Detection Using Vite. 85 | In 2018 IEEE High Performance extreme Computing Conference (HPEC) 2018 86 | Sep 25 (pp. 1-7). IEEE. 87 | 88 | ----------------------------------------------------------------------------- 89 | Q4. Is there a shared-memory equivalent of Vite/miniVite? 90 | ----------------------------------------------------------------------------- 91 | 92 | A4. Yes, Grappolo performs shared-memory community detection using Louvain 93 | method. Apart from community detection, Grappolo has routines for matrix 94 | reordering as well. 95 | 96 | [Code] 97 | 98 | http://hpc.pnl.gov/people/hala/grappolo.html 99 | 100 | [References] 101 | 102 | Lu H, Halappanavar M, Kalyanaraman A. Parallel heuristics for scalable 103 | community detection. Parallel Computing. 2015 Aug 1;47:19-37. 104 | 105 | Halappanavar M, Lu H, Kalyanaraman A, Tumeo A. Scalable static and dynamic 106 | community detection using grappolo. In High Performance Extreme Computing 107 | Conference (HPEC), 2017 IEEE 2017 Sep 12 (pp. 1-6). IEEE. 108 | 109 | ------------------------------------------------------------------------------ 110 | Q5. How does one perform strong scaling analysis using miniVite? How to 111 | determine 'good' candidates (input graphs) that can be used for strong 112 | scaling runs? How much time is approximately spent in performing I/O? 113 | ------------------------------------------------------------------------------ 114 | 115 | A5. Use a large graph as an input, preferably over a billion edges. Not all 116 | large graphs have a good community structure. You should be able to identify 117 | one that serves your purpose, hopefully after few trials. Graphs can be 118 | obtained various websites serving as repositories, such as Sparse TAMU 119 | collection[1], SNAP repository[2] and MIT Graph Challenge website[3], to name 120 | a few of the prominent ones. You can convert graphs from their native format to 121 | the binary format that miniVite requires, using the converters in Vite (please 122 | see README). 123 | 124 | If your graph is in Webgraph[4] format, you can easily convert it to an edge list 125 | first (example code snippet below), before passing it on to Vite for subsequent 126 | binary conversion, using the C++ version of Webgraph library[5]. Since the C++ 127 | Webgraph library is not actively developed, you may use the original Webgraph 128 | JAVA library too. 129 | 130 | #include "offline_edge_iterator.hpp" 131 | ... 132 | using namespace webgraph::ascii_graph; 133 | 134 | // read in input/output file 135 | std::ofstream ofile(argv[2]); 136 | offline_edge_iterator itor(argv[1]), end; 137 | 138 | // read edges 139 | while( itor != end ) { 140 | ofile << itor->first << " " << itor->second << std::endl; 141 | ++itor; 142 | } 143 | ofile.close(); 144 | ... 
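For reference, a complete minimal converter built around the snippet above could look like the following. The header and iterator interface are those of the C++ Webgraph library [5], exactly as used above; the program name, argument order, and error handling are only for illustration.

#include <fstream>
#include <iostream>

#include "offline_edge_iterator.hpp"

// Convert a Webgraph (ASCII) input into a plain "source destination"
// edge list, one edge per line.
int main(int argc, char *argv[])
{
  if (argc != 3) {
    std::cerr << "Usage: " << argv[0] << " <webgraph-input> <edge-list-output>" << std::endl;
    return 1;
  }

  using namespace webgraph::ascii_graph;

  std::ofstream ofile(argv[2]);
  offline_edge_iterator itor(argv[1]), end;

  // read edges and write each one to the output file
  while (itor != end) {
    ofile << itor->first << " " << itor->second << std::endl;
    ++itor;
  }
  ofile.close();

  return 0;
}

The resulting edge list can then be converted to the binary format that miniVite expects using Vite's converters (see Q10 and the Vite README).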
145 | 146 | miniVite takes about 2-4s to read a 55GB binary file if you use Burst buffer 147 | (Cray DataWarp) or Lustre striping (about 25 OSTs, default 1M blocks). The overall 148 | I/O time for most cases is observed to be within 1/2% of the overall execution time. 149 | This is assuming the simple vertex-based distribution (which is the default), if you 150 | pass "-b" (only valid when you pass an input graph), then miniVite tries to balance 151 | the number of edges (as a result, processes may own dissimilar number of vertices). 152 | In such cases, there will be a serial overhead that may increase the I/O time and 153 | subsequent distributed graph creation time significantly. 154 | 155 | [1] https://sparse.tamu.edu/ 156 | [2] http://snap.stanford.edu/data 157 | [3] http://graphchallenge.mit.edu/data-sets 158 | [4] http://webgraph.di.unimi.it/ 159 | [5] http://cnets.indiana.edu/groups/nan/webgraph/ 160 | 161 | ----------------------------------------------------------------------------------- 162 | Q6. How does one perform weak scaling analysis using miniVite? How does one scale 163 | the graphs with processes? 164 | ----------------------------------------------------------------------------------- 165 | 166 | A6. miniVite has an in-memory random geometric graph generator (please see 167 | README) that can be used for weak-scaling analysis. An n-D random geometric graph 168 | (RGG), is generated by randomly placing N vertices in an n-D space and connecting 169 | pairs of vertices whose Euclidean distance is less than or equal to d. We only 170 | consider 2D RGGs contained within a unit square, [0,1]^2. We distribute the domain 171 | such that each process receives N/p vertices (where p is the total 172 | number of processes). 173 | 174 | Each process owns (1 * 1/p) portion of the unit square and d is computed as (please 175 | refer to Section 4 of miniVite paper for details): 176 | 177 | d = (dc + dt)/2; 178 | where, dc = sqrt(ln(N) / pi*N); dt = sqrt(2.0736 / pi*N) 179 | 180 | Therefore, the number of vertices (N) passed during miniVite execution on p 181 | processes must satisfy the condition -- 1/p > d. 182 | 183 | Please note, the default distribution of graph generated from the in-built random 184 | geometric graph generator causes a process to only communicate with its two 185 | immediate neighbors. If you want to increase the communication intensity for 186 | generated graphs, please use the "-p" option to specify an extra percentage of edges 187 | that will be generated, linking random vertices. As a side-effect, this option 188 | significantly increases the time required to generate the graph. 189 | 190 | ------------------------------------------------------------------------------ 191 | Q7. Does Vite (the parent application to miniVite) have an in-built graph 192 | generator? 193 | ------------------------------------------------------------------------------ 194 | 195 | A7. At present, Vite does not have an in-built graph generator that we have in 196 | miniVite, so we rely on users providing external graphs for Vite (strong/weak 197 | scaling) analysis. However, Vite has bindings to NetworKit[5], and users can use 198 | those bindings to generate graphs of their choice from Vite (refer to the 199 | README). Generating large graphs in this manner can take a lot of time, since 200 | there are intermediate copies and the graph generators themselves may be serial 201 | or may use threads on a shared-memory system. 
We do not plan on supporting the 202 | NetworKit bindings in future. 203 | 204 | [5] https://networkit.github.io/ 205 | 206 | ------------------------------------------------------------------------------ 207 | Q8. Does providing a larger input graph translate to comparatively larger 208 | execution times? Is it possible to control the execution time for a particular 209 | graph? 210 | ------------------------------------------------------------------------------ 211 | 212 | A8. No. A relatively small graph can run for many iterations, as compared to 213 | a larger graph that runs for a few iterations to convergence. Since miniVite is 214 | iterative, the final number of iterations to convergence (and hence, execution 215 | time) depends on the structure of the graph. It is however possible to exit 216 | early by passing a larger threshold (using the "-t <...>" option, the default 217 | threshold or tolerance is 1.0E-06, a larger threshold can be passed, for e.g, 218 | "-t 1.0E-03"), that should reduce the overall execution time for all graphs in 219 | general (at least w.r.t miniVite, which only executes the first phase of Louvain 220 | method). 221 | 222 | ------------------------------------------------------------------------------ 223 | Q9. Is there an option to add some noise in the generated random geometric 224 | graphs? 225 | ------------------------------------------------------------------------------ 226 | 227 | A9. Yes, the "-p " option allows extra edges to be added between 228 | random vertices (see README). This increases the overall communication, but 229 | affects the structure of communities in the generated graph (lowers the 230 | modularity). Therefore, adding extra edges in the generated graph will 231 | most probably reduce the global modularity, and the number of iterations to 232 | convergence shall decrease. 233 | The maximum number of edges that can be added is bounded by INT_MAX, at 234 | present, we do not handle data ranges more than INT_MAX. 235 | 236 | ------------------------------------------------------------------------------ 237 | Q10. What are the steps required for using real-world graphs as an input to 238 | miniVite? 239 | ------------------------------------------------------------------------------ 240 | 241 | A10. First, please download Vite (parent application of miniVite) from: 242 | https://github.com/ECP-ExaGraph/vite 243 | 244 | Graphs/Sparse matrices come in several native formats (matrix market, SNAP, 245 | DIMACS, etc.) Vite has several options to convert graphs from native to the 246 | binary format that miniVite requires (please take a look at Vite README). 247 | 248 | As an example, you can download the Friendster file from: 249 | https://sparse.tamu.edu/SNAP/com-Friendster 250 | The option to convert Friendster to binary using Vite's converter is as follows 251 | (please note, this part is serial): 252 | 253 | $VITE_BIN_PATH/bin/./fileConvertDist -f $INPUT_PATH/com-Friendster.mtx 254 | -m -o $OUTPUT_PATH/com-Friendster.bin 255 | 256 | After the conversion, you can run miniVite with the binary file obtained 257 | from the previous step: 258 | 259 | mpiexec -n <...> $MINIVITE_PATH/./dspl -r 260 | -f $FILE_PATH/com-Friendster.bin 261 | 262 | -------------------------------------------------------------------------------- 263 | Q11. miniVite is scalable for a particular input graph, but not for another 264 | similar sized graph, why is that and what can I do to improve the situation? 
265 | -------------------------------------------------------------------------------- 266 | 267 | A11. Presently, our default distribution is vertex-based. That means a process 268 | owns N/p vertices and all the edges connected to those N/p vertices (including 269 | ghost vertices). Load imbalances are very probable in this type of distribution, 270 | depending on the graph structure. Instead, it may be favorable to use an 271 | edge-centric distribution, in which processes own approximately equal number of 272 | edges. When the "-b" option is passed, miniVite attempts to distribute the 273 | vertices across processes such each process owns approximately similar number 274 | of edges. This edge-balanced distribution may reduce the overall communication, 275 | improving the performance. 276 | 277 | As an example, lets say there is a large (real-world) graph, and its structure 278 | is such that only a few processes end up owning a majority of edges, as per 279 | miniVite graph data distribution. Also, lets assume that the graph has either a 280 | very poor community structure (modularity closer to 0) or very stable community 281 | structure (modularity close to 1 after a few iterations, that means not many 282 | vertices are migrating to neighboring communities). In both these cases, 283 | community detection in miniVite will run for relatively less number of 284 | iterations, which may affect the overall scalability. 285 | 286 | -------------------------------------------------------------------------------- 287 | Q12. Can miniVite return some statistics on vertex/edge distribution of the 288 | underlying graph? 289 | -------------------------------------------------------------------------------- 290 | Yes, please pass -DPRINT_DIST_STATS while building miniVite. When miniVite is 291 | executed, it returns some basic information, such as: 292 | 293 | ------------------------------------------------------- 294 | Graph edge distribution characteristics 295 | ------------------------------------------------------- 296 | Number of vertices: 34 297 | Number of edges: 156 298 | Maximum number of edges: 80 299 | Average number of edges: 78 300 | Expected value of X^2: 6088 301 | Variance: 4 302 | Standard deviation: 2 303 | ------------------------------------------------------- 304 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2018, Battelle Memorial Institute 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | * Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | * Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 
19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | 2 | CXX = CC 3 | 4 | USE_TAUPROF=0 5 | ifeq ($(USE_TAUPROF),1) 6 | TAU=/soft/perftools/tau/tau-2.29/craycnl/lib 7 | CXX = tau_cxx.sh -tau_makefile=$(TAU)/Makefile.tau-intel-papi-mpi-pdt 8 | endif 9 | # use -xmic-avx512 instead of -xHost for Intel Xeon Phi platforms 10 | OPTFLAGS = -O3 -fopenmp -DPRINT_DIST_STATS #-DPRINT_EXTRA_NEDGES #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD 11 | #-DUSE_MPI_SENDRECV 12 | #-DUSE_MPI_COLLECTIVES 13 | # use export ASAN_OPTIONS=verbosity=1 to check ASAN output 14 | SNTFLAGS = -std=c++11 -fopenmp -fsanitize=address -O1 -fno-omit-frame-pointer 15 | CXXFLAGS = -std=c++11 -g $(OPTFLAGS) 16 | 17 | OBJ = main.o 18 | TARGET = miniVite 19 | 20 | all: $(TARGET) 21 | 22 | %.o: %.cpp 23 | $(CXX) $(CXXFLAGS) -c -o $@ $^ 24 | 25 | $(TARGET): $(OBJ) 26 | $(CXX) $^ $(OPTFLAGS) -o $@ 27 | 28 | .PHONY: clean 29 | 30 | clean: 31 | rm -rf *~ $(OBJ) $(TARGET) 32 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | ************************ 2 | miniVite (/mini/ˈviːte/) 3 | ************************ 4 | 5 | ******* 6 | ------- 7 | ABOUT 8 | ------- 9 | ******* 10 | miniVite[*] is a proxy app that implements a single phase of Louvain 11 | method in distributed memory for graph community detection. Please 12 | refer to the following paper for a detailed discussion on 13 | distributed memory Louvain method implementation: 14 | https://ieeexplore.ieee.org/abstract/document/8425242/ 15 | 16 | Apart from real world graphs, users can use specific options 17 | to generate a Random Geometric Graph (RGG) in parallel. 18 | RGGs have been known to have good community structure: 19 | https://arxiv.org/pdf/1604.03993.pdf 20 | 21 | The way we have implemented a parallel RGG generator, vertices 22 | owned by a process will only have cross edges with its logical 23 | neighboring processes (each process owning 1x1/p chunk of the 24 | 1x1 unit square). If MPI process mapping is such that consecutive 25 | processes (for e.g., p and p+1) are physically close to each other, 26 | then there is not much communication stress in the application. 27 | Therefore, we allow an option to add extra edges between randomly 28 | chosen vertices, whose owners may be physically far apart. 
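As a rough guide for choosing the number of vertices N on p processes for a weak-scaling run, the FAQS (Q6) gives the distance threshold used by the generator, d = (dc + dt)/2 with dc = sqrt(ln(N)/(pi*N)) and dt = sqrt(2.0736/(pi*N)), and notes that N must satisfy 1/p > d. The following standalone helper (not part of miniVite; it simply reads the formula as stated in the FAQS) checks that condition for a candidate N, in addition to the constraints described in the next paragraph:

#include <cmath>
#include <cstdio>
#include <cstdlib>

// Check the RGG weak-scaling condition 1/p > d for a candidate number
// of vertices N on p processes (dc, dt and d per the miniVite FAQS, Q6).
int main(int argc, char *argv[])
{
  if (argc != 3) {
    std::fprintf(stderr, "Usage: %s <N> <p>\n", argv[0]);
    return 1;
  }

  const double pi = 3.14159265358979323846;
  const double N  = std::atof(argv[1]);
  const double p  = std::atof(argv[2]);

  const double dc = std::sqrt(std::log(N) / (pi * N));
  const double dt = std::sqrt(2.0736 / (pi * N));
  const double d  = 0.5 * (dc + dt);

  std::printf("d = %g, 1/p = %g: %s\n", d, 1.0 / p,
              (1.0 / p > d) ? "OK" : "condition 1/p > d not satisfied");
  return 0;
}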
29 | 30 | We require the total number of processes to be a power of 2 and 31 | the total number of vertices to be perfectly divisible by the number of 32 | processes when the parallel RGG generation options are used. 33 | This constraint does not apply to real-world graphs passed to miniVite. 34 | 35 | We also allow users to pass any real-world graph as input. However, 36 | we expect an input graph to be in a certain binary format, which 37 | we have observed to be more efficient than reading ASCII format 38 | files. The code for binary conversion (from a variety of common 39 | graph formats) is packaged separately with Vite, which is our 40 | full implementation of the Louvain method in distributed memory. 41 | Please follow the instructions in the Vite README for binary file 42 | conversion. 43 | 44 | Vite can be downloaded from (please don't use the old 45 | PNNL/PNL link to download Vite; the following GitHub 46 | link is the correct one): 47 | https://github.com/ECP-ExaGraph/vite 48 | 49 | Unlike Vite, we do not implement any heuristics to improve the 50 | performance of the Louvain method. miniVite is a baseline parallel 51 | version, implementing only the first phase of the Louvain method. 52 | 53 | This code requires an MPI library (preferably MPI-3 compatible) 54 | and a C++11-compliant compiler for building. 55 | 56 | Please contact the following for any queries or support: 57 | 58 | Sayan Ghosh, PNNL (sg0 at pnnl dot gov) 59 | Mahantesh Halappanavar, PNNL (hala at pnnl dot gov) 60 | 61 | [*] Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman 62 | A, Gebremedhin AH. miniVite: A graph analytics benchmarking tool 63 | for massively parallel systems. In 2018 IEEE/ACM Performance Modeling, 64 | Benchmarking and Simulation of High Performance Computer Systems 65 | (PMBS) 2018 Nov 12 (pp. 51-56). IEEE. 66 | 67 | Please '*' this repository on GitHub if the code is useful to you. 68 | 69 | ************* 70 | ------------- 71 | COMPILATION 72 | ------------- 73 | ************* 74 | Please update the Makefile with compiler flags and use a C++11-compliant 75 | compiler of your choice. Invoke `make clean; make` after setting paths 76 | to MPI for generating the binary. Use `mpirun` or `mpiexec` or `srun` 77 | to execute the code with the specific runtime arguments mentioned in the 78 | next section. 79 | 80 | Pass -DPRINT_DIST_STATS for printing distributed graph 81 | characteristics. 82 | 83 | Pass -DDEBUG_PRINTF if detailed diagnostics are required during a 84 | program run. This program requires OpenMP and C++11 support, 85 | so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/ 86 | -std=c++0x. 87 | 88 | Pass -DUSE_32_BIT_GRAPH if the number of nodes in the graph is 89 | within the 32-bit range (about 2 x 10^9); otherwise, the 64-bit range is assumed. 90 | 91 | Pass -DOMP_SCHEDULE_RUNTIME if you want to set OMP_SCHEDULE 92 | for all parallel regions at runtime. If -DOMP_SCHEDULE_RUNTIME 93 | is passed and OMP_SCHEDULE is not set, then the default schedule will 94 | be chosen (which is most probably "static" or "guided" for most of 95 | the OpenMP regions). 96 | 97 | Communicating vertex-community information (per iteration) 98 | is the most expensive step of our distributed Louvain 99 | implementation. We use one of the following MPI communication 100 | primitives for communicating vertex-community information during a Louvain 101 | iteration, which can be enabled by passing predefined 102 | macros at compile time: 103 | 104 | 1. MPI Collectives: -DUSE_MPI_COLLECTIVES 105 | 2. MPI Send-Receive: -DUSE_MPI_SENDRECV 106 | 3.
MPI RMA: -DUSE_MPI_RMA (using -DUSE_MPI_ACCUMULATE 107 | additionally ensures atomic put) 108 | 4. Default: Uses MPI point-to-point nonblocking API in communication 109 | intensive parts.. 110 | 111 | Apart from these, we use MPI (blocking) collectives, mostly 112 | MPI_Alltoall. 113 | 114 | There are other predefined macros in the code as well for printing 115 | intermediate results or checking correctness or using a particular 116 | C++ data structure. 117 | 118 | *********************** 119 | ----------------------- 120 | EXECUTING THE PROGRAM 121 | ----------------------- 122 | *********************** 123 | 124 | E.g.: 125 | mpiexec -n 2 bin/./minivite -f karate.bin 126 | mpiexec -n 2 bin/./minivite -l -n 100 127 | mpiexec -n 2 bin/./minivite -n 100 128 | mpiexec -n 2 bin/./minivite -p 2 -n 100 129 | 130 | [On Cray systems, pass MPICH_MAX_THREAD_SAFETY=multiple or 131 | pass -DDISABLE_THREAD_MULTIPLE_CHECK while building miniVite.] 132 | 133 | Possible options (can be combined): 134 | 135 | 1. -f : Specify input binary file after this argument. 136 | 2. -b : Only valid for real-world inputs. Attempts to distribute approximately 137 | equal number of edges among processes. Irregular number of vertices 138 | owned by a particular process. Increases the distributed graph creation 139 | time due to serial overheads, but may improve overall execution time. 140 | 3. -n : Only valid for synthetically generated inputs. Pass total number of 141 | vertices of the generated graph. 142 | 4. -l : Use distributed LCG for randomly choosing edges. If this option 143 | is not used, we will use C++ random number generator (using 144 | std::default_random_engine). 145 | 5. -p : Only valid for synthetically generated inputs. Specify percent of overall 146 | edges to be randomly generated between processes. 147 | 6. -t : Specify threshold quantity (default: 1.0E-06) used to determine the 148 | exit criteria in an iteration of Louvain method. 149 | 7. -w : Only valid for synthetically generated inputs. Use Euclidean distance as edge weight. 150 | If this option is not used, edge weights are considered as 1.0. Generate 151 | edge weight uniformly between (0,1) if Euclidean distance is not available. 152 | 8. -r : This is used to control the number of aggregators in MPI I/O and is 153 | meaningful when an input binary graph file is passed with option "-f". 154 | naggr := (nranks > 1) ? (nprocs/nranks) : nranks; 155 | 9. -s : Print graph data (edge list along with weights). 156 | -------------------------------------------------------------------------------- /dspl.hpp: -------------------------------------------------------------------------------- 1 | // *********************************************************************** 2 | // 3 | // miniVite 4 | // 5 | // *********************************************************************** 6 | // 7 | // Copyright (2018) Battelle Memorial Institute 8 | // All rights reserved. 9 | // 10 | // Redistribution and use in source and binary forms, with or without 11 | // modification, are permitted provided that the following conditions 12 | // are met: 13 | // 14 | // 1. Redistributions of source code must retain the above copyright 15 | // notice, this list of conditions and the following disclaimer. 16 | // 17 | // 2. Redistributions in binary form must reproduce the above copyright 18 | // notice, this list of conditions and the following disclaimer in the 19 | // documentation and/or other materials provided with the distribution. 20 | // 21 | // 3. 
Neither the name of the copyright holder nor the names of its 22 | // contributors may be used to endorse or promote products derived from 23 | // this software without specific prior written permission. 24 | // 25 | // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 26 | // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 27 | // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 28 | // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 29 | // COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 30 | // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 31 | // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 32 | // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 33 | // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 34 | // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 35 | // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 36 | // POSSIBILITY OF SUCH DAMAGE. 37 | // 38 | // ************************************************************************ 39 | 40 | #pragma once 41 | #ifndef DSPL_HPP 42 | #define DSPL_HPP 43 | 44 | #include 45 | #include 46 | #include 47 | #include 48 | #include 49 | #include 50 | #include 51 | #include 52 | #include 53 | #include 54 | 55 | #include 56 | #include 57 | 58 | #include "graph.hpp" 59 | #include "utils.hpp" 60 | 61 | struct Comm { 62 | GraphElem size; 63 | GraphWeight degree; 64 | 65 | Comm() : size(0), degree(0.0) {}; 66 | }; 67 | 68 | struct CommInfo { 69 | GraphElem community; 70 | GraphElem size; 71 | GraphWeight degree; 72 | }; 73 | 74 | const int SizeTag = 1; 75 | const int VertexTag = 2; 76 | const int CommunityTag = 3; 77 | const int CommunitySizeTag = 4; 78 | const int CommunityDataTag = 5; 79 | 80 | static MPI_Datatype commType; 81 | 82 | void distSumVertexDegree(const Graph &g, std::vector &vDegree, std::vector &localCinfo) 83 | { 84 | const GraphElem nv = g.get_lnv(); 85 | 86 | #ifdef OMP_SCHEDULE_RUNTIME 87 | #pragma omp parallel for default(shared), shared(g, vDegree, localCinfo), schedule(runtime) 88 | #else 89 | #pragma omp parallel for default(shared), shared(g, vDegree, localCinfo), schedule(guided) 90 | #endif 91 | for (GraphElem i = 0; i < nv; i++) { 92 | GraphElem e0, e1; 93 | GraphWeight tw = 0.0; 94 | 95 | g.edge_range(i, e0, e1); 96 | 97 | for (GraphElem k = e0; k < e1; k++) { 98 | const Edge &edge = g.get_edge(k); 99 | tw += edge.weight_; 100 | } 101 | 102 | vDegree[i] = tw; 103 | 104 | localCinfo[i].degree = tw; 105 | localCinfo[i].size = 1L; 106 | } 107 | } // distSumVertexDegree 108 | 109 | GraphWeight distCalcConstantForSecondTerm(const std::vector &vDegree, MPI_Comm gcomm) 110 | { 111 | GraphWeight totalEdgeWeightTwice = 0.0; 112 | GraphWeight localWeight = 0.0; 113 | int me = -1; 114 | 115 | const size_t vsz = vDegree.size(); 116 | 117 | #ifdef OMP_SCHEDULE_RUNTIME 118 | #pragma omp parallel for default(shared), shared(vDegree), reduction(+: localWeight) schedule(runtime) 119 | #else 120 | #pragma omp parallel for default(shared), shared(vDegree), reduction(+: localWeight) schedule(static) 121 | #endif 122 | for (GraphElem i = 0; i < vsz; i++) 123 | localWeight += vDegree[i]; // Local reduction 124 | 125 | // Global reduction 126 | MPI_Allreduce(&localWeight, &totalEdgeWeightTwice, 1, 127 | MPI_WEIGHT_TYPE, MPI_SUM, gcomm); 128 | 129 | return (1.0 / static_cast(totalEdgeWeightTwice)); 130 | } // distCalcConstantForSecondTerm 131 | 132 
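// distInitComm (below) starts a Louvain phase by placing every locally owned
// vertex in its own community, identified by its global vertex id
// (local index + this rank's base offset).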
| void distInitComm(std::vector &pastComm, std::vector &currComm, const GraphElem base) 133 | { 134 | const size_t csz = currComm.size(); 135 | 136 | #ifdef DEBUG_PRINTF 137 | assert(csz == pastComm.size()); 138 | #endif 139 | 140 | #ifdef OMP_SCHEDULE_RUNTIME 141 | #pragma omp parallel for default(shared), shared(pastComm, currComm), firstprivate(base), schedule(runtime) 142 | #else 143 | #pragma omp parallel for default(shared), shared(pastComm, currComm), firstprivate(base), schedule(static) 144 | #endif 145 | for (GraphElem i = 0L; i < csz; i++) { 146 | pastComm[i] = i + base; 147 | currComm[i] = i + base; 148 | } 149 | } // distInitComm 150 | 151 | void distInitLouvain(const Graph &dg, std::vector &pastComm, 152 | std::vector &currComm, std::vector &vDegree, 153 | std::vector &clusterWeight, std::vector &localCinfo, 154 | std::vector &localCupdate, GraphWeight &constantForSecondTerm, 155 | const int me) 156 | { 157 | const GraphElem base = dg.get_base(me); 158 | const GraphElem nv = dg.get_lnv(); 159 | MPI_Comm gcomm = dg.get_comm(); 160 | 161 | vDegree.resize(nv); 162 | pastComm.resize(nv); 163 | currComm.resize(nv); 164 | clusterWeight.resize(nv); 165 | localCinfo.resize(nv); 166 | localCupdate.resize(nv); 167 | 168 | distSumVertexDegree(dg, vDegree, localCinfo); 169 | constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm); 170 | 171 | distInitComm(pastComm, currComm, base); 172 | } // distInitLouvain 173 | 174 | GraphElem distGetMaxIndex(const std::unordered_map &clmap, const std::vector &counter, 175 | const GraphWeight selfLoop, const std::vector &localCinfo, 176 | const std::map &remoteCinfo, const GraphWeight vDegree, 177 | const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm, 178 | const GraphElem base, const GraphElem bound, const GraphWeight constant) 179 | { 180 | std::unordered_map::const_iterator storedAlready; 181 | GraphElem maxIndex = currComm; 182 | GraphWeight curGain = 0.0, maxGain = 0.0; 183 | GraphWeight eix = static_cast(counter[0]) - static_cast(selfLoop); 184 | 185 | GraphWeight ax = currDegree - vDegree; 186 | GraphWeight eiy = 0.0, ay = 0.0; 187 | 188 | GraphElem maxSize = currSize; 189 | GraphElem size = 0; 190 | 191 | storedAlready = clmap.begin(); 192 | #ifdef DEBUG_PRINTF 193 | assert(storedAlready != clmap.end()); 194 | #endif 195 | do { 196 | if (currComm != storedAlready->first) { 197 | 198 | // is_local, direct access local info 199 | if ((storedAlready->first >= base) && (storedAlready->first < bound)) { 200 | ay = localCinfo[storedAlready->first-base].degree; 201 | size = localCinfo[storedAlready->first - base].size; 202 | } 203 | else { 204 | // is_remote, lookup map 205 | std::map::const_iterator citer = remoteCinfo.find(storedAlready->first); 206 | ay = citer->second.degree; 207 | size = citer->second.size; 208 | } 209 | 210 | eiy = counter[storedAlready->second]; 211 | 212 | curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant; 213 | 214 | if ((curGain > maxGain) || 215 | ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) { 216 | maxGain = curGain; 217 | maxIndex = storedAlready->first; 218 | maxSize = size; 219 | } 220 | } 221 | storedAlready++; 222 | } while (storedAlready != clmap.end()); 223 | 224 | if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm)) 225 | maxIndex = currComm; 226 | 227 | return maxIndex; 228 | } // distGetMaxIndex 229 | 230 | GraphWeight distBuildLocalMapCounter(const GraphElem e0, const GraphElem e1, std::unordered_map 
&clmap, 231 | std::vector &counter, const Graph &g, 232 | const std::vector &currComm, 233 | const std::unordered_map &remoteComm, 234 | const GraphElem vertex, const GraphElem base, const GraphElem bound) 235 | { 236 | GraphElem numUniqueClusters = 1L; 237 | GraphWeight selfLoop = 0; 238 | std::unordered_map::const_iterator storedAlready; 239 | 240 | for (GraphElem j = e0; j < e1; j++) { 241 | 242 | const Edge &edge = g.get_edge(j); 243 | const GraphElem &tail_ = edge.tail_; 244 | const GraphWeight &weight = edge.weight_; 245 | GraphElem tcomm; 246 | 247 | if (tail_ == vertex + base) 248 | selfLoop += weight; 249 | 250 | // is_local, direct access local std::vector 251 | if ((tail_ >= base) && (tail_ < bound)) 252 | tcomm = currComm[tail_ - base]; 253 | else { // is_remote, lookup map 254 | std::unordered_map::const_iterator iter = remoteComm.find(tail_); 255 | 256 | #ifdef DEBUG_PRINTF 257 | assert(iter != remoteComm.end()); 258 | #endif 259 | tcomm = iter->second; 260 | } 261 | 262 | storedAlready = clmap.find(tcomm); 263 | 264 | if (storedAlready != clmap.end()) 265 | counter[storedAlready->second] += weight; 266 | else { 267 | clmap.insert(std::unordered_map::value_type(tcomm, numUniqueClusters)); 268 | counter.push_back(weight); 269 | numUniqueClusters++; 270 | } 271 | } 272 | 273 | return selfLoop; 274 | } // distBuildLocalMapCounter 275 | 276 | void distExecuteLouvainIteration(const GraphElem i, const Graph &dg, const std::vector &currComm, 277 | std::vector &targetComm, const std::vector &vDegree, 278 | std::vector &localCinfo, std::vector &localCupdate, 279 | const std::unordered_map &remoteComm, 280 | const std::map &remoteCinfo, 281 | std::map &remoteCupdate, const GraphWeight constantForSecondTerm, 282 | std::vector &clusterWeight, const int me) 283 | { 284 | GraphElem localTarget = -1; 285 | GraphElem e0, e1, selfLoop = 0; 286 | std::unordered_map clmap; 287 | std::vector counter; 288 | 289 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 290 | const GraphElem cc = currComm[i]; 291 | GraphWeight ccDegree; 292 | GraphElem ccSize; 293 | bool currCommIsLocal = false; 294 | bool targetCommIsLocal = false; 295 | 296 | // Current Community is local 297 | if (cc >= base && cc < bound) { 298 | ccDegree=localCinfo[cc-base].degree; 299 | ccSize=localCinfo[cc-base].size; 300 | currCommIsLocal=true; 301 | } else { 302 | // is remote 303 | std::map::const_iterator citer = remoteCinfo.find(cc); 304 | ccDegree = citer->second.degree; 305 | ccSize = citer->second.size; 306 | currCommIsLocal=false; 307 | } 308 | 309 | dg.edge_range(i, e0, e1); 310 | 311 | if (e0 != e1) { 312 | clmap.insert(std::unordered_map::value_type(cc, 0)); 313 | counter.push_back(0.0); 314 | 315 | selfLoop = distBuildLocalMapCounter(e0, e1, clmap, counter, dg, 316 | currComm, remoteComm, i, base, bound); 317 | 318 | clusterWeight[i] += counter[0]; 319 | 320 | localTarget = distGetMaxIndex(clmap, counter, selfLoop, localCinfo, remoteCinfo, 321 | vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm); 322 | } 323 | else 324 | localTarget = cc; 325 | 326 | // is the Target Local? 
327 | if (localTarget >= base && localTarget < bound) 328 | targetCommIsLocal = true; 329 | 330 | // current and target comm are local - atomic updates to vectors 331 | if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) { 332 | 333 | #ifdef DEBUG_PRINTF 334 | assert( base < localTarget < bound); 335 | assert( base < cc < bound); 336 | assert( cc - base < localCupdate.size()); 337 | assert( localTarget - base < localCupdate.size()); 338 | #endif 339 | #pragma omp atomic update 340 | localCupdate[localTarget-base].degree += vDegree[i]; 341 | #pragma omp atomic update 342 | localCupdate[localTarget-base].size++; 343 | #pragma omp atomic update 344 | localCupdate[cc-base].degree -= vDegree[i]; 345 | #pragma omp atomic update 346 | localCupdate[cc-base].size--; 347 | } 348 | 349 | // current is local, target is not - do atomic on local, accumulate in Maps for remote 350 | if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && !targetCommIsLocal) { 351 | #pragma omp atomic update 352 | localCupdate[cc-base].degree -= vDegree[i]; 353 | #pragma omp atomic update 354 | localCupdate[cc-base].size--; 355 | 356 | // search target! 357 | std::map::iterator iter=remoteCupdate.find(localTarget); 358 | 359 | #pragma omp atomic update 360 | iter->second.degree += vDegree[i]; 361 | #pragma omp atomic update 362 | iter->second.size++; 363 | } 364 | 365 | // current is remote, target is local - accumulate for current, atomic on local 366 | if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && targetCommIsLocal) { 367 | #pragma omp atomic update 368 | localCupdate[localTarget-base].degree += vDegree[i]; 369 | #pragma omp atomic update 370 | localCupdate[localTarget-base].size++; 371 | 372 | // search current 373 | std::map::iterator iter=remoteCupdate.find(cc); 374 | 375 | #pragma omp atomic update 376 | iter->second.degree -= vDegree[i]; 377 | #pragma omp atomic update 378 | iter->second.size--; 379 | } 380 | 381 | // current and target are remote - accumulate for both 382 | if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && !targetCommIsLocal) { 383 | 384 | // search current 385 | std::map::iterator iter = remoteCupdate.find(cc); 386 | 387 | #pragma omp atomic update 388 | iter->second.degree -= vDegree[i]; 389 | #pragma omp atomic update 390 | iter->second.size--; 391 | 392 | // search target 393 | iter=remoteCupdate.find(localTarget); 394 | 395 | #pragma omp atomic update 396 | iter->second.degree += vDegree[i]; 397 | #pragma omp atomic update 398 | iter->second.size++; 399 | } 400 | 401 | #ifdef DEBUG_PRINTF 402 | assert(localTarget != -1); 403 | #endif 404 | targetComm[i] = localTarget; 405 | } // distExecuteLouvainIteration 406 | 407 | GraphWeight distComputeModularity(const Graph &g, std::vector &localCinfo, 408 | const std::vector &clusterWeight, 409 | const GraphWeight constantForSecondTerm, 410 | const int me) 411 | { 412 | const GraphElem nv = g.get_lnv(); 413 | MPI_Comm gcomm = g.get_comm(); 414 | 415 | GraphWeight le_la_xx[2]; 416 | GraphWeight e_a_xx[2] = {0.0, 0.0}; 417 | GraphWeight le_xx = 0.0, la2_x = 0.0; 418 | 419 | #ifdef DEBUG_PRINTF 420 | assert((clusterWeight.size() == nv)); 421 | #endif 422 | 423 | #ifdef OMP_SCHEDULE_RUNTIME 424 | #pragma omp parallel for default(shared), shared(clusterWeight, localCinfo), \ 425 | reduction(+: le_xx), reduction(+: la2_x) schedule(runtime) 426 | #else 427 | #pragma omp parallel for default(shared), shared(clusterWeight, localCinfo), \ 428 | reduction(+: le_xx), 
reduction(+: la2_x) schedule(static) 429 | #endif 430 | for (GraphElem i = 0L; i < nv; i++) { 431 | le_xx += clusterWeight[i]; 432 | la2_x += static_cast(localCinfo[i].degree) * static_cast(localCinfo[i].degree); 433 | } 434 | le_la_xx[0] = le_xx; 435 | le_la_xx[1] = la2_x; 436 | 437 | #ifdef DEBUG_PRINTF 438 | const double t0 = MPI_Wtime(); 439 | #endif 440 | 441 | MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm); 442 | 443 | #ifdef DEBUG_PRINTF 444 | const double t1 = MPI_Wtime(); 445 | #endif 446 | 447 | GraphWeight currMod = std::fabs((e_a_xx[0] * constantForSecondTerm) - 448 | (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm)); 449 | #ifdef DEBUG_PRINTF 450 | std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl; 451 | std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl; 452 | std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl; 453 | #endif 454 | 455 | return currMod; 456 | } // distComputeModularity 457 | 458 | void distUpdateLocalCinfo(std::vector &localCinfo, const std::vector &localCupdate) 459 | { 460 | size_t csz = localCinfo.size(); 461 | 462 | #ifdef OMP_SCHEDULE_RUNTIME 463 | #pragma omp for schedule(runtime) 464 | #else 465 | #pragma omp for schedule(static) 466 | #endif 467 | for (GraphElem i = 0L; i < csz; i++) { 468 | localCinfo[i].size += localCupdate[i].size; 469 | localCinfo[i].degree += localCupdate[i].degree; 470 | } 471 | } 472 | 473 | void distCleanCWandCU(const GraphElem nv, std::vector &clusterWeight, 474 | std::vector &localCupdate) 475 | { 476 | #ifdef OMP_SCHEDULE_RUNTIME 477 | #pragma omp for schedule(runtime) 478 | #else 479 | #pragma omp for schedule(static) 480 | #endif 481 | for (GraphElem i = 0L; i < nv; i++) { 482 | clusterWeight[i] = 0; 483 | localCupdate[i].degree = 0; 484 | localCupdate[i].size = 0; 485 | } 486 | } // distCleanCWandCU 487 | 488 | #if defined(USE_MPI_RMA) 489 | void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, 490 | const size_t &ssz, const size_t &rsz, const std::vector &ssizes, 491 | const std::vector &rsizes, const std::vector &svdata, 492 | const std::vector &rvdata, const std::vector &currComm, 493 | const std::vector &localCinfo, std::map &remoteCinfo, 494 | std::unordered_map &remoteComm, std::map &remoteCupdate, 495 | const MPI_Win &commwin, const std::vector &disp) 496 | #else 497 | void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, 498 | const size_t &ssz, const size_t &rsz, const std::vector &ssizes, 499 | const std::vector &rsizes, const std::vector &svdata, 500 | const std::vector &rvdata, const std::vector &currComm, 501 | const std::vector &localCinfo, std::map &remoteCinfo, 502 | std::unordered_map &remoteComm, std::map &remoteCupdate) 503 | #endif 504 | { 505 | #if defined(USE_MPI_RMA) 506 | std::vector scdata(ssz); 507 | #else 508 | std::vector rcdata(rsz), scdata(ssz); 509 | #endif 510 | GraphElem spos, rpos; 511 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 512 | std::vector< std::vector< GraphElem > > rcinfo(nprocs); 513 | #else 514 | std::vector > rcinfo(nprocs); 515 | #endif 516 | 517 | #if defined(USE_MPI_SENDRECV) 518 | #else 519 | std::vector rreqs(nprocs), sreqs(nprocs); 520 | #endif 521 | 522 | #ifdef DEBUG_PRINTF 523 | double t0, t1, ta = 0.0; 524 | #endif 525 | 526 | #if defined(USE_MPI_RMA) && !defined(USE_MPI_ACCUMULATE) 527 | int num_comm_procs; 528 | #endif 529 | 530 | #if defined(USE_MPI_RMA) && 
!defined(USE_MPI_ACCUMULATE) 531 | spos = 0; 532 | rpos = 0; 533 | std::vector comm_proc(nprocs); 534 | std::vector comm_proc_buf_disp(nprocs); 535 | 536 | /* Initialize all to -1 (unsure if necessary) */ 537 | for (int i = 0; i < nprocs; i++) { 538 | comm_proc[i] = -1; 539 | comm_proc_buf_disp[i] = -1; 540 | } 541 | 542 | num_comm_procs = 0; 543 | for (int i = 0; i < nprocs; i++) { 544 | if ((i != me) && (ssizes[i] > 0)) { 545 | comm_proc[num_comm_procs] = i; 546 | comm_proc_buf_disp[num_comm_procs] = spos; 547 | num_comm_procs++; 548 | } 549 | spos += ssizes[i]; 550 | rpos += rsizes[i]; 551 | } 552 | #endif 553 | 554 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 555 | const GraphElem nv = dg.get_lnv(); 556 | MPI_Comm gcomm = dg.get_comm(); 557 | 558 | // Collects Communities of local vertices for remote nodes 559 | #ifdef OMP_SCHEDULE_RUNTIME 560 | #pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime) 561 | #else 562 | #pragma omp parallel for shared(svdata, scdata, currComm) schedule(static) 563 | #endif 564 | for (GraphElem i = 0; i < ssz; i++) { 565 | const GraphElem vertex = svdata[i]; 566 | #ifdef DEBUG_PRINTF 567 | assert((vertex >= base) && (vertex < bound)); 568 | #endif 569 | const GraphElem comm = currComm[vertex - base]; 570 | scdata[i] = comm; 571 | } 572 | 573 | std::vector rcsizes(nprocs), scsizes(nprocs); 574 | std::vector sinfo, rinfo; 575 | 576 | #ifdef DEBUG_PRINTF 577 | t0 = MPI_Wtime(); 578 | #endif 579 | #if !defined(USE_MPI_RMA) || defined(USE_MPI_ACCUMULATE) 580 | spos = 0; 581 | rpos = 0; 582 | #endif 583 | #if defined(USE_MPI_COLLECTIVES) 584 | std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); 585 | for (int i = 0; i < nprocs; i++) { 586 | scnts[i] = ssizes[i]; 587 | rcnts[i] = rsizes[i]; 588 | sdispls[i] = spos; 589 | rdispls[i] = rpos; 590 | spos += scnts[i]; 591 | rpos += rcnts[i]; 592 | } 593 | scnts[me] = 0; 594 | rcnts[me] = 0; 595 | MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), 596 | MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), 597 | MPI_GRAPH_TYPE, gcomm); 598 | #elif defined(USE_MPI_RMA) 599 | #if defined(USE_MPI_ACCUMULATE) 600 | for (int i = 0; i < nprocs; i++) { 601 | if ((i != me) && (ssizes[i] > 0)) { 602 | MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 603 | disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin); 604 | } 605 | spos += ssizes[i]; 606 | rpos += rsizes[i]; 607 | } 608 | #else 609 | for (int i = 0; i < num_comm_procs; i++) { 610 | int target_rank = comm_proc[i]; 611 | MPI_Put(scdata.data() + comm_proc_buf_disp[i], ssizes[target_rank], MPI_GRAPH_TYPE, 612 | target_rank, disp[target_rank], ssizes[target_rank], MPI_GRAPH_TYPE, commwin); 613 | } 614 | #endif 615 | #elif defined(USE_MPI_SENDRECV) 616 | for (int i = 0; i < nprocs; i++) { 617 | if (i != me) 618 | MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 619 | rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 620 | gcomm, MPI_STATUSES_IGNORE); 621 | 622 | spos += ssizes[i]; 623 | rpos += rsizes[i]; 624 | } 625 | #else 626 | for (int i = 0; i < nprocs; i++) { 627 | if ((i != me) && (rsizes[i] > 0)) 628 | MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 629 | CommunityTag, gcomm, &rreqs[i]); 630 | else 631 | rreqs[i] = MPI_REQUEST_NULL; 632 | 633 | rpos += rsizes[i]; 634 | } 635 | for (int i = 0; i < nprocs; i++) { 636 | if ((i != me) && (ssizes[i] > 0)) 637 | MPI_Isend(scdata.data() + spos, ssizes[i], 
MPI_GRAPH_TYPE, i, 638 | CommunityTag, gcomm, &sreqs[i]); 639 | else 640 | sreqs[i] = MPI_REQUEST_NULL; 641 | 642 | spos += ssizes[i]; 643 | } 644 | 645 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 646 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 647 | #endif 648 | #ifdef DEBUG_PRINTF 649 | t1 = MPI_Wtime(); 650 | ta += (t1 - t0); 651 | #endif 652 | 653 | // reserve vectors 654 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 655 | for (GraphElem i = 0; i < nprocs; i++) { 656 | rcinfo[i].reserve(rpos); 657 | } 658 | #endif 659 | 660 | // fetch baseptr from MPI window 661 | #if defined(USE_MPI_RMA) 662 | MPI_Win_flush_all(commwin); 663 | MPI_Barrier(gcomm); 664 | 665 | GraphElem *rcbuf = nullptr; 666 | int flag = 0; 667 | MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, &flag); 668 | #endif 669 | 670 | remoteComm.clear(); 671 | for (GraphElem i = 0; i < rpos; i++) { 672 | 673 | #if defined(USE_MPI_RMA) 674 | const GraphElem comm = rcbuf[i]; 675 | #else 676 | const GraphElem comm = rcdata[i]; 677 | #endif 678 | 679 | remoteComm.insert(std::unordered_map::value_type(rvdata[i], comm)); 680 | const int tproc = dg.get_owner(comm); 681 | 682 | if (tproc != me) 683 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 684 | rcinfo[tproc].emplace_back(comm); 685 | #else 686 | rcinfo[tproc].insert(comm); 687 | #endif 688 | } 689 | 690 | for (GraphElem i = 0; i < nv; i++) { 691 | const GraphElem comm = currComm[i]; 692 | const int tproc = dg.get_owner(comm); 693 | 694 | if (tproc != me) 695 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 696 | rcinfo[tproc].emplace_back(comm); 697 | #else 698 | rcinfo[tproc].insert(comm); 699 | #endif 700 | } 701 | 702 | #ifdef DEBUG_PRINTF 703 | t0 = MPI_Wtime(); 704 | #endif 705 | GraphElem stcsz = 0, rtcsz = 0; 706 | 707 | #ifdef OMP_SCHEDULE_RUNTIME 708 | #pragma omp parallel for shared(scsizes, rcinfo) \ 709 | reduction(+:stcsz) schedule(runtime) 710 | #else 711 | #pragma omp parallel for shared(scsizes, rcinfo) \ 712 | reduction(+:stcsz) schedule(static) 713 | #endif 714 | for (int i = 0; i < nprocs; i++) { 715 | scsizes[i] = rcinfo[i].size(); 716 | stcsz += scsizes[i]; 717 | } 718 | 719 | MPI_Alltoall(scsizes.data(), 1, MPI_GRAPH_TYPE, rcsizes.data(), 720 | 1, MPI_GRAPH_TYPE, gcomm); 721 | 722 | #ifdef DEBUG_PRINTF 723 | t1 = MPI_Wtime(); 724 | ta += (t1 - t0); 725 | #endif 726 | 727 | #ifdef OMP_SCHEDULE_RUNTIME 728 | #pragma omp parallel for shared(rcsizes) \ 729 | reduction(+:rtcsz) schedule(runtime) 730 | #else 731 | #pragma omp parallel for shared(rcsizes) \ 732 | reduction(+:rtcsz) schedule(static) 733 | #endif 734 | for (int i = 0; i < nprocs; i++) { 735 | rtcsz += rcsizes[i]; 736 | } 737 | 738 | #ifdef DEBUG_PRINTF 739 | std::cout << "[" << me << "]Total communities to receive: " << rtcsz << std::endl; 740 | #endif 741 | #if defined(USE_MPI_COLLECTIVES) 742 | std::vector rcomms(rtcsz), scomms(stcsz); 743 | #else 744 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 745 | std::vector rcomms(rtcsz); 746 | #else 747 | std::vector rcomms(rtcsz), scomms(stcsz); 748 | #endif 749 | #endif 750 | sinfo.resize(rtcsz); 751 | rinfo.resize(stcsz); 752 | 753 | #ifdef DEBUG_PRINTF 754 | t0 = MPI_Wtime(); 755 | #endif 756 | spos = 0; 757 | rpos = 0; 758 | #if defined(USE_MPI_COLLECTIVES) 759 | for (int i = 0; i < nprocs; i++) { 760 | if (i != me) { 761 | std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); 762 | } 763 | scnts[i] = scsizes[i]; 764 | rcnts[i] = rcsizes[i]; 765 | sdispls[i] = spos; 766 | rdispls[i] = rpos; 767 | spos += scnts[i]; 768 
| rpos += rcnts[i]; 769 | } 770 | scnts[me] = 0; 771 | rcnts[me] = 0; 772 | MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), 773 | MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), 774 | MPI_GRAPH_TYPE, gcomm); 775 | 776 | for (int i = 0; i < nprocs; i++) { 777 | if (i != me) { 778 | #ifdef OMP_SCHEDULE_RUNTIME 779 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ 780 | firstprivate(i, base), schedule(runtime) , if(rcsizes[i] >= 1000) 781 | #else 782 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ 783 | firstprivate(i, base), schedule(guided) , if(rcsizes[i] >= 1000) 784 | #endif 785 | for (GraphElem j = 0; j < rcsizes[i]; j++) { 786 | const GraphElem comm = rcomms[rdispls[i] + j]; 787 | sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; 788 | } 789 | } 790 | } 791 | 792 | MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), 793 | commType, rinfo.data(), scnts.data(), sdispls.data(), 794 | commType, gcomm); 795 | #else 796 | #if !defined(USE_MPI_SENDRECV) 797 | std::vector rcreqs(nprocs); 798 | #endif 799 | for (int i = 0; i < nprocs; i++) { 800 | if (i != me) { 801 | #if defined(USE_MPI_SENDRECV) 802 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 803 | MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 804 | rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 805 | gcomm, MPI_STATUSES_IGNORE); 806 | #else 807 | std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); 808 | MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 809 | rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 810 | gcomm, MPI_STATUSES_IGNORE); 811 | #endif 812 | #else 813 | if (rcsizes[i] > 0) { 814 | MPI_Irecv(rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, 815 | CommunityTag, gcomm, &rreqs[i]); 816 | } 817 | else 818 | rreqs[i] = MPI_REQUEST_NULL; 819 | 820 | if (scsizes[i] > 0) { 821 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 822 | MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, 823 | CommunityTag, gcomm, &sreqs[i]); 824 | #else 825 | std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); 826 | MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, 827 | CommunityTag, gcomm, &sreqs[i]); 828 | #endif 829 | } 830 | else 831 | sreqs[i] = MPI_REQUEST_NULL; 832 | #endif 833 | } 834 | else { 835 | #if !defined(USE_MPI_SENDRECV) 836 | rreqs[i] = MPI_REQUEST_NULL; 837 | sreqs[i] = MPI_REQUEST_NULL; 838 | #endif 839 | } 840 | rpos += rcsizes[i]; 841 | spos += scsizes[i]; 842 | } 843 | 844 | spos = 0; 845 | rpos = 0; 846 | 847 | // poke progress on last isend/irecvs 848 | #if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) 849 | int tf = 0, id = 0; 850 | MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE); 851 | #endif 852 | 853 | #if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) 854 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 855 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 856 | #endif 857 | 858 | for (int i = 0; i < nprocs; i++) { 859 | if (i != me) { 860 | #if defined(USE_MPI_SENDRECV) 861 | #ifdef OMP_SCHEDULE_RUNTIME 862 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ 863 | firstprivate(i, rpos, base), 
schedule(runtime) , if(rcsizes[i] >= 1000) 864 | 865 | #else 866 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ 867 | firstprivate(i, rpos, base), schedule(guided) , if(rcsizes[i] >= 1000) 868 | #endif 869 | for (GraphElem j = 0; j < rcsizes[i]; j++) { 870 | const GraphElem comm = rcomms[rpos + j]; 871 | sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; 872 | } 873 | 874 | MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, 875 | rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 876 | gcomm, MPI_STATUSES_IGNORE); 877 | #else 878 | if (scsizes[i] > 0) { 879 | MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 880 | gcomm, &rcreqs[i]); 881 | } 882 | else 883 | rcreqs[i] = MPI_REQUEST_NULL; 884 | 885 | // poke progress on last isend/irecvs 886 | #if defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) 887 | int flag = 0, done = 0; 888 | while (!done) { 889 | MPI_Test(&sreqs[i], &flag, MPI_STATUS_IGNORE); 890 | MPI_Test(&rreqs[i], &flag, MPI_STATUS_IGNORE); 891 | if (flag) 892 | done = 1; 893 | } 894 | #endif 895 | 896 | #ifdef OMP_SCHEDULE_RUNTIME 897 | #pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \ 898 | firstprivate(i, rpos, base), schedule(runtime) , if(rcsizes[i] >= 1000) 899 | #else 900 | #pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \ 901 | firstprivate(i, rpos, base), schedule(guided) , if(rcsizes[i] >= 1000) 902 | #endif 903 | for (GraphElem j = 0; j < rcsizes[i]; j++) { 904 | const GraphElem comm = rcomms[rpos + j]; 905 | sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; 906 | } 907 | 908 | if (rcsizes[i] > 0) { 909 | MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, 910 | CommunityDataTag, gcomm, &sreqs[i]); 911 | } 912 | else 913 | sreqs[i] = MPI_REQUEST_NULL; 914 | #endif 915 | } 916 | else { 917 | #if !defined(USE_MPI_SENDRECV) 918 | rcreqs[i] = MPI_REQUEST_NULL; 919 | sreqs[i] = MPI_REQUEST_NULL; 920 | #endif 921 | } 922 | rpos += rcsizes[i]; 923 | spos += scsizes[i]; 924 | } 925 | 926 | #if !defined(USE_MPI_SENDRECV) 927 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 928 | MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE); 929 | #endif 930 | 931 | #endif 932 | 933 | #ifdef DEBUG_PRINTF 934 | t1 = MPI_Wtime(); 935 | ta += (t1 - t0); 936 | #endif 937 | 938 | remoteCinfo.clear(); 939 | remoteCupdate.clear(); 940 | 941 | for (GraphElem i = 0; i < stcsz; i++) { 942 | const GraphElem ccomm = rinfo[i].community; 943 | 944 | Comm comm; 945 | 946 | comm.size = rinfo[i].size; 947 | comm.degree = rinfo[i].degree; 948 | 949 | remoteCinfo.insert(std::map::value_type(ccomm, comm)); 950 | remoteCupdate.insert(std::map::value_type(ccomm, Comm())); 951 | } 952 | } // end fillRemoteCommunities 953 | 954 | void createCommunityMPIType() 955 | { 956 | CommInfo cinfo; 957 | 958 | MPI_Aint begin, community, size, degree; 959 | 960 | MPI_Get_address(&cinfo, &begin); 961 | MPI_Get_address(&cinfo.community, &community); 962 | MPI_Get_address(&cinfo.size, &size); 963 | MPI_Get_address(&cinfo.degree, °ree); 964 | 965 | int blens[] = { 1, 1, 1 }; 966 | MPI_Aint displ[] = { community - begin, size - begin, degree - begin }; 967 | MPI_Datatype types[] = { MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE }; 968 | 969 | MPI_Type_create_struct(3, blens, displ, types, &commType); 970 | MPI_Type_commit(&commType); 971 | } // 
createCommunityMPIType 972 | 973 | void destroyCommunityMPIType() 974 | { 975 | MPI_Type_free(&commType); 976 | } // destroyCommunityMPIType 977 | 978 | void updateRemoteCommunities(const Graph &dg, std::vector &localCinfo, 979 | const std::map &remoteCupdate, 980 | const int me, const int nprocs) 981 | { 982 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 983 | std::vector> remoteArray(nprocs); 984 | MPI_Comm gcomm = dg.get_comm(); 985 | 986 | // FIXME TODO can we use TBB::concurrent_vector instead, 987 | // to make this parallel; first we have to get rid of maps 988 | for (std::map::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) { 989 | const GraphElem i = iter->first; 990 | const Comm &curr = iter->second; 991 | 992 | const int tproc = dg.get_owner(i); 993 | 994 | #ifdef DEBUG_PRINTF 995 | assert(tproc != me); 996 | #endif 997 | CommInfo rcinfo; 998 | 999 | rcinfo.community = i; 1000 | rcinfo.size = curr.size; 1001 | rcinfo.degree = curr.degree; 1002 | 1003 | remoteArray[tproc].push_back(rcinfo); 1004 | } 1005 | 1006 | std::vector send_sz(nprocs), recv_sz(nprocs); 1007 | 1008 | #ifdef DEBUG_PRINTF 1009 | GraphWeight tc = 0.0; 1010 | const double t0 = MPI_Wtime(); 1011 | #endif 1012 | 1013 | #ifdef OMP_SCHEDULE_RUNTIME 1014 | #pragma omp parallel for schedule(runtime) 1015 | #else 1016 | #pragma omp parallel for schedule(static) 1017 | #endif 1018 | for (int i = 0; i < nprocs; i++) { 1019 | send_sz[i] = remoteArray[i].size(); 1020 | } 1021 | 1022 | MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), 1023 | 1, MPI_GRAPH_TYPE, gcomm); 1024 | 1025 | #ifdef DEBUG_PRINTF 1026 | const double t1 = MPI_Wtime(); 1027 | tc += (t1 - t0); 1028 | #endif 1029 | 1030 | GraphElem rcnt = 0, scnt = 0; 1031 | #ifdef OMP_SCHEDULE_RUNTIME 1032 | #pragma omp parallel for shared(recv_sz, send_sz) \ 1033 | reduction(+:rcnt, scnt) schedule(runtime) 1034 | #else 1035 | #pragma omp parallel for shared(recv_sz, send_sz) \ 1036 | reduction(+:rcnt, scnt) schedule(static) 1037 | #endif 1038 | for (int i = 0; i < nprocs; i++) { 1039 | rcnt += recv_sz[i]; 1040 | scnt += send_sz[i]; 1041 | } 1042 | #ifdef DEBUG_PRINTF 1043 | std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl; 1044 | #endif 1045 | 1046 | GraphElem currPos = 0; 1047 | std::vector rdata(rcnt); 1048 | 1049 | #ifdef DEBUG_PRINTF 1050 | const double t2 = MPI_Wtime(); 1051 | #endif 1052 | #if defined(USE_MPI_SENDRECV) 1053 | for (int i = 0; i < nprocs; i++) { 1054 | if (i != me) 1055 | MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, 1056 | rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, 1057 | gcomm, MPI_STATUSES_IGNORE); 1058 | 1059 | currPos += recv_sz[i]; 1060 | } 1061 | #else 1062 | std::vector sreqs(nprocs), rreqs(nprocs); 1063 | for (int i = 0; i < nprocs; i++) { 1064 | if ((i != me) && (recv_sz[i] > 0)) 1065 | MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, 1066 | CommunityDataTag, gcomm, &rreqs[i]); 1067 | else 1068 | rreqs[i] = MPI_REQUEST_NULL; 1069 | 1070 | currPos += recv_sz[i]; 1071 | } 1072 | 1073 | for (int i = 0; i < nprocs; i++) { 1074 | if ((i != me) && (send_sz[i] > 0)) 1075 | MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, 1076 | CommunityDataTag, gcomm, &sreqs[i]); 1077 | else 1078 | sreqs[i] = MPI_REQUEST_NULL; 1079 | } 1080 | 1081 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 1082 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 1083 | 
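// Note: in this non-collective path the receives are posted before the matching
// sends, and ranks with nothing to exchange hold MPI_REQUEST_NULL entries, so the
// MPI_Waitall calls above can safely wait on the full nprocs-sized request arrays
// (null requests are treated as already complete).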
#endif 1084 | #ifdef DEBUG_PRINTF 1085 | const double t3 = MPI_Wtime(); 1086 | std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl; 1087 | #endif 1088 | 1089 | #ifdef OMP_SCHEDULE_RUNTIME 1090 | #pragma omp parallel for shared(rdata, localCinfo) schedule(runtime) 1091 | #else 1092 | #pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic) 1093 | #endif 1094 | for (GraphElem i = 0; i < rcnt; i++) { 1095 | const CommInfo &curr = rdata[i]; 1096 | 1097 | #ifdef DEBUG_PRINTF 1098 | assert(dg.get_owner(curr.community) == me); 1099 | #endif 1100 | localCinfo[curr.community-base].size += curr.size; 1101 | localCinfo[curr.community-base].degree += curr.degree; 1102 | } 1103 | } // updateRemoteCommunities 1104 | 1105 | // initial setup before Louvain iteration begins 1106 | #if defined(USE_MPI_RMA) 1107 | void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, 1108 | std::vector &ssizes, std::vector &rsizes, 1109 | std::vector &svdata, std::vector &rvdata, 1110 | const int me, const int nprocs, MPI_Win &commwin) 1111 | #else 1112 | void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, 1113 | std::vector &ssizes, std::vector &rsizes, 1114 | std::vector &svdata, std::vector &rvdata, 1115 | const int me, const int nprocs) 1116 | #endif 1117 | { 1118 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 1119 | const GraphElem nv = dg.get_lnv(); 1120 | MPI_Comm gcomm = dg.get_comm(); 1121 | 1122 | #ifdef USE_OPENMP_LOCK 1123 | std::vector locks(nprocs); 1124 | for (int i = 0; i < nprocs; i++) 1125 | omp_init_lock(&locks[i]); 1126 | #endif 1127 | std::vector> parray(nprocs); 1128 | 1129 | #ifdef USE_OPENMP_LOCK 1130 | #pragma omp parallel default(shared), shared(dg, locks, parray), firstprivate(me) 1131 | #else 1132 | #pragma omp parallel default(shared), shared(dg, parray), firstprivate(me) 1133 | #endif 1134 | { 1135 | #ifdef OMP_SCHEDULE_RUNTIME 1136 | #pragma omp for schedule(runtime) 1137 | #else 1138 | #pragma omp for schedule(guided) 1139 | #endif 1140 | for (GraphElem i = 0; i < nv; i++) { 1141 | GraphElem e0, e1; 1142 | 1143 | dg.edge_range(i, e0, e1); 1144 | 1145 | for (GraphElem j = e0; j < e1; j++) { 1146 | const Edge &edge = dg.get_edge(j); 1147 | const int tproc = dg.get_owner(edge.tail_); 1148 | 1149 | if (tproc != me) { 1150 | #ifdef USE_OPENMP_LOCK 1151 | omp_set_lock(&locks[tproc]); 1152 | #else 1153 | lock(); 1154 | #endif 1155 | parray[tproc].insert(edge.tail_); 1156 | #ifdef USE_OPENMP_LOCK 1157 | omp_unset_lock(&locks[tproc]); 1158 | #else 1159 | unlock(); 1160 | #endif 1161 | } 1162 | } 1163 | } 1164 | } 1165 | 1166 | #ifdef USE_OPENMP_LOCK 1167 | for (int i = 0; i < nprocs; i++) { 1168 | omp_destroy_lock(&locks[i]); 1169 | } 1170 | #endif 1171 | 1172 | rsizes.resize(nprocs); 1173 | ssizes.resize(nprocs); 1174 | ssz = 0, rsz = 0; 1175 | 1176 | int pproc = 0; 1177 | // TODO FIXME parallelize this loop 1178 | for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { 1179 | ssz += iter->size(); 1180 | ssizes[pproc] = iter->size(); 1181 | pproc++; 1182 | } 1183 | 1184 | MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), 1185 | 1, MPI_GRAPH_TYPE, gcomm); 1186 | 1187 | GraphElem rsz_r = 0; 1188 | #ifdef OMP_SCHEDULE_RUNTIME 1189 | #pragma omp parallel for shared(rsizes) \ 1190 | reduction(+:rsz_r) schedule(runtime) 1191 | #else 1192 | #pragma omp parallel for shared(rsizes) \ 1193 | reduction(+:rsz_r) schedule(static) 1194 | #endif 1195 | for (int i = 0; i < 
nprocs; i++) 1196 | rsz_r += rsizes[i]; 1197 | rsz = rsz_r; 1198 | 1199 | svdata.resize(ssz); 1200 | rvdata.resize(rsz); 1201 | 1202 | GraphElem cpos = 0, rpos = 0; 1203 | pproc = 0; 1204 | 1205 | #if defined(USE_MPI_COLLECTIVES) 1206 | std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); 1207 | 1208 | for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { 1209 | std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); 1210 | 1211 | scnts[pproc] = iter->size(); 1212 | rcnts[pproc] = rsizes[pproc]; 1213 | sdispls[pproc] = cpos; 1214 | rdispls[pproc] = rpos; 1215 | cpos += iter->size(); 1216 | rpos += rcnts[pproc]; 1217 | 1218 | pproc++; 1219 | } 1220 | 1221 | scnts[me] = 0; 1222 | rcnts[me] = 0; 1223 | MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), 1224 | MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), 1225 | MPI_GRAPH_TYPE, gcomm); 1226 | #else 1227 | std::vector rreqs(nprocs), sreqs(nprocs); 1228 | for (int i = 0; i < nprocs; i++) { 1229 | if ((i != me) && (rsizes[i] > 0)) 1230 | MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 1231 | VertexTag, gcomm, &rreqs[i]); 1232 | else 1233 | rreqs[i] = MPI_REQUEST_NULL; 1234 | 1235 | rpos += rsizes[i]; 1236 | } 1237 | 1238 | for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { 1239 | std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); 1240 | 1241 | if ((me != pproc) && (iter->size() > 0)) 1242 | MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, 1243 | VertexTag, gcomm, &sreqs[pproc]); 1244 | else 1245 | sreqs[pproc] = MPI_REQUEST_NULL; 1246 | 1247 | cpos += iter->size(); 1248 | pproc++; 1249 | } 1250 | 1251 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 1252 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 1253 | #endif 1254 | 1255 | std::swap(svdata, rvdata); 1256 | std::swap(ssizes, rsizes); 1257 | std::swap(ssz, rsz); 1258 | 1259 | // create MPI window for communities 1260 | #if defined(USE_MPI_RMA) 1261 | GraphElem *ptr = nullptr; 1262 | MPI_Info info = MPI_INFO_NULL; 1263 | #if defined(USE_MPI_ACCUMULATE) 1264 | MPI_Info_create(&info); 1265 | MPI_Info_set(info, "accumulate_ordering", "none"); 1266 | MPI_Info_set(info, "accumulate_ops", "same_op"); 1267 | #endif 1268 | MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), 1269 | info, gcomm, &ptr, &commwin); 1270 | MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin); 1271 | #endif 1272 | } // exchangeVertexReqs 1273 | 1274 | #if defined(USE_MPI_RMA) 1275 | GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, 1276 | size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, 1277 | std::vector &svdata, std::vector &rvdata, const GraphWeight lower, 1278 | const GraphWeight thresh, int &iters, MPI_Win &commwin) 1279 | #else 1280 | GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, 1281 | size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, 1282 | std::vector &svdata, std::vector &rvdata, const GraphWeight lower, 1283 | const GraphWeight thresh, int &iters) 1284 | #endif 1285 | { 1286 | std::vector pastComm, currComm, targetComm; 1287 | std::vector vDegree; 1288 | std::vector clusterWeight; 1289 | std::vector localCinfo, localCupdate; 1290 | 1291 | std::unordered_map remoteComm; 1292 | std::map remoteCinfo, remoteCupdate; 1293 | 1294 | const GraphElem nv = dg.get_lnv(); 1295 | MPI_Comm gcomm = dg.get_comm(); 1296 | 1297 | GraphWeight 
constantForSecondTerm; 1298 | GraphWeight prevMod = lower; 1299 | GraphWeight currMod = -1.0; 1300 | int numIters = 0; 1301 | 1302 | distInitLouvain(dg, pastComm, currComm, vDegree, clusterWeight, localCinfo, 1303 | localCupdate, constantForSecondTerm, me); 1304 | targetComm.resize(nv); 1305 | 1306 | #ifdef DEBUG_PRINTF 1307 | std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl; 1308 | if (me == 0) 1309 | std::cout << "Threshold: " << thresh << std::endl; 1310 | #endif 1311 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 1312 | 1313 | #ifdef DEBUG_PRINTF 1314 | double t0, t1; 1315 | t0 = MPI_Wtime(); 1316 | #endif 1317 | 1318 | // setup vertices and communities 1319 | #if defined(USE_MPI_RMA) 1320 | exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 1321 | svdata, rvdata, me, nprocs, commwin); 1322 | 1323 | // store the remote displacements 1324 | std::vector disp(nprocs); 1325 | MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, 1326 | MPI_SUM, gcomm); 1327 | #else 1328 | exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 1329 | svdata, rvdata, me, nprocs); 1330 | #endif 1331 | 1332 | #ifdef DEBUG_PRINTF 1333 | t1 = MPI_Wtime(); 1334 | std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl; 1335 | #endif 1336 | 1337 | // start Louvain iteration 1338 | while(true) { 1339 | #ifdef DEBUG_PRINTF 1340 | const double t2 = MPI_Wtime(); 1341 | if (me == 0) 1342 | std::cout << "Starting Louvain iteration: " << numIters << std::endl; 1343 | #endif 1344 | numIters++; 1345 | 1346 | #ifdef DEBUG_PRINTF 1347 | t0 = MPI_Wtime(); 1348 | #endif 1349 | 1350 | #if defined(USE_MPI_RMA) 1351 | fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 1352 | rsizes, svdata, rvdata, currComm, localCinfo, 1353 | remoteCinfo, remoteComm, remoteCupdate, 1354 | commwin, disp); 1355 | #else 1356 | fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 1357 | rsizes, svdata, rvdata, currComm, localCinfo, 1358 | remoteCinfo, remoteComm, remoteCupdate); 1359 | #endif 1360 | 1361 | #ifdef DEBUG_PRINTF 1362 | t1 = MPI_Wtime(); 1363 | std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl; 1364 | std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl; 1365 | #endif 1366 | 1367 | #ifdef DEBUG_PRINTF 1368 | t0 = MPI_Wtime(); 1369 | #endif 1370 | 1371 | #pragma omp parallel default(shared), shared(clusterWeight, localCupdate, currComm, targetComm, \ 1372 | vDegree, localCinfo, remoteCinfo, remoteComm, pastComm, dg, remoteCupdate), \ 1373 | firstprivate(constantForSecondTerm, me) 1374 | { 1375 | distCleanCWandCU(nv, clusterWeight, localCupdate); 1376 | 1377 | #ifdef OMP_SCHEDULE_RUNTIME 1378 | #pragma omp for schedule(runtime) 1379 | #else 1380 | #pragma omp for schedule(guided) 1381 | #endif 1382 | for (GraphElem i = 0; i < nv; i++) { 1383 | distExecuteLouvainIteration(i, dg, currComm, targetComm, vDegree, localCinfo, 1384 | localCupdate, remoteComm, remoteCinfo, remoteCupdate, 1385 | constantForSecondTerm, clusterWeight, me); 1386 | } 1387 | } 1388 | 1389 | #pragma omp parallel default(none), shared(localCinfo, localCupdate) 1390 | { 1391 | distUpdateLocalCinfo(localCinfo, localCupdate); 1392 | } 1393 | 1394 | // communicate remote communities 1395 | updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs); 1396 | 1397 | // compute modularity 1398 | currMod = distComputeModularity(dg, localCinfo, 
clusterWeight, constantForSecondTerm, me); 1399 | 1400 | // exit criteria 1401 | if (currMod - prevMod < thresh) 1402 | break; 1403 | 1404 | prevMod = currMod; 1405 | if (prevMod < lower) 1406 | prevMod = lower; 1407 | 1408 | #ifdef OMP_SCHEDULE_RUNTIME 1409 | #pragma omp parallel for default(shared) \ 1410 | shared(pastComm, currComm, targetComm) \ 1411 | schedule(runtime) 1412 | #else 1413 | #pragma omp parallel for default(shared) \ 1414 | shared(pastComm, currComm, targetComm) \ 1415 | schedule(static) 1416 | #endif 1417 | for (GraphElem i = 0; i < nv; i++) { 1418 | GraphElem tmp = pastComm[i]; 1419 | pastComm[i] = currComm[i]; 1420 | currComm[i] = targetComm[i]; 1421 | targetComm[i] = tmp; 1422 | } 1423 | } // end of Louvain iteration 1424 | 1425 | #if defined(USE_MPI_RMA) 1426 | MPI_Win_unlock_all(commwin); 1427 | MPI_Win_free(&commwin); 1428 | #endif 1429 | 1430 | iters = numIters; 1431 | 1432 | vDegree.clear(); 1433 | pastComm.clear(); 1434 | currComm.clear(); 1435 | targetComm.clear(); 1436 | clusterWeight.clear(); 1437 | localCinfo.clear(); 1438 | localCupdate.clear(); 1439 | 1440 | return prevMod; 1441 | } // distLouvainMethod plain 1442 | 1443 | #endif // __DSPL 1444 | -------------------------------------------------------------------------------- /graph.hpp: -------------------------------------------------------------------------------- 1 | // *********************************************************************** 2 | // 3 | // miniVite 4 | // 5 | // *********************************************************************** 6 | // 7 | // Copyright (2018) Battelle Memorial Institute 8 | // All rights reserved. 9 | // 10 | // Redistribution and use in source and binary forms, with or without 11 | // modification, are permitted provided that the following conditions 12 | // are met: 13 | // 14 | // 1. Redistributions of source code must retain the above copyright 15 | // notice, this list of conditions and the following disclaimer. 16 | // 17 | // 2. Redistributions in binary form must reproduce the above copyright 18 | // notice, this list of conditions and the following disclaimer in the 19 | // documentation and/or other materials provided with the distribution. 20 | // 21 | // 3. Neither the name of the copyright holder nor the names of its 22 | // contributors may be used to endorse or promote products derived from 23 | // this software without specific prior written permission. 24 | // 25 | // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 26 | // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 27 | // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 28 | // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 29 | // COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 30 | // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 31 | // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 32 | // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 33 | // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 34 | // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 35 | // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 36 | // POSSIBILITY OF SUCH DAMAGE. 
37 | // 38 | // ************************************************************************ 39 | 40 | #pragma once 41 | #ifndef GRAPH_HPP 42 | #define GRAPH_HPP 43 | 44 | #include 45 | #include 46 | #include 47 | #include 48 | #include 49 | #include 50 | #include 51 | #include 52 | #include 53 | 54 | #include 55 | 56 | #include "utils.hpp" 57 | 58 | unsigned seed; 59 | 60 | struct Edge 61 | { 62 | GraphElem tail_; 63 | GraphWeight weight_; 64 | 65 | Edge(): tail_(-1), weight_(0.0) {} 66 | }; 67 | 68 | struct EdgeTuple 69 | { 70 | GraphElem ij_[2]; 71 | GraphWeight w_; 72 | 73 | EdgeTuple(GraphElem i, GraphElem j, GraphWeight w): 74 | ij_{i, j}, w_(w) 75 | {} 76 | EdgeTuple(GraphElem i, GraphElem j): 77 | ij_{i, j}, w_(1.0) 78 | {} 79 | EdgeTuple(): 80 | ij_{-1, -1}, w_(0.0) 81 | {} 82 | }; 83 | 84 | // per process graph instance 85 | class Graph 86 | { 87 | public: 88 | Graph(): 89 | lnv_(-1), lne_(-1), nv_(-1), 90 | ne_(-1), comm_(MPI_COMM_WORLD) 91 | { 92 | MPI_Comm_size(comm_, &size_); 93 | MPI_Comm_rank(comm_, &rank_); 94 | } 95 | 96 | Graph(GraphElem lnv, GraphElem lne, 97 | GraphElem nv, GraphElem ne, 98 | MPI_Comm comm=MPI_COMM_WORLD): 99 | lnv_(lnv), lne_(lne), 100 | nv_(nv), ne_(ne), 101 | comm_(comm) 102 | { 103 | MPI_Comm_size(comm_, &size_); 104 | MPI_Comm_rank(comm_, &rank_); 105 | 106 | edge_indices_.resize(lnv_+1, 0); 107 | edge_list_.resize(lne_); // this is usually populated later 108 | 109 | parts_.resize(size_+1); 110 | parts_[0] = 0; 111 | 112 | for (GraphElem i = 1; i < size_+1; i++) 113 | parts_[i] = ((nv_ * i) / size_); 114 | } 115 | 116 | ~Graph() 117 | { 118 | edge_list_.clear(); 119 | edge_indices_.clear(); 120 | parts_.clear(); 121 | } 122 | 123 | // update vertex partition information 124 | void repart(std::vector const& parts) 125 | { memcpy(parts_.data(), parts.data(), sizeof(GraphElem)*(size_+1)); } 126 | 127 | // TODO FIXME put asserts like the following 128 | // everywhere function member of Graph class 129 | void set_edge_index(GraphElem const vertex, GraphElem const e0) 130 | { 131 | #if defined(DEBUG_BUILD) 132 | assert((vertex >= 0) && (vertex <= lnv_)); 133 | assert((e0 >= 0) && (e0 <= lne_)); 134 | edge_indices_.at(vertex) = e0; 135 | #else 136 | edge_indices_[vertex] = e0; 137 | #endif 138 | } 139 | 140 | void edge_range(GraphElem const vertex, GraphElem& e0, 141 | GraphElem& e1) const 142 | { 143 | e0 = edge_indices_[vertex]; 144 | e1 = edge_indices_[vertex+1]; 145 | } 146 | 147 | // collective 148 | void set_nedges(GraphElem lne) 149 | { 150 | lne_ = lne; 151 | edge_list_.resize(lne_); 152 | 153 | // compute total number of edges 154 | ne_ = 0; 155 | MPI_Allreduce(&lne_, &ne_, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); 156 | } 157 | 158 | GraphElem get_base(const int rank) const 159 | { return parts_[rank]; } 160 | 161 | GraphElem get_bound(const int rank) const 162 | { return parts_[rank+1]; } 163 | 164 | GraphElem get_range(const int rank) const 165 | { return (parts_[rank+1] - parts_[rank] + 1); } 166 | 167 | int get_owner(const GraphElem vertex) const 168 | { 169 | const std::vector::const_iterator iter = 170 | std::upper_bound(parts_.begin(), parts_.end(), vertex); 171 | 172 | return (iter - parts_.begin() - 1); 173 | } 174 | 175 | GraphElem get_lnv() const { return lnv_; } 176 | GraphElem get_lne() const { return lne_; } 177 | GraphElem get_nv() const { return nv_; } 178 | GraphElem get_ne() const { return ne_; } 179 | MPI_Comm get_comm() const { return comm_; } 180 | 181 | // return edge and active info 182 | // ---------------------------- 183 | 184 
| Edge const& get_edge(GraphElem const index) const 185 | { return edge_list_[index]; } 186 | 187 | Edge& set_edge(GraphElem const index) 188 | { return edge_list_[index]; } 189 | 190 | // local <--> global index translation 191 | // ----------------------------------- 192 | GraphElem local_to_global(GraphElem idx) 193 | { return (idx + get_base(rank_)); } 194 | 195 | GraphElem global_to_local(GraphElem idx) 196 | { return (idx - get_base(rank_)); } 197 | 198 | // w.r.t passed rank 199 | GraphElem local_to_global(GraphElem idx, int rank) 200 | { return (idx + get_base(rank)); } 201 | 202 | GraphElem global_to_local(GraphElem idx, int rank) 203 | { return (idx - get_base(rank)); } 204 | 205 | // print edge list (with weights) 206 | void print(bool print_weight = true) const 207 | { 208 | if (lne_ < MAX_PRINT_NEDGE) 209 | { 210 | for (int p = 0; p < size_; p++) 211 | { 212 | MPI_Barrier(comm_); 213 | if (p == rank_) 214 | { 215 | std::cout << "###############" << std::endl; 216 | std::cout << "Process #" << p << ": " << std::endl; 217 | std::cout << "###############" << std::endl; 218 | GraphElem base = get_base(p); 219 | for (GraphElem i = 0; i < lnv_; i++) 220 | { 221 | GraphElem e0, e1; 222 | edge_range(i, e0, e1); 223 | if (print_weight) { // print weights (default) 224 | for (GraphElem e = e0; e < e1; e++) 225 | { 226 | Edge const& edge = get_edge(e); 227 | std::cout << i+base << " " << edge.tail_ << " " << edge.weight_ << std::endl; 228 | } 229 | } 230 | else { // don't print weights 231 | for (GraphElem e = e0; e < e1; e++) 232 | { 233 | Edge const& edge = get_edge(e); 234 | std::cout << i+base << " " << edge.tail_ << std::endl; 235 | } 236 | } 237 | } 238 | MPI_Barrier(comm_); 239 | } 240 | } 241 | } 242 | else 243 | { 244 | if (rank_ == 0) 245 | std::cout << "Graph size per process is {" << lnv_ << ", " << lne_ << 246 | "}, which will overwhelm STDOUT." 
<< std::endl; 247 | } 248 | } 249 | 250 | // print statistics about edge distribution 251 | void print_dist_stats() 252 | { 253 | long sumdeg = 0, maxdeg = 0; 254 | long lne = (long) lne_; 255 | 256 | MPI_Reduce(&lne, &sumdeg, 1, MPI_LONG, MPI_SUM, 0, comm_); 257 | MPI_Reduce(&lne, &maxdeg, 1, MPI_LONG, MPI_MAX, 0, comm_); 258 | 259 | long my_sq = lne*lne; 260 | long sum_sq = 0; 261 | MPI_Reduce(&my_sq, &sum_sq, 1, MPI_LONG, MPI_SUM, 0, comm_); 262 | 263 | double average = (double) sumdeg / size_; 264 | double avg_sq = (double) sum_sq / size_; 265 | double var = avg_sq - (average*average); 266 | double stddev = sqrt(var); 267 | 268 | MPI_Barrier(comm_); 269 | 270 | if (rank_ == 0) 271 | { 272 | std::cout << std::endl; 273 | std::cout << "-------------------------------------------------------" << std::endl; 274 | std::cout << "Graph edge distribution characteristics" << std::endl; 275 | std::cout << "-------------------------------------------------------" << std::endl; 276 | std::cout << "Number of vertices: " << nv_ << std::endl; 277 | std::cout << "Number of edges: " << ne_ << std::endl; 278 | std::cout << "Maximum number of edges: " << maxdeg << std::endl; 279 | std::cout << "Average number of edges: " << average << std::endl; 280 | std::cout << "Expected value of X^2: " << avg_sq << std::endl; 281 | std::cout << "Variance: " << var << std::endl; 282 | std::cout << "Standard deviation: " << stddev << std::endl; 283 | std::cout << "-------------------------------------------------------" << std::endl; 284 | 285 | } 286 | } 287 | 288 | // public variables 289 | std::vector edge_indices_; 290 | std::vector edge_list_; 291 | private: 292 | GraphElem lnv_, lne_, nv_, ne_; 293 | std::vector parts_; 294 | MPI_Comm comm_; 295 | int rank_, size_; 296 | }; 297 | 298 | // read in binary edge list files 299 | // using MPI I/O 300 | class BinaryEdgeList 301 | { 302 | public: 303 | BinaryEdgeList() : 304 | M_(-1), N_(-1), 305 | M_local_(-1), N_local_(-1), 306 | comm_(MPI_COMM_WORLD) 307 | {} 308 | BinaryEdgeList(MPI_Comm comm) : 309 | M_(-1), N_(-1), 310 | M_local_(-1), N_local_(-1), 311 | comm_(comm) 312 | {} 313 | 314 | // read a file and return a graph 315 | Graph* read(int me, int nprocs, int ranks_per_node, std::string file) 316 | { 317 | int file_open_error; 318 | MPI_File fh; 319 | MPI_Status status; 320 | 321 | // specify the number of aggregates 322 | MPI_Info info; 323 | MPI_Info_create(&info); 324 | int naggr = (ranks_per_node > 1) ? (nprocs/ranks_per_node) : ranks_per_node; 325 | if (naggr >= nprocs) 326 | naggr = 1; 327 | std::stringstream tmp_str; 328 | tmp_str << naggr; 329 | std::string str = tmp_str.str(); 330 | MPI_Info_set(info, "cb_nodes", str.c_str()); 331 | 332 | file_open_error = MPI_File_open(comm_, file.c_str(), MPI_MODE_RDONLY, info, &fh); 333 | MPI_Info_free(&info); 334 | 335 | if (file_open_error != MPI_SUCCESS) 336 | { 337 | std::cout << " Error opening file! " << std::endl; 338 | MPI_Abort(comm_, -99); 339 | } 340 | 341 | // read the dimensions 342 | MPI_File_read_all(fh, &M_, sizeof(GraphElem), MPI_BYTE, &status); 343 | MPI_File_read_all(fh, &N_, sizeof(GraphElem), MPI_BYTE, &status); 344 | M_local_ = ((M_*(me + 1)) / nprocs) - ((M_*me) / nprocs); 345 | 346 | // create local graph 347 | Graph *g = new Graph(M_local_, 0, M_, N_); 348 | 349 | // Let N = array length and P = number of processors. 
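// (Illustrative example: with N = 10 and P = 4, the formulas below give
//  starting points 0, 2, 5, 7 and local lengths 2, 3, 2, 3.)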
350 | // From j = 0 to P-1, 351 | // Starting point of array on processor j = floor(N * j / P) 352 | // Length of array on processor j = floor(N * (j + 1) / P) - floor(N * j / P) 353 | 354 | uint64_t tot_bytes=(M_local_+1)*sizeof(GraphElem); 355 | MPI_Offset offset = 2*sizeof(GraphElem) + ((M_*me) / nprocs)*sizeof(GraphElem); 356 | 357 | // read in INT_MAX increments if total byte size is > INT_MAX 358 | 359 | if (tot_bytes < INT_MAX) 360 | MPI_File_read_at(fh, offset, &g->edge_indices_[0], tot_bytes, MPI_BYTE, &status); 361 | else 362 | { 363 | int chunk_bytes=INT_MAX; 364 | uint8_t *curr_pointer = (uint8_t*) &g->edge_indices_[0]; 365 | uint64_t transf_bytes = 0; 366 | 367 | while (transf_bytes < tot_bytes) 368 | { 369 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 370 | transf_bytes += chunk_bytes; 371 | offset += chunk_bytes; 372 | curr_pointer += chunk_bytes; 373 | 374 | if ((tot_bytes - transf_bytes) < INT_MAX) 375 | chunk_bytes = tot_bytes - transf_bytes; 376 | } 377 | } 378 | 379 | N_local_ = g->edge_indices_[M_local_] - g->edge_indices_[0]; 380 | g->set_nedges(N_local_); 381 | 382 | tot_bytes = N_local_*(sizeof(Edge)); 383 | offset = 2*sizeof(GraphElem) + (M_+1)*sizeof(GraphElem) + g->edge_indices_[0]*(sizeof(Edge)); 384 | 385 | if (tot_bytes < INT_MAX) 386 | MPI_File_read_at(fh, offset, &g->edge_list_[0], tot_bytes, MPI_BYTE, &status); 387 | else 388 | { 389 | int chunk_bytes=INT_MAX; 390 | uint8_t *curr_pointer = (uint8_t*)&g->edge_list_[0]; 391 | uint64_t transf_bytes = 0; 392 | 393 | while (transf_bytes < tot_bytes) 394 | { 395 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 396 | transf_bytes += chunk_bytes; 397 | offset += chunk_bytes; 398 | curr_pointer += chunk_bytes; 399 | 400 | if ((tot_bytes - transf_bytes) < INT_MAX) 401 | chunk_bytes = (tot_bytes - transf_bytes); 402 | } 403 | } 404 | 405 | MPI_File_close(&fh); 406 | 407 | for(GraphElem i=1; i < M_local_+1; i++) 408 | g->edge_indices_[i] -= g->edge_indices_[0]; 409 | g->edge_indices_[0] = 0; 410 | 411 | return g; 412 | } 413 | 414 | // find a distribution such that every 415 | // process own equal number of edges (serial) 416 | void find_balanced_num_edges(int nprocs, std::string file, std::vector& mbins) 417 | { 418 | FILE *fp; 419 | GraphElem nv, ne; // #vertices, #edges 420 | std::vector nbins(nprocs,0); 421 | 422 | fp = fopen(file.c_str(), "rb"); 423 | if (fp == NULL) 424 | { 425 | std::cout<< " Error opening file! 
" << std::endl; 426 | return; 427 | } 428 | 429 | // read nv and ne 430 | fread(&nv, sizeof(GraphElem), 1, fp); 431 | fread(&ne, sizeof(GraphElem), 1, fp); 432 | 433 | // bin capacity 434 | GraphElem nbcap = (ne / nprocs), ecount_idx, past_ecount_idx = 0; 435 | int p = 0; 436 | 437 | for (GraphElem m = 0; m < nv; m++) 438 | { 439 | fread(&ecount_idx, sizeof(GraphElem), 1, fp); 440 | 441 | // bins[p] >= capacity only for the last process 442 | if ((nbins[p] < nbcap) || (p == (nprocs - 1))) 443 | nbins[p] += (ecount_idx - past_ecount_idx); 444 | 445 | // increment p as long as p is not the last process 446 | // worst case: excess edges piled up on (p-1) 447 | if ((nbins[p] >= nbcap) && (p < (nprocs - 1))) 448 | p++; 449 | 450 | mbins[p+1]++; 451 | past_ecount_idx = ecount_idx; 452 | } 453 | 454 | fclose(fp); 455 | 456 | // prefix sum to store indices 457 | for (int k = 1; k < nprocs+1; k++) 458 | mbins[k] += mbins[k-1]; 459 | 460 | nbins.clear(); 461 | } 462 | 463 | // read a file and return a graph 464 | // uses a balanced distribution 465 | // (approximately equal #edges per process) 466 | Graph* read_balanced(int me, int nprocs, int ranks_per_node, std::string file) 467 | { 468 | int file_open_error; 469 | MPI_File fh; 470 | MPI_Status status; 471 | std::vector mbins(nprocs+1,0); 472 | 473 | // find #vertices per process such that 474 | // each process roughly owns equal #edges 475 | if (me == 0) 476 | { 477 | find_balanced_num_edges(nprocs, file, mbins); 478 | std::cout << "Trying to achieve equal edge distribution across processes." << std::endl; 479 | } 480 | MPI_Barrier(comm_); 481 | MPI_Bcast(mbins.data(), nprocs+1, MPI_GRAPH_TYPE, 0, comm_); 482 | 483 | // specify the number of aggregates 484 | MPI_Info info; 485 | MPI_Info_create(&info); 486 | int naggr = (ranks_per_node > 1) ? (nprocs/ranks_per_node) : ranks_per_node; 487 | if (naggr >= nprocs) 488 | naggr = 1; 489 | std::stringstream tmp_str; 490 | tmp_str << naggr; 491 | std::string str = tmp_str.str(); 492 | MPI_Info_set(info, "cb_nodes", str.c_str()); 493 | 494 | file_open_error = MPI_File_open(comm_, file.c_str(), MPI_MODE_RDONLY, info, &fh); 495 | MPI_Info_free(&info); 496 | 497 | if (file_open_error != MPI_SUCCESS) 498 | { 499 | std::cout << " Error opening file! 
" << std::endl; 500 | MPI_Abort(comm_, -99); 501 | } 502 | 503 | // read the dimensions 504 | MPI_File_read_all(fh, &M_, sizeof(GraphElem), MPI_BYTE, &status); 505 | MPI_File_read_all(fh, &N_, sizeof(GraphElem), MPI_BYTE, &status); 506 | M_local_ = mbins[me+1] - mbins[me]; 507 | 508 | // create local graph 509 | Graph *g = new Graph(M_local_, 0, M_, N_); 510 | // readjust parts with new vertex partition 511 | g->repart(mbins); 512 | 513 | uint64_t tot_bytes=(M_local_+1)*sizeof(GraphElem); 514 | MPI_Offset offset = 2*sizeof(GraphElem) + mbins[me]*sizeof(GraphElem); 515 | 516 | // read in INT_MAX increments if total byte size is > INT_MAX 517 | if (tot_bytes < INT_MAX) 518 | MPI_File_read_at(fh, offset, &g->edge_indices_[0], tot_bytes, MPI_BYTE, &status); 519 | else 520 | { 521 | int chunk_bytes=INT_MAX; 522 | uint8_t *curr_pointer = (uint8_t*) &g->edge_indices_[0]; 523 | uint64_t transf_bytes = 0; 524 | 525 | while (transf_bytes < tot_bytes) 526 | { 527 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 528 | transf_bytes += chunk_bytes; 529 | offset += chunk_bytes; 530 | curr_pointer += chunk_bytes; 531 | 532 | if ((tot_bytes - transf_bytes) < INT_MAX) 533 | chunk_bytes = tot_bytes - transf_bytes; 534 | } 535 | } 536 | 537 | N_local_ = g->edge_indices_[M_local_] - g->edge_indices_[0]; 538 | g->set_nedges(N_local_); 539 | 540 | tot_bytes = N_local_*(sizeof(Edge)); 541 | offset = 2*sizeof(GraphElem) + (M_+1)*sizeof(GraphElem) + g->edge_indices_[0]*(sizeof(Edge)); 542 | 543 | if (tot_bytes < INT_MAX) 544 | MPI_File_read_at(fh, offset, &g->edge_list_[0], tot_bytes, MPI_BYTE, &status); 545 | else 546 | { 547 | int chunk_bytes=INT_MAX; 548 | uint8_t *curr_pointer = (uint8_t*)&g->edge_list_[0]; 549 | uint64_t transf_bytes = 0; 550 | 551 | while (transf_bytes < tot_bytes) 552 | { 553 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 554 | transf_bytes += chunk_bytes; 555 | offset += chunk_bytes; 556 | curr_pointer += chunk_bytes; 557 | 558 | if ((tot_bytes - transf_bytes) < INT_MAX) 559 | chunk_bytes = (tot_bytes - transf_bytes); 560 | } 561 | } 562 | 563 | MPI_File_close(&fh); 564 | 565 | for(GraphElem i=1; i < M_local_+1; i++) 566 | g->edge_indices_[i] -= g->edge_indices_[0]; 567 | g->edge_indices_[0] = 0; 568 | 569 | mbins.clear(); 570 | 571 | return g; 572 | } 573 | 574 | private: 575 | GraphElem M_; 576 | GraphElem N_; 577 | GraphElem M_local_; 578 | GraphElem N_local_; 579 | MPI_Comm comm_; 580 | }; 581 | 582 | // RGG graph 583 | // 1D vertex distribution 584 | class GenerateRGG 585 | { 586 | public: 587 | GenerateRGG(GraphElem nv, MPI_Comm comm = MPI_COMM_WORLD) 588 | { 589 | nv_ = nv; 590 | comm_ = comm; 591 | 592 | MPI_Comm_rank(comm_, &rank_); 593 | MPI_Comm_size(comm_, &nprocs_); 594 | 595 | // neighbors 596 | up_ = down_ = MPI_PROC_NULL; 597 | if (nprocs_ > 1) { 598 | if (rank_ > 0 && rank_ < (nprocs_ - 1)) { 599 | up_ = rank_ - 1; 600 | down_ = rank_ + 1; 601 | } 602 | if (rank_ == 0) 603 | down_ = 1; 604 | if (rank_ == (nprocs_ - 1)) 605 | up_ = rank_ - 1; 606 | } 607 | 608 | n_ = nv_ / nprocs_; 609 | 610 | // check if number of nodes is divisible by #processes 611 | if ((nv_ % nprocs_) != 0) { 612 | if (rank_ == 0) { 613 | std::cout << "[ERROR] Number of vertices must be perfectly divisible by number of processes." << std::endl; 614 | std::cout << "Exiting..." 
<< std::endl; 615 | } 616 | MPI_Abort(comm_, -99); 617 | } 618 | 619 | // check if processes are power of 2 620 | if (!is_pwr2(nprocs_)) { 621 | if (rank_ == 0) { 622 | std::cout << "[ERROR] Number of processes must be a power of 2." << std::endl; 623 | std::cout << "Exiting..." << std::endl; 624 | } 625 | MPI_Abort(comm_, -99); 626 | } 627 | 628 | // calculate r(n) 629 | GraphWeight rc = sqrt((GraphWeight)log(nv)/(GraphWeight)(PI*nv)); 630 | GraphWeight rt = sqrt((GraphWeight)2.0736/(GraphWeight)nv); 631 | rn_ = (rc + rt)/(GraphWeight)2.0; 632 | 633 | assert(((GraphWeight)1.0/(GraphWeight)nprocs_) > rn_); 634 | 635 | MPI_Barrier(comm_); 636 | } 637 | 638 | // create RGG and returns Graph 639 | // TODO FIXME use OpenMP wherever possible 640 | // use Euclidean distance as edge weight 641 | // for random edges, choose from (0,1) 642 | // otherwise, use unit weight throughout 643 | Graph* generate(bool isLCG, bool unitEdgeWeight = true, GraphWeight randomEdgePercent = 0.0) 644 | { 645 | // Generate random coordinate points 646 | std::vector X, Y, X_up, Y_up, X_down, Y_down; 647 | 648 | if (isLCG) 649 | X.resize(2*n_); 650 | else 651 | X.resize(n_); 652 | 653 | Y.resize(n_); 654 | 655 | if (up_ != MPI_PROC_NULL) { 656 | X_up.resize(n_); 657 | Y_up.resize(n_); 658 | } 659 | 660 | if (down_ != MPI_PROC_NULL) { 661 | X_down.resize(n_); 662 | Y_down.resize(n_); 663 | } 664 | 665 | // create local graph 666 | Graph *g = new Graph(n_, 0, nv_, nv_); 667 | 668 | // generate random number within range 669 | // X: 0, 1 670 | // Y: rank_*1/p, (rank_+1)*1/p, 671 | GraphWeight rec_np = (GraphWeight)(1.0/(GraphWeight)nprocs_); 672 | GraphWeight lo = rank_* rec_np; 673 | GraphWeight hi = lo + rec_np; 674 | assert(hi > lo); 675 | 676 | // measure the time to generate random numbers 677 | MPI_Barrier(MPI_COMM_WORLD); 678 | double st = MPI_Wtime(); 679 | 680 | if (!isLCG) { 681 | // set seed (declared an extern in utils) 682 | seed = (unsigned)reseeder(1); 683 | 684 | #if defined(PRINT_RANDOM_XY_COORD) 685 | for (int k = 0; k < nprocs_; k++) { 686 | if (k == rank_) { 687 | std::cout << "Random number generated on Process#" << k << " :" << std::endl; 688 | for (GraphElem i = 0; i < n_; i++) { 689 | X[i] = genRandom(0.0, 1.0); 690 | Y[i] = genRandom(lo, hi); 691 | std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl; 692 | } 693 | } 694 | MPI_Barrier(comm_); 695 | } 696 | #else 697 | for (GraphElem i = 0; i < n_; i++) { 698 | X[i] = genRandom(0.0, 1.0); 699 | Y[i] = genRandom(lo, hi); 700 | } 701 | #endif 702 | } 703 | else { // LCG 704 | // X | Y 705 | // e.g seeds: 1741, 3821 706 | // create LCG object 707 | // seed to generate x0 708 | LCG xr(/*seed*/1, X.data(), 2*n_, comm_); 709 | 710 | // generate random numbers between 0-1 711 | xr.generate(); 712 | 713 | // rescale xr further between lo-hi 714 | // and put the numbers in Y taking 715 | // from X[n] 716 | xr.rescale(Y.data(), n_, lo); 717 | 718 | #if defined(PRINT_RANDOM_XY_COORD) 719 | for (int k = 0; k < nprocs_; k++) { 720 | if (k == rank_) { 721 | std::cout << "Random number generated on Process#" << k << " :" << std::endl; 722 | for (GraphElem i = 0; i < n_; i++) { 723 | std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl; 724 | } 725 | } 726 | MPI_Barrier(comm_); 727 | } 728 | #endif 729 | } 730 | 731 | double et = MPI_Wtime(); 732 | double tt = et - st; 733 | double tot_tt = 0.0; 734 | MPI_Reduce(&tt, &tot_tt, 1, MPI_DOUBLE, MPI_SUM, 0, comm_); 735 | 736 | if (rank_ == 0) { 737 | double tot_avg = (tot_tt/nprocs_); 738 | std::cout << 
"Average time to generate " << 2*n_ 739 | << " random numbers using LCG (in s): " 740 | << tot_avg << std::endl; 741 | } 742 | 743 | // ghost(s) 744 | 745 | // cross edges, each processor 746 | // communicates with up or/and down 747 | // neighbor only 748 | std::vector sendup_edges, senddn_edges; 749 | std::vector recvup_edges, recvdn_edges; 750 | std::vector edgeList; 751 | 752 | // counts, indexing: [2] = {up - 0, down - 1} 753 | // TODO can't we use MPI_INT 754 | std::array send_sizes = {0, 0}, recv_sizes = {0, 0}; 755 | #if defined(CHECK_NUM_EDGES) 756 | GraphElem numEdges = 0; 757 | #endif 758 | // local 759 | for (GraphElem i = 0; i < n_; i++) { 760 | for (GraphElem j = i + 1; j < n_; j++) { 761 | // euclidean distance: 762 | // 2D: sqrt((px-qx)^2 + (py-qy)^2) 763 | GraphWeight dx = X[i] - X[j]; 764 | GraphWeight dy = Y[i] - Y[j]; 765 | GraphWeight ed = sqrt(dx*dx + dy*dy); 766 | // are the two vertices within the range? 767 | if (ed <= rn_) { 768 | // local to global index 769 | const GraphElem g_i = g->local_to_global(i); 770 | const GraphElem g_j = g->local_to_global(j); 771 | 772 | if (!unitEdgeWeight) { 773 | edgeList.emplace_back(i, g_j, ed); 774 | edgeList.emplace_back(j, g_i, ed); 775 | } 776 | else { 777 | edgeList.emplace_back(i, g_j); 778 | edgeList.emplace_back(j, g_i); 779 | } 780 | #if defined(CHECK_NUM_EDGES) 781 | numEdges += 2; 782 | #endif 783 | 784 | g->edge_indices_[i+1]++; 785 | g->edge_indices_[j+1]++; 786 | } 787 | } 788 | } 789 | 790 | MPI_Barrier(comm_); 791 | 792 | // communicate ghost coordinates with neighbors 793 | 794 | const int x_ndown = X_down.empty() ? 0 : n_; 795 | const int y_ndown = Y_down.empty() ? 0 : n_; 796 | const int x_nup = X_up.empty() ? 0 : n_; 797 | const int y_nup = Y_up.empty() ? 0 : n_; 798 | 799 | MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, up_, SR_X_UP_TAG, 800 | X_down.data(), x_ndown, MPI_WEIGHT_TYPE, down_, SR_X_UP_TAG, 801 | comm_, MPI_STATUS_IGNORE); 802 | MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, down_, SR_X_DOWN_TAG, 803 | X_up.data(), x_nup, MPI_WEIGHT_TYPE, up_, SR_X_DOWN_TAG, 804 | comm_, MPI_STATUS_IGNORE); 805 | MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, up_, SR_Y_UP_TAG, 806 | Y_down.data(), y_ndown, MPI_WEIGHT_TYPE, down_, SR_Y_UP_TAG, 807 | comm_, MPI_STATUS_IGNORE); 808 | MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, down_, SR_Y_DOWN_TAG, 809 | Y_up.data(), y_nup, MPI_WEIGHT_TYPE, up_, SR_Y_DOWN_TAG, 810 | comm_, MPI_STATUS_IGNORE); 811 | 812 | // exchange ghost vertices / cross edges 813 | if (nprocs_ > 1) { 814 | if (up_ != MPI_PROC_NULL) { 815 | 816 | for (GraphElem i = 0; i < n_; i++) { 817 | for (GraphElem j = i + 1; j < n_; j++) { 818 | GraphWeight dx = X[i] - X_up[j]; 819 | GraphWeight dy = Y[i] - Y_up[j]; 820 | GraphWeight ed = sqrt(dx*dx + dy*dy); 821 | 822 | if (ed <= rn_) { 823 | const GraphElem g_i = g->local_to_global(i); 824 | const GraphElem g_j = j + up_*n_; 825 | 826 | if (!unitEdgeWeight) { 827 | sendup_edges.emplace_back(j, g_i, ed); 828 | edgeList.emplace_back(i, g_j, ed); 829 | } 830 | else { 831 | sendup_edges.emplace_back(j, g_i); 832 | edgeList.emplace_back(i, g_j); 833 | } 834 | #if defined(CHECK_NUM_EDGES) 835 | numEdges++; 836 | #endif 837 | g->edge_indices_[i+1]++; 838 | } 839 | } 840 | } 841 | 842 | // send up sizes 843 | send_sizes[0] = sendup_edges.size(); 844 | } 845 | 846 | if (down_ != MPI_PROC_NULL) { 847 | 848 | for (GraphElem i = 0; i < n_; i++) { 849 | for (GraphElem j = i + 1; j < n_; j++) { 850 | GraphWeight dx = X[i] - X_down[j]; 851 | GraphWeight dy = Y[i] - Y_down[j]; 852 
| GraphWeight ed = sqrt(dx*dx + dy*dy); 853 | 854 | if (ed <= rn_) { 855 | const GraphElem g_i = g->local_to_global(i); 856 | const GraphElem g_j = j + down_*n_; 857 | 858 | if (!unitEdgeWeight) { 859 | senddn_edges.emplace_back(j, g_i, ed); 860 | edgeList.emplace_back(i, g_j, ed); 861 | } 862 | else { 863 | senddn_edges.emplace_back(j, g_i); 864 | edgeList.emplace_back(i, g_j); 865 | } 866 | #if defined(CHECK_NUM_EDGES) 867 | numEdges++; 868 | #endif 869 | g->edge_indices_[i+1]++; 870 | } 871 | } 872 | } 873 | 874 | // send down sizes 875 | send_sizes[1] = senddn_edges.size(); 876 | } 877 | } 878 | 879 | MPI_Barrier(comm_); 880 | 881 | // communicate ghost vertices with neighbors 882 | // send/recv buffer sizes 883 | 884 | MPI_Sendrecv(&send_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_UP_TAG, 885 | &recv_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_UP_TAG, 886 | comm_, MPI_STATUS_IGNORE); 887 | MPI_Sendrecv(&send_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_DOWN_TAG, 888 | &recv_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_DOWN_TAG, 889 | comm_, MPI_STATUS_IGNORE); 890 | 891 | // resize recv buffers 892 | 893 | if (recv_sizes[0] > 0) 894 | recvup_edges.resize(recv_sizes[0]); 895 | if (recv_sizes[1] > 0) 896 | recvdn_edges.resize(recv_sizes[1]); 897 | 898 | // send/recv both up and down 899 | 900 | MPI_Sendrecv(sendup_edges.data(), send_sizes[0]*sizeof(struct EdgeTuple), MPI_BYTE, 901 | up_, SR_UP_TAG, recvdn_edges.data(), recv_sizes[1]*sizeof(struct EdgeTuple), 902 | MPI_BYTE, down_, SR_UP_TAG, comm_, MPI_STATUS_IGNORE); 903 | MPI_Sendrecv(senddn_edges.data(), send_sizes[1]*sizeof(struct EdgeTuple), MPI_BYTE, 904 | down_, SR_DOWN_TAG, recvup_edges.data(), recv_sizes[0]*sizeof(struct EdgeTuple), 905 | MPI_BYTE, up_, SR_DOWN_TAG, comm_, MPI_STATUS_IGNORE); 906 | 907 | // update local #edges 908 | 909 | // down 910 | if (down_ != MPI_PROC_NULL) { 911 | for (GraphElem i = 0; i < recv_sizes[1]; i++) { 912 | #if defined(CHECK_NUM_EDGES) 913 | numEdges++; 914 | #endif 915 | if (!unitEdgeWeight) 916 | edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1], recvdn_edges[i].w_); 917 | else 918 | edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1]); 919 | g->edge_indices_[recvdn_edges[i].ij_[0]+1]++; 920 | } 921 | } 922 | 923 | // up 924 | if (up_ != MPI_PROC_NULL) { 925 | for (GraphElem i = 0; i < recv_sizes[0]; i++) { 926 | #if defined(CHECK_NUM_EDGES) 927 | numEdges++; 928 | #endif 929 | if (!unitEdgeWeight) 930 | edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1], recvup_edges[i].w_); 931 | else 932 | edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1]); 933 | g->edge_indices_[recvup_edges[i].ij_[0]+1]++; 934 | } 935 | } 936 | 937 | // add random edges based on 938 | // randomEdgePercent 939 | if (randomEdgePercent > 0.0) { 940 | const GraphElem pnedges = (edgeList.size()/2); 941 | GraphElem tot_pnedges = 0; 942 | 943 | MPI_Allreduce(&pnedges, &tot_pnedges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); 944 | 945 | // extra #edges per process 946 | const GraphElem nrande = (((GraphElem)(randomEdgePercent * (GraphWeight)tot_pnedges))/100); 947 | GraphElem pnrande = 0.0; 948 | 949 | // TODO FIXME try to ensure a fair edge distibution 950 | if (nrande < nprocs_) { 951 | if (rank_ == (nprocs_ - 1)) 952 | pnrande += nrande; 953 | } 954 | else { 955 | pnrande = nrande / nprocs_; 956 | const GraphElem pnrem = nrande % nprocs_; 957 | if (pnrem != 0) { 958 | if (rank_ == (nprocs_ - 1)) 959 | pnrande += pnrem; 960 | } 961 | } 962 | 963 | // add pnrande 
edges 964 | 965 | // send/recv buffers 966 | std::vector> rand_edges(nprocs_); 967 | std::vector sendrand_edges, recvrand_edges; 968 | 969 | // outgoing/incoming send/recv sizes 970 | // TODO FIXME if number of randomly added edges are above 971 | // INT_MAX, weird things will happen, fix it 972 | std::vector sendrand_sizes(nprocs_), recvrand_sizes(nprocs_); 973 | 974 | #if defined(PRINT_EXTRA_NEDGES) 975 | int extraEdges = 0; 976 | #endif 977 | 978 | #if defined(DEBUG_PRINTF) 979 | for (int i = 0; i < nprocs_; i++) { 980 | if (i == rank_) { 981 | std::cout << "[" << i << "]Target process for random edge insertion between " 982 | << lo << " and " << hi << std::endl; 983 | } 984 | MPI_Barrier(comm_); 985 | } 986 | #endif 987 | // make sure each process has a 988 | // different seed this time since 989 | // we want random edges 990 | unsigned rande_seed = (unsigned)(time(0)^getpid()); 991 | GraphWeight weight = 1.0; 992 | std::hash reh; 993 | 994 | // cannot use genRandom if it's already been seeded 995 | std::default_random_engine re(rande_seed); 996 | std::uniform_int_distribution IR, JR; 997 | std::uniform_real_distribution IJW; 998 | 999 | for (GraphElem k = 0; k < pnrande; k++) { 1000 | 1001 | // randomly pick start/end vertex and target from my list 1002 | const GraphElem i = (GraphElem)IR(re, std::uniform_int_distribution::param_type{0, (n_- 1)}); 1003 | const GraphElem g_j = (GraphElem)JR(re, std::uniform_int_distribution::param_type{0, (nv_- 1)}); 1004 | const int target = g->get_owner(g_j); 1005 | const GraphElem j = g->global_to_local(g_j, target); // local 1006 | 1007 | if (i == j) 1008 | continue; 1009 | 1010 | const GraphElem g_i = g->local_to_global(i); 1011 | 1012 | // check for duplicates prior to edgeList insertion 1013 | auto found = std::find_if(edgeList.begin(), edgeList.end(), 1014 | [&](EdgeTuple const& et) 1015 | { return ((et.ij_[0] == i) && (et.ij_[1] == g_j)); }); 1016 | 1017 | // OK to insert, not in list 1018 | if (found == std::end(edgeList)) { 1019 | 1020 | // calculate weight 1021 | if (!unitEdgeWeight) { 1022 | if (target == rank_) { 1023 | GraphWeight dx = X[i] - X[j]; 1024 | GraphWeight dy = Y[i] - Y[j]; 1025 | weight = sqrt(dx*dx + dy*dy); 1026 | } 1027 | else if (target == up_) { 1028 | GraphWeight dx = X[i] - X_up[j]; 1029 | GraphWeight dy = Y[i] - Y_up[j]; 1030 | weight = sqrt(dx*dx + dy*dy); 1031 | } 1032 | else if (target == down_) { 1033 | GraphWeight dx = X[i] - X_down[j]; 1034 | GraphWeight dy = Y[i] - Y_down[j]; 1035 | weight = sqrt(dx*dx + dy*dy); 1036 | } 1037 | else { 1038 | unsigned randw_seed = reh((GraphElem)(g_i*nv_+g_j)); 1039 | std::default_random_engine rew(randw_seed); 1040 | weight = (GraphWeight)IJW(rew, std::uniform_real_distribution::param_type{0.01, 1.0}); 1041 | } 1042 | } 1043 | 1044 | rand_edges[target].emplace_back(j, g_i, weight); 1045 | sendrand_sizes[target]++; 1046 | 1047 | #if defined(PRINT_EXTRA_NEDGES) 1048 | extraEdges++; 1049 | #endif 1050 | #if defined(CHECK_NUM_EDGES) 1051 | numEdges++; 1052 | #endif 1053 | edgeList.emplace_back(i, g_j, weight); 1054 | g->edge_indices_[i+1]++; 1055 | } 1056 | } 1057 | 1058 | #if defined(PRINT_EXTRA_NEDGES) 1059 | int totExtraEdges = 0; 1060 | MPI_Reduce(&extraEdges, &totExtraEdges, 1, MPI_INT, MPI_SUM, 0, comm_); 1061 | if (rank_ == 0) 1062 | std::cout << "Adding extra " << totExtraEdges << " edges while trying to incorporate " 1063 | << randomEdgePercent << "%" << " extra edges globally." 
<< std::endl; 1064 | #endif 1065 | 1066 | MPI_Barrier(comm_); 1067 | 1068 | // communicate ghosts edges 1069 | MPI_Request rande_sreq; 1070 | 1071 | MPI_Ialltoall(sendrand_sizes.data(), 1, MPI_INT, 1072 | recvrand_sizes.data(), 1, MPI_INT, comm_, 1073 | &rande_sreq); 1074 | 1075 | // send data if outgoing size > 0 1076 | for (int p = 0; p < nprocs_; p++) { 1077 | sendrand_edges.insert(sendrand_edges.end(), 1078 | rand_edges[p].begin(), rand_edges[p].end()); 1079 | } 1080 | 1081 | MPI_Wait(&rande_sreq, MPI_STATUS_IGNORE); 1082 | 1083 | // total recvbuffer size 1084 | const int rcount = std::accumulate(recvrand_sizes.begin(), recvrand_sizes.end(), 0); 1085 | recvrand_edges.resize(rcount); 1086 | 1087 | // alltoallv for incoming data 1088 | // TODO FIXME make sure size of extra edges is 1089 | // within INT limits 1090 | 1091 | int rpos = 0, spos = 0; 1092 | std::vector sdispls(nprocs_), rdispls(nprocs_); 1093 | 1094 | for (int p = 0; p < nprocs_; p++) { 1095 | 1096 | sendrand_sizes[p] *= sizeof(struct EdgeTuple); 1097 | recvrand_sizes[p] *= sizeof(struct EdgeTuple); 1098 | 1099 | sdispls[p] = spos; 1100 | rdispls[p] = rpos; 1101 | 1102 | spos += sendrand_sizes[p]; 1103 | rpos += recvrand_sizes[p]; 1104 | } 1105 | 1106 | MPI_Alltoallv(sendrand_edges.data(), sendrand_sizes.data(), sdispls.data(), 1107 | MPI_BYTE, recvrand_edges.data(), recvrand_sizes.data(), rdispls.data(), 1108 | MPI_BYTE, comm_); 1109 | 1110 | // update local edge list 1111 | for (int i = 0; i < rcount; i++) { 1112 | #if defined(CHECK_NUM_EDGES) 1113 | numEdges++; 1114 | #endif 1115 | edgeList.emplace_back(recvrand_edges[i].ij_[0], recvrand_edges[i].ij_[1], recvrand_edges[i].w_); 1116 | g->edge_indices_[recvrand_edges[i].ij_[0]+1]++; 1117 | } 1118 | 1119 | sendrand_edges.clear(); 1120 | recvrand_edges.clear(); 1121 | rand_edges.clear(); 1122 | } // end of (conditional) random edges addition 1123 | 1124 | MPI_Barrier(comm_); 1125 | 1126 | // set graph edge indices 1127 | 1128 | std::vector ecTmp(n_+1); 1129 | std::partial_sum(g->edge_indices_.begin(), g->edge_indices_.end(), ecTmp.begin()); 1130 | g->edge_indices_ = ecTmp; 1131 | 1132 | for(GraphElem i = 1; i < n_+1; i++) 1133 | g->edge_indices_[i] -= g->edge_indices_[0]; 1134 | g->edge_indices_[0] = 0; 1135 | 1136 | g->set_edge_index(0, 0); 1137 | for (GraphElem i = 0; i < n_; i++) 1138 | g->set_edge_index(i+1, g->edge_indices_[i+1]); 1139 | 1140 | const GraphElem nedges = g->edge_indices_[n_] - g->edge_indices_[0]; 1141 | g->set_nedges(nedges); 1142 | 1143 | // set graph edge list 1144 | // sort edge list 1145 | auto ecmp = [] (EdgeTuple const& e0, EdgeTuple const& e1) 1146 | { return ((e0.ij_[0] < e1.ij_[0]) || ((e0.ij_[0] == e1.ij_[0]) && (e0.ij_[1] < e1.ij_[1]))); }; 1147 | 1148 | if (!std::is_sorted(edgeList.begin(), edgeList.end(), ecmp)) { 1149 | #if defined(DEBUG_PRINTF) 1150 | std::cout << "Edge list is not sorted." << std::endl; 1151 | #endif 1152 | std::sort(edgeList.begin(), edgeList.end(), ecmp); 1153 | } 1154 | #if defined(DEBUG_PRINTF) 1155 | else 1156 | std::cout << "Edge list is sorted!" 
<< std::endl; 1157 | #endif 1158 | 1159 | GraphElem ePos = 0; 1160 | for (GraphElem i = 0; i < n_; i++) { 1161 | GraphElem e0, e1; 1162 | 1163 | g->edge_range(i, e0, e1); 1164 | #if defined(DEBUG_PRINTF) 1165 | if ((i % 100000) == 0) 1166 | std::cout << "Processing edges for vertex: " << i << ", range(" << e0 << ", " << e1 << 1167 | ")" << std::endl; 1168 | #endif 1169 | for (GraphElem j = e0; j < e1; j++) { 1170 | Edge &edge = g->set_edge(j); 1171 | 1172 | assert(ePos == j); 1173 | assert(i == edgeList[ePos].ij_[0]); 1174 | 1175 | edge.tail_ = edgeList[ePos].ij_[1]; 1176 | edge.weight_ = edgeList[ePos].w_; 1177 | 1178 | ePos++; 1179 | } 1180 | } 1181 | 1182 | #if defined(CHECK_NUM_EDGES) 1183 | GraphElem tot_numEdges = 0; 1184 | MPI_Allreduce(&numEdges, &tot_numEdges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); 1185 | const GraphElem tne = g->get_ne(); 1186 | assert(tne == tot_numEdges); 1187 | #endif 1188 | edgeList.clear(); 1189 | 1190 | X.clear(); 1191 | Y.clear(); 1192 | X_up.clear(); 1193 | Y_up.clear(); 1194 | X_down.clear(); 1195 | Y_down.clear(); 1196 | 1197 | sendup_edges.clear(); 1198 | senddn_edges.clear(); 1199 | recvup_edges.clear(); 1200 | recvdn_edges.clear(); 1201 | 1202 | return g; 1203 | } 1204 | 1205 | GraphWeight get_d() const { return rn_; } 1206 | GraphElem get_nv() const { return nv_; } 1207 | 1208 | private: 1209 | GraphElem nv_, n_; 1210 | GraphWeight rn_; 1211 | MPI_Comm comm_; 1212 | int nprocs_, rank_, up_, down_; 1213 | }; 1214 | 1215 | #endif 1216 | -------------------------------------------------------------------------------- /main.cpp: -------------------------------------------------------------------------------- 1 | // *********************************************************************** 2 | // 3 | // miniVite 4 | // 5 | // *********************************************************************** 6 | // 7 | // Copyright (2018) Battelle Memorial Institute 8 | // All rights reserved. 9 | // 10 | // Redistribution and use in source and binary forms, with or without 11 | // modification, are permitted provided that the following conditions 12 | // are met: 13 | // 14 | // 1. Redistributions of source code must retain the above copyright 15 | // notice, this list of conditions and the following disclaimer. 16 | // 17 | // 2. Redistributions in binary form must reproduce the above copyright 18 | // notice, this list of conditions and the following disclaimer in the 19 | // documentation and/or other materials provided with the distribution. 20 | // 21 | // 3. Neither the name of the copyright holder nor the names of its 22 | // contributors may be used to endorse or promote products derived from 23 | // this software without specific prior written permission. 24 | // 25 | // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 26 | // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 27 | // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 28 | // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE 29 | // COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 30 | // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 31 | // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 32 | // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 33 | // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 34 | // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 35 | // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 36 | // POSSIBILITY OF SUCH DAMAGE. 37 | // 38 | // ************************************************************************ 39 | 40 | 41 | #include 42 | #include 43 | #include 44 | 45 | #include 46 | #include 47 | 48 | #include 49 | #include 50 | #include 51 | #include 52 | 53 | #include 54 | #include 55 | 56 | #include "dspl.hpp" 57 | 58 | // TODO FIXME add options for desired MPI thread-level 59 | 60 | static std::string inputFileName; 61 | static int me, nprocs; 62 | static int ranksPerNode = 1; 63 | static GraphElem nvRGG = 0; 64 | static bool generateGraph = false; 65 | static bool readBalanced = false; 66 | static bool showGraph = false; 67 | static GraphWeight randomEdgePercent = 0.0; 68 | static bool randomNumberLCG = false; 69 | static bool isUnitEdgeWeight = true; 70 | static GraphWeight threshold = 1.0E-6; 71 | 72 | // parse command line parameters 73 | static void parseCommandLine(const int argc, char * const argv[]); 74 | 75 | int main(int argc, char *argv[]) 76 | { 77 | double t0, t1, t2, t3, ti = 0.0; 78 | #ifdef DISABLE_THREAD_MULTIPLE_CHECK 79 | MPI_Init(&argc, &argv); 80 | #else 81 | int max_threads; 82 | 83 | max_threads = omp_get_max_threads(); 84 | 85 | if (max_threads > 1) { 86 | int provided; 87 | MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided); 88 | if (provided < MPI_THREAD_MULTIPLE) { 89 | std::cerr << "MPI library does not support MPI_THREAD_MULTIPLE." 
<< std::endl; 90 | MPI_Abort(MPI_COMM_WORLD, -99); 91 | } 92 | } else { 93 | MPI_Init(&argc, &argv); 94 | } 95 | #endif 96 | 97 | MPI_Comm_size(MPI_COMM_WORLD, &nprocs); 98 | MPI_Comm_rank(MPI_COMM_WORLD, &me); 99 | 100 | parseCommandLine(argc, argv); 101 | 102 | createCommunityMPIType(); 103 | double td0, td1, td, tdt; 104 | 105 | MPI_Barrier(MPI_COMM_WORLD); 106 | td0 = MPI_Wtime(); 107 | 108 | Graph* g = nullptr; 109 | 110 | // generate graph only supports RGG as of now 111 | if (generateGraph) { 112 | GenerateRGG gr(nvRGG); 113 | g = gr.generate(randomNumberLCG, isUnitEdgeWeight, randomEdgePercent); 114 | } 115 | else { // read input graph 116 | BinaryEdgeList rm; 117 | if (readBalanced == true) 118 | g = rm.read_balanced(me, nprocs, ranksPerNode, inputFileName); 119 | else 120 | g = rm.read(me, nprocs, ranksPerNode, inputFileName); 121 | } 122 | 123 | assert(g != nullptr); 124 | if (showGraph) 125 | g->print(); 126 | 127 | #ifdef PRINT_DIST_STATS 128 | g->print_dist_stats(); 129 | #endif 130 | 131 | MPI_Barrier(MPI_COMM_WORLD); 132 | #ifdef DEBUG_PRINTF 133 | assert(g); 134 | #endif 135 | td1 = MPI_Wtime(); 136 | td = td1 - td0; 137 | 138 | MPI_Reduce(&td, &tdt, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); 139 | 140 | if (me == 0) { 141 | if (!generateGraph) 142 | std::cout << "Time to read input file and create distributed graph (in s): " 143 | << (tdt/nprocs) << std::endl; 144 | else 145 | std::cout << "Time to generate distributed graph of " 146 | << nvRGG << " vertices (in s): " << (tdt/nprocs) << std::endl; 147 | } 148 | 149 | GraphWeight currMod = -1.0; 150 | GraphWeight prevMod = -1.0; 151 | double total = 0.0; 152 | 153 | std::vector ssizes, rsizes, svdata, rvdata; 154 | #if defined(USE_MPI_RMA) 155 | MPI_Win commwin; 156 | #endif 157 | size_t ssz = 0, rsz = 0; 158 | int iters = 0; 159 | 160 | MPI_Barrier(MPI_COMM_WORLD); 161 | 162 | t1 = MPI_Wtime(); 163 | 164 | #if defined(USE_MPI_RMA) 165 | currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, 166 | svdata, rvdata, currMod, threshold, iters, commwin); 167 | #else 168 | currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, 169 | svdata, rvdata, currMod, threshold, iters); 170 | #endif 171 | MPI_Barrier(MPI_COMM_WORLD); 172 | t0 = MPI_Wtime(); 173 | total = t0 - t1; 174 | 175 | double tot_time = 0.0; 176 | MPI_Reduce(&total, &tot_time, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); 177 | 178 | if (me == 0) { 179 | double avgt = (tot_time / nprocs); 180 | if (!generateGraph) { 181 | std::cout << "-------------------------------------------------------" << std::endl; 182 | std::cout << "File: " << inputFileName << std::endl; 183 | std::cout << "-------------------------------------------------------" << std::endl; 184 | } 185 | std::cout << "-------------------------------------------------------" << std::endl; 186 | #ifdef USE_32_BIT_GRAPH 187 | std::cout << "32-bit datatype" << std::endl; 188 | #else 189 | std::cout << "64-bit datatype" << std::endl; 190 | #endif 191 | std::cout << "-------------------------------------------------------" << std::endl; 192 | std::cout << "Average total time (in s), #Processes: " << avgt << ", " << nprocs << std::endl; 193 | std::cout << "Modularity, #Iterations: " << currMod << ", " << iters << std::endl; 194 | std::cout << "MODS (final modularity * average time): " << (currMod * avgt) << std::endl; 195 | std::cout << "-------------------------------------------------------" << std::endl; 196 | } 197 | 198 | MPI_Barrier(MPI_COMM_WORLD); 199 | 200 | delete g; 201 | 
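// Free the committed CommInfo MPI datatype before MPI_Finalize.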
  destroyCommunityMPIType();

  MPI_Finalize();

  return 0;
} // main

void parseCommandLine(const int argc, char * const argv[])
{
  int ret;

  // options: -f <file> (binary input graph), -b (balanced read), -r <n> (ranks per node),
  // -t <threshold> (convergence threshold), -n <vertices> (generate an in-memory RGG),
  // -w (non-unit edge weights for the generated graph), -l (use LCG for random numbers),
  // -p <percent> (extra random edges for the generated graph), -s (print the graph)
  while ((ret = getopt(argc, argv, "f:br:t:n:wlp:s")) != -1) {
    switch (ret) {
    case 'f':
      inputFileName.assign(optarg);
      break;
    case 'b':
      readBalanced = true;
      break;
    case 'r':
      ranksPerNode = atoi(optarg);
      break;
    case 't':
      threshold = atof(optarg);
      break;
    case 'n':
      nvRGG = atol(optarg);
      if (nvRGG > 0)
        generateGraph = true;
      break;
    case 'w':
      isUnitEdgeWeight = false;
      break;
    case 'l':
      randomNumberLCG = true;
      break;
    case 'p':
      randomEdgePercent = atof(optarg);
      break;
    case 's':
      showGraph = true;
      break;
    default:
      assert(0 && "Option not recognized!!!");
      break;
    }
  }

  if (me == 0 && (argc == 1)) {
    std::cerr << "Must specify some options." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && inputFileName.empty()) {
    std::cerr << "Must specify a binary file name with -f or provide parameters for generating a graph." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && randomNumberLCG) {
    std::cerr << "Must specify -n for graph generation before -l (LCG random numbers) can be used." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && (randomEdgePercent > 0.0)) {
    std::cerr << "Must specify -n for graph generation first to add random edges to it." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && !isUnitEdgeWeight) {
    std::cerr << "Must specify -n for graph generation first before setting edge weights." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && generateGraph && ((randomEdgePercent < 0) || (randomEdgePercent >= 100))) {
    std::cerr << "Invalid random edge percentage for generated graph!" << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }
} // parseCommandLine
--------------------------------------------------------------------------------
/utils.hpp:
--------------------------------------------------------------------------------
// ***********************************************************************
//
//                            miniVite
//
// ***********************************************************************
//
//       Copyright (2018) Battelle Memorial Institute
//                      All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//
// ************************************************************************

#pragma once
#ifndef UTILS_HPP
#define UTILS_HPP

#define PI (3.14159)

#ifndef MAX_PRINT_NEDGE
#define MAX_PRINT_NEDGE (10000000)
#endif

// Read https://en.wikipedia.org/wiki/Linear_congruential_generator#Period_length
// about choice of LCG parameters
// From numerical recipes
// TODO FIXME investigate larger periods
#define MLCG (2147483647) // 2^31 - 1
#define ALCG (16807)      // 7^5
#define BLCG (0)

#define SR_UP_TAG 100
#define SR_DOWN_TAG 101
#define SR_SIZES_UP_TAG 102
#define SR_SIZES_DOWN_TAG 103
#define SR_X_UP_TAG 104
#define SR_X_DOWN_TAG 105
#define SR_Y_UP_TAG 106
#define SR_Y_DOWN_TAG 107
#define SR_LCG_TAG 108

#include <cstdint>
#include <cstring>
#include <cmath>
#include <iostream>
#include <vector>
#include <random>
#include <type_traits>

#include <mpi.h>

#ifdef USE_32_BIT_GRAPH
using GraphElem = int32_t;
using GraphWeight = float;
const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT32_T;
const MPI_Datatype MPI_WEIGHT_TYPE = MPI_FLOAT;
#else
using GraphElem = int64_t;
using GraphWeight = double;
const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT64_T;
const MPI_Datatype MPI_WEIGHT_TYPE = MPI_DOUBLE;
#endif

extern unsigned seed;

// Is nprocs a power-of-2?
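// e.g. 8 (0b1000) & 7 (0b0111) == 0, whereas 6 (0b0110) & 5 (0b0101) == 0b0100 != 0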
int is_pwr2(int nprocs)
{ return ((nprocs != 0) && !(nprocs & (nprocs - 1))); }

// return uint32_t seed
GraphElem reseeder(unsigned initseed)
{
    std::seed_seq seq({initseed});
    std::vector<std::uint32_t> seeds(1);
    seq.generate(seeds.begin(), seeds.end());

    return (GraphElem)seeds[0];
}

// Local random number generator
template<typename T, typename G = std::default_random_engine>
T genRandom(T lo, T hi)
{
    thread_local static G gen(seed);
    using Dist = typename std::conditional
        <
        std::is_integral<T>::value
        , std::uniform_int_distribution<T>
        , std::uniform_real_distribution<T>
        >::type;

    thread_local static Dist utd {};
    return utd(gen, typename Dist::param_type{lo, hi});
}

// Parallel Linear Congruential Generator
// x[i] = (a*x[i-1] + b)%M
class LCG
{
public:
    LCG(unsigned seed, GraphWeight* drand,
        GraphElem n, MPI_Comm comm = MPI_COMM_WORLD):
        seed_(seed), drand_(drand), n_(n)
    {
        comm_ = comm;
        MPI_Comm_size(comm_, &nprocs_);
        MPI_Comm_rank(comm_, &rank_);

        // allocate long random numbers
        rnums_.resize(n_);

        // init x0
        if (rank_ == 0)
            x0_ = reseeder(seed_);

        // step #1: bcast x0 from root
        MPI_Bcast(&x0_, 1, MPI_GRAPH_TYPE, 0, comm_);

        // step #2: parallel prefix to generate first random value per process
        parallel_prefix_op();
    }

    ~LCG() { rnums_.clear(); }

    // matrix-matrix multiplication for 2x2 matrices
    void matmat_2x2(GraphElem c[], GraphElem a[], GraphElem b[])
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                GraphElem sum = 0;
                for (int k = 0; k < 2; k++) {
                    sum += a[i*2+k]*b[k*2+j];
                }
                c[i*2+j] = sum;
            }
        }
    }

    // x *= y
    void matop_2x2(GraphElem x[], GraphElem y[])
    {
        GraphElem tmp[4];
        matmat_2x2(tmp, x, y);
        memcpy(x, tmp, sizeof(GraphElem[4]));
    }

    // find kth power of a 2x2 matrix
    void mat_power(GraphElem mat[], GraphElem k)
    {
        GraphElem tmp[4];
        memcpy(tmp, mat, sizeof(GraphElem[4]));

        // mat-mat multiply k times
        for (GraphElem p = 0; p < k-1; p++)
            matop_2x2(mat, tmp);
    }

    // parallel prefix for matrix-matrix operation
    // `x0 is the very first random number in the series
    // `ab is a 2-length array which stores a and b
    // `n_ is (n/p)
    // `rnums is n_ length array which stores the random nums for a process
    void parallel_prefix_op()
    {
        GraphElem global_op[4];
        global_op[0] = ALCG;
        global_op[1] = 0;
        global_op[2] = BLCG;
        global_op[3] = 1;

        mat_power(global_op, n_);           // M^(n/p)
        GraphElem prefix_op[4] = {1,0,0,1}; // I in row-major

        GraphElem global_op_recv[4];

        int steps = (int)(log2((double)nprocs_));

        for (int s = 0; s < steps; s++) {

            int mate = rank_^(1 << s); // toggle the sth LSB to find my neighbor

            // send/recv global to/from mate
            MPI_Sendrecv(global_op, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG,
                         global_op_recv, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG,
                         comm_, MPI_STATUS_IGNORE);

            matop_2x2(global_op, global_op_recv);

            if (mate < rank_)
                matop_2x2(prefix_op, global_op_recv);

            MPI_Barrier(comm_);
        }

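        // At this point prefix_op holds the product of the received M^(n/p)
        // factors from all lower-ranked processes, i.e. M^(rank_*n_) when
        // nprocs_ is a power of two and every rank owns n_ numbers, so the
        // first local random number can be seeded directly from x0_ below.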
        // populate the first random number entry for each process
        // (x0*a + b)%P
        if (rank_ == 0)
            rnums_[0] = x0_;
        else
            rnums_[0] = (x0_*prefix_op[0] + prefix_op[2])%MLCG;
    }

    // generate random number based on the first
    // random number on a process
    // TODO check the 'quick'n dirty generators to
    // see if we can avoid the mod
    void generate()
    {
#if defined(PRINT_LCG_LONG_RANDOM_NUMBERS)
        for (int k = 0; k < nprocs_; k++) {
            if (k == rank_) {
                std::cout << "------------" << std::endl;
                std::cout << "Process#" << rank_ << " :" << std::endl;
                std::cout << "------------" << std::endl;
                std::cout << rnums_[0] << std::endl;
                for (GraphElem i = 1; i < n_; i++) {
                    rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
                    std::cout << rnums_[i] << std::endl;
                }
            }
            MPI_Barrier(comm_);
        }
#else
        for (GraphElem i = 1; i < n_; i++) {
            rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
        }
#endif
        GraphWeight mult = 1.0 / (GraphWeight)(1.0 + (GraphWeight)(MLCG-1));

#if defined(PRINT_LCG_DOUBLE_RANDOM_NUMBERS)
        for (int k = 0; k < nprocs_; k++) {
            if (k == rank_) {
                std::cout << "------------" << std::endl;
                std::cout << "Process#" << rank_ << " :" << std::endl;
                std::cout << "------------" << std::endl;

                for (GraphElem i = 0; i < n_; i++) {
                    drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
                    std::cout << drand_[i] << std::endl;
                }
            }
            MPI_Barrier(comm_);
        }
#else
        for (GraphElem i = 0; i < n_; i++)
            drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
#endif
    }

    // copy from drand_[idx_start] to new_drand,
    // rescale the random numbers between lo and hi
    void rescale(GraphWeight* new_drand, GraphElem idx_start, GraphWeight const& lo)
    {
        GraphWeight range = (1.0 / (GraphWeight)nprocs_);

#if defined(PRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS)
        for (int k = 0; k < nprocs_; k++) {
            if (k == rank_) {
                std::cout << "------------" << std::endl;
                std::cout << "Process#" << rank_ << " :" << std::endl;
                std::cout << "------------" << std::endl;

                for (GraphElem i = idx_start, j = 0; i < n_; i++, j++) {
                    new_drand[j] = lo + (GraphWeight)(range * drand_[i]);
                    std::cout << new_drand[j] << std::endl;
                }
            }
            MPI_Barrier(comm_);
        }
#else
        for (GraphElem i = idx_start, j = 0; i < n_; i++, j++)
            new_drand[j] = lo + (GraphWeight)(range * drand_[i]); // lo-hi
#endif
    }

private:
    MPI_Comm comm_;
    int nprocs_, rank_;
    unsigned seed_;
    GraphElem n_, x0_;
    GraphWeight* drand_;
    std::vector<GraphElem> rnums_;
};

// locks (when OpenMP locks are not used: either a spinlock or a std::mutex)
#ifdef USE_OPENMP_LOCK
#else
#ifdef USE_SPINLOCK
#include <atomic>
std::atomic_flag lkd_ = ATOMIC_FLAG_INIT;
#else
#include <mutex>
std::mutex mtx_;
#endif
void lock() {
#ifdef USE_SPINLOCK
    while (lkd_.test_and_set(std::memory_order_acquire)) { ; }
#else
    mtx_.lock();
#endif
}
void unlock() {
#ifdef USE_SPINLOCK
    lkd_.clear(std::memory_order_release);
#else
    mtx_.unlock();
#endif
}
#endif

#endif // UTILS_HPP
--------------------------------------------------------------------------------
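
A note on LCG::parallel_prefix_op() in utils.hpp above: each process jumps
directly to its own slice of a single LCG stream by raising the 2x2 step
matrix [[ALCG, 0], [BLCG, 1]] (stored row-major) to the appropriate power,
instead of generating all preceding values. The standalone sketch below is
illustrative only (it is not part of miniVite; the starting value and jump
length are arbitrary). It checks the identity this relies on: advancing
x[i+1] = (ALCG*x[i] + BLCG) % MLCG by k steps agrees with multiplying the
row vector [x0, 1] once by the k-th power of the step matrix, which is how
rnums_[0] is seeded from x0_ and prefix_op.

#include <cstdint>
#include <iostream>

int main()
{
    const std::int64_t M  = 2147483647;  // MLCG, 2^31 - 1
    const std::int64_t a  = 16807;       // ALCG, 7^5
    const std::int64_t b  = 0;           // BLCG
    const std::int64_t x0 = 123456789;   // arbitrary starting value for the check
    const std::int64_t k  = 1000;        // arbitrary jump length

    // 1. step the recurrence k times
    std::int64_t x = x0;
    for (std::int64_t i = 0; i < k; i++)
        x = (a*x + b) % M;

    // 2. raise the step matrix [[a,0],[b,1]] (row-major) to the k-th power,
    //    reducing mod M at each step so 64-bit intermediates cannot overflow
    const std::int64_t mat[4] = {a, 0, b, 1};
    std::int64_t acc[4] = {1, 0, 0, 1};  // identity
    for (std::int64_t i = 0; i < k; i++) {
        std::int64_t tmp[4];
        for (int r = 0; r < 2; r++)
            for (int c = 0; c < 2; c++)
                tmp[r*2+c] = (acc[r*2+0]*mat[0*2+c] % M + acc[r*2+1]*mat[1*2+c] % M) % M;
        for (int j = 0; j < 4; j++)
            acc[j] = tmp[j];
    }

    // apply the combined step once: first component of [x0, 1] * M^k
    const std::int64_t xk = (x0*acc[0] % M + acc[2]) % M;

    std::cout << "step-by-step: " << x << ", jump-ahead: " << xk << std::endl;
    return (x == xk) ? 0 : 1;  // the two values should agree
}

mat_power() performs the same repeated 2x2 multiplication, and the hypercube
exchange in parallel_prefix_op() combines the per-rank M^(n/p) factors so that
rank r obtains M^(r*n/p) in log2(p) communication steps, letting every rank
start its portion of the sequence without generating anyone else's values.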