├── FAQS ├── LICENSE ├── Makefile ├── README ├── dspl.hpp ├── graph.hpp ├── main.cpp └── utils.hpp /FAQS: -------------------------------------------------------------------------------- 1 | ***************** 2 | * miniVite FAQs * 3 | ***************** 4 | ---------------------------------------------------- 5 | FYI, typical "How to run" queries are addressed in Q5 6 | onward. 7 | 8 | Please send your suggestions for improving this FAQ 9 | to zsayanz at gmail dot com OR hala at pnnl dot gov. 10 | ---------------------------------------------------- 11 | 12 | ------------------------------------------------------------------------- 13 | Q1. What is graph community detection? 14 | ------------------------------------------------------------------------- 15 | 16 | A1. In most real-world graphs/networks, the nodes/vertices tend to be 17 | organized into tightly-knit modules known as communities or clusters, 18 | such that nodes within a community are more likely to be "related" to 19 | one another than they are to the rest of the network. The goodness of 20 | partitioning into communities is typically measured using a metric 21 | called modularity. Community detection is the method of identifying 22 | these clusters or communities in graphs. 23 | 24 | [References] 25 | 26 | Fortunato, Santo. "Community detection in graphs." Physics Reports 27 | 486.3-5 (2010): 75-174. https://arxiv.org/pdf/0906.0612.pdf 28 | 29 | -------------------------------------------------------------------------- 30 | Q2. What is miniVite? 31 | -------------------------------------------------------------------------- 32 | 33 | A2. miniVite is a distributed-memory code (or mini application) that 34 | performs partial graph community detection using the Louvain method. 35 | The Louvain method is a multi-phase, iterative heuristic that performs 36 | modularity optimization for graph community detection. miniVite only 37 | performs the first phase of the Louvain method. 38 | 39 | [Code] 40 | 41 | https://github.com/Exa-Graph/miniVite 42 | http://hpc.pnl.gov/people/hala/grappolo.html 43 | 44 | [References] 45 | 46 | [Louvain method] Blondel, Vincent D., et al. "Fast unfolding of 47 | communities in large networks." Journal of Statistical Mechanics: 48 | Theory and Experiment 2008.10 (2008): P10008. 49 | 50 | [miniVite] Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, 51 | Gebremedhin AH. miniVite: A Graph Analytics Benchmarking Tool for 52 | Massively Parallel Systems. 53 | 54 | --------------------------------------------------------------------------- 55 | Q3. What is the parent application of miniVite? How are they different? 56 | --------------------------------------------------------------------------- 57 | 58 | A3. miniVite is derived from Vite, which implements the multi-phase 59 | Louvain method. Apart from a parallel baseline version, Vite provides 60 | a number of heuristics (such as early termination, threshold cycling and 61 | incomplete coloring) that can improve the scalability and quality of 62 | community detection. In contrast, miniVite just provides a parallel 63 | baseline version, and has options to select different MPI communication 64 | methods (such as send/recv, collectives and RMA) for one of the most 65 | communication-intensive portions of the code. miniVite also includes an 66 | in-memory random geometric graph generator, making it convenient for 67 | users to run miniVite without any external files.
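(For example, "mpiexec -n 2 ./miniVite -n 100" runs miniVite on a generated graph with 100 vertices and no input file; see the "EXECUTING THE PROGRAM" section of the README for the full list of options. The binary name and path depend on your build.)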
Vite can also convert 68 | graphs from different native formats (like matrix market, SNAP, edge 69 | list, DIMACS, etc) to the binary format that both Vite and miniVite 70 | requires. 71 | 72 | [Code] 73 | 74 | http://hpc.pnl.gov/people/hala/grappolo.html 75 | 76 | [References] 77 | 78 | Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Lu H, Chavarria-Miranda D, 79 | Khan A, Gebremedhin A. Distributed louvain algorithm for graph community 80 | detection. In 2018 IEEE International Parallel and Distributed Processing 81 | Symposium (IPDPS) 2018 May 21 (pp. 885-895). IEEE. 82 | 83 | Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Gebremedhin AH. 84 | Scalable Distributed Memory Community Detection Using Vite. 85 | In 2018 IEEE High Performance extreme Computing Conference (HPEC) 2018 86 | Sep 25 (pp. 1-7). IEEE. 87 | 88 | ----------------------------------------------------------------------------- 89 | Q4. Is there a shared-memory equivalent of Vite/miniVite? 90 | ----------------------------------------------------------------------------- 91 | 92 | A4. Yes, Grappolo performs shared-memory community detection using Louvain 93 | method. Apart from community detection, Grappolo has routines for matrix 94 | reordering as well. 95 | 96 | [Code] 97 | 98 | http://hpc.pnl.gov/people/hala/grappolo.html 99 | 100 | [References] 101 | 102 | Lu H, Halappanavar M, Kalyanaraman A. Parallel heuristics for scalable 103 | community detection. Parallel Computing. 2015 Aug 1;47:19-37. 104 | 105 | Halappanavar M, Lu H, Kalyanaraman A, Tumeo A. Scalable static and dynamic 106 | community detection using grappolo. In High Performance Extreme Computing 107 | Conference (HPEC), 2017 IEEE 2017 Sep 12 (pp. 1-6). IEEE. 108 | 109 | ------------------------------------------------------------------------------ 110 | Q5. How does one perform strong scaling analysis using miniVite? How to 111 | determine 'good' candidates (input graphs) that can be used for strong 112 | scaling runs? How much time is approximately spent in performing I/O? 113 | ------------------------------------------------------------------------------ 114 | 115 | A5. Use a large graph as an input, preferably over a billion edges. Not all 116 | large graphs have a good community structure. You should be able to identify 117 | one that serves your purpose, hopefully after few trials. Graphs can be 118 | obtained various websites serving as repositories, such as Sparse TAMU 119 | collection[1], SNAP repository[2] and MIT Graph Challenge website[3], to name 120 | a few of the prominent ones. You can convert graphs from their native format to 121 | the binary format that miniVite requires, using the converters in Vite (please 122 | see README). 123 | 124 | If your graph is in Webgraph[4] format, you can easily convert it to an edge list 125 | first (example code snippet below), before passing it on to Vite for subsequent 126 | binary conversion, using the C++ version of Webgraph library[5]. Since the C++ 127 | Webgraph library is not actively developed, you may use the original Webgraph 128 | JAVA library too. 129 | 130 | #include "offline_edge_iterator.hpp" 131 | ... 132 | using namespace webgraph::ascii_graph; 133 | 134 | // read in input/output file 135 | std::ofstream ofile(argv[2]); 136 | offline_edge_iterator itor(argv[1]), end; 137 | 138 | // read edges 139 | while( itor != end ) { 140 | ofile << itor->first << " " << itor->second << std::endl; 141 | ++itor; 142 | } 143 | ofile.close(); 144 | ... 
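For reference, a complete minimal converter built around the snippet above could look like the following. The header and iterator interface are those of the C++ Webgraph library [5], exactly as used above; the program name, argument order, and error handling are only for illustration.

#include <fstream>
#include <iostream>

#include "offline_edge_iterator.hpp"

// Convert a Webgraph (ASCII) input into a plain "source destination"
// edge list, one edge per line.
int main(int argc, char *argv[])
{
  if (argc != 3) {
    std::cerr << "Usage: " << argv[0] << " <webgraph-input> <edge-list-output>" << std::endl;
    return 1;
  }

  using namespace webgraph::ascii_graph;

  std::ofstream ofile(argv[2]);
  offline_edge_iterator itor(argv[1]), end;

  // read edges and write each one to the output file
  while (itor != end) {
    ofile << itor->first << " " << itor->second << std::endl;
    ++itor;
  }
  ofile.close();

  return 0;
}

The resulting edge list can then be converted to the binary format that miniVite expects using Vite's converters (see Q10 and the Vite README).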
145 | 146 | miniVite takes about 2-4s to read a 55GB binary file if you use Burst buffer 147 | (Cray DataWarp) or Lustre striping (about 25 OSTs, default 1M blocks). The overall 148 | I/O time for most cases is observed to be within 1/2% of the overall execution time. 149 | This is assuming the simple vertex-based distribution (which is the default), if you 150 | pass "-b" (only valid when you pass an input graph), then miniVite tries to balance 151 | the number of edges (as a result, processes may own dissimilar number of vertices). 152 | In such cases, there will be a serial overhead that may increase the I/O time and 153 | subsequent distributed graph creation time significantly. 154 | 155 | [1] https://sparse.tamu.edu/ 156 | [2] http://snap.stanford.edu/data 157 | [3] http://graphchallenge.mit.edu/data-sets 158 | [4] http://webgraph.di.unimi.it/ 159 | [5] http://cnets.indiana.edu/groups/nan/webgraph/ 160 | 161 | ----------------------------------------------------------------------------------- 162 | Q6. How does one perform weak scaling analysis using miniVite? How does one scale 163 | the graphs with processes? 164 | ----------------------------------------------------------------------------------- 165 | 166 | A6. miniVite has an in-memory random geometric graph generator (please see 167 | README) that can be used for weak-scaling analysis. An n-D random geometric graph 168 | (RGG), is generated by randomly placing N vertices in an n-D space and connecting 169 | pairs of vertices whose Euclidean distance is less than or equal to d. We only 170 | consider 2D RGGs contained within a unit square, [0,1]^2. We distribute the domain 171 | such that each process receives N/p vertices (where p is the total 172 | number of processes). 173 | 174 | Each process owns (1 * 1/p) portion of the unit square and d is computed as (please 175 | refer to Section 4 of miniVite paper for details): 176 | 177 | d = (dc + dt)/2; 178 | where, dc = sqrt(ln(N) / pi*N); dt = sqrt(2.0736 / pi*N) 179 | 180 | Therefore, the number of vertices (N) passed during miniVite execution on p 181 | processes must satisfy the condition -- 1/p > d. 182 | 183 | Please note, the default distribution of graph generated from the in-built random 184 | geometric graph generator causes a process to only communicate with its two 185 | immediate neighbors. If you want to increase the communication intensity for 186 | generated graphs, please use the "-p" option to specify an extra percentage of edges 187 | that will be generated, linking random vertices. As a side-effect, this option 188 | significantly increases the time required to generate the graph. 189 | 190 | ------------------------------------------------------------------------------ 191 | Q7. Does Vite (the parent application to miniVite) have an in-built graph 192 | generator? 193 | ------------------------------------------------------------------------------ 194 | 195 | A7. At present, Vite does not have an in-built graph generator that we have in 196 | miniVite, so we rely on users providing external graphs for Vite (strong/weak 197 | scaling) analysis. However, Vite has bindings to NetworKit[5], and users can use 198 | those bindings to generate graphs of their choice from Vite (refer to the 199 | README). Generating large graphs in this manner can take a lot of time, since 200 | there are intermediate copies and the graph generators themselves may be serial 201 | or may use threads on a shared-memory system. 
We do not plan on supporting the 202 | NetworKit bindings in future. 203 | 204 | [5] https://networkit.github.io/ 205 | 206 | ------------------------------------------------------------------------------ 207 | Q8. Does providing a larger input graph translate to comparatively larger 208 | execution times? Is it possible to control the execution time for a particular 209 | graph? 210 | ------------------------------------------------------------------------------ 211 | 212 | A8. No. A relatively small graph can run for many iterations, as compared to 213 | a larger graph that runs for a few iterations to convergence. Since miniVite is 214 | iterative, the final number of iterations to convergence (and hence, execution 215 | time) depends on the structure of the graph. It is however possible to exit 216 | early by passing a larger threshold (using the "-t <...>" option, the default 217 | threshold or tolerance is 1.0E-06, a larger threshold can be passed, for e.g, 218 | "-t 1.0E-03"), that should reduce the overall execution time for all graphs in 219 | general (at least w.r.t miniVite, which only executes the first phase of Louvain 220 | method). 221 | 222 | ------------------------------------------------------------------------------ 223 | Q9. Is there an option to add some noise in the generated random geometric 224 | graphs? 225 | ------------------------------------------------------------------------------ 226 | 227 | A9. Yes, the "-p " option allows extra edges to be added between 228 | random vertices (see README). This increases the overall communication, but 229 | affects the structure of communities in the generated graph (lowers the 230 | modularity). Therefore, adding extra edges in the generated graph will 231 | most probably reduce the global modularity, and the number of iterations to 232 | convergence shall decrease. 233 | The maximum number of edges that can be added is bounded by INT_MAX, at 234 | present, we do not handle data ranges more than INT_MAX. 235 | 236 | ------------------------------------------------------------------------------ 237 | Q10. What are the steps required for using real-world graphs as an input to 238 | miniVite? 239 | ------------------------------------------------------------------------------ 240 | 241 | A10. First, please download Vite (parent application of miniVite) from: 242 | https://github.com/ECP-ExaGraph/vite 243 | 244 | Graphs/Sparse matrices come in several native formats (matrix market, SNAP, 245 | DIMACS, etc.) Vite has several options to convert graphs from native to the 246 | binary format that miniVite requires (please take a look at Vite README). 247 | 248 | As an example, you can download the Friendster file from: 249 | https://sparse.tamu.edu/SNAP/com-Friendster 250 | The option to convert Friendster to binary using Vite's converter is as follows 251 | (please note, this part is serial): 252 | 253 | $VITE_BIN_PATH/bin/./fileConvertDist -f $INPUT_PATH/com-Friendster.mtx 254 | -m -o $OUTPUT_PATH/com-Friendster.bin 255 | 256 | After the conversion, you can run miniVite with the binary file obtained 257 | from the previous step: 258 | 259 | mpiexec -n <...> $MINIVITE_PATH/./dspl -r 260 | -f $FILE_PATH/com-Friendster.bin 261 | 262 | -------------------------------------------------------------------------------- 263 | Q11. miniVite is scalable for a particular input graph, but not for another 264 | similar sized graph, why is that and what can I do to improve the situation? 
265 | -------------------------------------------------------------------------------- 266 | 267 | A11. Presently, our default distribution is vertex-based. That means a process 268 | owns N/p vertices and all the edges connected to those N/p vertices (including 269 | ghost vertices). Load imbalances are very probable in this type of distribution, 270 | depending on the graph structure. Instead, it may be favorable to use an 271 | edge-centric distribution, in which processes own approximately equal number of 272 | edges. When the "-b" option is passed, miniVite attempts to distribute the 273 | vertices across processes such each process owns approximately similar number 274 | of edges. This edge-balanced distribution may reduce the overall communication, 275 | improving the performance. 276 | 277 | As an example, lets say there is a large (real-world) graph, and its structure 278 | is such that only a few processes end up owning a majority of edges, as per 279 | miniVite graph data distribution. Also, lets assume that the graph has either a 280 | very poor community structure (modularity closer to 0) or very stable community 281 | structure (modularity close to 1 after a few iterations, that means not many 282 | vertices are migrating to neighboring communities). In both these cases, 283 | community detection in miniVite will run for relatively less number of 284 | iterations, which may affect the overall scalability. 285 | 286 | -------------------------------------------------------------------------------- 287 | Q12. Can miniVite return some statistics on vertex/edge distribution of the 288 | underlying graph? 289 | -------------------------------------------------------------------------------- 290 | Yes, please pass -DPRINT_DIST_STATS while building miniVite. When miniVite is 291 | executed, it returns some basic information, such as: 292 | 293 | ------------------------------------------------------- 294 | Graph edge distribution characteristics 295 | ------------------------------------------------------- 296 | Number of vertices: 34 297 | Number of edges: 156 298 | Maximum number of edges: 80 299 | Average number of edges: 78 300 | Expected value of X^2: 6088 301 | Variance: 4 302 | Standard deviation: 2 303 | ------------------------------------------------------- 304 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2018, Battelle Memorial Institute 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | * Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | * Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 
19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | 2 | CXX = CC 3 | 4 | USE_TAUPROF=0 5 | ifeq ($(USE_TAUPROF),1) 6 | TAU=/soft/perftools/tau/tau-2.29/craycnl/lib 7 | CXX = tau_cxx.sh -tau_makefile=$(TAU)/Makefile.tau-intel-papi-mpi-pdt 8 | endif 9 | # use -xmic-avx512 instead of -xHost for Intel Xeon Phi platforms 10 | OPTFLAGS = -O3 -fopenmp -DPRINT_DIST_STATS #-DPRINT_EXTRA_NEDGES #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD 11 | #-DUSE_MPI_SENDRECV 12 | #-DUSE_MPI_COLLECTIVES 13 | # use export ASAN_OPTIONS=verbosity=1 to check ASAN output 14 | SNTFLAGS = -std=c++11 -fopenmp -fsanitize=address -O1 -fno-omit-frame-pointer 15 | CXXFLAGS = -std=c++11 -g $(OPTFLAGS) 16 | 17 | OBJ = main.o 18 | TARGET = miniVite 19 | 20 | all: $(TARGET) 21 | 22 | %.o: %.cpp 23 | $(CXX) $(CXXFLAGS) -c -o $@ $^ 24 | 25 | $(TARGET): $(OBJ) 26 | $(CXX) $^ $(OPTFLAGS) -o $@ 27 | 28 | .PHONY: clean 29 | 30 | clean: 31 | rm -rf *~ $(OBJ) $(TARGET) 32 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | ************************ 2 | miniVite (/mini/ˈviːte/) 3 | ************************ 4 | 5 | ******* 6 | ------- 7 | ABOUT 8 | ------- 9 | ******* 10 | miniVite[*] is a proxy app that implements a single phase of Louvain 11 | method in distributed memory for graph community detection. Please 12 | refer to the following paper for a detailed discussion on 13 | distributed memory Louvain method implementation: 14 | https://ieeexplore.ieee.org/abstract/document/8425242/ 15 | 16 | Apart from real world graphs, users can use specific options 17 | to generate a Random Geometric Graph (RGG) in parallel. 18 | RGGs have been known to have good community structure: 19 | https://arxiv.org/pdf/1604.03993.pdf 20 | 21 | The way we have implemented a parallel RGG generator, vertices 22 | owned by a process will only have cross edges with its logical 23 | neighboring processes (each process owning 1x1/p chunk of the 24 | 1x1 unit square). If MPI process mapping is such that consecutive 25 | processes (for e.g., p and p+1) are physically close to each other, 26 | then there is not much communication stress in the application. 27 | Therefore, we allow an option to add extra edges between randomly 28 | chosen vertices, whose owners may be physically far apart. 
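As a rough guide for choosing the number of vertices N on p processes for a weak-scaling run, the FAQS (Q6) gives the distance threshold used by the generator, d = (dc + dt)/2 with dc = sqrt(ln(N)/(pi*N)) and dt = sqrt(2.0736/(pi*N)), and notes that N must satisfy 1/p > d. The following standalone helper (not part of miniVite; it simply reads the formula as stated in the FAQS) checks that condition for a candidate N, in addition to the constraints described in the next paragraph:

#include <cmath>
#include <cstdio>
#include <cstdlib>

// Check the RGG weak-scaling condition 1/p > d for a candidate number
// of vertices N on p processes (dc, dt and d per the miniVite FAQS, Q6).
int main(int argc, char *argv[])
{
  if (argc != 3) {
    std::fprintf(stderr, "Usage: %s <N> <p>\n", argv[0]);
    return 1;
  }

  const double pi = 3.14159265358979323846;
  const double N  = std::atof(argv[1]);
  const double p  = std::atof(argv[2]);

  const double dc = std::sqrt(std::log(N) / (pi * N));
  const double dt = std::sqrt(2.0736 / (pi * N));
  const double d  = 0.5 * (dc + dt);

  std::printf("d = %g, 1/p = %g: %s\n", d, 1.0 / p,
              (1.0 / p > d) ? "OK" : "condition 1/p > d not satisfied");
  return 0;
}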
29 | 30 | We require the total number of processes to be a power of 2 and 31 | the total number of vertices to be perfectly divisible by the number of 32 | processes when the parallel RGG generation options are used. 33 | This constraint does not apply to real-world graphs passed to miniVite. 34 | 35 | We also allow users to pass any real-world graph as input. However, 36 | we expect an input graph to be in a certain binary format, which 37 | we have observed to be more efficient than reading ASCII format 38 | files. The code for binary conversion (from a variety of common 39 | graph formats) is packaged separately with Vite, which is our 40 | full implementation of the Louvain method in distributed memory. 41 | Please follow the instructions in the Vite README for binary file 42 | conversion. 43 | 44 | Vite can be downloaded from (please don't use the old 45 | PNNL/PNL link to download Vite; the following GitHub 46 | link is the correct one): 47 | https://github.com/ECP-ExaGraph/vite 48 | 49 | Unlike Vite, we do not implement any heuristics to improve the 50 | performance of the Louvain method. miniVite is a baseline parallel 51 | version, implementing only the first phase of the Louvain method. 52 | 53 | This code requires an MPI library (preferably MPI-3 compatible) 54 | and a C++11-compliant compiler for building. 55 | 56 | Please contact the following for any queries or support: 57 | 58 | Sayan Ghosh, PNNL (sg0 at pnnl dot gov) 59 | Mahantesh Halappanavar, PNNL (hala at pnnl dot gov) 60 | 61 | [*] Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman 62 | A, Gebremedhin AH. miniVite: A graph analytics benchmarking tool 63 | for massively parallel systems. In 2018 IEEE/ACM Performance Modeling, 64 | Benchmarking and Simulation of High Performance Computer Systems 65 | (PMBS) 2018 Nov 12 (pp. 51-56). IEEE. 66 | 67 | Please '*' this repository on GitHub if the code is useful to you. 68 | 69 | ************* 70 | ------------- 71 | COMPILATION 72 | ------------- 73 | ************* 74 | Please update the Makefile with compiler flags and use a C++11-compliant 75 | compiler of your choice. Invoke `make clean; make` after setting paths 76 | to MPI for generating the binary. Use `mpirun` or `mpiexec` or `srun` 77 | to execute the code with the specific runtime arguments mentioned in the 78 | next section. 79 | 80 | Pass -DPRINT_DIST_STATS for printing distributed graph 81 | characteristics. 82 | 83 | Pass -DDEBUG_PRINTF if detailed diagnostics are required during a 84 | program run. This program requires OpenMP and C++11 support, 85 | so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/ 86 | -std=c++0x. 87 | 88 | Pass -DUSE_32_BIT_GRAPH if the number of nodes in the graph is 89 | within the 32-bit range (about 2 x 10^9); otherwise, the 64-bit range is assumed. 90 | 91 | Pass -DOMP_SCHEDULE_RUNTIME if you want to set OMP_SCHEDULE 92 | for all parallel regions at runtime. If -DOMP_SCHEDULE_RUNTIME 93 | is passed and OMP_SCHEDULE is not set, then the default schedule will 94 | be chosen (which is most probably "static" or "guided" for most of 95 | the OpenMP regions). 96 | 97 | Communicating vertex-community information (per iteration) 98 | is the most expensive step of our distributed Louvain 99 | implementation. We use one of the following MPI communication 100 | primitives for communicating vertex-community information during a Louvain 101 | iteration, which can be enabled by passing predefined 102 | macros at compile time: 103 | 104 | 1. MPI Collectives: -DUSE_MPI_COLLECTIVES 105 | 2. MPI Send-Receive: -DUSE_MPI_SENDRECV 106 | 3.
MPI RMA: -DUSE_MPI_RMA (using -DUSE_MPI_ACCUMULATE 107 | additionally ensures atomic put) 108 | 4. Default: Uses MPI point-to-point nonblocking API in communication 109 | intensive parts.. 110 | 111 | Apart from these, we use MPI (blocking) collectives, mostly 112 | MPI_Alltoall. 113 | 114 | There are other predefined macros in the code as well for printing 115 | intermediate results or checking correctness or using a particular 116 | C++ data structure. 117 | 118 | *********************** 119 | ----------------------- 120 | EXECUTING THE PROGRAM 121 | ----------------------- 122 | *********************** 123 | 124 | E.g.: 125 | mpiexec -n 2 bin/./minivite -f karate.bin 126 | mpiexec -n 2 bin/./minivite -l -n 100 127 | mpiexec -n 2 bin/./minivite -n 100 128 | mpiexec -n 2 bin/./minivite -p 2 -n 100 129 | 130 | [On Cray systems, pass MPICH_MAX_THREAD_SAFETY=multiple or 131 | pass -DDISABLE_THREAD_MULTIPLE_CHECK while building miniVite.] 132 | 133 | Possible options (can be combined): 134 | 135 | 1. -f : Specify input binary file after this argument. 136 | 2. -b : Only valid for real-world inputs. Attempts to distribute approximately 137 | equal number of edges among processes. Irregular number of vertices 138 | owned by a particular process. Increases the distributed graph creation 139 | time due to serial overheads, but may improve overall execution time. 140 | 3. -n : Only valid for synthetically generated inputs. Pass total number of 141 | vertices of the generated graph. 142 | 4. -l : Use distributed LCG for randomly choosing edges. If this option 143 | is not used, we will use C++ random number generator (using 144 | std::default_random_engine). 145 | 5. -p : Only valid for synthetically generated inputs. Specify percent of overall 146 | edges to be randomly generated between processes. 147 | 6. -t : Specify threshold quantity (default: 1.0E-06) used to determine the 148 | exit criteria in an iteration of Louvain method. 149 | 7. -w : Only valid for synthetically generated inputs. Use Euclidean distance as edge weight. 150 | If this option is not used, edge weights are considered as 1.0. Generate 151 | edge weight uniformly between (0,1) if Euclidean distance is not available. 152 | 8. -r : This is used to control the number of aggregators in MPI I/O and is 153 | meaningful when an input binary graph file is passed with option "-f". 154 | naggr := (nranks > 1) ? (nprocs/nranks) : nranks; 155 | 9. -s : Print graph data (edge list along with weights). 156 | -------------------------------------------------------------------------------- /dspl.hpp: -------------------------------------------------------------------------------- 1 | // *********************************************************************** 2 | // 3 | // miniVite 4 | // 5 | // *********************************************************************** 6 | // 7 | // Copyright (2018) Battelle Memorial Institute 8 | // All rights reserved. 9 | // 10 | // Redistribution and use in source and binary forms, with or without 11 | // modification, are permitted provided that the following conditions 12 | // are met: 13 | // 14 | // 1. Redistributions of source code must retain the above copyright 15 | // notice, this list of conditions and the following disclaimer. 16 | // 17 | // 2. Redistributions in binary form must reproduce the above copyright 18 | // notice, this list of conditions and the following disclaimer in the 19 | // documentation and/or other materials provided with the distribution. 20 | // 21 | // 3. 
Neither the name of the copyright holder nor the names of its 22 | // contributors may be used to endorse or promote products derived from 23 | // this software without specific prior written permission. 24 | // 25 | // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 26 | // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 27 | // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 28 | // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 29 | // COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 30 | // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 31 | // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 32 | // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 33 | // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 34 | // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 35 | // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 36 | // POSSIBILITY OF SUCH DAMAGE. 37 | // 38 | // ************************************************************************ 39 | 40 | #pragma once 41 | #ifndef DSPL_HPP 42 | #define DSPL_HPP 43 | 44 | #include 45 | #include 46 | #include 47 | #include 48 | #include 49 | #include 50 | #include 51 | #include 52 | #include 53 | #include 54 | 55 | #include 56 | #include 57 | 58 | #include "graph.hpp" 59 | #include "utils.hpp" 60 | 61 | struct Comm { 62 | GraphElem size; 63 | GraphWeight degree; 64 | 65 | Comm() : size(0), degree(0.0) {}; 66 | }; 67 | 68 | struct CommInfo { 69 | GraphElem community; 70 | GraphElem size; 71 | GraphWeight degree; 72 | }; 73 | 74 | const int SizeTag = 1; 75 | const int VertexTag = 2; 76 | const int CommunityTag = 3; 77 | const int CommunitySizeTag = 4; 78 | const int CommunityDataTag = 5; 79 | 80 | static MPI_Datatype commType; 81 | 82 | void distSumVertexDegree(const Graph &g, std::vector &vDegree, std::vector &localCinfo) 83 | { 84 | const GraphElem nv = g.get_lnv(); 85 | 86 | #ifdef OMP_SCHEDULE_RUNTIME 87 | #pragma omp parallel for default(shared), shared(g, vDegree, localCinfo), schedule(runtime) 88 | #else 89 | #pragma omp parallel for default(shared), shared(g, vDegree, localCinfo), schedule(guided) 90 | #endif 91 | for (GraphElem i = 0; i < nv; i++) { 92 | GraphElem e0, e1; 93 | GraphWeight tw = 0.0; 94 | 95 | g.edge_range(i, e0, e1); 96 | 97 | for (GraphElem k = e0; k < e1; k++) { 98 | const Edge &edge = g.get_edge(k); 99 | tw += edge.weight_; 100 | } 101 | 102 | vDegree[i] = tw; 103 | 104 | localCinfo[i].degree = tw; 105 | localCinfo[i].size = 1L; 106 | } 107 | } // distSumVertexDegree 108 | 109 | GraphWeight distCalcConstantForSecondTerm(const std::vector &vDegree, MPI_Comm gcomm) 110 | { 111 | GraphWeight totalEdgeWeightTwice = 0.0; 112 | GraphWeight localWeight = 0.0; 113 | int me = -1; 114 | 115 | const size_t vsz = vDegree.size(); 116 | 117 | #ifdef OMP_SCHEDULE_RUNTIME 118 | #pragma omp parallel for default(shared), shared(vDegree), reduction(+: localWeight) schedule(runtime) 119 | #else 120 | #pragma omp parallel for default(shared), shared(vDegree), reduction(+: localWeight) schedule(static) 121 | #endif 122 | for (GraphElem i = 0; i < vsz; i++) 123 | localWeight += vDegree[i]; // Local reduction 124 | 125 | // Global reduction 126 | MPI_Allreduce(&localWeight, &totalEdgeWeightTwice, 1, 127 | MPI_WEIGHT_TYPE, MPI_SUM, gcomm); 128 | 129 | return (1.0 / static_cast(totalEdgeWeightTwice)); 130 | } // distCalcConstantForSecondTerm 131 | 132 
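// distInitComm (below) starts a Louvain phase by placing every locally owned
// vertex in its own community, identified by its global vertex id
// (local index + this rank's base offset).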
| void distInitComm(std::vector &pastComm, std::vector &currComm, const GraphElem base) 133 | { 134 | const size_t csz = currComm.size(); 135 | 136 | #ifdef DEBUG_PRINTF 137 | assert(csz == pastComm.size()); 138 | #endif 139 | 140 | #ifdef OMP_SCHEDULE_RUNTIME 141 | #pragma omp parallel for default(shared), shared(pastComm, currComm), firstprivate(base), schedule(runtime) 142 | #else 143 | #pragma omp parallel for default(shared), shared(pastComm, currComm), firstprivate(base), schedule(static) 144 | #endif 145 | for (GraphElem i = 0L; i < csz; i++) { 146 | pastComm[i] = i + base; 147 | currComm[i] = i + base; 148 | } 149 | } // distInitComm 150 | 151 | void distInitLouvain(const Graph &dg, std::vector &pastComm, 152 | std::vector &currComm, std::vector &vDegree, 153 | std::vector &clusterWeight, std::vector &localCinfo, 154 | std::vector &localCupdate, GraphWeight &constantForSecondTerm, 155 | const int me) 156 | { 157 | const GraphElem base = dg.get_base(me); 158 | const GraphElem nv = dg.get_lnv(); 159 | MPI_Comm gcomm = dg.get_comm(); 160 | 161 | vDegree.resize(nv); 162 | pastComm.resize(nv); 163 | currComm.resize(nv); 164 | clusterWeight.resize(nv); 165 | localCinfo.resize(nv); 166 | localCupdate.resize(nv); 167 | 168 | distSumVertexDegree(dg, vDegree, localCinfo); 169 | constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm); 170 | 171 | distInitComm(pastComm, currComm, base); 172 | } // distInitLouvain 173 | 174 | GraphElem distGetMaxIndex(const std::unordered_map &clmap, const std::vector &counter, 175 | const GraphWeight selfLoop, const std::vector &localCinfo, 176 | const std::map &remoteCinfo, const GraphWeight vDegree, 177 | const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm, 178 | const GraphElem base, const GraphElem bound, const GraphWeight constant) 179 | { 180 | std::unordered_map::const_iterator storedAlready; 181 | GraphElem maxIndex = currComm; 182 | GraphWeight curGain = 0.0, maxGain = 0.0; 183 | GraphWeight eix = static_cast(counter[0]) - static_cast(selfLoop); 184 | 185 | GraphWeight ax = currDegree - vDegree; 186 | GraphWeight eiy = 0.0, ay = 0.0; 187 | 188 | GraphElem maxSize = currSize; 189 | GraphElem size = 0; 190 | 191 | storedAlready = clmap.begin(); 192 | #ifdef DEBUG_PRINTF 193 | assert(storedAlready != clmap.end()); 194 | #endif 195 | do { 196 | if (currComm != storedAlready->first) { 197 | 198 | // is_local, direct access local info 199 | if ((storedAlready->first >= base) && (storedAlready->first < bound)) { 200 | ay = localCinfo[storedAlready->first-base].degree; 201 | size = localCinfo[storedAlready->first - base].size; 202 | } 203 | else { 204 | // is_remote, lookup map 205 | std::map::const_iterator citer = remoteCinfo.find(storedAlready->first); 206 | ay = citer->second.degree; 207 | size = citer->second.size; 208 | } 209 | 210 | eiy = counter[storedAlready->second]; 211 | 212 | curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant; 213 | 214 | if ((curGain > maxGain) || 215 | ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) { 216 | maxGain = curGain; 217 | maxIndex = storedAlready->first; 218 | maxSize = size; 219 | } 220 | } 221 | storedAlready++; 222 | } while (storedAlready != clmap.end()); 223 | 224 | if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm)) 225 | maxIndex = currComm; 226 | 227 | return maxIndex; 228 | } // distGetMaxIndex 229 | 230 | GraphWeight distBuildLocalMapCounter(const GraphElem e0, const GraphElem e1, std::unordered_map 
&clmap, 231 | std::vector &counter, const Graph &g, 232 | const std::vector &currComm, 233 | const std::unordered_map &remoteComm, 234 | const GraphElem vertex, const GraphElem base, const GraphElem bound) 235 | { 236 | GraphElem numUniqueClusters = 1L; 237 | GraphWeight selfLoop = 0; 238 | std::unordered_map::const_iterator storedAlready; 239 | 240 | for (GraphElem j = e0; j < e1; j++) { 241 | 242 | const Edge &edge = g.get_edge(j); 243 | const GraphElem &tail_ = edge.tail_; 244 | const GraphWeight &weight = edge.weight_; 245 | GraphElem tcomm; 246 | 247 | if (tail_ == vertex + base) 248 | selfLoop += weight; 249 | 250 | // is_local, direct access local std::vector 251 | if ((tail_ >= base) && (tail_ < bound)) 252 | tcomm = currComm[tail_ - base]; 253 | else { // is_remote, lookup map 254 | std::unordered_map::const_iterator iter = remoteComm.find(tail_); 255 | 256 | #ifdef DEBUG_PRINTF 257 | assert(iter != remoteComm.end()); 258 | #endif 259 | tcomm = iter->second; 260 | } 261 | 262 | storedAlready = clmap.find(tcomm); 263 | 264 | if (storedAlready != clmap.end()) 265 | counter[storedAlready->second] += weight; 266 | else { 267 | clmap.insert(std::unordered_map::value_type(tcomm, numUniqueClusters)); 268 | counter.push_back(weight); 269 | numUniqueClusters++; 270 | } 271 | } 272 | 273 | return selfLoop; 274 | } // distBuildLocalMapCounter 275 | 276 | void distExecuteLouvainIteration(const GraphElem i, const Graph &dg, const std::vector &currComm, 277 | std::vector &targetComm, const std::vector &vDegree, 278 | std::vector &localCinfo, std::vector &localCupdate, 279 | const std::unordered_map &remoteComm, 280 | const std::map &remoteCinfo, 281 | std::map &remoteCupdate, const GraphWeight constantForSecondTerm, 282 | std::vector &clusterWeight, const int me) 283 | { 284 | GraphElem localTarget = -1; 285 | GraphElem e0, e1, selfLoop = 0; 286 | std::unordered_map clmap; 287 | std::vector counter; 288 | 289 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 290 | const GraphElem cc = currComm[i]; 291 | GraphWeight ccDegree; 292 | GraphElem ccSize; 293 | bool currCommIsLocal = false; 294 | bool targetCommIsLocal = false; 295 | 296 | // Current Community is local 297 | if (cc >= base && cc < bound) { 298 | ccDegree=localCinfo[cc-base].degree; 299 | ccSize=localCinfo[cc-base].size; 300 | currCommIsLocal=true; 301 | } else { 302 | // is remote 303 | std::map::const_iterator citer = remoteCinfo.find(cc); 304 | ccDegree = citer->second.degree; 305 | ccSize = citer->second.size; 306 | currCommIsLocal=false; 307 | } 308 | 309 | dg.edge_range(i, e0, e1); 310 | 311 | if (e0 != e1) { 312 | clmap.insert(std::unordered_map::value_type(cc, 0)); 313 | counter.push_back(0.0); 314 | 315 | selfLoop = distBuildLocalMapCounter(e0, e1, clmap, counter, dg, 316 | currComm, remoteComm, i, base, bound); 317 | 318 | clusterWeight[i] += counter[0]; 319 | 320 | localTarget = distGetMaxIndex(clmap, counter, selfLoop, localCinfo, remoteCinfo, 321 | vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm); 322 | } 323 | else 324 | localTarget = cc; 325 | 326 | // is the Target Local? 
327 | if (localTarget >= base && localTarget < bound) 328 | targetCommIsLocal = true; 329 | 330 | // current and target comm are local - atomic updates to vectors 331 | if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) { 332 | 333 | #ifdef DEBUG_PRINTF 334 | assert( base < localTarget < bound); 335 | assert( base < cc < bound); 336 | assert( cc - base < localCupdate.size()); 337 | assert( localTarget - base < localCupdate.size()); 338 | #endif 339 | #pragma omp atomic update 340 | localCupdate[localTarget-base].degree += vDegree[i]; 341 | #pragma omp atomic update 342 | localCupdate[localTarget-base].size++; 343 | #pragma omp atomic update 344 | localCupdate[cc-base].degree -= vDegree[i]; 345 | #pragma omp atomic update 346 | localCupdate[cc-base].size--; 347 | } 348 | 349 | // current is local, target is not - do atomic on local, accumulate in Maps for remote 350 | if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && !targetCommIsLocal) { 351 | #pragma omp atomic update 352 | localCupdate[cc-base].degree -= vDegree[i]; 353 | #pragma omp atomic update 354 | localCupdate[cc-base].size--; 355 | 356 | // search target! 357 | std::map::iterator iter=remoteCupdate.find(localTarget); 358 | 359 | #pragma omp atomic update 360 | iter->second.degree += vDegree[i]; 361 | #pragma omp atomic update 362 | iter->second.size++; 363 | } 364 | 365 | // current is remote, target is local - accumulate for current, atomic on local 366 | if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && targetCommIsLocal) { 367 | #pragma omp atomic update 368 | localCupdate[localTarget-base].degree += vDegree[i]; 369 | #pragma omp atomic update 370 | localCupdate[localTarget-base].size++; 371 | 372 | // search current 373 | std::map::iterator iter=remoteCupdate.find(cc); 374 | 375 | #pragma omp atomic update 376 | iter->second.degree -= vDegree[i]; 377 | #pragma omp atomic update 378 | iter->second.size--; 379 | } 380 | 381 | // current and target are remote - accumulate for both 382 | if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && !targetCommIsLocal) { 383 | 384 | // search current 385 | std::map::iterator iter = remoteCupdate.find(cc); 386 | 387 | #pragma omp atomic update 388 | iter->second.degree -= vDegree[i]; 389 | #pragma omp atomic update 390 | iter->second.size--; 391 | 392 | // search target 393 | iter=remoteCupdate.find(localTarget); 394 | 395 | #pragma omp atomic update 396 | iter->second.degree += vDegree[i]; 397 | #pragma omp atomic update 398 | iter->second.size++; 399 | } 400 | 401 | #ifdef DEBUG_PRINTF 402 | assert(localTarget != -1); 403 | #endif 404 | targetComm[i] = localTarget; 405 | } // distExecuteLouvainIteration 406 | 407 | GraphWeight distComputeModularity(const Graph &g, std::vector &localCinfo, 408 | const std::vector &clusterWeight, 409 | const GraphWeight constantForSecondTerm, 410 | const int me) 411 | { 412 | const GraphElem nv = g.get_lnv(); 413 | MPI_Comm gcomm = g.get_comm(); 414 | 415 | GraphWeight le_la_xx[2]; 416 | GraphWeight e_a_xx[2] = {0.0, 0.0}; 417 | GraphWeight le_xx = 0.0, la2_x = 0.0; 418 | 419 | #ifdef DEBUG_PRINTF 420 | assert((clusterWeight.size() == nv)); 421 | #endif 422 | 423 | #ifdef OMP_SCHEDULE_RUNTIME 424 | #pragma omp parallel for default(shared), shared(clusterWeight, localCinfo), \ 425 | reduction(+: le_xx), reduction(+: la2_x) schedule(runtime) 426 | #else 427 | #pragma omp parallel for default(shared), shared(clusterWeight, localCinfo), \ 428 | reduction(+: le_xx), 
reduction(+: la2_x) schedule(static) 429 | #endif 430 | for (GraphElem i = 0L; i < nv; i++) { 431 | le_xx += clusterWeight[i]; 432 | la2_x += static_cast(localCinfo[i].degree) * static_cast(localCinfo[i].degree); 433 | } 434 | le_la_xx[0] = le_xx; 435 | le_la_xx[1] = la2_x; 436 | 437 | #ifdef DEBUG_PRINTF 438 | const double t0 = MPI_Wtime(); 439 | #endif 440 | 441 | MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm); 442 | 443 | #ifdef DEBUG_PRINTF 444 | const double t1 = MPI_Wtime(); 445 | #endif 446 | 447 | GraphWeight currMod = std::fabs((e_a_xx[0] * constantForSecondTerm) - 448 | (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm)); 449 | #ifdef DEBUG_PRINTF 450 | std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl; 451 | std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl; 452 | std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl; 453 | #endif 454 | 455 | return currMod; 456 | } // distComputeModularity 457 | 458 | void distUpdateLocalCinfo(std::vector &localCinfo, const std::vector &localCupdate) 459 | { 460 | size_t csz = localCinfo.size(); 461 | 462 | #ifdef OMP_SCHEDULE_RUNTIME 463 | #pragma omp for schedule(runtime) 464 | #else 465 | #pragma omp for schedule(static) 466 | #endif 467 | for (GraphElem i = 0L; i < csz; i++) { 468 | localCinfo[i].size += localCupdate[i].size; 469 | localCinfo[i].degree += localCupdate[i].degree; 470 | } 471 | } 472 | 473 | void distCleanCWandCU(const GraphElem nv, std::vector &clusterWeight, 474 | std::vector &localCupdate) 475 | { 476 | #ifdef OMP_SCHEDULE_RUNTIME 477 | #pragma omp for schedule(runtime) 478 | #else 479 | #pragma omp for schedule(static) 480 | #endif 481 | for (GraphElem i = 0L; i < nv; i++) { 482 | clusterWeight[i] = 0; 483 | localCupdate[i].degree = 0; 484 | localCupdate[i].size = 0; 485 | } 486 | } // distCleanCWandCU 487 | 488 | #if defined(USE_MPI_RMA) 489 | void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, 490 | const size_t &ssz, const size_t &rsz, const std::vector &ssizes, 491 | const std::vector &rsizes, const std::vector &svdata, 492 | const std::vector &rvdata, const std::vector &currComm, 493 | const std::vector &localCinfo, std::map &remoteCinfo, 494 | std::unordered_map &remoteComm, std::map &remoteCupdate, 495 | const MPI_Win &commwin, const std::vector &disp) 496 | #else 497 | void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs, 498 | const size_t &ssz, const size_t &rsz, const std::vector &ssizes, 499 | const std::vector &rsizes, const std::vector &svdata, 500 | const std::vector &rvdata, const std::vector &currComm, 501 | const std::vector &localCinfo, std::map &remoteCinfo, 502 | std::unordered_map &remoteComm, std::map &remoteCupdate) 503 | #endif 504 | { 505 | #if defined(USE_MPI_RMA) 506 | std::vector scdata(ssz); 507 | #else 508 | std::vector rcdata(rsz), scdata(ssz); 509 | #endif 510 | GraphElem spos, rpos; 511 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 512 | std::vector< std::vector< GraphElem > > rcinfo(nprocs); 513 | #else 514 | std::vector > rcinfo(nprocs); 515 | #endif 516 | 517 | #if defined(USE_MPI_SENDRECV) 518 | #else 519 | std::vector rreqs(nprocs), sreqs(nprocs); 520 | #endif 521 | 522 | #ifdef DEBUG_PRINTF 523 | double t0, t1, ta = 0.0; 524 | #endif 525 | 526 | #if defined(USE_MPI_RMA) && !defined(USE_MPI_ACCUMULATE) 527 | int num_comm_procs; 528 | #endif 529 | 530 | #if defined(USE_MPI_RMA) && 
!defined(USE_MPI_ACCUMULATE) 531 | spos = 0; 532 | rpos = 0; 533 | std::vector comm_proc(nprocs); 534 | std::vector comm_proc_buf_disp(nprocs); 535 | 536 | /* Initialize all to -1 (unsure if necessary) */ 537 | for (int i = 0; i < nprocs; i++) { 538 | comm_proc[i] = -1; 539 | comm_proc_buf_disp[i] = -1; 540 | } 541 | 542 | num_comm_procs = 0; 543 | for (int i = 0; i < nprocs; i++) { 544 | if ((i != me) && (ssizes[i] > 0)) { 545 | comm_proc[num_comm_procs] = i; 546 | comm_proc_buf_disp[num_comm_procs] = spos; 547 | num_comm_procs++; 548 | } 549 | spos += ssizes[i]; 550 | rpos += rsizes[i]; 551 | } 552 | #endif 553 | 554 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 555 | const GraphElem nv = dg.get_lnv(); 556 | MPI_Comm gcomm = dg.get_comm(); 557 | 558 | // Collects Communities of local vertices for remote nodes 559 | #ifdef OMP_SCHEDULE_RUNTIME 560 | #pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime) 561 | #else 562 | #pragma omp parallel for shared(svdata, scdata, currComm) schedule(static) 563 | #endif 564 | for (GraphElem i = 0; i < ssz; i++) { 565 | const GraphElem vertex = svdata[i]; 566 | #ifdef DEBUG_PRINTF 567 | assert((vertex >= base) && (vertex < bound)); 568 | #endif 569 | const GraphElem comm = currComm[vertex - base]; 570 | scdata[i] = comm; 571 | } 572 | 573 | std::vector rcsizes(nprocs), scsizes(nprocs); 574 | std::vector sinfo, rinfo; 575 | 576 | #ifdef DEBUG_PRINTF 577 | t0 = MPI_Wtime(); 578 | #endif 579 | #if !defined(USE_MPI_RMA) || defined(USE_MPI_ACCUMULATE) 580 | spos = 0; 581 | rpos = 0; 582 | #endif 583 | #if defined(USE_MPI_COLLECTIVES) 584 | std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); 585 | for (int i = 0; i < nprocs; i++) { 586 | scnts[i] = ssizes[i]; 587 | rcnts[i] = rsizes[i]; 588 | sdispls[i] = spos; 589 | rdispls[i] = rpos; 590 | spos += scnts[i]; 591 | rpos += rcnts[i]; 592 | } 593 | scnts[me] = 0; 594 | rcnts[me] = 0; 595 | MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), 596 | MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), 597 | MPI_GRAPH_TYPE, gcomm); 598 | #elif defined(USE_MPI_RMA) 599 | #if defined(USE_MPI_ACCUMULATE) 600 | for (int i = 0; i < nprocs; i++) { 601 | if ((i != me) && (ssizes[i] > 0)) { 602 | MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 603 | disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin); 604 | } 605 | spos += ssizes[i]; 606 | rpos += rsizes[i]; 607 | } 608 | #else 609 | for (int i = 0; i < num_comm_procs; i++) { 610 | int target_rank = comm_proc[i]; 611 | MPI_Put(scdata.data() + comm_proc_buf_disp[i], ssizes[target_rank], MPI_GRAPH_TYPE, 612 | target_rank, disp[target_rank], ssizes[target_rank], MPI_GRAPH_TYPE, commwin); 613 | } 614 | #endif 615 | #elif defined(USE_MPI_SENDRECV) 616 | for (int i = 0; i < nprocs; i++) { 617 | if (i != me) 618 | MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 619 | rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 620 | gcomm, MPI_STATUSES_IGNORE); 621 | 622 | spos += ssizes[i]; 623 | rpos += rsizes[i]; 624 | } 625 | #else 626 | for (int i = 0; i < nprocs; i++) { 627 | if ((i != me) && (rsizes[i] > 0)) 628 | MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 629 | CommunityTag, gcomm, &rreqs[i]); 630 | else 631 | rreqs[i] = MPI_REQUEST_NULL; 632 | 633 | rpos += rsizes[i]; 634 | } 635 | for (int i = 0; i < nprocs; i++) { 636 | if ((i != me) && (ssizes[i] > 0)) 637 | MPI_Isend(scdata.data() + spos, ssizes[i], 
MPI_GRAPH_TYPE, i, 638 | CommunityTag, gcomm, &sreqs[i]); 639 | else 640 | sreqs[i] = MPI_REQUEST_NULL; 641 | 642 | spos += ssizes[i]; 643 | } 644 | 645 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 646 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 647 | #endif 648 | #ifdef DEBUG_PRINTF 649 | t1 = MPI_Wtime(); 650 | ta += (t1 - t0); 651 | #endif 652 | 653 | // reserve vectors 654 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 655 | for (GraphElem i = 0; i < nprocs; i++) { 656 | rcinfo[i].reserve(rpos); 657 | } 658 | #endif 659 | 660 | // fetch baseptr from MPI window 661 | #if defined(USE_MPI_RMA) 662 | MPI_Win_flush_all(commwin); 663 | MPI_Barrier(gcomm); 664 | 665 | GraphElem *rcbuf = nullptr; 666 | int flag = 0; 667 | MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, &flag); 668 | #endif 669 | 670 | remoteComm.clear(); 671 | for (GraphElem i = 0; i < rpos; i++) { 672 | 673 | #if defined(USE_MPI_RMA) 674 | const GraphElem comm = rcbuf[i]; 675 | #else 676 | const GraphElem comm = rcdata[i]; 677 | #endif 678 | 679 | remoteComm.insert(std::unordered_map::value_type(rvdata[i], comm)); 680 | const int tproc = dg.get_owner(comm); 681 | 682 | if (tproc != me) 683 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 684 | rcinfo[tproc].emplace_back(comm); 685 | #else 686 | rcinfo[tproc].insert(comm); 687 | #endif 688 | } 689 | 690 | for (GraphElem i = 0; i < nv; i++) { 691 | const GraphElem comm = currComm[i]; 692 | const int tproc = dg.get_owner(comm); 693 | 694 | if (tproc != me) 695 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 696 | rcinfo[tproc].emplace_back(comm); 697 | #else 698 | rcinfo[tproc].insert(comm); 699 | #endif 700 | } 701 | 702 | #ifdef DEBUG_PRINTF 703 | t0 = MPI_Wtime(); 704 | #endif 705 | GraphElem stcsz = 0, rtcsz = 0; 706 | 707 | #ifdef OMP_SCHEDULE_RUNTIME 708 | #pragma omp parallel for shared(scsizes, rcinfo) \ 709 | reduction(+:stcsz) schedule(runtime) 710 | #else 711 | #pragma omp parallel for shared(scsizes, rcinfo) \ 712 | reduction(+:stcsz) schedule(static) 713 | #endif 714 | for (int i = 0; i < nprocs; i++) { 715 | scsizes[i] = rcinfo[i].size(); 716 | stcsz += scsizes[i]; 717 | } 718 | 719 | MPI_Alltoall(scsizes.data(), 1, MPI_GRAPH_TYPE, rcsizes.data(), 720 | 1, MPI_GRAPH_TYPE, gcomm); 721 | 722 | #ifdef DEBUG_PRINTF 723 | t1 = MPI_Wtime(); 724 | ta += (t1 - t0); 725 | #endif 726 | 727 | #ifdef OMP_SCHEDULE_RUNTIME 728 | #pragma omp parallel for shared(rcsizes) \ 729 | reduction(+:rtcsz) schedule(runtime) 730 | #else 731 | #pragma omp parallel for shared(rcsizes) \ 732 | reduction(+:rtcsz) schedule(static) 733 | #endif 734 | for (int i = 0; i < nprocs; i++) { 735 | rtcsz += rcsizes[i]; 736 | } 737 | 738 | #ifdef DEBUG_PRINTF 739 | std::cout << "[" << me << "]Total communities to receive: " << rtcsz << std::endl; 740 | #endif 741 | #if defined(USE_MPI_COLLECTIVES) 742 | std::vector rcomms(rtcsz), scomms(stcsz); 743 | #else 744 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 745 | std::vector rcomms(rtcsz); 746 | #else 747 | std::vector rcomms(rtcsz), scomms(stcsz); 748 | #endif 749 | #endif 750 | sinfo.resize(rtcsz); 751 | rinfo.resize(stcsz); 752 | 753 | #ifdef DEBUG_PRINTF 754 | t0 = MPI_Wtime(); 755 | #endif 756 | spos = 0; 757 | rpos = 0; 758 | #if defined(USE_MPI_COLLECTIVES) 759 | for (int i = 0; i < nprocs; i++) { 760 | if (i != me) { 761 | std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); 762 | } 763 | scnts[i] = scsizes[i]; 764 | rcnts[i] = rcsizes[i]; 765 | sdispls[i] = spos; 766 | rdispls[i] = rpos; 767 | spos += scnts[i]; 768 
| rpos += rcnts[i]; 769 | } 770 | scnts[me] = 0; 771 | rcnts[me] = 0; 772 | MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), 773 | MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), 774 | MPI_GRAPH_TYPE, gcomm); 775 | 776 | for (int i = 0; i < nprocs; i++) { 777 | if (i != me) { 778 | #ifdef OMP_SCHEDULE_RUNTIME 779 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ 780 | firstprivate(i, base), schedule(runtime) , if(rcsizes[i] >= 1000) 781 | #else 782 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \ 783 | firstprivate(i, base), schedule(guided) , if(rcsizes[i] >= 1000) 784 | #endif 785 | for (GraphElem j = 0; j < rcsizes[i]; j++) { 786 | const GraphElem comm = rcomms[rdispls[i] + j]; 787 | sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; 788 | } 789 | } 790 | } 791 | 792 | MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), 793 | commType, rinfo.data(), scnts.data(), sdispls.data(), 794 | commType, gcomm); 795 | #else 796 | #if !defined(USE_MPI_SENDRECV) 797 | std::vector rcreqs(nprocs); 798 | #endif 799 | for (int i = 0; i < nprocs; i++) { 800 | if (i != me) { 801 | #if defined(USE_MPI_SENDRECV) 802 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 803 | MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 804 | rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 805 | gcomm, MPI_STATUSES_IGNORE); 806 | #else 807 | std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); 808 | MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 809 | rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 810 | gcomm, MPI_STATUSES_IGNORE); 811 | #endif 812 | #else 813 | if (rcsizes[i] > 0) { 814 | MPI_Irecv(rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, 815 | CommunityTag, gcomm, &rreqs[i]); 816 | } 817 | else 818 | rreqs[i] = MPI_REQUEST_NULL; 819 | 820 | if (scsizes[i] > 0) { 821 | #if defined(REPLACE_STL_UOSET_WITH_VECTOR) 822 | MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, 823 | CommunityTag, gcomm, &sreqs[i]); 824 | #else 825 | std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos); 826 | MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, 827 | CommunityTag, gcomm, &sreqs[i]); 828 | #endif 829 | } 830 | else 831 | sreqs[i] = MPI_REQUEST_NULL; 832 | #endif 833 | } 834 | else { 835 | #if !defined(USE_MPI_SENDRECV) 836 | rreqs[i] = MPI_REQUEST_NULL; 837 | sreqs[i] = MPI_REQUEST_NULL; 838 | #endif 839 | } 840 | rpos += rcsizes[i]; 841 | spos += scsizes[i]; 842 | } 843 | 844 | spos = 0; 845 | rpos = 0; 846 | 847 | // poke progress on last isend/irecvs 848 | #if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) 849 | int tf = 0, id = 0; 850 | MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE); 851 | #endif 852 | 853 | #if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) 854 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 855 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 856 | #endif 857 | 858 | for (int i = 0; i < nprocs; i++) { 859 | if (i != me) { 860 | #if defined(USE_MPI_SENDRECV) 861 | #ifdef OMP_SCHEDULE_RUNTIME 862 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ 863 | firstprivate(i, rpos, base), 
schedule(runtime) , if(rcsizes[i] >= 1000) 864 | 865 | #else 866 | #pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \ 867 | firstprivate(i, rpos, base), schedule(guided) , if(rcsizes[i] >= 1000) 868 | #endif 869 | for (GraphElem j = 0; j < rcsizes[i]; j++) { 870 | const GraphElem comm = rcomms[rpos + j]; 871 | sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; 872 | } 873 | 874 | MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, 875 | rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 876 | gcomm, MPI_STATUSES_IGNORE); 877 | #else 878 | if (scsizes[i] > 0) { 879 | MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 880 | gcomm, &rcreqs[i]); 881 | } 882 | else 883 | rcreqs[i] = MPI_REQUEST_NULL; 884 | 885 | // poke progress on last isend/irecvs 886 | #if defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP) 887 | int flag = 0, done = 0; 888 | while (!done) { 889 | MPI_Test(&sreqs[i], &flag, MPI_STATUS_IGNORE); 890 | MPI_Test(&rreqs[i], &flag, MPI_STATUS_IGNORE); 891 | if (flag) 892 | done = 1; 893 | } 894 | #endif 895 | 896 | #ifdef OMP_SCHEDULE_RUNTIME 897 | #pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \ 898 | firstprivate(i, rpos, base), schedule(runtime) , if(rcsizes[i] >= 1000) 899 | #else 900 | #pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \ 901 | firstprivate(i, rpos, base), schedule(guided) , if(rcsizes[i] >= 1000) 902 | #endif 903 | for (GraphElem j = 0; j < rcsizes[i]; j++) { 904 | const GraphElem comm = rcomms[rpos + j]; 905 | sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree}; 906 | } 907 | 908 | if (rcsizes[i] > 0) { 909 | MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, 910 | CommunityDataTag, gcomm, &sreqs[i]); 911 | } 912 | else 913 | sreqs[i] = MPI_REQUEST_NULL; 914 | #endif 915 | } 916 | else { 917 | #if !defined(USE_MPI_SENDRECV) 918 | rcreqs[i] = MPI_REQUEST_NULL; 919 | sreqs[i] = MPI_REQUEST_NULL; 920 | #endif 921 | } 922 | rpos += rcsizes[i]; 923 | spos += scsizes[i]; 924 | } 925 | 926 | #if !defined(USE_MPI_SENDRECV) 927 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 928 | MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE); 929 | #endif 930 | 931 | #endif 932 | 933 | #ifdef DEBUG_PRINTF 934 | t1 = MPI_Wtime(); 935 | ta += (t1 - t0); 936 | #endif 937 | 938 | remoteCinfo.clear(); 939 | remoteCupdate.clear(); 940 | 941 | for (GraphElem i = 0; i < stcsz; i++) { 942 | const GraphElem ccomm = rinfo[i].community; 943 | 944 | Comm comm; 945 | 946 | comm.size = rinfo[i].size; 947 | comm.degree = rinfo[i].degree; 948 | 949 | remoteCinfo.insert(std::map::value_type(ccomm, comm)); 950 | remoteCupdate.insert(std::map::value_type(ccomm, Comm())); 951 | } 952 | } // end fillRemoteCommunities 953 | 954 | void createCommunityMPIType() 955 | { 956 | CommInfo cinfo; 957 | 958 | MPI_Aint begin, community, size, degree; 959 | 960 | MPI_Get_address(&cinfo, &begin); 961 | MPI_Get_address(&cinfo.community, &community); 962 | MPI_Get_address(&cinfo.size, &size); 963 | MPI_Get_address(&cinfo.degree, °ree); 964 | 965 | int blens[] = { 1, 1, 1 }; 966 | MPI_Aint displ[] = { community - begin, size - begin, degree - begin }; 967 | MPI_Datatype types[] = { MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE }; 968 | 969 | MPI_Type_create_struct(3, blens, displ, types, &commType); 970 | MPI_Type_commit(&commType); 971 | } // 
createCommunityMPIType 972 | 973 | void destroyCommunityMPIType() 974 | { 975 | MPI_Type_free(&commType); 976 | } // destroyCommunityMPIType 977 | 978 | void updateRemoteCommunities(const Graph &dg, std::vector &localCinfo, 979 | const std::map &remoteCupdate, 980 | const int me, const int nprocs) 981 | { 982 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 983 | std::vector> remoteArray(nprocs); 984 | MPI_Comm gcomm = dg.get_comm(); 985 | 986 | // FIXME TODO can we use TBB::concurrent_vector instead, 987 | // to make this parallel; first we have to get rid of maps 988 | for (std::map::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) { 989 | const GraphElem i = iter->first; 990 | const Comm &curr = iter->second; 991 | 992 | const int tproc = dg.get_owner(i); 993 | 994 | #ifdef DEBUG_PRINTF 995 | assert(tproc != me); 996 | #endif 997 | CommInfo rcinfo; 998 | 999 | rcinfo.community = i; 1000 | rcinfo.size = curr.size; 1001 | rcinfo.degree = curr.degree; 1002 | 1003 | remoteArray[tproc].push_back(rcinfo); 1004 | } 1005 | 1006 | std::vector send_sz(nprocs), recv_sz(nprocs); 1007 | 1008 | #ifdef DEBUG_PRINTF 1009 | GraphWeight tc = 0.0; 1010 | const double t0 = MPI_Wtime(); 1011 | #endif 1012 | 1013 | #ifdef OMP_SCHEDULE_RUNTIME 1014 | #pragma omp parallel for schedule(runtime) 1015 | #else 1016 | #pragma omp parallel for schedule(static) 1017 | #endif 1018 | for (int i = 0; i < nprocs; i++) { 1019 | send_sz[i] = remoteArray[i].size(); 1020 | } 1021 | 1022 | MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), 1023 | 1, MPI_GRAPH_TYPE, gcomm); 1024 | 1025 | #ifdef DEBUG_PRINTF 1026 | const double t1 = MPI_Wtime(); 1027 | tc += (t1 - t0); 1028 | #endif 1029 | 1030 | GraphElem rcnt = 0, scnt = 0; 1031 | #ifdef OMP_SCHEDULE_RUNTIME 1032 | #pragma omp parallel for shared(recv_sz, send_sz) \ 1033 | reduction(+:rcnt, scnt) schedule(runtime) 1034 | #else 1035 | #pragma omp parallel for shared(recv_sz, send_sz) \ 1036 | reduction(+:rcnt, scnt) schedule(static) 1037 | #endif 1038 | for (int i = 0; i < nprocs; i++) { 1039 | rcnt += recv_sz[i]; 1040 | scnt += send_sz[i]; 1041 | } 1042 | #ifdef DEBUG_PRINTF 1043 | std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl; 1044 | #endif 1045 | 1046 | GraphElem currPos = 0; 1047 | std::vector rdata(rcnt); 1048 | 1049 | #ifdef DEBUG_PRINTF 1050 | const double t2 = MPI_Wtime(); 1051 | #endif 1052 | #if defined(USE_MPI_SENDRECV) 1053 | for (int i = 0; i < nprocs; i++) { 1054 | if (i != me) 1055 | MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, 1056 | rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, 1057 | gcomm, MPI_STATUSES_IGNORE); 1058 | 1059 | currPos += recv_sz[i]; 1060 | } 1061 | #else 1062 | std::vector sreqs(nprocs), rreqs(nprocs); 1063 | for (int i = 0; i < nprocs; i++) { 1064 | if ((i != me) && (recv_sz[i] > 0)) 1065 | MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, 1066 | CommunityDataTag, gcomm, &rreqs[i]); 1067 | else 1068 | rreqs[i] = MPI_REQUEST_NULL; 1069 | 1070 | currPos += recv_sz[i]; 1071 | } 1072 | 1073 | for (int i = 0; i < nprocs; i++) { 1074 | if ((i != me) && (send_sz[i] > 0)) 1075 | MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, 1076 | CommunityDataTag, gcomm, &sreqs[i]); 1077 | else 1078 | sreqs[i] = MPI_REQUEST_NULL; 1079 | } 1080 | 1081 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 1082 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 1083 | 
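// Note: in this non-collective path the receives are posted before the matching
// sends, and ranks with nothing to exchange hold MPI_REQUEST_NULL entries, so the
// MPI_Waitall calls above can safely wait on the full nprocs-sized request arrays
// (null requests are treated as already complete).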
#endif 1084 | #ifdef DEBUG_PRINTF 1085 | const double t3 = MPI_Wtime(); 1086 | std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl; 1087 | #endif 1088 | 1089 | #ifdef OMP_SCHEDULE_RUNTIME 1090 | #pragma omp parallel for shared(rdata, localCinfo) schedule(runtime) 1091 | #else 1092 | #pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic) 1093 | #endif 1094 | for (GraphElem i = 0; i < rcnt; i++) { 1095 | const CommInfo &curr = rdata[i]; 1096 | 1097 | #ifdef DEBUG_PRINTF 1098 | assert(dg.get_owner(curr.community) == me); 1099 | #endif 1100 | localCinfo[curr.community-base].size += curr.size; 1101 | localCinfo[curr.community-base].degree += curr.degree; 1102 | } 1103 | } // updateRemoteCommunities 1104 | 1105 | // initial setup before Louvain iteration begins 1106 | #if defined(USE_MPI_RMA) 1107 | void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, 1108 | std::vector &ssizes, std::vector &rsizes, 1109 | std::vector &svdata, std::vector &rvdata, 1110 | const int me, const int nprocs, MPI_Win &commwin) 1111 | #else 1112 | void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz, 1113 | std::vector &ssizes, std::vector &rsizes, 1114 | std::vector &svdata, std::vector &rvdata, 1115 | const int me, const int nprocs) 1116 | #endif 1117 | { 1118 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 1119 | const GraphElem nv = dg.get_lnv(); 1120 | MPI_Comm gcomm = dg.get_comm(); 1121 | 1122 | #ifdef USE_OPENMP_LOCK 1123 | std::vector locks(nprocs); 1124 | for (int i = 0; i < nprocs; i++) 1125 | omp_init_lock(&locks[i]); 1126 | #endif 1127 | std::vector> parray(nprocs); 1128 | 1129 | #ifdef USE_OPENMP_LOCK 1130 | #pragma omp parallel default(shared), shared(dg, locks, parray), firstprivate(me) 1131 | #else 1132 | #pragma omp parallel default(shared), shared(dg, parray), firstprivate(me) 1133 | #endif 1134 | { 1135 | #ifdef OMP_SCHEDULE_RUNTIME 1136 | #pragma omp for schedule(runtime) 1137 | #else 1138 | #pragma omp for schedule(guided) 1139 | #endif 1140 | for (GraphElem i = 0; i < nv; i++) { 1141 | GraphElem e0, e1; 1142 | 1143 | dg.edge_range(i, e0, e1); 1144 | 1145 | for (GraphElem j = e0; j < e1; j++) { 1146 | const Edge &edge = dg.get_edge(j); 1147 | const int tproc = dg.get_owner(edge.tail_); 1148 | 1149 | if (tproc != me) { 1150 | #ifdef USE_OPENMP_LOCK 1151 | omp_set_lock(&locks[tproc]); 1152 | #else 1153 | lock(); 1154 | #endif 1155 | parray[tproc].insert(edge.tail_); 1156 | #ifdef USE_OPENMP_LOCK 1157 | omp_unset_lock(&locks[tproc]); 1158 | #else 1159 | unlock(); 1160 | #endif 1161 | } 1162 | } 1163 | } 1164 | } 1165 | 1166 | #ifdef USE_OPENMP_LOCK 1167 | for (int i = 0; i < nprocs; i++) { 1168 | omp_destroy_lock(&locks[i]); 1169 | } 1170 | #endif 1171 | 1172 | rsizes.resize(nprocs); 1173 | ssizes.resize(nprocs); 1174 | ssz = 0, rsz = 0; 1175 | 1176 | int pproc = 0; 1177 | // TODO FIXME parallelize this loop 1178 | for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { 1179 | ssz += iter->size(); 1180 | ssizes[pproc] = iter->size(); 1181 | pproc++; 1182 | } 1183 | 1184 | MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), 1185 | 1, MPI_GRAPH_TYPE, gcomm); 1186 | 1187 | GraphElem rsz_r = 0; 1188 | #ifdef OMP_SCHEDULE_RUNTIME 1189 | #pragma omp parallel for shared(rsizes) \ 1190 | reduction(+:rsz_r) schedule(runtime) 1191 | #else 1192 | #pragma omp parallel for shared(rsizes) \ 1193 | reduction(+:rsz_r) schedule(static) 1194 | #endif 1195 | for (int i = 0; i < 
nprocs; i++) 1196 | rsz_r += rsizes[i]; 1197 | rsz = rsz_r; 1198 | 1199 | svdata.resize(ssz); 1200 | rvdata.resize(rsz); 1201 | 1202 | GraphElem cpos = 0, rpos = 0; 1203 | pproc = 0; 1204 | 1205 | #if defined(USE_MPI_COLLECTIVES) 1206 | std::vector scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs); 1207 | 1208 | for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { 1209 | std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); 1210 | 1211 | scnts[pproc] = iter->size(); 1212 | rcnts[pproc] = rsizes[pproc]; 1213 | sdispls[pproc] = cpos; 1214 | rdispls[pproc] = rpos; 1215 | cpos += iter->size(); 1216 | rpos += rcnts[pproc]; 1217 | 1218 | pproc++; 1219 | } 1220 | 1221 | scnts[me] = 0; 1222 | rcnts[me] = 0; 1223 | MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), 1224 | MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), 1225 | MPI_GRAPH_TYPE, gcomm); 1226 | #else 1227 | std::vector rreqs(nprocs), sreqs(nprocs); 1228 | for (int i = 0; i < nprocs; i++) { 1229 | if ((i != me) && (rsizes[i] > 0)) 1230 | MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 1231 | VertexTag, gcomm, &rreqs[i]); 1232 | else 1233 | rreqs[i] = MPI_REQUEST_NULL; 1234 | 1235 | rpos += rsizes[i]; 1236 | } 1237 | 1238 | for (std::vector>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) { 1239 | std::copy(iter->begin(), iter->end(), svdata.begin() + cpos); 1240 | 1241 | if ((me != pproc) && (iter->size() > 0)) 1242 | MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, 1243 | VertexTag, gcomm, &sreqs[pproc]); 1244 | else 1245 | sreqs[pproc] = MPI_REQUEST_NULL; 1246 | 1247 | cpos += iter->size(); 1248 | pproc++; 1249 | } 1250 | 1251 | MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE); 1252 | MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE); 1253 | #endif 1254 | 1255 | std::swap(svdata, rvdata); 1256 | std::swap(ssizes, rsizes); 1257 | std::swap(ssz, rsz); 1258 | 1259 | // create MPI window for communities 1260 | #if defined(USE_MPI_RMA) 1261 | GraphElem *ptr = nullptr; 1262 | MPI_Info info = MPI_INFO_NULL; 1263 | #if defined(USE_MPI_ACCUMULATE) 1264 | MPI_Info_create(&info); 1265 | MPI_Info_set(info, "accumulate_ordering", "none"); 1266 | MPI_Info_set(info, "accumulate_ops", "same_op"); 1267 | #endif 1268 | MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), 1269 | info, gcomm, &ptr, &commwin); 1270 | MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin); 1271 | #endif 1272 | } // exchangeVertexReqs 1273 | 1274 | #if defined(USE_MPI_RMA) 1275 | GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, 1276 | size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, 1277 | std::vector &svdata, std::vector &rvdata, const GraphWeight lower, 1278 | const GraphWeight thresh, int &iters, MPI_Win &commwin) 1279 | #else 1280 | GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg, 1281 | size_t &ssz, size_t &rsz, std::vector &ssizes, std::vector &rsizes, 1282 | std::vector &svdata, std::vector &rvdata, const GraphWeight lower, 1283 | const GraphWeight thresh, int &iters) 1284 | #endif 1285 | { 1286 | std::vector pastComm, currComm, targetComm; 1287 | std::vector vDegree; 1288 | std::vector clusterWeight; 1289 | std::vector localCinfo, localCupdate; 1290 | 1291 | std::unordered_map remoteComm; 1292 | std::map remoteCinfo, remoteCupdate; 1293 | 1294 | const GraphElem nv = dg.get_lnv(); 1295 | MPI_Comm gcomm = dg.get_comm(); 1296 | 1297 | GraphWeight 
constantForSecondTerm; 1298 | GraphWeight prevMod = lower; 1299 | GraphWeight currMod = -1.0; 1300 | int numIters = 0; 1301 | 1302 | distInitLouvain(dg, pastComm, currComm, vDegree, clusterWeight, localCinfo, 1303 | localCupdate, constantForSecondTerm, me); 1304 | targetComm.resize(nv); 1305 | 1306 | #ifdef DEBUG_PRINTF 1307 | std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl; 1308 | if (me == 0) 1309 | std::cout << "Threshold: " << thresh << std::endl; 1310 | #endif 1311 | const GraphElem base = dg.get_base(me), bound = dg.get_bound(me); 1312 | 1313 | #ifdef DEBUG_PRINTF 1314 | double t0, t1; 1315 | t0 = MPI_Wtime(); 1316 | #endif 1317 | 1318 | // setup vertices and communities 1319 | #if defined(USE_MPI_RMA) 1320 | exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 1321 | svdata, rvdata, me, nprocs, commwin); 1322 | 1323 | // store the remote displacements 1324 | std::vector disp(nprocs); 1325 | MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, 1326 | MPI_SUM, gcomm); 1327 | #else 1328 | exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 1329 | svdata, rvdata, me, nprocs); 1330 | #endif 1331 | 1332 | #ifdef DEBUG_PRINTF 1333 | t1 = MPI_Wtime(); 1334 | std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl; 1335 | #endif 1336 | 1337 | // start Louvain iteration 1338 | while(true) { 1339 | #ifdef DEBUG_PRINTF 1340 | const double t2 = MPI_Wtime(); 1341 | if (me == 0) 1342 | std::cout << "Starting Louvain iteration: " << numIters << std::endl; 1343 | #endif 1344 | numIters++; 1345 | 1346 | #ifdef DEBUG_PRINTF 1347 | t0 = MPI_Wtime(); 1348 | #endif 1349 | 1350 | #if defined(USE_MPI_RMA) 1351 | fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 1352 | rsizes, svdata, rvdata, currComm, localCinfo, 1353 | remoteCinfo, remoteComm, remoteCupdate, 1354 | commwin, disp); 1355 | #else 1356 | fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 1357 | rsizes, svdata, rvdata, currComm, localCinfo, 1358 | remoteCinfo, remoteComm, remoteCupdate); 1359 | #endif 1360 | 1361 | #ifdef DEBUG_PRINTF 1362 | t1 = MPI_Wtime(); 1363 | std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl; 1364 | std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl; 1365 | #endif 1366 | 1367 | #ifdef DEBUG_PRINTF 1368 | t0 = MPI_Wtime(); 1369 | #endif 1370 | 1371 | #pragma omp parallel default(shared), shared(clusterWeight, localCupdate, currComm, targetComm, \ 1372 | vDegree, localCinfo, remoteCinfo, remoteComm, pastComm, dg, remoteCupdate), \ 1373 | firstprivate(constantForSecondTerm, me) 1374 | { 1375 | distCleanCWandCU(nv, clusterWeight, localCupdate); 1376 | 1377 | #ifdef OMP_SCHEDULE_RUNTIME 1378 | #pragma omp for schedule(runtime) 1379 | #else 1380 | #pragma omp for schedule(guided) 1381 | #endif 1382 | for (GraphElem i = 0; i < nv; i++) { 1383 | distExecuteLouvainIteration(i, dg, currComm, targetComm, vDegree, localCinfo, 1384 | localCupdate, remoteComm, remoteCinfo, remoteCupdate, 1385 | constantForSecondTerm, clusterWeight, me); 1386 | } 1387 | } 1388 | 1389 | #pragma omp parallel default(none), shared(localCinfo, localCupdate) 1390 | { 1391 | distUpdateLocalCinfo(localCinfo, localCupdate); 1392 | } 1393 | 1394 | // communicate remote communities 1395 | updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs); 1396 | 1397 | // compute modularity 1398 | currMod = distComputeModularity(dg, localCinfo, 
clusterWeight, constantForSecondTerm, me); 1399 | 1400 | // exit criteria 1401 | if (currMod - prevMod < thresh) 1402 | break; 1403 | 1404 | prevMod = currMod; 1405 | if (prevMod < lower) 1406 | prevMod = lower; 1407 | 1408 | #ifdef OMP_SCHEDULE_RUNTIME 1409 | #pragma omp parallel for default(shared) \ 1410 | shared(pastComm, currComm, targetComm) \ 1411 | schedule(runtime) 1412 | #else 1413 | #pragma omp parallel for default(shared) \ 1414 | shared(pastComm, currComm, targetComm) \ 1415 | schedule(static) 1416 | #endif 1417 | for (GraphElem i = 0; i < nv; i++) { 1418 | GraphElem tmp = pastComm[i]; 1419 | pastComm[i] = currComm[i]; 1420 | currComm[i] = targetComm[i]; 1421 | targetComm[i] = tmp; 1422 | } 1423 | } // end of Louvain iteration 1424 | 1425 | #if defined(USE_MPI_RMA) 1426 | MPI_Win_unlock_all(commwin); 1427 | MPI_Win_free(&commwin); 1428 | #endif 1429 | 1430 | iters = numIters; 1431 | 1432 | vDegree.clear(); 1433 | pastComm.clear(); 1434 | currComm.clear(); 1435 | targetComm.clear(); 1436 | clusterWeight.clear(); 1437 | localCinfo.clear(); 1438 | localCupdate.clear(); 1439 | 1440 | return prevMod; 1441 | } // distLouvainMethod plain 1442 | 1443 | #endif // __DSPL 1444 | -------------------------------------------------------------------------------- /graph.hpp: -------------------------------------------------------------------------------- 1 | // *********************************************************************** 2 | // 3 | // miniVite 4 | // 5 | // *********************************************************************** 6 | // 7 | // Copyright (2018) Battelle Memorial Institute 8 | // All rights reserved. 9 | // 10 | // Redistribution and use in source and binary forms, with or without 11 | // modification, are permitted provided that the following conditions 12 | // are met: 13 | // 14 | // 1. Redistributions of source code must retain the above copyright 15 | // notice, this list of conditions and the following disclaimer. 16 | // 17 | // 2. Redistributions in binary form must reproduce the above copyright 18 | // notice, this list of conditions and the following disclaimer in the 19 | // documentation and/or other materials provided with the distribution. 20 | // 21 | // 3. Neither the name of the copyright holder nor the names of its 22 | // contributors may be used to endorse or promote products derived from 23 | // this software without specific prior written permission. 24 | // 25 | // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 26 | // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 27 | // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 28 | // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 29 | // COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 30 | // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 31 | // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 32 | // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 33 | // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 34 | // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 35 | // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 36 | // POSSIBILITY OF SUCH DAMAGE. 
37 | // 38 | // ************************************************************************ 39 | 40 | #pragma once 41 | #ifndef GRAPH_HPP 42 | #define GRAPH_HPP 43 | 44 | #include 45 | #include 46 | #include 47 | #include 48 | #include 49 | #include 50 | #include 51 | #include 52 | #include 53 | 54 | #include 55 | 56 | #include "utils.hpp" 57 | 58 | unsigned seed; 59 | 60 | struct Edge 61 | { 62 | GraphElem tail_; 63 | GraphWeight weight_; 64 | 65 | Edge(): tail_(-1), weight_(0.0) {} 66 | }; 67 | 68 | struct EdgeTuple 69 | { 70 | GraphElem ij_[2]; 71 | GraphWeight w_; 72 | 73 | EdgeTuple(GraphElem i, GraphElem j, GraphWeight w): 74 | ij_{i, j}, w_(w) 75 | {} 76 | EdgeTuple(GraphElem i, GraphElem j): 77 | ij_{i, j}, w_(1.0) 78 | {} 79 | EdgeTuple(): 80 | ij_{-1, -1}, w_(0.0) 81 | {} 82 | }; 83 | 84 | // per process graph instance 85 | class Graph 86 | { 87 | public: 88 | Graph(): 89 | lnv_(-1), lne_(-1), nv_(-1), 90 | ne_(-1), comm_(MPI_COMM_WORLD) 91 | { 92 | MPI_Comm_size(comm_, &size_); 93 | MPI_Comm_rank(comm_, &rank_); 94 | } 95 | 96 | Graph(GraphElem lnv, GraphElem lne, 97 | GraphElem nv, GraphElem ne, 98 | MPI_Comm comm=MPI_COMM_WORLD): 99 | lnv_(lnv), lne_(lne), 100 | nv_(nv), ne_(ne), 101 | comm_(comm) 102 | { 103 | MPI_Comm_size(comm_, &size_); 104 | MPI_Comm_rank(comm_, &rank_); 105 | 106 | edge_indices_.resize(lnv_+1, 0); 107 | edge_list_.resize(lne_); // this is usually populated later 108 | 109 | parts_.resize(size_+1); 110 | parts_[0] = 0; 111 | 112 | for (GraphElem i = 1; i < size_+1; i++) 113 | parts_[i] = ((nv_ * i) / size_); 114 | } 115 | 116 | ~Graph() 117 | { 118 | edge_list_.clear(); 119 | edge_indices_.clear(); 120 | parts_.clear(); 121 | } 122 | 123 | // update vertex partition information 124 | void repart(std::vector const& parts) 125 | { memcpy(parts_.data(), parts.data(), sizeof(GraphElem)*(size_+1)); } 126 | 127 | // TODO FIXME put asserts like the following 128 | // everywhere function member of Graph class 129 | void set_edge_index(GraphElem const vertex, GraphElem const e0) 130 | { 131 | #if defined(DEBUG_BUILD) 132 | assert((vertex >= 0) && (vertex <= lnv_)); 133 | assert((e0 >= 0) && (e0 <= lne_)); 134 | edge_indices_.at(vertex) = e0; 135 | #else 136 | edge_indices_[vertex] = e0; 137 | #endif 138 | } 139 | 140 | void edge_range(GraphElem const vertex, GraphElem& e0, 141 | GraphElem& e1) const 142 | { 143 | e0 = edge_indices_[vertex]; 144 | e1 = edge_indices_[vertex+1]; 145 | } 146 | 147 | // collective 148 | void set_nedges(GraphElem lne) 149 | { 150 | lne_ = lne; 151 | edge_list_.resize(lne_); 152 | 153 | // compute total number of edges 154 | ne_ = 0; 155 | MPI_Allreduce(&lne_, &ne_, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); 156 | } 157 | 158 | GraphElem get_base(const int rank) const 159 | { return parts_[rank]; } 160 | 161 | GraphElem get_bound(const int rank) const 162 | { return parts_[rank+1]; } 163 | 164 | GraphElem get_range(const int rank) const 165 | { return (parts_[rank+1] - parts_[rank] + 1); } 166 | 167 | int get_owner(const GraphElem vertex) const 168 | { 169 | const std::vector::const_iterator iter = 170 | std::upper_bound(parts_.begin(), parts_.end(), vertex); 171 | 172 | return (iter - parts_.begin() - 1); 173 | } 174 | 175 | GraphElem get_lnv() const { return lnv_; } 176 | GraphElem get_lne() const { return lne_; } 177 | GraphElem get_nv() const { return nv_; } 178 | GraphElem get_ne() const { return ne_; } 179 | MPI_Comm get_comm() const { return comm_; } 180 | 181 | // return edge and active info 182 | // ---------------------------- 183 | 184 
| Edge const& get_edge(GraphElem const index) const 185 | { return edge_list_[index]; } 186 | 187 | Edge& set_edge(GraphElem const index) 188 | { return edge_list_[index]; } 189 | 190 | // local <--> global index translation 191 | // ----------------------------------- 192 | GraphElem local_to_global(GraphElem idx) 193 | { return (idx + get_base(rank_)); } 194 | 195 | GraphElem global_to_local(GraphElem idx) 196 | { return (idx - get_base(rank_)); } 197 | 198 | // w.r.t passed rank 199 | GraphElem local_to_global(GraphElem idx, int rank) 200 | { return (idx + get_base(rank)); } 201 | 202 | GraphElem global_to_local(GraphElem idx, int rank) 203 | { return (idx - get_base(rank)); } 204 | 205 | // print edge list (with weights) 206 | void print(bool print_weight = true) const 207 | { 208 | if (lne_ < MAX_PRINT_NEDGE) 209 | { 210 | for (int p = 0; p < size_; p++) 211 | { 212 | MPI_Barrier(comm_); 213 | if (p == rank_) 214 | { 215 | std::cout << "###############" << std::endl; 216 | std::cout << "Process #" << p << ": " << std::endl; 217 | std::cout << "###############" << std::endl; 218 | GraphElem base = get_base(p); 219 | for (GraphElem i = 0; i < lnv_; i++) 220 | { 221 | GraphElem e0, e1; 222 | edge_range(i, e0, e1); 223 | if (print_weight) { // print weights (default) 224 | for (GraphElem e = e0; e < e1; e++) 225 | { 226 | Edge const& edge = get_edge(e); 227 | std::cout << i+base << " " << edge.tail_ << " " << edge.weight_ << std::endl; 228 | } 229 | } 230 | else { // don't print weights 231 | for (GraphElem e = e0; e < e1; e++) 232 | { 233 | Edge const& edge = get_edge(e); 234 | std::cout << i+base << " " << edge.tail_ << std::endl; 235 | } 236 | } 237 | } 238 | MPI_Barrier(comm_); 239 | } 240 | } 241 | } 242 | else 243 | { 244 | if (rank_ == 0) 245 | std::cout << "Graph size per process is {" << lnv_ << ", " << lne_ << 246 | "}, which will overwhelm STDOUT." 
<< std::endl; 247 | } 248 | } 249 | 250 | // print statistics about edge distribution 251 | void print_dist_stats() 252 | { 253 | long sumdeg = 0, maxdeg = 0; 254 | long lne = (long) lne_; 255 | 256 | MPI_Reduce(&lne, &sumdeg, 1, MPI_LONG, MPI_SUM, 0, comm_); 257 | MPI_Reduce(&lne, &maxdeg, 1, MPI_LONG, MPI_MAX, 0, comm_); 258 | 259 | long my_sq = lne*lne; 260 | long sum_sq = 0; 261 | MPI_Reduce(&my_sq, &sum_sq, 1, MPI_LONG, MPI_SUM, 0, comm_); 262 | 263 | double average = (double) sumdeg / size_; 264 | double avg_sq = (double) sum_sq / size_; 265 | double var = avg_sq - (average*average); 266 | double stddev = sqrt(var); 267 | 268 | MPI_Barrier(comm_); 269 | 270 | if (rank_ == 0) 271 | { 272 | std::cout << std::endl; 273 | std::cout << "-------------------------------------------------------" << std::endl; 274 | std::cout << "Graph edge distribution characteristics" << std::endl; 275 | std::cout << "-------------------------------------------------------" << std::endl; 276 | std::cout << "Number of vertices: " << nv_ << std::endl; 277 | std::cout << "Number of edges: " << ne_ << std::endl; 278 | std::cout << "Maximum number of edges: " << maxdeg << std::endl; 279 | std::cout << "Average number of edges: " << average << std::endl; 280 | std::cout << "Expected value of X^2: " << avg_sq << std::endl; 281 | std::cout << "Variance: " << var << std::endl; 282 | std::cout << "Standard deviation: " << stddev << std::endl; 283 | std::cout << "-------------------------------------------------------" << std::endl; 284 | 285 | } 286 | } 287 | 288 | // public variables 289 | std::vector edge_indices_; 290 | std::vector edge_list_; 291 | private: 292 | GraphElem lnv_, lne_, nv_, ne_; 293 | std::vector parts_; 294 | MPI_Comm comm_; 295 | int rank_, size_; 296 | }; 297 | 298 | // read in binary edge list files 299 | // using MPI I/O 300 | class BinaryEdgeList 301 | { 302 | public: 303 | BinaryEdgeList() : 304 | M_(-1), N_(-1), 305 | M_local_(-1), N_local_(-1), 306 | comm_(MPI_COMM_WORLD) 307 | {} 308 | BinaryEdgeList(MPI_Comm comm) : 309 | M_(-1), N_(-1), 310 | M_local_(-1), N_local_(-1), 311 | comm_(comm) 312 | {} 313 | 314 | // read a file and return a graph 315 | Graph* read(int me, int nprocs, int ranks_per_node, std::string file) 316 | { 317 | int file_open_error; 318 | MPI_File fh; 319 | MPI_Status status; 320 | 321 | // specify the number of aggregates 322 | MPI_Info info; 323 | MPI_Info_create(&info); 324 | int naggr = (ranks_per_node > 1) ? (nprocs/ranks_per_node) : ranks_per_node; 325 | if (naggr >= nprocs) 326 | naggr = 1; 327 | std::stringstream tmp_str; 328 | tmp_str << naggr; 329 | std::string str = tmp_str.str(); 330 | MPI_Info_set(info, "cb_nodes", str.c_str()); 331 | 332 | file_open_error = MPI_File_open(comm_, file.c_str(), MPI_MODE_RDONLY, info, &fh); 333 | MPI_Info_free(&info); 334 | 335 | if (file_open_error != MPI_SUCCESS) 336 | { 337 | std::cout << " Error opening file! " << std::endl; 338 | MPI_Abort(comm_, -99); 339 | } 340 | 341 | // read the dimensions 342 | MPI_File_read_all(fh, &M_, sizeof(GraphElem), MPI_BYTE, &status); 343 | MPI_File_read_all(fh, &N_, sizeof(GraphElem), MPI_BYTE, &status); 344 | M_local_ = ((M_*(me + 1)) / nprocs) - ((M_*me) / nprocs); 345 | 346 | // create local graph 347 | Graph *g = new Graph(M_local_, 0, M_, N_); 348 | 349 | // Let N = array length and P = number of processors. 
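// (Illustrative example: with N = 10 and P = 4, the formulas below give
//  starting points 0, 2, 5, 7 and local lengths 2, 3, 2, 3.)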
350 | // From j = 0 to P-1, 351 | // Starting point of array on processor j = floor(N * j / P) 352 | // Length of array on processor j = floor(N * (j + 1) / P) - floor(N * j / P) 353 | 354 | uint64_t tot_bytes=(M_local_+1)*sizeof(GraphElem); 355 | MPI_Offset offset = 2*sizeof(GraphElem) + ((M_*me) / nprocs)*sizeof(GraphElem); 356 | 357 | // read in INT_MAX increments if total byte size is > INT_MAX 358 | 359 | if (tot_bytes < INT_MAX) 360 | MPI_File_read_at(fh, offset, &g->edge_indices_[0], tot_bytes, MPI_BYTE, &status); 361 | else 362 | { 363 | int chunk_bytes=INT_MAX; 364 | uint8_t *curr_pointer = (uint8_t*) &g->edge_indices_[0]; 365 | uint64_t transf_bytes = 0; 366 | 367 | while (transf_bytes < tot_bytes) 368 | { 369 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 370 | transf_bytes += chunk_bytes; 371 | offset += chunk_bytes; 372 | curr_pointer += chunk_bytes; 373 | 374 | if ((tot_bytes - transf_bytes) < INT_MAX) 375 | chunk_bytes = tot_bytes - transf_bytes; 376 | } 377 | } 378 | 379 | N_local_ = g->edge_indices_[M_local_] - g->edge_indices_[0]; 380 | g->set_nedges(N_local_); 381 | 382 | tot_bytes = N_local_*(sizeof(Edge)); 383 | offset = 2*sizeof(GraphElem) + (M_+1)*sizeof(GraphElem) + g->edge_indices_[0]*(sizeof(Edge)); 384 | 385 | if (tot_bytes < INT_MAX) 386 | MPI_File_read_at(fh, offset, &g->edge_list_[0], tot_bytes, MPI_BYTE, &status); 387 | else 388 | { 389 | int chunk_bytes=INT_MAX; 390 | uint8_t *curr_pointer = (uint8_t*)&g->edge_list_[0]; 391 | uint64_t transf_bytes = 0; 392 | 393 | while (transf_bytes < tot_bytes) 394 | { 395 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 396 | transf_bytes += chunk_bytes; 397 | offset += chunk_bytes; 398 | curr_pointer += chunk_bytes; 399 | 400 | if ((tot_bytes - transf_bytes) < INT_MAX) 401 | chunk_bytes = (tot_bytes - transf_bytes); 402 | } 403 | } 404 | 405 | MPI_File_close(&fh); 406 | 407 | for(GraphElem i=1; i < M_local_+1; i++) 408 | g->edge_indices_[i] -= g->edge_indices_[0]; 409 | g->edge_indices_[0] = 0; 410 | 411 | return g; 412 | } 413 | 414 | // find a distribution such that every 415 | // process own equal number of edges (serial) 416 | void find_balanced_num_edges(int nprocs, std::string file, std::vector& mbins) 417 | { 418 | FILE *fp; 419 | GraphElem nv, ne; // #vertices, #edges 420 | std::vector nbins(nprocs,0); 421 | 422 | fp = fopen(file.c_str(), "rb"); 423 | if (fp == NULL) 424 | { 425 | std::cout<< " Error opening file! 
" << std::endl; 426 | return; 427 | } 428 | 429 | // read nv and ne 430 | fread(&nv, sizeof(GraphElem), 1, fp); 431 | fread(&ne, sizeof(GraphElem), 1, fp); 432 | 433 | // bin capacity 434 | GraphElem nbcap = (ne / nprocs), ecount_idx, past_ecount_idx = 0; 435 | int p = 0; 436 | 437 | for (GraphElem m = 0; m < nv; m++) 438 | { 439 | fread(&ecount_idx, sizeof(GraphElem), 1, fp); 440 | 441 | // bins[p] >= capacity only for the last process 442 | if ((nbins[p] < nbcap) || (p == (nprocs - 1))) 443 | nbins[p] += (ecount_idx - past_ecount_idx); 444 | 445 | // increment p as long as p is not the last process 446 | // worst case: excess edges piled up on (p-1) 447 | if ((nbins[p] >= nbcap) && (p < (nprocs - 1))) 448 | p++; 449 | 450 | mbins[p+1]++; 451 | past_ecount_idx = ecount_idx; 452 | } 453 | 454 | fclose(fp); 455 | 456 | // prefix sum to store indices 457 | for (int k = 1; k < nprocs+1; k++) 458 | mbins[k] += mbins[k-1]; 459 | 460 | nbins.clear(); 461 | } 462 | 463 | // read a file and return a graph 464 | // uses a balanced distribution 465 | // (approximately equal #edges per process) 466 | Graph* read_balanced(int me, int nprocs, int ranks_per_node, std::string file) 467 | { 468 | int file_open_error; 469 | MPI_File fh; 470 | MPI_Status status; 471 | std::vector mbins(nprocs+1,0); 472 | 473 | // find #vertices per process such that 474 | // each process roughly owns equal #edges 475 | if (me == 0) 476 | { 477 | find_balanced_num_edges(nprocs, file, mbins); 478 | std::cout << "Trying to achieve equal edge distribution across processes." << std::endl; 479 | } 480 | MPI_Barrier(comm_); 481 | MPI_Bcast(mbins.data(), nprocs+1, MPI_GRAPH_TYPE, 0, comm_); 482 | 483 | // specify the number of aggregates 484 | MPI_Info info; 485 | MPI_Info_create(&info); 486 | int naggr = (ranks_per_node > 1) ? (nprocs/ranks_per_node) : ranks_per_node; 487 | if (naggr >= nprocs) 488 | naggr = 1; 489 | std::stringstream tmp_str; 490 | tmp_str << naggr; 491 | std::string str = tmp_str.str(); 492 | MPI_Info_set(info, "cb_nodes", str.c_str()); 493 | 494 | file_open_error = MPI_File_open(comm_, file.c_str(), MPI_MODE_RDONLY, info, &fh); 495 | MPI_Info_free(&info); 496 | 497 | if (file_open_error != MPI_SUCCESS) 498 | { 499 | std::cout << " Error opening file! 
" << std::endl; 500 | MPI_Abort(comm_, -99); 501 | } 502 | 503 | // read the dimensions 504 | MPI_File_read_all(fh, &M_, sizeof(GraphElem), MPI_BYTE, &status); 505 | MPI_File_read_all(fh, &N_, sizeof(GraphElem), MPI_BYTE, &status); 506 | M_local_ = mbins[me+1] - mbins[me]; 507 | 508 | // create local graph 509 | Graph *g = new Graph(M_local_, 0, M_, N_); 510 | // readjust parts with new vertex partition 511 | g->repart(mbins); 512 | 513 | uint64_t tot_bytes=(M_local_+1)*sizeof(GraphElem); 514 | MPI_Offset offset = 2*sizeof(GraphElem) + mbins[me]*sizeof(GraphElem); 515 | 516 | // read in INT_MAX increments if total byte size is > INT_MAX 517 | if (tot_bytes < INT_MAX) 518 | MPI_File_read_at(fh, offset, &g->edge_indices_[0], tot_bytes, MPI_BYTE, &status); 519 | else 520 | { 521 | int chunk_bytes=INT_MAX; 522 | uint8_t *curr_pointer = (uint8_t*) &g->edge_indices_[0]; 523 | uint64_t transf_bytes = 0; 524 | 525 | while (transf_bytes < tot_bytes) 526 | { 527 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 528 | transf_bytes += chunk_bytes; 529 | offset += chunk_bytes; 530 | curr_pointer += chunk_bytes; 531 | 532 | if ((tot_bytes - transf_bytes) < INT_MAX) 533 | chunk_bytes = tot_bytes - transf_bytes; 534 | } 535 | } 536 | 537 | N_local_ = g->edge_indices_[M_local_] - g->edge_indices_[0]; 538 | g->set_nedges(N_local_); 539 | 540 | tot_bytes = N_local_*(sizeof(Edge)); 541 | offset = 2*sizeof(GraphElem) + (M_+1)*sizeof(GraphElem) + g->edge_indices_[0]*(sizeof(Edge)); 542 | 543 | if (tot_bytes < INT_MAX) 544 | MPI_File_read_at(fh, offset, &g->edge_list_[0], tot_bytes, MPI_BYTE, &status); 545 | else 546 | { 547 | int chunk_bytes=INT_MAX; 548 | uint8_t *curr_pointer = (uint8_t*)&g->edge_list_[0]; 549 | uint64_t transf_bytes = 0; 550 | 551 | while (transf_bytes < tot_bytes) 552 | { 553 | MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status); 554 | transf_bytes += chunk_bytes; 555 | offset += chunk_bytes; 556 | curr_pointer += chunk_bytes; 557 | 558 | if ((tot_bytes - transf_bytes) < INT_MAX) 559 | chunk_bytes = (tot_bytes - transf_bytes); 560 | } 561 | } 562 | 563 | MPI_File_close(&fh); 564 | 565 | for(GraphElem i=1; i < M_local_+1; i++) 566 | g->edge_indices_[i] -= g->edge_indices_[0]; 567 | g->edge_indices_[0] = 0; 568 | 569 | mbins.clear(); 570 | 571 | return g; 572 | } 573 | 574 | private: 575 | GraphElem M_; 576 | GraphElem N_; 577 | GraphElem M_local_; 578 | GraphElem N_local_; 579 | MPI_Comm comm_; 580 | }; 581 | 582 | // RGG graph 583 | // 1D vertex distribution 584 | class GenerateRGG 585 | { 586 | public: 587 | GenerateRGG(GraphElem nv, MPI_Comm comm = MPI_COMM_WORLD) 588 | { 589 | nv_ = nv; 590 | comm_ = comm; 591 | 592 | MPI_Comm_rank(comm_, &rank_); 593 | MPI_Comm_size(comm_, &nprocs_); 594 | 595 | // neighbors 596 | up_ = down_ = MPI_PROC_NULL; 597 | if (nprocs_ > 1) { 598 | if (rank_ > 0 && rank_ < (nprocs_ - 1)) { 599 | up_ = rank_ - 1; 600 | down_ = rank_ + 1; 601 | } 602 | if (rank_ == 0) 603 | down_ = 1; 604 | if (rank_ == (nprocs_ - 1)) 605 | up_ = rank_ - 1; 606 | } 607 | 608 | n_ = nv_ / nprocs_; 609 | 610 | // check if number of nodes is divisible by #processes 611 | if ((nv_ % nprocs_) != 0) { 612 | if (rank_ == 0) { 613 | std::cout << "[ERROR] Number of vertices must be perfectly divisible by number of processes." << std::endl; 614 | std::cout << "Exiting..." 
<< std::endl; 615 | } 616 | MPI_Abort(comm_, -99); 617 | } 618 | 619 | // check if processes are power of 2 620 | if (!is_pwr2(nprocs_)) { 621 | if (rank_ == 0) { 622 | std::cout << "[ERROR] Number of processes must be a power of 2." << std::endl; 623 | std::cout << "Exiting..." << std::endl; 624 | } 625 | MPI_Abort(comm_, -99); 626 | } 627 | 628 | // calculate r(n) 629 | GraphWeight rc = sqrt((GraphWeight)log(nv)/(GraphWeight)(PI*nv)); 630 | GraphWeight rt = sqrt((GraphWeight)2.0736/(GraphWeight)nv); 631 | rn_ = (rc + rt)/(GraphWeight)2.0; 632 | 633 | assert(((GraphWeight)1.0/(GraphWeight)nprocs_) > rn_); 634 | 635 | MPI_Barrier(comm_); 636 | } 637 | 638 | // create RGG and returns Graph 639 | // TODO FIXME use OpenMP wherever possible 640 | // use Euclidean distance as edge weight 641 | // for random edges, choose from (0,1) 642 | // otherwise, use unit weight throughout 643 | Graph* generate(bool isLCG, bool unitEdgeWeight = true, GraphWeight randomEdgePercent = 0.0) 644 | { 645 | // Generate random coordinate points 646 | std::vector X, Y, X_up, Y_up, X_down, Y_down; 647 | 648 | if (isLCG) 649 | X.resize(2*n_); 650 | else 651 | X.resize(n_); 652 | 653 | Y.resize(n_); 654 | 655 | if (up_ != MPI_PROC_NULL) { 656 | X_up.resize(n_); 657 | Y_up.resize(n_); 658 | } 659 | 660 | if (down_ != MPI_PROC_NULL) { 661 | X_down.resize(n_); 662 | Y_down.resize(n_); 663 | } 664 | 665 | // create local graph 666 | Graph *g = new Graph(n_, 0, nv_, nv_); 667 | 668 | // generate random number within range 669 | // X: 0, 1 670 | // Y: rank_*1/p, (rank_+1)*1/p, 671 | GraphWeight rec_np = (GraphWeight)(1.0/(GraphWeight)nprocs_); 672 | GraphWeight lo = rank_* rec_np; 673 | GraphWeight hi = lo + rec_np; 674 | assert(hi > lo); 675 | 676 | // measure the time to generate random numbers 677 | MPI_Barrier(MPI_COMM_WORLD); 678 | double st = MPI_Wtime(); 679 | 680 | if (!isLCG) { 681 | // set seed (declared an extern in utils) 682 | seed = (unsigned)reseeder(1); 683 | 684 | #if defined(PRINT_RANDOM_XY_COORD) 685 | for (int k = 0; k < nprocs_; k++) { 686 | if (k == rank_) { 687 | std::cout << "Random number generated on Process#" << k << " :" << std::endl; 688 | for (GraphElem i = 0; i < n_; i++) { 689 | X[i] = genRandom(0.0, 1.0); 690 | Y[i] = genRandom(lo, hi); 691 | std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl; 692 | } 693 | } 694 | MPI_Barrier(comm_); 695 | } 696 | #else 697 | for (GraphElem i = 0; i < n_; i++) { 698 | X[i] = genRandom(0.0, 1.0); 699 | Y[i] = genRandom(lo, hi); 700 | } 701 | #endif 702 | } 703 | else { // LCG 704 | // X | Y 705 | // e.g seeds: 1741, 3821 706 | // create LCG object 707 | // seed to generate x0 708 | LCG xr(/*seed*/1, X.data(), 2*n_, comm_); 709 | 710 | // generate random numbers between 0-1 711 | xr.generate(); 712 | 713 | // rescale xr further between lo-hi 714 | // and put the numbers in Y taking 715 | // from X[n] 716 | xr.rescale(Y.data(), n_, lo); 717 | 718 | #if defined(PRINT_RANDOM_XY_COORD) 719 | for (int k = 0; k < nprocs_; k++) { 720 | if (k == rank_) { 721 | std::cout << "Random number generated on Process#" << k << " :" << std::endl; 722 | for (GraphElem i = 0; i < n_; i++) { 723 | std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl; 724 | } 725 | } 726 | MPI_Barrier(comm_); 727 | } 728 | #endif 729 | } 730 | 731 | double et = MPI_Wtime(); 732 | double tt = et - st; 733 | double tot_tt = 0.0; 734 | MPI_Reduce(&tt, &tot_tt, 1, MPI_DOUBLE, MPI_SUM, 0, comm_); 735 | 736 | if (rank_ == 0) { 737 | double tot_avg = (tot_tt/nprocs_); 738 | std::cout << 
"Average time to generate " << 2*n_ 739 | << " random numbers using LCG (in s): " 740 | << tot_avg << std::endl; 741 | } 742 | 743 | // ghost(s) 744 | 745 | // cross edges, each processor 746 | // communicates with up or/and down 747 | // neighbor only 748 | std::vector sendup_edges, senddn_edges; 749 | std::vector recvup_edges, recvdn_edges; 750 | std::vector edgeList; 751 | 752 | // counts, indexing: [2] = {up - 0, down - 1} 753 | // TODO can't we use MPI_INT 754 | std::array send_sizes = {0, 0}, recv_sizes = {0, 0}; 755 | #if defined(CHECK_NUM_EDGES) 756 | GraphElem numEdges = 0; 757 | #endif 758 | // local 759 | for (GraphElem i = 0; i < n_; i++) { 760 | for (GraphElem j = i + 1; j < n_; j++) { 761 | // euclidean distance: 762 | // 2D: sqrt((px-qx)^2 + (py-qy)^2) 763 | GraphWeight dx = X[i] - X[j]; 764 | GraphWeight dy = Y[i] - Y[j]; 765 | GraphWeight ed = sqrt(dx*dx + dy*dy); 766 | // are the two vertices within the range? 767 | if (ed <= rn_) { 768 | // local to global index 769 | const GraphElem g_i = g->local_to_global(i); 770 | const GraphElem g_j = g->local_to_global(j); 771 | 772 | if (!unitEdgeWeight) { 773 | edgeList.emplace_back(i, g_j, ed); 774 | edgeList.emplace_back(j, g_i, ed); 775 | } 776 | else { 777 | edgeList.emplace_back(i, g_j); 778 | edgeList.emplace_back(j, g_i); 779 | } 780 | #if defined(CHECK_NUM_EDGES) 781 | numEdges += 2; 782 | #endif 783 | 784 | g->edge_indices_[i+1]++; 785 | g->edge_indices_[j+1]++; 786 | } 787 | } 788 | } 789 | 790 | MPI_Barrier(comm_); 791 | 792 | // communicate ghost coordinates with neighbors 793 | 794 | const int x_ndown = X_down.empty() ? 0 : n_; 795 | const int y_ndown = Y_down.empty() ? 0 : n_; 796 | const int x_nup = X_up.empty() ? 0 : n_; 797 | const int y_nup = Y_up.empty() ? 0 : n_; 798 | 799 | MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, up_, SR_X_UP_TAG, 800 | X_down.data(), x_ndown, MPI_WEIGHT_TYPE, down_, SR_X_UP_TAG, 801 | comm_, MPI_STATUS_IGNORE); 802 | MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, down_, SR_X_DOWN_TAG, 803 | X_up.data(), x_nup, MPI_WEIGHT_TYPE, up_, SR_X_DOWN_TAG, 804 | comm_, MPI_STATUS_IGNORE); 805 | MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, up_, SR_Y_UP_TAG, 806 | Y_down.data(), y_ndown, MPI_WEIGHT_TYPE, down_, SR_Y_UP_TAG, 807 | comm_, MPI_STATUS_IGNORE); 808 | MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, down_, SR_Y_DOWN_TAG, 809 | Y_up.data(), y_nup, MPI_WEIGHT_TYPE, up_, SR_Y_DOWN_TAG, 810 | comm_, MPI_STATUS_IGNORE); 811 | 812 | // exchange ghost vertices / cross edges 813 | if (nprocs_ > 1) { 814 | if (up_ != MPI_PROC_NULL) { 815 | 816 | for (GraphElem i = 0; i < n_; i++) { 817 | for (GraphElem j = i + 1; j < n_; j++) { 818 | GraphWeight dx = X[i] - X_up[j]; 819 | GraphWeight dy = Y[i] - Y_up[j]; 820 | GraphWeight ed = sqrt(dx*dx + dy*dy); 821 | 822 | if (ed <= rn_) { 823 | const GraphElem g_i = g->local_to_global(i); 824 | const GraphElem g_j = j + up_*n_; 825 | 826 | if (!unitEdgeWeight) { 827 | sendup_edges.emplace_back(j, g_i, ed); 828 | edgeList.emplace_back(i, g_j, ed); 829 | } 830 | else { 831 | sendup_edges.emplace_back(j, g_i); 832 | edgeList.emplace_back(i, g_j); 833 | } 834 | #if defined(CHECK_NUM_EDGES) 835 | numEdges++; 836 | #endif 837 | g->edge_indices_[i+1]++; 838 | } 839 | } 840 | } 841 | 842 | // send up sizes 843 | send_sizes[0] = sendup_edges.size(); 844 | } 845 | 846 | if (down_ != MPI_PROC_NULL) { 847 | 848 | for (GraphElem i = 0; i < n_; i++) { 849 | for (GraphElem j = i + 1; j < n_; j++) { 850 | GraphWeight dx = X[i] - X_down[j]; 851 | GraphWeight dy = Y[i] - Y_down[j]; 852 
| GraphWeight ed = sqrt(dx*dx + dy*dy); 853 | 854 | if (ed <= rn_) { 855 | const GraphElem g_i = g->local_to_global(i); 856 | const GraphElem g_j = j + down_*n_; 857 | 858 | if (!unitEdgeWeight) { 859 | senddn_edges.emplace_back(j, g_i, ed); 860 | edgeList.emplace_back(i, g_j, ed); 861 | } 862 | else { 863 | senddn_edges.emplace_back(j, g_i); 864 | edgeList.emplace_back(i, g_j); 865 | } 866 | #if defined(CHECK_NUM_EDGES) 867 | numEdges++; 868 | #endif 869 | g->edge_indices_[i+1]++; 870 | } 871 | } 872 | } 873 | 874 | // send down sizes 875 | send_sizes[1] = senddn_edges.size(); 876 | } 877 | } 878 | 879 | MPI_Barrier(comm_); 880 | 881 | // communicate ghost vertices with neighbors 882 | // send/recv buffer sizes 883 | 884 | MPI_Sendrecv(&send_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_UP_TAG, 885 | &recv_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_UP_TAG, 886 | comm_, MPI_STATUS_IGNORE); 887 | MPI_Sendrecv(&send_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_DOWN_TAG, 888 | &recv_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_DOWN_TAG, 889 | comm_, MPI_STATUS_IGNORE); 890 | 891 | // resize recv buffers 892 | 893 | if (recv_sizes[0] > 0) 894 | recvup_edges.resize(recv_sizes[0]); 895 | if (recv_sizes[1] > 0) 896 | recvdn_edges.resize(recv_sizes[1]); 897 | 898 | // send/recv both up and down 899 | 900 | MPI_Sendrecv(sendup_edges.data(), send_sizes[0]*sizeof(struct EdgeTuple), MPI_BYTE, 901 | up_, SR_UP_TAG, recvdn_edges.data(), recv_sizes[1]*sizeof(struct EdgeTuple), 902 | MPI_BYTE, down_, SR_UP_TAG, comm_, MPI_STATUS_IGNORE); 903 | MPI_Sendrecv(senddn_edges.data(), send_sizes[1]*sizeof(struct EdgeTuple), MPI_BYTE, 904 | down_, SR_DOWN_TAG, recvup_edges.data(), recv_sizes[0]*sizeof(struct EdgeTuple), 905 | MPI_BYTE, up_, SR_DOWN_TAG, comm_, MPI_STATUS_IGNORE); 906 | 907 | // update local #edges 908 | 909 | // down 910 | if (down_ != MPI_PROC_NULL) { 911 | for (GraphElem i = 0; i < recv_sizes[1]; i++) { 912 | #if defined(CHECK_NUM_EDGES) 913 | numEdges++; 914 | #endif 915 | if (!unitEdgeWeight) 916 | edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1], recvdn_edges[i].w_); 917 | else 918 | edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1]); 919 | g->edge_indices_[recvdn_edges[i].ij_[0]+1]++; 920 | } 921 | } 922 | 923 | // up 924 | if (up_ != MPI_PROC_NULL) { 925 | for (GraphElem i = 0; i < recv_sizes[0]; i++) { 926 | #if defined(CHECK_NUM_EDGES) 927 | numEdges++; 928 | #endif 929 | if (!unitEdgeWeight) 930 | edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1], recvup_edges[i].w_); 931 | else 932 | edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1]); 933 | g->edge_indices_[recvup_edges[i].ij_[0]+1]++; 934 | } 935 | } 936 | 937 | // add random edges based on 938 | // randomEdgePercent 939 | if (randomEdgePercent > 0.0) { 940 | const GraphElem pnedges = (edgeList.size()/2); 941 | GraphElem tot_pnedges = 0; 942 | 943 | MPI_Allreduce(&pnedges, &tot_pnedges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); 944 | 945 | // extra #edges per process 946 | const GraphElem nrande = (((GraphElem)(randomEdgePercent * (GraphWeight)tot_pnedges))/100); 947 | GraphElem pnrande = 0.0; 948 | 949 | // TODO FIXME try to ensure a fair edge distibution 950 | if (nrande < nprocs_) { 951 | if (rank_ == (nprocs_ - 1)) 952 | pnrande += nrande; 953 | } 954 | else { 955 | pnrande = nrande / nprocs_; 956 | const GraphElem pnrem = nrande % nprocs_; 957 | if (pnrem != 0) { 958 | if (rank_ == (nprocs_ - 1)) 959 | pnrande += pnrem; 960 | } 961 | } 962 | 963 | // add pnrande 
edges 964 | 965 | // send/recv buffers 966 | std::vector> rand_edges(nprocs_); 967 | std::vector sendrand_edges, recvrand_edges; 968 | 969 | // outgoing/incoming send/recv sizes 970 | // TODO FIXME if number of randomly added edges are above 971 | // INT_MAX, weird things will happen, fix it 972 | std::vector sendrand_sizes(nprocs_), recvrand_sizes(nprocs_); 973 | 974 | #if defined(PRINT_EXTRA_NEDGES) 975 | int extraEdges = 0; 976 | #endif 977 | 978 | #if defined(DEBUG_PRINTF) 979 | for (int i = 0; i < nprocs_; i++) { 980 | if (i == rank_) { 981 | std::cout << "[" << i << "]Target process for random edge insertion between " 982 | << lo << " and " << hi << std::endl; 983 | } 984 | MPI_Barrier(comm_); 985 | } 986 | #endif 987 | // make sure each process has a 988 | // different seed this time since 989 | // we want random edges 990 | unsigned rande_seed = (unsigned)(time(0)^getpid()); 991 | GraphWeight weight = 1.0; 992 | std::hash reh; 993 | 994 | // cannot use genRandom if it's already been seeded 995 | std::default_random_engine re(rande_seed); 996 | std::uniform_int_distribution IR, JR; 997 | std::uniform_real_distribution IJW; 998 | 999 | for (GraphElem k = 0; k < pnrande; k++) { 1000 | 1001 | // randomly pick start/end vertex and target from my list 1002 | const GraphElem i = (GraphElem)IR(re, std::uniform_int_distribution::param_type{0, (n_- 1)}); 1003 | const GraphElem g_j = (GraphElem)JR(re, std::uniform_int_distribution::param_type{0, (nv_- 1)}); 1004 | const int target = g->get_owner(g_j); 1005 | const GraphElem j = g->global_to_local(g_j, target); // local 1006 | 1007 | if (i == j) 1008 | continue; 1009 | 1010 | const GraphElem g_i = g->local_to_global(i); 1011 | 1012 | // check for duplicates prior to edgeList insertion 1013 | auto found = std::find_if(edgeList.begin(), edgeList.end(), 1014 | [&](EdgeTuple const& et) 1015 | { return ((et.ij_[0] == i) && (et.ij_[1] == g_j)); }); 1016 | 1017 | // OK to insert, not in list 1018 | if (found == std::end(edgeList)) { 1019 | 1020 | // calculate weight 1021 | if (!unitEdgeWeight) { 1022 | if (target == rank_) { 1023 | GraphWeight dx = X[i] - X[j]; 1024 | GraphWeight dy = Y[i] - Y[j]; 1025 | weight = sqrt(dx*dx + dy*dy); 1026 | } 1027 | else if (target == up_) { 1028 | GraphWeight dx = X[i] - X_up[j]; 1029 | GraphWeight dy = Y[i] - Y_up[j]; 1030 | weight = sqrt(dx*dx + dy*dy); 1031 | } 1032 | else if (target == down_) { 1033 | GraphWeight dx = X[i] - X_down[j]; 1034 | GraphWeight dy = Y[i] - Y_down[j]; 1035 | weight = sqrt(dx*dx + dy*dy); 1036 | } 1037 | else { 1038 | unsigned randw_seed = reh((GraphElem)(g_i*nv_+g_j)); 1039 | std::default_random_engine rew(randw_seed); 1040 | weight = (GraphWeight)IJW(rew, std::uniform_real_distribution::param_type{0.01, 1.0}); 1041 | } 1042 | } 1043 | 1044 | rand_edges[target].emplace_back(j, g_i, weight); 1045 | sendrand_sizes[target]++; 1046 | 1047 | #if defined(PRINT_EXTRA_NEDGES) 1048 | extraEdges++; 1049 | #endif 1050 | #if defined(CHECK_NUM_EDGES) 1051 | numEdges++; 1052 | #endif 1053 | edgeList.emplace_back(i, g_j, weight); 1054 | g->edge_indices_[i+1]++; 1055 | } 1056 | } 1057 | 1058 | #if defined(PRINT_EXTRA_NEDGES) 1059 | int totExtraEdges = 0; 1060 | MPI_Reduce(&extraEdges, &totExtraEdges, 1, MPI_INT, MPI_SUM, 0, comm_); 1061 | if (rank_ == 0) 1062 | std::cout << "Adding extra " << totExtraEdges << " edges while trying to incorporate " 1063 | << randomEdgePercent << "%" << " extra edges globally." 
<< std::endl; 1064 | #endif 1065 | 1066 | MPI_Barrier(comm_); 1067 | 1068 | // communicate ghosts edges 1069 | MPI_Request rande_sreq; 1070 | 1071 | MPI_Ialltoall(sendrand_sizes.data(), 1, MPI_INT, 1072 | recvrand_sizes.data(), 1, MPI_INT, comm_, 1073 | &rande_sreq); 1074 | 1075 | // send data if outgoing size > 0 1076 | for (int p = 0; p < nprocs_; p++) { 1077 | sendrand_edges.insert(sendrand_edges.end(), 1078 | rand_edges[p].begin(), rand_edges[p].end()); 1079 | } 1080 | 1081 | MPI_Wait(&rande_sreq, MPI_STATUS_IGNORE); 1082 | 1083 | // total recvbuffer size 1084 | const int rcount = std::accumulate(recvrand_sizes.begin(), recvrand_sizes.end(), 0); 1085 | recvrand_edges.resize(rcount); 1086 | 1087 | // alltoallv for incoming data 1088 | // TODO FIXME make sure size of extra edges is 1089 | // within INT limits 1090 | 1091 | int rpos = 0, spos = 0; 1092 | std::vector sdispls(nprocs_), rdispls(nprocs_); 1093 | 1094 | for (int p = 0; p < nprocs_; p++) { 1095 | 1096 | sendrand_sizes[p] *= sizeof(struct EdgeTuple); 1097 | recvrand_sizes[p] *= sizeof(struct EdgeTuple); 1098 | 1099 | sdispls[p] = spos; 1100 | rdispls[p] = rpos; 1101 | 1102 | spos += sendrand_sizes[p]; 1103 | rpos += recvrand_sizes[p]; 1104 | } 1105 | 1106 | MPI_Alltoallv(sendrand_edges.data(), sendrand_sizes.data(), sdispls.data(), 1107 | MPI_BYTE, recvrand_edges.data(), recvrand_sizes.data(), rdispls.data(), 1108 | MPI_BYTE, comm_); 1109 | 1110 | // update local edge list 1111 | for (int i = 0; i < rcount; i++) { 1112 | #if defined(CHECK_NUM_EDGES) 1113 | numEdges++; 1114 | #endif 1115 | edgeList.emplace_back(recvrand_edges[i].ij_[0], recvrand_edges[i].ij_[1], recvrand_edges[i].w_); 1116 | g->edge_indices_[recvrand_edges[i].ij_[0]+1]++; 1117 | } 1118 | 1119 | sendrand_edges.clear(); 1120 | recvrand_edges.clear(); 1121 | rand_edges.clear(); 1122 | } // end of (conditional) random edges addition 1123 | 1124 | MPI_Barrier(comm_); 1125 | 1126 | // set graph edge indices 1127 | 1128 | std::vector ecTmp(n_+1); 1129 | std::partial_sum(g->edge_indices_.begin(), g->edge_indices_.end(), ecTmp.begin()); 1130 | g->edge_indices_ = ecTmp; 1131 | 1132 | for(GraphElem i = 1; i < n_+1; i++) 1133 | g->edge_indices_[i] -= g->edge_indices_[0]; 1134 | g->edge_indices_[0] = 0; 1135 | 1136 | g->set_edge_index(0, 0); 1137 | for (GraphElem i = 0; i < n_; i++) 1138 | g->set_edge_index(i+1, g->edge_indices_[i+1]); 1139 | 1140 | const GraphElem nedges = g->edge_indices_[n_] - g->edge_indices_[0]; 1141 | g->set_nedges(nedges); 1142 | 1143 | // set graph edge list 1144 | // sort edge list 1145 | auto ecmp = [] (EdgeTuple const& e0, EdgeTuple const& e1) 1146 | { return ((e0.ij_[0] < e1.ij_[0]) || ((e0.ij_[0] == e1.ij_[0]) && (e0.ij_[1] < e1.ij_[1]))); }; 1147 | 1148 | if (!std::is_sorted(edgeList.begin(), edgeList.end(), ecmp)) { 1149 | #if defined(DEBUG_PRINTF) 1150 | std::cout << "Edge list is not sorted." << std::endl; 1151 | #endif 1152 | std::sort(edgeList.begin(), edgeList.end(), ecmp); 1153 | } 1154 | #if defined(DEBUG_PRINTF) 1155 | else 1156 | std::cout << "Edge list is sorted!" 
<< std::endl; 1157 | #endif 1158 | 1159 | GraphElem ePos = 0; 1160 | for (GraphElem i = 0; i < n_; i++) { 1161 | GraphElem e0, e1; 1162 | 1163 | g->edge_range(i, e0, e1); 1164 | #if defined(DEBUG_PRINTF) 1165 | if ((i % 100000) == 0) 1166 | std::cout << "Processing edges for vertex: " << i << ", range(" << e0 << ", " << e1 << 1167 | ")" << std::endl; 1168 | #endif 1169 | for (GraphElem j = e0; j < e1; j++) { 1170 | Edge &edge = g->set_edge(j); 1171 | 1172 | assert(ePos == j); 1173 | assert(i == edgeList[ePos].ij_[0]); 1174 | 1175 | edge.tail_ = edgeList[ePos].ij_[1]; 1176 | edge.weight_ = edgeList[ePos].w_; 1177 | 1178 | ePos++; 1179 | } 1180 | } 1181 | 1182 | #if defined(CHECK_NUM_EDGES) 1183 | GraphElem tot_numEdges = 0; 1184 | MPI_Allreduce(&numEdges, &tot_numEdges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_); 1185 | const GraphElem tne = g->get_ne(); 1186 | assert(tne == tot_numEdges); 1187 | #endif 1188 | edgeList.clear(); 1189 | 1190 | X.clear(); 1191 | Y.clear(); 1192 | X_up.clear(); 1193 | Y_up.clear(); 1194 | X_down.clear(); 1195 | Y_down.clear(); 1196 | 1197 | sendup_edges.clear(); 1198 | senddn_edges.clear(); 1199 | recvup_edges.clear(); 1200 | recvdn_edges.clear(); 1201 | 1202 | return g; 1203 | } 1204 | 1205 | GraphWeight get_d() const { return rn_; } 1206 | GraphElem get_nv() const { return nv_; } 1207 | 1208 | private: 1209 | GraphElem nv_, n_; 1210 | GraphWeight rn_; 1211 | MPI_Comm comm_; 1212 | int nprocs_, rank_, up_, down_; 1213 | }; 1214 | 1215 | #endif 1216 | -------------------------------------------------------------------------------- /main.cpp: -------------------------------------------------------------------------------- 1 | // *********************************************************************** 2 | // 3 | // miniVite 4 | // 5 | // *********************************************************************** 6 | // 7 | // Copyright (2018) Battelle Memorial Institute 8 | // All rights reserved. 9 | // 10 | // Redistribution and use in source and binary forms, with or without 11 | // modification, are permitted provided that the following conditions 12 | // are met: 13 | // 14 | // 1. Redistributions of source code must retain the above copyright 15 | // notice, this list of conditions and the following disclaimer. 16 | // 17 | // 2. Redistributions in binary form must reproduce the above copyright 18 | // notice, this list of conditions and the following disclaimer in the 19 | // documentation and/or other materials provided with the distribution. 20 | // 21 | // 3. Neither the name of the copyright holder nor the names of its 22 | // contributors may be used to endorse or promote products derived from 23 | // this software without specific prior written permission. 24 | // 25 | // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 26 | // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 27 | // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 28 | // FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE 29 | // COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 30 | // INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 31 | // BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 32 | // LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 33 | // CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 34 | // LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 35 | // ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 36 | // POSSIBILITY OF SUCH DAMAGE. 37 | // 38 | // ************************************************************************ 39 | 40 | 41 | #include 42 | #include 43 | #include 44 | 45 | #include 46 | #include 47 | 48 | #include 49 | #include 50 | #include 51 | #include 52 | 53 | #include 54 | #include 55 | 56 | #include "dspl.hpp" 57 | 58 | // TODO FIXME add options for desired MPI thread-level 59 | 60 | static std::string inputFileName; 61 | static int me, nprocs; 62 | static int ranksPerNode = 1; 63 | static GraphElem nvRGG = 0; 64 | static bool generateGraph = false; 65 | static bool readBalanced = false; 66 | static bool showGraph = false; 67 | static GraphWeight randomEdgePercent = 0.0; 68 | static bool randomNumberLCG = false; 69 | static bool isUnitEdgeWeight = true; 70 | static GraphWeight threshold = 1.0E-6; 71 | 72 | // parse command line parameters 73 | static void parseCommandLine(const int argc, char * const argv[]); 74 | 75 | int main(int argc, char *argv[]) 76 | { 77 | double t0, t1, t2, t3, ti = 0.0; 78 | #ifdef DISABLE_THREAD_MULTIPLE_CHECK 79 | MPI_Init(&argc, &argv); 80 | #else 81 | int max_threads; 82 | 83 | max_threads = omp_get_max_threads(); 84 | 85 | if (max_threads > 1) { 86 | int provided; 87 | MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided); 88 | if (provided < MPI_THREAD_MULTIPLE) { 89 | std::cerr << "MPI library does not support MPI_THREAD_MULTIPLE." 
<< std::endl; 90 | MPI_Abort(MPI_COMM_WORLD, -99); 91 | } 92 | } else { 93 | MPI_Init(&argc, &argv); 94 | } 95 | #endif 96 | 97 | MPI_Comm_size(MPI_COMM_WORLD, &nprocs); 98 | MPI_Comm_rank(MPI_COMM_WORLD, &me); 99 | 100 | parseCommandLine(argc, argv); 101 | 102 | createCommunityMPIType(); 103 | double td0, td1, td, tdt; 104 | 105 | MPI_Barrier(MPI_COMM_WORLD); 106 | td0 = MPI_Wtime(); 107 | 108 | Graph* g = nullptr; 109 | 110 | // generate graph only supports RGG as of now 111 | if (generateGraph) { 112 | GenerateRGG gr(nvRGG); 113 | g = gr.generate(randomNumberLCG, isUnitEdgeWeight, randomEdgePercent); 114 | } 115 | else { // read input graph 116 | BinaryEdgeList rm; 117 | if (readBalanced == true) 118 | g = rm.read_balanced(me, nprocs, ranksPerNode, inputFileName); 119 | else 120 | g = rm.read(me, nprocs, ranksPerNode, inputFileName); 121 | } 122 | 123 | assert(g != nullptr); 124 | if (showGraph) 125 | g->print(); 126 | 127 | #ifdef PRINT_DIST_STATS 128 | g->print_dist_stats(); 129 | #endif 130 | 131 | MPI_Barrier(MPI_COMM_WORLD); 132 | #ifdef DEBUG_PRINTF 133 | assert(g); 134 | #endif 135 | td1 = MPI_Wtime(); 136 | td = td1 - td0; 137 | 138 | MPI_Reduce(&td, &tdt, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); 139 | 140 | if (me == 0) { 141 | if (!generateGraph) 142 | std::cout << "Time to read input file and create distributed graph (in s): " 143 | << (tdt/nprocs) << std::endl; 144 | else 145 | std::cout << "Time to generate distributed graph of " 146 | << nvRGG << " vertices (in s): " << (tdt/nprocs) << std::endl; 147 | } 148 | 149 | GraphWeight currMod = -1.0; 150 | GraphWeight prevMod = -1.0; 151 | double total = 0.0; 152 | 153 | std::vector ssizes, rsizes, svdata, rvdata; 154 | #if defined(USE_MPI_RMA) 155 | MPI_Win commwin; 156 | #endif 157 | size_t ssz = 0, rsz = 0; 158 | int iters = 0; 159 | 160 | MPI_Barrier(MPI_COMM_WORLD); 161 | 162 | t1 = MPI_Wtime(); 163 | 164 | #if defined(USE_MPI_RMA) 165 | currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, 166 | svdata, rvdata, currMod, threshold, iters, commwin); 167 | #else 168 | currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, 169 | svdata, rvdata, currMod, threshold, iters); 170 | #endif 171 | MPI_Barrier(MPI_COMM_WORLD); 172 | t0 = MPI_Wtime(); 173 | total = t0 - t1; 174 | 175 | double tot_time = 0.0; 176 | MPI_Reduce(&total, &tot_time, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); 177 | 178 | if (me == 0) { 179 | double avgt = (tot_time / nprocs); 180 | if (!generateGraph) { 181 | std::cout << "-------------------------------------------------------" << std::endl; 182 | std::cout << "File: " << inputFileName << std::endl; 183 | std::cout << "-------------------------------------------------------" << std::endl; 184 | } 185 | std::cout << "-------------------------------------------------------" << std::endl; 186 | #ifdef USE_32_BIT_GRAPH 187 | std::cout << "32-bit datatype" << std::endl; 188 | #else 189 | std::cout << "64-bit datatype" << std::endl; 190 | #endif 191 | std::cout << "-------------------------------------------------------" << std::endl; 192 | std::cout << "Average total time (in s), #Processes: " << avgt << ", " << nprocs << std::endl; 193 | std::cout << "Modularity, #Iterations: " << currMod << ", " << iters << std::endl; 194 | std::cout << "MODS (final modularity * average time): " << (currMod * avgt) << std::endl; 195 | std::cout << "-------------------------------------------------------" << std::endl; 196 | } 197 | 198 | MPI_Barrier(MPI_COMM_WORLD); 199 | 200 | delete g; 201 | 
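// Free the committed CommInfo MPI datatype before MPI_Finalize.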
  destroyCommunityMPIType();

  MPI_Finalize();

  return 0;
} // main

void parseCommandLine(const int argc, char * const argv[])
{
  int ret;

  // options: -f <file> (binary input graph), -b (balanced read), -r <n> (ranks per node),
  // -t <threshold> (convergence threshold), -n <vertices> (generate an in-memory RGG),
  // -w (non-unit edge weights for the generated graph), -l (use LCG for random numbers),
  // -p <percent> (extra random edges for the generated graph), -s (print the graph)
  while ((ret = getopt(argc, argv, "f:br:t:n:wlp:s")) != -1) {
    switch (ret) {
    case 'f':
      inputFileName.assign(optarg);
      break;
    case 'b':
      readBalanced = true;
      break;
    case 'r':
      ranksPerNode = atoi(optarg);
      break;
    case 't':
      threshold = atof(optarg);
      break;
    case 'n':
      nvRGG = atol(optarg);
      if (nvRGG > 0)
        generateGraph = true;
      break;
    case 'w':
      isUnitEdgeWeight = false;
      break;
    case 'l':
      randomNumberLCG = true;
      break;
    case 'p':
      randomEdgePercent = atof(optarg);
      break;
    case 's':
      showGraph = true;
      break;
    default:
      assert(0 && "Option not recognized!!!");
      break;
    }
  }

  if (me == 0 && (argc == 1)) {
    std::cerr << "Must specify some options." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && inputFileName.empty()) {
    std::cerr << "Must specify a binary file name with -f or provide parameters for generating a graph." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && randomNumberLCG) {
    std::cerr << "Must specify -n for graph generation before -l (LCG random numbers) can be used." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && (randomEdgePercent > 0.0)) {
    std::cerr << "Must specify -n for graph generation first to add random edges to it." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && !generateGraph && !isUnitEdgeWeight) {
    std::cerr << "Must specify -n for graph generation first before setting edge weights." << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }

  if (me == 0 && generateGraph && ((randomEdgePercent < 0) || (randomEdgePercent >= 100))) {
    std::cerr << "Invalid random edge percentage for generated graph!" << std::endl;
    MPI_Abort(MPI_COMM_WORLD, -99);
  }
} // parseCommandLine
--------------------------------------------------------------------------------
/utils.hpp:
--------------------------------------------------------------------------------
// ***********************************************************************
//
//                            miniVite
//
// ***********************************************************************
//
//       Copyright (2018) Battelle Memorial Institute
//                      All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the copyright holder nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
// POSSIBILITY OF SUCH DAMAGE.
//
// ************************************************************************

#pragma once
#ifndef UTILS_HPP
#define UTILS_HPP

#define PI (3.14159)

#ifndef MAX_PRINT_NEDGE
#define MAX_PRINT_NEDGE (10000000)
#endif

// Read https://en.wikipedia.org/wiki/Linear_congruential_generator#Period_length
// about choice of LCG parameters
// From numerical recipes
// TODO FIXME investigate larger periods
#define MLCG (2147483647) // 2^31 - 1
#define ALCG (16807)      // 7^5
#define BLCG (0)

#define SR_UP_TAG 100
#define SR_DOWN_TAG 101
#define SR_SIZES_UP_TAG 102
#define SR_SIZES_DOWN_TAG 103
#define SR_X_UP_TAG 104
#define SR_X_DOWN_TAG 105
#define SR_Y_UP_TAG 106
#define SR_Y_DOWN_TAG 107
#define SR_LCG_TAG 108

#include <cstdint>
#include <cstring>
#include <cmath>
#include <iostream>
#include <vector>
#include <random>
#include <type_traits>

#include <mpi.h>

#ifdef USE_32_BIT_GRAPH
using GraphElem = int32_t;
using GraphWeight = float;
const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT32_T;
const MPI_Datatype MPI_WEIGHT_TYPE = MPI_FLOAT;
#else
using GraphElem = int64_t;
using GraphWeight = double;
const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT64_T;
const MPI_Datatype MPI_WEIGHT_TYPE = MPI_DOUBLE;
#endif

extern unsigned seed;

// Is nprocs a power-of-2?
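// e.g. 8 (0b1000) & 7 (0b0111) == 0, whereas 6 (0b0110) & 5 (0b0101) == 0b0100 != 0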
int is_pwr2(int nprocs)
{ return ((nprocs != 0) && !(nprocs & (nprocs - 1))); }

// return uint32_t seed
GraphElem reseeder(unsigned initseed)
{
    std::seed_seq seq({initseed});
    std::vector<std::uint32_t> seeds(1);
    seq.generate(seeds.begin(), seeds.end());

    return (GraphElem)seeds[0];
}

// Local random number generator
template<typename T, typename G = std::default_random_engine>
T genRandom(T lo, T hi)
{
    thread_local static G gen(seed);
    using Dist = typename std::conditional
        <
        std::is_integral<T>::value
        , std::uniform_int_distribution<T>
        , std::uniform_real_distribution<T>
        >::type;

    thread_local static Dist utd {};
    return utd(gen, typename Dist::param_type{lo, hi});
}

// Parallel Linear Congruential Generator
// x[i] = (a*x[i-1] + b)%M
class LCG
{
public:
    LCG(unsigned seed, GraphWeight* drand,
        GraphElem n, MPI_Comm comm = MPI_COMM_WORLD):
        seed_(seed), drand_(drand), n_(n)
    {
        comm_ = comm;
        MPI_Comm_size(comm_, &nprocs_);
        MPI_Comm_rank(comm_, &rank_);

        // allocate long random numbers
        rnums_.resize(n_);

        // init x0
        if (rank_ == 0)
            x0_ = reseeder(seed_);

        // step #1: bcast x0 from root
        MPI_Bcast(&x0_, 1, MPI_GRAPH_TYPE, 0, comm_);

        // step #2: parallel prefix to generate first random value per process
        parallel_prefix_op();
    }

    ~LCG() { rnums_.clear(); }

    // matrix-matrix multiplication for 2x2 matrices
    void matmat_2x2(GraphElem c[], GraphElem a[], GraphElem b[])
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                GraphElem sum = 0;
                for (int k = 0; k < 2; k++) {
                    sum += a[i*2+k]*b[k*2+j];
                }
                c[i*2+j] = sum;
            }
        }
    }

    // x *= y
    void matop_2x2(GraphElem x[], GraphElem y[])
    {
        GraphElem tmp[4];
        matmat_2x2(tmp, x, y);
        memcpy(x, tmp, sizeof(GraphElem[4]));
    }

    // find kth power of a 2x2 matrix
    void mat_power(GraphElem mat[], GraphElem k)
    {
        GraphElem tmp[4];
        memcpy(tmp, mat, sizeof(GraphElem[4]));

        // mat-mat multiply k times
        for (GraphElem p = 0; p < k-1; p++)
            matop_2x2(mat, tmp);
    }

    // parallel prefix for matrix-matrix operation
    // `x0 is the very first random number in the series
    // `ab is a 2-length array which stores a and b
    // `n_ is (n/p)
    // `rnums is n_ length array which stores the random nums for a process
    void parallel_prefix_op()
    {
        GraphElem global_op[4];
        global_op[0] = ALCG;
        global_op[1] = 0;
        global_op[2] = BLCG;
        global_op[3] = 1;

        mat_power(global_op, n_);           // M^(n/p)
        GraphElem prefix_op[4] = {1,0,0,1}; // I in row-major

        GraphElem global_op_recv[4];

        int steps = (int)(log2((double)nprocs_));

        for (int s = 0; s < steps; s++) {

            int mate = rank_^(1 << s); // toggle the sth LSB to find my neighbor

            // send/recv global to/from mate
            MPI_Sendrecv(global_op, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG,
                         global_op_recv, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG,
                         comm_, MPI_STATUS_IGNORE);

            matop_2x2(global_op, global_op_recv);

            if (mate < rank_)
                matop_2x2(prefix_op, global_op_recv);

            MPI_Barrier(comm_);
        }

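        // At this point prefix_op holds the product of the received M^(n/p)
        // factors from all lower-ranked processes, i.e. M^(rank_*n_) when
        // nprocs_ is a power of two and every rank owns n_ numbers, so the
        // first local random number can be seeded directly from x0_ below.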
        // populate the first random number entry for each process
        // (x0*a + b)%P
        if (rank_ == 0)
            rnums_[0] = x0_;
        else
            rnums_[0] = (x0_*prefix_op[0] + prefix_op[2])%MLCG;
    }

    // generate random number based on the first
    // random number on a process
    // TODO check the 'quick'n dirty generators to
    // see if we can avoid the mod
    void generate()
    {
#if defined(PRINT_LCG_LONG_RANDOM_NUMBERS)
        for (int k = 0; k < nprocs_; k++) {
            if (k == rank_) {
                std::cout << "------------" << std::endl;
                std::cout << "Process#" << rank_ << " :" << std::endl;
                std::cout << "------------" << std::endl;
                std::cout << rnums_[0] << std::endl;
                for (GraphElem i = 1; i < n_; i++) {
                    rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
                    std::cout << rnums_[i] << std::endl;
                }
            }
            MPI_Barrier(comm_);
        }
#else
        for (GraphElem i = 1; i < n_; i++) {
            rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
        }
#endif
        GraphWeight mult = 1.0 / (GraphWeight)(1.0 + (GraphWeight)(MLCG-1));

#if defined(PRINT_LCG_DOUBLE_RANDOM_NUMBERS)
        for (int k = 0; k < nprocs_; k++) {
            if (k == rank_) {
                std::cout << "------------" << std::endl;
                std::cout << "Process#" << rank_ << " :" << std::endl;
                std::cout << "------------" << std::endl;

                for (GraphElem i = 0; i < n_; i++) {
                    drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
                    std::cout << drand_[i] << std::endl;
                }
            }
            MPI_Barrier(comm_);
        }
#else
        for (GraphElem i = 0; i < n_; i++)
            drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
#endif
    }

    // copy from drand_[idx_start] to new_drand,
    // rescale the random numbers between lo and hi
    void rescale(GraphWeight* new_drand, GraphElem idx_start, GraphWeight const& lo)
    {
        GraphWeight range = (1.0 / (GraphWeight)nprocs_);

#if defined(PRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS)
        for (int k = 0; k < nprocs_; k++) {
            if (k == rank_) {
                std::cout << "------------" << std::endl;
                std::cout << "Process#" << rank_ << " :" << std::endl;
                std::cout << "------------" << std::endl;

                for (GraphElem i = idx_start, j = 0; i < n_; i++, j++) {
                    new_drand[j] = lo + (GraphWeight)(range * drand_[i]);
                    std::cout << new_drand[j] << std::endl;
                }
            }
            MPI_Barrier(comm_);
        }
#else
        for (GraphElem i = idx_start, j = 0; i < n_; i++, j++)
            new_drand[j] = lo + (GraphWeight)(range * drand_[i]); // lo-hi
#endif
    }

private:
    MPI_Comm comm_;
    int nprocs_, rank_;
    unsigned seed_;
    GraphElem n_, x0_;
    GraphWeight* drand_;
    std::vector<GraphElem> rnums_;
};

// locks (when OpenMP locks are not used: either a spinlock or a std::mutex)
#ifdef USE_OPENMP_LOCK
#else
#ifdef USE_SPINLOCK
#include <atomic>
std::atomic_flag lkd_ = ATOMIC_FLAG_INIT;
#else
#include <mutex>
std::mutex mtx_;
#endif
void lock() {
#ifdef USE_SPINLOCK
    while (lkd_.test_and_set(std::memory_order_acquire)) { ; }
#else
    mtx_.lock();
#endif
}
void unlock() {
#ifdef USE_SPINLOCK
    lkd_.clear(std::memory_order_release);
#else
    mtx_.unlock();
#endif
}
#endif

#endif // UTILS_HPP
--------------------------------------------------------------------------------
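
A note on LCG::parallel_prefix_op() in utils.hpp above: each process jumps
directly to its own slice of a single LCG stream by raising the 2x2 step
matrix [[ALCG, 0], [BLCG, 1]] (stored row-major) to the appropriate power,
instead of generating all preceding values. The standalone sketch below is
illustrative only (it is not part of miniVite; the starting value and jump
length are arbitrary). It checks the identity this relies on: advancing
x[i+1] = (ALCG*x[i] + BLCG) % MLCG by k steps agrees with multiplying the
row vector [x0, 1] once by the k-th power of the step matrix, which is how
rnums_[0] is seeded from x0_ and prefix_op.

#include <cstdint>
#include <iostream>

int main()
{
    const std::int64_t M  = 2147483647;  // MLCG, 2^31 - 1
    const std::int64_t a  = 16807;       // ALCG, 7^5
    const std::int64_t b  = 0;           // BLCG
    const std::int64_t x0 = 123456789;   // arbitrary starting value for the check
    const std::int64_t k  = 1000;        // arbitrary jump length

    // 1. step the recurrence k times
    std::int64_t x = x0;
    for (std::int64_t i = 0; i < k; i++)
        x = (a*x + b) % M;

    // 2. raise the step matrix [[a,0],[b,1]] (row-major) to the k-th power,
    //    reducing mod M at each step so 64-bit intermediates cannot overflow
    const std::int64_t mat[4] = {a, 0, b, 1};
    std::int64_t acc[4] = {1, 0, 0, 1};  // identity
    for (std::int64_t i = 0; i < k; i++) {
        std::int64_t tmp[4];
        for (int r = 0; r < 2; r++)
            for (int c = 0; c < 2; c++)
                tmp[r*2+c] = (acc[r*2+0]*mat[0*2+c] % M + acc[r*2+1]*mat[1*2+c] % M) % M;
        for (int j = 0; j < 4; j++)
            acc[j] = tmp[j];
    }

    // apply the combined step once: first component of [x0, 1] * M^k
    const std::int64_t xk = (x0*acc[0] % M + acc[2]) % M;

    std::cout << "step-by-step: " << x << ", jump-ahead: " << xk << std::endl;
    return (x == xk) ? 0 : 1;  // the two values should agree
}

mat_power() performs the same repeated 2x2 multiplication, and the hypercube
exchange in parallel_prefix_op() combines the per-rank M^(n/p) factors so that
rank r obtains M^(r*n/p) in log2(p) communication steps, letting every rank
start its portion of the sequence without generating anyone else's values.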