├── LICENSE ├── Makefile ├── PanguLU_Users_Guide.pdf ├── PanguLU_Users_s_Guide.pdf ├── README.md ├── build_helper.py ├── build_list.csv ├── examples ├── Makefile ├── Trefethen_20b.mtx ├── example.c ├── mmio.h ├── mmio_highlevel.h └── run.sh ├── include ├── pangulu.h └── pangulu_interface_common.h ├── lib └── Makefile ├── make.inc └── src ├── Makefile ├── languages ├── pangulu_en.h └── pangulu_en_us.h ├── pangulu.c ├── pangulu_addmatrix.c ├── pangulu_addmatrix_cuda.c ├── pangulu_check.c ├── pangulu_common.h ├── pangulu_cuda_interface.c ├── pangulu_destroy.c ├── pangulu_gessm_fp64.c ├── pangulu_gessm_fp64_cuda.c ├── pangulu_getrf_fp64.c ├── pangulu_getrf_fp64_cuda.c ├── pangulu_heap.c ├── pangulu_kernel_interface.c ├── pangulu_malloc.c ├── pangulu_mpi.c ├── pangulu_numeric.c ├── pangulu_preprocessing.c ├── pangulu_reorder.c ├── pangulu_spmv_fp64.c ├── pangulu_sptrsv.c ├── pangulu_sptrsv_fp64.c ├── pangulu_ssssm_fp64.c ├── pangulu_ssssm_fp64_cuda.c ├── pangulu_symbolic.c ├── pangulu_thread.c ├── pangulu_time.c ├── pangulu_tstrf_fp64.c ├── pangulu_tstrf_fp64_cuda.c ├── pangulu_utils.c └── platforms ├── 02_GPU └── 01_CUDA │ └── 000_CUDA │ ├── Makefile │ ├── pangulu_cuda.cu │ └── pangulu_cuda.h └── platform_list.csv /Makefile: -------------------------------------------------------------------------------- 1 | all : examples 2 | 3 | .PHONY : examples lib src clean update 4 | 5 | examples : lib 6 | $(MAKE) -C $@ 7 | 8 | lib : src 9 | $(MAKE) -C $@ 10 | 11 | src: 12 | $(MAKE) -C $@ 13 | 14 | clean: 15 | (cd src; $(MAKE) clean) 16 | (cd lib; $(MAKE) clean) 17 | (cd examples; $(MAKE) clean) 18 | 19 | update : clean all -------------------------------------------------------------------------------- /PanguLU_Users_Guide.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SuperScientificSoftwareLaboratory/PanguLU/d11577cf0f5f1dae5ca02fd1d3128982e215ad60/PanguLU_Users_Guide.pdf -------------------------------------------------------------------------------- /PanguLU_Users_s_Guide.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SuperScientificSoftwareLaboratory/PanguLU/d11577cf0f5f1dae5ca02fd1d3128982e215ad60/PanguLU_Users_s_Guide.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PanguLU 2 | 3 | ------------------- 4 | 5 | ## Introduction 6 | 7 | PanguLU is an open source software package for solving a linear system *Ax = b* on heterogeneous distributed platforms. The library is written in C, and exploits parallelism from MPI, OpenMP and CUDA. The sparse LU factorisation algorithm used in PanguLU splits the sparse matrix into multiple equally-sized sparse matrix blocks and computes them by using sparse BLAS. The latest version of PanguLU uses a synchronisation-free communication strategy to reduce the overall latency overhead, and a variety of block-wise sparse BLAS methods have been adaptively called to improve efficiency on CPUs and GPUs. Currently, PanguLU supports both single and double precision, both real and complex values. In addition, our team at the SSSLab is constantly optimising and updating PanguLU. 
8 |
9 | ## Structure of code
10 |
11 | ```
12 | PanguLU/README      instructions on installation
13 | PanguLU/src         C and CUDA source code, to be compiled into libpangulu.a and libpangulu.so
14 | PanguLU/examples    example code
15 | PanguLU/include     header files for libpangulu.a and libpangulu.so
16 | PanguLU/lib         contains the library archive libpangulu.a and the shared library libpangulu.so
17 | PanguLU/Makefile    top-level Makefile that does installation and testing
18 | PanguLU/make.inc    compiler and compiler flags included in all Makefiles (except examples/Makefile)
19 | ```
20 |
21 | ## Installation
22 | #### Step 1 : Make sure `make` is available.
23 | `make` is an automatic build tool required to build PanguLU. It is available on most GNU/Linux distributions and can be installed with package managers such as `apt` or `yum`.
24 |
25 | #### Step 2 : Make sure an MPI library is available.
26 | PanguLU requires an MPI library, installed together with its header files. Tested MPI libraries : OpenMPI 4.1.2, Intel MPI 2021.12.
27 |
28 | #### Step 3 : Make sure CUDA is available. (optional, required if the GPU is used)
29 | If GPUs are used, CUDA is required. Tested version : CUDA 12.2.
30 |
31 | #### Step 4 : Make sure a BLAS library is available. (optional, required if the GPU is not used)
32 | A BLAS library is required if the CPU takes part in the algebra computations of numeric factorisation. Tested version : OpenBLAS 0.3.26.
33 |
34 | #### Step 5 : Make sure METIS is available. (optional but recommended)
35 | The GitHub page of the METIS library is : https://github.com/KarypisLab/METIS
36 |
37 | #### Step 6 : Edit `make.inc`.
38 | Search for `/path/to` in `make.inc` and replace each occurrence with the actual path on your computer.
39 |
40 | #### Step 7 : Edit `examples/Makefile`.
41 | The Makefile of the example code doesn't include `make.inc`. Search for `/path/to` in `examples/Makefile` and replace each occurrence with the actual path on your computer.
42 |
43 | #### Step 8 : Decide if you want to use the GPU.
44 | If you want to use the GPU, you should :
45 | - Append `GPU_CUDA` to build_list.csv;
46 | - Add `-DGPU_OPEN` to `PANGULU_FLAGS` (you can find `PANGULU_FLAGS` in `make.inc`);
47 | - Uncomment `LINK_CUDA` in `examples/Makefile`.
48 |
49 | If you don't want to use the GPU, undo these changes.
50 |
51 | #### Step 9 : Run `make -j` in your terminal.
52 | Make sure the working directory of your terminal is the root directory of PanguLU. If PanguLU was built successfully, you will find `libpangulu.a` and `libpangulu.so` in the `lib` directory, and `pangulu_example.elf` in the `examples` directory.
53 |
54 | ## Build flags
55 | `PANGULU_FLAGS` controls build behaviour. You can edit `PANGULU_FLAGS` in `make.inc` to enable different features of PanguLU. The available flags are :
56 |
57 | #### Decide whether to use the GPU.
58 | Use `-DGPU_OPEN` to use the GPU; omit it otherwise. Please note that using this flag is not the only step needed to use the GPU; please check Step 8 in the Installation part.
59 |
60 | #### Decide the value type of matrix and vector entries.
61 | Use `-DCALCULATE_TYPE_R64` (double real), `-DCALCULATE_TYPE_CR64` (double complex), `-DCALCULATE_TYPE_R32` (float real) or `-DCALCULATE_TYPE_CR32` (float complex).
62 |
63 | #### Decide whether to use the MC64 reordering algorithm.
64 | Use `-DPANGULU_MC64` to enable the MC64 algorithm. Please note that MC64 is not supported when matrix entries are complex numbers. If complex values are selected and the `-DPANGULU_MC64` flag is used, MC64 is not enabled.
65 |
66 | #### Decide whether to use the METIS reordering tool.
67 | Use `-DMETIS` to enable METIS.
68 |
69 | #### Decide the log level.
70 | Please select zero or one of these flags : `-DPANGULU_LOG_INFO`, `-DPANGULU_LOG_WARNING` or `-DPANGULU_LOG_ERROR`. Log level "INFO" prints all messages to standard output (including warnings and errors). Log level "WARNING" only prints warnings and errors. Log level "ERROR" only prints fatal errors that cause PanguLU to terminate abnormally.
71 |
72 | #### Decide the core binding strategy.
73 | Hyper-threading is not recommended. If you can't turn off hyper-threading and each core of your CPU has 2 threads, using `-DHT_IS_OPEN` may reap a performance gain.
74 |
75 |
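For example, the default `PANGULU_FLAGS` shipped in `make.inc` selects a CPU-only build with double-precision real values, METIS and MC64 enabled. A variant that also enables the GPU and only logs warnings and errors might look like this (illustrative only; remember to combine `-DGPU_OPEN` with the Step 8 changes from the Installation part):

```
PANGULU_FLAGS = -DPANGULU_LOG_WARNING -DCALCULATE_TYPE_R64 -DMETIS -DPANGULU_MC64 -DGPU_OPEN
```
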
76 | ## Function interfaces
77 | To make it easier to call PanguLU from your software, PanguLU provides the following function interfaces:
78 |
79 | #### 1. pangulu_init()
80 | ```
81 | void pangulu_init(
82 |     int pangulu_n, // Specifies the number of rows of the CSR format matrix.
83 |     long long pangulu_nnz, // Specifies the total number of non-zero elements of the CSR format matrix.
84 |     long *csr_rowptr, // Points to the row pointer array of the CSR format matrix.
85 |     int *csr_colidx, // Points to the column index array of the CSR format matrix.
86 |     pangulu_calculate_type *csr_value, // Points to the value array of the CSR format matrix.
87 |     pangulu_init_options *init_options, // Pointer to a pangulu_init_options structure.
88 |     void **pangulu_handle // On return, contains a handle pointer to the library's internal state.
89 | );
90 | ```
91 |
92 | #### 2. pangulu_gstrf()
93 | ```
94 | void pangulu_gstrf(
95 |     pangulu_gstrf_options *gstrf_options, // Pointer to a pangulu_gstrf_options structure.
96 |     void **pangulu_handle // Pointer to the solver handle returned on initialization.
97 | );
98 | ```
99 |
100 | #### 3. pangulu_gstrs()
101 | ```
102 | void pangulu_gstrs(
103 |     pangulu_calculate_type *rhs, // Pointer to the right-hand side vector.
104 |     pangulu_gstrs_options *gstrs_options, // Pointer to a pangulu_gstrs_options structure.
105 |     void **pangulu_handle // Pointer to the library internal state handle returned on initialization.
106 | );
107 | ```
108 |
109 | #### 4. pangulu_gssv()
110 | ```
111 | void pangulu_gssv(
112 |     pangulu_calculate_type *rhs, // Pointer to the right-hand side vector.
113 |     pangulu_gstrf_options *gstrf_options, // Pointer to a pangulu_gstrf_options structure.
114 |     pangulu_gstrs_options *gstrs_options, // Pointer to a pangulu_gstrs_options structure.
115 |     void **pangulu_handle // Pointer to the library internal state handle returned on initialization.
116 | );
117 | ```
118 |
119 | #### 5. pangulu_finalize()
120 | ```
121 | void pangulu_finalize(
122 |     void **pangulu_handle // Pointer to the library internal state handle returned on initialization.
123 | );
124 | ```
125 |
126 | `example.c` is a sample program that calls PanguLU. You can refer to this file to complete your own calls to PanguLU. First create the distributed matrix using `pangulu_init()`. If you need to solve multiple right-hand side vectors while the matrix is unchanged, you can call `pangulu_gstrs()` multiple times after calling `pangulu_gstrf()`. If you need to factorize a number of different matrices, call `pangulu_finalize()` after completing the solution of one matrix, and then use `pangulu_init()` to initialize the next matrix.
127 |
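The sketch below illustrates that calling pattern for two right-hand sides. It is a minimal outline only: MPI setup, error handling and the CSR arrays (`n`, `nnz`, `rowptr`, `colidx`, `value`) as well as the vectors `rhs1`/`rhs2` are assumed to be prepared beforehand, as done in `example.c`.

```
// Minimal calling sketch (assumes MPI is initialized and the CSR matrix
// n/nnz/rowptr/colidx/value is filled on rank 0, as in example.c).
pangulu_init_options init_options;
init_options.nb = 4;       // block size
init_options.nthread = 20; // preprocessing thread count
void *pangulu_handle;
pangulu_init(n, nnz, rowptr, colidx, value, &init_options, &pangulu_handle);

pangulu_gstrf_options gstrf_options;
pangulu_gstrf(&gstrf_options, &pangulu_handle); // factorize A = LU once

pangulu_gstrs_options gstrs_options;
pangulu_gstrs(rhs1, &gstrs_options, &pangulu_handle); // solve for rhs1
pangulu_gstrs(rhs2, &gstrs_options, &pangulu_handle); // reuse the factors for rhs2

pangulu_finalize(&pangulu_handle); // release internal state
```
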
128 | ## Executing the example code of PanguLU
129 | The test routines are placed in the `examples` directory. The routine in `examples/example.c` first calls `pangulu_gstrf()` to perform the LU factorisation, and then calls `pangulu_gstrs()` to solve the linear system.
130 | #### run command
131 |
132 | > **mpirun -np process_count ./pangulu_example.elf -nb block_size -f path_to_mtx**
133 |
134 | process_count : number of MPI processes used to launch PanguLU;
135 |
136 | block_size : order of each non-zero block;
137 |
138 | path_to_mtx : path to the matrix file in mtx format.
139 |
140 | You can also use run.sh, for example:
141 |
142 | > **bash run.sh path_to_mtx block_size process_count**
143 |
144 | #### test sample
145 |
146 | > **mpirun -np 6 ./pangulu_example.elf -nb 4 -f Trefethen_20b.mtx**
147 |
148 | or use run.sh:
149 | > **bash run.sh Trefethen_20b.mtx 4 6**
150 |
151 |
152 | In this example, 6 MPI processes are used, the block size is 4, and the matrix is Trefethen_20b.mtx.
153 |
154 |
155 | ## Release versions
156 |
157 | #### Version 4.2.0 (Dec. 13, 2024)
158 |
159 | * Updated the preprocessing phase to a distributed data structure.
160 |
161 | #### Version 4.1.0 (Sep. 1, 2024)
162 |
163 | * Optimized memory usage of the numeric factorisation and solve phases;
164 | * Added parallel build support.
165 |
166 | #### Version 4.0.0 (Jul. 24, 2024)
167 |
168 | * Optimized user interfaces of solver routines;
169 | * Optimized performance of the numeric factorisation phase on CPU platforms;
170 | * Added support for solving complex matrices;
171 | * Optimized preprocessing performance.
172 |
173 | #### Version 3.5.0 (Aug. 06, 2023)
174 |
175 | * Updated the pre-processing phase with OpenMP.
176 | * Updated the compilation method of PanguLU; libpangulu.so and libpangulu.a are now compiled at the same time.
177 | * Updated timing for the reorder phase, the symbolic factorisation phase and the pre-processing phase.
178 | * Added GFLOPS reporting for the numeric factorisation phase.
179 |
180 | #### Version 3.0.0 (Apr. 02, 2023)
181 |
182 | * Used adaptively selected sparse BLAS in the numeric factorisation phase.
183 | * Added the reorder phase.
184 | * Added the symbolic factorisation phase.
185 | * Added the MC64 ordering algorithm in the reorder phase.
186 | * Added an interface for the 64-bit METIS package in the reorder phase.
187 |
188 |
189 | #### Version 2.0.0 (Jul. 22, 2022)
190 |
191 | * Used a synchronisation-free scheduling strategy in the numeric factorisation phase.
192 | * Updated the MPI communication method in the numeric factorisation phase.
193 | * Added single precision support in the numeric factorisation phase.
194 |
195 | #### Version 1.0.0 (Oct. 19, 2021)
196 | 197 | * Used a rule-based 2D LU factorisation scheduling strategy. 198 | * Used Sparse BLAS for floating point calculations on GPUs. 199 | * Added the pre-processing phase. 200 | * Added the numeric factorisation phase. 201 | * Added the triangular solve phase. 202 | 203 | ## Reference 204 | 205 | * [1] Xu Fu, Bingbin Zhang, Tengcheng Wang, Wenhao Li, Yuechen Lu, Enxin Yi, Jianqi Zhao, Xiaohan Geng, Fangying Li, Jingwen Zhang, Zhou Jin, Weifeng Liu. PanguLU: A Scalable Regular Two-Dimensional Block-Cyclic Sparse Direct Solver on Distributed Heterogeneous Systems. 36th ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC ’23). 2023. 206 | 207 | 208 | -------------------------------------------------------------------------------- /build_helper.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | import csv 3 | import os 4 | import sys 5 | import subprocess 6 | 7 | def generate_platform_names(build_list_path, platform_list_path): 8 | build_name_list = [] 9 | with open(build_list_path, "r") as f: 10 | build_reader = csv.reader(f) 11 | for build_item in build_reader: 12 | if len(build_item) < 1: 13 | continue 14 | build_name_list.append(build_item[0]) 15 | 16 | platform_list = [] 17 | with open(platform_list_path, "r") as f: 18 | platform_reader = csv.reader(f) 19 | for platform_item in platform_reader: 20 | platform_list.append(platform_item) 21 | 22 | build_name_list_ret = [] 23 | for name in build_name_list: 24 | for platform in platform_list: 25 | if len(platform) < 2: 26 | continue 27 | if platform[1] == name: 28 | build_name_list_ret.append(platform) 29 | break 30 | return build_name_list_ret 31 | 32 | 33 | def generate_platform_paths(build_platform_names, platform_list_path): 34 | platform_paths = [] 35 | for platform in build_platform_names: 36 | platform_id = platform[0] 37 | assert(len(platform_id) == 7) 38 | platform_id_l1 = platform_id[0:2] 39 | platform_id_l2 = platform_id[2:4] 40 | platform_id_l3 = platform_id[4:7] 41 | dir_l1 = None 42 | dir_l2 = None 43 | dir_l3 = None 44 | dirs_l1 = [file for file in os.listdir(os.path.dirname(platform_list_path))] 45 | for current_dir_l1 in dirs_l1: 46 | if current_dir_l1[:2] == platform_id_l1: 47 | dir_l1 = current_dir_l1 48 | break 49 | dirs_l2 = [file for file in os.listdir(os.path.join(os.path.dirname(platform_list_path), dir_l1))] 50 | for current_dir_l2 in dirs_l2: 51 | if current_dir_l2[:2] == platform_id_l2: 52 | dir_l2 = current_dir_l2 53 | break 54 | dirs_l3 = [file for file in os.listdir(os.path.join(os.path.dirname(platform_list_path), dir_l1, dir_l2))] 55 | for current_dir_l3 in dirs_l3: 56 | if current_dir_l3[:3] == platform_id_l3: 57 | dir_l3 = current_dir_l3 58 | break 59 | platform_paths.append([platform_id, f"platforms/{dir_l1}/{dir_l2}/{dir_l3}"]) 60 | return platform_paths 61 | 62 | 63 | def compile_platform_code(build_list_path, platform_list_path): 64 | build_platform_names = generate_platform_names(build_list_path, platform_list_path) 65 | build_platform_paths = generate_platform_paths(build_platform_names, platform_list_path) 66 | for build_platform_path in build_platform_paths: 67 | command = f"make -C src/{build_platform_path[1]}" 68 | print(command) 69 | return_code = subprocess.call(command.split()) 70 | if return_code != 0: 71 | exit(return_code) 72 | 73 | 74 | if __name__ == "__main__": 75 | if sys.argv[1] == "compile_platform_code": 76 | compile_platform_code("build_list.csv", 
"src/platforms/platform_list.csv") 77 | else: 78 | print("[BUILD_HELPER_ERROR] Unknown command.") 79 | exit(1) -------------------------------------------------------------------------------- /build_list.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SuperScientificSoftwareLaboratory/PanguLU/d11577cf0f5f1dae5ca02fd1d3128982e215ad60/build_list.csv -------------------------------------------------------------------------------- /examples/Makefile: -------------------------------------------------------------------------------- 1 | LINK_METIS = /path/to/libmetis.a /path/to/libGKlib.a 2 | OPENBLAS_LIB = /path/to/libopenblas.a 3 | #LINK_CUDA = -L/path/to/cuda/lib64 -lcudart -lcusparse -lstdc++ 4 | LINK_PANGULU = ../lib/libpangulu.a # Derictly importing static library as compiler input makes dynamic library loader searching the directory of static library. 5 | 6 | all: pangulu_example.elf 7 | 8 | pangulu_example.elf:example.c 9 | mpicc -O3 $< -DCALCULATE_TYPE_R64 -I../include $(LINK_PANGULU) $(LINK_CUDA) $(LINK_METIS) $(OPENBLAS_LIB) -fopenmp -lpthread -lm -o $@ 10 | 11 | clean: 12 | rm -f *.elf 13 | -------------------------------------------------------------------------------- /examples/Trefethen_20b.mtx: -------------------------------------------------------------------------------- 1 | %%MatrixMarket matrix coordinate integer symmetric 2 | %------------------------------------------------------------------------------- 3 | % UF Sparse Matrix Collection, Tim Davis 4 | % http://www.cise.ufl.edu/research/sparse/matrices/JGD_Trefethen/Trefethen_20b 5 | % name: JGD_Trefethen/Trefethen_20b 6 | % [Diagonal matrices with primes, Nick Trefethen, Oxford Univ.] 7 | % id: 2203 8 | % date: 2008 9 | % author: N. Trefethen 10 | % ed: J.-G. Dumas 11 | % fields: name title A id date author ed kind notes 12 | % kind: combinatorial problem 13 | %------------------------------------------------------------------------------- 14 | % notes: 15 | % Diagonal matrices with primes, Nick Trefethen, Oxford Univ. 16 | % From Jean-Guillaume Dumas' Sparse Integer Matrix Collection, 17 | % http://ljk.imag.fr/membres/Jean-Guillaume.Dumas/simc.html 18 | % 19 | % Problem 7 of the Hundred-dollar, Hundred-digit Challenge Problems, 20 | % SIAM News, vol 35, no. 1. 21 | % 22 | % 7. Let A be the 20,000 x 20,000 matrix whose entries are zero 23 | % everywhere except for the primes 2, 3, 5, 7, . . . , 224737 along the 24 | % main diagonal and the number 1 in all the positions A(i,j) with 25 | % |i-j| = 1,2,4,8, . . . ,16384. What is the (1,1) entry of inv(A)? 
26 | % 27 | % http://www.siam.org/news/news.php?id=388 28 | % 29 | % Filename in JGD collection: Trefethen/trefethen_20__19_minor.sms 30 | %------------------------------------------------------------------------------- 31 | 19 19 83 32 | 1 1 3 33 | 2 1 1 34 | 3 1 1 35 | 5 1 1 36 | 9 1 1 37 | 17 1 1 38 | 2 2 5 39 | 3 2 1 40 | 4 2 1 41 | 6 2 1 42 | 10 2 1 43 | 18 2 1 44 | 3 3 7 45 | 4 3 1 46 | 5 3 1 47 | 7 3 1 48 | 11 3 1 49 | 19 3 1 50 | 4 4 11 51 | 5 4 1 52 | 6 4 1 53 | 8 4 1 54 | 12 4 1 55 | 5 5 13 56 | 6 5 1 57 | 7 5 1 58 | 9 5 1 59 | 13 5 1 60 | 6 6 17 61 | 7 6 1 62 | 8 6 1 63 | 10 6 1 64 | 14 6 1 65 | 7 7 19 66 | 8 7 1 67 | 9 7 1 68 | 11 7 1 69 | 15 7 1 70 | 8 8 23 71 | 9 8 1 72 | 10 8 1 73 | 12 8 1 74 | 16 8 1 75 | 9 9 29 76 | 10 9 1 77 | 11 9 1 78 | 13 9 1 79 | 17 9 1 80 | 10 10 31 81 | 11 10 1 82 | 12 10 1 83 | 14 10 1 84 | 18 10 1 85 | 11 11 37 86 | 12 11 1 87 | 13 11 1 88 | 15 11 1 89 | 19 11 1 90 | 12 12 41 91 | 13 12 1 92 | 14 12 1 93 | 16 12 1 94 | 13 13 43 95 | 14 13 1 96 | 15 13 1 97 | 17 13 1 98 | 14 14 47 99 | 15 14 1 100 | 16 14 1 101 | 18 14 1 102 | 15 15 53 103 | 16 15 1 104 | 17 15 1 105 | 19 15 1 106 | 16 16 59 107 | 17 16 1 108 | 18 16 1 109 | 17 17 61 110 | 18 17 1 111 | 19 17 1 112 | 18 18 67 113 | 19 18 1 114 | 19 19 71 115 | -------------------------------------------------------------------------------- /examples/example.c: -------------------------------------------------------------------------------- 1 | typedef unsigned long long int sparse_pointer_t; 2 | #define MPI_SPARSE_POINTER_T MPI_UNSIGNED_LONG_LONG 3 | #define FMT_SPARSE_POINTER_T "%llu" 4 | 5 | typedef unsigned int sparse_index_t; 6 | #define MPI_SPARSE_INDEX_T MPI_UNSIGNED 7 | #define FMT_SPARSE_INDEX_T "%u" 8 | 9 | #if defined(CALCULATE_TYPE_R64) 10 | typedef double sparse_value_t; 11 | #elif defined(CALCULATE_TYPE_R32) 12 | typedef float sparse_value_t; 13 | #elif defined(CALCULATE_TYPE_CR64) 14 | typedef double _Complex sparse_value_t; 15 | typedef double sparse_value_real_t; 16 | #define COMPLEX_MTX 17 | #elif defined(CALCULATE_TYPE_CR32) 18 | typedef float _Complex sparse_value_t; 19 | typedef float sparse_value_real_t; 20 | #define COMPLEX_MTX 21 | #else 22 | #error[PanguLU Compile Error] Unknown value type. Set -DCALCULATE_TYPE_CR64 or -DCALCULATE_TYPE_R64 or -DCALCULATE_TYPE_CR32 or -DCALCULATE_TYPE_R32 in compile command line. 23 | #endif 24 | 25 | #include "../include/pangulu.h" 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include "mmio_highlevel.h" 32 | 33 | #ifdef COMPLEX_MTX 34 | sparse_value_real_t complex_fabs(sparse_value_t x) 35 | { 36 | return sqrt(__real__(x) * __real__(x) + __imag__(x) * __imag__(x)); 37 | } 38 | 39 | sparse_value_t complex_sqrt(sparse_value_t x) 40 | { 41 | sparse_value_t y; 42 | __real__(y) = sqrt(complex_fabs(x) + __real__(x)) / sqrt(2); 43 | __imag__(y) = (sqrt(complex_fabs(x) - __real__(x)) / sqrt(2)) * (__imag__(x) > 0 ? 1 : __imag__(x) == 0 ? 
0 44 | : -1); 45 | return y; 46 | } 47 | #endif 48 | 49 | void read_command_params(int argc, char **argv, char *mtx_name, char *rhs_name, int *nb) 50 | { 51 | int c; 52 | extern char *optarg; 53 | while ((c = getopt(argc, argv, "nb:f:r:")) != EOF) 54 | { 55 | switch (c) 56 | { 57 | case 'b': 58 | *nb = atoi(optarg); 59 | continue; 60 | case 'f': 61 | strcpy(mtx_name, optarg); 62 | continue; 63 | case 'r': 64 | strcpy(rhs_name, optarg); 65 | continue; 66 | } 67 | } 68 | if ((*nb) == 0) 69 | { 70 | printf("Error : nb is 0\n"); 71 | exit(1); 72 | } 73 | } 74 | 75 | int main(int ARGC, char **ARGV) 76 | { 77 | // Step 1: Create variables, initialize MPI environment. 78 | int provided = 0; 79 | int rank = 0, size = 0; 80 | int nb = 0; 81 | MPI_Init_thread(&ARGC, &ARGV, MPI_THREAD_MULTIPLE, &provided); 82 | MPI_Comm_rank(MPI_COMM_WORLD, &rank); 83 | MPI_Comm_size(MPI_COMM_WORLD, &size); 84 | sparse_index_t m = 0, n = 0, is_sym = 0; 85 | sparse_pointer_t nnz; 86 | sparse_pointer_t *rowptr = NULL; 87 | sparse_index_t *colidx = NULL; 88 | sparse_value_t *value = NULL; 89 | sparse_value_t *sol = NULL; 90 | sparse_value_t *rhs = NULL; 91 | 92 | // Step 2: Read matrix and rhs vectors. 93 | if (rank == 0) 94 | { 95 | char mtx_name[200] = {'\0'}; 96 | char rhs_name[200] = {'\0'}; 97 | read_command_params(ARGC, ARGV, mtx_name, rhs_name, &nb); 98 | 99 | printf("Reading matrix %s\n", mtx_name); 100 | mmio_info(&m, &n, &nnz, &is_sym, mtx_name); 101 | rowptr = (sparse_pointer_t *)malloc(sizeof(sparse_pointer_t) * (n + 1)); 102 | colidx = (sparse_index_t *)malloc(sizeof(sparse_index_t) * nnz); 103 | value = (sparse_value_t *)malloc(sizeof(sparse_value_t) * nnz); 104 | mmio_data_csr(rowptr, colidx, value, mtx_name); 105 | printf("Read mtx done.\n"); 106 | 107 | sol = (sparse_value_t *)malloc(sizeof(sparse_value_t) * n); 108 | rhs = (sparse_value_t *)malloc(sizeof(sparse_value_t) * n); 109 | for (int i = 0; i < n; i++) 110 | { 111 | rhs[i] = 0; 112 | for (sparse_pointer_t j = rowptr[i]; j < rowptr[i + 1]; j++) 113 | { 114 | rhs[i] += value[j]; 115 | } 116 | sol[i] = rhs[i]; 117 | } 118 | printf("Generate rhs done.\n"); 119 | } 120 | MPI_Bcast(&n, 1, MPI_SPARSE_INDEX_T, 0, MPI_COMM_WORLD); 121 | MPI_Bcast(&nb, 1, MPI_INT, 0, MPI_COMM_WORLD); 122 | MPI_Barrier(MPI_COMM_WORLD); 123 | 124 | // Step 3: Initialize PanguLU solver. 125 | pangulu_init_options init_options; 126 | init_options.nb = nb; 127 | init_options.nthread = 20; 128 | void *pangulu_handle; 129 | pangulu_init(n, nnz, rowptr, colidx, value, &init_options, &pangulu_handle); 130 | 131 | // Step 4: Execute LU factorization. 132 | pangulu_gstrf_options gstrf_options; 133 | pangulu_gstrf(&gstrf_options, &pangulu_handle); 134 | 135 | // Step 5: Execute triangular solve using the factorization results. 136 | pangulu_gstrs_options gstrs_options; 137 | pangulu_gstrs(sol, &gstrs_options, &pangulu_handle); 138 | MPI_Barrier(MPI_COMM_WORLD); 139 | 140 | // Step 6: Check the answer. 141 | sparse_value_t *rhs_computed; 142 | if (rank == 0) 143 | { 144 | // Step 6.1: Calculate rhs_computed = A * x. 
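// Note: the checking loops below use Kahan compensated summation. The variable c carries the rounding error of each floating-point addition, so the long sums (the A*x products and the two norms) lose less precision than naive accumulation would.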
145 | rhs_computed = (sparse_value_t *)malloc(sizeof(sparse_value_t) * n); 146 | for (int i = 0; i < n; i++) 147 | { 148 | rhs_computed[i] = 0.0; 149 | sparse_value_t c = 0.0; 150 | for (sparse_pointer_t j = rowptr[i]; j < rowptr[i + 1]; j++) 151 | { 152 | sparse_value_t num = value[j] * sol[colidx[j]]; 153 | sparse_value_t z = num - c; 154 | sparse_value_t t = rhs_computed[i] + z; 155 | c = (t - rhs_computed[i]) - z; 156 | rhs_computed[i] = t; 157 | } 158 | } 159 | 160 | // Step 6.2: Calculate residual: residual = rhs_computed - rhs. 161 | sparse_value_t *residual = rhs_computed; 162 | for (int i = 0; i < n; i++) 163 | { 164 | residual[i] = rhs_computed[i] - rhs[i]; 165 | } 166 | 167 | sparse_value_t sum, c; 168 | // Step 6.3: Calculate norm2 of residual. 169 | sum = 0.0; 170 | c = 0.0; 171 | for (int i = 0; i < n; i++) 172 | { 173 | sparse_value_t num = residual[i] * residual[i]; 174 | sparse_value_t z = num - c; 175 | sparse_value_t t = sum + z; 176 | c = (t - sum) - z; 177 | sum = t; 178 | } 179 | #ifdef COMPLEX_MTX 180 | sparse_value_real_t residual_norm2 = complex_fabs(complex_sqrt(sum)); 181 | #else 182 | sparse_value_t residual_norm2 = sqrt(sum); 183 | #endif 184 | 185 | // Step 6.4: Calculate norm2 of the original rhs. 186 | sum = 0.0; 187 | c = 0.0; 188 | for (int i = 0; i < n; i++) 189 | { 190 | sparse_value_t num = rhs[i] * rhs[i]; 191 | sparse_value_t z = num - c; 192 | sparse_value_t t = sum + z; 193 | c = (t - sum) - z; 194 | sum = t; 195 | } 196 | #ifdef COMPLEX_MTX 197 | sparse_value_real_t rhs_norm2 = complex_fabs(complex_sqrt(sum)); 198 | #else 199 | sparse_value_t rhs_norm2 = sqrt(sum); 200 | #endif 201 | 202 | // Step 6.5: Calculate relative residual. 203 | double relative_residual = residual_norm2 / rhs_norm2; 204 | printf("|| Ax - b || / || b || = %le\n", relative_residual); 205 | } 206 | 207 | // Step 7: Clean up and finalize. 208 | pangulu_finalize(&pangulu_handle); 209 | if (rank == 0) 210 | { 211 | free(rowptr); 212 | free(colidx); 213 | free(value); 214 | free(sol); 215 | free(rhs); 216 | free(rhs_computed); 217 | } 218 | MPI_Finalize(); 219 | } 220 | -------------------------------------------------------------------------------- /examples/mmio.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Matrix Market I/O library for ANSI C 3 | * 4 | * See http://math.nist.gov/MatrixMarket for details. 
5 | * 6 | * 7 | */ 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | #define MM_MTX_STR "matrix" 15 | #define MM_ARRAY_STR "array" 16 | #define MM_DENSE_STR "array" 17 | #define MM_COORDINATE_STR "coordinate" 18 | #define MM_SPARSE_STR "coordinate" 19 | #define MM_COMPLEX_STR "complex" 20 | #define MM_REAL_STR "real" 21 | #define MM_INT_STR "integer" 22 | #define MM_GENERAL_STR "general" 23 | #define MM_SYMM_STR "symmetric" 24 | #define MM_HERM_STR "hermitian" 25 | #define MM_SKEW_STR "skew-symmetric" 26 | #define MM_PATTERN_STR "pattern" 27 | 28 | #ifndef MM_IO_H 29 | #define MM_IO_H 30 | 31 | typedef char mm_typecode[4]; 32 | 33 | char *mm_typecode_to_str(mm_typecode matcode); 34 | 35 | int mm_read_banner(FILE *f, mm_typecode *matcode); 36 | int mm_read_mtx_crd_size(FILE *f, sparse_index_t *M, sparse_index_t *N, sparse_pointer_t *nz); 37 | long mm_read_mtx_array_size(FILE *f, sparse_index_t *M, sparse_index_t *N); 38 | 39 | long mm_write_banner(FILE *f, mm_typecode matcode); 40 | long mm_write_mtx_crd_size(FILE *f, sparse_index_t M, sparse_index_t N, sparse_pointer_t nz); 41 | long mm_write_mtx_array_size(FILE *f, sparse_index_t M, sparse_index_t N); 42 | 43 | #define MM_MAX_LINE_LENGTH 1025 44 | #define MatrixMarketBanner "%%MatrixMarket" 45 | #define MM_MAX_TOKEN_LENGTH 64 46 | 47 | 48 | /********************* mm_typecode query fucntions ***************************/ 49 | 50 | #define mm_is_matrix(typecode) ((typecode)[0]=='M') 51 | 52 | #define mm_is_sparse(typecode) ((typecode)[1]=='C') 53 | #define mm_is_coordinate(typecode)((typecode)[1]=='C') 54 | #define mm_is_dense(typecode) ((typecode)[1]=='A') 55 | #define mm_is_array(typecode) ((typecode)[1]=='A') 56 | 57 | #define mm_is_complex(typecode) ((typecode)[2]=='C') 58 | #define mm_is_real(typecode) ((typecode)[2]=='R') 59 | #define mm_is_pattern(typecode) ((typecode)[2]=='P') 60 | #define mm_is_integer(typecode) ((typecode)[2]=='I') 61 | 62 | #define mm_is_symmetric(typecode)((typecode)[3]=='S') 63 | #define mm_is_general(typecode) ((typecode)[3]=='G') 64 | #define mm_is_skew(typecode) ((typecode)[3]=='K') 65 | #define mm_is_hermitian(typecode)((typecode)[3]=='H') 66 | 67 | long mm_is_valid(mm_typecode matcode); /* too complex for a macro */ 68 | 69 | 70 | /********************* mm_typecode modify fucntions ***************************/ 71 | 72 | #define mm_set_matrix(typecode) ((*typecode)[0]='M') 73 | #define mm_set_coordinate(typecode) ((*typecode)[1]='C') 74 | #define mm_set_array(typecode) ((*typecode)[1]='A') 75 | #define mm_set_dense(typecode) mm_set_array(typecode) 76 | #define mm_set_sparse(typecode) mm_set_coordinate(typecode) 77 | 78 | #define mm_set_complex(typecode)((*typecode)[2]='C') 79 | #define mm_set_real(typecode) ((*typecode)[2]='R') 80 | #define mm_set_pattern(typecode)((*typecode)[2]='P') 81 | #define mm_set_integer(typecode)((*typecode)[2]='I') 82 | 83 | 84 | #define mm_set_symmetric(typecode)((*typecode)[3]='S') 85 | #define mm_set_general(typecode)((*typecode)[3]='G') 86 | #define mm_set_skew(typecode) ((*typecode)[3]='K') 87 | #define mm_set_hermitian(typecode)((*typecode)[3]='H') 88 | 89 | #define mm_clear_typecode(typecode) ((*typecode)[0]=(*typecode)[1]= \ 90 | (*typecode)[2]=' ',(*typecode)[3]='G') 91 | 92 | #define mm_initialize_typecode(typecode) mm_clear_typecode(typecode) 93 | 94 | 95 | /********************* Matrix Market error codes ***************************/ 96 | 97 | 98 | #define MM_COULD_NOT_READ_FILE 11 99 | #define MM_PREMATURE_EOF 12 100 | #define MM_NOT_MTX 13 101 | 
#define MM_NO_HEADER 14 102 | #define MM_UNSUPPORTED_TYPE 15 103 | #define MM_LINE_TOO_LONG 16 104 | #define MM_COULD_NOT_WRITE_FILE 17 105 | 106 | 107 | /******************** Matrix Market internal definitions ******************** 108 | 109 | MM_matrix_typecode: 4-character sequence 110 | 111 | ojbect sparse/ data storage 112 | dense type scheme 113 | 114 | string position: [0] [1] [2] [3] 115 | 116 | Matrix typecode: M(atrix) C(oord) R(eal) G(eneral) 117 | A(array) C(omplex) H(ermitian) 118 | P(attern) S(ymmetric) 119 | I(nteger) K(kew) 120 | 121 | ***********************************************************************/ 122 | 123 | #define MM_MTX_STR "matrix" 124 | #define MM_ARRAY_STR "array" 125 | #define MM_DENSE_STR "array" 126 | #define MM_COORDINATE_STR "coordinate" 127 | #define MM_SPARSE_STR "coordinate" 128 | #define MM_COMPLEX_STR "complex" 129 | #define MM_REAL_STR "real" 130 | #define MM_INT_STR "integer" 131 | #define MM_GENERAL_STR "general" 132 | #define MM_SYMM_STR "symmetric" 133 | #define MM_HERM_STR "hermitian" 134 | #define MM_SKEW_STR "skew-symmetric" 135 | #define MM_PATTERN_STR "pattern" 136 | 137 | 138 | /* high level routines */ 139 | 140 | long mm_write_mtx_crd(char fname[], long M, long N, long nz, long I[], long J[], 141 | double val[], mm_typecode matcode); 142 | long mm_read_mtx_crd_data(FILE *f, long M, long N, long nz, long I[], long J[], 143 | double val[], mm_typecode matcode); 144 | long mm_read_mtx_crd_entry(FILE *f, long *I, long *J, double *real, double *img, 145 | mm_typecode matcode); 146 | 147 | long mm_read_unsymmetric_sparse(const char *fname, long *M_, long *N_, long *nz_, 148 | double **val_, long **I_, long **J_); 149 | 150 | char *mm_strdup(const char *s) 151 | { 152 | long len = strlen(s); 153 | char *s2 = (char *) malloc((len+1)*sizeof(char)); 154 | return strcpy(s2, s); 155 | } 156 | 157 | char *mm_typecode_to_str(mm_typecode matcode) 158 | { 159 | char buffer[MM_MAX_LINE_LENGTH]; 160 | char *types[4]; 161 | char *mm_strdup(const char *); 162 | //long error =0; 163 | 164 | /* check for MTX type */ 165 | if (mm_is_matrix(matcode)) 166 | types[0] = (char *)MM_MTX_STR; 167 | //else 168 | // error=1; 169 | 170 | /* check for CRD or ARR matrix */ 171 | if (mm_is_sparse(matcode)) 172 | types[1] = (char *)MM_SPARSE_STR; 173 | else 174 | if (mm_is_dense(matcode)) 175 | types[1] = (char *)MM_DENSE_STR; 176 | else 177 | return NULL; 178 | 179 | /* check for element data type */ 180 | if (mm_is_real(matcode)) 181 | types[2] = (char *)MM_REAL_STR; 182 | else 183 | if (mm_is_complex(matcode)) 184 | types[2] = (char *)MM_COMPLEX_STR; 185 | else 186 | if (mm_is_pattern(matcode)) 187 | types[2] = (char *)MM_PATTERN_STR; 188 | else 189 | if (mm_is_integer(matcode)) 190 | types[2] = (char *)MM_INT_STR; 191 | else 192 | return NULL; 193 | 194 | 195 | /* check for symmetry type */ 196 | if (mm_is_general(matcode)) 197 | types[3] = (char *)MM_GENERAL_STR; 198 | else 199 | if (mm_is_symmetric(matcode)) 200 | types[3] = (char *)MM_SYMM_STR; 201 | else 202 | if (mm_is_hermitian(matcode)) 203 | types[3] = (char *)MM_HERM_STR; 204 | else 205 | if (mm_is_skew(matcode)) 206 | types[3] = (char *)MM_SKEW_STR; 207 | else 208 | return NULL; 209 | 210 | sprintf(buffer,"%s %s %s %s", types[0], types[1], types[2], types[3]); 211 | return mm_strdup(buffer); 212 | 213 | } 214 | 215 | int mm_read_banner(FILE *f, mm_typecode *matcode) 216 | { 217 | char line[MM_MAX_LINE_LENGTH]; 218 | char banner[MM_MAX_TOKEN_LENGTH]; 219 | char mtx[MM_MAX_TOKEN_LENGTH]; 220 | char 
crd[MM_MAX_TOKEN_LENGTH]; 221 | char data_type[MM_MAX_TOKEN_LENGTH]; 222 | char storage_scheme[MM_MAX_TOKEN_LENGTH]; 223 | char *p; 224 | 225 | 226 | mm_clear_typecode(matcode); 227 | 228 | if (fgets(line, MM_MAX_LINE_LENGTH, f) == NULL) 229 | return MM_PREMATURE_EOF; 230 | 231 | if (sscanf(line, "%s %s %s %s %s", banner, mtx, crd, data_type, 232 | storage_scheme) != 5) 233 | return MM_PREMATURE_EOF; 234 | 235 | for (p=mtx; *p!='\0'; *p=tolower(*p),p++); /* convert to lower case */ 236 | for (p=crd; *p!='\0'; *p=tolower(*p),p++); 237 | for (p=data_type; *p!='\0'; *p=tolower(*p),p++); 238 | for (p=storage_scheme; *p!='\0'; *p=tolower(*p),p++); 239 | 240 | /* check for banner */ 241 | if (strncmp(banner, MatrixMarketBanner, strlen(MatrixMarketBanner)) != 0) 242 | return MM_NO_HEADER; 243 | 244 | /* first field should be "mtx" */ 245 | if (strcmp(mtx, MM_MTX_STR) != 0) 246 | return MM_UNSUPPORTED_TYPE; 247 | mm_set_matrix(matcode); 248 | 249 | 250 | /* second field describes whether this is a sparse matrix (in coordinate 251 | storgae) or a dense array */ 252 | 253 | 254 | if (strcmp(crd, MM_SPARSE_STR) == 0) 255 | mm_set_sparse(matcode); 256 | else 257 | if (strcmp(crd, MM_DENSE_STR) == 0) 258 | mm_set_dense(matcode); 259 | else 260 | return MM_UNSUPPORTED_TYPE; 261 | 262 | 263 | /* third field */ 264 | 265 | if (strcmp(data_type, MM_REAL_STR) == 0) 266 | mm_set_real(matcode); 267 | else 268 | if (strcmp(data_type, MM_COMPLEX_STR) == 0) 269 | mm_set_complex(matcode); 270 | else 271 | if (strcmp(data_type, MM_PATTERN_STR) == 0) 272 | mm_set_pattern(matcode); 273 | else 274 | if (strcmp(data_type, MM_INT_STR) == 0) 275 | mm_set_integer(matcode); 276 | else 277 | return MM_UNSUPPORTED_TYPE; 278 | 279 | 280 | /* fourth field */ 281 | 282 | if (strcmp(storage_scheme, MM_GENERAL_STR) == 0) 283 | mm_set_general(matcode); 284 | else 285 | if (strcmp(storage_scheme, MM_SYMM_STR) == 0) 286 | mm_set_symmetric(matcode); 287 | else 288 | if (strcmp(storage_scheme, MM_HERM_STR) == 0) 289 | mm_set_hermitian(matcode); 290 | else 291 | if (strcmp(storage_scheme, MM_SKEW_STR) == 0) 292 | mm_set_skew(matcode); 293 | else 294 | return MM_UNSUPPORTED_TYPE; 295 | 296 | 297 | return 0; 298 | } 299 | 300 | int mm_read_mtx_crd_size(FILE *f, sparse_index_t *M, sparse_index_t *N, sparse_pointer_t *nz) 301 | { 302 | char line[MM_MAX_LINE_LENGTH]; 303 | int num_items_read; 304 | 305 | /* set return null parameter values, in case we exit with errors */ 306 | *M = *N = *nz = 0; 307 | 308 | /* now continue scanning until you reach the end-of-comments */ 309 | do 310 | { 311 | if (fgets(line,MM_MAX_LINE_LENGTH,f) == NULL) 312 | return MM_PREMATURE_EOF; 313 | }while (line[0] == '%'); 314 | 315 | /* line[] is either blank or has M,N, nz */ 316 | if (sscanf(line, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T " " FMT_SPARSE_POINTER_T, M, N, nz) == 3) 317 | return 0; 318 | 319 | else 320 | do 321 | { 322 | num_items_read = fscanf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T " " FMT_SPARSE_POINTER_T, M, N, nz); 323 | if (num_items_read == EOF) return MM_PREMATURE_EOF; 324 | } 325 | while (num_items_read != 3); 326 | 327 | return 0; 328 | } 329 | 330 | long mm_read_mtx_array_size(FILE *f, sparse_index_t *M, sparse_index_t *N) 331 | { 332 | char line[MM_MAX_LINE_LENGTH]; 333 | long num_items_read; 334 | /* set return null parameter values, in case we exit with errors */ 335 | *M = *N = 0; 336 | 337 | /* now continue scanning until you reach the end-of-comments */ 338 | do 339 | { 340 | if (fgets(line,MM_MAX_LINE_LENGTH,f) == NULL) 341 
| return MM_PREMATURE_EOF; 342 | }while (line[0] == '%'); 343 | 344 | /* line[] is either blank or has M,N, nz */ 345 | if (sscanf(line, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T, M, N) == 2) 346 | return 0; 347 | 348 | else /* we have a blank line */ 349 | do 350 | { 351 | num_items_read = fscanf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T, M, N); 352 | if (num_items_read == EOF) return MM_PREMATURE_EOF; 353 | } 354 | while (num_items_read != 2); 355 | 356 | return 0; 357 | } 358 | 359 | long mm_write_banner(FILE *f, mm_typecode matcode) 360 | { 361 | char *str = mm_typecode_to_str(matcode); 362 | long ret_code; 363 | 364 | ret_code = fprintf(f, "%s %s\n", MatrixMarketBanner, str); 365 | free(str); 366 | if (ret_code !=2 ) 367 | return MM_COULD_NOT_WRITE_FILE; 368 | else 369 | return 0; 370 | } 371 | 372 | long mm_write_mtx_crd_size(FILE *f, sparse_index_t M, sparse_index_t N, sparse_pointer_t nz) 373 | { 374 | if (fprintf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T " " FMT_SPARSE_POINTER_T "\n", M, N, nz) != 3) 375 | return MM_COULD_NOT_WRITE_FILE; 376 | else 377 | return 0; 378 | } 379 | 380 | long mm_write_mtx_array_size(FILE *f, sparse_index_t M, sparse_index_t N) 381 | { 382 | if (fprintf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T "\n", M, N) != 2) 383 | return MM_COULD_NOT_WRITE_FILE; 384 | else 385 | return 0; 386 | } 387 | 388 | 389 | 390 | 391 | long mm_is_valid(mm_typecode matcode) /* too complex for a macro */ 392 | { 393 | if (!mm_is_matrix(matcode)) return 0; 394 | if (mm_is_dense(matcode) && mm_is_pattern(matcode)) return 0; 395 | if (mm_is_real(matcode) && mm_is_hermitian(matcode)) return 0; 396 | if (mm_is_pattern(matcode) && (mm_is_hermitian(matcode) || 397 | mm_is_skew(matcode))) return 0; 398 | return 1; 399 | } 400 | 401 | 402 | 403 | 404 | /* high level routines */ 405 | 406 | long mm_write_mtx_crd(char fname[], long M, long N, long nz, long I[], long J[], 407 | double val[], mm_typecode matcode) 408 | { 409 | FILE *f; 410 | long i; 411 | 412 | if (strcmp(fname, "stdout") == 0) 413 | f = stdout; 414 | else 415 | if ((f = fopen(fname, "w")) == NULL) 416 | return MM_COULD_NOT_WRITE_FILE; 417 | 418 | /* print banner followed by typecode */ 419 | fprintf(f, "%s ", MatrixMarketBanner); 420 | fprintf(f, "%s\n", mm_typecode_to_str(matcode)); 421 | 422 | /* print matrix sizes and nonzeros */ 423 | fprintf(f, "%ld %ld %ld\n", M, N, nz); 424 | 425 | /* print values */ 426 | if (mm_is_pattern(matcode)) 427 | for (i=0; i 2 | #include 3 | typedef struct pangulu_init_options 4 | { 5 | int nthread; 6 | int nb; 7 | }pangulu_init_options; 8 | 9 | typedef struct pangulu_gstrf_options 10 | { 11 | }pangulu_gstrf_options; 12 | 13 | typedef struct pangulu_gstrs_options 14 | { 15 | }pangulu_gstrs_options; -------------------------------------------------------------------------------- /lib/Makefile: -------------------------------------------------------------------------------- 1 | include ../make.inc 2 | 3 | all : oclean 4 | 5 | libs : libpangulu.so libpangulu.a 6 | 7 | libpangulu.so: 8 | $(MPICC) $(MPICCFLAGS) -shared -fPIC -o $@ ./pangulu*.o 9 | libpangulu.a: 10 | ar -rv -o $@ ./pangulu*.o 11 | - ranlib $@ 12 | 13 | oclean: libs 14 | rm -f pangulu*.o 15 | 16 | clean: 17 | rm -f libpangulu.so 18 | rm -f libpangulu.a 19 | -------------------------------------------------------------------------------- /make.inc: -------------------------------------------------------------------------------- 1 | COMPILE_LEVEL = -O3 2 | 3 | #0201000,GPU_CUDA 4 | CUDA_PATH = /usr/local/cuda 5 | CUDA_INC 
= -I/path/to/cuda/include 6 | CUDA_LIB = -L/path/to/cuda/lib64 -lcudart -lcusparse 7 | NVCC = nvcc $(COMPILE_LEVEL) 8 | NVCCFLAGS = $(PANGULU_FLAGS) -w -Xptxas -dlcm=cg -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61 $(CUDA_INC) $(CUDA_LIB) 9 | 10 | #general 11 | CC = gcc $(COMPILE_LEVEL) #-fsanitize=address 12 | MPICC = mpicc $(COMPILE_LEVEL) #-fsanitize=address 13 | OPENBLAS_INC = -I/path/to/openblas/include 14 | OPENBLAS_LIB = -L/path/to/openblas/lib -lopenblas 15 | MPICCFLAGS = $(OPENBLAS_INC) $(CUDA_INC) $(OPENBLAS_LIB) -fopenmp -lpthread -lm 16 | MPICCLINK = $(OPENBLAS_LIB) 17 | METISFLAGS = -I/path/to/gklib/include -I/path/to/metis/include 18 | PANGULU_FLAGS = -DPANGULU_LOG_INFO -DCALCULATE_TYPE_R64 -DMETIS -DPANGULU_MC64 #-DGPU_OPEN -DHT_IS_OPEN 19 | -------------------------------------------------------------------------------- /src/Makefile: -------------------------------------------------------------------------------- 1 | include ../make.inc 2 | all:pangulu_host pangulu_platforms 3 | 4 | src:=$(wildcard *.c) 5 | pangulu_host:$(src:.c=.o) 6 | 7 | %.o:%.c 8 | $(MPICC) $(MPICCFLAGS) $(METISFLAGS) $(PANGULU_FLAGS) -c $< -o $@ -fPIC 9 | mv $@ ../lib 10 | 11 | pangulu_platforms: 12 | cd .. && python3 build_helper.py compile_platform_code 13 | 14 | clean: 15 | -(rm -f ../lib/pangulu*.o) 16 | -(rm -f ./pangulu*.o) 17 | -------------------------------------------------------------------------------- /src/languages/pangulu_en.h: -------------------------------------------------------------------------------- 1 | #ifdef PANGULU_EN 2 | 3 | #ifdef PANGULU_LOG_ERROR 4 | #define PANGULU_E_NB_IS_ZERO "[PanguLU Error] nb is zero.\n" 5 | #define PANGULU_E_INVALID_HEAP_SELECT "[PanguLU Error] Invalid heap comparing strategy.\n" 6 | #define PANGULU_E_HEAP_FULL "[PanguLU Error] The heap is full on rank " FMT_PANGULU_INT32_T ".\n", rank 7 | #define PANGULU_E_HEAP_EMPTY "[PanguLU Error] The heap is empty on rank " FMT_PANGULU_INT32_T ".\n", rank 8 | #define PANGULU_E_CPU_MEM "[PanguLU Error] Failed to allocate " FMT_PANGULU_INT64_T " byte(s). CPU memory is not enough. %s:" FMT_PANGULU_INT64_T "\n", size, file, line 9 | #define PANGULU_E_ISEND_CSR "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csr error. value != s->value.\n" 10 | #define PANGULU_E_ISEND_CSC "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csc error. value != s->value_csc.\n" 11 | #define PANGULU_E_ROW_IS_NULL "[PanguLU Error] The matrix has zero row(s).\n" 12 | #define PANGULU_E_ROW_DONT_HAVE_DIA "[PanguLU Error] Row[" FMT_PANGULU_EXBLOCK_IDX "] don't have diagonal element.\n", i 13 | #define PANGULU_E_ERR_IN_RRCL "[PanguLU Error] Invalid numeric factorization task on rank " FMT_PANGULU_INT32_T ". row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T "\n", rank, row, col, level 14 | #define PANGULU_E_K_ID "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for numeric factorization.\n", kernel_id 15 | #define PANGULU_E_ASYM "[PanguLU Error] MPI_Barrier_asym error.\n" 16 | #define PANGULU_E_ADD_DIA "[PanguLU Error] pangulu_add_diagonal_element error\n" 17 | #define PANGULU_E_CUDA_MALLOC "[PanguLU Error] Failed to cudaMalloc %lu byte(s). GPU memory is not enough.\n", size 18 | #define PANGULU_E_ROW_IS_ZERO "[PanguLU Error] Invalid input matrix.\n" 19 | #define PANGULU_E_MAX_NULL "[PanguLU Error] pangulu_mc64 internal error. 
(now_row_max==0)\n" 20 | #define PANGULU_E_WORK_ERR "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for sptrsv.\n", kernel_id 21 | #define PANGULU_E_BIP_PTR_INVALID "[PanguLU Error] Invalid pangulu_block_info pointer.\n" 22 | #define PANGULU_E_BIP_INVALID "[PanguLU Error] Invalid pangulu_block_info.\n" 23 | #define PANGULU_E_BIP_NOT_EMPTY "[PanguLU Error] Block info pool is not empty.\n" 24 | #define PANGULU_E_BIP_OUT_OF_RANGE "[PanguLU Error] PANGULU_BIP index out of range.\n" 25 | #define PANGULU_E_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_init)\n" 26 | #define PANGULU_E_GSTRF_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrf)\n" 27 | #define PANGULU_E_GSTRS_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrs)\n" 28 | #endif // PANGULU_LOG_ERROR 29 | 30 | #ifdef PANGULU_LOG_WARNING 31 | #define PANGULU_W_RANK_HEAP_DONT_NULL "[PanguLU Warning] " FMT_PANGULU_INT64_T " task remaining on rank " FMT_PANGULU_INT32_T ".\n", heap->length, rank 32 | #define PANGULU_W_ERR_RANK "[PanguLU Warning] Receiving message error on rank " FMT_PANGULU_INT32_T ".\n", rank 33 | #define PANGULU_W_BIP_INCREASE_SPEED_TOO_SMALL "[PanguLU Warning] PANGULU_BIP_INCREASE_SPEED too small.\n" 34 | #define PANGULU_W_GPU_BIG_BLOCK "[PanguLU Warning] When GPU is open, init_options->nb > 256 and pangulu_inblock_idx isn't pangulu_uint32_t, performance will be limited.\n" 35 | #define PANGULU_W_COMPLEX_FALLBACK "[PanguLU Warning] Calculating complex value on GPU is not supported. Fallback to CPU.\n" 36 | #endif // PANGULU_LOG_WARNING 37 | 38 | #ifdef PANGULU_LOG_INFO 39 | #define PANGULU_I_VECT2NORM_ERR "[PanguLU Info] || Ax - B || / || Ax || = %12.4le.\n", error 40 | #define PANGULU_I_CHECK_PASS "[PanguLU Info] Check ------------------------------------- pass\n" 41 | #define PANGULU_I_CHECK_ERROR "[PanguLU Info] Check ------------------------------------ error\n" 42 | #define PANGULU_I_DEV_IS "[PanguLU Info] Device is %s.\n", prop.name 43 | #define PANGULU_I_TASK_INFO "[PanguLU Info] Info of inserting task is: row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T " kernel=" FMT_PANGULU_INT64_T ".\n", row, col, task_level, kernel_id 44 | #define PANGULU_I_HEAP_LEN "[PanguLU Info] heap.length=" FMT_PANGULU_INT64_T " heap.capacity=" FMT_PANGULU_INT64_T "\n", heap->length, heap->max_length 45 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_ON "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- ON\n" 46 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_OFF "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- OFF\n" 47 | #define PANGULU_I_SYNCHRONIZE_FREE_ON "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- ON\n" 48 | #define PANGULU_I_SYNCHRONIZE_FREE_OFF "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- OFF\n" 49 | #ifdef METIS 50 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d METIS:%s\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread, (sizeof(idx_t) == 4) ? ("i32") : ((sizeof(idx_t) == 8) ? 
("i64") : ("?")) 51 | #else 52 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread 53 | #endif 54 | #define PANGULU_I_TIME_REORDER "[PanguLU Info] Reordering time is %lf s.\n", elapsed_time 55 | #define PANGULU_I_TIME_SYMBOLIC "[PanguLU Info] Symbolic factorization time is %lf s.\n", elapsed_time 56 | #define PANGULU_I_TIME_PRE "[PanguLU Info] Preprocessing time is %lf s.\n", elapsed_time 57 | #define PANGULU_I_TIME_NUMERICAL "[PanguLU Info] Numeric factorization time is %lf s.\n", elapsed_time //, flop / pangulu_get_spend_time(common) / 1000000000.0 58 | #define PANGULU_I_TIME_SPTRSV "[PanguLU Info] Solving time is %lf s.\n", elapsed_time 59 | #define PANGULU_I_SYMBOLIC_NONZERO "[PanguLU Info] Symbolic nonzero count is " FMT_PANGULU_EXBLOCK_PTR ".\n",*symbolic_nnz 60 | #endif // PANGULU_LOG_INFO 61 | 62 | #endif // #ifdef PANGULU_EN -------------------------------------------------------------------------------- /src/languages/pangulu_en_us.h: -------------------------------------------------------------------------------- 1 | #ifdef PANGULU_EN_US 2 | 3 | #ifdef PANGULU_LOG_ERROR 4 | #define PANGULU_E_NB_IS_ZERO "[PanguLU Error] nb is zero.\n" 5 | #define PANGULU_E_INVALID_HEAP_SELECT "[PanguLU Error] Invalid heap comparing strategy.\n" 6 | #define PANGULU_E_HEAP_FULL "[PanguLU Error] The heap is full on rank " FMT_PANGULU_INT32_T ".\n", rank 7 | #define PANGULU_E_HEAP_EMPTY "[PanguLU Error] The heap is empty on rank " FMT_PANGULU_INT32_T ".\n", rank 8 | #define PANGULU_E_CPU_MEM "[PanguLU Error] Failed to allocate " FMT_PANGULU_INT64_T " byte(s). CPU memory is not enough. %s:" FMT_PANGULU_INT64_T "\n", size, file, line 9 | #define PANGULU_E_ISEND_CSR "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csr error. value != s->value.\n" 10 | #define PANGULU_E_ISEND_CSC "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csc error. value != s->value_csc.\n" 11 | #define PANGULU_E_ROW_IS_NULL "[PanguLU Error] The matrix has zero row(s).\n" 12 | #define PANGULU_E_ROW_DONT_HAVE_DIA "[PanguLU Error] Row[" FMT_PANGULU_EXBLOCK_IDX "] don't have diagonal element.\n", i 13 | #define PANGULU_E_ERR_IN_RRCL "[PanguLU Error] Invalid numeric factorization task on rank " FMT_PANGULU_INT32_T ". row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T "\n", rank, row, col, level 14 | #define PANGULU_E_K_ID "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for numeric factorization.\n", kernel_id 15 | #define PANGULU_E_ASYM "[PanguLU Error] MPI_Barrier_asym error.\n" 16 | #define PANGULU_E_ADD_DIA "[PanguLU Error] pangulu_add_diagonal_element error\n" 17 | #define PANGULU_E_CUDA_MALLOC "[PanguLU Error] Failed to cudaMalloc %lu byte(s). GPU memory is not enough.\n", size 18 | #define PANGULU_E_ROW_IS_ZERO "[PanguLU Error] Invalid input matrix.\n" 19 | #define PANGULU_E_MAX_NULL "[PanguLU Error] pangulu_mc64 internal error. 
(now_row_max==0)\n" 20 | #define PANGULU_E_WORK_ERR "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for sptrsv.\n", kernel_id 21 | #define PANGULU_E_BIP_PTR_INVALID "[PanguLU Error] Invalid pangulu_block_info pointer.\n" 22 | #define PANGULU_E_BIP_INVALID "[PanguLU Error] Invalid pangulu_block_info.\n" 23 | #define PANGULU_E_BIP_NOT_EMPTY "[PanguLU Error] Block info pool is not empty.\n" 24 | #define PANGULU_E_BIP_OUT_OF_RANGE "[PanguLU Error] PANGULU_BIP index out of range.\n" 25 | #define PANGULU_E_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_init)\n" 26 | #define PANGULU_E_GSTRF_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrf)\n" 27 | #define PANGULU_E_GSTRS_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrs)\n" 28 | #endif // PANGULU_LOG_ERROR 29 | 30 | #ifdef PANGULU_LOG_WARNING 31 | #define PANGULU_W_RANK_HEAP_DONT_NULL "[PanguLU Warning] " FMT_PANGULU_INT64_T " task remaining on rank " FMT_PANGULU_INT32_T ".\n", heap->length, rank 32 | #define PANGULU_W_ERR_RANK "[PanguLU Warning] Receiving message error on rank " FMT_PANGULU_INT32_T ".\n", rank 33 | #define PANGULU_W_BIP_INCREASE_SPEED_TOO_SMALL "[PanguLU Warning] PANGULU_BIP_INCREASE_SPEED too small.\n" 34 | #define PANGULU_W_GPU_BIG_BLOCK "[PanguLU Warning] When GPU is open, init_options->nb > 256 and pangulu_inblock_idx isn't pangulu_uint32_t, performance will be limited.\n" 35 | #define PANGULU_W_COMPLEX_FALLBACK "[PanguLU Warning] Calculating complex value on GPU is not supported. Fallback to CPU.\n" 36 | #endif // PANGULU_LOG_WARNING 37 | 38 | #ifdef PANGULU_LOG_INFO 39 | #define PANGULU_I_VECT2NORM_ERR "[PanguLU Info] || Ax - B || / || Ax || = %12.4le.\n", error 40 | #define PANGULU_I_CHECK_PASS "[PanguLU Info] Check ------------------------------------- pass\n" 41 | #define PANGULU_I_CHECK_ERROR "[PanguLU Info] Check ------------------------------------ error\n" 42 | #define PANGULU_I_DEV_IS "[PanguLU Info] Device is %s.\n", prop.name 43 | #define PANGULU_I_TASK_INFO "[PanguLU Info] Info of inserting task is: row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T " kernel=" FMT_PANGULU_INT64_T ".\n", row, col, task_level, kernel_id 44 | #define PANGULU_I_HEAP_LEN "[PanguLU Info] heap.length=" FMT_PANGULU_INT64_T " heap.capacity=" FMT_PANGULU_INT64_T "\n", heap->length, heap->max_length 45 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_ON "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- ON\n" 46 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_OFF "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- OFF\n" 47 | #define PANGULU_I_SYNCHRONIZE_FREE_ON "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- ON\n" 48 | #define PANGULU_I_SYNCHRONIZE_FREE_OFF "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- OFF\n" 49 | #ifdef METIS 50 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d METIS:%s\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread, (sizeof(idx_t) == 4) ? ("i32") : ((sizeof(idx_t) == 8) ? 
("i64") : ("?")) 51 | #else 52 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread 53 | #endif 54 | #define PANGULU_I_TIME_REORDER "[PanguLU Info] Reordering time is %lf s.\n", pangulu_get_spend_time(common) 55 | #define PANGULU_I_TIME_SYMBOLIC "[PanguLU Info] Symbolic factorization time is %lf s.\n", pangulu_get_spend_time(common) 56 | #define PANGULU_I_TIME_PRE "[PanguLU Info] Preprocessing time is %lf s.\n", pangulu_get_spend_time(common) 57 | #define PANGULU_I_TIME_NUMERICAL "[PanguLU Info] Numeric factorization time is %lf s.\n", pangulu_get_spend_time(common) //, flop / pangulu_get_spend_time(common) / 1000000000.0 58 | #define PANGULU_I_TIME_SPTRSV "[PanguLU Info] Solving time is %lf s.\n", pangulu_get_spend_time(common) 59 | #define PANGULU_I_SYMBOLIC_NONZERO "[PanguLU Info] Symbolic nonzero count is " FMT_PANGULU_EXBLOCK_PTR ".\n",*symbolic_nnz 60 | #endif // PANGULU_LOG_INFO 61 | 62 | #endif // #ifdef PANGULU_EN_US -------------------------------------------------------------------------------- /src/pangulu.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | pangulu_int64_t cpu_memory = 0; 4 | pangulu_int64_t cpu_peak_memory = 0; 5 | pangulu_int64_t gpu_memory = 0; 6 | pangulu_int64_t heap_select; 7 | calculate_type *temp_a_value = NULL; 8 | pangulu_int32_t *cuda_b_idx_col = NULL; 9 | calculate_type *cuda_temp_value = NULL; 10 | pangulu_int64_t *ssssm_col_ops_u = NULL; 11 | pangulu_int32_t *ssssm_ops_pointer = NULL; 12 | pangulu_int32_t *getrf_diagIndex_csc = NULL; 13 | pangulu_int32_t *getrf_diagIndex_csr = NULL; 14 | 15 | pangulu_int64_t STREAM_DENSE_INDEX = 0; 16 | pangulu_int64_t INDEX_NUM = 0; 17 | pangulu_int32_t pangu_omp_num_threads = 1; 18 | 19 | pangulu_int64_t flop = 0; 20 | double time_transpose = 0.0; 21 | double time_isend = 0.0; 22 | double time_receive = 0.0; 23 | double time_getrf = 0.0; 24 | double time_tstrf = 0.0; 25 | double time_gessm = 0.0; 26 | double time_gessm_dense = 0.0; 27 | double time_gessm_sparse = 0.0; 28 | double time_ssssm = 0.0; 29 | double time_cuda_memcpy = 0.0; 30 | double time_wait = 0.0; 31 | double calculate_time_wait = 0.0; 32 | pangulu_int64_t calculate_time = 0; 33 | 34 | pangulu_int32_t *ssssm_hash_lu = NULL; 35 | pangulu_int32_t *ssssm_hash_l_row = NULL; 36 | pangulu_int32_t zip_cur_id = 0; 37 | calculate_type *ssssm_l_value = NULL; 38 | calculate_type *ssssm_u_value = NULL; 39 | pangulu_int32_t *ssssm_hash_u_col = NULL; 40 | 41 | pangulu_int32_t rank; 42 | pangulu_int32_t global_level; 43 | pangulu_int32_t omp_thread; 44 | 45 | void pangulu_init(pangulu_exblock_idx pangulu_n, pangulu_exblock_ptr pangulu_nnz, pangulu_exblock_ptr *csr_rowptr, pangulu_exblock_idx *csr_colidx, calculate_type *csr_value, pangulu_init_options *init_options, void **pangulu_handle) 46 | { 47 | MPI_Comm_rank(MPI_COMM_WORLD, &rank); 48 | 49 | struct timeval time_start; 50 | double elapsed_time; 51 | 52 | pangulu_int32_t size; 53 | MPI_Comm_size(MPI_COMM_WORLD, &size); 54 | pangulu_common *common = (pangulu_common *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_common)); 55 | common->rank = rank; 56 | common->size = size; 57 | common->n = pangulu_n; 58 | #ifdef GPU_OPEN 59 | if (init_options->nb > 256 && sizeof(pangulu_inblock_idx) == 2) 60 | { 61 | init_options->nb = 256; 62 | 
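// In GPU builds, nb is clamped to 256 when pangulu_inblock_idx is a 16-bit type, because larger blocks would limit performance (see PANGULU_W_GPU_BIG_BLOCK); the warning below is printed on rank 0.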
if (rank == 0) 63 | { 64 | printf(PANGULU_W_GPU_BIG_BLOCK); 65 | } 66 | } 67 | #endif 68 | 69 | if (rank == 0) 70 | { 71 | if (init_options == NULL) 72 | { 73 | printf(PANGULU_E_OPTION_IS_NULLPTR); 74 | pangulu_exit(1); 75 | } 76 | if (init_options->nb == 0) 77 | { 78 | printf(PANGULU_E_NB_IS_ZERO); 79 | pangulu_exit(1); 80 | } 81 | } 82 | 83 | common->nb = init_options->nb; 84 | common->sum_rank_size = size; 85 | common->omp_thread = init_options->nthread; 86 | MPI_Bcast(&common->n, 1, MPI_PANGULU_EXBLOCK_IDX, 0, MPI_COMM_WORLD); 87 | MPI_Bcast(&common->nb, 1, MPI_PANGULU_INBLOCK_IDX, 0, MPI_COMM_WORLD); 88 | 89 | pangulu_int64_t tmp_p = sqrt(common->sum_rank_size); 90 | while (((common->sum_rank_size) % tmp_p) != 0) 91 | { 92 | tmp_p--; 93 | } 94 | 95 | common->p = tmp_p; 96 | common->q = common->sum_rank_size / tmp_p; 97 | pangulu_origin_smatrix *origin_smatrix = (pangulu_origin_smatrix *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_origin_smatrix)); 98 | pangulu_init_pangulu_origin_smatrix(origin_smatrix); 99 | 100 | if (rank == 0) 101 | { 102 | struct timeval start, end; 103 | gettimeofday(&start, NULL); 104 | pangulu_read_pangulu_origin_smatrix(origin_smatrix, pangulu_n, pangulu_nnz, csr_rowptr, csr_colidx, csr_value); 105 | gettimeofday(&end, NULL); 106 | if (origin_smatrix->row == 0) 107 | { 108 | printf(PANGULU_E_ROW_IS_ZERO); 109 | pangulu_exit(1); 110 | } 111 | } 112 | 113 | pangulu_int32_t p = common->p; 114 | pangulu_int32_t q = common->q; 115 | pangulu_int32_t nb = common->nb; 116 | MPI_Barrier(MPI_COMM_WORLD); 117 | common->n = pangulu_bcast_n(origin_smatrix->row, 0); 118 | pangulu_int64_t n = common->n; 119 | omp_set_num_threads(init_options->nthread); 120 | #if defined(OPENBLAS_CONFIG_H) || defined(OPENBLAS_VERSION) 121 | openblas_set_num_threads(1); 122 | #endif 123 | if (rank == 0) 124 | { 125 | // #ifdef ADAPTIVE_KERNEL_SELECTION 126 | // printf(PANGULU_I_ADAPTIVE_KERNEL_SELECTION_ON); 127 | // #else 128 | // printf(PANGULU_I_ADAPTIVE_KERNEL_SELECTION_OFF); 129 | // #endif 130 | // #ifdef SYNCHRONIZE_FREE 131 | // printf(PANGULU_I_SYNCHRONIZE_FREE_ON); 132 | // #else 133 | // printf(PANGULU_I_SYNCHRONIZE_FREE_OFF); 134 | // #endif 135 | #ifdef PANGULU_GPU_COMPLEX_FALLBACK_FLAG 136 | printf(PANGULU_W_COMPLEX_FALLBACK); 137 | #endif 138 | omp_thread = pangu_omp_num_threads; 139 | printf(PANGULU_I_BASIC_INFO); 140 | } 141 | 142 | #ifdef GPU_OPEN 143 | pangulu_cuda_device_init(rank); 144 | #endif 145 | 146 | pangulu_block_smatrix *block_smatrix = (pangulu_block_smatrix *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_block_smatrix)); 147 | pangulu_init_pangulu_block_smatrix(block_smatrix); 148 | pangulu_block_common *block_common = (pangulu_block_common *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_block_common)); 149 | block_common->rank = rank; 150 | block_common->p = p; 151 | block_common->q = q; 152 | block_common->nb = nb; 153 | block_common->n = n; 154 | block_common->block_length = pangulu_Calculate_Block(n, nb); 155 | block_common->sum_rank_size = common->sum_rank_size; 156 | block_common->max_pq = PANGULU_MAX(p, q); 157 | block_common->every_level_length = block_common->block_length; 158 | pangulu_bip_init(&(block_smatrix->BIP), block_common->block_length * (block_common->block_length + 1)); 159 | 160 | #ifdef SYNCHRONIZE_FREE 161 | block_common->every_level_length = 10; 162 | #else 163 | block_common->every_level_length = 1; 164 | #endif 165 | 166 | pangulu_origin_smatrix *reorder_matrix = (pangulu_origin_smatrix *)pangulu_malloc(__FILE__, __LINE__, 
sizeof(pangulu_origin_smatrix)); 167 | pangulu_init_pangulu_origin_smatrix(reorder_matrix); 168 | 169 | block_common->rank_row_length = (block_common->block_length / p + (((block_common->block_length % p) > (rank / q)) ? 1 : 0)); 170 | block_common->rank_col_length = (block_common->block_length / q + (((block_common->block_length % q) > (rank % q)) ? 1 : 0)); 171 | block_common->every_level_length = PANGULU_MIN(block_common->every_level_length, block_common->block_length); 172 | MPI_Barrier(MPI_COMM_WORLD); 173 | pangulu_time_start(&time_start); 174 | 175 | pangulu_reorder(block_smatrix, 176 | origin_smatrix, 177 | reorder_matrix); 178 | 179 | MPI_Barrier(MPI_COMM_WORLD); 180 | elapsed_time = pangulu_time_stop(&time_start); 181 | if (rank == 0) 182 | { 183 | printf(PANGULU_I_TIME_REORDER); 184 | } 185 | 186 | calculate_time = 0; 187 | 188 | MPI_Barrier(MPI_COMM_WORLD); 189 | pangulu_time_start(&time_start); 190 | if (rank == 0) 191 | { 192 | pangulu_symbolic(block_common, 193 | block_smatrix, 194 | reorder_matrix); 195 | } 196 | 197 | MPI_Barrier(MPI_COMM_WORLD); 198 | elapsed_time = pangulu_time_stop(&time_start); 199 | if (rank == 0) 200 | { 201 | printf(PANGULU_I_TIME_SYMBOLIC); 202 | } 203 | 204 | pangulu_init_heap_select(0); 205 | 206 | MPI_Barrier(MPI_COMM_WORLD); 207 | pangulu_time_start(&time_start); 208 | pangulu_preprocessing( 209 | block_common, 210 | block_smatrix, 211 | reorder_matrix, 212 | init_options->nthread); 213 | 214 | MPI_Barrier(MPI_COMM_WORLD); 215 | 216 | elapsed_time = pangulu_time_stop(&time_start); 217 | if (rank == 0) 218 | { 219 | printf(PANGULU_I_TIME_PRE); 220 | } 221 | 222 | // pangulu_free(__FILE__, __LINE__, block_smatrix->symbolic_rowpointer); 223 | // block_smatrix->symbolic_rowpointer = NULL; 224 | 225 | // pangulu_free(__FILE__, __LINE__, block_smatrix->symbolic_columnindex); 226 | // block_smatrix->symbolic_columnindex = NULL; 227 | 228 | pangulu_free(__FILE__, __LINE__, origin_smatrix); 229 | origin_smatrix = NULL; 230 | 231 | pangulu_free(__FILE__, __LINE__, reorder_matrix->rowpointer); 232 | pangulu_free(__FILE__, __LINE__, reorder_matrix->columnindex); 233 | pangulu_free(__FILE__, __LINE__, reorder_matrix->value); 234 | pangulu_free(__FILE__, __LINE__, reorder_matrix); 235 | reorder_matrix = NULL; 236 | 237 | (*pangulu_handle) = pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_handle_t)); 238 | (*(pangulu_handle_t **)pangulu_handle)->block_common = block_common; 239 | (*(pangulu_handle_t **)pangulu_handle)->block_smatrix = block_smatrix; 240 | (*(pangulu_handle_t **)pangulu_handle)->commmon = common; 241 | } 242 | 243 | void pangulu_gstrf(pangulu_gstrf_options *gstrf_options, void **pangulu_handle) 244 | { 245 | pangulu_block_common *block_common = (*(pangulu_handle_t **)pangulu_handle)->block_common; 246 | pangulu_block_smatrix *block_smatrix = (*(pangulu_handle_t **)pangulu_handle)->block_smatrix; 247 | pangulu_common *common = (*(pangulu_handle_t **)pangulu_handle)->commmon; 248 | 249 | struct timeval time_start; 250 | double elapsed_time; 251 | 252 | if (rank == 0) 253 | { 254 | if (gstrf_options == NULL) 255 | { 256 | printf(PANGULU_E_GSTRF_OPTION_IS_NULLPTR); 257 | pangulu_exit(1); 258 | } 259 | } 260 | 261 | #ifdef CHECK_TIME 262 | pangulu_time_init(); 263 | #endif 264 | MPI_Barrier(MPI_COMM_WORLD); 265 | 266 | #ifdef OVERLAP 267 | pangulu_create_pthread(block_common, 268 | block_smatrix); 269 | #endif 270 | 271 | pangulu_time_init(); 272 | MPI_Barrier(MPI_COMM_WORLD); 273 | pangulu_time_start(&time_start); 274 | 275 | 
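/* The numeric factorization below is timed between two MPI_Barrier calls, so
   the interval printed by rank 0 reflects the slowest rank rather than
   rank 0's local work alone. */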
pangulu_numeric(block_common, 276 | block_smatrix); 277 | 278 | MPI_Barrier(MPI_COMM_WORLD); 279 | elapsed_time = pangulu_time_stop(&time_start); 280 | 281 | if (rank == 0) 282 | { 283 | 284 | pangulu_int64_t another_calculate_time = 0; 285 | for (pangulu_int64_t i = 1; i < block_common->sum_rank_size; i++) 286 | { 287 | pangulu_recv_vector_int(&another_calculate_time, 1, i, 0); 288 | calculate_time += another_calculate_time; 289 | } 290 | flop = calculate_time * 2; 291 | } 292 | else 293 | { 294 | pangulu_send_vector_int(&calculate_time, 1, 0, 0); 295 | } 296 | 297 | if (rank == 0) 298 | { 299 | printf(PANGULU_I_TIME_NUMERICAL); 300 | } 301 | } 302 | 303 | void pangulu_gstrs(calculate_type *rhs, pangulu_gstrs_options *gstrs_options, void **pangulu_handle) 304 | { 305 | pangulu_block_common *block_common = (*(pangulu_handle_t **)pangulu_handle)->block_common; 306 | pangulu_block_smatrix *block_smatrix = (*(pangulu_handle_t **)pangulu_handle)->block_smatrix; 307 | pangulu_common *common = (*(pangulu_handle_t **)pangulu_handle)->commmon; 308 | 309 | struct timeval time_start; 310 | double elapsed_time; 311 | 312 | if (rank == 0) 313 | { 314 | if (gstrs_options == NULL) 315 | { 316 | printf(PANGULU_E_GSTRS_OPTION_IS_NULLPTR); 317 | pangulu_exit(1); 318 | } 319 | } 320 | 321 | pangulu_int64_t vector_length = common->n; 322 | pangulu_vector *x_vector = NULL; 323 | pangulu_vector *b_vector = NULL; 324 | pangulu_vector *answer_vector = NULL; 325 | 326 | if (rank == 0) 327 | { 328 | x_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 329 | b_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 330 | answer_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 331 | b_vector->row = common->n; 332 | b_vector->value = rhs; 333 | pangulu_init_pangulu_vector(x_vector, vector_length); 334 | pangulu_init_pangulu_vector(answer_vector, vector_length); 335 | pangulu_reorder_vector_b_tran(block_smatrix, b_vector, answer_vector); 336 | } 337 | 338 | pangulu_sptrsv_preprocessing( 339 | block_common, 340 | block_smatrix, 341 | answer_vector); 342 | 343 | #ifdef PANGULU_SPTRSV 344 | 345 | MPI_Barrier(MPI_COMM_WORLD); 346 | pangulu_time_start(&time_start); 347 | 348 | pangulu_sptrsv_L(block_common, block_smatrix); 349 | pangulu_init_heap_select(4); 350 | pangulu_sptrsv_U(block_common, block_smatrix); 351 | 352 | MPI_Barrier(MPI_COMM_WORLD); 353 | elapsed_time = pangulu_time_stop(&time_start); 354 | 355 | if (rank == 0) 356 | { 357 | printf(PANGULU_I_TIME_SPTRSV); 358 | } 359 | 360 | #endif 361 | 362 | // check sptrsv answer 363 | pangulu_sptrsv_vector_gather(block_common, block_smatrix, answer_vector); 364 | 365 | int n = common->n; 366 | 367 | if (rank == 0) 368 | { 369 | pangulu_reorder_vector_x_tran(block_smatrix, answer_vector, x_vector); 370 | 371 | for (int i = 0; i < n; i++) 372 | { 373 | rhs[i] = x_vector->value[i]; 374 | } 375 | 376 | pangulu_destroy_pangulu_vector(x_vector); 377 | pangulu_destroy_pangulu_vector(answer_vector); 378 | pangulu_free(__FILE__, __LINE__, b_vector); 379 | } 380 | } 381 | 382 | void pangulu_gssv(calculate_type *rhs, pangulu_gstrf_options *gstrf_options, pangulu_gstrs_options *gstrs_options, void **pangulu_handle) 383 | { 384 | pangulu_gstrf(gstrf_options, pangulu_handle); 385 | pangulu_gstrs(rhs, gstrs_options, pangulu_handle); 386 | } 387 | 388 | void pangulu_finalize(void **pangulu_handle) 389 | { 390 | pangulu_block_common *block_common = (*(pangulu_handle_t 
**)pangulu_handle)->block_common; 391 | pangulu_block_smatrix *block_smatrix = (*(pangulu_handle_t **)pangulu_handle)->block_smatrix; 392 | pangulu_common *common = (*(pangulu_handle_t **)pangulu_handle)->commmon; 393 | 394 | pangulu_destroy(block_common, block_smatrix); 395 | 396 | pangulu_free(__FILE__, __LINE__, block_common); 397 | pangulu_free(__FILE__, __LINE__, block_smatrix); 398 | pangulu_free(__FILE__, __LINE__, common); 399 | pangulu_free(__FILE__, __LINE__, *(pangulu_handle_t **)pangulu_handle); 400 | } -------------------------------------------------------------------------------- /src/pangulu_addmatrix.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | void pangulu_add_pangulu_smatrix_cpu(pangulu_smatrix *a, 4 | pangulu_smatrix *b) 5 | { 6 | for (pangulu_int64_t i = 0; i < a->nnz; i++) 7 | { 8 | a->value_csc[i] += b->value_csc[i]; 9 | } 10 | } 11 | 12 | void pangulu_add_pangulu_smatrix_csr_to_csc(pangulu_smatrix *a) 13 | { 14 | for (pangulu_int64_t i = 0; i < a->nnz; i++) 15 | { 16 | a->value_csc[i] += a->value[i]; 17 | } 18 | } -------------------------------------------------------------------------------- /src/pangulu_addmatrix_cuda.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | void pangulu_add_pangulu_smatrix_cuda(pangulu_smatrix *a, 4 | pangulu_smatrix *b) 5 | { 6 | #ifdef GPU_OPEN 7 | pangulu_cuda_vector_add_kernel(a->nnz, a->cuda_value, b->cuda_value); 8 | #endif 9 | } -------------------------------------------------------------------------------- /src/pangulu_check.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | void pangulu_multiply_upper_upper_u(pangulu_block_common *block_common, 4 | pangulu_block_smatrix *block_smatrix, 5 | pangulu_vector *x, pangulu_vector *b) 6 | { 7 | pangulu_int64_t block_length = block_common->block_length; 8 | pangulu_int64_t nb = block_common->nb; 9 | pangulu_block_info_pool* BIP = block_smatrix->BIP; 10 | pangulu_smatrix *big_smatrix_value = block_smatrix->big_pangulu_smatrix_value; 11 | pangulu_smatrix **diagonal_U = block_smatrix->diagonal_smatrix_u; 12 | pangulu_int64_t *mapper_diagonal = block_smatrix->mapper_diagonal; 13 | if(block_smatrix->current_rank_block_count == 0){ 14 | return; 15 | } 16 | for (pangulu_int64_t row = 0; row < block_length; row++) 17 | { 18 | pangulu_int64_t row_offset = row * nb; 19 | for (pangulu_int64_t col = row; col < block_length; col++) 20 | { 21 | pangulu_int64_t mapper_index = pangulu_bip_get(row * block_length + col, BIP)->mapper_a; 22 | pangulu_int64_t col_offset = col * nb; 23 | if (row == col) 24 | { 25 | pangulu_int64_t diagonal_index = mapper_diagonal[row]; 26 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(diagonal_U[diagonal_index], 27 | x->value + col_offset, 28 | b->value + row_offset); 29 | if (rank == -1) 30 | { 31 | pangulu_display_pangulu_smatrix_csc(diagonal_U[diagonal_index]); 32 | } 33 | } 34 | else 35 | { 36 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(&big_smatrix_value[mapper_index], 37 | x->value + col_offset, 38 | b->value + row_offset); 39 | 40 | } 41 | } 42 | } 43 | } 44 | 45 | void pangulu_multiply_triggle_l(pangulu_block_common *block_common, 46 | pangulu_block_smatrix *block_smatrix, 47 | pangulu_vector *x, pangulu_vector *b) 48 | { 49 | pangulu_int64_t block_length = block_common->block_length; 50 | pangulu_int64_t nb = 
block_common->nb; 51 | pangulu_block_info_pool* BIP = block_smatrix->BIP; 52 | pangulu_smatrix *big_smatrix_value = block_smatrix->big_pangulu_smatrix_value; 53 | pangulu_smatrix **diagonal_L = block_smatrix->diagonal_smatrix_l; 54 | pangulu_int64_t *mapper_diagonal = block_smatrix->mapper_diagonal; 55 | if(block_smatrix->current_rank_block_count == 0){ 56 | return; 57 | } 58 | for (pangulu_int64_t row = 0; row < block_length; row++) 59 | { 60 | pangulu_int64_t row_offset = row * nb; 61 | for (pangulu_int64_t col = 0; col <= row; col++) 62 | { 63 | pangulu_int64_t mapper_index = pangulu_bip_get(row * block_length + col, BIP)->mapper_a; 64 | pangulu_int64_t col_offset = col * nb; 65 | if (row == col) 66 | { 67 | pangulu_int64_t diagonal_index = mapper_diagonal[col]; 68 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(diagonal_L[diagonal_index], 69 | x->value + col_offset, 70 | b->value + row_offset); 71 | } 72 | else 73 | { 74 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(&big_smatrix_value[mapper_index], 75 | x->value + col_offset, 76 | b->value + row_offset); 77 | } 78 | } 79 | } 80 | } 81 | 82 | void pangulu_gather_pangulu_vector_to_rank_0(pangulu_int64_t rank, 83 | pangulu_vector *gather_v, 84 | pangulu_int64_t vector_length, 85 | pangulu_int64_t sum_rank_size) 86 | { 87 | if (rank == 0) 88 | { 89 | pangulu_vector *save_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 90 | pangulu_init_pangulu_vector(save_vector, vector_length); 91 | 92 | for (pangulu_int64_t i = 1; i < sum_rank_size; i++) 93 | { 94 | pangulu_recv_pangulu_vector_value(save_vector, i, i, vector_length); 95 | for (pangulu_int64_t j = 0; j < vector_length; j++) 96 | { 97 | gather_v->value[j] += save_vector->value[j]; 98 | } 99 | } 100 | for (pangulu_int64_t i = 1; i < sum_rank_size; i++) 101 | { 102 | pangulu_send_pangulu_vector_value(gather_v, i, i, vector_length); 103 | } 104 | pangulu_free(__FILE__, __LINE__, save_vector->value); 105 | pangulu_free(__FILE__, __LINE__, save_vector); 106 | } 107 | else 108 | { 109 | pangulu_send_pangulu_vector_value(gather_v, 0, rank, vector_length); 110 | pangulu_recv_pangulu_vector_value(gather_v, 0, rank, vector_length); 111 | } 112 | } 113 | 114 | calculate_type vec2norm(const calculate_type *x, pangulu_int64_t n) 115 | { 116 | calculate_type sum = 0.0; 117 | for (pangulu_int64_t i = 0; i < n; i++) 118 | sum += x[i] * x[i]; 119 | return sqrt(sum); 120 | } 121 | 122 | calculate_type sub_vec2norm(const calculate_type *x1, const calculate_type *x2, pangulu_int64_t n) 123 | { 124 | calculate_type sum = 0.0; 125 | for (pangulu_int64_t i = 0; i < n; i++) 126 | sum += (x1[i] - x2[i]) * (x1[i] - x2[i]); 127 | return sqrt(sum); 128 | } 129 | 130 | void pangulu_check_answer_vec2norm(pangulu_vector *X1, pangulu_vector *X2, pangulu_int64_t n) 131 | { 132 | calculate_type vec2 = vec2norm(X1->value, n); 133 | double error = sub_vec2norm(X1->value, X2->value, n) / vec2; 134 | 135 | printf(PANGULU_I_VECT2NORM_ERR); 136 | if (fabs(error) < 1e-10) 137 | { 138 | printf(PANGULU_I_CHECK_PASS); 139 | } 140 | else 141 | { 142 | printf(PANGULU_I_CHECK_ERROR); 143 | } 144 | } 145 | 146 | void pangulu_check(pangulu_block_common *block_common, 147 | pangulu_block_smatrix *block_smatrix, 148 | pangulu_origin_smatrix *origin_smatrix) 149 | { 150 | pangulu_exblock_idx n = block_common->n; 151 | pangulu_inblock_idx nb = block_common->nb; 152 | pangulu_exblock_idx vector_length = ((n + nb - 1) / nb) * nb; 153 | pangulu_int32_t sum_rank_size = 
block_common->sum_rank_size; 154 | 155 | pangulu_vector *x = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 156 | pangulu_get_init_value_pangulu_vector(x, vector_length); 157 | pangulu_vector *b1 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 158 | pangulu_init_pangulu_vector(b1, vector_length); 159 | 160 | if (rank == 0) 161 | { 162 | pangulu_origin_smatrix_multiple_pangulu_vector_csr(origin_smatrix, x, b1); 163 | } 164 | 165 | pangulu_vector *b2 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 166 | pangulu_get_init_value_pangulu_vector(b2, vector_length); 167 | 168 | pangulu_vector *b3 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 169 | pangulu_init_pangulu_vector(b3, vector_length); 170 | 171 | pangulu_vector *b4 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector)); 172 | pangulu_init_pangulu_vector(b4, vector_length); 173 | pangulu_multiply_upper_upper_u(block_common, block_smatrix, b2, b3); 174 | pangulu_gather_pangulu_vector_to_rank_0(rank, b3, vector_length, sum_rank_size); 175 | pangulu_multiply_triggle_l(block_common, block_smatrix, b3, b4); 176 | pangulu_gather_pangulu_vector_to_rank_0(rank, b4, vector_length, sum_rank_size); 177 | if (rank == 0) 178 | { 179 | // pangulu_check_answer(b1, b4, n); 180 | pangulu_check_answer_vec2norm(b1, b4, n); 181 | } 182 | 183 | pangulu_destroy_pangulu_vector(x); 184 | pangulu_destroy_pangulu_vector(b1); 185 | pangulu_destroy_pangulu_vector(b2); 186 | pangulu_destroy_pangulu_vector(b3); 187 | pangulu_destroy_pangulu_vector(b4); 188 | } 189 | 190 | long double max_check_ld(long double* x, int n) 191 | { 192 | long double max = 0.0L; 193 | for (int i = 0; i < n; i++) { 194 | long double x_fabs = fabsl(x[i]); 195 | max = max > x_fabs ? 
max : x_fabs; 196 | } 197 | return max; 198 | } 199 | 200 | 201 | // Multiply a CSR matrix with a vector x to get the resulting vector y; 202 | // the per-row sums use Kahan (compensated) summation. 203 | void spmv_ld(int n, const pangulu_int64_t* row_ptr, const pangulu_int32_t* col_idx, const long double* val, const long double* x, long double* y) 204 | { 205 | for (int i = 0; i < n; i++) { 206 | y[i] = 0.0; 207 | long double c = 0.0; 208 | for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++) { 209 | long double num = val[j] * x[col_idx[j]]; 210 | long double z = num - c; 211 | long double t = y[i] + z; 212 | c = (t - y[i]) - z; 213 | y[i] = t; 214 | } 215 | } 216 | } 217 | -------------------------------------------------------------------------------- /src/pangulu_cuda_interface.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | #ifdef GPU_OPEN 4 | void pangulu_cuda_device_init(pangulu_int32_t rank) 5 | { 6 | pangulu_int32_t gpu_num; 7 | pangulu_cuda_getdevicenum(&gpu_num); 8 | pangulu_int32_t usr_id = pangulu_cuda_setdevice(gpu_num, rank); 9 | struct cudaDeviceProp prop; 10 | cudaGetDeviceProperties(&prop, usr_id); 11 | if (rank == 0) 12 | printf(PANGULU_I_DEV_IS); 13 | } 14 | 15 | void pangulu_cuda_device_init_thread(pangulu_int32_t rank) 16 | { 17 | pangulu_int32_t gpu_num; 18 | pangulu_cuda_getdevicenum(&gpu_num); 19 | pangulu_cuda_setdevice(gpu_num, rank); 20 | } 21 | 22 | void pangulu_cuda_free_interface(void *cuda_address) 23 | { 24 | pangulu_cuda_free(cuda_address); 25 | } 26 | 27 | void pangulu_smatrix_add_cuda_memory(pangulu_smatrix *s) 28 | { 29 | pangulu_cuda_malloc((void **)&(s->cuda_rowpointer), ((s->row) + 1) * sizeof(pangulu_int64_t)); 30 | pangulu_cuda_malloc((void **)&(s->cuda_columnindex), (s->nnz) * sizeof(pangulu_inblock_idx)); 31 | pangulu_cuda_malloc((void **)&(s->cuda_value), (s->nnz) * sizeof(calculate_type)); 32 | pangulu_cuda_malloc((void **)&(s->cuda_bin_rowpointer), BIN_LENGTH * sizeof(pangulu_int64_t)); 33 | pangulu_cuda_malloc((void **)&(s->cuda_bin_rowindex), (s->row) * sizeof(pangulu_inblock_idx)); 34 | } 35 | 36 | void pangulu_smatrix_cuda_memory_init(pangulu_smatrix *s, pangulu_int64_t nb, pangulu_int64_t nnz) 37 | { 38 | s->row = nb; 39 | s->column = nb; 40 | s->nnz = nnz; 41 | pangulu_cuda_malloc((void **)&(s->cuda_rowpointer), (nb + 1) * sizeof(pangulu_int64_t)); 42 | pangulu_cuda_malloc((void **)&(s->cuda_columnindex), nnz * sizeof(pangulu_inblock_idx)); 43 | pangulu_cuda_malloc((void **)&(s->cuda_value), nnz * sizeof(calculate_type)); 44 | } 45 | 46 | void pangulu_smatrix_add_cuda_memory_u(pangulu_smatrix *u) 47 | { 48 | pangulu_cuda_malloc((void **)&(u->cuda_nnzu), (u->row) * sizeof(int)); 49 | } 50 | 51 | void pangulu_smatrix_cuda_memcpy_a(pangulu_smatrix *s) 52 | { 53 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(s->cuda_rowpointer, s->columnpointer, (s->row) + 1); 54 | pangulu_cuda_memcpy_host_to_device_inblock_idx(s->cuda_columnindex, s->rowindex, s->nnz); 55 | pangulu_cuda_memcpy_host_to_device_value(s->cuda_value, s->value_csc, s->nnz); 56 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(s->cuda_bin_rowpointer, s->bin_rowpointer, BIN_LENGTH); 57 | pangulu_cuda_memcpy_host_to_device_inblock_idx(s->cuda_bin_rowindex, s->bin_rowindex, s->row); 58 | } 59 | 60 | void pangulu_smatrix_cuda_memcpy_struct_csr(pangulu_smatrix *calculate_s, pangulu_smatrix *s) 61 | { 62 | calculate_s->nnz = s->nnz; 63 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->rowpointer, (s->row) + 1); 64 
| pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->columnindex, s->nnz); 65 | } 66 | 67 | void pangulu_smatrix_cuda_memcpy_struct_csc(pangulu_smatrix *calculate_s, pangulu_smatrix *s) 68 | { 69 | calculate_s->nnz = s->nnz; 70 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->columnpointer, (s->row) + 1); 71 | pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->rowindex, s->nnz); 72 | } 73 | 74 | void pangulu_smatrix_cuda_memcpy_complete_csr(pangulu_smatrix *calculate_s, pangulu_smatrix *s) 75 | { 76 | calculate_s->nnz = s->nnz; 77 | #ifdef check_time 78 | struct timeval get_time_start; 79 | pangulu_time_check_begin(&get_time_start); 80 | #endif 81 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->rowpointer, (s->row) + 1); 82 | pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->columnindex, s->nnz); 83 | pangulu_cuda_memcpy_host_to_device_value(calculate_s->cuda_value, s->value, s->nnz); 84 | #ifdef check_time 85 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 86 | #endif 87 | } 88 | 89 | void pangulu_smatrix_cuda_memcpy_nnzu(pangulu_smatrix *calculate_u, pangulu_smatrix *u) 90 | { 91 | pangulu_cuda_memcpy_host_to_device_int32(calculate_u->cuda_nnzu, u->nnzu, calculate_u->row); 92 | } 93 | 94 | void pangulu_smatrix_cuda_memcpy_value_csr(pangulu_smatrix *s, pangulu_smatrix *calculate_s) 95 | { 96 | #ifdef check_time 97 | struct timeval get_time_start; 98 | pangulu_time_check_begin(&get_time_start); 99 | #endif 100 | pangulu_cuda_memcpy_device_to_host_value(s->value, calculate_s->cuda_value, s->nnz); 101 | 102 | #ifdef check_time 103 | 104 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 105 | #endif 106 | } 107 | 108 | void pangulu_smatrix_cuda_memcpy_value_csr_async(pangulu_smatrix *s, pangulu_smatrix *calculate_s, cudaStream_t *stream) 109 | { 110 | #ifdef check_time 111 | struct timeval get_time_start; 112 | pangulu_time_check_begin(&get_time_start); 113 | #endif 114 | pangulu_cudamemcpyasync_device_to_host(s->value, calculate_s->cuda_value, (s->nnz) * sizeof(calculate_type), stream); 115 | 116 | #ifdef check_time 117 | 118 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 119 | #endif 120 | } 121 | 122 | void pangulu_smatrix_cuda_memcpy_value_csc(pangulu_smatrix *s, pangulu_smatrix *calculate_s) 123 | { 124 | #ifdef check_time 125 | struct timeval get_time_start; 126 | pangulu_time_check_begin(&get_time_start); 127 | #endif 128 | pangulu_cuda_memcpy_device_to_host_value(s->value_csc, calculate_s->cuda_value, s->nnz); 129 | #ifdef check_time 130 | 131 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 132 | #endif 133 | } 134 | 135 | void pangulu_smatrix_cuda_memcpy_value_csc_async(pangulu_smatrix *s, pangulu_smatrix *calculate_s, cudaStream_t *stream) 136 | { 137 | #ifdef check_time 138 | struct timeval get_time_start; 139 | pangulu_time_check_begin(&get_time_start); 140 | #endif 141 | pangulu_cudamemcpyasync_device_to_host(s->value_csc, calculate_s->cuda_value, (s->nnz) * sizeof(calculate_type), stream); 142 | #ifdef check_time 143 | 144 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 145 | #endif 146 | } 147 | 148 | void pangulu_smatrix_cuda_memcpy_value_csc_cal_length(pangulu_smatrix *s, pangulu_smatrix *calculate_s) 149 | { 150 | 151 | pangulu_cuda_memcpy_device_to_host_value(s->value_csc, calculate_s->cuda_value, calculate_s->nnz); 152 | } 153 | 154 | void 
pangulu_smatrix_cuda_memcpy_to_device_value_csc_async(pangulu_smatrix *calculate_s, pangulu_smatrix *s, cudaStream_t *stream) 155 | { 156 | #ifdef check_time 157 | struct timeval get_time_start; 158 | pangulu_time_check_begin(&get_time_start); 159 | #endif 160 | 161 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_value, s->value_csc, sizeof(calculate_type) * (s->nnz), stream); 162 | 163 | #ifdef check_time 164 | 165 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 166 | #endif 167 | } 168 | 169 | void pangulu_smatrix_cuda_memcpy_to_device_value_csc(pangulu_smatrix *calculate_s, pangulu_smatrix *s) 170 | { 171 | #ifdef check_time 172 | struct timeval get_time_start; 173 | pangulu_time_check_begin(&get_time_start); 174 | #endif 175 | 176 | pangulu_cuda_memcpy_host_to_device_value(calculate_s->cuda_value, s->value_csc, s->nnz); 177 | 178 | #ifdef check_time 179 | 180 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 181 | #endif 182 | } 183 | 184 | void pangulu_smatrix_cuda_memcpy_complete_csr_async(pangulu_smatrix *calculate_s, pangulu_smatrix *s, cudaStream_t *stream) 185 | { 186 | calculate_s->nnz = s->nnz; 187 | #ifdef check_time 188 | struct timeval get_time_start; 189 | pangulu_time_check_begin(&get_time_start); 190 | #endif 191 | 192 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_rowpointer, s->rowpointer, sizeof(pangulu_int64_t) * ((s->row) + 1), stream); 193 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_columnindex, s->columnindex, sizeof(pangulu_int32_t) * s->nnz, stream); 194 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_value, s->value, sizeof(calculate_type) * s->nnz, stream); 195 | 196 | #ifdef check_time 197 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 198 | #endif 199 | } 200 | 201 | void pangulu_smatrix_cuda_memcpy_complete_csc_async(pangulu_smatrix *calculate_s, pangulu_smatrix *s, cudaStream_t *stream) 202 | { 203 | calculate_s->nnz = s->nnz; 204 | #ifdef check_time 205 | struct timeval get_time_start; 206 | pangulu_time_check_begin(&get_time_start); 207 | #endif 208 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_rowpointer, s->columnpointer, sizeof(pangulu_int64_t) * ((s->row) + 1), stream); 209 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_columnindex, s->rowindex, sizeof(pangulu_int32_t) * s->nnz, stream); 210 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_value, s->value_csc, sizeof(calculate_type) * s->nnz, stream); 211 | 212 | #ifdef check_time 213 | 214 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 215 | #endif 216 | } 217 | 218 | void pangulu_smatrix_cuda_memcpy_complete_csc(pangulu_smatrix *calculate_s, pangulu_smatrix *s) 219 | { 220 | calculate_s->nnz = s->nnz; 221 | #ifdef check_time 222 | struct timeval get_time_start; 223 | pangulu_time_check_begin(&get_time_start); 224 | #endif 225 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->columnpointer, (s->row) + 1); 226 | pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->rowindex, s->nnz); 227 | pangulu_cuda_memcpy_host_to_device_value(calculate_s->cuda_value, s->value_csc, s->nnz); 228 | #ifdef check_time 229 | 230 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start); 231 | #endif 232 | } 233 | #endif -------------------------------------------------------------------------------- /src/pangulu_gessm_fp64.c: -------------------------------------------------------------------------------- 1 | 
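/*
 * pangulu_gessm_fp64.c implements the GESSM step on the CPU: solving
 * L * X = A for X, where L is the unit lower triangular factor of a diagonal
 * block and A is a sparse right-hand-side block. As a reference point, for a
 * dense right-hand-side column b, forward substitution is (a sketch; it
 * assumes the diagonal entry is the first nonzero of each column of L, which
 * matches the kernels below starting their inner loops at
 * l_columnpointer[idx] + 1):
 *
 *   for (pangulu_int64_t j = 0; j < n; j++)
 *       for (pangulu_int64_t k = l_colptr[j] + 1; k < l_colptr[j + 1]; k++)
 *           b[l_rowidx[k]] -= l_value[k] * b[j];
 *
 * The cpu_1..cpu_6 variants below trade dense scratch buffers against purely
 * sparse index matching; cpu_2 additionally builds level sets (findlevel) so
 * that independent rows can be eliminated in parallel.
 */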
#include "pangulu_common.h" 2 | void pangulu_gessm_fp64_cpu_1(pangulu_smatrix *a, 3 | pangulu_smatrix *l, 4 | pangulu_smatrix *x) 5 | { 6 | 7 | pangulu_inblock_ptr *a_rowpointer = a->rowpointer; 8 | pangulu_inblock_idx *a_colindex = a->columnindex; 9 | calculate_type *a_value = x->value; 10 | 11 | pangulu_inblock_ptr *l_colpointer = l->columnpointer; 12 | pangulu_inblock_idx *l_rowindex = l->rowindex; 13 | calculate_type *l_value = l->value_csc; 14 | 15 | pangulu_inblock_ptr *x_rowpointer = a->rowpointer; 16 | pangulu_inblock_idx *x_colindex = a->columnindex; 17 | calculate_type *x_value = a->value; 18 | 19 | pangulu_int64_t n = a->row; 20 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 21 | for (pangulu_int64_t i = 0; i < a->nnz; i++) 22 | { 23 | x_value[i] = 0.0; 24 | } 25 | 26 | for (pangulu_int64_t i = 0; i < n; i++) 27 | { 28 | // x get value from a 29 | for (pangulu_int64_t k = x_rowpointer[i]; k < x_rowpointer[i + 1]; k++) 30 | { 31 | x_value[k] = a_value[k]; 32 | } 33 | // update Value 34 | if (x_rowpointer[i] != x_rowpointer[i + 1]) 35 | { 36 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 37 | for (pangulu_int64_t j = l_colpointer[i]; j < l_colpointer[i + 1]; j++) 38 | { 39 | 40 | for (pangulu_int64_t p = a_rowpointer[l_rowindex[j]], k = x_rowpointer[i]; p < a_rowpointer[l_rowindex[j] + 1]; p++, k++) 41 | { 42 | if (a_colindex[p] == x_colindex[k]) 43 | { 44 | a_value[p] -= l_value[j] * x_value[k]; 45 | } 46 | else 47 | { 48 | k--; 49 | } 50 | } 51 | } 52 | } 53 | } 54 | } 55 | 56 | void pangulu_gessm_fp64_cpu_2(pangulu_smatrix *a, 57 | pangulu_smatrix *l, 58 | pangulu_smatrix *x) 59 | { 60 | 61 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer; 62 | pangulu_inblock_idx *a_rowidx = a->rowindex; 63 | 64 | calculate_type *a_value = a->value_csc; 65 | 66 | pangulu_inblock_ptr *l_rowpointer = l->rowpointer; 67 | 68 | pangulu_inblock_ptr *l_colpointer = l->columnpointer; 69 | pangulu_inblock_idx *l_rowindex = l->rowindex; 70 | calculate_type *l_value = l->value_csc; 71 | 72 | pangulu_int64_t n = a->row; 73 | 74 | pangulu_int64_t *spointer = (pangulu_int64_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_int64_t) * (n + 1)); 75 | memset(spointer, 0, sizeof(pangulu_int64_t) * (n + 1)); 76 | int rhs = 0; 77 | for (pangulu_int64_t i = 0; i < n; i++) 78 | { 79 | if (a_columnpointer[i] != a_columnpointer[i + 1]) 80 | { 81 | spointer[rhs] = i; 82 | rhs++; 83 | } 84 | } 85 | 86 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs); 87 | calculate_type *D_x = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs); 88 | 89 | memset(C_b, 0.0, sizeof(calculate_type) * n * rhs); 90 | memset(D_x, 0.0, sizeof(calculate_type) * n * rhs); 91 | 92 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 93 | for (int i = 0; i < rhs; i++) 94 | { 95 | int index = spointer[i]; 96 | for (int j = a_columnpointer[index]; j < a_columnpointer[index + 1]; j++) 97 | { 98 | C_b[i * n + a_rowidx[j]] = a_value[j]; 99 | } 100 | } 101 | 102 | int nlevel = 0; 103 | int *levelPtr = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * (n + 1)); 104 | int *levelItem = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * n); 105 | findlevel(l_colpointer, l_rowindex, l_rowpointer, n, &nlevel, levelPtr, levelItem); 106 | 107 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 108 | for (int i = 0; i < rhs; i++) 109 | { 110 | for (int li = 0; li < nlevel; li++) 111 | { 112 | 
113 | for (int ri = levelPtr[li]; ri < levelPtr[li + 1]; ri++) 114 | { 115 | for (int j = l_colpointer[levelItem[ri]] + 1; j < l_colpointer[levelItem[ri] + 1]; j++) 116 | { 117 | C_b[i * n + l_rowindex[j]] -= l_value[j] * C_b[i * n + levelItem[ri]]; 118 | } 119 | } 120 | } 121 | } 122 | 123 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 124 | for (int i = 0; i < rhs; i++) 125 | { 126 | int index = spointer[i]; 127 | for (int j = a_columnpointer[index]; j < a_columnpointer[index + 1]; j++) 128 | { 129 | a_value[j] = C_b[i * n + a_rowidx[j]]; 130 | } 131 | } 132 | 133 | pangulu_free(__FILE__, __LINE__, spointer); 134 | pangulu_free(__FILE__, __LINE__, C_b); 135 | pangulu_free(__FILE__, __LINE__, D_x); 136 | } 137 | 138 | void pangulu_gessm_fp64_cpu_3(pangulu_smatrix *a, 139 | pangulu_smatrix *l, 140 | pangulu_smatrix *x) 141 | { 142 | 143 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer; 144 | pangulu_inblock_idx *a_rowidx = a->rowindex; 145 | 146 | calculate_type *a_value = a->value_csc; 147 | 148 | pangulu_inblock_ptr *l_columnpointer = l->columnpointer; 149 | pangulu_inblock_idx *l_rowidx = l->rowindex; 150 | calculate_type *l_value = l->value_csc; 151 | 152 | pangulu_int64_t n = a->row; 153 | 154 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * n); 155 | 156 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 157 | for (int i = 0; i < n; i++) 158 | { 159 | for (int j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++) 160 | { 161 | int idx = a_rowidx[j]; 162 | C_b[i * n + idx] = a_value[j]; 163 | } 164 | } 165 | 166 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 167 | for (pangulu_int64_t i = 0; i < n; i++) 168 | { 169 | for (pangulu_int64_t j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++) 170 | { 171 | pangulu_inblock_idx idx = a_rowidx[j]; 172 | for (pangulu_int64_t k = l_columnpointer[idx] + 1; k < l_columnpointer[idx + 1]; k++) 173 | { 174 | C_b[i * n + l_rowidx[k]] -= l_value[k] * C_b[i * n + a_rowidx[j]]; 175 | } 176 | } 177 | } 178 | 179 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 180 | for (int i = 0; i < n; i++) 181 | { 182 | for (int j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++) 183 | { 184 | int idx = a_rowidx[j]; 185 | a_value[j] = C_b[i * n + idx]; 186 | } 187 | } 188 | pangulu_free(__FILE__, __LINE__, C_b); 189 | } 190 | 191 | void pangulu_gessm_fp64_cpu_4(pangulu_smatrix *a, 192 | pangulu_smatrix *l, 193 | pangulu_smatrix *x) 194 | { 195 | 196 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer; 197 | pangulu_inblock_idx *a_rowidx = a->rowindex; 198 | 199 | calculate_type *a_value = a->value_csc; 200 | 201 | pangulu_inblock_ptr *l_columnpointer = l->columnpointer; 202 | pangulu_inblock_idx *l_rowidx = l->rowindex; 203 | calculate_type *l_value = l->value_csc; 204 | 205 | pangulu_int64_t n = a->row; 206 | 207 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 208 | for (pangulu_int64_t i = 0; i < n; i++) 209 | { 210 | for (pangulu_int64_t j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++) 211 | { 212 | pangulu_inblock_idx idx = a_rowidx[j]; 213 | for (pangulu_int64_t k = l_columnpointer[idx] + 1, p = j + 1; k < l_columnpointer[idx + 1] && p < a_columnpointer[i + 1]; k++, p++) 214 | { 215 | if (l_rowidx[k] == a_rowidx[p]) 216 | { 217 | a_value[p] -= l_value[k] * a_value[j]; 218 | } 219 | else 220 | { 221 | k--; 222 | } 223 | } 224 | } 225 | } 226 | } 227 | 228 | void pangulu_gessm_fp64_cpu_5(pangulu_smatrix 
*a, 229 | pangulu_smatrix *l, 230 | pangulu_smatrix *x) 231 | { 232 | 233 | pangulu_inblock_ptr *a_rowpointer = a->rowpointer; 234 | pangulu_inblock_idx *a_colindex = a->columnindex; 235 | calculate_type *a_value = x->value; 236 | 237 | pangulu_inblock_ptr *l_colpointer = l->columnpointer; 238 | pangulu_inblock_idx *l_rowindex = l->rowindex; 239 | calculate_type *l_value = l->value_csc; 240 | 241 | pangulu_inblock_ptr *x_rowpointer = a->rowpointer; 242 | pangulu_inblock_idx *x_colindex = a->columnindex; 243 | calculate_type *x_value = a->value; 244 | 245 | pangulu_int64_t n = a->row; 246 | 247 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 248 | for (int i = 0; i < n; i++) // i-th row of a 249 | { 250 | for (int j = a_rowpointer[i]; j < a_rowpointer[i + 1]; j++) 251 | { 252 | pangulu_inblock_idx idx = a_colindex[j]; 253 | temp_a_value[i * n + idx] = a_value[j]; // transform CSR values to a dense buffer 254 | } 255 | } 256 | 257 | for (pangulu_int64_t i = 0; i < n; i++) 258 | { 259 | // x get value from a 260 | for (pangulu_int64_t k = x_rowpointer[i]; k < x_rowpointer[i + 1]; k++) 261 | { 262 | x_value[k] = temp_a_value[i * n + x_colindex[k]]; 263 | } 264 | // update Value 265 | if (x_rowpointer[i] != x_rowpointer[i + 1]) 266 | { 267 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 268 | for (pangulu_int64_t j = l_colpointer[i] + 1; j < l_colpointer[i + 1]; j++) 269 | { 270 | pangulu_inblock_idx idx1 = l_rowindex[j]; 271 | 272 | for (pangulu_int64_t p = x_rowpointer[i]; p < x_rowpointer[i + 1]; p++) 273 | { 274 | 275 | pangulu_inblock_idx idx2 = a_colindex[p]; 276 | temp_a_value[idx1 * n + idx2] -= l_value[j] * temp_a_value[i * n + idx2]; 277 | } 278 | } 279 | } 280 | } 281 | } 282 | 283 | void pangulu_gessm_fp64_cpu_6(pangulu_smatrix *a, 284 | pangulu_smatrix *l, 285 | pangulu_smatrix *x) 286 | { 287 | 288 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer; 289 | pangulu_inblock_idx *a_rowidx = a->rowindex; 290 | 291 | calculate_type *a_value = a->value_csc; 292 | 293 | pangulu_inblock_ptr *l_columnpointer = l->columnpointer; 294 | pangulu_inblock_idx *l_rowidx = l->rowindex; 295 | calculate_type *l_value = l->value_csc; 296 | 297 | pangulu_int64_t n = a->row; 298 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 299 | for (int i = 0; i < n; i++) // i-th column of a 300 | { 301 | for (int j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++) 302 | { 303 | int idx = a_rowidx[j]; 304 | temp_a_value[i * n + idx] = a_value[j]; // transform CSC values to a dense buffer 305 | } 306 | } 307 | 308 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 309 | for (pangulu_int64_t i = 0; i < n; i++) 310 | { 311 | for (pangulu_int64_t j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++) 312 | { 313 | pangulu_inblock_idx idx = a_rowidx[j]; 314 | a_value[j] = temp_a_value[i * n + idx]; 315 | for (pangulu_int64_t k = l_columnpointer[idx] + 1; k < l_columnpointer[idx + 1]; k++) 316 | { 317 | temp_a_value[i * n + l_rowidx[k]] -= l_value[k] * a_value[j]; 318 | } 319 | } 320 | } 321 | } 322 | 323 | int findlevel(const pangulu_inblock_ptr *cscColPtr, 324 | const pangulu_inblock_idx *cscRowIdx, 325 | const pangulu_inblock_ptr *csrRowPtr, 326 | const pangulu_int64_t m, 327 | int *nlevel, 328 | int *levelPtr, 329 | int *levelItem) 330 | { 331 | int *indegree = (int *)pangulu_malloc(__FILE__, __LINE__, m * sizeof(int)); 332 | 333 | for (int i = 0; i < m; i++) 334 | { 335 | indegree[i] = csrRowPtr[i + 1] - csrRowPtr[i]; 336 | } 337 | 338 | int ptr = 0; 339 | 340 | 
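/* indegree[i] counts the nonzeros in row i of L (CSR), i.e. the diagonal plus
   any not-yet-eliminated dependencies; rows with indegree == 1 depend only on
   their own diagonal and seed level 0. */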
levelPtr[0] = 0; 341 | for (int i = 0; i < m; i++) 342 | { 343 | if (indegree[i] == 1) 344 | { 345 | levelItem[ptr] = i; 346 | ptr++; 347 | } 348 | } 349 | 350 | levelPtr[1] = ptr; 351 | 352 | int lvi = 1; 353 | while (levelPtr[lvi] != m) 354 | { 355 | for (pangulu_int64_t i = levelPtr[lvi - 1]; i < levelPtr[lvi]; i++) 356 | { 357 | int node = levelItem[i]; 358 | for (pangulu_int64_t j = cscColPtr[node]; j < cscColPtr[node + 1]; j++) 359 | { 360 | pangulu_inblock_idx visit_node = cscRowIdx[j]; 361 | indegree[visit_node]--; 362 | if (indegree[visit_node] == 1) 363 | { 364 | levelItem[ptr] = visit_node; 365 | ptr++; 366 | } 367 | } 368 | } 369 | lvi++; 370 | levelPtr[lvi] = ptr; 371 | } 372 | 373 | *nlevel = lvi; 374 | 375 | pangulu_free(__FILE__, __LINE__, indegree); 376 | 377 | return 0; 378 | } 379 | 380 | void pangulu_gessm_interface_cpu_csc(pangulu_smatrix *a, 381 | pangulu_smatrix *l, 382 | pangulu_smatrix *x) 383 | { 384 | pangulu_gessm_fp64_cpu_4(a, l, x); 385 | } 386 | 387 | void pangulu_gessm_interface_cpu_csr(pangulu_smatrix *a, 388 | pangulu_smatrix *l, 389 | pangulu_smatrix *x) 390 | { 391 | #ifdef OUTPUT_MATRICES 392 | char out_name_B[512]; 393 | char out_name_L[512]; 394 | sprintf(out_name_B, "%s/%s/%d%s", OUTPUT_FILE, "gessm", gessm_number, "_gessm_B.cbd"); 395 | sprintf(out_name_L, "%s/%s/%d%s", OUTPUT_FILE, "gessm", gessm_number, "_gessm_L.cbd"); 396 | pangulu_binary_write_csc_pangulu_smatrix(a, out_name_B); 397 | pangulu_binary_write_csc_pangulu_smatrix(l, out_name_L); 398 | gessm_number++; 399 | #endif 400 | 401 | pangulu_gessm_fp64_cpu_1(a, l, x); 402 | } 403 | void pangulu_gessm_interface_c_v1(pangulu_smatrix *a, 404 | pangulu_smatrix *l, 405 | pangulu_smatrix *x) 406 | { 407 | #ifdef GPU_OPEN 408 | pangulu_smatrix_cuda_memcpy_value_csc(a, a); 409 | #endif 410 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(x, a); 411 | pangulu_gessm_fp64_cpu_4(a, l, x); 412 | #ifdef GPU_OPEN 413 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a); 414 | #endif 415 | } 416 | void pangulu_gessm_interface_c_v2(pangulu_smatrix *a, 417 | pangulu_smatrix *l, 418 | pangulu_smatrix *x) 419 | { 420 | #ifdef GPU_OPEN 421 | pangulu_smatrix_cuda_memcpy_value_csc(a, a); 422 | #endif 423 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(x, a); 424 | pangulu_gessm_fp64_cpu_6(a, l, x); 425 | #ifdef GPU_OPEN 426 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a); 427 | #endif 428 | } -------------------------------------------------------------------------------- /src/pangulu_gessm_fp64_cuda.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | #ifdef GPU_OPEN 4 | void pangulu_gessm_fp64_cuda_v9(pangulu_smatrix *a, 5 | pangulu_smatrix *l, 6 | pangulu_smatrix *x) 7 | { 8 | 9 | pangulu_int64_t n = a->row; 10 | pangulu_int64_t nnzl = l->nnz; 11 | pangulu_int64_t nnza = a->nnz; 12 | 13 | int *d_graphindegree = l->d_graphindegree; 14 | cudaMemcpy(d_graphindegree, l->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice); 15 | int *d_id_extractor = l->d_id_extractor; 16 | cudaMemset(d_id_extractor, 0, sizeof(int)); 17 | 18 | int *d_while_profiler; 19 | cudaMalloc((void **)&d_while_profiler, sizeof(int) * n); 20 | cudaMemset(d_while_profiler, 0, sizeof(int) * n); 21 | pangulu_int64_t *spointer = (pangulu_int64_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_int64_t) * (n + 1)); 22 | memset(spointer, 0, sizeof(pangulu_int64_t) * (n + 1)); 23 | pangulu_int64_t rhs = 0; 24 | for (int i = 0; i < n; i++) 25 | { 26 | 
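/* Only the nonempty columns of a carry a right-hand side; rhs counts them so
   that the dense device buffers below (d_left_sum, d_x, d_b) can be sized
   n * rhs rather than n * n. */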
if (a->columnpointer[i] != a->columnpointer[i + 1]) 27 | { 28 | spointer[rhs] = i; 29 | rhs++; 30 | } 31 | } 32 | calculate_type *d_left_sum; 33 | cudaMalloc((void **)&d_left_sum, n * rhs * sizeof(calculate_type)); 34 | cudaMemset(d_left_sum, 0, n * rhs * sizeof(calculate_type)); 35 | 36 | calculate_type *d_x, *d_b; 37 | cudaMalloc((void **)&d_x, n * rhs * sizeof(calculate_type)); 38 | cudaMalloc((void **)&d_b, n * rhs * sizeof(calculate_type)); 39 | cudaMemset(d_x, 0, n * rhs * sizeof(calculate_type)); 40 | cudaMemset(d_b, 0, n * rhs * sizeof(calculate_type)); 41 | 42 | pangulu_inblock_ptr *d_spointer; 43 | cudaMalloc((void **)&d_spointer, sizeof(pangulu_inblock_ptr) * (n + 1)); 44 | cudaMemset(d_spointer, 0, sizeof(pangulu_inblock_ptr) * (n + 1)); 45 | cudaMemcpy(d_spointer, spointer, sizeof(pangulu_inblock_ptr) * (n + 1), cudaMemcpyHostToDevice); 46 | 47 | pangulu_gessm_cuda_kernel_v9(n, 48 | nnzl, 49 | rhs, 50 | nnza, 51 | d_spointer, 52 | d_graphindegree, 53 | d_id_extractor, 54 | d_while_profiler, 55 | l->cuda_rowpointer, 56 | l->cuda_columnindex, 57 | l->cuda_value, 58 | a->cuda_rowpointer, 59 | a->cuda_columnindex, 60 | x->cuda_value, 61 | a->cuda_rowpointer, 62 | a->cuda_columnindex, 63 | a->cuda_value, 64 | d_left_sum, 65 | d_x, 66 | d_b); 67 | 68 | cudaFree(d_x); 69 | cudaFree(d_b); 70 | cudaFree(d_left_sum); 71 | cudaFree(d_while_profiler); 72 | } 73 | 74 | void pangulu_gessm_fp64_cuda_v11(pangulu_smatrix *a, 75 | pangulu_smatrix *l, 76 | pangulu_smatrix *x) 77 | { 78 | pangulu_int64_t n = a->row; 79 | pangulu_int64_t nnzl = l->nnz; 80 | pangulu_int64_t nnza = a->nnz; 81 | /**********************************l****************************************/ 82 | int *d_graphindegree = l->d_graphindegree; 83 | cudaMemcpy(d_graphindegree, l->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice); 84 | int *d_id_extractor = l->d_id_extractor; 85 | cudaMemset(d_id_extractor, 0, sizeof(int)); 86 | 87 | calculate_type *d_left_sum = a->d_left_sum; 88 | cudaMemset(d_left_sum, 0, nnza * sizeof(calculate_type)); 89 | /*****************************************************************************/ 90 | pangulu_gessm_cuda_kernel_v11(n, 91 | nnzl, 92 | nnza, 93 | d_graphindegree, 94 | d_id_extractor, 95 | d_left_sum, 96 | l->cuda_rowpointer, 97 | l->cuda_columnindex, 98 | l->cuda_value, 99 | a->cuda_rowpointer, 100 | a->cuda_columnindex, 101 | x->cuda_value, 102 | a->cuda_rowpointer, 103 | a->cuda_columnindex, 104 | a->cuda_value); 105 | cudaDeviceSynchronize(); 106 | } 107 | 108 | void pangulu_gessm_fp64_cuda_v7(pangulu_smatrix *a, 109 | pangulu_smatrix *l, 110 | pangulu_smatrix *x) 111 | { 112 | 113 | pangulu_int64_t n = a->row; 114 | pangulu_int64_t nnzl = l->nnz; 115 | pangulu_gessm_cuda_kernel_v7(n, 116 | nnzl, 117 | l->cuda_rowpointer, 118 | l->cuda_columnindex, 119 | l->cuda_value, 120 | a->cuda_rowpointer, 121 | a->cuda_columnindex, 122 | x->cuda_value, 123 | a->cuda_rowpointer, 124 | a->cuda_columnindex, 125 | a->cuda_value); 126 | } 127 | 128 | void pangulu_gessm_fp64_cuda_v8(pangulu_smatrix *a, 129 | pangulu_smatrix *l, 130 | pangulu_smatrix *x) 131 | { 132 | pangulu_int64_t n = a->row; 133 | pangulu_int64_t nnzl = l->nnz; 134 | pangulu_int64_t nnza = a->nnz; 135 | /**********************************l****************************************/ 136 | int *d_graphindegree = l->d_graphindegree; 137 | cudaMemcpy(d_graphindegree, l->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice); 138 | int *d_id_extractor = l->d_id_extractor; 139 | cudaMemset(d_id_extractor, 0, sizeof(int)); 140 | 141 
| calculate_type *d_left_sum = a->d_left_sum; 142 | cudaMemset(d_left_sum, 0, nnza * sizeof(calculate_type)); 143 | /*****************************************************************************/ 144 | pangulu_gessm_cuda_kernel_v8(n, 145 | nnzl, 146 | nnza, 147 | d_graphindegree, 148 | d_id_extractor, 149 | d_left_sum, 150 | l->cuda_rowpointer, 151 | l->cuda_columnindex, 152 | l->cuda_value, 153 | a->cuda_rowpointer, 154 | a->cuda_columnindex, 155 | x->cuda_value, 156 | a->cuda_rowpointer, 157 | a->cuda_columnindex, 158 | a->cuda_value); 159 | cudaDeviceSynchronize(); 160 | } 161 | 162 | void pangulu_gessm_fp64_cuda_v10(pangulu_smatrix *a, 163 | pangulu_smatrix *l, 164 | pangulu_smatrix *x) 165 | { 166 | 167 | pangulu_int64_t n = a->row; 168 | pangulu_int64_t nnzl = l->nnz; 169 | pangulu_gessm_cuda_kernel_v10(n, 170 | nnzl, 171 | l->cuda_rowpointer, 172 | l->cuda_columnindex, 173 | l->cuda_value, 174 | a->cuda_rowpointer, 175 | a->cuda_columnindex, 176 | x->cuda_value, 177 | a->cuda_rowpointer, 178 | a->cuda_columnindex, 179 | a->cuda_value); 180 | } 181 | 182 | void pangulu_gessm_interface_g_v1(pangulu_smatrix *a, 183 | pangulu_smatrix *l, 184 | pangulu_smatrix *x) 185 | { 186 | pangulu_gessm_fp64_cuda_v7(a, l, x); 187 | pangulu_smatrix_cuda_memcpy_value_csc(a, x); 188 | } 189 | void pangulu_gessm_interface_g_v2(pangulu_smatrix *a, 190 | pangulu_smatrix *l, 191 | pangulu_smatrix *x) 192 | { 193 | pangulu_smatrix_cuda_memcpy_value_csc(a, a); 194 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a); 195 | pangulu_smatrix_cuda_memcpy_complete_csr(a, a); 196 | 197 | pangulu_gessm_fp64_cuda_v8(a, l, x); 198 | 199 | pangulu_smatrix_cuda_memcpy_value_csr(a, x); 200 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a); 201 | } 202 | void pangulu_gessm_interface_g_v3(pangulu_smatrix *a, 203 | pangulu_smatrix *l, 204 | pangulu_smatrix *x) 205 | { 206 | pangulu_gessm_fp64_cuda_v10(a, l, x); 207 | pangulu_smatrix_cuda_memcpy_value_csc(a, x); 208 | } 209 | #endif -------------------------------------------------------------------------------- /src/pangulu_getrf_fp64_cuda.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | #ifdef GPU_OPEN 4 | void pangulu_getrf_fp64_cuda(pangulu_smatrix *a, 5 | pangulu_smatrix *l, 6 | pangulu_smatrix *u) 7 | { 8 | 9 | if (a->nnz > 1e4) 10 | { 11 | pangulu_getrf_cuda_dense_kernel(a->row, 12 | a->rowpointer[a->row], 13 | u->cuda_nnzu, 14 | a->cuda_rowpointer, 15 | a->cuda_columnindex, 16 | a->cuda_value, 17 | l->cuda_rowpointer, 18 | l->cuda_columnindex, 19 | l->cuda_value, 20 | u->cuda_rowpointer, 21 | u->cuda_columnindex, 22 | u->cuda_value); 23 | } 24 | else 25 | { 26 | pangulu_getrf_cuda_kernel(a->row, 27 | a->rowpointer[a->row], 28 | u->cuda_nnzu, 29 | a->cuda_rowpointer, 30 | a->cuda_columnindex, 31 | a->cuda_value, 32 | l->cuda_rowpointer, 33 | l->cuda_columnindex, 34 | l->cuda_value, 35 | u->cuda_rowpointer, 36 | u->cuda_columnindex, 37 | u->cuda_value); 38 | } 39 | } 40 | 41 | void pangulu_getrf_interface_G_V1(pangulu_smatrix *a, 42 | pangulu_smatrix *l, 43 | pangulu_smatrix *u) 44 | { 45 | pangulu_getrf_cuda_kernel(a->row, 46 | a->rowpointer[a->row], 47 | u->cuda_nnzu, 48 | a->cuda_rowpointer, 49 | a->cuda_columnindex, 50 | a->cuda_value, 51 | l->cuda_rowpointer, 52 | l->cuda_columnindex, 53 | l->cuda_value, 54 | u->cuda_rowpointer, 55 | u->cuda_columnindex, 56 | u->cuda_value); 57 | pangulu_smatrix_cuda_memcpy_value_csc(l, l); 58 | pangulu_smatrix_cuda_memcpy_value_csc(u, u); 59 | } 60 | void 
pangulu_getrf_interface_G_V2(pangulu_smatrix *a, 61 | pangulu_smatrix *l, 62 | pangulu_smatrix *u) 63 | { 64 | pangulu_getrf_cuda_dense_kernel(a->row, 65 | a->rowpointer[a->row], 66 | u->cuda_nnzu, 67 | a->cuda_rowpointer, 68 | a->cuda_columnindex, 69 | a->cuda_value, 70 | l->cuda_rowpointer, 71 | l->cuda_columnindex, 72 | l->cuda_value, 73 | u->cuda_rowpointer, 74 | u->cuda_columnindex, 75 | u->cuda_value); 76 | pangulu_smatrix_cuda_memcpy_value_csc(l, l); 77 | pangulu_smatrix_cuda_memcpy_value_csc(u, u); 78 | } 79 | 80 | #endif -------------------------------------------------------------------------------- /src/pangulu_heap.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | void pangulu_init_heap_select(pangulu_int64_t select) 3 | { 4 | heap_select = select; 5 | } 6 | 7 | void pangulu_init_pangulu_heap(pangulu_heap *heap, pangulu_int64_t max_length) 8 | { 9 | compare_struct *compare_queue = (compare_struct *)pangulu_malloc(__FILE__, __LINE__, sizeof(compare_struct) * max_length); 10 | pangulu_int64_t *heap_queue = (pangulu_int64_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_int64_t) * max_length); 11 | heap->comapre_queue = compare_queue; 12 | heap->heap_queue = heap_queue; 13 | heap->max_length = max_length; 14 | heap->length = 0; 15 | heap->nnz_flag = 0; 16 | #ifdef OVERLAP 17 | heap->heap_bsem = NULL; 18 | #endif 19 | } 20 | 21 | pangulu_heap *pangulu_destory_pangulu_heap(pangulu_heap *heap) 22 | { 23 | if (heap != NULL) 24 | { 25 | pangulu_free(__FILE__, __LINE__, heap->comapre_queue); 26 | pangulu_free(__FILE__, __LINE__, heap->heap_queue); 27 | heap->length = 0; 28 | heap->nnz_flag = 0; 29 | heap->max_length = 0; 30 | } 31 | pangulu_free(__FILE__, __LINE__, heap); 32 | return NULL; 33 | } 34 | 35 | void pangulu_zero_pangulu_heap(pangulu_heap *heap) 36 | { 37 | heap->length = 0; 38 | heap->nnz_flag = 0; 39 | } 40 | 41 | pangulu_int64_t pangulu_compare(compare_struct *compare_queue, pangulu_int64_t a, pangulu_int64_t b) 42 | { 43 | if (0 == heap_select) 44 | { 45 | if (compare_queue[a].compare_flag == compare_queue[b].compare_flag) 46 | { 47 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag; 48 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag; 49 | 50 | return compare_flag_a < compare_flag_b; 51 | } 52 | else 53 | { 54 | return compare_queue[a].compare_flag < compare_queue[b].compare_flag; 55 | } 56 | } 57 | else if (1 == heap_select) 58 | { 59 | if (compare_queue[a].kernel_id == compare_queue[b].kernel_id) 60 | { 61 | 62 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag; 63 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag; 64 | 65 | if (compare_flag_a == compare_flag_b) 66 | { 67 | return compare_queue[a].compare_flag < compare_queue[b].compare_flag; 68 | } 69 | else 70 | { 71 | return compare_flag_a < compare_flag_b; 72 | } 73 | } 74 | else 75 | { 76 | return compare_queue[a].kernel_id < compare_queue[b].kernel_id; 77 | } 78 | } 79 | else if (2 == heap_select) 80 | { 81 | if (compare_queue[a].kernel_id == compare_queue[b].kernel_id) 82 | { 83 | if (compare_queue[a].compare_flag == compare_queue[b].compare_flag) 84 | { 85 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag; 86 | pangulu_int64_t 
compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag; 87 | return compare_flag_a < compare_flag_b; 88 | } 89 | else 90 | { 91 | return compare_queue[a].compare_flag < compare_queue[b].compare_flag; 92 | } 93 | } 94 | else 95 | { 96 | return compare_queue[a].kernel_id < compare_queue[b].kernel_id; 97 | } 98 | } 99 | else if (3 == heap_select) 100 | { 101 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag; 102 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag; 103 | return compare_flag_a < compare_flag_b; 104 | } 105 | else if (4 == heap_select) 106 | { 107 | if (compare_queue[a].compare_flag == compare_queue[b].compare_flag) 108 | { 109 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag; 110 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag; 111 | 112 | return compare_flag_a > compare_flag_b; 113 | } 114 | else 115 | { 116 | return compare_queue[a].compare_flag > compare_queue[b].compare_flag; 117 | } 118 | } 119 | else 120 | { 121 | printf(PANGULU_E_INVALID_HEAP_SELECT); 122 | pangulu_exit(1); 123 | } 124 | } 125 | 126 | void pangulu_swap(pangulu_int64_t *heap_queue, pangulu_int64_t a, pangulu_int64_t b) 127 | { 128 | pangulu_int64_t temp = heap_queue[a]; 129 | heap_queue[a] = heap_queue[b]; 130 | heap_queue[b] = temp; 131 | } 132 | 133 | void pangulu_heap_insert(pangulu_heap *heap, pangulu_int64_t row, pangulu_int64_t col, pangulu_int64_t task_level, pangulu_int64_t kernel_id, pangulu_int64_t compare_flag) 134 | { 135 | 136 | compare_struct *compare_queue = heap->comapre_queue; 137 | pangulu_int64_t *heap_queue = heap->heap_queue; 138 | pangulu_int64_t length = heap->length; 139 | pangulu_int64_t nnz_flag = heap->nnz_flag; 140 | 141 | if (rank == -1) 142 | { 143 | printf(PANGULU_I_TASK_INFO); 144 | } 145 | 146 | if ((nnz_flag) >= heap->max_length) 147 | { 148 | printf(PANGULU_E_HEAP_FULL); 149 | pangulu_exit(1); 150 | } 151 | compare_queue[nnz_flag].row = row; 152 | compare_queue[nnz_flag].col = col; 153 | compare_queue[nnz_flag].task_level = task_level; 154 | compare_queue[nnz_flag].kernel_id = kernel_id; 155 | compare_queue[nnz_flag].compare_flag = compare_flag; 156 | heap_queue[length] = nnz_flag; 157 | (heap->nnz_flag)++; 158 | pangulu_int64_t now = length; 159 | pangulu_int64_t before = (now - 1) / 2; 160 | while (now != 0 && before >= 0) 161 | { 162 | if (pangulu_compare(compare_queue, heap_queue[now], heap_queue[before])) 163 | { 164 | pangulu_swap(heap_queue, now, before); 165 | } 166 | else 167 | { 168 | break; 169 | } 170 | now = before; 171 | before = (now - 1) / 2; 172 | } 173 | heap->length = length + 1; 174 | } 175 | 176 | pangulu_int64_t heap_empty(pangulu_heap *heap) 177 | { 178 | return !(heap->length); 179 | } 180 | 181 | void pangulu_heap_adjust(pangulu_heap *heap, pangulu_int64_t top, pangulu_int64_t n) 182 | { 183 | compare_struct *compare_queue = heap->comapre_queue; 184 | pangulu_int64_t *heap_queue = heap->heap_queue; 185 | pangulu_int64_t left = top * 2 + 1; 186 | 187 | while (left < n) 188 | { 189 | if ((left + 1) < n && pangulu_compare(compare_queue, heap_queue[left + 1], heap_queue[left])) 190 | { 191 | left = left + 1; 192 | } 193 | if (pangulu_compare(compare_queue, heap_queue[left], heap_queue[top])) 194 | { 195 | pangulu_swap(heap_queue, left, top); 196 | top = left; 197 | left = 
2 * top + 1;
198 | }
199 | else
200 | {
201 | break;
202 | }
203 | }
204 | }
205 | 
206 | pangulu_int64_t pangulu_heap_delete(pangulu_heap *heap)
207 | {
208 | if (heap_empty(heap))
209 | {
210 | printf(PANGULU_E_HEAP_EMPTY);
211 | pangulu_exit(1);
212 | }
213 | pangulu_int64_t length = heap->length;
214 | pangulu_int64_t *heap_queue = heap->heap_queue;
215 | pangulu_swap(heap_queue, length - 1, 0);
216 | pangulu_heap_adjust(heap, 0, length - 1);
217 | heap->length = length - 1;
218 | return heap_queue[length - 1];
219 | }
220 | 
221 | void pangulu_display_heap(pangulu_heap *heap)
222 | {
223 | printf(PANGULU_I_HEAP_LEN);
224 | for (pangulu_int64_t i = 0; i < heap->length; i++)
225 | {
226 | printf(FMT_PANGULU_INT64_T " ", heap->heap_queue[i]);
227 | }
228 | printf("\n");
229 | for (pangulu_int64_t i = 0; i < heap->length; i++)
230 | {
231 | pangulu_int64_t now = heap->heap_queue[i];
232 | printf("row is " FMT_PANGULU_EXBLOCK_IDX
233 | " col is " FMT_PANGULU_EXBLOCK_IDX
234 | " level is " FMT_PANGULU_EXBLOCK_IDX
235 | " compare_flag is " FMT_PANGULU_INT64_T
236 | " do the kernel " FMT_PANGULU_INT16_T "\n",
237 | heap->comapre_queue[now].row,
238 | heap->comapre_queue[now].col,
239 | heap->comapre_queue[now].task_level,
240 | heap->comapre_queue[now].compare_flag,
241 | heap->comapre_queue[now].kernel_id);
242 | }
243 | }
--------------------------------------------------------------------------------
/src/pangulu_kernel_interface.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 | 
3 | void pangulu_getrf_interface(pangulu_smatrix *a, pangulu_smatrix *l, pangulu_smatrix *u,
4 | pangulu_smatrix *calculate_L, pangulu_smatrix *calculate_U)
5 | {
6 | for(pangulu_int64_t i=0;i<u->nnz;i++){
7 | pangulu_int64_t now_row=u->rowindex[i];
8 | calculate_time+=(l->columnpointer[now_row+1]-l->columnpointer[now_row]);
9 | }
10 | #ifdef CHECK_TIME
11 | struct timeval GET_TIME_START;
12 | pangulu_time_check_begin(&GET_TIME_START);
13 | #endif
14 | 
15 | #ifdef GPU_OPEN
16 | 
17 | #ifdef ADD_GPU_MEMORY
18 | 
19 | #ifdef ADAPTIVE_KERNEL_SELECTION
20 | int nnzA = a->nnz;
21 | if (nnzA < 6309)
22 | { // 6309≈1e3.8
23 | pangulu_getrf_interface_C_V1(a, l, u);
24 | }
25 | else if (nnzA < 1e4)
26 | {
27 | pangulu_getrf_interface_G_V1(a, l, u);
28 | }
29 | else
30 | {
31 | pangulu_getrf_interface_G_V2(a, l, u);
32 | }
33 | #else // ADAPTIVE_KERNEL_SELECTION
34 | pangulu_getrf_interface_G_V1(a, l, u);
35 | #endif // ADAPTIVE_KERNEL_SELECTION
36 | cudaDeviceSynchronize();
37 | 
38 | #else // ADD_GPU_MEMORY
39 | pangulu_smatrix_cuda_memcpy_struct_csc(calculate_L, l);
40 | pangulu_smatrix_cuda_memcpy_struct_csc(calculate_U, u);
41 | pangulu_smatrix_cuda_memcpy_nnzu(calculate_U, u);
42 | pangulu_getrf_fp64_cuda(a, calculate_L, calculate_U);
43 | pangulu_smatrix_cuda_memcpy_value_csc(l, calculate_L);
44 | pangulu_smatrix_cuda_memcpy_value_csc(u, calculate_U);
45 | 
46 | #endif // ADD_GPU_MEMORY
47 | #else // GPU_OPEN
48 | 
49 | pangulu_getrf_fp64(a, l, u);
50 | 
51 | #endif // GPU_OPEN
52 | 
53 | #ifdef CHECK_TIME
54 | time_getrf += pangulu_time_check_end(&GET_TIME_START);
55 | #endif
56 | }
57 | 
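The cutoffs in the branch above are round points on a log10(nnz) scale: 6309 ≈ 10^3.8 and 1e4 = 10^4, so small blocks go to the CPU kernel and progressively larger blocks go to the two GPU kernels. A minimal standalone sketch of the same dispatch logic (`select_getrf_kernel` and its return strings are illustrative placeholders, not PanguLU API):

```c
#include <stdio.h>

/* Illustrative only: mirrors the nnz thresholds used by
   pangulu_getrf_interface under ADAPTIVE_KERNEL_SELECTION. */
static const char *select_getrf_kernel(long long nnz)
{
    if (nnz < 6309)       return "C_V1"; /* small block: CPU kernel, 6309 ~= 1e3.8 */
    else if (nnz < 10000) return "G_V1"; /* medium block: first GPU kernel */
    else                  return "G_V2"; /* large block: second GPU kernel */
}

int main(void)
{
    long long probes[] = {1000, 6309, 9999, 50000};
    for (int i = 0; i < 4; i++)
        printf("nnz = %lld -> %s\n", probes[i], select_getrf_kernel(probes[i]));
    return 0;
}
```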
58 | void pangulu_tstrf_interface(pangulu_smatrix *a, pangulu_smatrix *save_X, pangulu_smatrix *u,
59 | pangulu_smatrix *calculate_X, pangulu_smatrix *calculate_U)
60 | {
61 | // for(int_t i=0;i<a->nnz;i++){
62 | // int_t now_col=a->columnindex[i];
63 | // calculate_time+=(u->rowpointer[now_col+1]-u->rowpointer[now_col]);
64 | // }
65 | #ifdef CHECK_TIME
66 | struct timeval GET_TIME_START;
67 | pangulu_time_check_begin(&GET_TIME_START);
68 | #endif
69 | 
70 | #ifdef GPU_OPEN
71 | 
72 | #ifndef GPU_TSTRF
73 | 
74 | #ifndef CPU_OPTION
75 | pangulu_smatrix_cuda_memcpy_value_csc_cal_length(calculate_X, a);
76 | 
77 | pangulu_tstrf_interface_cpu(a, calculate_X, u);
78 | #else // CPU_OPTION
79 | 
80 | pangulu_int64_t cpu_choice2 = a->nnz;
81 | calculate_type cpu_choice3 = cpu_choice2 / ((calculate_type)nrecord * (calculate_type)cpu_choice1);
82 | pangulu_int64_t TSTRF_choice_cpu = Select_Function_CPU(cpu_choice1, cpu_choice3, nrecord);
83 | pangulu_tstrf_kernel_choice_cpu(a, calculate_X, u, TSTRF_choice_cpu);
84 | #endif // CPU_OPTION
85 | 
86 | #else // GPU_TSTRF
87 | 
88 | #ifdef ADD_GPU_MEMORY
89 | #ifdef ADAPTIVE_KERNEL_SELECTION
90 | pangulu_int64_t nnzB = a->nnz;
91 | if (nnzB < 6309)
92 | {
93 | // 6309≈1e3.8
94 | if (nnzB < 3981)
95 | { // 3981≈1e3.6
96 | pangulu_tstrf_interface_C_V1(a, calculate_X, u);
97 | }
98 | else
99 | {
100 | pangulu_tstrf_interface_C_V2(a, calculate_X, u);
101 | }
102 | }
103 | else
104 | {
105 | if (nnzB < 1e4)
106 | {
107 | pangulu_tstrf_interface_G_V2(a, calculate_X, u);
108 | }
109 | else if (nnzB < 19952)
110 | { // 19952≈1e4.3
111 | pangulu_tstrf_interface_G_V1(a, calculate_X, u);
112 | }
113 | else
114 | {
115 | pangulu_tstrf_interface_G_V3(a, calculate_X, u);
116 | }
117 | }
118 | #else // ADAPTIVE_KERNEL_SELECTION
119 | pangulu_tstrf_interface_G_V1(a, calculate_X, u);
120 | #endif // ADAPTIVE_KERNEL_SELECTION
121 | cudaDeviceSynchronize();
122 | 
123 | #else // ADD_GPU_MEMORY
124 | 
125 | pangulu_smatrix_cuda_memcpy_complete_csr(calculate_U, u);
126 | pangulu_tstrf_interface(a, calculate_X, calculate_U);
127 | pangulu_smatrix_cuda_memcpy_value_csc(a, calculate_X);
128 | #endif // ADD_GPU_MEMORY
129 | 
130 | #endif // GPU_TSTRF
131 | 
132 | #else // GPU_OPEN
133 | 
134 | // csc
135 | tstrf_csc_csc(a->row, u->columnpointer, u->rowindex, u->value_csc, a->columnpointer, a->rowindex, a->value_csc);
136 | pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
137 | 
138 | // // csr
139 | // pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
140 | // pangulu_pangulu_smatrix_memcpy_value_csc_copy_length(calculate_X, a);
141 | // pangulu_tstrf_fp64_CPU_6(a, calculate_X, u);
142 | // pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
143 | // pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
144 | 
145 | #endif // GPU_OPEN
146 | 
147 | #ifdef CHECK_TIME
148 | time_tstrf += pangulu_time_check_end(&GET_TIME_START);
149 | #endif
150 | }
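Most of this routine is compile-time dispatch: `GPU_OPEN` selects the GPU build, `GPU_TSTRF` decides whether this kernel runs on the GPU at all, `ADD_GPU_MEMORY` assumes the operands are already resident in device memory, and `ADAPTIVE_KERNEL_SELECTION` enables the nnz-threshold variants. A minimal, self-contained sketch of the same `#ifdef` nesting, with `puts()` stubs in place of the kernels (compile with, e.g., `-DGPU_OPEN -DGPU_TSTRF -DADD_GPU_MEMORY` to trace a path):

```c
#include <stdio.h>

int main(void)
{
#ifdef GPU_OPEN
#ifndef GPU_TSTRF
    puts("GPU build, but TSTRF runs on the CPU");
#else
#ifdef ADD_GPU_MEMORY
    puts("TSTRF on the GPU, operands already resident in device memory");
#else
    puts("TSTRF on the GPU, operands copied to/from the device per call");
#endif
#endif
#else
    puts("CPU-only build: tstrf_csc_csc sparse kernel");
#endif
    return 0;
}
```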
151 | 
152 | void pangulu_gessm_interface(pangulu_smatrix *a, pangulu_smatrix *save_X, pangulu_smatrix *l,
153 | pangulu_smatrix *calculate_X, pangulu_smatrix *calculate_L)
154 | {
155 | for(pangulu_int64_t i=0;i<a->nnz;i++){
156 | pangulu_int64_t now_row=a->rowindex[i];
157 | calculate_time+=(l->columnpointer[now_row+1]-l->columnpointer[now_row]);
158 | }
159 | #ifdef CHECK_TIME
160 | struct timeval GET_TIME_START;
161 | pangulu_time_check_begin(&GET_TIME_START);
162 | #endif
163 | 
164 | #ifdef GPU_OPEN
165 | 
166 | #ifndef GPU_GESSM
167 | 
168 | #ifndef CPU_OPTION
169 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
170 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
171 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(calculate_X, a);
172 | pangulu_transpose_pangulu_smatrix_csc_to_csr(l);
173 | pangulu_gessm_interface_cpu(a, l, calculate_X);
174 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
175 | #else // CPU_OPTION
176 | 
177 | /******************* Choose the best performance *************************/
178 | pangulu_int64_t cpu_choice2 = a->nnz;
179 | calculate_type cpu_choice3 = cpu_choice2 / ((calculate_type)nrecord * (calculate_type)cpu_choice1);
180 | pangulu_int64_t GESSM_choice_cpu = Select_Function_CPU(cpu_choice1, cpu_choice3, nrecord);
181 | pangulu_gessm_kernel_choice_cpu(a, l, calculate_X, GESSM_choice_cpu);
182 | #endif // CPU_OPTION
183 | #else // GPU_GESSM
184 | 
185 | #ifdef ADD_GPU_MEMORY
186 | #ifdef ADAPTIVE_KERNEL_SELECTION
187 | int nnzL = l->nnz;
188 | if (nnzL < 7943)
189 | {
190 | // 7943≈1e3.9
191 | if (nnzL < 3981)
192 | { // 3981≈1e3.6
193 | pangulu_gessm_interface_C_V1(a, l, calculate_X);
194 | }
195 | else
196 | {
197 | pangulu_gessm_interface_C_V2(a, l, calculate_X);
198 | }
199 | }
200 | else
201 | {
202 | if (nnzL < 12589)
203 | {
204 | // 12589≈1e4.1
205 | pangulu_gessm_interface_G_V2(a, l, calculate_X);
206 | }
207 | else if (nnzL < 19952)
208 | { // 19952≈1e4.3
209 | pangulu_gessm_interface_g_v1(a, l, calculate_X);
210 | }
211 | else
212 | {
213 | pangulu_gessm_interface_G_V3(a, l, calculate_X);
214 | }
215 | }
216 | #else // ADAPTIVE_KERNEL_SELECTION
217 | pangulu_gessm_interface_g_v1(a, l, calculate_X);
218 | #endif // ADAPTIVE_KERNEL_SELECTION
219 | cudaDeviceSynchronize();
220 | 
221 | #else // ADD_GPU_MEMORY
222 | 
223 | pangulu_smatrix_cuda_memcpy_complete_csc(calculate_L, l);
224 | pangulu_gessm_interface(a, calculate_L, calculate_X);
225 | #endif // ADD_GPU_MEMORY
226 | 
227 | #endif // GPU_GESSM
228 | 
229 | #else // GPU_OPEN
230 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(calculate_X, a);
231 | pangulu_gessm_fp64_cpu_6(a, l, calculate_X);
232 | pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
233 | // pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
234 | // pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(calculate_X, a);
235 | // pangulu_gessm_interface_CPU_csr(a, l, calculate_X);
236 | // pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
237 | // pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
238 | #endif // GPU_OPEN
239 | 
240 | #ifdef CHECK_TIME
241 | time_gessm += pangulu_time_check_end(&GET_TIME_START);
242 | #endif
243 | }
244 | 
245 | void pangulu_ssssm_interface(pangulu_smatrix *a, pangulu_smatrix *l, pangulu_smatrix *u,
246 | pangulu_smatrix *calculate_L, pangulu_smatrix *calculate_U)
247 | {
248 | for(pangulu_int64_t i=0;i<u->nnz;i++){
249 | pangulu_int64_t now_row=u->rowindex[i];
250 | calculate_time+=(l->columnpointer[now_row+1]-l->columnpointer[now_row]);
251 | }
252 | #ifdef CHECK_TIME
253 | struct timeval GET_TIME_START;
254 | pangulu_time_check_begin(&GET_TIME_START);
255 | #endif
256 | 
257 | #ifdef GPU_OPEN
258 | 
259 | #ifndef ADD_GPU_MEMORY
260 | pangulu_smatrix_cuda_memcpy_complete_csc(calculate_L, l);
261 | pangulu_smatrix_cuda_memcpy_complete_csc(calculate_U, u);
262 | pangulu_ssssm_fp64_cuda(a, calculate_L, calculate_U);
263 | #else // ADD_GPU_MEMORY
264 | 
265 | #ifdef ADAPTIVE_KERNEL_SELECTION
266 | long long flops = 0;
267 | int n = a->row;
268 | for (int i = 0; i < n; i++)
269 | {
270 | for (int j = u->columnpointer[i]; j < u->columnpointer[i + 1]; j++)
271 | {
272 | int col_L = u->rowindex[j];
273 | flops += l->columnpointer[col_L + 1] - l->columnpointer[col_L];
274 | }
275 | }
276 | if (flops < 1e7)
277 | {
278 | // small update blocks run the CPU kernels
279 | if (flops < 63095)
280 | { // 63095≈1e4.8
281 | pangulu_ssssm_interface_C_V1(a, l, u);
282 | }
283 | else
284 | {
285 | pangulu_ssssm_interface_C_V2(a, l, u);
286 | }
287 | }
288 | else
289 | {
290 | // large update blocks run the GPU kernels
291 | if (flops < 3981071705)
292 | { // 3981071705≈1e9.6
293 | pangulu_ssssm_interface_G_V2(a, l, u);
294 | }
295 | else
296 | {
297 | pangulu_ssssm_interface_G_V1(a, l, u);
298 | }
299 | }
300 | #else // ADAPTIVE_KERNEL_SELECTION
301 | pangulu_ssssm_interface_G_V1(a, l, u); 302 | #endif 303 | cudaDeviceSynchronize(); 304 | #endif 305 | #else 306 | 307 | pangulu_ssssm_fp64(a, l, u); 308 | #endif 309 | 310 | #ifdef CHECK_TIME 311 | time_ssssm += pangulu_time_check_end(&GET_TIME_START); 312 | #endif 313 | } 314 | 315 | #ifdef GPU_OPEN 316 | 317 | void pangulu_addmatrix_interface(pangulu_smatrix *a, 318 | pangulu_smatrix *b) 319 | { 320 | pangulu_add_pangulu_smatrix_cuda(a, b); 321 | } 322 | 323 | #endif 324 | 325 | void pangulu_addmatrix_interface_cpu(pangulu_smatrix *a, 326 | pangulu_smatrix *b) 327 | { 328 | pangulu_add_pangulu_smatrix_cpu(a, b); 329 | } 330 | 331 | void pangulu_spmv(pangulu_smatrix *s, pangulu_vector *z, pangulu_vector *answer, int vector_number) 332 | { 333 | pangulu_spmv_cpu_xishu_csc(s, z, answer, vector_number); 334 | } 335 | 336 | void pangulu_sptrsv(pangulu_smatrix *s, pangulu_vector *answer, pangulu_vector *z, int vector_number, int32_t tag) 337 | { 338 | pangulu_sptrsv_cpu_xishu_csc(s, answer, z, vector_number, tag); 339 | } 340 | 341 | void pangulu_vector_add(pangulu_vector *answer, pangulu_vector *z) 342 | { 343 | pangulu_vector_add_cpu(answer, z); 344 | } 345 | 346 | void pangulu_vector_sub(pangulu_vector *answer, pangulu_vector *z) 347 | { 348 | pangulu_vector_sub_cpu(answer, z); 349 | } 350 | 351 | void pangulu_vector_copy(pangulu_vector *answer, pangulu_vector *z) 352 | { 353 | pangulu_vector_copy_cpu(answer, z); 354 | } -------------------------------------------------------------------------------- /src/pangulu_mpi.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | int have_msg; 4 | void pangulu_probe_message(MPI_Status *status) 5 | { 6 | have_msg=0; 7 | do{ 8 | MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &have_msg, status); 9 | if(have_msg){ 10 | return; 11 | } 12 | usleep(10); 13 | }while(!have_msg); 14 | } 15 | 16 | pangulu_int64_t pangulu_bcast_n(pangulu_int64_t n, pangulu_int64_t send_rank) 17 | { 18 | MPI_Bcast(&n, 1, MPI_PANGULU_INT64_T, send_rank, MPI_COMM_WORLD); 19 | return n; 20 | } 21 | 22 | void pangulu_bcast_vector(pangulu_inblock_ptr *vector, pangulu_int32_t length, pangulu_int64_t send_rank) 23 | { 24 | pangulu_int64_t everry_length = 100000000; 25 | for (pangulu_int64_t i = 0; i < length; i += everry_length) 26 | { 27 | if ((i + everry_length) > length) 28 | { 29 | MPI_Bcast(vector + i, length - i, MPI_PANGULU_INBLOCK_PTR, send_rank, MPI_COMM_WORLD); 30 | } 31 | else 32 | { 33 | MPI_Bcast(vector + i, everry_length, MPI_PANGULU_INBLOCK_PTR, send_rank, MPI_COMM_WORLD); 34 | } 35 | } 36 | } 37 | void pangulu_bcast_vector_int64(pangulu_int64_t *vector, pangulu_int32_t length, pangulu_int64_t send_rank) 38 | { 39 | pangulu_int64_t everry_length = 100000000; 40 | for (pangulu_int64_t i = 0; i < length; i += everry_length) 41 | { 42 | if ((i + everry_length) > length) 43 | { 44 | MPI_Bcast(vector + i, length - i, MPI_PANGULU_INT64_T, send_rank, MPI_COMM_WORLD); 45 | } 46 | else 47 | { 48 | MPI_Bcast(vector + i, everry_length, MPI_PANGULU_INT64_T, send_rank, MPI_COMM_WORLD); 49 | } 50 | } 51 | } 52 | void pangulu_mpi_waitall(MPI_Request *Request, int num) 53 | { 54 | MPI_Status Status; 55 | for(int i = 0; i < num; i++) 56 | { 57 | MPI_Wait(&Request[i], &Status); 58 | } 59 | } 60 | void pangulu_isend_vector_char_wait(char *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal, MPI_Request* req) 61 | { 62 | MPI_Isend(a, n, MPI_CHAR, send_id, signal, MPI_COMM_WORLD, req); 63 | } 64 | 65 | 
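`pangulu_bcast_vector` above splits a long broadcast into chunks of 1e8 entries because `MPI_Bcast` takes its element count as an `int`. A self-contained sketch of the same chunking pattern (`bcast_large` is a hypothetical name, not part of the PanguLU API):

```c
#include <mpi.h>

/* Broadcast a large int64 buffer in chunks small enough for MPI's int count. */
static void bcast_large(long long *buf, long long length, int root)
{
    const long long chunk = 100000000; /* 1e8 elements per MPI_Bcast call */
    for (long long i = 0; i < length; i += chunk)
    {
        long long remaining = length - i;
        int count = (int)(remaining < chunk ? remaining : chunk);
        MPI_Bcast(buf + i, count, MPI_LONG_LONG, root, MPI_COMM_WORLD);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    long long data[16] = {0};
    bcast_large(data, 16, 0); /* fits in a single chunk in this toy case */
    MPI_Finalize();
    return 0;
}
```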
void pangulu_send_vector_int(pangulu_int64_t *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal) 66 | { 67 | MPI_Send(a, n, MPI_PANGULU_INT64_T, send_id, signal, MPI_COMM_WORLD); 68 | } 69 | 70 | void pangulu_recv_vector_int(pangulu_int64_t *a, pangulu_int64_t n, pangulu_int64_t receive_id, int signal) 71 | { 72 | MPI_Status status; 73 | for (pangulu_int64_t i = 0; i < n; i++) 74 | { 75 | a[i] = 0; 76 | } 77 | MPI_Recv(a, n, MPI_PANGULU_INT64_T, receive_id, signal, MPI_COMM_WORLD, &status); 78 | } 79 | 80 | void pangulu_send_vector_char(char *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal) 81 | { 82 | MPI_Send(a, n, MPI_CHAR, send_id, signal, MPI_COMM_WORLD); 83 | } 84 | 85 | void pangulu_recv_vector_char(char *a, pangulu_int64_t n, pangulu_int64_t receive_id, int signal) 86 | { 87 | MPI_Status status; 88 | for (pangulu_int64_t i = 0; i < n; i++) 89 | { 90 | a[i] = 0; 91 | } 92 | pangulu_probe_message(&status); 93 | MPI_Recv(a, n, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &status); 94 | } 95 | 96 | void pangulu_send_vector_value(calculate_type *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal) 97 | { 98 | MPI_Send(a, n, MPI_VAL_TYPE, send_id, signal, MPI_COMM_WORLD); 99 | } 100 | 101 | void pangulu_recv_vector_value(calculate_type *a, pangulu_int64_t n, pangulu_int64_t receive_id, int signal) 102 | { 103 | MPI_Status status; 104 | for (pangulu_int64_t i = 0; i < n; i++) 105 | { 106 | a[i] = 0.0; 107 | } 108 | MPI_Recv(a, n, MPI_VAL_TYPE, receive_id, signal, MPI_COMM_WORLD, &status); 109 | } 110 | 111 | void pangulu_send_pangulu_smatrix_value_csr(pangulu_smatrix *s, 112 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 113 | { 114 | 115 | MPI_Send(s->value, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD); 116 | } 117 | void pangulu_send_pangulu_smatrix_struct_csr(pangulu_smatrix *s, 118 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 119 | { 120 | 121 | MPI_Send(s->rowpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD); 122 | MPI_Send(s->columnindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD); 123 | } 124 | 125 | void pangulu_send_pangulu_smatrix_complete_csr(pangulu_smatrix *s, 126 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 127 | { 128 | pangulu_send_pangulu_smatrix_struct_csr(s, send_id, signal * 3, nb); 129 | pangulu_send_pangulu_smatrix_value_csr(s, send_id, signal * 3, nb); 130 | } 131 | 132 | void pangulu_recv_pangulu_smatrix_struct_csr(pangulu_smatrix *s, 133 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 134 | { 135 | 136 | MPI_Status status; 137 | for (pangulu_int64_t i = 0; i < (s->row + 1); i++) 138 | { 139 | s->rowpointer[i] = 0; 140 | } 141 | 142 | MPI_Recv(s->rowpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, receive_id, signal, MPI_COMM_WORLD, &status); 143 | s->nnz = s->rowpointer[s->row]; 144 | for (pangulu_int64_t i = 0; i < s->nnz; i++) 145 | { 146 | s->columnindex[i] = 0; 147 | } 148 | MPI_Recv(s->columnindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, receive_id, signal + 1, MPI_COMM_WORLD, &status); 149 | } 150 | void pangulu_recv_pangulu_smatrix_value_csr(pangulu_smatrix *s, 151 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 152 | { 153 | MPI_Status status; 154 | for (pangulu_int64_t i = 0; i < s->nnz; i++) 155 | { 156 | s->value[i] = (calculate_type)0.0; 157 | } 158 | MPI_Recv(s->value, s->nnz, MPI_VAL_TYPE, receive_id, signal + 2, MPI_COMM_WORLD, &status); 159 | } 160 | 161 | void 
pangulu_recv_pangulu_smatrix_value_csr_in_signal(pangulu_smatrix *s, 162 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 163 | { 164 | MPI_Status status; 165 | for (pangulu_int64_t i = 0; i < s->nnz; i++) 166 | { 167 | s->value[i] = (calculate_type)0.0; 168 | } 169 | MPI_Recv(s->value, s->nnz, MPI_VAL_TYPE, receive_id, signal, MPI_COMM_WORLD, &status); 170 | } 171 | 172 | void pangulu_recv_pangulu_smatrix_complete_csr(pangulu_smatrix *s, 173 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 174 | { 175 | 176 | pangulu_recv_pangulu_smatrix_struct_csr(s, receive_id, signal * 3, nb); 177 | pangulu_recv_pangulu_smatrix_value_csr(s, receive_id, signal * 3, nb); 178 | } 179 | 180 | void pangulu_recv_whole_pangulu_smatrix_csr(pangulu_smatrix *s, 181 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nnz, pangulu_int64_t nb) 182 | { 183 | #ifdef CHECK_TIME 184 | struct timeval GET_TIME_START; 185 | pangulu_time_check_begin(&GET_TIME_START); 186 | #endif 187 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz; 188 | MPI_Status status; 189 | char *now_vector = (char *)(s->rowpointer); 190 | for (pangulu_int64_t i = 0; i < length; i++) 191 | { 192 | now_vector[i] = 0; 193 | } 194 | s->columnindex = (pangulu_inblock_idx *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1)); 195 | s->value = (calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz); 196 | MPI_Recv(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &status); 197 | s->nnz = nnz; 198 | #ifdef CHECK_TIME 199 | time_receive += pangulu_time_check_end(&GET_TIME_START); 200 | #endif 201 | } 202 | 203 | void pangulu_send_pangulu_smatrix_value_csc(pangulu_smatrix *s, 204 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 205 | { 206 | MPI_Send(s->value_csc, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD); 207 | } 208 | 209 | void pangulu_send_pangulu_smatrix_struct_csc(pangulu_smatrix *s, 210 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 211 | { 212 | 213 | MPI_Send(s->columnpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD); 214 | MPI_Send(s->rowindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD); 215 | } 216 | 217 | void pangulu_send_pangulu_smatrix_complete_csc(pangulu_smatrix *s, 218 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 219 | { 220 | pangulu_send_pangulu_smatrix_struct_csc(s, send_id, signal * 3, nb); 221 | pangulu_send_pangulu_smatrix_value_csc(s, send_id, signal * 3, nb); 222 | } 223 | 224 | void pangulu_recv_pangulu_smatrix_struct_csc(pangulu_smatrix *s, 225 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 226 | { 227 | 228 | MPI_Status status; 229 | for (pangulu_int64_t i = 0; i < (s->row + 1); i++) 230 | { 231 | s->columnpointer[i] = 0; 232 | } 233 | 234 | MPI_Recv(s->columnpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, receive_id, signal, MPI_COMM_WORLD, &status); 235 | s->nnz = s->columnpointer[s->row]; 236 | for (pangulu_int64_t i = 0; i < s->nnz; i++) 237 | { 238 | s->rowindex[i] = 0; 239 | } 240 | MPI_Recv(s->rowindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, receive_id, signal + 1, MPI_COMM_WORLD, &status); 241 | } 242 | void pangulu_recv_pangulu_smatrix_value_csc(pangulu_smatrix *s, 243 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 244 | { 245 | 246 | MPI_Status status; 247 | for (pangulu_int64_t i = 0; i < s->nnz; i++) 248 | { 
249 | s->value_csc[i] = (calculate_type)0.0; 250 | } 251 | 252 | MPI_Recv(s->value_csc, s->nnz, MPI_VAL_TYPE, receive_id, signal + 2, MPI_COMM_WORLD, &status); 253 | } 254 | 255 | void pangulu_recv_pangulu_smatrix_value_csc_in_signal(pangulu_smatrix *s, 256 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 257 | { 258 | MPI_Status status; 259 | for (pangulu_int64_t i = 0; i < s->nnz; i++) 260 | { 261 | s->value_csc[i] = (calculate_type)0.0; 262 | } 263 | MPI_Recv(s->value_csc, s->nnz, MPI_VAL_TYPE, receive_id, signal, MPI_COMM_WORLD, &status); 264 | } 265 | 266 | void pangulu_recv_pangulu_smatrix_complete_csc(pangulu_smatrix *s, 267 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 268 | { 269 | 270 | pangulu_recv_pangulu_smatrix_struct_csc(s, receive_id, signal * 3, nb); 271 | pangulu_recv_pangulu_smatrix_value_csc(s, receive_id, signal * 3, nb); 272 | } 273 | 274 | void pangulu_recv_whole_pangulu_smatrix_csc(pangulu_smatrix *s, 275 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nnz, pangulu_int64_t nb) 276 | { 277 | #ifdef CHECK_TIME 278 | struct timeval GET_TIME_START; 279 | pangulu_time_check_begin(&GET_TIME_START); 280 | #endif 281 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz; 282 | MPI_Status status; 283 | char *now_vector = (char *)(s->columnpointer); 284 | for (pangulu_int64_t i = 0; i < length; i++) 285 | { 286 | now_vector[i] = 0; 287 | } 288 | s->rowindex = (pangulu_inblock_idx *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1)); 289 | s->value_csc = (calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz); 290 | MPI_Recv(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &status); 291 | s->nnz = nnz; 292 | #ifdef CHECK_TIME 293 | time_receive += pangulu_time_check_end(&GET_TIME_START); 294 | #endif 295 | } 296 | 297 | int pangulu_iprobe_message(MPI_Status *status) 298 | { 299 | int flag; 300 | MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, status); 301 | return flag; 302 | } 303 | 304 | void pangulu_isend_pangulu_smatrix_value_csr(pangulu_smatrix *s, 305 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 306 | { 307 | 308 | MPI_Request req; 309 | MPI_Isend(s->value, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD, &req); 310 | } 311 | void pangulu_isend_pangulu_smatrix_struct_csr(pangulu_smatrix *s, 312 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 313 | { 314 | MPI_Request req; 315 | MPI_Isend(s->rowpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD, &req); 316 | MPI_Isend(s->columnindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD, &req); 317 | } 318 | 319 | void pangulu_isend_pangulu_smatrix_complete_csr(pangulu_smatrix *s, 320 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 321 | { 322 | 323 | pangulu_isend_pangulu_smatrix_struct_csr(s, send_id, signal * 3, nb); 324 | pangulu_isend_pangulu_smatrix_value_csr(s, send_id, signal * 3, nb); 325 | } 326 | 327 | void pangulu_isend_whole_pangulu_smatrix_csr(pangulu_smatrix *s, 328 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 329 | { 330 | pangulu_int64_t nnz = s->nnz; 331 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz; 332 | MPI_Request req; 333 | char *now_vector = (char *)(s->rowpointer); 334 | calculate_type *value = 
(calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz); 335 | if (value != s->value) 336 | { 337 | printf(PANGULU_E_ISEND_CSR); 338 | pangulu_exit(1); 339 | } 340 | MPI_Isend(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &req); 341 | } 342 | 343 | void pangulu_isend_pangulu_smatrix_value_csc(pangulu_smatrix *s, 344 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 345 | { 346 | MPI_Request req; 347 | MPI_Isend(s->value_csc, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD, &req); 348 | } 349 | 350 | void pangulu_isend_pangulu_smatrix_value_csc_in_signal(pangulu_smatrix *s, 351 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 352 | { 353 | MPI_Request req; 354 | MPI_Isend(s->value_csc, s->nnz, MPI_VAL_TYPE, send_id, signal, MPI_COMM_WORLD, &req); 355 | } 356 | 357 | void pangulu_isend_pangulu_smatrix_struct_csc(pangulu_smatrix *s, 358 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 359 | { 360 | MPI_Request req; 361 | MPI_Isend(s->columnpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD, &req); 362 | MPI_Isend(s->rowindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD, &req); 363 | } 364 | 365 | void pangulu_isend_pangulu_smatrix_complete_csc(pangulu_smatrix *s, 366 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb) 367 | { 368 | 369 | pangulu_isend_pangulu_smatrix_struct_csc(s, send_id, signal * 3, nb); 370 | pangulu_isend_pangulu_smatrix_value_csc(s, send_id, signal * 3, nb); 371 | } 372 | 373 | void pangulu_isend_whole_pangulu_smatrix_csc(pangulu_smatrix *s, 374 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb) 375 | { 376 | pangulu_int64_t nnz = s->nnz; 377 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz; 378 | MPI_Request req; 379 | char *now_vector = (char *)(s->columnpointer); 380 | calculate_type *value = (calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz); 381 | if (value != s->value_csc) 382 | { 383 | printf(PANGULU_E_ISEND_CSC); 384 | pangulu_exit(1); 385 | } 386 | MPI_Isend(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &req); 387 | } -------------------------------------------------------------------------------- /src/pangulu_spmv_fp64.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | void pangulu_spmv_cpu_choumi(pangulu_smatrix *s, pangulu_vector *x, pangulu_vector *b) 4 | { 5 | calculate_type *value = s->value; 6 | calculate_type *bval = b->value; 7 | calculate_type *xval = x->value; 8 | pangulu_int64_t n = s->column; 9 | pangulu_int64_t m = s->row; 10 | for (pangulu_int64_t i = 0; i < m; i++) 11 | bval[i] = 0.0; 12 | for (pangulu_int64_t i = 0; i < m; i++) 13 | { 14 | for (pangulu_int64_t j = 0; j < n; j++) 15 | { 16 | bval[i] += value[i * n + j] * xval[j]; 17 | } 18 | } 19 | } 20 | 21 | void pangulu_spmv_cpu_xishu(pangulu_smatrix *s, pangulu_vector *x, pangulu_vector *b, pangulu_int64_t vector_number) 22 | { 23 | pangulu_int64_t m = s->row; 24 | pangulu_inblock_ptr *csrRowPtr_tmp = s->rowpointer; 25 | pangulu_inblock_idx *csrColIdx_tmp = s->columnindex; 26 | calculate_type *csrVal_tmp = s->value; 27 | for (pangulu_int64_t vector_index = 0; vector_index < vector_number; vector_index++) 28 | { 29 | calculate_type *xval = x->value + vector_index * m; 30 | 
calculate_type *yval = b->value + vector_index * m; 31 | for (pangulu_int64_t i = 0; i < m; i++) 32 | { 33 | for (pangulu_int64_t j = csrRowPtr_tmp[i]; j < csrRowPtr_tmp[i + 1]; j++) 34 | { 35 | yval[i] += csrVal_tmp[j] * xval[csrColIdx_tmp[j]]; 36 | } 37 | } 38 | } 39 | } 40 | 41 | void pangulu_spmv_cpu_xishu_csc(pangulu_smatrix *s, pangulu_vector *x, pangulu_vector *b, pangulu_int64_t vector_number) 42 | { 43 | pangulu_int64_t m = s->row; 44 | pangulu_inblock_ptr *csccolumnPtr_tmp = s->columnpointer; 45 | pangulu_inblock_idx *cscrowIdx_tmp = s->rowindex; 46 | calculate_type *cscVal_tmp = s->value_csc; 47 | for (pangulu_int64_t vector_index = 0; vector_index < vector_number; vector_index++) 48 | { 49 | calculate_type *xval = x->value + vector_index * m; 50 | calculate_type *yval = b->value + vector_index * m; 51 | for (pangulu_int64_t i = 0; i < m; i++) 52 | { 53 | for (pangulu_int64_t j = csccolumnPtr_tmp[i]; j < csccolumnPtr_tmp[i + 1]; j++) 54 | { 55 | pangulu_inblock_idx row = cscrowIdx_tmp[j]; 56 | yval[row] += cscVal_tmp[j] * xval[i]; 57 | } 58 | } 59 | } 60 | } 61 | 62 | void pangulu_vector_add_cpu(pangulu_vector *b, pangulu_vector *x) 63 | { 64 | 65 | calculate_type *xval = x->value; 66 | calculate_type *bval = b->value; 67 | pangulu_int64_t n = x->row; 68 | for (pangulu_int64_t i = 0; i < n; i++) 69 | { 70 | bval[i] += xval[i]; 71 | } 72 | } 73 | 74 | void pangulu_vector_sub_cpu(pangulu_vector *b, pangulu_vector *x) 75 | { 76 | 77 | calculate_type *xval = x->value; 78 | calculate_type *bval = b->value; 79 | pangulu_int64_t n = x->row; 80 | for (pangulu_int64_t i = 0; i < n; i++) 81 | { 82 | bval[i] -= xval[i]; 83 | } 84 | } 85 | 86 | void pangulu_vector_copy_cpu(pangulu_vector *b, pangulu_vector *x) 87 | { 88 | 89 | calculate_type *xval = x->value; 90 | calculate_type *bval = b->value; 91 | pangulu_int64_t n = x->row; 92 | for (pangulu_int64_t i = 0; i < n; i++) 93 | { 94 | bval[i] = xval[i]; 95 | } 96 | } -------------------------------------------------------------------------------- /src/pangulu_sptrsv_fp64.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | void pangulu_sptrsv_cpu_choumi(pangulu_smatrix *s,pangulu_vector *x,pangulu_vector *b) 4 | { 5 | calculate_type *value=s->value; 6 | calculate_type *bval=b->value; 7 | calculate_type *xval=x->value; 8 | pangulu_int64_t n=s->column; 9 | for(pangulu_int64_t i=0;irow; 24 | pangulu_inblock_ptr *csr_row_ptr_tmp=s->rowpointer; 25 | pangulu_inblock_idx *csr_col_idx_tmp=s->columnindex; 26 | calculate_type *csr_val_tmp=s->value; 27 | for(pangulu_int64_t vector_index=0;vector_indexvalue+vector_index*row; 29 | calculate_type *bval=b->value+vector_index*row; 30 | for(pangulu_int64_t i=0;icolumn; 52 | pangulu_inblock_ptr *csc_column_ptr_tmp=s->columnpointer; 53 | pangulu_inblock_idx *csc_row_idx_tmp=s->rowindex; 54 | calculate_type *cscVal_tmp = s->value_csc; 55 | if(tag==0){ 56 | for(pangulu_int64_t vector_index=0;vector_indexvalue+vector_index*col; 58 | calculate_type *bval=b->value+vector_index*col; 59 | for(pangulu_int64_t i=0;iSPTRSV_ERROR) 63 | xval[i]=bval[i]/cscVal_tmp[csc_column_ptr_tmp[i]]; 64 | else 65 | xval[i]=bval[i]/SPTRSV_ERROR; 66 | } 67 | else{ 68 | xval[i]=0.0; 69 | continue; 70 | } 71 | for(pangulu_int64_t j=csc_column_ptr_tmp[i]+1;jvalue+vector_index*col; 82 | calculate_type *bval=b->value+vector_index*col; 83 | for(pangulu_int64_t i=col-1;i>=0;i--) 84 | { 85 | if(csc_row_idx_tmp[csc_column_ptr_tmp[i+1]-1]==i){ 86 | 
if(fabs(cscVal_tmp[csc_column_ptr_tmp[i+1]-1])>SPTRSV_ERROR) 87 | xval[i]=bval[i]/cscVal_tmp[csc_column_ptr_tmp[i+1]-1]; 88 | else 89 | xval[i]=bval[i]/SPTRSV_ERROR; 90 | } 91 | else{ 92 | xval[i]=0.0; 93 | continue; 94 | } 95 | if(csc_column_ptr_tmp[i+1]>=2){ // Don't modify this to csc_column_ptr_tmp[i+1]-2>=0, because values in array csc_column_ptr_tmp are unsigned. 96 | for(pangulu_int64_t j=csc_column_ptr_tmp[i+1]-2;j>=csc_column_ptr_tmp[i];j--) 97 | { 98 | pangulu_inblock_idx row=csc_row_idx_tmp[j]; 99 | bval[row]-=cscVal_tmp[j]*xval[i]; 100 | } 101 | } 102 | } 103 | } 104 | } 105 | 106 | } 107 | -------------------------------------------------------------------------------- /src/pangulu_ssssm_fp64_cuda.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | #ifdef GPU_OPEN 4 | void pangulu_ssssm_fp64_cuda(pangulu_smatrix *a, 5 | pangulu_smatrix *l, 6 | pangulu_smatrix *u) 7 | { 8 | int n = a->row; 9 | int nnz_a = a->columnpointer[n] - a->columnpointer[0]; 10 | double sparsity_A = (double)nnz_a / (double)(n * n); 11 | 12 | if (sparsity_A < 0.001) 13 | { 14 | pangulu_ssssm_cuda_kernel(a->row, 15 | a->bin_rowpointer, 16 | a->cuda_bin_rowpointer, 17 | a->cuda_bin_rowindex, 18 | u->cuda_rowpointer, 19 | u->cuda_columnindex, 20 | u->cuda_value, 21 | l->cuda_rowpointer, 22 | l->cuda_columnindex, 23 | l->cuda_value, 24 | a->cuda_rowpointer, 25 | a->cuda_columnindex, 26 | a->cuda_value); 27 | } 28 | else 29 | { 30 | pangulu_ssssm_dense_cuda_kernel(a->row, 31 | a->columnpointer[a->row], 32 | u->columnpointer[u->row], 33 | l->cuda_rowpointer, 34 | l->cuda_columnindex, 35 | l->cuda_value, 36 | u->cuda_rowpointer, 37 | u->cuda_columnindex, 38 | u->cuda_value, 39 | a->cuda_rowpointer, 40 | a->cuda_columnindex, 41 | a->cuda_value); 42 | } 43 | } 44 | 45 | void pangulu_ssssm_interface_G_V1(pangulu_smatrix *a, 46 | pangulu_smatrix *l, 47 | pangulu_smatrix *u) 48 | { 49 | pangulu_ssssm_cuda_kernel(a->row, 50 | a->bin_rowpointer, 51 | a->cuda_bin_rowpointer, 52 | a->cuda_bin_rowindex, 53 | u->cuda_rowpointer, 54 | u->cuda_columnindex, 55 | u->cuda_value, 56 | l->cuda_rowpointer, 57 | l->cuda_columnindex, 58 | l->cuda_value, 59 | a->cuda_rowpointer, 60 | a->cuda_columnindex, 61 | a->cuda_value); 62 | } 63 | void pangulu_ssssm_interface_G_V2(pangulu_smatrix *a, 64 | pangulu_smatrix *l, 65 | pangulu_smatrix *u) 66 | { 67 | pangulu_ssssm_dense_cuda_kernel(a->row, 68 | a->columnpointer[a->row], 69 | u->columnpointer[u->row], 70 | l->cuda_rowpointer, 71 | l->cuda_columnindex, 72 | l->cuda_value, 73 | u->cuda_rowpointer, 74 | u->cuda_columnindex, 75 | u->cuda_value, 76 | a->cuda_rowpointer, 77 | a->cuda_columnindex, 78 | a->cuda_value); 79 | } 80 | #endif -------------------------------------------------------------------------------- /src/pangulu_thread.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | void pangulu_mutex_init(pthread_mutex_t *mutex) 4 | { 5 | pthread_mutex_init((mutex), NULL); 6 | } 7 | 8 | void pangulu_bsem_init(bsem *bsem_p, pangulu_int64_t value) 9 | { 10 | if (value < 0 || value > 1) 11 | { 12 | exit(1); 13 | } 14 | bsem_p->mutex = (pthread_mutex_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pthread_mutex_t)); 15 | bsem_p->cond = (pthread_cond_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pthread_cond_t)); 16 | pangulu_mutex_init((bsem_p->mutex)); 17 | pthread_cond_init((bsem_p->cond), NULL); 18 | bsem_p->v = value; 19 | } 20 | 21 | bsem 
*pangulu_bsem_destory(bsem *bsem_p) 22 | { 23 | pangulu_free(__FILE__, __LINE__, bsem_p->mutex); 24 | bsem_p->mutex = NULL; 25 | pangulu_free(__FILE__, __LINE__, bsem_p->cond); 26 | bsem_p->cond = NULL; 27 | bsem_p->v = 0; 28 | pangulu_free(__FILE__, __LINE__, bsem_p); 29 | return NULL; 30 | } 31 | 32 | void pangulu_bsem_post(pangulu_heap *heap) 33 | { 34 | bsem *bsem_p = heap->heap_bsem; 35 | pthread_mutex_lock(bsem_p->mutex); 36 | pangulu_int64_t flag = heap_empty(heap); 37 | if (((bsem_p->v == 0) && (flag == 0))) 38 | { 39 | bsem_p->v = 1; 40 | // get bsem p 41 | pthread_cond_signal(bsem_p->cond); 42 | // send 43 | } 44 | pthread_mutex_unlock(bsem_p->mutex); 45 | } 46 | 47 | pangulu_int64_t pangulu_bsem_wait(pangulu_heap *heap) 48 | { 49 | bsem *heap_bsem = heap->heap_bsem; 50 | pthread_mutex_t *heap_mutex = heap_bsem->mutex; 51 | 52 | pthread_mutex_lock(heap_mutex); 53 | if (heap_empty(heap) == 1) 54 | { 55 | heap_bsem->v = 0; 56 | while (heap_bsem->v == 0) 57 | { 58 | // wait 59 | pthread_cond_wait(heap_bsem->cond, heap_bsem->mutex); 60 | } 61 | } 62 | 63 | pangulu_int64_t compare_flag = pangulu_heap_delete(heap); 64 | heap_bsem->v = 1; 65 | pthread_mutex_unlock(heap_mutex); 66 | return compare_flag; 67 | } 68 | 69 | void pangulu_bsem_stop(pangulu_heap *heap) 70 | { 71 | bsem *bsem_p = heap->heap_bsem; 72 | pthread_mutex_lock(bsem_p->mutex); 73 | bsem_p->v = 0; 74 | pthread_mutex_unlock(bsem_p->mutex); 75 | } 76 | 77 | void pangulu_bsem_synchronize(bsem *bsem_p) 78 | { 79 | pthread_mutex_lock((bsem_p->mutex)); 80 | pangulu_int64_t v = bsem_p->v; 81 | if (v == 1) 82 | { 83 | bsem_p->v = 0; 84 | pthread_cond_signal(bsem_p->cond); 85 | pthread_mutex_unlock(bsem_p->mutex); 86 | } 87 | else 88 | { 89 | bsem_p->v = 1; 90 | while (bsem_p->v == 1) 91 | { 92 | pthread_cond_wait((bsem_p->cond), (bsem_p->mutex)); 93 | bsem_p->v = 0; 94 | } 95 | bsem_p->v = 0; 96 | pthread_mutex_unlock(bsem_p->mutex); 97 | } 98 | } -------------------------------------------------------------------------------- /src/pangulu_time.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | 3 | #ifdef CHECK_TIME 4 | void pangulu_time_check_begin(struct timeval *GET_TIME_START) 5 | { 6 | gettimeofday((GET_TIME_START), NULL); 7 | } 8 | 9 | double pangulu_time_check_end(struct timeval *GET_TIME_START) 10 | { 11 | struct timeval GET_TIME_END; 12 | gettimeofday((&GET_TIME_END), NULL); 13 | return (((GET_TIME_END.tv_sec - GET_TIME_START->tv_sec) * 1000.0 + (GET_TIME_END.tv_usec - GET_TIME_START->tv_usec) / 1000.0))/1000.0; 14 | } 15 | 16 | void pangulu_time_init() 17 | { 18 | time_transpose = 0.0; 19 | time_isend = 0.0; 20 | time_receive = 0.0; 21 | time_getrf = 0.0; 22 | time_tstrf = 0.0; 23 | time_gessm = 0.0; 24 | time_gessm_sparse = 0.0; 25 | time_gessm_dense = 0.0; 26 | time_ssssm = 0.0; 27 | time_cuda_memcpy = 0.0; 28 | time_wait = 0.0; 29 | return; 30 | } 31 | 32 | void pangulu_time_simple_output(pangulu_int64_t rank) 33 | { 34 | printf( FMT_PANGULU_INT64_T "\t" "%.5lf\t%.5lf\t%.5lf\t%.5lf\t%.5lf\t%.5lf\t%.5lf\n", 35 | rank, 36 | calculate_time_wait, 37 | time_getrf, 38 | time_tstrf, 39 | time_gessm, 40 | time_ssssm, time_gessm + time_getrf + time_tstrf + time_ssssm, time_cuda_memcpy); 41 | } 42 | #endif // CHECK_TIME -------------------------------------------------------------------------------- /src/pangulu_tstrf_fp64.c: -------------------------------------------------------------------------------- 1 | #include "pangulu_common.h" 2 | void 
pangulu_tstrf_fp64_cpu_1(pangulu_smatrix *a, 3 | pangulu_smatrix *x, 4 | pangulu_smatrix *u) 5 | { 6 | pangulu_inblock_ptr *x_colpointer = a->columnpointer; 7 | pangulu_inblock_idx *x_rowindex = a->rowindex; 8 | calculate_type *x_value = a->value_csc; 9 | pangulu_inblock_ptr *u_rowpointer = u->rowpointer; 10 | pangulu_inblock_idx *u_columnindex = u->columnindex; 11 | calculate_type *u_value = u->value; 12 | pangulu_inblock_ptr *a_colpointer = a->columnpointer; 13 | pangulu_inblock_idx *a_rowindex = a->rowindex; 14 | calculate_type *a_value = x->value_csc; 15 | pangulu_int64_t n = a->row; 16 | 17 | for (pangulu_int64_t i = 0; i < a->nnz; i++) 18 | { 19 | x_value[i] = 0.0; 20 | } 21 | for (pangulu_int64_t i = 0; i < n; i++) 22 | { 23 | calculate_type t = u_value[u_rowpointer[i]]; 24 | if (fabs(t) < ERROR) 25 | { 26 | t = ERROR; 27 | } 28 | for (pangulu_int64_t k = a_colpointer[i]; k < a_colpointer[i + 1]; k++) 29 | { 30 | x_value[k] = a_value[k] / t; 31 | } 32 | // update Value 33 | if (a_colpointer[i] != a_colpointer[i + 1]) 34 | { 35 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 36 | for (pangulu_int64_t k = u_rowpointer[i]; k < u_rowpointer[i + 1]; k++) 37 | { 38 | pangulu_int64_t p = x_colpointer[i]; 39 | for (pangulu_int64_t s = a_colpointer[u_columnindex[k]]; s < a_colpointer[u_columnindex[k] + 1]; s++) 40 | { 41 | if (x_rowindex[p] == a_rowindex[s]) 42 | { 43 | a_value[s] -= x_value[p] * u_value[k]; 44 | p++; 45 | } 46 | else 47 | { 48 | continue; 49 | } 50 | } 51 | } 52 | } 53 | } 54 | } 55 | 56 | void pangulu_tstrf_fp64_cpu_2(pangulu_smatrix *a, 57 | pangulu_smatrix *x, 58 | pangulu_smatrix *u) 59 | { 60 | 61 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer; 62 | pangulu_inblock_idx *A_rowidx = a->columnindex; 63 | 64 | calculate_type *a_value = a->value; 65 | 66 | pangulu_inblock_ptr *L_rowpointer = u->columnpointer; 67 | 68 | pangulu_inblock_ptr *L_colpointer = u->rowpointer; 69 | pangulu_inblock_idx *L_rowindex = u->columnindex; 70 | calculate_type *L_value = u->value; 71 | 72 | pangulu_int64_t n = a->row; 73 | 74 | pangulu_int64_t *Spointer = (pangulu_int64_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_int64_t) * (n + 1)); 75 | memset(Spointer, 0, sizeof(pangulu_int64_t) * (n + 1)); 76 | int rhs = 0; 77 | for (pangulu_int64_t i = 0; i < n; i++) 78 | { 79 | if (A_columnpointer[i] != A_columnpointer[i + 1]) 80 | { 81 | Spointer[rhs] = i; 82 | rhs++; 83 | } 84 | } 85 | 86 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs); 87 | calculate_type *D_x = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs); 88 | 89 | memset(C_b, 0.0, sizeof(calculate_type) * n * rhs); 90 | memset(D_x, 0.0, sizeof(calculate_type) * n * rhs); 91 | 92 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 93 | for (int i = 0; i < rhs; i++) 94 | { 95 | int index = Spointer[i]; 96 | for (int j = A_columnpointer[index]; j < A_columnpointer[index + 1]; j++) 97 | { 98 | C_b[i * n + A_rowidx[j]] = a_value[j]; 99 | } 100 | } 101 | 102 | int nlevel = 0; 103 | int *levelPtr = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * (n + 1)); 104 | int *levelItem = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * n); 105 | findlevel(L_colpointer, L_rowindex, L_rowpointer, n, &nlevel, levelPtr, levelItem); 106 | 107 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 108 | for (int i = 0; i < rhs; i++) 109 | { 110 | for (int li = 0; li < nlevel; li++) 111 | { 112 | 113 | 
for (int ri = levelPtr[li]; ri < levelPtr[li + 1]; ri++) 114 | { 115 | C_b[i * n + levelItem[ri]] /= L_value[L_colpointer[levelItem[ri]]]; 116 | for (int j = L_colpointer[levelItem[ri]] + 1; j < L_colpointer[levelItem[ri] + 1]; j++) 117 | { 118 | C_b[i * n + L_rowindex[j]] -= L_value[j] * C_b[i * n + levelItem[ri]]; 119 | } 120 | } 121 | } 122 | } 123 | 124 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 125 | for (int i = 0; i < rhs; i++) 126 | { 127 | int index = Spointer[i]; 128 | for (int j = A_columnpointer[index]; j < A_columnpointer[index + 1]; j++) 129 | { 130 | a_value[j] = C_b[i * n + A_rowidx[j]]; 131 | } 132 | } 133 | 134 | pangulu_free(__FILE__, __LINE__, Spointer); 135 | pangulu_free(__FILE__, __LINE__, C_b); 136 | pangulu_free(__FILE__, __LINE__, D_x); 137 | } 138 | void pangulu_tstrf_fp64_cpu_3(pangulu_smatrix *a, 139 | pangulu_smatrix *x, 140 | pangulu_smatrix *u) 141 | { 142 | 143 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer; 144 | pangulu_inblock_idx *A_rowidx = a->columnindex; 145 | 146 | calculate_type *a_value = a->value; 147 | 148 | pangulu_inblock_ptr *L_columnpointer = u->rowpointer; 149 | pangulu_inblock_idx *L_rowidx = u->columnindex; 150 | calculate_type *L_value = u->value; 151 | 152 | pangulu_int64_t n = a->row; 153 | 154 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * n); 155 | 156 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 157 | for (int i = 0; i < n; i++) // jth column of u 158 | { 159 | for (int j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++) 160 | { 161 | int idx = A_rowidx[j]; 162 | C_b[i * n + idx] = a_value[j]; // tranform csr to dense,only value 163 | } 164 | } 165 | 166 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 167 | for (pangulu_int64_t i = 0; i < n; i++) 168 | { 169 | for (pangulu_int64_t j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++) 170 | { 171 | C_b[i * n + A_rowidx[j]] /= L_value[L_columnpointer[A_rowidx[j]]]; 172 | pangulu_inblock_idx idx = A_rowidx[j]; 173 | for (pangulu_int64_t k = L_columnpointer[idx] + 1; k < L_columnpointer[idx + 1]; k++) 174 | { 175 | C_b[i * n + L_rowidx[k]] -= L_value[k] * C_b[i * n + A_rowidx[j]]; 176 | } 177 | } 178 | } 179 | 180 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 181 | for (int i = 0; i < n; i++) 182 | { 183 | for (int j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++) 184 | { 185 | int idx = A_rowidx[j]; 186 | a_value[j] = C_b[i * n + idx]; 187 | } 188 | } 189 | pangulu_free(__FILE__, __LINE__, C_b); 190 | } 191 | void pangulu_tstrf_fp64_cpu_4(pangulu_smatrix *a, 192 | pangulu_smatrix *x, 193 | pangulu_smatrix *u) 194 | { 195 | 196 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer; 197 | pangulu_inblock_idx *A_rowidx = a->columnindex; 198 | 199 | calculate_type *a_value = a->value; 200 | 201 | pangulu_inblock_ptr *L_columnpointer = u->rowpointer; 202 | pangulu_inblock_idx *L_rowidx = u->columnindex; 203 | calculate_type *L_value = u->value; 204 | 205 | pangulu_int64_t n = a->row; 206 | 207 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 208 | for (pangulu_int64_t i = 0; i < n; i++) 209 | { 210 | for (pangulu_int64_t j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++) 211 | { 212 | pangulu_inblock_idx idx = A_rowidx[j]; 213 | a_value[j] /= L_value[L_columnpointer[idx]]; 214 | for (pangulu_int64_t k = L_columnpointer[idx] + 1, p = j + 1; k < L_columnpointer[idx + 1] && p < A_columnpointer[i + 1]; k++, p++) 215 | { 216 | if 
(L_rowidx[k] == A_rowidx[p]) 217 | { 218 | a_value[p] -= L_value[k] * a_value[j]; 219 | } 220 | else 221 | { 222 | k--; 223 | } 224 | } 225 | } 226 | } 227 | } 228 | void pangulu_tstrf_fp64_cpu_5(pangulu_smatrix *a, 229 | pangulu_smatrix *x, 230 | pangulu_smatrix *u) 231 | { 232 | 233 | pangulu_inblock_ptr *A_rowpointer = a->columnpointer; 234 | pangulu_inblock_idx *A_colindex = a->rowindex; 235 | calculate_type *a_value = x->value_csc; 236 | 237 | pangulu_inblock_ptr *L_colpointer = u->rowpointer; 238 | pangulu_inblock_idx *L_rowindex = u->columnindex; 239 | calculate_type *L_value = u->value; 240 | 241 | pangulu_inblock_ptr *X_rowpointer = a->columnpointer; 242 | pangulu_inblock_idx *X_colindex = a->rowindex; 243 | calculate_type *x_value = a->value_csc; 244 | 245 | pangulu_int64_t n = a->row; 246 | 247 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 248 | for (int i = 0; i < n; i++) // jth column of u 249 | { 250 | for (int j = A_rowpointer[i]; j < A_rowpointer[i + 1]; j++) 251 | { 252 | pangulu_inblock_idx idx = A_colindex[j]; 253 | temp_a_value[i * n + idx] = a_value[j]; // tranform csr to dense,only value 254 | } 255 | } 256 | 257 | for (pangulu_int64_t i = 0; i < n; i++) 258 | { 259 | // x get value from a 260 | for (pangulu_int64_t k = X_rowpointer[i]; k < X_rowpointer[i + 1]; k++) 261 | { 262 | temp_a_value[i * n + X_colindex[k]] /= L_value[L_colpointer[i]]; 263 | x_value[k] = temp_a_value[i * n + X_colindex[k]]; 264 | } 265 | // update Value 266 | if (X_rowpointer[i] != X_rowpointer[i + 1]) 267 | { 268 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 269 | for (pangulu_int64_t j = L_colpointer[i] + 1; j < L_colpointer[i + 1]; j++) 270 | { 271 | pangulu_inblock_idx idx1 = L_rowindex[j]; 272 | 273 | for (pangulu_int64_t p = X_rowpointer[i]; p < X_rowpointer[i + 1]; p++) 274 | { 275 | 276 | pangulu_inblock_idx idx2 = A_colindex[p]; 277 | temp_a_value[idx1 * n + idx2] -= L_value[j] * temp_a_value[i * n + idx2]; 278 | } 279 | } 280 | } 281 | } 282 | } 283 | void pangulu_tstrf_fp64_cpu_6(pangulu_smatrix *a, 284 | pangulu_smatrix *x, 285 | pangulu_smatrix *u) 286 | { 287 | 288 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer; 289 | pangulu_inblock_idx *A_rowidx = a->columnindex; 290 | 291 | calculate_type *a_value = a->value; 292 | 293 | pangulu_inblock_ptr *L_columnpointer = u->rowpointer; 294 | pangulu_inblock_idx *L_rowidx = u->columnindex; 295 | calculate_type *L_value = u->value; 296 | 297 | pangulu_inblock_ptr n = a->row; 298 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 299 | for (int i = 0; i < n; i++) // jth column of u 300 | { 301 | for (int j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++) 302 | { 303 | int idx = A_rowidx[j]; 304 | temp_a_value[i * n + idx] = a_value[j]; // tranform csr to dense,only value 305 | } 306 | } 307 | 308 | #pragma omp parallel for num_threads(pangu_omp_num_threads) 309 | for (pangulu_int64_t i = 0; i < n; i++) 310 | { 311 | for (pangulu_int64_t j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++) 312 | { 313 | pangulu_inblock_idx idx = A_rowidx[j]; 314 | 315 | a_value[j] = temp_a_value[i * n + idx] / L_value[L_columnpointer[idx]]; 316 | for (pangulu_int64_t k = L_columnpointer[idx] + 1; k < L_columnpointer[idx + 1]; k++) 317 | { 318 | temp_a_value[i * n + L_rowidx[k]] -= L_value[k] * a_value[j]; 319 | } 320 | } 321 | } 322 | } 323 | void pangulu_tstrf_interface_cpu_csr(pangulu_smatrix *a, 324 | pangulu_smatrix *x, 325 | pangulu_smatrix *u) 326 | { 327 | 328 | #ifdef OUTPUT_MATRICES 329 | 
char out_name_B[512]; 330 | char out_name_U[512]; 331 | sprintf(out_name_B, "%s/%s/%d%s", OUTPUT_FILE, "tstrf", tstrf_number, "_tstrf_B.cbd"); 332 | sprintf(out_name_U, "%s/%s/%d%s", OUTPUT_FILE, "tstrf", tstrf_number, "_tstrf_U.cbd"); 333 | pangulu_binary_write_csc_pangulu_smatrix(a, out_name_B); 334 | pangulu_binary_write_csc_pangulu_smatrix(u, out_name_U); 335 | tstrf_number++; 336 | #endif 337 | pangulu_tstrf_fp64_cpu_1(a, x, u); 338 | } 339 | 340 | void pangulu_tstrf_interface_cpu_csc(pangulu_smatrix *a, 341 | pangulu_smatrix *x, 342 | pangulu_smatrix *u) 343 | { 344 | pangulu_tstrf_fp64_cpu_6(a, x, u); 345 | } 346 | 347 | void pangulu_tstrf_interface_c_v1(pangulu_smatrix *a, 348 | pangulu_smatrix *x, 349 | pangulu_smatrix *u) 350 | { 351 | #ifdef GPU_OPEN 352 | pangulu_smatrix_cuda_memcpy_value_csc(a, a); 353 | #endif 354 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a); 355 | pangulu_pangulu_smatrix_memcpy_value_csc_copy_length(x, a); 356 | pangulu_tstrf_fp64_cpu_4(a, x, u); 357 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a); 358 | #ifdef GPU_OPEN 359 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a); 360 | #endif 361 | } 362 | void pangulu_tstrf_interface_c_v2(pangulu_smatrix *a, 363 | pangulu_smatrix *x, 364 | pangulu_smatrix *u) 365 | { 366 | #ifdef GPU_OPEN 367 | pangulu_smatrix_cuda_memcpy_value_csc(a, a); 368 | #endif 369 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a); 370 | pangulu_pangulu_smatrix_memcpy_value_csc_copy_length(x, a); 371 | pangulu_tstrf_fp64_cpu_6(a, x, u); 372 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a); 373 | #ifdef GPU_OPEN 374 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a); 375 | #endif 376 | } 377 | 378 | pangulu_int64_t TEMP_calculate_type_len = 0; 379 | calculate_type* TEMP_calculate_type = NULL; 380 | pangulu_int64_t TEMP_pangulu_inblock_ptr_len = 0; 381 | pangulu_inblock_ptr* TEMP_pangulu_inblock_ptr = NULL; 382 | 383 | int tstrf_csc_csc( 384 | pangulu_inblock_idx n, 385 | pangulu_inblock_ptr* U_colptr, 386 | pangulu_inblock_idx* U_rowidx, 387 | calculate_type* u_value, 388 | pangulu_inblock_ptr* A_colptr, 389 | pangulu_inblock_idx* A_rowidx, 390 | calculate_type* a_value 391 | ){ 392 | if(TEMP_calculate_type_len < n){ 393 | calculate_type* TEMP_calculate_type_old = TEMP_calculate_type; 394 | TEMP_calculate_type = (calculate_type*)pangulu_realloc(__FILE__, __LINE__, TEMP_calculate_type, n*sizeof(calculate_type)); 395 | if(TEMP_calculate_type == NULL){ 396 | pangulu_free(__FILE__, __LINE__, TEMP_calculate_type_old); 397 | TEMP_calculate_type_len = 0; 398 | printf("[ERROR] kernel error : CPU sparse tstrf : realloc TEMP_calculate_type failed.\n"); 399 | return 1; 400 | } 401 | TEMP_calculate_type_len = n; 402 | } 403 | 404 | if(TEMP_pangulu_inblock_ptr_len < n){ 405 | pangulu_inblock_ptr* TEMP_int64_old = TEMP_pangulu_inblock_ptr; 406 | TEMP_pangulu_inblock_ptr = (pangulu_inblock_ptr*)pangulu_realloc(__FILE__, __LINE__, TEMP_pangulu_inblock_ptr, n*sizeof(pangulu_inblock_ptr)); 407 | if(TEMP_pangulu_inblock_ptr == NULL){ 408 | pangulu_free(__FILE__, __LINE__, TEMP_int64_old); 409 | TEMP_pangulu_inblock_ptr_len = 0; 410 | printf("[ERROR] kernel error : CPU sparse tstrf : realloc TEMP_int64 failed.\n"); 411 | return 2; 412 | } 413 | TEMP_pangulu_inblock_ptr_len = n; 414 | } 415 | 416 | pangulu_inblock_ptr* U_next_array = TEMP_pangulu_inblock_ptr; 417 | calculate_type* A_major_column = TEMP_calculate_type; 418 | memcpy(U_next_array, U_colptr, sizeof(pangulu_inblock_ptr) * n); 419 | for(pangulu_int64_t i=0;i= 
U_colptr[k+1] /* U_next_array[k] has run into the next column */ || U_rowidx[U_next_array[k]] > i /* the next element to visit in column k of U has a row index greater than i, the current major column of A */){
429 | continue;
430 | }
431 | for(pangulu_int64_t j=A_colptr[k];jrow;
9 | pangulu_int64_t nnzU = u->nnz;
10 | pangulu_int64_t nnzA = a->nnz;
11 | 
12 | /*********************************u****************************************/
13 | int *d_graphindegree = u->d_graphindegree;
14 | cudaMemcpy(d_graphindegree, u->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
15 | int *d_id_extractor = u->d_id_extractor;
16 | cudaMemset(d_id_extractor, 0, sizeof(int));
17 | calculate_type *d_left_sum = a->d_left_sum;
18 | cudaMemset(d_left_sum, 0, nnzA * sizeof(calculate_type));
19 | /*****************************************************************************/
20 | pangulu_tstrf_cuda_kernel_v8(n,
21 | nnzU,
22 | d_graphindegree,
23 | d_id_extractor,
24 | d_left_sum,
25 | u->cuda_rowpointer,
26 | u->cuda_columnindex,
27 | u->cuda_value,
28 | a->cuda_rowpointer,
29 | a->cuda_columnindex,
30 | x->cuda_value,
31 | a->cuda_rowpointer,
32 | a->cuda_columnindex,
33 | a->cuda_value);
34 | }
35 | 
36 | void pangulu_tstrf_fp64_cuda_v9(pangulu_smatrix *a,
37 | pangulu_smatrix *x,
38 | pangulu_smatrix *u)
39 | {
40 | pangulu_int64_t n = a->row;
41 | pangulu_int64_t nnzU = u->nnz;
42 | pangulu_int64_t nnzA = a->nnz;
43 | 
44 | int *d_graphindegree = u->d_graphindegree;
45 | cudaMemcpy(d_graphindegree, u->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
46 | int *d_id_extractor = u->d_id_extractor;
47 | cudaMemset(d_id_extractor, 0, sizeof(int));
48 | 
49 | int *d_while_profiler;
50 | cudaMalloc((void **)&d_while_profiler, sizeof(int) * n);
51 | cudaMemset(d_while_profiler, 0, sizeof(int) * n);
52 | pangulu_inblock_ptr *Spointer = (pangulu_inblock_ptr *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_inblock_ptr) * (n + 1));
53 | memset(Spointer, 0, sizeof(pangulu_inblock_ptr) * (n + 1));
54 | pangulu_int64_t rhs = 0;
55 | for (int i = 0; i < n; i++)
56 | {
57 | if (a->rowpointer[i] != a->rowpointer[i + 1])
58 | {
59 | Spointer[rhs] = i;
60 | rhs++;
61 | }
62 | }
63 | calculate_type *d_left_sum;
64 | cudaMalloc((void **)&d_left_sum, n * rhs * sizeof(calculate_type));
65 | cudaMemset(d_left_sum, 0, n * rhs * sizeof(calculate_type));
66 | 
67 | calculate_type *d_x, *d_b;
68 | cudaMalloc((void **)&d_x, n * rhs * sizeof(calculate_type));
69 | cudaMalloc((void **)&d_b, n * rhs * sizeof(calculate_type));
70 | cudaMemset(d_x, 0, n * rhs * sizeof(calculate_type));
71 | cudaMemset(d_b, 0, n * rhs * sizeof(calculate_type));
72 | 
73 | pangulu_inblock_ptr *d_Spointer;
74 | cudaMalloc((void **)&d_Spointer, sizeof(pangulu_inblock_ptr) * (n + 1));
75 | cudaMemset(d_Spointer, 0, sizeof(pangulu_inblock_ptr) * (n + 1));
76 | cudaMemcpy(d_Spointer, Spointer, sizeof(pangulu_inblock_ptr) * (n + 1), cudaMemcpyHostToDevice);
77 | 
78 | pangulu_gessm_cuda_kernel_v9(n,
79 | nnzU,
80 | rhs,
81 | nnzA,
82 | d_Spointer,
83 | d_graphindegree,
84 | d_id_extractor,
85 | d_while_profiler,
86 | u->cuda_rowpointer,
87 | u->cuda_columnindex,
88 | u->cuda_value,
89 | a->cuda_rowpointer,
90 | a->cuda_columnindex,
91 | x->cuda_value,
92 | 
93 | a->cuda_rowpointer,
94 | a->cuda_columnindex,
95 | a->cuda_value,
96 | d_left_sum,
97 | d_x,
98 | d_b);
99 | 
100 | cudaFree(d_x);
101 | cudaFree(d_b);
102 | cudaFree(d_left_sum);
103 | cudaFree(d_while_profiler);
104 | }
105 | 
106 | void pangulu_tstrf_fp64_cuda_v7(pangulu_smatrix *a,
107 | pangulu_smatrix *x,
108 | pangulu_smatrix *u)
109 | {
110 | pangulu_int64_t n = a->row;
111 | pangulu_int64_t nnzU = 
u->nnz; 112 | pangulu_tstrf_cuda_kernel_v7(n, 113 | nnzU, 114 | u->cuda_rowpointer, 115 | u->cuda_columnindex, 116 | u->cuda_value, 117 | a->cuda_rowpointer, 118 | a->cuda_columnindex, 119 | x->cuda_value, 120 | a->cuda_rowpointer, 121 | a->cuda_columnindex, 122 | a->cuda_value); 123 | } 124 | 125 | void pangulu_tstrf_fp64_cuda_v10(pangulu_smatrix *a, 126 | pangulu_smatrix *x, 127 | pangulu_smatrix *u) 128 | { 129 | pangulu_int64_t n = a->row; 130 | pangulu_int64_t nnzU = u->nnz; 131 | pangulu_tstrf_cuda_kernel_v10(n, 132 | nnzU, 133 | u->cuda_rowpointer, 134 | u->cuda_columnindex, 135 | u->cuda_value, 136 | a->cuda_rowpointer, 137 | a->cuda_columnindex, 138 | x->cuda_value, 139 | a->cuda_rowpointer, 140 | a->cuda_columnindex, 141 | a->cuda_value); 142 | } 143 | 144 | void pangulu_tstrf_fp64_cuda_v11(pangulu_smatrix *a, 145 | pangulu_smatrix *x, 146 | pangulu_smatrix *u) 147 | { 148 | pangulu_int64_t n = a->row; 149 | pangulu_int64_t nnzU = u->nnz; 150 | pangulu_int64_t nnzA = a->nnz; 151 | 152 | /*********************************u****************************************/ 153 | int *d_graphindegree = u->d_graphindegree; 154 | cudaMemcpy(d_graphindegree, u->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice); 155 | int *d_id_extractor = u->d_id_extractor; 156 | cudaMemset(d_id_extractor, 0, sizeof(int)); 157 | calculate_type *d_left_sum = a->d_left_sum; 158 | cudaMemset(d_left_sum, 0, nnzA * sizeof(calculate_type)); 159 | /*****************************************************************************/ 160 | pangulu_tstrf_cuda_kernel_v11(n, 161 | nnzU, 162 | d_graphindegree, 163 | d_id_extractor, 164 | d_left_sum, 165 | u->cuda_rowpointer, 166 | u->cuda_columnindex, 167 | u->cuda_value, 168 | a->cuda_rowpointer, 169 | a->cuda_columnindex, 170 | x->cuda_value, 171 | a->cuda_rowpointer, 172 | a->cuda_columnindex, 173 | a->cuda_value); 174 | } 175 | 176 | void pangulu_tstrf_interface_G_V1(pangulu_smatrix *a, 177 | pangulu_smatrix *x, 178 | pangulu_smatrix *u) 179 | { 180 | pangulu_smatrix_cuda_memcpy_value_csc(a, a); 181 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a); 182 | pangulu_smatrix_cuda_memcpy_complete_csr(a, a); 183 | pangulu_tstrf_fp64_cuda_v7(a, x, u); 184 | pangulu_smatrix_cuda_memcpy_value_csr(a, x); 185 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a); 186 | } 187 | void pangulu_tstrf_interface_G_V2(pangulu_smatrix *a, 188 | pangulu_smatrix *x, 189 | pangulu_smatrix *u) 190 | { 191 | pangulu_tstrf_fp64_cuda_v8(a, x, u); 192 | pangulu_smatrix_cuda_memcpy_value_csc(a, x); 193 | } 194 | void pangulu_tstrf_interface_G_V3(pangulu_smatrix *a, 195 | pangulu_smatrix *x, 196 | pangulu_smatrix *u) 197 | { 198 | pangulu_smatrix_cuda_memcpy_value_csc(a, a); 199 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a); 200 | pangulu_smatrix_cuda_memcpy_complete_csr(a, a); 201 | pangulu_tstrf_fp64_cuda_v10(a, x, u); 202 | pangulu_smatrix_cuda_memcpy_value_csr(a, x); 203 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a); 204 | } 205 | #endif -------------------------------------------------------------------------------- /src/platforms/02_GPU/01_CUDA/000_CUDA/Makefile: -------------------------------------------------------------------------------- 1 | include ../../../../../make.inc 2 | pangulu_0201000.o:pangulu_cuda.cu 3 | $(NVCC) $(NVCCFLAGS) $(METIS_INC) -Xcompiler -fPIC -c $< -o $@ 4 | mv $@ ../../../../../lib -------------------------------------------------------------------------------- /src/platforms/platform_list.csv: 
-------------------------------------------------------------------------------- 1 | 0201000,GPU_CUDA --------------------------------------------------------------------------------