├── LICENSE
├── Makefile
├── PanguLU_Users_Guide.pdf
├── PanguLU_Users_s_Guide.pdf
├── README.md
├── build_helper.py
├── build_list.csv
├── examples
│   ├── Makefile
│   ├── Trefethen_20b.mtx
│   ├── example.c
│   ├── mmio.h
│   ├── mmio_highlevel.h
│   └── run.sh
├── include
│   ├── pangulu.h
│   └── pangulu_interface_common.h
├── lib
│   └── Makefile
├── make.inc
└── src
    ├── Makefile
    ├── languages
    │   ├── pangulu_en.h
    │   └── pangulu_en_us.h
    ├── pangulu.c
    ├── pangulu_addmatrix.c
    ├── pangulu_addmatrix_cuda.c
    ├── pangulu_check.c
    ├── pangulu_common.h
    ├── pangulu_cuda_interface.c
    ├── pangulu_destroy.c
    ├── pangulu_gessm_fp64.c
    ├── pangulu_gessm_fp64_cuda.c
    ├── pangulu_getrf_fp64.c
    ├── pangulu_getrf_fp64_cuda.c
    ├── pangulu_heap.c
    ├── pangulu_kernel_interface.c
    ├── pangulu_malloc.c
    ├── pangulu_mpi.c
    ├── pangulu_numeric.c
    ├── pangulu_preprocessing.c
    ├── pangulu_reorder.c
    ├── pangulu_spmv_fp64.c
    ├── pangulu_sptrsv.c
    ├── pangulu_sptrsv_fp64.c
    ├── pangulu_ssssm_fp64.c
    ├── pangulu_ssssm_fp64_cuda.c
    ├── pangulu_symbolic.c
    ├── pangulu_thread.c
    ├── pangulu_time.c
    ├── pangulu_tstrf_fp64.c
    ├── pangulu_tstrf_fp64_cuda.c
    ├── pangulu_utils.c
    └── platforms
        ├── 02_GPU
        │   └── 01_CUDA
        │       └── 000_CUDA
        │           ├── Makefile
        │           ├── pangulu_cuda.cu
        │           └── pangulu_cuda.h
        └── platform_list.csv
/Makefile:
--------------------------------------------------------------------------------
1 | all : examples
2 |
3 | .PHONY : examples lib src clean update
4 |
5 | examples : lib
6 | $(MAKE) -C $@
7 |
8 | lib : src
9 | $(MAKE) -C $@
10 |
11 | src:
12 | $(MAKE) -C $@
13 |
14 | clean:
15 | (cd src; $(MAKE) clean)
16 | (cd lib; $(MAKE) clean)
17 | (cd examples; $(MAKE) clean)
18 |
19 | update : clean all
--------------------------------------------------------------------------------
/PanguLU_Users_Guide.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SuperScientificSoftwareLaboratory/PanguLU/d11577cf0f5f1dae5ca02fd1d3128982e215ad60/PanguLU_Users_Guide.pdf
--------------------------------------------------------------------------------
/PanguLU_Users_s_Guide.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SuperScientificSoftwareLaboratory/PanguLU/d11577cf0f5f1dae5ca02fd1d3128982e215ad60/PanguLU_Users_s_Guide.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PanguLU
2 |
3 | -------------------
4 |
5 | ## Introduction
6 |
7 | PanguLU is an open-source software package for solving a linear system *Ax = b* on heterogeneous distributed platforms. The library is written in C, and exploits parallelism from MPI, OpenMP and CUDA. The sparse LU factorisation algorithm used in PanguLU splits the sparse matrix into multiple equally-sized sparse matrix blocks and computes them using sparse BLAS. The latest version of PanguLU uses a synchronisation-free communication strategy to reduce the overall latency overhead, and a variety of block-wise sparse BLAS methods are called adaptively to improve efficiency on CPUs and GPUs. Currently, PanguLU supports single and double precision, with both real and complex values. In addition, our team at the SSSLab is constantly optimising and updating PanguLU.
8 |
9 | ## Structure of code
10 |
11 | ```
12 | PanguLU/README instructions on installation
13 | PanguLU/src C and CUDA source code, to be compiled into libpangulu.a and libpangulu.so
14 | PanguLU/examples example code
15 | PanguLU/include contains the header files of PanguLU
16 | PanguLU/lib contains the library archives libpangulu.a and libpangulu.so
17 | PanguLU/Makefile top-level Makefile that does installation and testing
18 | PanguLU/make.inc compiler, compiler flags included in all Makefiles (except examples/Makefile)
19 | ```
20 |
21 | ## Installation
22 | #### Step 1 : Make sure "make" is available.
23 | "make" is an automatic build tool and is required to build PanguLU. It is available in most GNU/Linux distributions; you can install it using package managers like `apt` or `yum`.
24 |
25 | #### Step 2 : Make sure an MPI library is available.
26 | PanguLU requires an MPI library. You need to install an MPI library together with its header files. Tested MPI libraries : OpenMPI 4.1.2, Intel MPI 2021.12.
27 |
28 | #### Step 3 : Make sure CUDA is available. (optional, required if GPUs are used)
29 | If GPUs are used, CUDA is required. Tested version : CUDA 12.2.
30 |
31 | #### Step 4 : Make sure a BLAS library is available. (optional, required if GPU is not used)
32 | A BLAS library is required if the CPU takes part in the algebra computations of numeric factorisation. Tested version : OpenBLAS 0.3.26.
33 |
34 | #### Step 5 : Make sure METIS is available. (optional but recommended)
35 | The GitHub page of the METIS library is : https://github.com/KarypisLab/METIS
36 |
37 | #### Step 6 : Edit `make.inc`.
38 | Search for `/path/to` in `make.inc`. Replace each occurrence with the actual path on your computer.
39 |
40 | #### Step 7 : Edit `examples/Makefile`
41 | The Makefile of the example code doesn't include `make.inc`. Search for `/path/to` in `examples/Makefile`. Replace each occurrence with the actual path on your computer. An example of these edits is shown below.
42 |
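For example, assuming OpenBLAS is installed under `/opt/OpenBLAS` and CUDA under `/usr/local/cuda` (hypothetical paths; substitute your own), the corresponding lines in `make.inc` would become:

```
OPENBLAS_INC = -I/opt/OpenBLAS/include
OPENBLAS_LIB = -L/opt/OpenBLAS/lib -lopenblas
CUDA_INC = -I/usr/local/cuda/include
CUDA_LIB = -L/usr/local/cuda/lib64 -lcudart -lcusparse
```

The `LINK_METIS` and `OPENBLAS_LIB` entries in `examples/Makefile` are edited the same way.
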
43 | #### Step 8 : Decide if you want to use GPU.
44 | If you want to use GPU, you should (a sketch of these edits follows this list) :
45 | - Append `GPU_CUDA` in build_list.csv;
46 | - Add `-DGPU_OPEN` in `PANGULU_FLAGS`. You can find `PANGULU_FLAGS` in `make.inc`;
47 | - Uncomment `LINK_CUDA` in `examples/Makefile`.
48 |
49 | If you don't want to use GPU, revert these changes.
50 |
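A sketch of the three edits, assuming the file layout shipped with PanguLU (`build_list.csv` holds one platform name per row, which is how `build_helper.py` reads it):

```
# build_list.csv : add a row
GPU_CUDA

# make.inc : append -DGPU_OPEN to PANGULU_FLAGS
PANGULU_FLAGS = -DPANGULU_LOG_INFO -DCALCULATE_TYPE_R64 -DMETIS -DPANGULU_MC64 -DGPU_OPEN

# examples/Makefile : uncomment LINK_CUDA
LINK_CUDA = -L/path/to/cuda/lib64 -lcudart -lcusparse -lstdc++
```
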
51 | #### Step 9 : Run `make -j` in your terminal.
52 | Make sure the working directory of your terminal is the root directory of PanguLU. If PanguLU was built successfully, you will find `libpangulu.a` and `libpangulu.so` in the `lib` directory, and `pangulu_example.elf` in the `examples` directory.
53 |
54 | ## Build flags
55 | `PANGULU_FLAGS` influences build behaviors. You can edit `PANGULU_FLAGS` in `make.inc` to enable different features of PanguLU. The available flags are :
56 |
57 | #### Decide whether to use GPU.
58 | Use `-DGPU_OPEN` to use GPU; omit it otherwise. Please note that setting this flag is not the only step needed to use GPU. Please check Step 8 in the Installation part.
59 |
60 | #### Decide the value type of matrix and vector entries.
61 | Use `-DCALCULATE_TYPE_R64` (double real), `-DCALCULATE_TYPE_CR64` (double complex), `-DCALCULATE_TYPE_R32` (float real) or `-DCALCULATE_TYPE_CR32` (float complex).
62 |
63 | #### Decide whether to use the MC64 reordering algorithm.
64 | Use `-DPANGULU_MC64` to enable the MC64 algorithm. Please note that MC64 is not supported when matrix entries are complex numbers; if complex values are selected and the `-DPANGULU_MC64` flag is used, MC64 will not be enabled.
65 |
66 | #### Decide whether to use the METIS reordering tool.
67 | Use `-DMETIS` to enable METIS.
68 |
69 | #### Decide log level.
70 | Please select zero or one of these flags : `-DPANGULU_LOG_INFO`, `-DPANGULU_LOG_WARNING` or `-DPANGULU_LOG_ERROR`. Log level "INFO" prints all messages to standard output (including warnings and errors). Log level "WARNING" only prints warnings and errors. Log level "ERROR" only prints fatal errors that cause PanguLU to terminate abnormally.
71 |
72 | #### Decide core binding strategy.
73 | Hyper-threading is not recommended. If you can't turn off hyper-threading and each core of your CPU has 2 threads, using `-DHT_IS_OPEN`
74 | may yield a performance gain.
75 |
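As a concrete reference, the default `PANGULU_FLAGS` line shipped in `make.inc` selects INFO-level logging, double-precision real values, METIS and MC64, with the GPU and hyper-threading flags commented out:

```
PANGULU_FLAGS = -DPANGULU_LOG_INFO -DCALCULATE_TYPE_R64 -DMETIS -DPANGULU_MC64 #-DGPU_OPEN -DHT_IS_OPEN
```
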
76 | ## Function interfaces
77 | To make it easier to call PanguLU in your software, PanguLU provides the following function interfaces:
78 |
79 | #### 1. pangulu_init()
80 | ```
81 | void pangulu_init(
82 | int pangulu_n, // Specifies the number of rows in the CSR format matrix.
83 | long long pangulu_nnz, // Specifies the total number of non-zero elements in the CSR format matrix.
84 | long *csr_rowptr, // Points to the row pointer array of the CSR format matrix.
85 | int *csr_colidx, // Points to the column index array of the CSR format matrix.
86 | pangulu_calculate_type *csr_value, // Points to an array that stores the values of the CSR format matrix.
87 | pangulu_init_options *init_options, // Pointer to a pangulu_init_options structure.
88 | void **pangulu_handle // On return, contains a handle pointer to the library’s internal state.
89 | );
90 | ```
91 |
92 | #### 2. pangulu_gstrf()
93 | ```
94 | void pangulu_gstrf(
95 | pangulu_gstrf_options *gstrf_options, // Pointer to pangulu_gstrf_options structure.
96 | void **pangulu_handle // Pointer to the solver handle returned on initialization.
97 | );
98 | ```
99 |
100 | #### 3. pangulu_gstrs()
101 | ```
102 | void pangulu_gstrs(
103 | pangulu_calculate_type *rhs, // Pointer to the right-hand side vector.
104 | pangulu_gstrs_options *gstrs_options, // Pointer to the pangulu_gstrs_options structure.
105 | void** pangulu_handle // Pointer to the library internal state handle returned on initialization.
106 | );
107 | ```
108 |
109 | #### 4. pangulu_gssv()
110 | ```
111 | void pangulu_gssv(
112 | pangulu_calculate_type *rhs, // Pointer to the right-hand side vector.
113 | pangulu_gstrf_options *gstrf_options, // Pointer to a pangulu_gstrf_options structure.
114 | pangulu_gstrs_options *gstrs_options, // Pointer to a pangulu_gstrs_options structure.
115 | void **pangulu_handle // Pointer to the library internal status handle returned on initialization.
116 | );
117 | ```
118 |
119 | #### 5. pangulu_finalize()
120 | ```
121 | void pangulu_finalize(
122 | void **pangulu_handle // Pointer to the library internal state handle returned on initialization.
123 | );
124 | ```
125 |
126 | `example.c` is a sample program that calls PanguLU. You can refer to this file to complete the call to PanguLU. You should first create the distributed matrix using `pangulu_init()`. If you need to solve multiple right-hand side vectors while the matrix is unchanged, you can call `pangulu_gstrs()` multiple times after calling `pangulu_gstrf()`. If you need to factorize a number of different matrices, call `pangulu_finalize()` after completing the solution of one matrix, and then use `pangulu_init()` to initialize the next matrix. A minimal sketch of this call sequence is shown below.
127 |
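The sketch assumes the CSR arrays (`n`, `nnz`, `rowptr`, `colidx`, `value`) and two right-hand sides `rhs1` and `rhs2` (hypothetical names) are already prepared, as in `example.c`:

```
void factorize_once_solve_twice(int n, long long nnz, long *rowptr, int *colidx,
                                pangulu_calculate_type *value,
                                pangulu_calculate_type *rhs1,
                                pangulu_calculate_type *rhs2)
{
    pangulu_init_options init_options;
    init_options.nb = 4;       // block size; tune for your matrix
    init_options.nthread = 20; // preprocessing thread count
    void *pangulu_handle;
    pangulu_init(n, nnz, rowptr, colidx, value, &init_options, &pangulu_handle);

    pangulu_gstrf_options gstrf_options;
    pangulu_gstrf(&gstrf_options, &pangulu_handle); // factorize A once

    pangulu_gstrs_options gstrs_options;
    pangulu_gstrs(rhs1, &gstrs_options, &pangulu_handle); // rhs1 is overwritten with the solution
    pangulu_gstrs(rhs2, &gstrs_options, &pangulu_handle); // the factors are reused for rhs2

    pangulu_finalize(&pangulu_handle); // required before initializing another matrix
}
```
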
128 | ## Executing the example code of PanguLU
129 | The test routines are placed in the `examples` directory. The routine in `examples/example.c` first calls `pangulu_gstrf()` to perform LU factorisation, and then calls `pangulu_gstrs()` to solve the linear system.
130 | #### run command
131 |
132 | > **mpirun -np process_count ./pangulu_example.elf -nb block_size -f path_to_mtx**
133 |
134 | process_count : Number of MPI processes used to launch PanguLU;
135 |
136 | block_size : Order (side length) of each matrix block;
137 |
138 | path_to_mtx : Path to the matrix file in Matrix Market (.mtx) format.
139 |
140 | You can also use run.sh, for example:
141 |
142 | > **bash run.sh path_to_mtx block_size process_count**
143 |
144 | #### test sample
145 |
146 | > **mpirun -np 6 ./pangulu_example.elf -nb 4 -f Trefethen_20b.mtx**
147 |
148 | or use run.sh:
149 | > **bash run.sh Trefethen_20b.mtx 4 6**
150 |
151 |
152 | In this example, 6 processes are used, the block size (nb) is 4, and the matrix is Trefethen_20b.mtx.
153 |
154 |
155 | ## Release versions
156 |
157 | #### Version 4.2.0 (Dec. 13, 2024)
158 |
159 | * Updated preprocessing phase to distributed data structure.
160 |
161 | #### Version 4.1.0 (Sep. 1, 2024)
162 |
163 | * Optimized memory usage of numeric factorisation and solving;
164 | * Added parallel building support.
165 |
166 | #### Version 4.0.0 (Jul. 24, 2024)
167 |
168 | * Optimized user interfaces of solver routines;
169 | * Optimized performance of the numeric factorisation phase on CPU platforms;
170 | * Added support for complex matrix solving;
171 | * Optimized preprocessing performance.
172 |
173 | #### Version 3.5.0 (Aug. 06, 2023)
174 |
175 | * Updated the pre-processing phase with OpenMP.
176 | * Updated the compilation method of PanguLU to compile libpangulu.so and libpangulu.a at the same time.
177 | * Updated timing for the reorder phase, the symbolic factorisation phase, and the pre-processing phase.
178 | * Added GFLOPS for the numeric factorisation phase.
179 |
180 | #### Version 3.0.0 (Apr. 02, 2023)
181 |
182 | * Used adaptive selection sparse BLAS in the numeric factorisation phase.
183 | * Added the reorder phase.
184 | * Added the symbolic factorisation phase.
185 | * Added the mc64 sorting algorithm in the reorder phase.
186 | * Added an interface for the 64-bit metis package in the reorder phase.
187 |
188 |
189 | #### Version 2.0.0 (Jul. 22, 2022)
190 |
191 | * Used a synchronisation-free scheduling strategy in the numeric factorisation phase.
192 | * Updated the MPI communication method in the numeric factorisation phase.
193 | * Added single precision in the numeric factorisation phase.
194 |
195 | #### Version 1.0.0 (Oct. 19, 2021)
196 |
197 | * Used a rule-based 2D LU factorisation scheduling strategy.
198 | * Used Sparse BLAS for floating point calculations on GPUs.
199 | * Added the pre-processing phase.
200 | * Added the numeric factorisation phase.
201 | * Added the triangular solve phase.
202 |
203 | ## Reference
204 |
205 | * [1] Xu Fu, Bingbin Zhang, Tengcheng Wang, Wenhao Li, Yuechen Lu, Enxin Yi, Jianqi Zhao, Xiaohan Geng, Fangying Li, Jingwen Zhang, Zhou Jin, Weifeng Liu. PanguLU: A Scalable Regular Two-Dimensional Block-Cyclic Sparse Direct Solver on Distributed Heterogeneous Systems. 36th ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC ’23). 2023.
206 |
207 |
208 |
--------------------------------------------------------------------------------
/build_helper.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python3
2 | import csv
3 | import os
4 | import sys
5 | import subprocess
6 |
7 | def generate_platform_names(build_list_path, platform_list_path):
8 | build_name_list = []
9 | with open(build_list_path, "r") as f:
10 | build_reader = csv.reader(f)
11 | for build_item in build_reader:
12 | if len(build_item) < 1:
13 | continue
14 | build_name_list.append(build_item[0])
15 |
16 | platform_list = []
17 | with open(platform_list_path, "r") as f:
18 | platform_reader = csv.reader(f)
19 | for platform_item in platform_reader:
20 | platform_list.append(platform_item)
21 |
22 | build_name_list_ret = []
23 | for name in build_name_list:
24 | for platform in platform_list:
25 | if len(platform) < 2:
26 | continue
27 | if platform[1] == name:
28 | build_name_list_ret.append(platform)
29 | break
30 | return build_name_list_ret
31 |
32 |
33 | def generate_platform_paths(build_platform_names, platform_list_path):
34 | platform_paths = []
35 | for platform in build_platform_names:
36 | platform_id = platform[0]
37 |         assert(len(platform_id) == 7) # platform id is a 7-character string, e.g. "0201000" for GPU_CUDA
38 |         platform_id_l1 = platform_id[0:2] # first directory level, e.g. "02" -> "02_GPU"
39 |         platform_id_l2 = platform_id[2:4] # second directory level, e.g. "01" -> "01_CUDA"
40 |         platform_id_l3 = platform_id[4:7] # third directory level, e.g. "000" -> "000_CUDA"
41 | dir_l1 = None
42 | dir_l2 = None
43 | dir_l3 = None
44 | dirs_l1 = [file for file in os.listdir(os.path.dirname(platform_list_path))]
45 | for current_dir_l1 in dirs_l1:
46 | if current_dir_l1[:2] == platform_id_l1:
47 | dir_l1 = current_dir_l1
48 | break
49 | dirs_l2 = [file for file in os.listdir(os.path.join(os.path.dirname(platform_list_path), dir_l1))]
50 | for current_dir_l2 in dirs_l2:
51 | if current_dir_l2[:2] == platform_id_l2:
52 | dir_l2 = current_dir_l2
53 | break
54 | dirs_l3 = [file for file in os.listdir(os.path.join(os.path.dirname(platform_list_path), dir_l1, dir_l2))]
55 | for current_dir_l3 in dirs_l3:
56 | if current_dir_l3[:3] == platform_id_l3:
57 | dir_l3 = current_dir_l3
58 | break
59 | platform_paths.append([platform_id, f"platforms/{dir_l1}/{dir_l2}/{dir_l3}"])
60 | return platform_paths
61 |
62 |
63 | def compile_platform_code(build_list_path, platform_list_path):
64 | build_platform_names = generate_platform_names(build_list_path, platform_list_path)
65 | build_platform_paths = generate_platform_paths(build_platform_names, platform_list_path)
66 | for build_platform_path in build_platform_paths:
67 | command = f"make -C src/{build_platform_path[1]}"
68 | print(command)
69 | return_code = subprocess.call(command.split())
70 | if return_code != 0:
71 | exit(return_code)
72 |
73 |
74 | if __name__ == "__main__":
75 | if sys.argv[1] == "compile_platform_code":
76 | compile_platform_code("build_list.csv", "src/platforms/platform_list.csv")
77 | else:
78 | print("[BUILD_HELPER_ERROR] Unknown command.")
79 | exit(1)
--------------------------------------------------------------------------------
/build_list.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SuperScientificSoftwareLaboratory/PanguLU/d11577cf0f5f1dae5ca02fd1d3128982e215ad60/build_list.csv
--------------------------------------------------------------------------------
/examples/Makefile:
--------------------------------------------------------------------------------
1 | LINK_METIS = /path/to/libmetis.a /path/to/libGKlib.a
2 | OPENBLAS_LIB = /path/to/libopenblas.a
3 | #LINK_CUDA = -L/path/to/cuda/lib64 -lcudart -lcusparse -lstdc++
4 | LINK_PANGULU = ../lib/libpangulu.a # Directly passing the static library as compiler input makes the dynamic library loader search the directory of the static library.
5 |
6 | all: pangulu_example.elf
7 |
8 | pangulu_example.elf:example.c
9 | mpicc -O3 $< -DCALCULATE_TYPE_R64 -I../include $(LINK_PANGULU) $(LINK_CUDA) $(LINK_METIS) $(OPENBLAS_LIB) -fopenmp -lpthread -lm -o $@
10 |
11 | clean:
12 | rm -f *.elf
13 |
--------------------------------------------------------------------------------
/examples/Trefethen_20b.mtx:
--------------------------------------------------------------------------------
1 | %%MatrixMarket matrix coordinate integer symmetric
2 | %-------------------------------------------------------------------------------
3 | % UF Sparse Matrix Collection, Tim Davis
4 | % http://www.cise.ufl.edu/research/sparse/matrices/JGD_Trefethen/Trefethen_20b
5 | % name: JGD_Trefethen/Trefethen_20b
6 | % [Diagonal matrices with primes, Nick Trefethen, Oxford Univ.]
7 | % id: 2203
8 | % date: 2008
9 | % author: N. Trefethen
10 | % ed: J.-G. Dumas
11 | % fields: name title A id date author ed kind notes
12 | % kind: combinatorial problem
13 | %-------------------------------------------------------------------------------
14 | % notes:
15 | % Diagonal matrices with primes, Nick Trefethen, Oxford Univ.
16 | % From Jean-Guillaume Dumas' Sparse Integer Matrix Collection,
17 | % http://ljk.imag.fr/membres/Jean-Guillaume.Dumas/simc.html
18 | %
19 | % Problem 7 of the Hundred-dollar, Hundred-digit Challenge Problems,
20 | % SIAM News, vol 35, no. 1.
21 | %
22 | % 7. Let A be the 20,000 x 20,000 matrix whose entries are zero
23 | % everywhere except for the primes 2, 3, 5, 7, . . . , 224737 along the
24 | % main diagonal and the number 1 in all the positions A(i,j) with
25 | % |i-j| = 1,2,4,8, . . . ,16384. What is the (1,1) entry of inv(A)?
26 | %
27 | % http://www.siam.org/news/news.php?id=388
28 | %
29 | % Filename in JGD collection: Trefethen/trefethen_20__19_minor.sms
30 | %-------------------------------------------------------------------------------
31 | 19 19 83
32 | 1 1 3
33 | 2 1 1
34 | 3 1 1
35 | 5 1 1
36 | 9 1 1
37 | 17 1 1
38 | 2 2 5
39 | 3 2 1
40 | 4 2 1
41 | 6 2 1
42 | 10 2 1
43 | 18 2 1
44 | 3 3 7
45 | 4 3 1
46 | 5 3 1
47 | 7 3 1
48 | 11 3 1
49 | 19 3 1
50 | 4 4 11
51 | 5 4 1
52 | 6 4 1
53 | 8 4 1
54 | 12 4 1
55 | 5 5 13
56 | 6 5 1
57 | 7 5 1
58 | 9 5 1
59 | 13 5 1
60 | 6 6 17
61 | 7 6 1
62 | 8 6 1
63 | 10 6 1
64 | 14 6 1
65 | 7 7 19
66 | 8 7 1
67 | 9 7 1
68 | 11 7 1
69 | 15 7 1
70 | 8 8 23
71 | 9 8 1
72 | 10 8 1
73 | 12 8 1
74 | 16 8 1
75 | 9 9 29
76 | 10 9 1
77 | 11 9 1
78 | 13 9 1
79 | 17 9 1
80 | 10 10 31
81 | 11 10 1
82 | 12 10 1
83 | 14 10 1
84 | 18 10 1
85 | 11 11 37
86 | 12 11 1
87 | 13 11 1
88 | 15 11 1
89 | 19 11 1
90 | 12 12 41
91 | 13 12 1
92 | 14 12 1
93 | 16 12 1
94 | 13 13 43
95 | 14 13 1
96 | 15 13 1
97 | 17 13 1
98 | 14 14 47
99 | 15 14 1
100 | 16 14 1
101 | 18 14 1
102 | 15 15 53
103 | 16 15 1
104 | 17 15 1
105 | 19 15 1
106 | 16 16 59
107 | 17 16 1
108 | 18 16 1
109 | 17 17 61
110 | 18 17 1
111 | 19 17 1
112 | 18 18 67
113 | 19 18 1
114 | 19 19 71
115 |
--------------------------------------------------------------------------------
/examples/example.c:
--------------------------------------------------------------------------------
1 | typedef unsigned long long int sparse_pointer_t;
2 | #define MPI_SPARSE_POINTER_T MPI_UNSIGNED_LONG_LONG
3 | #define FMT_SPARSE_POINTER_T "%llu"
4 |
5 | typedef unsigned int sparse_index_t;
6 | #define MPI_SPARSE_INDEX_T MPI_UNSIGNED
7 | #define FMT_SPARSE_INDEX_T "%u"
8 |
9 | #if defined(CALCULATE_TYPE_R64)
10 | typedef double sparse_value_t;
11 | #elif defined(CALCULATE_TYPE_R32)
12 | typedef float sparse_value_t;
13 | #elif defined(CALCULATE_TYPE_CR64)
14 | typedef double _Complex sparse_value_t;
15 | typedef double sparse_value_real_t;
16 | #define COMPLEX_MTX
17 | #elif defined(CALCULATE_TYPE_CR32)
18 | typedef float _Complex sparse_value_t;
19 | typedef float sparse_value_real_t;
20 | #define COMPLEX_MTX
21 | #else
22 | #error[PanguLU Compile Error] Unknown value type. Set -DCALCULATE_TYPE_CR64 or -DCALCULATE_TYPE_R64 or -DCALCULATE_TYPE_CR32 or -DCALCULATE_TYPE_R32 in compile command line.
23 | #endif
24 |
25 | #include "../include/pangulu.h"
26 | #include <stdio.h>
27 | #include <stdlib.h>
28 | #include <string.h>
29 | #include <math.h>
30 | #include <mpi.h>
31 | #include "mmio_highlevel.h"
32 |
33 | #ifdef COMPLEX_MTX
34 | sparse_value_real_t complex_fabs(sparse_value_t x)
35 | {
36 | return sqrt(__real__(x) * __real__(x) + __imag__(x) * __imag__(x));
37 | }
38 |
39 | sparse_value_t complex_sqrt(sparse_value_t x)
40 | {
41 | sparse_value_t y;
42 | __real__(y) = sqrt(complex_fabs(x) + __real__(x)) / sqrt(2);
43 | __imag__(y) = (sqrt(complex_fabs(x) - __real__(x)) / sqrt(2)) * (__imag__(x) > 0 ? 1 : __imag__(x) == 0 ? 0
44 | : -1);
45 | return y;
46 | }
47 | #endif
48 |
49 | void read_command_params(int argc, char **argv, char *mtx_name, char *rhs_name, int *nb)
50 | {
51 | int c;
52 | extern char *optarg;
53 | while ((c = getopt(argc, argv, "nb:f:r:")) != EOF)
54 | {
55 | switch (c)
56 | {
57 | case 'b':
58 | *nb = atoi(optarg);
59 | continue;
60 | case 'f':
61 | strcpy(mtx_name, optarg);
62 | continue;
63 | case 'r':
64 | strcpy(rhs_name, optarg);
65 | continue;
66 | }
67 | }
68 |     if ((*nb) == 0)
69 | {
70 | printf("Error : nb is 0\n");
71 | exit(1);
72 | }
73 | }
74 |
75 | int main(int ARGC, char **ARGV)
76 | {
77 |     // Step 1: Create variables, initialize MPI environment.
78 | int provided = 0;
79 | int rank = 0, size = 0;
80 | int nb = 0;
81 | MPI_Init_thread(&ARGC, &ARGV, MPI_THREAD_MULTIPLE, &provided);
82 | MPI_Comm_rank(MPI_COMM_WORLD, &rank);
83 | MPI_Comm_size(MPI_COMM_WORLD, &size);
84 | sparse_index_t m = 0, n = 0, is_sym = 0;
85 | sparse_pointer_t nnz;
86 | sparse_pointer_t *rowptr = NULL;
87 | sparse_index_t *colidx = NULL;
88 | sparse_value_t *value = NULL;
89 | sparse_value_t *sol = NULL;
90 | sparse_value_t *rhs = NULL;
91 |
92 | // Step 2: Read matrix and rhs vectors.
93 | if (rank == 0)
94 | {
95 | char mtx_name[200] = {'\0'};
96 | char rhs_name[200] = {'\0'};
97 | read_command_params(ARGC, ARGV, mtx_name, rhs_name, &nb);
98 |
99 | printf("Reading matrix %s\n", mtx_name);
100 | mmio_info(&m, &n, &nnz, &is_sym, mtx_name);
101 | rowptr = (sparse_pointer_t *)malloc(sizeof(sparse_pointer_t) * (n + 1));
102 | colidx = (sparse_index_t *)malloc(sizeof(sparse_index_t) * nnz);
103 | value = (sparse_value_t *)malloc(sizeof(sparse_value_t) * nnz);
104 | mmio_data_csr(rowptr, colidx, value, mtx_name);
105 | printf("Read mtx done.\n");
106 |
107 | sol = (sparse_value_t *)malloc(sizeof(sparse_value_t) * n);
108 | rhs = (sparse_value_t *)malloc(sizeof(sparse_value_t) * n);
109 | for (int i = 0; i < n; i++)
110 | {
111 | rhs[i] = 0;
112 | for (sparse_pointer_t j = rowptr[i]; j < rowptr[i + 1]; j++)
113 | {
114 | rhs[i] += value[j];
115 | }
116 | sol[i] = rhs[i];
117 | }
118 | printf("Generate rhs done.\n");
119 | }
120 | MPI_Bcast(&n, 1, MPI_SPARSE_INDEX_T, 0, MPI_COMM_WORLD);
121 | MPI_Bcast(&nb, 1, MPI_INT, 0, MPI_COMM_WORLD);
122 | MPI_Barrier(MPI_COMM_WORLD);
123 |
124 | // Step 3: Initialize PanguLU solver.
125 | pangulu_init_options init_options;
126 | init_options.nb = nb;
127 | init_options.nthread = 20;
128 | void *pangulu_handle;
129 | pangulu_init(n, nnz, rowptr, colidx, value, &init_options, &pangulu_handle);
130 |
131 | // Step 4: Execute LU factorization.
132 | pangulu_gstrf_options gstrf_options;
133 | pangulu_gstrf(&gstrf_options, &pangulu_handle);
134 |
135 |     // Step 5: Execute triangular solve using the factorization results.
136 | pangulu_gstrs_options gstrs_options;
137 | pangulu_gstrs(sol, &gstrs_options, &pangulu_handle);
138 | MPI_Barrier(MPI_COMM_WORLD);
139 |
140 | // Step 6: Check the answer.
141 | sparse_value_t *rhs_computed;
142 | if (rank == 0)
143 | {
144 |         // Step 6.1: Calculate rhs_computed = A * x (using Kahan compensated summation).
145 | rhs_computed = (sparse_value_t *)malloc(sizeof(sparse_value_t) * n);
146 | for (int i = 0; i < n; i++)
147 | {
148 | rhs_computed[i] = 0.0;
149 | sparse_value_t c = 0.0;
150 | for (sparse_pointer_t j = rowptr[i]; j < rowptr[i + 1]; j++)
151 | {
152 | sparse_value_t num = value[j] * sol[colidx[j]];
153 | sparse_value_t z = num - c;
154 | sparse_value_t t = rhs_computed[i] + z;
155 | c = (t - rhs_computed[i]) - z;
156 | rhs_computed[i] = t;
157 | }
158 | }
159 |
160 |         // Step 6.2: Calculate residual = rhs_computed - rhs.
161 | sparse_value_t *residual = rhs_computed;
162 | for (int i = 0; i < n; i++)
163 | {
164 | residual[i] = rhs_computed[i] - rhs[i];
165 | }
166 |
167 | sparse_value_t sum, c;
168 |         // Step 6.3: Calculate norm2 of residual.
169 | sum = 0.0;
170 | c = 0.0;
171 | for (int i = 0; i < n; i++)
172 | {
173 | sparse_value_t num = residual[i] * residual[i];
174 | sparse_value_t z = num - c;
175 | sparse_value_t t = sum + z;
176 | c = (t - sum) - z;
177 | sum = t;
178 | }
179 | #ifdef COMPLEX_MTX
180 | sparse_value_real_t residual_norm2 = complex_fabs(complex_sqrt(sum));
181 | #else
182 | sparse_value_t residual_norm2 = sqrt(sum);
183 | #endif
184 |
185 |         // Step 6.4: Calculate norm2 of original rhs.
186 | sum = 0.0;
187 | c = 0.0;
188 | for (int i = 0; i < n; i++)
189 | {
190 | sparse_value_t num = rhs[i] * rhs[i];
191 | sparse_value_t z = num - c;
192 | sparse_value_t t = sum + z;
193 | c = (t - sum) - z;
194 | sum = t;
195 | }
196 | #ifdef COMPLEX_MTX
197 | sparse_value_real_t rhs_norm2 = complex_fabs(complex_sqrt(sum));
198 | #else
199 | sparse_value_t rhs_norm2 = sqrt(sum);
200 | #endif
201 |
202 | // Step 6.5: Calculate relative residual.
203 | double relative_residual = residual_norm2 / rhs_norm2;
204 | printf("|| Ax - b || / || b || = %le\n", relative_residual);
205 | }
206 |
207 | // Step 7: Clean and finalize.
208 | pangulu_finalize(&pangulu_handle);
209 | if (rank == 0)
210 | {
211 | free(rowptr);
212 | free(colidx);
213 | free(value);
214 | free(sol);
215 | free(rhs);
216 | free(rhs_computed);
217 | }
218 | MPI_Finalize();
219 | }
220 |
--------------------------------------------------------------------------------
/examples/mmio.h:
--------------------------------------------------------------------------------
1 | /*
2 | * Matrix Market I/O library for ANSI C
3 | *
4 | * See http://math.nist.gov/MatrixMarket for details.
5 | *
6 | *
7 | */
8 |
9 | #include <stdio.h>
10 | #include <stdlib.h>
11 | #include <string.h>
12 | #include <ctype.h>
13 |
14 | #define MM_MTX_STR "matrix"
15 | #define MM_ARRAY_STR "array"
16 | #define MM_DENSE_STR "array"
17 | #define MM_COORDINATE_STR "coordinate"
18 | #define MM_SPARSE_STR "coordinate"
19 | #define MM_COMPLEX_STR "complex"
20 | #define MM_REAL_STR "real"
21 | #define MM_INT_STR "integer"
22 | #define MM_GENERAL_STR "general"
23 | #define MM_SYMM_STR "symmetric"
24 | #define MM_HERM_STR "hermitian"
25 | #define MM_SKEW_STR "skew-symmetric"
26 | #define MM_PATTERN_STR "pattern"
27 |
28 | #ifndef MM_IO_H
29 | #define MM_IO_H
30 |
31 | typedef char mm_typecode[4];
32 |
33 | char *mm_typecode_to_str(mm_typecode matcode);
34 |
35 | int mm_read_banner(FILE *f, mm_typecode *matcode);
36 | int mm_read_mtx_crd_size(FILE *f, sparse_index_t *M, sparse_index_t *N, sparse_pointer_t *nz);
37 | long mm_read_mtx_array_size(FILE *f, sparse_index_t *M, sparse_index_t *N);
38 |
39 | long mm_write_banner(FILE *f, mm_typecode matcode);
40 | long mm_write_mtx_crd_size(FILE *f, sparse_index_t M, sparse_index_t N, sparse_pointer_t nz);
41 | long mm_write_mtx_array_size(FILE *f, sparse_index_t M, sparse_index_t N);
42 |
43 | #define MM_MAX_LINE_LENGTH 1025
44 | #define MatrixMarketBanner "%%MatrixMarket"
45 | #define MM_MAX_TOKEN_LENGTH 64
46 |
47 |
48 | /********************* mm_typecode query functions ***************************/
49 |
50 | #define mm_is_matrix(typecode) ((typecode)[0]=='M')
51 |
52 | #define mm_is_sparse(typecode) ((typecode)[1]=='C')
53 | #define mm_is_coordinate(typecode)((typecode)[1]=='C')
54 | #define mm_is_dense(typecode) ((typecode)[1]=='A')
55 | #define mm_is_array(typecode) ((typecode)[1]=='A')
56 |
57 | #define mm_is_complex(typecode) ((typecode)[2]=='C')
58 | #define mm_is_real(typecode) ((typecode)[2]=='R')
59 | #define mm_is_pattern(typecode) ((typecode)[2]=='P')
60 | #define mm_is_integer(typecode) ((typecode)[2]=='I')
61 |
62 | #define mm_is_symmetric(typecode)((typecode)[3]=='S')
63 | #define mm_is_general(typecode) ((typecode)[3]=='G')
64 | #define mm_is_skew(typecode) ((typecode)[3]=='K')
65 | #define mm_is_hermitian(typecode)((typecode)[3]=='H')
66 |
67 | long mm_is_valid(mm_typecode matcode); /* too complex for a macro */
68 |
69 |
70 | /********************* mm_typecode modify functions ***************************/
71 |
72 | #define mm_set_matrix(typecode) ((*typecode)[0]='M')
73 | #define mm_set_coordinate(typecode) ((*typecode)[1]='C')
74 | #define mm_set_array(typecode) ((*typecode)[1]='A')
75 | #define mm_set_dense(typecode) mm_set_array(typecode)
76 | #define mm_set_sparse(typecode) mm_set_coordinate(typecode)
77 |
78 | #define mm_set_complex(typecode)((*typecode)[2]='C')
79 | #define mm_set_real(typecode) ((*typecode)[2]='R')
80 | #define mm_set_pattern(typecode)((*typecode)[2]='P')
81 | #define mm_set_integer(typecode)((*typecode)[2]='I')
82 |
83 |
84 | #define mm_set_symmetric(typecode)((*typecode)[3]='S')
85 | #define mm_set_general(typecode)((*typecode)[3]='G')
86 | #define mm_set_skew(typecode) ((*typecode)[3]='K')
87 | #define mm_set_hermitian(typecode)((*typecode)[3]='H')
88 |
89 | #define mm_clear_typecode(typecode) ((*typecode)[0]=(*typecode)[1]= \
90 | (*typecode)[2]=' ',(*typecode)[3]='G')
91 |
92 | #define mm_initialize_typecode(typecode) mm_clear_typecode(typecode)
93 |
94 |
95 | /********************* Matrix Market error codes ***************************/
96 |
97 |
98 | #define MM_COULD_NOT_READ_FILE 11
99 | #define MM_PREMATURE_EOF 12
100 | #define MM_NOT_MTX 13
101 | #define MM_NO_HEADER 14
102 | #define MM_UNSUPPORTED_TYPE 15
103 | #define MM_LINE_TOO_LONG 16
104 | #define MM_COULD_NOT_WRITE_FILE 17
105 |
106 |
107 | /******************** Matrix Market internal definitions ********************
108 |
109 | MM_matrix_typecode: 4-character sequence
110 |
111 | object sparse/ data storage
112 | dense type scheme
113 |
114 | string position: [0] [1] [2] [3]
115 |
116 | Matrix typecode: M(atrix) C(oord) R(eal) G(eneral)
117 | A(array) C(omplex) H(ermitian)
118 | P(attern) S(ymmetric)
119 | I(nteger) K(skew)
120 |
121 | ***********************************************************************/
122 |
123 | #define MM_MTX_STR "matrix"
124 | #define MM_ARRAY_STR "array"
125 | #define MM_DENSE_STR "array"
126 | #define MM_COORDINATE_STR "coordinate"
127 | #define MM_SPARSE_STR "coordinate"
128 | #define MM_COMPLEX_STR "complex"
129 | #define MM_REAL_STR "real"
130 | #define MM_INT_STR "integer"
131 | #define MM_GENERAL_STR "general"
132 | #define MM_SYMM_STR "symmetric"
133 | #define MM_HERM_STR "hermitian"
134 | #define MM_SKEW_STR "skew-symmetric"
135 | #define MM_PATTERN_STR "pattern"
136 |
137 |
138 | /* high level routines */
139 |
140 | long mm_write_mtx_crd(char fname[], long M, long N, long nz, long I[], long J[],
141 | double val[], mm_typecode matcode);
142 | long mm_read_mtx_crd_data(FILE *f, long M, long N, long nz, long I[], long J[],
143 | double val[], mm_typecode matcode);
144 | long mm_read_mtx_crd_entry(FILE *f, long *I, long *J, double *real, double *img,
145 | mm_typecode matcode);
146 |
147 | long mm_read_unsymmetric_sparse(const char *fname, long *M_, long *N_, long *nz_,
148 | double **val_, long **I_, long **J_);
149 |
150 | char *mm_strdup(const char *s)
151 | {
152 | long len = strlen(s);
153 | char *s2 = (char *) malloc((len+1)*sizeof(char));
154 | return strcpy(s2, s);
155 | }
156 |
157 | char *mm_typecode_to_str(mm_typecode matcode)
158 | {
159 | char buffer[MM_MAX_LINE_LENGTH];
160 | char *types[4];
161 | char *mm_strdup(const char *);
162 | //long error =0;
163 |
164 | /* check for MTX type */
165 | if (mm_is_matrix(matcode))
166 | types[0] = (char *)MM_MTX_STR;
167 | //else
168 | // error=1;
169 |
170 | /* check for CRD or ARR matrix */
171 | if (mm_is_sparse(matcode))
172 | types[1] = (char *)MM_SPARSE_STR;
173 | else
174 | if (mm_is_dense(matcode))
175 | types[1] = (char *)MM_DENSE_STR;
176 | else
177 | return NULL;
178 |
179 | /* check for element data type */
180 | if (mm_is_real(matcode))
181 | types[2] = (char *)MM_REAL_STR;
182 | else
183 | if (mm_is_complex(matcode))
184 | types[2] = (char *)MM_COMPLEX_STR;
185 | else
186 | if (mm_is_pattern(matcode))
187 | types[2] = (char *)MM_PATTERN_STR;
188 | else
189 | if (mm_is_integer(matcode))
190 | types[2] = (char *)MM_INT_STR;
191 | else
192 | return NULL;
193 |
194 |
195 | /* check for symmetry type */
196 | if (mm_is_general(matcode))
197 | types[3] = (char *)MM_GENERAL_STR;
198 | else
199 | if (mm_is_symmetric(matcode))
200 | types[3] = (char *)MM_SYMM_STR;
201 | else
202 | if (mm_is_hermitian(matcode))
203 | types[3] = (char *)MM_HERM_STR;
204 | else
205 | if (mm_is_skew(matcode))
206 | types[3] = (char *)MM_SKEW_STR;
207 | else
208 | return NULL;
209 |
210 | sprintf(buffer,"%s %s %s %s", types[0], types[1], types[2], types[3]);
211 | return mm_strdup(buffer);
212 |
213 | }
214 |
215 | int mm_read_banner(FILE *f, mm_typecode *matcode)
216 | {
217 | char line[MM_MAX_LINE_LENGTH];
218 | char banner[MM_MAX_TOKEN_LENGTH];
219 | char mtx[MM_MAX_TOKEN_LENGTH];
220 | char crd[MM_MAX_TOKEN_LENGTH];
221 | char data_type[MM_MAX_TOKEN_LENGTH];
222 | char storage_scheme[MM_MAX_TOKEN_LENGTH];
223 | char *p;
224 |
225 |
226 | mm_clear_typecode(matcode);
227 |
228 | if (fgets(line, MM_MAX_LINE_LENGTH, f) == NULL)
229 | return MM_PREMATURE_EOF;
230 |
231 | if (sscanf(line, "%s %s %s %s %s", banner, mtx, crd, data_type,
232 | storage_scheme) != 5)
233 | return MM_PREMATURE_EOF;
234 |
235 | for (p=mtx; *p!='\0'; *p=tolower(*p),p++); /* convert to lower case */
236 | for (p=crd; *p!='\0'; *p=tolower(*p),p++);
237 | for (p=data_type; *p!='\0'; *p=tolower(*p),p++);
238 | for (p=storage_scheme; *p!='\0'; *p=tolower(*p),p++);
239 |
240 | /* check for banner */
241 | if (strncmp(banner, MatrixMarketBanner, strlen(MatrixMarketBanner)) != 0)
242 | return MM_NO_HEADER;
243 |
244 | /* first field should be "mtx" */
245 | if (strcmp(mtx, MM_MTX_STR) != 0)
246 | return MM_UNSUPPORTED_TYPE;
247 | mm_set_matrix(matcode);
248 |
249 |
250 | /* second field describes whether this is a sparse matrix (in coordinate
251 | storage) or a dense array */
252 |
253 |
254 | if (strcmp(crd, MM_SPARSE_STR) == 0)
255 | mm_set_sparse(matcode);
256 | else
257 | if (strcmp(crd, MM_DENSE_STR) == 0)
258 | mm_set_dense(matcode);
259 | else
260 | return MM_UNSUPPORTED_TYPE;
261 |
262 |
263 | /* third field */
264 |
265 | if (strcmp(data_type, MM_REAL_STR) == 0)
266 | mm_set_real(matcode);
267 | else
268 | if (strcmp(data_type, MM_COMPLEX_STR) == 0)
269 | mm_set_complex(matcode);
270 | else
271 | if (strcmp(data_type, MM_PATTERN_STR) == 0)
272 | mm_set_pattern(matcode);
273 | else
274 | if (strcmp(data_type, MM_INT_STR) == 0)
275 | mm_set_integer(matcode);
276 | else
277 | return MM_UNSUPPORTED_TYPE;
278 |
279 |
280 | /* fourth field */
281 |
282 | if (strcmp(storage_scheme, MM_GENERAL_STR) == 0)
283 | mm_set_general(matcode);
284 | else
285 | if (strcmp(storage_scheme, MM_SYMM_STR) == 0)
286 | mm_set_symmetric(matcode);
287 | else
288 | if (strcmp(storage_scheme, MM_HERM_STR) == 0)
289 | mm_set_hermitian(matcode);
290 | else
291 | if (strcmp(storage_scheme, MM_SKEW_STR) == 0)
292 | mm_set_skew(matcode);
293 | else
294 | return MM_UNSUPPORTED_TYPE;
295 |
296 |
297 | return 0;
298 | }
299 |
300 | int mm_read_mtx_crd_size(FILE *f, sparse_index_t *M, sparse_index_t *N, sparse_pointer_t *nz)
301 | {
302 | char line[MM_MAX_LINE_LENGTH];
303 | int num_items_read;
304 |
305 | /* set return null parameter values, in case we exit with errors */
306 | *M = *N = *nz = 0;
307 |
308 | /* now continue scanning until you reach the end-of-comments */
309 | do
310 | {
311 | if (fgets(line,MM_MAX_LINE_LENGTH,f) == NULL)
312 | return MM_PREMATURE_EOF;
313 | }while (line[0] == '%');
314 |
315 | /* line[] is either blank or has M,N, nz */
316 | if (sscanf(line, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T " " FMT_SPARSE_POINTER_T, M, N, nz) == 3)
317 | return 0;
318 |
319 | else
320 | do
321 | {
322 | num_items_read = fscanf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T " " FMT_SPARSE_POINTER_T, M, N, nz);
323 | if (num_items_read == EOF) return MM_PREMATURE_EOF;
324 | }
325 | while (num_items_read != 3);
326 |
327 | return 0;
328 | }
329 |
330 | long mm_read_mtx_array_size(FILE *f, sparse_index_t *M, sparse_index_t *N)
331 | {
332 | char line[MM_MAX_LINE_LENGTH];
333 | long num_items_read;
334 | /* set return null parameter values, in case we exit with errors */
335 | *M = *N = 0;
336 |
337 | /* now continue scanning until you reach the end-of-comments */
338 | do
339 | {
340 | if (fgets(line,MM_MAX_LINE_LENGTH,f) == NULL)
341 | return MM_PREMATURE_EOF;
342 | }while (line[0] == '%');
343 |
344 | /* line[] is either blank or has M,N, nz */
345 | if (sscanf(line, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T, M, N) == 2)
346 | return 0;
347 |
348 | else /* we have a blank line */
349 | do
350 | {
351 | num_items_read = fscanf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T, M, N);
352 | if (num_items_read == EOF) return MM_PREMATURE_EOF;
353 | }
354 | while (num_items_read != 2);
355 |
356 | return 0;
357 | }
358 |
359 | long mm_write_banner(FILE *f, mm_typecode matcode)
360 | {
361 | char *str = mm_typecode_to_str(matcode);
362 | long ret_code;
363 |
364 | ret_code = fprintf(f, "%s %s\n", MatrixMarketBanner, str);
365 | free(str);
366 | if (ret_code < 0) /* fprintf returns a negative value on error */
367 | return MM_COULD_NOT_WRITE_FILE;
368 | else
369 | return 0;
370 | }
371 |
372 | long mm_write_mtx_crd_size(FILE *f, sparse_index_t M, sparse_index_t N, sparse_pointer_t nz)
373 | {
374 | if (fprintf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T " " FMT_SPARSE_POINTER_T "\n", M, N, nz) < 0)
375 | return MM_COULD_NOT_WRITE_FILE;
376 | else
377 | return 0;
378 | }
379 |
380 | long mm_write_mtx_array_size(FILE *f, sparse_index_t M, sparse_index_t N)
381 | {
382 | if (fprintf(f, FMT_SPARSE_INDEX_T " " FMT_SPARSE_INDEX_T "\n", M, N) < 0)
383 | return MM_COULD_NOT_WRITE_FILE;
384 | else
385 | return 0;
386 | }
387 |
388 |
389 |
390 |
391 | long mm_is_valid(mm_typecode matcode) /* too complex for a macro */
392 | {
393 | if (!mm_is_matrix(matcode)) return 0;
394 | if (mm_is_dense(matcode) && mm_is_pattern(matcode)) return 0;
395 | if (mm_is_real(matcode) && mm_is_hermitian(matcode)) return 0;
396 | if (mm_is_pattern(matcode) && (mm_is_hermitian(matcode) ||
397 | mm_is_skew(matcode))) return 0;
398 | return 1;
399 | }
400 |
401 |
402 |
403 |
404 | /* high level routines */
405 |
406 | long mm_write_mtx_crd(char fname[], long M, long N, long nz, long I[], long J[],
407 | double val[], mm_typecode matcode)
408 | {
409 | FILE *f;
410 | long i;
411 |
412 | if (strcmp(fname, "stdout") == 0)
413 | f = stdout;
414 | else
415 | if ((f = fopen(fname, "w")) == NULL)
416 | return MM_COULD_NOT_WRITE_FILE;
417 |
418 | /* print banner followed by typecode */
419 | fprintf(f, "%s ", MatrixMarketBanner);
420 | fprintf(f, "%s\n", mm_typecode_to_str(matcode));
421 |
422 | /* print matrix sizes and nonzeros */
423 | fprintf(f, "%ld %ld %ld\n", M, N, nz);
424 |
425 | /* print values */
426 | if (mm_is_pattern(matcode))
427 | for (i=0; i<nz; i++)
--------------------------------------------------------------------------------
/include/pangulu_interface_common.h:
--------------------------------------------------------------------------------
2 | #include
3 | typedef struct pangulu_init_options
4 | {
5 | int nthread;
6 | int nb;
7 | }pangulu_init_options;
8 |
9 | typedef struct pangulu_gstrf_options
10 | {
11 | }pangulu_gstrf_options;
12 |
13 | typedef struct pangulu_gstrs_options
14 | {
15 | }pangulu_gstrs_options;
--------------------------------------------------------------------------------
/lib/Makefile:
--------------------------------------------------------------------------------
1 | include ../make.inc
2 |
3 | all : oclean # 'oclean' depends on 'libs', so 'make' builds both libraries and then removes the object files
4 |
5 | libs : libpangulu.so libpangulu.a
6 |
7 | libpangulu.so:
8 | $(MPICC) $(MPICCFLAGS) -shared -fPIC -o $@ ./pangulu*.o
9 | libpangulu.a:
10 | ar -rv -o $@ ./pangulu*.o
11 | - ranlib $@
12 |
13 | oclean: libs
14 | rm -f pangulu*.o
15 |
16 | clean:
17 | rm -f libpangulu.so
18 | rm -f libpangulu.a
19 |
--------------------------------------------------------------------------------
/make.inc:
--------------------------------------------------------------------------------
1 | COMPILE_LEVEL = -O3
2 |
3 | #0201000,GPU_CUDA
4 | CUDA_PATH = /usr/local/cuda
5 | CUDA_INC = -I/path/to/cuda/include
6 | CUDA_LIB = -L/path/to/cuda/lib64 -lcudart -lcusparse
7 | NVCC = nvcc $(COMPILE_LEVEL)
8 | NVCCFLAGS = $(PANGULU_FLAGS) -w -Xptxas -dlcm=cg -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61 $(CUDA_INC) $(CUDA_LIB)
9 |
10 | #general
11 | CC = gcc $(COMPILE_LEVEL) #-fsanitize=address
12 | MPICC = mpicc $(COMPILE_LEVEL) #-fsanitize=address
13 | OPENBLAS_INC = -I/path/to/openblas/include
14 | OPENBLAS_LIB = -L/path/to/openblas/lib -lopenblas
15 | MPICCFLAGS = $(OPENBLAS_INC) $(CUDA_INC) $(OPENBLAS_LIB) -fopenmp -lpthread -lm
16 | MPICCLINK = $(OPENBLAS_LIB)
17 | METISFLAGS = -I/path/to/gklib/include -I/path/to/metis/include
18 | PANGULU_FLAGS = -DPANGULU_LOG_INFO -DCALCULATE_TYPE_R64 -DMETIS -DPANGULU_MC64 #-DGPU_OPEN -DHT_IS_OPEN
19 |
--------------------------------------------------------------------------------
/src/Makefile:
--------------------------------------------------------------------------------
1 | include ../make.inc
2 | all:pangulu_host pangulu_platforms
3 |
4 | src:=$(wildcard *.c)
5 | pangulu_host:$(src:.c=.o)
6 |
7 | %.o:%.c
8 | $(MPICC) $(MPICCFLAGS) $(METISFLAGS) $(PANGULU_FLAGS) -c $< -o $@ -fPIC
9 | mv $@ ../lib
10 |
11 | pangulu_platforms:
12 | cd .. && python3 build_helper.py compile_platform_code
13 |
14 | clean:
15 | -(rm -f ../lib/pangulu*.o)
16 | -(rm -f ./pangulu*.o)
17 |
--------------------------------------------------------------------------------
/src/languages/pangulu_en.h:
--------------------------------------------------------------------------------
1 | #ifdef PANGULU_EN
2 |
3 | #ifdef PANGULU_LOG_ERROR
4 | #define PANGULU_E_NB_IS_ZERO "[PanguLU Error] nb is zero.\n"
5 | #define PANGULU_E_INVALID_HEAP_SELECT "[PanguLU Error] Invalid heap comparing strategy.\n"
6 | #define PANGULU_E_HEAP_FULL "[PanguLU Error] The heap is full on rank " FMT_PANGULU_INT32_T ".\n", rank
7 | #define PANGULU_E_HEAP_EMPTY "[PanguLU Error] The heap is empty on rank " FMT_PANGULU_INT32_T ".\n", rank
8 | #define PANGULU_E_CPU_MEM "[PanguLU Error] Failed to allocate " FMT_PANGULU_INT64_T " byte(s). CPU memory is not enough. %s:" FMT_PANGULU_INT64_T "\n", size, file, line
9 | #define PANGULU_E_ISEND_CSR "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csr error. value != s->value.\n"
10 | #define PANGULU_E_ISEND_CSC "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csc error. value != s->value_csc.\n"
11 | #define PANGULU_E_ROW_IS_NULL "[PanguLU Error] The matrix has zero row(s).\n"
12 | #define PANGULU_E_ROW_DONT_HAVE_DIA "[PanguLU Error] Row[" FMT_PANGULU_EXBLOCK_IDX "] don't have diagonal element.\n", i
13 | #define PANGULU_E_ERR_IN_RRCL "[PanguLU Error] Invalid numeric factorization task on rank " FMT_PANGULU_INT32_T ". row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T "\n", rank, row, col, level
14 | #define PANGULU_E_K_ID "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for numeric factorization.\n", kernel_id
15 | #define PANGULU_E_ASYM "[PanguLU Error] MPI_Barrier_asym error.\n"
16 | #define PANGULU_E_ADD_DIA "[PanguLU Error] pangulu_add_diagonal_element error\n"
17 | #define PANGULU_E_CUDA_MALLOC "[PanguLU Error] Failed to cudaMalloc %lu byte(s). GPU memory is not enough.\n", size
18 | #define PANGULU_E_ROW_IS_ZERO "[PanguLU Error] Invalid input matrix.\n"
19 | #define PANGULU_E_MAX_NULL "[PanguLU Error] pangulu_mc64 internal error. (now_row_max==0)\n"
20 | #define PANGULU_E_WORK_ERR "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for sptrsv.\n", kernel_id
21 | #define PANGULU_E_BIP_PTR_INVALID "[PanguLU Error] Invalid pangulu_block_info pointer.\n"
22 | #define PANGULU_E_BIP_INVALID "[PanguLU Error] Invalid pangulu_block_info.\n"
23 | #define PANGULU_E_BIP_NOT_EMPTY "[PanguLU Error] Block info pool is not empty.\n"
24 | #define PANGULU_E_BIP_OUT_OF_RANGE "[PanguLU Error] PANGULU_BIP index out of range.\n"
25 | #define PANGULU_E_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_init)\n"
26 | #define PANGULU_E_GSTRF_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrf)\n"
27 | #define PANGULU_E_GSTRS_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrs)\n"
28 | #endif // PANGULU_LOG_ERROR
29 |
30 | #ifdef PANGULU_LOG_WARNING
31 | #define PANGULU_W_RANK_HEAP_DONT_NULL "[PanguLU Warning] " FMT_PANGULU_INT64_T " task remaining on rank " FMT_PANGULU_INT32_T ".\n", heap->length, rank
32 | #define PANGULU_W_ERR_RANK "[PanguLU Warning] Receiving message error on rank " FMT_PANGULU_INT32_T ".\n", rank
33 | #define PANGULU_W_BIP_INCREASE_SPEED_TOO_SMALL "[PanguLU Warning] PANGULU_BIP_INCREASE_SPEED too small.\n"
34 | #define PANGULU_W_GPU_BIG_BLOCK "[PanguLU Warning] When GPU is open, init_options->nb > 256 and pangulu_inblock_idx isn't pangulu_uint32_t, performance will be limited.\n"
35 | #define PANGULU_W_COMPLEX_FALLBACK "[PanguLU Warning] Calculating complex value on GPU is not supported. Fallback to CPU.\n"
36 | #endif // PANGULU_LOG_WARNING
37 |
38 | #ifdef PANGULU_LOG_INFO
39 | #define PANGULU_I_VECT2NORM_ERR "[PanguLU Info] || Ax - B || / || Ax || = %12.4le.\n", error
40 | #define PANGULU_I_CHECK_PASS "[PanguLU Info] Check ------------------------------------- pass\n"
41 | #define PANGULU_I_CHECK_ERROR "[PanguLU Info] Check ------------------------------------ error\n"
42 | #define PANGULU_I_DEV_IS "[PanguLU Info] Device is %s.\n", prop.name
43 | #define PANGULU_I_TASK_INFO "[PanguLU Info] Info of inserting task is: row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T " kernel=" FMT_PANGULU_INT64_T ".\n", row, col, task_level, kernel_id
44 | #define PANGULU_I_HEAP_LEN "[PanguLU Info] heap.length=" FMT_PANGULU_INT64_T " heap.capacity=" FMT_PANGULU_INT64_T "\n", heap->length, heap->max_length
45 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_ON "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- ON\n"
46 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_OFF "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- OFF\n"
47 | #define PANGULU_I_SYNCHRONIZE_FREE_ON "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- ON\n"
48 | #define PANGULU_I_SYNCHRONIZE_FREE_OFF "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- OFF\n"
49 | #ifdef METIS
50 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d METIS:%s\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread, (sizeof(idx_t) == 4) ? ("i32") : ((sizeof(idx_t) == 8) ? ("i64") : ("?"))
51 | #else
52 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread
53 | #endif
54 | #define PANGULU_I_TIME_REORDER "[PanguLU Info] Reordering time is %lf s.\n", elapsed_time
55 | #define PANGULU_I_TIME_SYMBOLIC "[PanguLU Info] Symbolic factorization time is %lf s.\n", elapsed_time
56 | #define PANGULU_I_TIME_PRE "[PanguLU Info] Preprocessing time is %lf s.\n", elapsed_time
57 | #define PANGULU_I_TIME_NUMERICAL "[PanguLU Info] Numeric factorization time is %lf s.\n", elapsed_time //, flop / pangulu_get_spend_time(common) / 1000000000.0
58 | #define PANGULU_I_TIME_SPTRSV "[PanguLU Info] Solving time is %lf s.\n", elapsed_time
59 | #define PANGULU_I_SYMBOLIC_NONZERO "[PanguLU Info] Symbolic nonzero count is " FMT_PANGULU_EXBLOCK_PTR ".\n",*symbolic_nnz
60 | #endif // PANGULU_LOG_INFO
61 |
62 | #endif // #ifdef PANGULU_EN
--------------------------------------------------------------------------------
/src/languages/pangulu_en_us.h:
--------------------------------------------------------------------------------
1 | #ifdef PANGULU_EN_US
2 |
3 | #ifdef PANGULU_LOG_ERROR
4 | #define PANGULU_E_NB_IS_ZERO "[PanguLU Error] nb is zero.\n"
5 | #define PANGULU_E_INVALID_HEAP_SELECT "[PanguLU Error] Invalid heap comparing strategy.\n"
6 | #define PANGULU_E_HEAP_FULL "[PanguLU Error] The heap is full on rank " FMT_PANGULU_INT32_T ".\n", rank
7 | #define PANGULU_E_HEAP_EMPTY "[PanguLU Error] The heap is empty on rank " FMT_PANGULU_INT32_T ".\n", rank
8 | #define PANGULU_E_CPU_MEM "[PanguLU Error] Failed to allocate " FMT_PANGULU_INT64_T " byte(s). CPU memory is not enough. %s:" FMT_PANGULU_INT64_T "\n", size, file, line
9 | #define PANGULU_E_ISEND_CSR "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csr error. value != s->value.\n"
10 | #define PANGULU_E_ISEND_CSC "[PanguLU Error] pangulu_isend_whole_pangulu_smatrix_csc error. value != s->value_csc.\n"
11 | #define PANGULU_E_ROW_IS_NULL "[PanguLU Error] The matrix has zero row(s).\n"
12 | #define PANGULU_E_ROW_DONT_HAVE_DIA "[PanguLU Error] Row[" FMT_PANGULU_EXBLOCK_IDX "] don't have diagonal element.\n", i
13 | #define PANGULU_E_ERR_IN_RRCL "[PanguLU Error] Invalid numeric factorization task on rank " FMT_PANGULU_INT32_T ". row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T "\n", rank, row, col, level
14 | #define PANGULU_E_K_ID "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for numeric factorization.\n", kernel_id
15 | #define PANGULU_E_ASYM "[PanguLU Error] MPI_Barrier_asym error.\n"
16 | #define PANGULU_E_ADD_DIA "[PanguLU Error] pangulu_add_diagonal_element error\n"
17 | #define PANGULU_E_CUDA_MALLOC "[PanguLU Error] Failed to cudaMalloc %lu byte(s). GPU memory is not enough.\n", size
18 | #define PANGULU_E_ROW_IS_ZERO "[PanguLU Error] Invalid input matrix.\n"
19 | #define PANGULU_E_MAX_NULL "[PanguLU Error] pangulu_mc64 internal error. (now_row_max==0)\n"
20 | #define PANGULU_E_WORK_ERR "[PanguLU Error] Invalid kernel id " FMT_PANGULU_INT64_T " for sptrsv.\n", kernel_id
21 | #define PANGULU_E_BIP_PTR_INVALID "[PanguLU Error] Invalid pangulu_block_info pointer.\n"
22 | #define PANGULU_E_BIP_INVALID "[PanguLU Error] Invalid pangulu_block_info.\n"
23 | #define PANGULU_E_BIP_NOT_EMPTY "[PanguLU Error] Block info pool is not empty.\n"
24 | #define PANGULU_E_BIP_OUT_OF_RANGE "[PanguLU Error] PANGULU_BIP index out of range.\n"
25 | #define PANGULU_E_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_init)\n"
26 | #define PANGULU_E_GSTRF_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrf)\n"
27 | #define PANGULU_E_GSTRS_OPTION_IS_NULLPTR "[PanguLU Error] Option struct pointer is NULL. (pangulu_gstrs)\n"
28 | #endif // PANGULU_LOG_ERROR
29 |
30 | #ifdef PANGULU_LOG_WARNING
31 | #define PANGULU_W_RANK_HEAP_DONT_NULL "[PanguLU Warning] " FMT_PANGULU_INT64_T " task remaining on rank " FMT_PANGULU_INT32_T ".\n", heap->length, rank
32 | #define PANGULU_W_ERR_RANK "[PanguLU Warning] Receiving message error on rank " FMT_PANGULU_INT32_T ".\n", rank
33 | #define PANGULU_W_BIP_INCREASE_SPEED_TOO_SMALL "[PanguLU Warning] PANGULU_BIP_INCREASE_SPEED too small.\n"
34 | #define PANGULU_W_GPU_BIG_BLOCK "[PanguLU Warning] When GPU is open, init_options->nb > 256 and pangulu_inblock_idx isn't pangulu_uint32_t, performance will be limited.\n"
35 | #define PANGULU_W_COMPLEX_FALLBACK "[PanguLU Warning] Calculating complex value on GPU is not supported. Fallback to CPU.\n"
36 | #endif // PANGULU_LOG_WARNING
37 |
38 | #ifdef PANGULU_LOG_INFO
39 | #define PANGULU_I_VECT2NORM_ERR "[PanguLU Info] || Ax - B || / || Ax || = %12.4le.\n", error
40 | #define PANGULU_I_CHECK_PASS "[PanguLU Info] Check ------------------------------------- pass\n"
41 | #define PANGULU_I_CHECK_ERROR "[PanguLU Info] Check ------------------------------------ error\n"
42 | #define PANGULU_I_DEV_IS "[PanguLU Info] Device is %s.\n", prop.name
43 | #define PANGULU_I_TASK_INFO "[PanguLU Info] Info of inserting task is: row=" FMT_PANGULU_INT64_T " col=" FMT_PANGULU_INT64_T " level=" FMT_PANGULU_INT64_T " kernel=" FMT_PANGULU_INT64_T ".\n", row, col, task_level, kernel_id
44 | #define PANGULU_I_HEAP_LEN "[PanguLU Info] heap.length=" FMT_PANGULU_INT64_T " heap.capacity=" FMT_PANGULU_INT64_T "\n", heap->length, heap->max_length
45 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_ON "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- ON\n"
46 | #define PANGULU_I_ADAPTIVE_KERNEL_SELECTION_OFF "[PanguLU Info] ADAPTIVE_KERNEL_SELECTION ------------- OFF\n"
47 | #define PANGULU_I_SYNCHRONIZE_FREE_ON "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- ON\n"
48 | #define PANGULU_I_SYNCHRONIZE_FREE_OFF "[PanguLU Info] SYNCHRONIZE_FREE ---------------------- OFF\n"
49 | #ifdef METIS
50 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d METIS:%s\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread, (sizeof(idx_t) == 4) ? ("i32") : ((sizeof(idx_t) == 8) ? ("i64") : ("?"))
51 | #else
52 | #define PANGULU_I_BASIC_INFO "[PanguLU Info] n=" FMT_PANGULU_INT64_T " nnz=" FMT_PANGULU_EXBLOCK_PTR " nb=" FMT_PANGULU_INT32_T " mpi_process=" FMT_PANGULU_INT32_T " preprocessing_thread=%d\n", n, origin_smatrix->rowpointer[n], nb, size, init_options->nthread
53 | #endif
54 | #define PANGULU_I_TIME_REORDER "[PanguLU Info] Reordering time is %lf s.\n", pangulu_get_spend_time(common)
55 | #define PANGULU_I_TIME_SYMBOLIC "[PanguLU Info] Symbolic factorization time is %lf s.\n", pangulu_get_spend_time(common)
56 | #define PANGULU_I_TIME_PRE "[PanguLU Info] Preprocessing time is %lf s.\n", pangulu_get_spend_time(common)
57 | #define PANGULU_I_TIME_NUMERICAL "[PanguLU Info] Numeric factorization time is %lf s.\n", pangulu_get_spend_time(common) //, flop / pangulu_get_spend_time(common) / 1000000000.0
58 | #define PANGULU_I_TIME_SPTRSV "[PanguLU Info] Solving time is %lf s.\n", pangulu_get_spend_time(common)
59 | #define PANGULU_I_SYMBOLIC_NONZERO "[PanguLU Info] Symbolic nonzero count is " FMT_PANGULU_EXBLOCK_PTR ".\n",*symbolic_nnz
60 | #endif // PANGULU_LOG_INFO
61 |
62 | #endif // #ifdef PANGULU_EN_US
--------------------------------------------------------------------------------
/src/pangulu.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | pangulu_int64_t cpu_memory = 0;
4 | pangulu_int64_t cpu_peak_memory = 0;
5 | pangulu_int64_t gpu_memory = 0;
6 | pangulu_int64_t heap_select;
7 | calculate_type *temp_a_value = NULL;
8 | pangulu_int32_t *cuda_b_idx_col = NULL;
9 | calculate_type *cuda_temp_value = NULL;
10 | pangulu_int64_t *ssssm_col_ops_u = NULL;
11 | pangulu_int32_t *ssssm_ops_pointer = NULL;
12 | pangulu_int32_t *getrf_diagIndex_csc = NULL;
13 | pangulu_int32_t *getrf_diagIndex_csr = NULL;
14 |
15 | pangulu_int64_t STREAM_DENSE_INDEX = 0;
16 | pangulu_int64_t INDEX_NUM = 0;
17 | pangulu_int32_t pangu_omp_num_threads = 1;
18 |
19 | pangulu_int64_t flop = 0;
20 | double time_transpose = 0.0;
21 | double time_isend = 0.0;
22 | double time_receive = 0.0;
23 | double time_getrf = 0.0;
24 | double time_tstrf = 0.0;
25 | double time_gessm = 0.0;
26 | double time_gessm_dense = 0.0;
27 | double time_gessm_sparse = 0.0;
28 | double time_ssssm = 0.0;
29 | double time_cuda_memcpy = 0.0;
30 | double time_wait = 0.0;
31 | double calculate_time_wait = 0.0;
32 | pangulu_int64_t calculate_time = 0;
33 |
34 | pangulu_int32_t *ssssm_hash_lu = NULL;
35 | pangulu_int32_t *ssssm_hash_l_row = NULL;
36 | pangulu_int32_t zip_cur_id = 0;
37 | calculate_type *ssssm_l_value = NULL;
38 | calculate_type *ssssm_u_value = NULL;
39 | pangulu_int32_t *ssssm_hash_u_col = NULL;
40 |
41 | pangulu_int32_t rank;
42 | pangulu_int32_t global_level;
43 | pangulu_int32_t omp_thread;
44 |
45 | void pangulu_init(pangulu_exblock_idx pangulu_n, pangulu_exblock_ptr pangulu_nnz, pangulu_exblock_ptr *csr_rowptr, pangulu_exblock_idx *csr_colidx, calculate_type *csr_value, pangulu_init_options *init_options, void **pangulu_handle)
46 | {
47 | MPI_Comm_rank(MPI_COMM_WORLD, &rank);
48 |
49 | struct timeval time_start;
50 | double elapsed_time;
51 |
52 | pangulu_int32_t size;
53 | MPI_Comm_size(MPI_COMM_WORLD, &size);
54 | pangulu_common *common = (pangulu_common *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_common));
55 | common->rank = rank;
56 | common->size = size;
57 | common->n = pangulu_n;
58 | if (rank == 0)
59 | {
60 | if (init_options == NULL)
61 | {
62 | printf(PANGULU_E_OPTION_IS_NULLPTR);
63 | pangulu_exit(1);
64 | }
65 | if (init_options->nb == 0)
66 | {
67 | printf(PANGULU_E_NB_IS_ZERO);
68 | pangulu_exit(1);
69 | }
70 | }
71 |
72 | #ifdef GPU_OPEN
73 | if (init_options->nb > 256 && sizeof(pangulu_inblock_idx) == 2)
74 | {
75 | init_options->nb = 256;
76 | if (rank == 0)
77 | {
78 | printf(PANGULU_W_GPU_BIG_BLOCK);
79 | }
80 | }
81 | #endif
82 |
83 | common->nb = init_options->nb;
84 | common->sum_rank_size = size;
85 | common->omp_thread = init_options->nthread;
86 | MPI_Bcast(&common->n, 1, MPI_PANGULU_EXBLOCK_IDX, 0, MPI_COMM_WORLD);
87 | MPI_Bcast(&common->nb, 1, MPI_PANGULU_INBLOCK_IDX, 0, MPI_COMM_WORLD);
88 |
89 | pangulu_int64_t tmp_p = sqrt(common->sum_rank_size);
90 | while (((common->sum_rank_size) % tmp_p) != 0)
91 | {
92 | tmp_p--;
93 | }
94 |
95 | common->p = tmp_p;
96 | common->q = common->sum_rank_size / tmp_p;
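
/* The loop above chooses p as the largest divisor of the MPI process count
 * that does not exceed sqrt(size), so the ranks form a p x q grid with
 * p <= q and p * q == size; e.g. size = 12 yields a 3 x 4 grid, while a
 * prime size such as 7 degenerates to a 1 x 7 grid. */
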
97 | pangulu_origin_smatrix *origin_smatrix = (pangulu_origin_smatrix *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_origin_smatrix));
98 | pangulu_init_pangulu_origin_smatrix(origin_smatrix);
99 |
100 | if (rank == 0)
101 | {
102 | struct timeval start, end;
103 | gettimeofday(&start, NULL);
104 | pangulu_read_pangulu_origin_smatrix(origin_smatrix, pangulu_n, pangulu_nnz, csr_rowptr, csr_colidx, csr_value);
105 | gettimeofday(&end, NULL);
106 | if (origin_smatrix->row == 0)
107 | {
108 | printf(PANGULU_E_ROW_IS_ZERO);
109 | pangulu_exit(1);
110 | }
111 | }
112 |
113 | pangulu_int32_t p = common->p;
114 | pangulu_int32_t q = common->q;
115 | pangulu_int32_t nb = common->nb;
116 | MPI_Barrier(MPI_COMM_WORLD);
117 | common->n = pangulu_bcast_n(origin_smatrix->row, 0);
118 | pangulu_int64_t n = common->n;
119 | omp_set_num_threads(init_options->nthread);
120 | #if defined(OPENBLAS_CONFIG_H) || defined(OPENBLAS_VERSION)
121 | openblas_set_num_threads(1);
122 | #endif
123 | if (rank == 0)
124 | {
125 | // #ifdef ADAPTIVE_KERNEL_SELECTION
126 | // printf(PANGULU_I_ADAPTIVE_KERNEL_SELECTION_ON);
127 | // #else
128 | // printf(PANGULU_I_ADAPTIVE_KERNEL_SELECTION_OFF);
129 | // #endif
130 | // #ifdef SYNCHRONIZE_FREE
131 | // printf(PANGULU_I_SYNCHRONIZE_FREE_ON);
132 | // #else
133 | // printf(PANGULU_I_SYNCHRONIZE_FREE_OFF);
134 | // #endif
135 | #ifdef PANGULU_GPU_COMPLEX_FALLBACK_FLAG
136 | printf(PANGULU_W_COMPLEX_FALLBACK);
137 | #endif
138 | omp_thread = pangu_omp_num_threads;
139 | printf(PANGULU_I_BASIC_INFO);
140 | }
141 |
142 | #ifdef GPU_OPEN
143 | pangulu_cuda_device_init(rank);
144 | #endif
145 |
146 | pangulu_block_smatrix *block_smatrix = (pangulu_block_smatrix *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_block_smatrix));
147 | pangulu_init_pangulu_block_smatrix(block_smatrix);
148 | pangulu_block_common *block_common = (pangulu_block_common *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_block_common));
149 | block_common->rank = rank;
150 | block_common->p = p;
151 | block_common->q = q;
152 | block_common->nb = nb;
153 | block_common->n = n;
154 | block_common->block_length = pangulu_Calculate_Block(n, nb);
155 | block_common->sum_rank_size = common->sum_rank_size;
156 | block_common->max_pq = PANGULU_MAX(p, q);
157 | block_common->every_level_length = block_common->block_length;
158 | pangulu_bip_init(&(block_smatrix->BIP), block_common->block_length * (block_common->block_length + 1));
159 |
160 | #ifdef SYNCHRONIZE_FREE
161 | block_common->every_level_length = 10;
162 | #else
163 | block_common->every_level_length = 1;
164 | #endif
165 |
166 | pangulu_origin_smatrix *reorder_matrix = (pangulu_origin_smatrix *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_origin_smatrix));
167 | pangulu_init_pangulu_origin_smatrix(reorder_matrix);
168 |
169 | block_common->rank_row_length = (block_common->block_length / p + (((block_common->block_length % p) > (rank / q)) ? 1 : 0));
170 | block_common->rank_col_length = (block_common->block_length / q + (((block_common->block_length % q) > (rank % q)) ? 1 : 0));
171 | block_common->every_level_length = PANGULU_MIN(block_common->every_level_length, block_common->block_length);
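
/* rank_row_length / rank_col_length above follow from the 2D block-cyclic
 * layout: the rank at grid position (rank / q, rank % q) owns every block
 * row i with i % p == rank / q and every block column j with
 * j % q == rank % q, i.e. block_length / p (or / q) blocks plus one extra
 * when the remainder exceeds the rank's grid coordinate. */
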
172 | MPI_Barrier(MPI_COMM_WORLD);
173 | pangulu_time_start(&time_start);
174 |
175 | pangulu_reorder(block_smatrix,
176 | origin_smatrix,
177 | reorder_matrix);
178 |
179 | MPI_Barrier(MPI_COMM_WORLD);
180 | elapsed_time = pangulu_time_stop(&time_start);
181 | if (rank == 0)
182 | {
183 | printf(PANGULU_I_TIME_REORDER);
184 | }
185 |
186 | calculate_time = 0;
187 |
188 | MPI_Barrier(MPI_COMM_WORLD);
189 | pangulu_time_start(&time_start);
190 | if (rank == 0)
191 | {
192 | pangulu_symbolic(block_common,
193 | block_smatrix,
194 | reorder_matrix);
195 | }
196 |
197 | MPI_Barrier(MPI_COMM_WORLD);
198 | elapsed_time = pangulu_time_stop(&time_start);
199 | if (rank == 0)
200 | {
201 | printf(PANGULU_I_TIME_SYMBOLIC);
202 | }
203 |
204 | pangulu_init_heap_select(0);
205 |
206 | MPI_Barrier(MPI_COMM_WORLD);
207 | pangulu_time_start(&time_start);
208 | pangulu_preprocessing(
209 | block_common,
210 | block_smatrix,
211 | reorder_matrix,
212 | init_options->nthread);
213 |
214 | MPI_Barrier(MPI_COMM_WORLD);
215 |
216 | elapsed_time = pangulu_time_stop(&time_start);
217 | if (rank == 0)
218 | {
219 | printf(PANGULU_I_TIME_PRE);
220 | }
221 |
222 | // pangulu_free(__FILE__, __LINE__, block_smatrix->symbolic_rowpointer);
223 | // block_smatrix->symbolic_rowpointer = NULL;
224 |
225 | // pangulu_free(__FILE__, __LINE__, block_smatrix->symbolic_columnindex);
226 | // block_smatrix->symbolic_columnindex = NULL;
227 |
228 | pangulu_free(__FILE__, __LINE__, origin_smatrix);
229 | origin_smatrix = NULL;
230 |
231 | pangulu_free(__FILE__, __LINE__, reorder_matrix->rowpointer);
232 | pangulu_free(__FILE__, __LINE__, reorder_matrix->columnindex);
233 | pangulu_free(__FILE__, __LINE__, reorder_matrix->value);
234 | pangulu_free(__FILE__, __LINE__, reorder_matrix);
235 | reorder_matrix = NULL;
236 |
237 | (*pangulu_handle) = pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_handle_t));
238 | (*(pangulu_handle_t **)pangulu_handle)->block_common = block_common;
239 | (*(pangulu_handle_t **)pangulu_handle)->block_smatrix = block_smatrix;
240 | (*(pangulu_handle_t **)pangulu_handle)->commmon = common;
241 | }
242 |
243 | void pangulu_gstrf(pangulu_gstrf_options *gstrf_options, void **pangulu_handle)
244 | {
245 | pangulu_block_common *block_common = (*(pangulu_handle_t **)pangulu_handle)->block_common;
246 | pangulu_block_smatrix *block_smatrix = (*(pangulu_handle_t **)pangulu_handle)->block_smatrix;
247 | pangulu_common *common = (*(pangulu_handle_t **)pangulu_handle)->commmon;
248 |
249 | struct timeval time_start;
250 | double elapsed_time;
251 |
252 | if (rank == 0)
253 | {
254 | if (gstrf_options == NULL)
255 | {
256 | printf(PANGULU_E_GSTRF_OPTION_IS_NULLPTR);
257 | pangulu_exit(1);
258 | }
259 | }
260 |
261 | #ifdef CHECK_TIME
262 | pangulu_time_init();
263 | #endif
264 | MPI_Barrier(MPI_COMM_WORLD);
265 |
266 | #ifdef OVERLAP
267 | pangulu_create_pthread(block_common,
268 | block_smatrix);
269 | #endif
270 |
271 | pangulu_time_init();
272 | MPI_Barrier(MPI_COMM_WORLD);
273 | pangulu_time_start(&time_start);
274 |
275 | pangulu_numeric(block_common,
276 | block_smatrix);
277 |
278 | MPI_Barrier(MPI_COMM_WORLD);
279 | elapsed_time = pangulu_time_stop(&time_start);
280 |
281 | if (rank == 0)
282 | {
283 |
284 | pangulu_int64_t another_calculate_time = 0;
285 | for (pangulu_int64_t i = 1; i < block_common->sum_rank_size; i++)
286 | {
287 | pangulu_recv_vector_int(&another_calculate_time, 1, i, 0);
288 | calculate_time += another_calculate_time;
289 | }
290 | flop = calculate_time * 2;
291 | }
292 | else
293 | {
294 | pangulu_send_vector_int(&calculate_time, 1, 0, 0);
295 | }
296 |
297 | if (rank == 0)
298 | {
299 | printf(PANGULU_I_TIME_NUMERICAL);
300 | }
301 | }
302 |
303 | void pangulu_gstrs(calculate_type *rhs, pangulu_gstrs_options *gstrs_options, void **pangulu_handle)
304 | {
305 | pangulu_block_common *block_common = (*(pangulu_handle_t **)pangulu_handle)->block_common;
306 | pangulu_block_smatrix *block_smatrix = (*(pangulu_handle_t **)pangulu_handle)->block_smatrix;
307 | pangulu_common *common = (*(pangulu_handle_t **)pangulu_handle)->commmon;
308 |
309 | struct timeval time_start;
310 | double elapsed_time;
311 |
312 | if (rank == 0)
313 | {
314 | if (gstrs_options == NULL)
315 | {
316 | printf(PANGULU_E_GSTRS_OPTION_IS_NULLPTR);
317 | pangulu_exit(1);
318 | }
319 | }
320 |
321 | pangulu_int64_t vector_length = common->n;
322 | pangulu_vector *x_vector = NULL;
323 | pangulu_vector *b_vector = NULL;
324 | pangulu_vector *answer_vector = NULL;
325 |
326 | if (rank == 0)
327 | {
328 | x_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
329 | b_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
330 | answer_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
331 | b_vector->row = common->n;
332 | b_vector->value = rhs;
333 | pangulu_init_pangulu_vector(x_vector, vector_length);
334 | pangulu_init_pangulu_vector(answer_vector, vector_length);
335 | pangulu_reorder_vector_b_tran(block_smatrix, b_vector, answer_vector);
336 | }
337 |
338 | pangulu_sptrsv_preprocessing(
339 | block_common,
340 | block_smatrix,
341 | answer_vector);
342 |
343 | #ifdef PANGULU_SPTRSV
344 |
345 | MPI_Barrier(MPI_COMM_WORLD);
346 | pangulu_time_start(&time_start);
347 |
348 | pangulu_sptrsv_L(block_common, block_smatrix);
349 | pangulu_init_heap_select(4);
350 | pangulu_sptrsv_U(block_common, block_smatrix);
351 |
352 | MPI_Barrier(MPI_COMM_WORLD);
353 | elapsed_time = pangulu_time_stop(&time_start);
354 |
355 | if (rank == 0)
356 | {
357 | printf(PANGULU_I_TIME_SPTRSV);
358 | }
359 |
360 | #endif
361 |
362 | // check sptrsv answer
363 | pangulu_sptrsv_vector_gather(block_common, block_smatrix, answer_vector);
364 |
365 | int n = common->n;
366 |
367 | if (rank == 0)
368 | {
369 | pangulu_reorder_vector_x_tran(block_smatrix, answer_vector, x_vector);
370 |
371 | for (int i = 0; i < n; i++)
372 | {
373 | rhs[i] = x_vector->value[i];
374 | }
375 |
376 | pangulu_destroy_pangulu_vector(x_vector);
377 | pangulu_destroy_pangulu_vector(answer_vector);
378 | pangulu_free(__FILE__, __LINE__, b_vector);
379 | }
380 | }
381 |
382 | void pangulu_gssv(calculate_type *rhs, pangulu_gstrf_options *gstrf_options, pangulu_gstrs_options *gstrs_options, void **pangulu_handle)
383 | {
384 | pangulu_gstrf(gstrf_options, pangulu_handle);
385 | pangulu_gstrs(rhs, gstrs_options, pangulu_handle);
386 | }
387 |
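/* Illustrative end-to-end usage of the public API defined in this file
 * (a hypothetical sketch, not shipped library code, hence #if 0). It
 * assumes the CSR arrays hold the full matrix on rank 0, mirroring how
 * pangulu_init reads them on rank 0 only, and that the option structs can
 * be zero-initialised apart from the nb / nthread fields that pangulu_init
 * actually reads. rhs is overwritten with the solution x on rank 0 by
 * pangulu_gstrs. */
#if 0
int demo_solve(int argc, char **argv,
               pangulu_exblock_idx n, pangulu_exblock_ptr nnz,
               pangulu_exblock_ptr *csr_rowptr, pangulu_exblock_idx *csr_colidx,
               calculate_type *csr_value, calculate_type *rhs)
{
    MPI_Init(&argc, &argv); /* pangulu_init expects MPI to be up already */

    pangulu_init_options init_options = {0};
    init_options.nb = 64;     /* block size (clamped to 256 on GPU builds) */
    init_options.nthread = 8; /* OpenMP threads used during preprocessing */

    pangulu_gstrf_options gstrf_options = {0};
    pangulu_gstrs_options gstrs_options = {0};

    void *pangulu_handle = NULL;
    pangulu_init(n, nnz, csr_rowptr, csr_colidx, csr_value,
                 &init_options, &pangulu_handle);

    /* LU factorisation followed by the two triangular solves. */
    pangulu_gssv(rhs, &gstrf_options, &gstrs_options, &pangulu_handle);

    pangulu_finalize(&pangulu_handle);
    MPI_Finalize();
    return 0;
}
#endif
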
388 | void pangulu_finalize(void **pangulu_handle)
389 | {
390 | pangulu_block_common *block_common = (*(pangulu_handle_t **)pangulu_handle)->block_common;
391 | pangulu_block_smatrix *block_smatrix = (*(pangulu_handle_t **)pangulu_handle)->block_smatrix;
392 | pangulu_common *common = (*(pangulu_handle_t **)pangulu_handle)->commmon;
393 |
394 | pangulu_destroy(block_common, block_smatrix);
395 |
396 | pangulu_free(__FILE__, __LINE__, block_common);
397 | pangulu_free(__FILE__, __LINE__, block_smatrix);
398 | pangulu_free(__FILE__, __LINE__, common);
399 | pangulu_free(__FILE__, __LINE__, *(pangulu_handle_t **)pangulu_handle);
400 | }
--------------------------------------------------------------------------------
/src/pangulu_addmatrix.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | void pangulu_add_pangulu_smatrix_cpu(pangulu_smatrix *a,
4 | pangulu_smatrix *b)
5 | {
6 | for (pangulu_int64_t i = 0; i < a->nnz; i++)
7 | {
8 | a->value_csc[i] += b->value_csc[i];
9 | }
10 | }
11 |
12 | void pangulu_add_pangulu_smatrix_csr_to_csc(pangulu_smatrix *a)
13 | {
14 | for (pangulu_int64_t i = 0; i < a->nnz; i++)
15 | {
16 | a->value_csc[i] += a->value[i];
17 | }
18 | }
--------------------------------------------------------------------------------
/src/pangulu_addmatrix_cuda.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | void pangulu_add_pangulu_smatrix_cuda(pangulu_smatrix *a,
4 | pangulu_smatrix *b)
5 | {
6 | #ifdef GPU_OPEN
7 | pangulu_cuda_vector_add_kernel(a->nnz, a->cuda_value, b->cuda_value);
8 | #endif
9 | }
--------------------------------------------------------------------------------
/src/pangulu_check.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | void pangulu_multiply_upper_upper_u(pangulu_block_common *block_common,
4 | pangulu_block_smatrix *block_smatrix,
5 | pangulu_vector *x, pangulu_vector *b)
6 | {
7 | pangulu_int64_t block_length = block_common->block_length;
8 | pangulu_int64_t nb = block_common->nb;
9 | pangulu_block_info_pool* BIP = block_smatrix->BIP;
10 | pangulu_smatrix *big_smatrix_value = block_smatrix->big_pangulu_smatrix_value;
11 | pangulu_smatrix **diagonal_U = block_smatrix->diagonal_smatrix_u;
12 | pangulu_int64_t *mapper_diagonal = block_smatrix->mapper_diagonal;
13 | if(block_smatrix->current_rank_block_count == 0){
14 | return;
15 | }
16 | for (pangulu_int64_t row = 0; row < block_length; row++)
17 | {
18 | pangulu_int64_t row_offset = row * nb;
19 | for (pangulu_int64_t col = row; col < block_length; col++)
20 | {
21 | pangulu_int64_t mapper_index = pangulu_bip_get(row * block_length + col, BIP)->mapper_a;
22 | pangulu_int64_t col_offset = col * nb;
23 | if (row == col)
24 | {
25 | pangulu_int64_t diagonal_index = mapper_diagonal[row];
26 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(diagonal_U[diagonal_index],
27 | x->value + col_offset,
28 | b->value + row_offset);
29 | if (rank == -1)
30 | {
31 | pangulu_display_pangulu_smatrix_csc(diagonal_U[diagonal_index]);
32 | }
33 | }
34 | else
35 | {
36 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(&big_smatrix_value[mapper_index],
37 | x->value + col_offset,
38 | b->value + row_offset);
39 |
40 | }
41 | }
42 | }
43 | }
44 |
45 | void pangulu_multiply_triggle_l(pangulu_block_common *block_common,
46 | pangulu_block_smatrix *block_smatrix,
47 | pangulu_vector *x, pangulu_vector *b)
48 | {
49 | pangulu_int64_t block_length = block_common->block_length;
50 | pangulu_int64_t nb = block_common->nb;
51 | pangulu_block_info_pool* BIP = block_smatrix->BIP;
52 | pangulu_smatrix *big_smatrix_value = block_smatrix->big_pangulu_smatrix_value;
53 | pangulu_smatrix **diagonal_L = block_smatrix->diagonal_smatrix_l;
54 | pangulu_int64_t *mapper_diagonal = block_smatrix->mapper_diagonal;
55 | if(block_smatrix->current_rank_block_count == 0){
56 | return;
57 | }
58 | for (pangulu_int64_t row = 0; row < block_length; row++)
59 | {
60 | pangulu_int64_t row_offset = row * nb;
61 | for (pangulu_int64_t col = 0; col <= row; col++)
62 | {
63 | pangulu_int64_t mapper_index = pangulu_bip_get(row * block_length + col, BIP)->mapper_a;
64 | pangulu_int64_t col_offset = col * nb;
65 | if (row == col)
66 | {
67 | pangulu_int64_t diagonal_index = mapper_diagonal[col];
68 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(diagonal_L[diagonal_index],
69 | x->value + col_offset,
70 | b->value + row_offset);
71 | }
72 | else
73 | {
74 | pangulu_pangulu_smatrix_multiply_block_pangulu_vector_csc(&big_smatrix_value[mapper_index],
75 | x->value + col_offset,
76 | b->value + row_offset);
77 | }
78 | }
79 | }
80 | }
81 |
82 | void pangulu_gather_pangulu_vector_to_rank_0(pangulu_int64_t rank,
83 | pangulu_vector *gather_v,
84 | pangulu_int64_t vector_length,
85 | pangulu_int64_t sum_rank_size)
86 | {
87 | if (rank == 0)
88 | {
89 | pangulu_vector *save_vector = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
90 | pangulu_init_pangulu_vector(save_vector, vector_length);
91 |
92 | for (pangulu_int64_t i = 1; i < sum_rank_size; i++)
93 | {
94 | pangulu_recv_pangulu_vector_value(save_vector, i, i, vector_length);
95 | for (pangulu_int64_t j = 0; j < vector_length; j++)
96 | {
97 | gather_v->value[j] += save_vector->value[j];
98 | }
99 | }
100 | for (pangulu_int64_t i = 1; i < sum_rank_size; i++)
101 | {
102 | pangulu_send_pangulu_vector_value(gather_v, i, i, vector_length);
103 | }
104 | pangulu_free(__FILE__, __LINE__, save_vector->value);
105 | pangulu_free(__FILE__, __LINE__, save_vector);
106 | }
107 | else
108 | {
109 | pangulu_send_pangulu_vector_value(gather_v, 0, rank, vector_length);
110 | pangulu_recv_pangulu_vector_value(gather_v, 0, rank, vector_length);
111 | }
112 | }
113 |
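/* This routine is effectively a hand-rolled allreduce: rank 0 receives and
 * sums every other rank's copy of the vector, then sends the total back,
 * using the peer's rank as the message tag in both directions. A single
 * MPI_Allreduce with MPI_SUM over gather_v->value would be the library
 * equivalent. */
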
114 | calculate_type vec2norm(const calculate_type *x, pangulu_int64_t n)
115 | {
116 | calculate_type sum = 0.0;
117 | for (pangulu_int64_t i = 0; i < n; i++)
118 | sum += x[i] * x[i];
119 | return sqrt(sum);
120 | }
121 |
122 | calculate_type sub_vec2norm(const calculate_type *x1, const calculate_type *x2, pangulu_int64_t n)
123 | {
124 | calculate_type sum = 0.0;
125 | for (pangulu_int64_t i = 0; i < n; i++)
126 | sum += (x1[i] - x2[i]) * (x1[i] - x2[i]);
127 | return sqrt(sum);
128 | }
129 |
130 | void pangulu_check_answer_vec2norm(pangulu_vector *X1, pangulu_vector *X2, pangulu_int64_t n)
131 | {
132 | calculate_type vec2 = vec2norm(X1->value, n);
133 | double error = sub_vec2norm(X1->value, X2->value, n) / vec2;
134 |
135 | printf(PANGULU_I_VECT2NORM_ERR);
136 | if (fabs(error) < 1e-10)
137 | {
138 | printf(PANGULU_I_CHECK_PASS);
139 | }
140 | else
141 | {
142 | printf(PANGULU_I_CHECK_ERROR);
143 | }
144 | }
145 |
146 | void pangulu_check(pangulu_block_common *block_common,
147 | pangulu_block_smatrix *block_smatrix,
148 | pangulu_origin_smatrix *origin_smatrix)
149 | {
150 | pangulu_exblock_idx n = block_common->n;
151 | pangulu_inblock_idx nb = block_common->nb;
152 | pangulu_exblock_idx vector_length = ((n + nb - 1) / nb) * nb;
153 | pangulu_int32_t sum_rank_size = block_common->sum_rank_size;
154 |
155 | pangulu_vector *x = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
156 | pangulu_get_init_value_pangulu_vector(x, vector_length);
157 | pangulu_vector *b1 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
158 | pangulu_init_pangulu_vector(b1, vector_length);
159 |
160 | if (rank == 0)
161 | {
162 | pangulu_origin_smatrix_multiple_pangulu_vector_csr(origin_smatrix, x, b1);
163 | }
164 |
165 | pangulu_vector *b2 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
166 | pangulu_get_init_value_pangulu_vector(b2, vector_length);
167 |
168 | pangulu_vector *b3 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
169 | pangulu_init_pangulu_vector(b3, vector_length);
170 |
171 | pangulu_vector *b4 = (pangulu_vector *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_vector));
172 | pangulu_init_pangulu_vector(b4, vector_length);
173 | pangulu_multiply_upper_upper_u(block_common, block_smatrix, b2, b3);
174 | pangulu_gather_pangulu_vector_to_rank_0(rank, b3, vector_length, sum_rank_size);
175 | pangulu_multiply_triggle_l(block_common, block_smatrix, b3, b4);
176 | pangulu_gather_pangulu_vector_to_rank_0(rank, b4, vector_length, sum_rank_size);
177 | if (rank == 0)
178 | {
179 | // pangulu_check_answer(b1, b4, n);
180 | pangulu_check_answer_vec2norm(b1, b4, n);
181 | }
182 |
183 | pangulu_destroy_pangulu_vector(x);
184 | pangulu_destroy_pangulu_vector(b1);
185 | pangulu_destroy_pangulu_vector(b2);
186 | pangulu_destroy_pangulu_vector(b3);
187 | pangulu_destroy_pangulu_vector(b4);
188 | }
189 |
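/* pangulu_check verifies the factors without running a solve: b1 = a * x
 * is computed on rank 0 from the original matrix, b4 = l * (u * b2) is
 * assembled from the distributed factor blocks (b2 holds the same initial
 * values as x), and the two are accepted as matching when the relative
 * 2-norm error ||b1 - b4|| / ||b1|| is below 1e-10. */
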
190 | long double max_check_ld(long double* x, int n)
191 | {
192 | long double max = 0.0; // absolute values are non-negative, so 0 is a safe identity for max
193 | for (int i = 0; i < n; i++) {
194 | long double x_fabs = fabsl(x[i]);
195 | max = max > x_fabs ? max : x_fabs;
196 | }
197 | return max;
198 | }
199 |
200 |
201 | // Multiply a CSR matrix with a vector x to get the resulting vector y;
202 | // each row sum is accumulated with Kahan (compensated) summation
203 | void spmv_ld(int n, const pangulu_int64_t* row_ptr, const pangulu_int32_t* col_idx, const long double* val, const long double* x, long double* y)
204 | {
205 | for (int i = 0; i < n; i++) {
206 | y[i] = 0.0;
207 | long double c = 0.0;
208 | for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
209 | long double num = val[j] * x[col_idx[j]];
210 | long double z = num - c;
211 | long double t = y[i] + z;
212 | c = (t - y[i]) - z;
213 | y[i] = t;
214 | }
215 | }
216 | }
217 |
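/* Kahan detail for spmv_ld above: after t = y[i] + z, the term
 * c = (t - y[i]) - z is exactly the low-order part lost when z was added,
 * so subtracting c from the next product re-injects those bits. The error
 * of each row sum therefore stays at a few ulps instead of growing with
 * the number of nonzeros in the row. */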
--------------------------------------------------------------------------------
/src/pangulu_cuda_interface.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | #ifdef GPU_OPEN
4 | void pangulu_cuda_device_init(pangulu_int32_t rank)
5 | {
6 | pangulu_int32_t gpu_num;
7 | pangulu_cuda_getdevicenum(&gpu_num);
8 | pangulu_int32_t usr_id = pangulu_cuda_setdevice(gpu_num, rank);
9 | struct cudaDeviceProp prop;
10 | cudaGetDeviceProperties(&prop, usr_id);
11 | if (rank == 0)
12 | printf(PANGULU_I_DEV_IS);
13 | }
14 |
15 | void pangulu_cuda_device_init_thread(pangulu_int32_t rank)
16 | {
17 | pangulu_int32_t gpu_num;
18 | pangulu_cuda_getdevicenum(&gpu_num);
19 | pangulu_cuda_setdevice(gpu_num, rank);
20 | }
21 |
22 | void pangulu_cuda_free_interface(void *cuda_address)
23 | {
24 | pangulu_cuda_free(cuda_address);
25 | }
26 |
27 | void pangulu_smatrix_add_cuda_memory(pangulu_smatrix *s)
28 | {
29 | pangulu_cuda_malloc((void **)&(s->cuda_rowpointer), ((s->row) + 1) * sizeof(pangulu_int64_t));
30 | pangulu_cuda_malloc((void **)&(s->cuda_columnindex), (s->nnz) * sizeof(pangulu_inblock_idx));
31 | pangulu_cuda_malloc((void **)&(s->cuda_value), (s->nnz) * sizeof(calculate_type));
32 | pangulu_cuda_malloc((void **)&(s->cuda_bin_rowpointer), BIN_LENGTH * sizeof(pangulu_int64_t));
33 | pangulu_cuda_malloc((void **)&(s->cuda_bin_rowindex), (s->row) * sizeof(pangulu_inblock_idx));
34 | }
35 |
36 | void pangulu_smatrix_cuda_memory_init(pangulu_smatrix *s, pangulu_int64_t nb, pangulu_int64_t nnz)
37 | {
38 | s->row = nb;
39 | s->column = nb;
40 | s->nnz = nnz;
41 | pangulu_cuda_malloc((void **)&(s->cuda_rowpointer), (nb + 1) * sizeof(pangulu_int64_t));
42 | pangulu_cuda_malloc((void **)&(s->cuda_columnindex), nnz * sizeof(pangulu_inblock_idx));
43 | pangulu_cuda_malloc((void **)&(s->cuda_value), nnz * sizeof(calculate_type));
44 | }
45 |
46 | void pangulu_smatrix_add_cuda_memory_u(pangulu_smatrix *u)
47 | {
48 | pangulu_cuda_malloc((void **)&(u->cuda_nnzu), (u->row) * sizeof(int));
49 | }
50 |
51 | void pangulu_smatrix_cuda_memcpy_a(pangulu_smatrix *s)
52 | {
53 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(s->cuda_rowpointer, s->columnpointer, (s->row) + 1);
54 | pangulu_cuda_memcpy_host_to_device_inblock_idx(s->cuda_columnindex, s->rowindex, s->nnz);
55 | pangulu_cuda_memcpy_host_to_device_value(s->cuda_value, s->value_csc, s->nnz);
56 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(s->cuda_bin_rowpointer, s->bin_rowpointer, BIN_LENGTH);
57 | pangulu_cuda_memcpy_host_to_device_inblock_idx(s->cuda_bin_rowindex, s->bin_rowindex, s->row);
58 | }
59 |
60 | void pangulu_smatrix_cuda_memcpy_struct_csr(pangulu_smatrix *calculate_s, pangulu_smatrix *s)
61 | {
62 | calculate_s->nnz = s->nnz;
63 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->rowpointer, (s->row) + 1);
64 | pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->columnindex, s->nnz);
65 | }
66 |
67 | void pangulu_smatrix_cuda_memcpy_struct_csc(pangulu_smatrix *calculate_s, pangulu_smatrix *s)
68 | {
69 | calculate_s->nnz = s->nnz;
70 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->columnpointer, (s->row) + 1);
71 | pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->rowindex, s->nnz);
72 | }
73 |
74 | void pangulu_smatrix_cuda_memcpy_complete_csr(pangulu_smatrix *calculate_s, pangulu_smatrix *s)
75 | {
76 | calculate_s->nnz = s->nnz;
77 | #ifdef check_time
78 | struct timeval get_time_start;
79 | pangulu_time_check_begin(&get_time_start);
80 | #endif
81 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->rowpointer, (s->row) + 1);
82 | pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->columnindex, s->nnz);
83 | pangulu_cuda_memcpy_host_to_device_value(calculate_s->cuda_value, s->value, s->nnz);
84 | #ifdef check_time
85 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
86 | #endif
87 | }
88 |
89 | void pangulu_smatrix_cuda_memcpy_nnzu(pangulu_smatrix *calculate_u, pangulu_smatrix *u)
90 | {
91 | pangulu_cuda_memcpy_host_to_device_int32(calculate_u->cuda_nnzu, u->nnzu, calculate_u->row);
92 | }
93 |
94 | void pangulu_smatrix_cuda_memcpy_value_csr(pangulu_smatrix *s, pangulu_smatrix *calculate_s)
95 | {
96 | #ifdef check_time
97 | struct timeval get_time_start;
98 | pangulu_time_check_begin(&get_time_start);
99 | #endif
100 | pangulu_cuda_memcpy_device_to_host_value(s->value, calculate_s->cuda_value, s->nnz);
101 |
102 | #ifdef check_time
103 |
104 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
105 | #endif
106 | }
107 |
108 | void pangulu_smatrix_cuda_memcpy_value_csr_async(pangulu_smatrix *s, pangulu_smatrix *calculate_s, cudaStream_t *stream)
109 | {
110 | #ifdef check_time
111 | struct timeval get_time_start;
112 | pangulu_time_check_begin(&get_time_start);
113 | #endif
114 | pangulu_cudamemcpyasync_device_to_host(s->value, calculate_s->cuda_value, (s->nnz) * sizeof(calculate_type), stream);
115 |
116 | #ifdef check_time
117 |
118 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
119 | #endif
120 | }
121 |
122 | void pangulu_smatrix_cuda_memcpy_value_csc(pangulu_smatrix *s, pangulu_smatrix *calculate_s)
123 | {
124 | #ifdef check_time
125 | struct timeval get_time_start;
126 | pangulu_time_check_begin(&get_time_start);
127 | #endif
128 | pangulu_cuda_memcpy_device_to_host_value(s->value_csc, calculate_s->cuda_value, s->nnz);
129 | #ifdef check_time
130 |
131 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
132 | #endif
133 | }
134 |
135 | void pangulu_smatrix_cuda_memcpy_value_csc_async(pangulu_smatrix *s, pangulu_smatrix *calculate_s, cudaStream_t *stream)
136 | {
137 | #ifdef check_time
138 | struct timeval get_time_start;
139 | pangulu_time_check_begin(&get_time_start);
140 | #endif
141 | pangulu_cudamemcpyasync_device_to_host(s->value_csc, calculate_s->cuda_value, (s->nnz) * sizeof(calculate_type), stream);
142 | #ifdef check_time
143 |
144 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
145 | #endif
146 | }
147 |
148 | void pangulu_smatrix_cuda_memcpy_value_csc_cal_length(pangulu_smatrix *s, pangulu_smatrix *calculate_s)
149 | {
150 |
151 | pangulu_cuda_memcpy_device_to_host_value(s->value_csc, calculate_s->cuda_value, calculate_s->nnz);
152 | }
153 |
154 | void pangulu_smatrix_cuda_memcpy_to_device_value_csc_async(pangulu_smatrix *calculate_s, pangulu_smatrix *s, cudaStream_t *stream)
155 | {
156 | #ifdef check_time
157 | struct timeval get_time_start;
158 | pangulu_time_check_begin(&get_time_start);
159 | #endif
160 |
161 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_value, s->value_csc, sizeof(calculate_type) * (s->nnz), stream);
162 |
163 | #ifdef check_time
164 |
165 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
166 | #endif
167 | }
168 |
169 | void pangulu_smatrix_cuda_memcpy_to_device_value_csc(pangulu_smatrix *calculate_s, pangulu_smatrix *s)
170 | {
171 | #ifdef check_time
172 | struct timeval get_time_start;
173 | pangulu_time_check_begin(&get_time_start);
174 | #endif
175 |
176 | pangulu_cuda_memcpy_host_to_device_value(calculate_s->cuda_value, s->value_csc, s->nnz);
177 |
178 | #ifdef check_time
179 |
180 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
181 | #endif
182 | }
183 |
184 | void pangulu_smatrix_cuda_memcpy_complete_csr_async(pangulu_smatrix *calculate_s, pangulu_smatrix *s, cudaStream_t *stream)
185 | {
186 | calculate_s->nnz = s->nnz;
187 | #ifdef check_time
188 | struct timeval get_time_start;
189 | pangulu_time_check_begin(&get_time_start);
190 | #endif
191 |
192 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_rowpointer, s->rowpointer, sizeof(pangulu_int64_t) * ((s->row) + 1), stream);
193 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_columnindex, s->columnindex, sizeof(pangulu_int32_t) * s->nnz, stream);
194 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_value, s->value, sizeof(calculate_type) * s->nnz, stream);
195 |
196 | #ifdef check_time
197 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
198 | #endif
199 | }
200 |
201 | void pangulu_smatrix_cuda_memcpy_complete_csc_async(pangulu_smatrix *calculate_s, pangulu_smatrix *s, cudaStream_t *stream)
202 | {
203 | calculate_s->nnz = s->nnz;
204 | #ifdef check_time
205 | struct timeval get_time_start;
206 | pangulu_time_check_begin(&get_time_start);
207 | #endif
208 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_rowpointer, s->columnpointer, sizeof(pangulu_int64_t) * ((s->row) + 1), stream);
209 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_columnindex, s->rowindex, sizeof(pangulu_int32_t) * s->nnz, stream);
210 | pangulu_cudamemcpyasync_host_to_device(calculate_s->cuda_value, s->value_csc, sizeof(calculate_type) * s->nnz, stream);
211 |
212 | #ifdef check_time
213 |
214 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
215 | #endif
216 | }
217 |
218 | void pangulu_smatrix_cuda_memcpy_complete_csc(pangulu_smatrix *calculate_s, pangulu_smatrix *s)
219 | {
220 | calculate_s->nnz = s->nnz;
221 | #ifdef check_time
222 | struct timeval get_time_start;
223 | pangulu_time_check_begin(&get_time_start);
224 | #endif
225 | pangulu_cuda_memcpy_host_to_device_inblock_ptr(calculate_s->cuda_rowpointer, s->columnpointer, (s->row) + 1);
226 | pangulu_cuda_memcpy_host_to_device_inblock_idx(calculate_s->cuda_columnindex, s->rowindex, s->nnz);
227 | pangulu_cuda_memcpy_host_to_device_value(calculate_s->cuda_value, s->value_csc, s->nnz);
228 | #ifdef check_time
229 |
230 | time_cuda_memcpy += pangulu_time_check_end(&get_time_start);
231 | #endif
232 | }
233 | #endif
--------------------------------------------------------------------------------
/src/pangulu_gessm_fp64.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 | void pangulu_gessm_fp64_cpu_1(pangulu_smatrix *a,
3 | pangulu_smatrix *l,
4 | pangulu_smatrix *x)
5 | {
6 |
7 | pangulu_inblock_ptr *a_rowpointer = a->rowpointer;
8 | pangulu_inblock_idx *a_colindex = a->columnindex;
9 | calculate_type *a_value = x->value;
10 |
11 | pangulu_inblock_ptr *l_colpointer = l->columnpointer;
12 | pangulu_inblock_idx *l_rowindex = l->rowindex;
13 | calculate_type *l_value = l->value_csc;
14 |
15 | pangulu_inblock_ptr *x_rowpointer = a->rowpointer;
16 | pangulu_inblock_idx *x_colindex = a->columnindex;
17 | calculate_type *x_value = a->value;
18 |
19 | pangulu_int64_t n = a->row;
20 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
21 | for (pangulu_int64_t i = 0; i < a->nnz; i++)
22 | {
23 | x_value[i] = 0.0;
24 | }
25 |
26 | for (pangulu_int64_t i = 0; i < n; i++)
27 | {
28 | // x takes its values from a
29 | for (pangulu_int64_t k = x_rowpointer[i]; k < x_rowpointer[i + 1]; k++)
30 | {
31 | x_value[k] = a_value[k];
32 | }
33 | // update the values of later rows
34 | if (x_rowpointer[i] != x_rowpointer[i + 1])
35 | {
36 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
37 | for (pangulu_int64_t j = l_colpointer[i]; j < l_colpointer[i + 1]; j++)
38 | {
39 |
40 | for (pangulu_int64_t p = a_rowpointer[l_rowindex[j]], k = x_rowpointer[i]; p < a_rowpointer[l_rowindex[j] + 1]; p++, k++)
41 | {
42 | if (a_colindex[p] == x_colindex[k])
43 | {
44 | a_value[p] -= l_value[j] * x_value[k];
45 | }
46 | else
47 | {
48 | k--;
49 | }
50 | }
51 | }
52 | }
53 | }
54 | }
55 |
56 | void pangulu_gessm_fp64_cpu_2(pangulu_smatrix *a,
57 | pangulu_smatrix *l,
58 | pangulu_smatrix *x)
59 | {
60 |
61 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer;
62 | pangulu_inblock_idx *a_rowidx = a->rowindex;
63 |
64 | calculate_type *a_value = a->value_csc;
65 |
66 | pangulu_inblock_ptr *l_rowpointer = l->rowpointer;
67 |
68 | pangulu_inblock_ptr *l_colpointer = l->columnpointer;
69 | pangulu_inblock_idx *l_rowindex = l->rowindex;
70 | calculate_type *l_value = l->value_csc;
71 |
72 | pangulu_int64_t n = a->row;
73 |
74 | pangulu_int64_t *spointer = (pangulu_int64_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_int64_t) * (n + 1));
75 | memset(spointer, 0, sizeof(pangulu_int64_t) * (n + 1));
76 | int rhs = 0;
77 | for (pangulu_int64_t i = 0; i < n; i++)
78 | {
79 | if (a_columnpointer[i] != a_columnpointer[i + 1])
80 | {
81 | spointer[rhs] = i;
82 | rhs++;
83 | }
84 | }
85 |
86 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs);
87 | calculate_type *D_x = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs);
88 |
89 | memset(C_b, 0, sizeof(calculate_type) * n * rhs);
90 | memset(D_x, 0, sizeof(calculate_type) * n * rhs);
91 |
92 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
93 | for (int i = 0; i < rhs; i++)
94 | {
95 | int index = spointer[i];
96 | for (int j = a_columnpointer[index]; j < a_columnpointer[index + 1]; j++)
97 | {
98 | C_b[i * n + a_rowidx[j]] = a_value[j];
99 | }
100 | }
101 |
102 | int nlevel = 0;
103 | int *levelPtr = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * (n + 1));
104 | int *levelItem = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * n);
105 | findlevel(l_colpointer, l_rowindex, l_rowpointer, n, &nlevel, levelPtr, levelItem);
106 |
107 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
108 | for (int i = 0; i < rhs; i++)
109 | {
110 | for (int li = 0; li < nlevel; li++)
111 | {
112 |
113 | for (int ri = levelPtr[li]; ri < levelPtr[li + 1]; ri++)
114 | {
115 | for (int j = l_colpointer[levelItem[ri]] + 1; j < l_colpointer[levelItem[ri] + 1]; j++)
116 | {
117 | C_b[i * n + l_rowindex[j]] -= l_value[j] * C_b[i * n + levelItem[ri]];
118 | }
119 | }
120 | }
121 | }
122 |
123 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
124 | for (int i = 0; i < rhs; i++)
125 | {
126 | int index = spointer[i];
127 | for (int j = a_columnpointer[index]; j < a_columnpointer[index + 1]; j++)
128 | {
129 | a_value[j] = C_b[i * n + a_rowidx[j]];
130 | }
131 | }
132 |
133 | pangulu_free(__FILE__, __LINE__, spointer);
134 | pangulu_free(__FILE__, __LINE__, C_b);
135 | pangulu_free(__FILE__, __LINE__, D_x);
136 | }
137 |
138 | void pangulu_gessm_fp64_cpu_3(pangulu_smatrix *a,
139 | pangulu_smatrix *l,
140 | pangulu_smatrix *x)
141 | {
142 |
143 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer;
144 | pangulu_inblock_idx *a_rowidx = a->rowindex;
145 |
146 | calculate_type *a_value = a->value_csc;
147 |
148 | pangulu_inblock_ptr *l_columnpointer = l->columnpointer;
149 | pangulu_inblock_idx *l_rowidx = l->rowindex;
150 | calculate_type *l_value = l->value_csc;
151 |
152 | pangulu_int64_t n = a->row;
153 |
154 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * n);
155 |
156 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
157 | for (int i = 0; i < n; i++)
158 | {
159 | for (int j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++)
160 | {
161 | int idx = a_rowidx[j];
162 | C_b[i * n + idx] = a_value[j];
163 | }
164 | }
165 |
166 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
167 | for (pangulu_int64_t i = 0; i < n; i++)
168 | {
169 | for (pangulu_int64_t j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++)
170 | {
171 | pangulu_inblock_idx idx = a_rowidx[j];
172 | for (pangulu_int64_t k = l_columnpointer[idx] + 1; k < l_columnpointer[idx + 1]; k++)
173 | {
174 | C_b[i * n + l_rowidx[k]] -= l_value[k] * C_b[i * n + a_rowidx[j]];
175 | }
176 | }
177 | }
178 |
179 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
180 | for (int i = 0; i < n; i++)
181 | {
182 | for (int j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++)
183 | {
184 | int idx = a_rowidx[j];
185 | a_value[j] = C_b[i * n + idx];
186 | }
187 | }
188 | pangulu_free(__FILE__, __LINE__, C_b);
189 | }
190 |
191 | void pangulu_gessm_fp64_cpu_4(pangulu_smatrix *a,
192 | pangulu_smatrix *l,
193 | pangulu_smatrix *x)
194 | {
195 |
196 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer;
197 | pangulu_inblock_idx *a_rowidx = a->rowindex;
198 |
199 | calculate_type *a_value = a->value_csc;
200 |
201 | pangulu_inblock_ptr *l_columnpointer = l->columnpointer;
202 | pangulu_inblock_idx *l_rowidx = l->rowindex;
203 | calculate_type *l_value = l->value_csc;
204 |
205 | pangulu_int64_t n = a->row;
206 |
207 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
208 | for (pangulu_int64_t i = 0; i < n; i++)
209 | {
210 | for (pangulu_int64_t j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++)
211 | {
212 | pangulu_inblock_idx idx = a_rowidx[j];
213 | for (pangulu_int64_t k = l_columnpointer[idx] + 1, p = j + 1; k < l_columnpointer[idx + 1] && p < a_columnpointer[i + 1]; k++, p++)
214 | {
215 | if (l_rowidx[k] == a_rowidx[p])
216 | {
217 | a_value[p] -= l_value[k] * a_value[j];
218 | }
219 | else
220 | {
221 | k--;
222 | }
223 | }
224 | }
225 | }
226 | }
227 |
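/* Note on the k-- in pangulu_gessm_fp64_cpu_4 above: the sub-diagonal
 * entries of column idx of l and the remaining entries of column i of a
 * are both sorted by row index, so k and p advance as a merged scan over
 * the two patterns. On a mismatch, k-- cancels the loop's k++, holding k
 * in place while p skips an entry of a that has no counterpart in l
 * (pangulu_gessm_fp64_cpu_1 uses the same idiom on the CSR side). */
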
228 | void pangulu_gessm_fp64_cpu_5(pangulu_smatrix *a,
229 | pangulu_smatrix *l,
230 | pangulu_smatrix *x)
231 | {
232 |
233 | pangulu_inblock_ptr *a_rowpointer = a->rowpointer;
234 | pangulu_inblock_idx *a_colindex = a->columnindex;
235 | calculate_type *a_value = x->value;
236 |
237 | pangulu_inblock_ptr *l_colpointer = l->columnpointer;
238 | pangulu_inblock_idx *l_rowindex = l->rowindex;
239 | calculate_type *l_value = l->value_csc;
240 |
241 | pangulu_inblock_ptr *x_rowpointer = a->rowpointer;
242 | pangulu_inblock_idx *x_colindex = a->columnindex;
243 | calculate_type *x_value = a->value;
244 |
245 | pangulu_int64_t n = a->row;
246 |
247 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
248 | for (int i = 0; i < n; i++) // row i of a
249 | {
250 | for (int j = a_rowpointer[i]; j < a_rowpointer[i + 1]; j++)
251 | {
252 | pangulu_inblock_idx idx = a_colindex[j];
253 | temp_a_value[i * n + idx] = a_value[j]; // transform CSR to dense, values only
254 | }
255 | }
256 |
257 | for (pangulu_int64_t i = 0; i < n; i++)
258 | {
259 | // x takes its values from a
260 | for (pangulu_int64_t k = x_rowpointer[i]; k < x_rowpointer[i + 1]; k++)
261 | {
262 | x_value[k] = temp_a_value[i * n + x_colindex[k]];
263 | }
264 | // update the values of later rows
265 | if (x_rowpointer[i] != x_rowpointer[i + 1])
266 | {
267 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
268 | for (pangulu_int64_t j = l_colpointer[i] + 1; j < l_colpointer[i + 1]; j++)
269 | {
270 | pangulu_inblock_idx idx1 = l_rowindex[j];
271 |
272 | for (pangulu_int64_t p = x_rowpointer[i]; p < x_rowpointer[i + 1]; p++)
273 | {
274 |
275 | pangulu_inblock_idx idx2 = a_colindex[p];
276 | temp_a_value[idx1 * n + idx2] -= l_value[j] * temp_a_value[i * n + idx2];
277 | }
278 | }
279 | }
280 | }
281 | }
282 |
283 | void pangulu_gessm_fp64_cpu_6(pangulu_smatrix *a,
284 | pangulu_smatrix *l,
285 | pangulu_smatrix *x)
286 | {
287 |
288 | pangulu_inblock_ptr *a_columnpointer = a->columnpointer;
289 | pangulu_inblock_idx *a_rowidx = a->rowindex;
290 |
291 | calculate_type *a_value = a->value_csc;
292 |
293 | pangulu_inblock_ptr *l_columnpointer = l->columnpointer;
294 | pangulu_inblock_idx *l_rowidx = l->rowindex;
295 | calculate_type *l_value = l->value_csc;
296 |
297 | pangulu_int64_t n = a->row;
298 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
299 | for (int i = 0; i < n; i++) // column i of a
300 | {
301 | for (int j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++)
302 | {
303 | int idx = a_rowidx[j];
304 | temp_a_value[i * n + idx] = a_value[j]; // transform CSC to dense, values only
305 | }
306 | }
307 |
308 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
309 | for (pangulu_int64_t i = 0; i < n; i++)
310 | {
311 | for (pangulu_int64_t j = a_columnpointer[i]; j < a_columnpointer[i + 1]; j++)
312 | {
313 | pangulu_inblock_idx idx = a_rowidx[j];
314 | a_value[j] = temp_a_value[i * n + idx];
315 | for (pangulu_int64_t k = l_columnpointer[idx] + 1; k < l_columnpointer[idx + 1]; k++)
316 | {
317 | temp_a_value[i * n + l_rowidx[k]] -= l_value[k] * a_value[j];
318 | }
319 | }
320 | }
321 | }
322 |
323 | int findlevel(const pangulu_inblock_ptr *cscColPtr,
324 | const pangulu_inblock_idx *cscRowIdx,
325 | const pangulu_inblock_ptr *csrRowPtr,
326 | const pangulu_int64_t m,
327 | int *nlevel,
328 | int *levelPtr,
329 | int *levelItem)
330 | {
331 | int *indegree = (int *)pangulu_malloc(__FILE__, __LINE__, m * sizeof(int));
332 |
333 | for (int i = 0; i < m; i++)
334 | {
335 | indegree[i] = csrRowPtr[i + 1] - csrRowPtr[i];
336 | }
337 |
338 | int ptr = 0;
339 |
340 | levelPtr[0] = 0;
341 | for (int i = 0; i < m; i++)
342 | {
343 | if (indegree[i] == 1)
344 | {
345 | levelItem[ptr] = i;
346 | ptr++;
347 | }
348 | }
349 |
350 | levelPtr[1] = ptr;
351 |
352 | int lvi = 1;
353 | while (levelPtr[lvi] != m)
354 | {
355 | for (pangulu_int64_t i = levelPtr[lvi - 1]; i < levelPtr[lvi]; i++)
356 | {
357 | int node = levelItem[i];
358 | for (pangulu_int64_t j = cscColPtr[node]; j < cscColPtr[node + 1]; j++)
359 | {
360 | pangulu_inblock_idx visit_node = cscRowIdx[j];
361 | indegree[visit_node]--;
362 | if (indegree[visit_node] == 1)
363 | {
364 | levelItem[ptr] = visit_node;
365 | ptr++;
366 | }
367 | }
368 | }
369 | lvi++;
370 | levelPtr[lvi] = ptr;
371 | }
372 |
373 | *nlevel = lvi;
374 |
375 | pangulu_free(__FILE__, __LINE__, indegree);
376 |
377 | return 0;
378 | }
379 |
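/* findlevel above computes a level-set schedule for forward substitution
 * with l: indegree[i] starts as the number of entries in row i (diagonal
 * included), rows whose only dependency is their own diagonal form level 0,
 * and scheduling a row decrements the indegree of every row below it in
 * that column, promoting rows that reach indegree 1 into the next level.
 * Rows inside one level are mutually independent, and the concatenated
 * levels give a topological order that pangulu_gessm_fp64_cpu_2 walks for
 * each dense right-hand side. */
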
380 | void pangulu_gessm_interface_cpu_csc(pangulu_smatrix *a,
381 | pangulu_smatrix *l,
382 | pangulu_smatrix *x)
383 | {
384 | pangulu_gessm_fp64_cpu_4(a, l, x);
385 | }
386 |
387 | void pangulu_gessm_interface_cpu_csr(pangulu_smatrix *a,
388 | pangulu_smatrix *l,
389 | pangulu_smatrix *x)
390 | {
391 | #ifdef OUTPUT_MATRICES
392 | char out_name_B[512];
393 | char out_name_L[512];
394 | sprintf(out_name_B, "%s/%s/%d%s", OUTPUT_FILE, "gessm", gessm_number, "_gessm_B.cbd");
395 | sprintf(out_name_L, "%s/%s/%d%s", OUTPUT_FILE, "gessm", gessm_number, "_gessm_L.cbd");
396 | pangulu_binary_write_csc_pangulu_smatrix(a, out_name_B);
397 | pangulu_binary_write_csc_pangulu_smatrix(l, out_name_L);
398 | gessm_number++;
399 | #endif
400 |
401 | pangulu_gessm_fp64_cpu_1(a, l, x);
402 | }
403 | void pangulu_gessm_interface_c_v1(pangulu_smatrix *a,
404 | pangulu_smatrix *l,
405 | pangulu_smatrix *x)
406 | {
407 | #ifdef GPU_OPEN
408 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
409 | #endif
410 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(x, a);
411 | pangulu_gessm_fp64_cpu_4(a, l, x);
412 | #ifdef GPU_OPEN
413 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a);
414 | #endif
415 | }
416 | void pangulu_gessm_interface_c_v2(pangulu_smatrix *a,
417 | pangulu_smatrix *l,
418 | pangulu_smatrix *x)
419 | {
420 | #ifdef GPU_OPEN
421 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
422 | #endif
423 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(x, a);
424 | pangulu_gessm_fp64_cpu_6(a, l, x);
425 | #ifdef GPU_OPEN
426 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a);
427 | #endif
428 | }
--------------------------------------------------------------------------------
/src/pangulu_gessm_fp64_cuda.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | #ifdef GPU_OPEN
4 | void pangulu_gessm_fp64_cuda_v9(pangulu_smatrix *a,
5 | pangulu_smatrix *l,
6 | pangulu_smatrix *x)
7 | {
8 |
9 | pangulu_int64_t n = a->row;
10 | pangulu_int64_t nnzl = l->nnz;
11 | pangulu_int64_t nnza = a->nnz;
12 |
13 | int *d_graphindegree = l->d_graphindegree;
14 | cudaMemcpy(d_graphindegree, l->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
15 | int *d_id_extractor = l->d_id_extractor;
16 | cudaMemset(d_id_extractor, 0, sizeof(int));
17 |
18 | int *d_while_profiler;
19 | cudaMalloc((void **)&d_while_profiler, sizeof(int) * n);
20 | cudaMemset(d_while_profiler, 0, sizeof(int) * n);
21 | pangulu_inblock_ptr *spointer = (pangulu_inblock_ptr *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_inblock_ptr) * (n + 1)); /* element type matches d_spointer below */
22 | memset(spointer, 0, sizeof(pangulu_inblock_ptr) * (n + 1));
23 | pangulu_int64_t rhs = 0;
24 | for (int i = 0; i < n; i++)
25 | {
26 | if (a->columnpointer[i] != a->columnpointer[i + 1])
27 | {
28 | spointer[rhs] = i;
29 | rhs++;
30 | }
31 | }
32 | calculate_type *d_left_sum;
33 | cudaMalloc((void **)&d_left_sum, n * rhs * sizeof(calculate_type));
34 | cudaMemset(d_left_sum, 0, n * rhs * sizeof(calculate_type));
35 |
36 | calculate_type *d_x, *d_b;
37 | cudaMalloc((void **)&d_x, n * rhs * sizeof(calculate_type));
38 | cudaMalloc((void **)&d_b, n * rhs * sizeof(calculate_type));
39 | cudaMemset(d_x, 0, n * rhs * sizeof(calculate_type));
40 | cudaMemset(d_b, 0, n * rhs * sizeof(calculate_type));
41 |
42 | pangulu_inblock_ptr *d_spointer;
43 | cudaMalloc((void **)&d_spointer, sizeof(pangulu_inblock_ptr) * (n + 1));
44 | cudaMemset(d_spointer, 0, sizeof(pangulu_inblock_ptr) * (n + 1));
45 | cudaMemcpy(d_spointer, spointer, sizeof(pangulu_inblock_ptr) * (n + 1), cudaMemcpyHostToDevice);
46 |
47 | pangulu_gessm_cuda_kernel_v9(n,
48 | nnzl,
49 | rhs,
50 | nnza,
51 | d_spointer,
52 | d_graphindegree,
53 | d_id_extractor,
54 | d_while_profiler,
55 | l->cuda_rowpointer,
56 | l->cuda_columnindex,
57 | l->cuda_value,
58 | a->cuda_rowpointer,
59 | a->cuda_columnindex,
60 | x->cuda_value,
61 | a->cuda_rowpointer,
62 | a->cuda_columnindex,
63 | a->cuda_value,
64 | d_left_sum,
65 | d_x,
66 | d_b);
67 |
68 | cudaFree(d_x);
69 | cudaFree(d_b);
70 | cudaFree(d_left_sum);
71 | cudaFree(d_while_profiler); cudaFree(d_spointer); pangulu_free(__FILE__, __LINE__, spointer);
72 | }
73 |
74 | void pangulu_gessm_fp64_cuda_v11(pangulu_smatrix *a,
75 | pangulu_smatrix *l,
76 | pangulu_smatrix *x)
77 | {
78 | pangulu_int64_t n = a->row;
79 | pangulu_int64_t nnzl = l->nnz;
80 | pangulu_int64_t nnza = a->nnz;
81 | /**********************************l****************************************/
82 | int *d_graphindegree = l->d_graphindegree;
83 | cudaMemcpy(d_graphindegree, l->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
84 | int *d_id_extractor = l->d_id_extractor;
85 | cudaMemset(d_id_extractor, 0, sizeof(int));
86 |
87 | calculate_type *d_left_sum = a->d_left_sum;
88 | cudaMemset(d_left_sum, 0, nnza * sizeof(calculate_type));
89 | /*****************************************************************************/
90 | pangulu_gessm_cuda_kernel_v11(n,
91 | nnzl,
92 | nnza,
93 | d_graphindegree,
94 | d_id_extractor,
95 | d_left_sum,
96 | l->cuda_rowpointer,
97 | l->cuda_columnindex,
98 | l->cuda_value,
99 | a->cuda_rowpointer,
100 | a->cuda_columnindex,
101 | x->cuda_value,
102 | a->cuda_rowpointer,
103 | a->cuda_columnindex,
104 | a->cuda_value);
105 | cudaDeviceSynchronize();
106 | }
107 |
108 | void pangulu_gessm_fp64_cuda_v7(pangulu_smatrix *a,
109 | pangulu_smatrix *l,
110 | pangulu_smatrix *x)
111 | {
112 |
113 | pangulu_int64_t n = a->row;
114 | pangulu_int64_t nnzl = l->nnz;
115 | pangulu_gessm_cuda_kernel_v7(n,
116 | nnzl,
117 | l->cuda_rowpointer,
118 | l->cuda_columnindex,
119 | l->cuda_value,
120 | a->cuda_rowpointer,
121 | a->cuda_columnindex,
122 | x->cuda_value,
123 | a->cuda_rowpointer,
124 | a->cuda_columnindex,
125 | a->cuda_value);
126 | }
127 |
128 | void pangulu_gessm_fp64_cuda_v8(pangulu_smatrix *a,
129 | pangulu_smatrix *l,
130 | pangulu_smatrix *x)
131 | {
132 | pangulu_int64_t n = a->row;
133 | pangulu_int64_t nnzl = l->nnz;
134 | pangulu_int64_t nnza = a->nnz;
135 | /**********************************l****************************************/
136 | int *d_graphindegree = l->d_graphindegree;
137 | cudaMemcpy(d_graphindegree, l->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
138 | int *d_id_extractor = l->d_id_extractor;
139 | cudaMemset(d_id_extractor, 0, sizeof(int));
140 |
141 | calculate_type *d_left_sum = a->d_left_sum;
142 | cudaMemset(d_left_sum, 0, nnza * sizeof(calculate_type));
143 | /*****************************************************************************/
144 | pangulu_gessm_cuda_kernel_v8(n,
145 | nnzl,
146 | nnza,
147 | d_graphindegree,
148 | d_id_extractor,
149 | d_left_sum,
150 | l->cuda_rowpointer,
151 | l->cuda_columnindex,
152 | l->cuda_value,
153 | a->cuda_rowpointer,
154 | a->cuda_columnindex,
155 | x->cuda_value,
156 | a->cuda_rowpointer,
157 | a->cuda_columnindex,
158 | a->cuda_value);
159 | cudaDeviceSynchronize();
160 | }
161 |
162 | void pangulu_gessm_fp64_cuda_v10(pangulu_smatrix *a,
163 | pangulu_smatrix *l,
164 | pangulu_smatrix *x)
165 | {
166 |
167 | pangulu_int64_t n = a->row;
168 | pangulu_int64_t nnzl = l->nnz;
169 | pangulu_gessm_cuda_kernel_v10(n,
170 | nnzl,
171 | l->cuda_rowpointer,
172 | l->cuda_columnindex,
173 | l->cuda_value,
174 | a->cuda_rowpointer,
175 | a->cuda_columnindex,
176 | x->cuda_value,
177 | a->cuda_rowpointer,
178 | a->cuda_columnindex,
179 | a->cuda_value);
180 | }
181 |
182 | void pangulu_gessm_interface_g_v1(pangulu_smatrix *a,
183 | pangulu_smatrix *l,
184 | pangulu_smatrix *x)
185 | {
186 | pangulu_gessm_fp64_cuda_v7(a, l, x);
187 | pangulu_smatrix_cuda_memcpy_value_csc(a, x);
188 | }
189 | void pangulu_gessm_interface_g_v2(pangulu_smatrix *a,
190 | pangulu_smatrix *l,
191 | pangulu_smatrix *x)
192 | {
193 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
194 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
195 | pangulu_smatrix_cuda_memcpy_complete_csr(a, a);
196 |
197 | pangulu_gessm_fp64_cuda_v8(a, l, x);
198 |
199 | pangulu_smatrix_cuda_memcpy_value_csr(a, x);
200 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
201 | }
202 | void pangulu_gessm_interface_g_v3(pangulu_smatrix *a,
203 | pangulu_smatrix *l,
204 | pangulu_smatrix *x)
205 | {
206 | pangulu_gessm_fp64_cuda_v10(a, l, x);
207 | pangulu_smatrix_cuda_memcpy_value_csc(a, x);
208 | }
209 | #endif
--------------------------------------------------------------------------------
/src/pangulu_getrf_fp64_cuda.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | #ifdef GPU_OPEN
4 | void pangulu_getrf_fp64_cuda(pangulu_smatrix *a,
5 | pangulu_smatrix *l,
6 | pangulu_smatrix *u)
7 | {
8 |
9 | if (a->nnz > 1e4)
10 | {
11 | pangulu_getrf_cuda_dense_kernel(a->row,
12 | a->rowpointer[a->row],
13 | u->cuda_nnzu,
14 | a->cuda_rowpointer,
15 | a->cuda_columnindex,
16 | a->cuda_value,
17 | l->cuda_rowpointer,
18 | l->cuda_columnindex,
19 | l->cuda_value,
20 | u->cuda_rowpointer,
21 | u->cuda_columnindex,
22 | u->cuda_value);
23 | }
24 | else
25 | {
26 | pangulu_getrf_cuda_kernel(a->row,
27 | a->rowpointer[a->row],
28 | u->cuda_nnzu,
29 | a->cuda_rowpointer,
30 | a->cuda_columnindex,
31 | a->cuda_value,
32 | l->cuda_rowpointer,
33 | l->cuda_columnindex,
34 | l->cuda_value,
35 | u->cuda_rowpointer,
36 | u->cuda_columnindex,
37 | u->cuda_value);
38 | }
39 | }
40 |
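/* pangulu_getrf_fp64_cuda above switches kernels on block density: blocks
 * with more than 1e4 nonzeros are factorised by the dense-style CUDA
 * kernel, sparser blocks by the sparse kernel; the two interface_G_V*
 * wrappers below expose each kernel directly and copy l and u back to the
 * host afterwards. */
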
41 | void pangulu_getrf_interface_G_V1(pangulu_smatrix *a,
42 | pangulu_smatrix *l,
43 | pangulu_smatrix *u)
44 | {
45 | pangulu_getrf_cuda_kernel(a->row,
46 | a->rowpointer[a->row],
47 | u->cuda_nnzu,
48 | a->cuda_rowpointer,
49 | a->cuda_columnindex,
50 | a->cuda_value,
51 | l->cuda_rowpointer,
52 | l->cuda_columnindex,
53 | l->cuda_value,
54 | u->cuda_rowpointer,
55 | u->cuda_columnindex,
56 | u->cuda_value);
57 | pangulu_smatrix_cuda_memcpy_value_csc(l, l);
58 | pangulu_smatrix_cuda_memcpy_value_csc(u, u);
59 | }
60 | void pangulu_getrf_interface_G_V2(pangulu_smatrix *a,
61 | pangulu_smatrix *l,
62 | pangulu_smatrix *u)
63 | {
64 | pangulu_getrf_cuda_dense_kernel(a->row,
65 | a->rowpointer[a->row],
66 | u->cuda_nnzu,
67 | a->cuda_rowpointer,
68 | a->cuda_columnindex,
69 | a->cuda_value,
70 | l->cuda_rowpointer,
71 | l->cuda_columnindex,
72 | l->cuda_value,
73 | u->cuda_rowpointer,
74 | u->cuda_columnindex,
75 | u->cuda_value);
76 | pangulu_smatrix_cuda_memcpy_value_csc(l, l);
77 | pangulu_smatrix_cuda_memcpy_value_csc(u, u);
78 | }
79 |
80 | #endif
--------------------------------------------------------------------------------
/src/pangulu_heap.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 | void pangulu_init_heap_select(pangulu_int64_t select)
3 | {
4 | heap_select = select;
5 | }
6 |
7 | void pangulu_init_pangulu_heap(pangulu_heap *heap, pangulu_int64_t max_length)
8 | {
9 | compare_struct *compare_queue = (compare_struct *)pangulu_malloc(__FILE__, __LINE__, sizeof(compare_struct) * max_length);
10 | pangulu_int64_t *heap_queue = (pangulu_int64_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_int64_t) * max_length);
11 | heap->comapre_queue = compare_queue;
12 | heap->heap_queue = heap_queue;
13 | heap->max_length = max_length;
14 | heap->length = 0;
15 | heap->nnz_flag = 0;
16 | #ifdef OVERLAP
17 | heap->heap_bsem = NULL;
18 | #endif
19 | }
20 |
21 | pangulu_heap *pangulu_destory_pangulu_heap(pangulu_heap *heap)
22 | {
23 | if (heap != NULL)
24 | {
25 | pangulu_free(__FILE__, __LINE__, heap->comapre_queue);
26 | pangulu_free(__FILE__, __LINE__, heap->heap_queue);
27 | heap->length = 0;
28 | heap->nnz_flag = 0;
29 | heap->max_length = 0;
30 | }
31 | pangulu_free(__FILE__, __LINE__, heap);
32 | return NULL;
33 | }
34 |
35 | void pangulu_zero_pangulu_heap(pangulu_heap *heap)
36 | {
37 | heap->length = 0;
38 | heap->nnz_flag = 0;
39 | }
40 |
41 | pangulu_int64_t pangulu_compare(compare_struct *compare_queue, pangulu_int64_t a, pangulu_int64_t b)
42 | {
43 | if (0 == heap_select)
44 | {
45 | if (compare_queue[a].compare_flag == compare_queue[b].compare_flag)
46 | {
47 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag;
48 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag;
49 |
50 | return compare_flag_a < compare_flag_b;
51 | }
52 | else
53 | {
54 | return compare_queue[a].compare_flag < compare_queue[b].compare_flag;
55 | }
56 | }
57 | else if (1 == heap_select)
58 | {
59 | if (compare_queue[a].kernel_id == compare_queue[b].kernel_id)
60 | {
61 |
62 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag;
63 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag;
64 |
65 | if (compare_flag_a == compare_flag_b)
66 | {
67 | return compare_queue[a].compare_flag < compare_queue[b].compare_flag;
68 | }
69 | else
70 | {
71 | return compare_flag_a < compare_flag_b;
72 | }
73 | }
74 | else
75 | {
76 | return compare_queue[a].kernel_id < compare_queue[b].kernel_id;
77 | }
78 | }
79 | else if (2 == heap_select)
80 | {
81 | if (compare_queue[a].kernel_id == compare_queue[b].kernel_id)
82 | {
83 | if (compare_queue[a].compare_flag == compare_queue[b].compare_flag)
84 | {
85 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag;
86 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag;
87 | return compare_flag_a < compare_flag_b;
88 | }
89 | else
90 | {
91 | return compare_queue[a].compare_flag < compare_queue[b].compare_flag;
92 | }
93 | }
94 | else
95 | {
96 | return compare_queue[a].kernel_id < compare_queue[b].kernel_id;
97 | }
98 | }
99 | else if (3 == heap_select)
100 | {
101 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag;
102 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag;
103 | return compare_flag_a < compare_flag_b;
104 | }
105 |     else if (4 == heap_select) // policy 4: reversed order (largest compare_flag / anti-diagonal first)
106 | {
107 | if (compare_queue[a].compare_flag == compare_queue[b].compare_flag)
108 | {
109 | pangulu_int64_t compare_flag_a = compare_queue[a].row + compare_queue[a].col - compare_queue[a].compare_flag;
110 | pangulu_int64_t compare_flag_b = compare_queue[b].row + compare_queue[b].col - compare_queue[b].compare_flag;
111 |
112 | return compare_flag_a > compare_flag_b;
113 | }
114 | else
115 | {
116 | return compare_queue[a].compare_flag > compare_queue[b].compare_flag;
117 | }
118 | }
119 | else
120 | {
121 | printf(PANGULU_E_INVALID_HEAP_SELECT);
122 | pangulu_exit(1);
123 | }
124 | }
125 |
126 | void pangulu_swap(pangulu_int64_t *heap_queue, pangulu_int64_t a, pangulu_int64_t b)
127 | {
128 | pangulu_int64_t temp = heap_queue[a];
129 | heap_queue[a] = heap_queue[b];
130 | heap_queue[b] = temp;
131 | }
132 |
133 | void pangulu_heap_insert(pangulu_heap *heap, pangulu_int64_t row, pangulu_int64_t col, pangulu_int64_t task_level, pangulu_int64_t kernel_id, pangulu_int64_t compare_flag)
134 | {
135 |
136 | compare_struct *compare_queue = heap->comapre_queue;
137 | pangulu_int64_t *heap_queue = heap->heap_queue;
138 | pangulu_int64_t length = heap->length;
139 | pangulu_int64_t nnz_flag = heap->nnz_flag;
140 |
141 | if (rank == -1)
142 | {
143 | printf(PANGULU_I_TASK_INFO);
144 | }
145 |
146 | if ((nnz_flag) >= heap->max_length)
147 | {
148 | printf(PANGULU_E_HEAP_FULL);
149 | pangulu_exit(1);
150 | }
151 | compare_queue[nnz_flag].row = row;
152 | compare_queue[nnz_flag].col = col;
153 | compare_queue[nnz_flag].task_level = task_level;
154 | compare_queue[nnz_flag].kernel_id = kernel_id;
155 | compare_queue[nnz_flag].compare_flag = compare_flag;
156 | heap_queue[length] = nnz_flag;
157 | (heap->nnz_flag)++;
158 | pangulu_int64_t now = length;
159 | pangulu_int64_t before = (now - 1) / 2;
160 | while (now != 0 && before >= 0)
161 | {
162 | if (pangulu_compare(compare_queue, heap_queue[now], heap_queue[before]))
163 | {
164 | pangulu_swap(heap_queue, now, before);
165 | }
166 | else
167 | {
168 | break;
169 | }
170 | now = before;
171 | before = (now - 1) / 2;
172 | }
173 | heap->length = length + 1;
174 | }
175 |
176 | pangulu_int64_t heap_empty(pangulu_heap *heap)
177 | {
178 | return !(heap->length);
179 | }
180 |
181 | void pangulu_heap_adjust(pangulu_heap *heap, pangulu_int64_t top, pangulu_int64_t n)
182 | {
183 | compare_struct *compare_queue = heap->comapre_queue;
184 | pangulu_int64_t *heap_queue = heap->heap_queue;
185 | pangulu_int64_t left = top * 2 + 1;
186 |
187 | while (left < n)
188 | {
189 | if ((left + 1) < n && pangulu_compare(compare_queue, heap_queue[left + 1], heap_queue[left]))
190 | {
191 | left = left + 1;
192 | }
193 | if (pangulu_compare(compare_queue, heap_queue[left], heap_queue[top]))
194 | {
195 | pangulu_swap(heap_queue, left, top);
196 | top = left;
197 | left = 2 * top + 1;
198 | }
199 | else
200 | {
201 | break;
202 | }
203 | }
204 | }
205 |
206 | pangulu_int64_t pangulu_heap_delete(pangulu_heap *heap)
207 | {
208 | if (heap_empty(heap))
209 | {
210 | printf(PANGULU_E_HEAP_EMPTY);
211 | pangulu_exit(1);
212 | }
213 | pangulu_int64_t length = heap->length;
214 | pangulu_int64_t *heap_queue = heap->heap_queue;
215 | pangulu_swap(heap_queue, length - 1, 0);
216 | pangulu_heap_adjust(heap, 0, length - 1);
217 | heap->length = length - 1;
218 | return heap_queue[length - 1];
219 | }
220 |
221 | void pangulu_display_heap(pangulu_heap *heap)
222 | {
223 | printf(PANGULU_I_HEAP_LEN);
224 | for (pangulu_int64_t i = 0; i < heap->length; i++)
225 | {
226 | printf(FMT_PANGULU_INT64_T " ", heap->heap_queue[i]);
227 | }
228 | printf("\n");
229 | for (pangulu_int64_t i = 0; i < heap->length; i++)
230 | {
231 | pangulu_int64_t now = heap->heap_queue[i];
232 | printf("row is " FMT_PANGULU_EXBLOCK_IDX
233 | " col is " FMT_PANGULU_EXBLOCK_IDX
234 | " level is " FMT_PANGULU_EXBLOCK_IDX
235 | " compare_flag is " FMT_PANGULU_INT64_T
236 | " do the kernel " FMT_PANGULU_INT16_T "\n",
237 | heap->comapre_queue[now].row,
238 | heap->comapre_queue[now].col,
239 | heap->comapre_queue[now].task_level,
240 | heap->comapre_queue[now].compare_flag,
241 | heap->comapre_queue[now].kernel_id);
242 | }
243 | }
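
A minimal usage sketch of this task heap, assuming the `pangulu_heap`, `compare_struct` and `heap_select` declarations from `pangulu_common.h`: `heap_queue` holds indices into `comapre_queue`, and `pangulu_compare` orders them according to the selected policy. `heap_usage_sketch` is a hypothetical name, not part of PanguLU.

```c
#include "pangulu_common.h"

/* Sketch only: exercise init, insert and delete on a stack-allocated heap. */
void heap_usage_sketch(void)
{
    pangulu_heap heap;
    pangulu_init_heap_select(2);          /* policy 2: kernel_id, then compare_flag */
    pangulu_init_pangulu_heap(&heap, 16); /* room for 16 queued tasks */

    /* arguments: row, col, task_level, kernel_id, compare_flag */
    pangulu_heap_insert(&heap, 0, 0, 0, 1, 0);
    pangulu_heap_insert(&heap, 1, 0, 0, 2, 1);

    while (!heap_empty(&heap))
    {
        pangulu_int64_t task = pangulu_heap_delete(&heap); /* index into comapre_queue */
        compare_struct *cs = &heap.comapre_queue[task];
        printf("kernel %d on block (%lld, %lld)\n", (int)cs->kernel_id,
               (long long)cs->row, (long long)cs->col);
    }

    /* pangulu_destory_pangulu_heap() also frees its argument, so a
       stack-allocated heap releases only its internal queues. */
    pangulu_free(__FILE__, __LINE__, heap.comapre_queue);
    pangulu_free(__FILE__, __LINE__, heap.heap_queue);
}
```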
--------------------------------------------------------------------------------
/src/pangulu_kernel_interface.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | void pangulu_getrf_interface(pangulu_smatrix *a, pangulu_smatrix *l, pangulu_smatrix *u,
4 | pangulu_smatrix *calculate_L, pangulu_smatrix *calculate_U)
5 | {
6 |     for(pangulu_int64_t i=0;i<u->nnz;i++){
7 | pangulu_int64_t now_row=u->rowindex[i];
8 | calculate_time+=(l->columnpointer[now_row+1]-l->columnpointer[now_row]);
9 | }
10 | #ifdef CHECK_TIME
11 | struct timeval GET_TIME_START;
12 | pangulu_time_check_begin(&GET_TIME_START);
13 | #endif
14 |
15 | #ifdef GPU_OPEN
16 |
17 | #ifdef ADD_GPU_MEMORY
18 |
19 | #ifdef ADAPTIVE_KERNEL_SELECTION
20 | int nnzA = a->nnz;
21 | if (nnzA < 6309)
22 | { // 6309≈1e3.8
23 | pangulu_getrf_interface_C_V1(a, l, u);
24 | }
25 | else if (nnzA < 1e4)
26 | {
27 | pangulu_getrf_interface_G_V1(a, l, u);
28 | }
29 | else
30 | {
31 | pangulu_getrf_interface_G_V2(a, l, u);
32 | }
33 | #else // ADAPTIVE_KERNEL_SELECTION
34 | pangulu_getrf_interface_G_V1(a, l, u);
35 | #endif // ADAPTIVE_KERNEL_SELECTION
36 | cudaDeviceSynchronize();
37 |
38 | #else // ADD_GPU_MEMORY
39 | pangulu_smatrix_cuda_memcpy_struct_csc(calculate_L, l);
40 | pangulu_smatrix_cuda_memcpy_struct_csc(calculate_U, u);
41 | pangulu_smatrix_cuda_memcpy_nnzu(calculate_U, u);
42 | pangulu_getrf_fp64_cuda(a, calculate_L, calculate_U);
43 | pangulu_smatrix_cuda_memcpy_value_csc(l, calculate_L);
44 | pangulu_smatrix_cuda_memcpy_value_csc(u, calculate_U);
45 |
46 | #endif // ADD_GPU_MEMORY
47 | #else // GPU_OPEN
48 |
49 | pangulu_getrf_fp64(a, l, u);
50 |
51 | #endif // GPU_OPEN
52 |
53 | #ifdef CHECK_TIME
54 | time_getrf += pangulu_time_check_end(&GET_TIME_START);
55 | #endif
56 | }
57 |
58 | void pangulu_tstrf_interface(pangulu_smatrix *a, pangulu_smatrix *save_X, pangulu_smatrix *u,
59 | pangulu_smatrix *calculate_X, pangulu_smatrix *calculate_U)
60 | {
61 |     // for(int_t i=0;i<a->nnz;i++){
62 | // int_t now_col=a->columnindex[i];
63 | // calculate_time+=(u->rowpointer[now_col+1]-u->rowpointer[now_col]);
64 | // }
65 | #ifdef CHECK_TIME
66 | struct timeval GET_TIME_START;
67 | pangulu_time_check_begin(&GET_TIME_START);
68 | #endif
69 |
70 | #ifdef GPU_OPEN
71 |
72 | #ifndef GPU_TSTRF
73 |
74 | #ifndef CPU_OPTION
75 | pangulu_smatrix_cuda_memcpy_value_csc_cal_length(calculate_X, a);
76 |
77 | pangulu_tstrf_interface_cpu(a, calculate_X, u);
78 | #else // CPU_OPTION
79 |
80 | pangulu_int64_t cpu_choice2 = a->nnz;
81 | calculate_type cpu_choice3 = cpu_choice2 / ((calculate_type)nrecord * (calculate_type)cpu_choice1);
82 | pangulu_int64_t TSTRF_choice_cpu = Select_Function_CPU(cpu_choice1, cpu_choice3, nrecord);
83 | pangulu_tstrf_kernel_choice_cpu(a, calculate_X, u, TSTRF_choice_cpu);
84 | #endif // CPU_OPTION
85 |
86 | #else // GPU_TSTRF
87 |
88 | #ifdef ADD_GPU_MEMORY
89 | #ifdef ADAPTIVE_KERNEL_SELECTION
90 | pangulu_int64_t nnzB = a->nnz;
91 | if (nnzB < 6309)
92 | {
93 | // 6309≈1e3.8
94 | if (nnzB < 3981)
95 | { // 3981≈1e3.6
96 | pangulu_tstrf_interface_C_V1(a, calculate_X, u);
97 | }
98 | else
99 | {
100 | pangulu_tstrf_interface_C_V2(a, calculate_X, u);
101 | }
102 | }
103 | else
104 | {
105 | if (nnzB < 1e4)
106 | {
107 | pangulu_tstrf_interface_G_V2(a, calculate_X, u);
108 | }
109 | else if (nnzB < 19952)
110 | { // 19952≈1e4.3
111 | pangulu_tstrf_interface_G_V1(a, calculate_X, u);
112 | }
113 | else
114 | {
115 | pangulu_tstrf_interface_G_V3(a, calculate_X, u);
116 | }
117 | }
118 | #else // ADAPTIVE_KERNEL_SELECTION
119 | pangulu_tstrf_interface_G_V1(a, calculate_X, u);
120 | #endif // ADAPTIVE_KERNEL_SELECTION
121 | cudaDeviceSynchronize();
122 |
123 | #else // ADD_GPU_MEMORY
124 |
125 | pangulu_smatrix_cuda_memcpy_complete_csr(calculate_U, u);
126 | pangulu_tstrf_interface(a, calculate_X, calculate_U);
127 | pangulu_smatrix_cuda_memcpy_value_csc(a, calculate_X);
128 | #endif // ADD_GPU_MEMORY
129 |
130 | #endif // GPU_TSTRF
131 |
132 | #else // GPU_OPEN
133 |
134 | // csc
135 | tstrf_csc_csc(a->row, u->columnpointer, u->rowindex, u->value_csc, a->columnpointer, a->rowindex, a->value_csc);
136 | pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
137 |
138 | // // csr
139 | // pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
140 | // pangulu_pangulu_smatrix_memcpy_value_csc_copy_length(calculate_X, a);
141 | // pangulu_tstrf_fp64_CPU_6(a, calculate_X, u);
142 | // pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
143 | // pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
144 |
145 | #endif // GPU_OPEN
146 |
147 | #ifdef CHECK_TIME
148 | time_tstrf += pangulu_time_check_end(&GET_TIME_START);
149 | #endif
150 | }
151 |
152 | void pangulu_gessm_interface(pangulu_smatrix *a, pangulu_smatrix *save_X, pangulu_smatrix *l,
153 | pangulu_smatrix *calculate_X, pangulu_smatrix *calculate_L)
154 | {
155 |     for(pangulu_int64_t i=0;i<a->nnz;i++){
156 | pangulu_int64_t now_row=a->rowindex[i];
157 | calculate_time+=(l->columnpointer[now_row+1]-l->columnpointer[now_row]);
158 | }
159 | #ifdef CHECK_TIME
160 | struct timeval GET_TIME_START;
161 | pangulu_time_check_begin(&GET_TIME_START);
162 | #endif
163 |
164 | #ifdef GPU_OPEN
165 |
166 | #ifndef GPU_GESSM
167 |
168 | #ifndef CPU_OPTION
169 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
170 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
171 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(calculate_X, a);
172 | pangulu_transpose_pangulu_smatrix_csc_to_csr(l);
173 | pangulu_gessm_interface_cpu(a, l, calculate_X);
174 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
175 | #else
176 |
177 |     /*******************Choose the best performance*************************/
178 | pangulu_int64_t cpu_choice2 = a->nnz;
179 | calculate_type cpu_choice3 = cpu_choice2 / ((calculate_type)nrecord * (calculate_type)cpu_choice1);
180 | pangulu_int64_t GESSM_choice_cpu = Select_Function_CPU(cpu_choice1, cpu_choice3, nrecord);
181 | pangulu_gessm_kernel_choice_cpu(a, l, calculate_X, GESSM_choice_cpu);
182 | #endif
183 | #else
184 |
185 | #ifdef ADD_GPU_MEMORY
186 | #ifdef ADAPTIVE_KERNEL_SELECTION
187 | int nnzL = l->nnz;
188 | if (nnzL < 7943)
189 | {
190 | // 7943≈1e3.9
191 | if (nnzL < 3981)
192 | { // 3981≈1e3.6
193 | pangulu_gessm_interface_C_V1(a, l, calculate_X);
194 | }
195 | else
196 | {
197 | pangulu_gessm_interface_C_V2(a, l, calculate_X);
198 | }
199 | }
200 | else
201 | {
202 | if (nnzL < 12589)
203 | {
204 | // 12589≈1e4.1
205 | pangulu_gessm_interface_G_V2(a, l, calculate_X);
206 | }
207 | else if (nnzL < 19952)
208 | { // 19952≈1e4.3
209 | pangulu_gessm_interface_g_v1(a, l, calculate_X);
210 | }
211 | else
212 | {
213 | pangulu_gessm_interface_G_V3(a, l, calculate_X);
214 | }
215 | }
216 | #else
217 | pangulu_gessm_interface_g_v1(a, l, calculate_X);
218 | #endif
219 | cudaDeviceSynchronize();
220 |
221 | #else
222 |
223 | pangulu_smatrix_cuda_memcpy_complete_csc(calculate_L, l);
224 | pangulu_gessm_interface(a, calculate_L, calculate_X);
225 | #endif
226 |
227 | #endif
228 |
229 | #else
230 | pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(calculate_X, a);
231 | pangulu_gessm_fp64_cpu_6(a, l, calculate_X);
232 | pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
233 | // pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
234 | // pangulu_pangulu_smatrix_memcpy_value_csr_copy_length(calculate_X, a);
235 | // pangulu_gessm_interface_CPU_csr(a, l, calculate_X);
236 | // pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
237 | // pangulu_pangulu_smatrix_memcpy_columnpointer_csc(save_X, a);
238 | #endif
239 |
240 | #ifdef CHECK_TIME
241 | time_gessm += pangulu_time_check_end(&GET_TIME_START);
242 | #endif
243 | }
244 |
245 | void pangulu_ssssm_interface(pangulu_smatrix *a, pangulu_smatrix *l, pangulu_smatrix *u,
246 | pangulu_smatrix *calculate_L, pangulu_smatrix *calculate_U)
247 | {
248 |     for(pangulu_int64_t i=0;i<u->nnz;i++){
249 | pangulu_int64_t now_row=u->rowindex[i];
250 | calculate_time+=(l->columnpointer[now_row+1]-l->columnpointer[now_row]);
251 | }
252 | #ifdef CHECK_TIME
253 | struct timeval GET_TIME_START;
254 | pangulu_time_check_begin(&GET_TIME_START);
255 | #endif
256 |
257 | #ifdef GPU_OPEN
258 |
259 | #ifndef ADD_GPU_MEMORY
260 | pangulu_smatrix_cuda_memcpy_complete_csc(calculate_L, l);
261 | pangulu_smatrix_cuda_memcpy_complete_csc(calculate_U, u);
262 | pangulu_ssssm_fp64_cuda(a, calculate_L, calculate_U);
263 | #else
264 |
265 | #ifdef ADAPTIVE_KERNEL_SELECTION
266 | long long flops = 0;
267 | int n = a->row;
268 | for (int i = 0; i < n; i++)
269 | {
270 | for (int j = u->columnpointer[i]; j < u->columnpointer[i + 1]; j++)
271 | {
272 | int col_L = u->rowindex[j];
273 | flops += l->columnpointer[col_L + 1] - l->columnpointer[col_L];
274 | }
275 | }
276 |     if (flops < 1e7) // small update blocks take the CPU kernels, large ones the GPU kernels
277 |     {
278 |         if (flops < 63095)
279 |         {
280 |             // 63095≈1e4.8
281 |             pangulu_ssssm_interface_C_V1(a, l, u);
282 |         }
283 |         else
284 |         {
285 |             pangulu_ssssm_interface_C_V2(a, l, u);
286 |         }
287 |     }
288 |     else
289 |     {
290 |         if (flops < 3981071705)
291 |         {
292 |             // 3981071705≈1e9.6
293 |             pangulu_ssssm_interface_G_V2(a, l, u);
294 |         }
295 |         else
296 |         {
297 |             pangulu_ssssm_interface_G_V1(a, l, u);
298 |         }
299 |     }
300 | #else
301 | pangulu_ssssm_interface_G_V1(a, l, u);
302 | #endif
303 | cudaDeviceSynchronize();
304 | #endif
305 | #else
306 |
307 | pangulu_ssssm_fp64(a, l, u);
308 | #endif
309 |
310 | #ifdef CHECK_TIME
311 | time_ssssm += pangulu_time_check_end(&GET_TIME_START);
312 | #endif
313 | }
314 |
315 | #ifdef GPU_OPEN
316 |
317 | void pangulu_addmatrix_interface(pangulu_smatrix *a,
318 | pangulu_smatrix *b)
319 | {
320 | pangulu_add_pangulu_smatrix_cuda(a, b);
321 | }
322 |
323 | #endif
324 |
325 | void pangulu_addmatrix_interface_cpu(pangulu_smatrix *a,
326 | pangulu_smatrix *b)
327 | {
328 | pangulu_add_pangulu_smatrix_cpu(a, b);
329 | }
330 |
331 | void pangulu_spmv(pangulu_smatrix *s, pangulu_vector *z, pangulu_vector *answer, int vector_number)
332 | {
333 | pangulu_spmv_cpu_xishu_csc(s, z, answer, vector_number);
334 | }
335 |
336 | void pangulu_sptrsv(pangulu_smatrix *s, pangulu_vector *answer, pangulu_vector *z, int vector_number, int32_t tag)
337 | {
338 | pangulu_sptrsv_cpu_xishu_csc(s, answer, z, vector_number, tag);
339 | }
340 |
341 | void pangulu_vector_add(pangulu_vector *answer, pangulu_vector *z)
342 | {
343 | pangulu_vector_add_cpu(answer, z);
344 | }
345 |
346 | void pangulu_vector_sub(pangulu_vector *answer, pangulu_vector *z)
347 | {
348 | pangulu_vector_sub_cpu(answer, z);
349 | }
350 |
351 | void pangulu_vector_copy(pangulu_vector *answer, pangulu_vector *z)
352 | {
353 | pangulu_vector_copy_cpu(answer, z);
354 | }
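
The interfaces above pick a kernel variant from a per-block work estimate: nnz thresholds for GETRF/TSTRF/GESSM, and a flop count for SSSSM (the `*_C_V*` variants run on the CPU for small blocks, the `*_G_V*` variants on the GPU for large ones). Below is a hedged restatement of the SSSSM estimate as a standalone helper; `ssssm_flops_estimate` is a hypothetical name, not part of PanguLU.

```c
#include "pangulu_common.h"

/* Hypothetical helper mirroring the SSSSM work estimate used above: each
   entry U(k,i) triggers an update costing nnz(L(:,k)) multiply-adds. */
long long ssssm_flops_estimate(pangulu_smatrix *l, pangulu_smatrix *u)
{
    long long flops = 0;
    int n = u->row; /* blocks are square, so this matches a->row above */
    for (int i = 0; i < n; i++)
    {
        for (pangulu_int64_t j = u->columnpointer[i]; j < u->columnpointer[i + 1]; j++)
        {
            int col_l = u->rowindex[j];
            flops += l->columnpointer[col_l + 1] - l->columnpointer[col_l];
        }
    }
    return flops;
}
```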
--------------------------------------------------------------------------------
/src/pangulu_mpi.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | int have_msg;
4 | void pangulu_probe_message(MPI_Status *status) // block until any message arrives, sleeping briefly between probes
5 | {
6 | have_msg=0;
7 | do{
8 | MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &have_msg, status);
9 | if(have_msg){
10 | return;
11 | }
12 | usleep(10);
13 | }while(!have_msg);
14 | }
15 |
16 | pangulu_int64_t pangulu_bcast_n(pangulu_int64_t n, pangulu_int64_t send_rank)
17 | {
18 | MPI_Bcast(&n, 1, MPI_PANGULU_INT64_T, send_rank, MPI_COMM_WORLD);
19 | return n;
20 | }
21 |
22 | void pangulu_bcast_vector(pangulu_inblock_ptr *vector, pangulu_int32_t length, pangulu_int64_t send_rank)
23 | {
24 | pangulu_int64_t everry_length = 100000000;
25 | for (pangulu_int64_t i = 0; i < length; i += everry_length)
26 | {
27 | if ((i + everry_length) > length)
28 | {
29 | MPI_Bcast(vector + i, length - i, MPI_PANGULU_INBLOCK_PTR, send_rank, MPI_COMM_WORLD);
30 | }
31 | else
32 | {
33 | MPI_Bcast(vector + i, everry_length, MPI_PANGULU_INBLOCK_PTR, send_rank, MPI_COMM_WORLD);
34 | }
35 | }
36 | }
37 | void pangulu_bcast_vector_int64(pangulu_int64_t *vector, pangulu_int32_t length, pangulu_int64_t send_rank)
38 | {
39 | pangulu_int64_t everry_length = 100000000;
40 | for (pangulu_int64_t i = 0; i < length; i += everry_length)
41 | {
42 | if ((i + everry_length) > length)
43 | {
44 | MPI_Bcast(vector + i, length - i, MPI_PANGULU_INT64_T, send_rank, MPI_COMM_WORLD);
45 | }
46 | else
47 | {
48 | MPI_Bcast(vector + i, everry_length, MPI_PANGULU_INT64_T, send_rank, MPI_COMM_WORLD);
49 | }
50 | }
51 | }
52 | void pangulu_mpi_waitall(MPI_Request *Request, int num)
53 | {
54 | MPI_Status Status;
55 | for(int i = 0; i < num; i++)
56 | {
57 | MPI_Wait(&Request[i], &Status);
58 | }
59 | }
60 | void pangulu_isend_vector_char_wait(char *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal, MPI_Request* req)
61 | {
62 | MPI_Isend(a, n, MPI_CHAR, send_id, signal, MPI_COMM_WORLD, req);
63 | }
64 |
65 | void pangulu_send_vector_int(pangulu_int64_t *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal)
66 | {
67 | MPI_Send(a, n, MPI_PANGULU_INT64_T, send_id, signal, MPI_COMM_WORLD);
68 | }
69 |
70 | void pangulu_recv_vector_int(pangulu_int64_t *a, pangulu_int64_t n, pangulu_int64_t receive_id, int signal)
71 | {
72 | MPI_Status status;
73 | for (pangulu_int64_t i = 0; i < n; i++)
74 | {
75 | a[i] = 0;
76 | }
77 | MPI_Recv(a, n, MPI_PANGULU_INT64_T, receive_id, signal, MPI_COMM_WORLD, &status);
78 | }
79 |
80 | void pangulu_send_vector_char(char *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal)
81 | {
82 | MPI_Send(a, n, MPI_CHAR, send_id, signal, MPI_COMM_WORLD);
83 | }
84 |
85 | void pangulu_recv_vector_char(char *a, pangulu_int64_t n, pangulu_int64_t receive_id, int signal)
86 | {
87 | MPI_Status status;
88 | for (pangulu_int64_t i = 0; i < n; i++)
89 | {
90 | a[i] = 0;
91 | }
92 | pangulu_probe_message(&status);
93 | MPI_Recv(a, n, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &status);
94 | }
95 |
96 | void pangulu_send_vector_value(calculate_type *a, pangulu_int64_t n, pangulu_int64_t send_id, int signal)
97 | {
98 | MPI_Send(a, n, MPI_VAL_TYPE, send_id, signal, MPI_COMM_WORLD);
99 | }
100 |
101 | void pangulu_recv_vector_value(calculate_type *a, pangulu_int64_t n, pangulu_int64_t receive_id, int signal)
102 | {
103 | MPI_Status status;
104 | for (pangulu_int64_t i = 0; i < n; i++)
105 | {
106 | a[i] = 0.0;
107 | }
108 | MPI_Recv(a, n, MPI_VAL_TYPE, receive_id, signal, MPI_COMM_WORLD, &status);
109 | }
110 |
111 | void pangulu_send_pangulu_smatrix_value_csr(pangulu_smatrix *s,
112 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
113 | {
114 |
115 | MPI_Send(s->value, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD);
116 | }
117 | void pangulu_send_pangulu_smatrix_struct_csr(pangulu_smatrix *s,
118 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
119 | {
120 |
121 | MPI_Send(s->rowpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD);
122 | MPI_Send(s->columnindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD);
123 | }
124 |
125 | void pangulu_send_pangulu_smatrix_complete_csr(pangulu_smatrix *s,
126 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
127 | {
128 | pangulu_send_pangulu_smatrix_struct_csr(s, send_id, signal * 3, nb);
129 | pangulu_send_pangulu_smatrix_value_csr(s, send_id, signal * 3, nb);
130 | }
131 |
132 | void pangulu_recv_pangulu_smatrix_struct_csr(pangulu_smatrix *s,
133 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
134 | {
135 |
136 | MPI_Status status;
137 | for (pangulu_int64_t i = 0; i < (s->row + 1); i++)
138 | {
139 | s->rowpointer[i] = 0;
140 | }
141 |
142 | MPI_Recv(s->rowpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, receive_id, signal, MPI_COMM_WORLD, &status);
143 | s->nnz = s->rowpointer[s->row];
144 | for (pangulu_int64_t i = 0; i < s->nnz; i++)
145 | {
146 | s->columnindex[i] = 0;
147 | }
148 | MPI_Recv(s->columnindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, receive_id, signal + 1, MPI_COMM_WORLD, &status);
149 | }
150 | void pangulu_recv_pangulu_smatrix_value_csr(pangulu_smatrix *s,
151 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
152 | {
153 | MPI_Status status;
154 | for (pangulu_int64_t i = 0; i < s->nnz; i++)
155 | {
156 | s->value[i] = (calculate_type)0.0;
157 | }
158 | MPI_Recv(s->value, s->nnz, MPI_VAL_TYPE, receive_id, signal + 2, MPI_COMM_WORLD, &status);
159 | }
160 |
161 | void pangulu_recv_pangulu_smatrix_value_csr_in_signal(pangulu_smatrix *s,
162 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
163 | {
164 | MPI_Status status;
165 | for (pangulu_int64_t i = 0; i < s->nnz; i++)
166 | {
167 | s->value[i] = (calculate_type)0.0;
168 | }
169 | MPI_Recv(s->value, s->nnz, MPI_VAL_TYPE, receive_id, signal, MPI_COMM_WORLD, &status);
170 | }
171 |
172 | void pangulu_recv_pangulu_smatrix_complete_csr(pangulu_smatrix *s,
173 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
174 | {
175 |
176 | pangulu_recv_pangulu_smatrix_struct_csr(s, receive_id, signal * 3, nb);
177 | pangulu_recv_pangulu_smatrix_value_csr(s, receive_id, signal * 3, nb);
178 | }
179 |
180 | void pangulu_recv_whole_pangulu_smatrix_csr(pangulu_smatrix *s,
181 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nnz, pangulu_int64_t nb)
182 | {
183 | #ifdef CHECK_TIME
184 | struct timeval GET_TIME_START;
185 | pangulu_time_check_begin(&GET_TIME_START);
186 | #endif
187 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz;
188 | MPI_Status status;
189 | char *now_vector = (char *)(s->rowpointer);
190 | for (pangulu_int64_t i = 0; i < length; i++)
191 | {
192 | now_vector[i] = 0;
193 | }
194 | s->columnindex = (pangulu_inblock_idx *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1));
195 | s->value = (calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz);
196 | MPI_Recv(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &status);
197 | s->nnz = nnz;
198 | #ifdef CHECK_TIME
199 | time_receive += pangulu_time_check_end(&GET_TIME_START);
200 | #endif
201 | }
202 |
203 | void pangulu_send_pangulu_smatrix_value_csc(pangulu_smatrix *s,
204 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
205 | {
206 | MPI_Send(s->value_csc, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD);
207 | }
208 |
209 | void pangulu_send_pangulu_smatrix_struct_csc(pangulu_smatrix *s,
210 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
211 | {
212 |
213 | MPI_Send(s->columnpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD);
214 | MPI_Send(s->rowindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD);
215 | }
216 |
217 | void pangulu_send_pangulu_smatrix_complete_csc(pangulu_smatrix *s,
218 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
219 | {
220 | pangulu_send_pangulu_smatrix_struct_csc(s, send_id, signal * 3, nb);
221 | pangulu_send_pangulu_smatrix_value_csc(s, send_id, signal * 3, nb);
222 | }
223 |
224 | void pangulu_recv_pangulu_smatrix_struct_csc(pangulu_smatrix *s,
225 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
226 | {
227 |
228 | MPI_Status status;
229 | for (pangulu_int64_t i = 0; i < (s->row + 1); i++)
230 | {
231 | s->columnpointer[i] = 0;
232 | }
233 |
234 | MPI_Recv(s->columnpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, receive_id, signal, MPI_COMM_WORLD, &status);
235 | s->nnz = s->columnpointer[s->row];
236 | for (pangulu_int64_t i = 0; i < s->nnz; i++)
237 | {
238 | s->rowindex[i] = 0;
239 | }
240 | MPI_Recv(s->rowindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, receive_id, signal + 1, MPI_COMM_WORLD, &status);
241 | }
242 | void pangulu_recv_pangulu_smatrix_value_csc(pangulu_smatrix *s,
243 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
244 | {
245 |
246 | MPI_Status status;
247 | for (pangulu_int64_t i = 0; i < s->nnz; i++)
248 | {
249 | s->value_csc[i] = (calculate_type)0.0;
250 | }
251 |
252 | MPI_Recv(s->value_csc, s->nnz, MPI_VAL_TYPE, receive_id, signal + 2, MPI_COMM_WORLD, &status);
253 | }
254 |
255 | void pangulu_recv_pangulu_smatrix_value_csc_in_signal(pangulu_smatrix *s,
256 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
257 | {
258 | MPI_Status status;
259 | for (pangulu_int64_t i = 0; i < s->nnz; i++)
260 | {
261 | s->value_csc[i] = (calculate_type)0.0;
262 | }
263 | MPI_Recv(s->value_csc, s->nnz, MPI_VAL_TYPE, receive_id, signal, MPI_COMM_WORLD, &status);
264 | }
265 |
266 | void pangulu_recv_pangulu_smatrix_complete_csc(pangulu_smatrix *s,
267 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
268 | {
269 |
270 | pangulu_recv_pangulu_smatrix_struct_csc(s, receive_id, signal * 3, nb);
271 | pangulu_recv_pangulu_smatrix_value_csc(s, receive_id, signal * 3, nb);
272 | }
273 |
274 | void pangulu_recv_whole_pangulu_smatrix_csc(pangulu_smatrix *s,
275 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nnz, pangulu_int64_t nb)
276 | {
277 | #ifdef CHECK_TIME
278 | struct timeval GET_TIME_START;
279 | pangulu_time_check_begin(&GET_TIME_START);
280 | #endif
281 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz;
282 | MPI_Status status;
283 | char *now_vector = (char *)(s->columnpointer);
284 | for (pangulu_int64_t i = 0; i < length; i++)
285 | {
286 | now_vector[i] = 0;
287 | }
288 | s->rowindex = (pangulu_inblock_idx *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1));
289 | s->value_csc = (calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz);
290 | MPI_Recv(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &status);
291 | s->nnz = nnz;
292 | #ifdef CHECK_TIME
293 | time_receive += pangulu_time_check_end(&GET_TIME_START);
294 | #endif
295 | }
296 |
297 | int pangulu_iprobe_message(MPI_Status *status)
298 | {
299 | int flag;
300 | MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, status);
301 | return flag;
302 | }
303 |
304 | void pangulu_isend_pangulu_smatrix_value_csr(pangulu_smatrix *s,
305 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
306 | {
307 |
308 | MPI_Request req;
309 | MPI_Isend(s->value, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD, &req);
310 | }
311 | void pangulu_isend_pangulu_smatrix_struct_csr(pangulu_smatrix *s,
312 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
313 | {
314 | MPI_Request req;
315 | MPI_Isend(s->rowpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD, &req);
316 | MPI_Isend(s->columnindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD, &req);
317 | }
318 |
319 | void pangulu_isend_pangulu_smatrix_complete_csr(pangulu_smatrix *s,
320 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
321 | {
322 |
323 | pangulu_isend_pangulu_smatrix_struct_csr(s, send_id, signal * 3, nb);
324 | pangulu_isend_pangulu_smatrix_value_csr(s, send_id, signal * 3, nb);
325 | }
326 |
327 | void pangulu_isend_whole_pangulu_smatrix_csr(pangulu_smatrix *s,
328 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
329 | {
330 | pangulu_int64_t nnz = s->nnz;
331 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz;
332 | MPI_Request req;
333 | char *now_vector = (char *)(s->rowpointer);
334 | calculate_type *value = (calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz);
335 | if (value != s->value)
336 | {
337 | printf(PANGULU_E_ISEND_CSR);
338 | pangulu_exit(1);
339 | }
340 | MPI_Isend(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &req);
341 | }
342 |
343 | void pangulu_isend_pangulu_smatrix_value_csc(pangulu_smatrix *s,
344 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
345 | {
346 | MPI_Request req;
347 | MPI_Isend(s->value_csc, s->nnz, MPI_VAL_TYPE, send_id, signal + 2, MPI_COMM_WORLD, &req);
348 | }
349 |
350 | void pangulu_isend_pangulu_smatrix_value_csc_in_signal(pangulu_smatrix *s,
351 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
352 | {
353 | MPI_Request req;
354 | MPI_Isend(s->value_csc, s->nnz, MPI_VAL_TYPE, send_id, signal, MPI_COMM_WORLD, &req);
355 | }
356 |
357 | void pangulu_isend_pangulu_smatrix_struct_csc(pangulu_smatrix *s,
358 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
359 | {
360 | MPI_Request req;
361 | MPI_Isend(s->columnpointer, s->row + 1, MPI_PANGULU_INBLOCK_PTR, send_id, signal, MPI_COMM_WORLD, &req);
362 | MPI_Isend(s->rowindex, s->nnz, MPI_PANGULU_INBLOCK_IDX, send_id, signal + 1, MPI_COMM_WORLD, &req);
363 | }
364 |
365 | void pangulu_isend_pangulu_smatrix_complete_csc(pangulu_smatrix *s,
366 | pangulu_int64_t send_id, int signal, pangulu_int64_t nb)
367 | {
368 |
369 | pangulu_isend_pangulu_smatrix_struct_csc(s, send_id, signal * 3, nb);
370 | pangulu_isend_pangulu_smatrix_value_csc(s, send_id, signal * 3, nb);
371 | }
372 |
373 | void pangulu_isend_whole_pangulu_smatrix_csc(pangulu_smatrix *s,
374 | pangulu_int64_t receive_id, int signal, pangulu_int64_t nb)
375 | {
376 | pangulu_int64_t nnz = s->nnz;
377 | pangulu_int64_t length = sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz + sizeof(calculate_type) * nnz;
378 | MPI_Request req;
379 | char *now_vector = (char *)(s->columnpointer);
380 | calculate_type *value = (calculate_type *)(now_vector + sizeof(pangulu_inblock_ptr) * (nb + 1) + sizeof(pangulu_inblock_idx) * nnz);
381 | if (value != s->value_csc)
382 | {
383 | printf(PANGULU_E_ISEND_CSC);
384 | pangulu_exit(1);
385 | }
386 | MPI_Isend(now_vector, length, MPI_CHAR, receive_id, signal, MPI_COMM_WORLD, &req);
387 | }
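
`pangulu_bcast_vector` and `pangulu_bcast_vector_int64` split long broadcasts into chunks of at most 100000000 elements, keeping each `MPI_Bcast` count safely within `int` range. A sketch of the same pattern for a raw byte buffer; `chunked_bcast_char` is a hypothetical name, not part of PanguLU.

```c
#include "pangulu_common.h"

/* Hypothetical wrapper showing the chunked-broadcast pattern used by
   pangulu_bcast_vector, applied to a raw char buffer. */
void chunked_bcast_char(char *buf, pangulu_int64_t length, int root)
{
    const pangulu_int64_t chunk = 100000000; /* bound used by pangulu_bcast_vector */
    for (pangulu_int64_t i = 0; i < length; i += chunk)
    {
        int count = (int)(((i + chunk) > length) ? (length - i) : chunk);
        MPI_Bcast(buf + i, count, MPI_CHAR, root, MPI_COMM_WORLD);
    }
}
```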
--------------------------------------------------------------------------------
/src/pangulu_spmv_fp64.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | void pangulu_spmv_cpu_choumi(pangulu_smatrix *s, pangulu_vector *x, pangulu_vector *b)
4 | {
5 | calculate_type *value = s->value;
6 | calculate_type *bval = b->value;
7 | calculate_type *xval = x->value;
8 | pangulu_int64_t n = s->column;
9 | pangulu_int64_t m = s->row;
10 | for (pangulu_int64_t i = 0; i < m; i++)
11 | bval[i] = 0.0;
12 | for (pangulu_int64_t i = 0; i < m; i++)
13 | {
14 | for (pangulu_int64_t j = 0; j < n; j++)
15 | {
16 | bval[i] += value[i * n + j] * xval[j];
17 | }
18 | }
19 | }
20 |
21 | void pangulu_spmv_cpu_xishu(pangulu_smatrix *s, pangulu_vector *x, pangulu_vector *b, pangulu_int64_t vector_number)
22 | {
23 | pangulu_int64_t m = s->row;
24 | pangulu_inblock_ptr *csrRowPtr_tmp = s->rowpointer;
25 | pangulu_inblock_idx *csrColIdx_tmp = s->columnindex;
26 | calculate_type *csrVal_tmp = s->value;
27 | for (pangulu_int64_t vector_index = 0; vector_index < vector_number; vector_index++)
28 | {
29 | calculate_type *xval = x->value + vector_index * m;
30 | calculate_type *yval = b->value + vector_index * m;
31 | for (pangulu_int64_t i = 0; i < m; i++)
32 | {
33 | for (pangulu_int64_t j = csrRowPtr_tmp[i]; j < csrRowPtr_tmp[i + 1]; j++)
34 | {
35 | yval[i] += csrVal_tmp[j] * xval[csrColIdx_tmp[j]];
36 | }
37 | }
38 | }
39 | }
40 |
41 | void pangulu_spmv_cpu_xishu_csc(pangulu_smatrix *s, pangulu_vector *x, pangulu_vector *b, pangulu_int64_t vector_number)
42 | {
43 | pangulu_int64_t m = s->row;
44 | pangulu_inblock_ptr *csccolumnPtr_tmp = s->columnpointer;
45 | pangulu_inblock_idx *cscrowIdx_tmp = s->rowindex;
46 | calculate_type *cscVal_tmp = s->value_csc;
47 | for (pangulu_int64_t vector_index = 0; vector_index < vector_number; vector_index++)
48 | {
49 | calculate_type *xval = x->value + vector_index * m;
50 | calculate_type *yval = b->value + vector_index * m;
51 | for (pangulu_int64_t i = 0; i < m; i++)
52 | {
53 | for (pangulu_int64_t j = csccolumnPtr_tmp[i]; j < csccolumnPtr_tmp[i + 1]; j++)
54 | {
55 | pangulu_inblock_idx row = cscrowIdx_tmp[j];
56 | yval[row] += cscVal_tmp[j] * xval[i];
57 | }
58 | }
59 | }
60 | }
61 |
62 | void pangulu_vector_add_cpu(pangulu_vector *b, pangulu_vector *x)
63 | {
64 |
65 | calculate_type *xval = x->value;
66 | calculate_type *bval = b->value;
67 | pangulu_int64_t n = x->row;
68 | for (pangulu_int64_t i = 0; i < n; i++)
69 | {
70 | bval[i] += xval[i];
71 | }
72 | }
73 |
74 | void pangulu_vector_sub_cpu(pangulu_vector *b, pangulu_vector *x)
75 | {
76 |
77 | calculate_type *xval = x->value;
78 | calculate_type *bval = b->value;
79 | pangulu_int64_t n = x->row;
80 | for (pangulu_int64_t i = 0; i < n; i++)
81 | {
82 | bval[i] -= xval[i];
83 | }
84 | }
85 |
86 | void pangulu_vector_copy_cpu(pangulu_vector *b, pangulu_vector *x)
87 | {
88 |
89 | calculate_type *xval = x->value;
90 | calculate_type *bval = b->value;
91 | pangulu_int64_t n = x->row;
92 | for (pangulu_int64_t i = 0; i < n; i++)
93 | {
94 | bval[i] = xval[i];
95 | }
96 | }
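
Note that the `xishu` (sparse) kernels accumulate into `b` rather than overwriting it, so callers zero `b` first. A minimal sketch, assuming `s`, `x` and `b` were assembled elsewhere; `spmv_usage_sketch` is a hypothetical name.

```c
#include "pangulu_common.h"

/* Sketch only: the sparse SpMV kernels compute b += s * x. */
void spmv_usage_sketch(pangulu_smatrix *s, pangulu_vector *x, pangulu_vector *b)
{
    for (pangulu_int64_t i = 0; i < b->row; i++)
    {
        b->value[i] = 0.0;
    }
    pangulu_spmv_cpu_xishu_csc(s, x, b, 1); /* one right-hand side */
}
```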
--------------------------------------------------------------------------------
/src/pangulu_sptrsv_fp64.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | void pangulu_sptrsv_cpu_choumi(pangulu_smatrix *s,pangulu_vector *x,pangulu_vector *b)
4 | {
5 | calculate_type *value=s->value;
6 | calculate_type *bval=b->value;
7 | calculate_type *xval=x->value;
8 | pangulu_int64_t n=s->column;
9 |     for(pangulu_int64_t i=0;i<n;i++)
10 |     {
11 |         xval[i]=bval[i];
12 |         for(pangulu_int64_t j=0;j<i;j++)
13 |         {
14 |             xval[i]-=value[i*n+j]*xval[j];
15 |         }
16 |         xval[i]/=value[i*n+i];
17 |     }
18 | }
19 |
20 | void pangulu_sptrsv_cpu_xishu(pangulu_smatrix *s,pangulu_vector *x,pangulu_vector *b,pangulu_int64_t vector_number)
21 | {
22 |
23 |     pangulu_int64_t row=s->row;
24 | pangulu_inblock_ptr *csr_row_ptr_tmp=s->rowpointer;
25 | pangulu_inblock_idx *csr_col_idx_tmp=s->columnindex;
26 | calculate_type *csr_val_tmp=s->value;
27 |     for(pangulu_int64_t vector_index=0;vector_index<vector_number;vector_index++){
28 |         calculate_type *xval=x->value+vector_index*row;
29 | calculate_type *bval=b->value+vector_index*row;
30 |         for(pangulu_int64_t i=0;i<row;i++)
31 |         {
32 |             xval[i]=bval[i];
33 |             for(pangulu_int64_t j=csr_row_ptr_tmp[i];j<csr_row_ptr_tmp[i+1]-1;j++)
34 |             {
35 |                 xval[i]-=csr_val_tmp[j]*xval[csr_col_idx_tmp[j]];
36 |             }
37 |             if(fabs(csr_val_tmp[csr_row_ptr_tmp[i+1]-1])>SPTRSV_ERROR)
38 |             {
39 |                 xval[i]/=csr_val_tmp[csr_row_ptr_tmp[i+1]-1];
40 |             }
41 |             else
42 |             {
43 |                 xval[i]/=SPTRSV_ERROR;
44 |             }
45 |         }
46 |     }
47 | }
48 |
49 | void pangulu_sptrsv_cpu_xishu_csc(pangulu_smatrix *s,pangulu_vector *x,pangulu_vector *b,pangulu_int64_t vector_number,int32_t tag)
50 | {
51 |     pangulu_int64_t col=s->column;
52 | pangulu_inblock_ptr *csc_column_ptr_tmp=s->columnpointer;
53 | pangulu_inblock_idx *csc_row_idx_tmp=s->rowindex;
54 | calculate_type *cscVal_tmp = s->value_csc;
55 | if(tag==0){
56 |         for(pangulu_int64_t vector_index=0;vector_index<vector_number;vector_index++){
57 |             calculate_type *xval=x->value+vector_index*col;
58 | calculate_type *bval=b->value+vector_index*col;
59 |             for(pangulu_int64_t i=0;i<col;i++)
60 |             {
61 |                 if(csc_row_idx_tmp[csc_column_ptr_tmp[i]]==i){
62 |                     if(fabs(cscVal_tmp[csc_column_ptr_tmp[i]])>SPTRSV_ERROR)
63 | xval[i]=bval[i]/cscVal_tmp[csc_column_ptr_tmp[i]];
64 | else
65 | xval[i]=bval[i]/SPTRSV_ERROR;
66 | }
67 | else{
68 | xval[i]=0.0;
69 | continue;
70 | }
71 |                 for(pangulu_int64_t j=csc_column_ptr_tmp[i]+1;j<csc_column_ptr_tmp[i+1];j++)
72 |                 {
73 |                     pangulu_inblock_idx row=csc_row_idx_tmp[j];
74 |                     bval[row]-=cscVal_tmp[j]*xval[i];
75 |                 }
76 |             }
77 |         }
78 |     }
79 |     else{
80 |         for(pangulu_int64_t vector_index=0;vector_index<vector_number;vector_index++){
81 |             calculate_type *xval=x->value+vector_index*col;
82 | calculate_type *bval=b->value+vector_index*col;
83 | for(pangulu_int64_t i=col-1;i>=0;i--)
84 | {
85 | if(csc_row_idx_tmp[csc_column_ptr_tmp[i+1]-1]==i){
86 | if(fabs(cscVal_tmp[csc_column_ptr_tmp[i+1]-1])>SPTRSV_ERROR)
87 | xval[i]=bval[i]/cscVal_tmp[csc_column_ptr_tmp[i+1]-1];
88 | else
89 | xval[i]=bval[i]/SPTRSV_ERROR;
90 | }
91 | else{
92 | xval[i]=0.0;
93 | continue;
94 | }
95 | if(csc_column_ptr_tmp[i+1]>=2){ // Don't modify this to csc_column_ptr_tmp[i+1]-2>=0, because values in array csc_column_ptr_tmp are unsigned.
96 | for(pangulu_int64_t j=csc_column_ptr_tmp[i+1]-2;j>=csc_column_ptr_tmp[i];j--)
97 | {
98 | pangulu_inblock_idx row=csc_row_idx_tmp[j];
99 | bval[row]-=cscVal_tmp[j]*xval[i];
100 | }
101 | }
102 | }
103 | }
104 | }
105 |
106 | }
107 |
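
The `tag` argument selects the direction of the solve: `tag == 0` runs the forward (lower-triangular) pass, where the diagonal is the first entry of each CSC column, while any other tag runs the backward (upper-triangular) pass, where it is the last entry; near-zero pivots are clamped to `SPTRSV_ERROR`. A minimal sketch of the two calls as dispatched by `pangulu_sptrsv` in pangulu_kernel_interface.c, assuming the matrices and vectors were assembled elsewhere; `sptrsv_usage_sketch` is a hypothetical name.

```c
#include "pangulu_common.h"

/* Sketch only: solve L y = b, then U x = y, one right-hand side each. */
void sptrsv_usage_sketch(pangulu_smatrix *l, pangulu_smatrix *u,
                         pangulu_vector *b, pangulu_vector *y, pangulu_vector *x)
{
    pangulu_sptrsv_cpu_xishu_csc(l, y, b, 1, 0); /* forward: y = L^{-1} b */
    pangulu_sptrsv_cpu_xishu_csc(u, x, y, 1, 1); /* backward: x = U^{-1} y */
}
```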
--------------------------------------------------------------------------------
/src/pangulu_ssssm_fp64_cuda.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | #ifdef GPU_OPEN
4 | void pangulu_ssssm_fp64_cuda(pangulu_smatrix *a,
5 | pangulu_smatrix *l,
6 | pangulu_smatrix *u)
7 | {
8 | int n = a->row;
9 | int nnz_a = a->columnpointer[n] - a->columnpointer[0];
10 | double sparsity_A = (double)nnz_a / (double)(n * n);
11 |
12 | if (sparsity_A < 0.001)
13 | {
14 | pangulu_ssssm_cuda_kernel(a->row,
15 | a->bin_rowpointer,
16 | a->cuda_bin_rowpointer,
17 | a->cuda_bin_rowindex,
18 | u->cuda_rowpointer,
19 | u->cuda_columnindex,
20 | u->cuda_value,
21 | l->cuda_rowpointer,
22 | l->cuda_columnindex,
23 | l->cuda_value,
24 | a->cuda_rowpointer,
25 | a->cuda_columnindex,
26 | a->cuda_value);
27 | }
28 | else
29 | {
30 | pangulu_ssssm_dense_cuda_kernel(a->row,
31 | a->columnpointer[a->row],
32 | u->columnpointer[u->row],
33 | l->cuda_rowpointer,
34 | l->cuda_columnindex,
35 | l->cuda_value,
36 | u->cuda_rowpointer,
37 | u->cuda_columnindex,
38 | u->cuda_value,
39 | a->cuda_rowpointer,
40 | a->cuda_columnindex,
41 | a->cuda_value);
42 | }
43 | }
44 |
45 | void pangulu_ssssm_interface_G_V1(pangulu_smatrix *a,
46 | pangulu_smatrix *l,
47 | pangulu_smatrix *u)
48 | {
49 | pangulu_ssssm_cuda_kernel(a->row,
50 | a->bin_rowpointer,
51 | a->cuda_bin_rowpointer,
52 | a->cuda_bin_rowindex,
53 | u->cuda_rowpointer,
54 | u->cuda_columnindex,
55 | u->cuda_value,
56 | l->cuda_rowpointer,
57 | l->cuda_columnindex,
58 | l->cuda_value,
59 | a->cuda_rowpointer,
60 | a->cuda_columnindex,
61 | a->cuda_value);
62 | }
63 | void pangulu_ssssm_interface_G_V2(pangulu_smatrix *a,
64 | pangulu_smatrix *l,
65 | pangulu_smatrix *u)
66 | {
67 | pangulu_ssssm_dense_cuda_kernel(a->row,
68 | a->columnpointer[a->row],
69 | u->columnpointer[u->row],
70 | l->cuda_rowpointer,
71 | l->cuda_columnindex,
72 | l->cuda_value,
73 | u->cuda_rowpointer,
74 | u->cuda_columnindex,
75 | u->cuda_value,
76 | a->cuda_rowpointer,
77 | a->cuda_columnindex,
78 | a->cuda_value);
79 | }
80 | #endif
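
`pangulu_ssssm_fp64_cuda` dispatches on the density of the target block: below 0.1% it uses the bin-based sparse kernel, otherwise the dense kernel. A hedged restatement of that rule as a predicate; `ssssm_use_dense_kernel` is a hypothetical name, not part of PanguLU.

```c
#include "pangulu_common.h"

#ifdef GPU_OPEN
/* Hypothetical helper restating the dispatch rule above: blocks with a
   density of at least 0.1% take the dense CUDA kernel. */
int ssssm_use_dense_kernel(pangulu_smatrix *a)
{
    int n = a->row;
    int nnz_a = a->columnpointer[n] - a->columnpointer[0];
    double density = (double)nnz_a / ((double)n * (double)n);
    return density >= 0.001;
}
#endif
```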
--------------------------------------------------------------------------------
/src/pangulu_thread.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | void pangulu_mutex_init(pthread_mutex_t *mutex)
4 | {
5 | pthread_mutex_init((mutex), NULL);
6 | }
7 |
8 | void pangulu_bsem_init(bsem *bsem_p, pangulu_int64_t value)
9 | {
10 | if (value < 0 || value > 1)
11 | {
12 | exit(1);
13 | }
14 | bsem_p->mutex = (pthread_mutex_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pthread_mutex_t));
15 | bsem_p->cond = (pthread_cond_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pthread_cond_t));
16 | pangulu_mutex_init((bsem_p->mutex));
17 | pthread_cond_init((bsem_p->cond), NULL);
18 | bsem_p->v = value;
19 | }
20 |
21 | bsem *pangulu_bsem_destory(bsem *bsem_p)
22 | {
23 | pangulu_free(__FILE__, __LINE__, bsem_p->mutex);
24 | bsem_p->mutex = NULL;
25 | pangulu_free(__FILE__, __LINE__, bsem_p->cond);
26 | bsem_p->cond = NULL;
27 | bsem_p->v = 0;
28 | pangulu_free(__FILE__, __LINE__, bsem_p);
29 | return NULL;
30 | }
31 |
32 | void pangulu_bsem_post(pangulu_heap *heap)
33 | {
34 | bsem *bsem_p = heap->heap_bsem;
35 | pthread_mutex_lock(bsem_p->mutex);
36 | pangulu_int64_t flag = heap_empty(heap);
37 | if (((bsem_p->v == 0) && (flag == 0)))
38 | {
39 | bsem_p->v = 1;
40 | // get bsem p
41 | pthread_cond_signal(bsem_p->cond);
42 | // send
43 | }
44 | pthread_mutex_unlock(bsem_p->mutex);
45 | }
46 |
47 | pangulu_int64_t pangulu_bsem_wait(pangulu_heap *heap)
48 | {
49 | bsem *heap_bsem = heap->heap_bsem;
50 | pthread_mutex_t *heap_mutex = heap_bsem->mutex;
51 |
52 | pthread_mutex_lock(heap_mutex);
53 | if (heap_empty(heap) == 1)
54 | {
55 | heap_bsem->v = 0;
56 | while (heap_bsem->v == 0)
57 | {
58 | // wait
59 | pthread_cond_wait(heap_bsem->cond, heap_bsem->mutex);
60 | }
61 | }
62 |
63 | pangulu_int64_t compare_flag = pangulu_heap_delete(heap);
64 | heap_bsem->v = 1;
65 | pthread_mutex_unlock(heap_mutex);
66 | return compare_flag;
67 | }
68 |
69 | void pangulu_bsem_stop(pangulu_heap *heap)
70 | {
71 | bsem *bsem_p = heap->heap_bsem;
72 | pthread_mutex_lock(bsem_p->mutex);
73 | bsem_p->v = 0;
74 | pthread_mutex_unlock(bsem_p->mutex);
75 | }
76 |
77 | void pangulu_bsem_synchronize(bsem *bsem_p)
78 | {
79 | pthread_mutex_lock((bsem_p->mutex));
80 | pangulu_int64_t v = bsem_p->v;
81 | if (v == 1)
82 | {
83 | bsem_p->v = 0;
84 | pthread_cond_signal(bsem_p->cond);
85 | pthread_mutex_unlock(bsem_p->mutex);
86 | }
87 | else
88 | {
89 | bsem_p->v = 1;
90 | while (bsem_p->v == 1)
91 | {
92 | pthread_cond_wait((bsem_p->cond), (bsem_p->mutex));
93 | bsem_p->v = 0;
94 | }
95 | bsem_p->v = 0;
96 | pthread_mutex_unlock(bsem_p->mutex);
97 | }
98 | }
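
The `bsem` type acts as a binary semaphore guarding the task heap: workers block in `pangulu_bsem_wait` until the heap is non-empty and then pop the highest-priority task; producers call `pangulu_bsem_post` after inserting. A sketch of that handshake, assuming `heap->heap_bsem` was initialised with `pangulu_bsem_init(..., 0)` during setup; the function names are hypothetical.

```c
#include "pangulu_common.h"

/* Sketch only: one producer and one consumer sharing the task heap. */
void producer_sketch(pangulu_heap *heap)
{
    pangulu_heap_insert(heap, 0, 0, 0, 1, 0); /* enqueue one task */
    pangulu_bsem_post(heap);                  /* wake a waiting worker */
}

void consumer_sketch(pangulu_heap *heap)
{
    /* blocks until the heap is non-empty, then pops the best task */
    pangulu_int64_t task = pangulu_bsem_wait(heap);
    (void)task; /* dispatch the kernel recorded for this task index */
}
```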
--------------------------------------------------------------------------------
/src/pangulu_time.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | #ifdef CHECK_TIME
4 | void pangulu_time_check_begin(struct timeval *GET_TIME_START)
5 | {
6 | gettimeofday((GET_TIME_START), NULL);
7 | }
8 |
9 | double pangulu_time_check_end(struct timeval *GET_TIME_START)
10 | {
11 | struct timeval GET_TIME_END;
12 | gettimeofday((&GET_TIME_END), NULL);
13 | return (((GET_TIME_END.tv_sec - GET_TIME_START->tv_sec) * 1000.0 + (GET_TIME_END.tv_usec - GET_TIME_START->tv_usec) / 1000.0))/1000.0;
14 | }
15 |
16 | void pangulu_time_init()
17 | {
18 | time_transpose = 0.0;
19 | time_isend = 0.0;
20 | time_receive = 0.0;
21 | time_getrf = 0.0;
22 | time_tstrf = 0.0;
23 | time_gessm = 0.0;
24 | time_gessm_sparse = 0.0;
25 | time_gessm_dense = 0.0;
26 | time_ssssm = 0.0;
27 | time_cuda_memcpy = 0.0;
28 | time_wait = 0.0;
29 | return;
30 | }
31 |
32 | void pangulu_time_simple_output(pangulu_int64_t rank)
33 | {
34 | printf( FMT_PANGULU_INT64_T "\t" "%.5lf\t%.5lf\t%.5lf\t%.5lf\t%.5lf\t%.5lf\t%.5lf\n",
35 | rank,
36 | calculate_time_wait,
37 | time_getrf,
38 | time_tstrf,
39 | time_gessm,
40 | time_ssssm, time_gessm + time_getrf + time_tstrf + time_ssssm, time_cuda_memcpy);
41 | }
42 | #endif // CHECK_TIME
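
The `CHECK_TIME` helpers bracket a region with `gettimeofday` and return the elapsed time in seconds. A minimal sketch; `timed_region_sketch` is a hypothetical name, not part of PanguLU.

```c
#include "pangulu_common.h"

#ifdef CHECK_TIME
/* Sketch only: time one region and return elapsed seconds
   (pangulu_time_check_end converts the timeval difference to seconds). */
double timed_region_sketch(void)
{
    struct timeval GET_TIME_START;
    pangulu_time_check_begin(&GET_TIME_START);
    /* ... work to be timed ... */
    return pangulu_time_check_end(&GET_TIME_START);
}
#endif
```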
--------------------------------------------------------------------------------
/src/pangulu_tstrf_fp64.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 | void pangulu_tstrf_fp64_cpu_1(pangulu_smatrix *a,
3 | pangulu_smatrix *x,
4 | pangulu_smatrix *u)
5 | {
6 | pangulu_inblock_ptr *x_colpointer = a->columnpointer;
7 | pangulu_inblock_idx *x_rowindex = a->rowindex;
8 | calculate_type *x_value = a->value_csc;
9 | pangulu_inblock_ptr *u_rowpointer = u->rowpointer;
10 | pangulu_inblock_idx *u_columnindex = u->columnindex;
11 | calculate_type *u_value = u->value;
12 | pangulu_inblock_ptr *a_colpointer = a->columnpointer;
13 | pangulu_inblock_idx *a_rowindex = a->rowindex;
14 | calculate_type *a_value = x->value_csc;
15 | pangulu_int64_t n = a->row;
16 |
17 | for (pangulu_int64_t i = 0; i < a->nnz; i++)
18 | {
19 | x_value[i] = 0.0;
20 | }
21 | for (pangulu_int64_t i = 0; i < n; i++)
22 | {
23 | calculate_type t = u_value[u_rowpointer[i]];
24 | if (fabs(t) < ERROR)
25 | {
26 | t = ERROR;
27 | }
28 | for (pangulu_int64_t k = a_colpointer[i]; k < a_colpointer[i + 1]; k++)
29 | {
30 | x_value[k] = a_value[k] / t;
31 | }
32 | // update Value
33 | if (a_colpointer[i] != a_colpointer[i + 1])
34 | {
35 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
36 | for (pangulu_int64_t k = u_rowpointer[i]; k < u_rowpointer[i + 1]; k++)
37 | {
38 | pangulu_int64_t p = x_colpointer[i];
39 | for (pangulu_int64_t s = a_colpointer[u_columnindex[k]]; s < a_colpointer[u_columnindex[k] + 1]; s++)
40 | {
41 | if (x_rowindex[p] == a_rowindex[s])
42 | {
43 | a_value[s] -= x_value[p] * u_value[k];
44 | p++;
45 | }
46 | else
47 | {
48 | continue;
49 | }
50 | }
51 | }
52 | }
53 | }
54 | }
55 |
56 | void pangulu_tstrf_fp64_cpu_2(pangulu_smatrix *a,
57 | pangulu_smatrix *x,
58 | pangulu_smatrix *u)
59 | {
60 |
61 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer;
62 | pangulu_inblock_idx *A_rowidx = a->columnindex;
63 |
64 | calculate_type *a_value = a->value;
65 |
66 | pangulu_inblock_ptr *L_rowpointer = u->columnpointer;
67 |
68 | pangulu_inblock_ptr *L_colpointer = u->rowpointer;
69 | pangulu_inblock_idx *L_rowindex = u->columnindex;
70 | calculate_type *L_value = u->value;
71 |
72 | pangulu_int64_t n = a->row;
73 |
74 | pangulu_int64_t *Spointer = (pangulu_int64_t *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_int64_t) * (n + 1));
75 | memset(Spointer, 0, sizeof(pangulu_int64_t) * (n + 1));
76 | int rhs = 0;
77 | for (pangulu_int64_t i = 0; i < n; i++)
78 | {
79 | if (A_columnpointer[i] != A_columnpointer[i + 1])
80 | {
81 | Spointer[rhs] = i;
82 | rhs++;
83 | }
84 | }
85 |
86 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs);
87 | calculate_type *D_x = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * rhs);
88 |
89 |     memset(C_b, 0, sizeof(calculate_type) * n * rhs);
90 |     memset(D_x, 0, sizeof(calculate_type) * n * rhs);
91 |
92 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
93 | for (int i = 0; i < rhs; i++)
94 | {
95 | int index = Spointer[i];
96 | for (int j = A_columnpointer[index]; j < A_columnpointer[index + 1]; j++)
97 | {
98 | C_b[i * n + A_rowidx[j]] = a_value[j];
99 | }
100 | }
101 |
102 | int nlevel = 0;
103 | int *levelPtr = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * (n + 1));
104 | int *levelItem = (int *)pangulu_malloc(__FILE__, __LINE__, sizeof(int) * n);
105 | findlevel(L_colpointer, L_rowindex, L_rowpointer, n, &nlevel, levelPtr, levelItem);
106 |
107 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
108 | for (int i = 0; i < rhs; i++)
109 | {
110 | for (int li = 0; li < nlevel; li++)
111 | {
112 |
113 | for (int ri = levelPtr[li]; ri < levelPtr[li + 1]; ri++)
114 | {
115 | C_b[i * n + levelItem[ri]] /= L_value[L_colpointer[levelItem[ri]]];
116 | for (int j = L_colpointer[levelItem[ri]] + 1; j < L_colpointer[levelItem[ri] + 1]; j++)
117 | {
118 | C_b[i * n + L_rowindex[j]] -= L_value[j] * C_b[i * n + levelItem[ri]];
119 | }
120 | }
121 | }
122 | }
123 |
124 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
125 | for (int i = 0; i < rhs; i++)
126 | {
127 | int index = Spointer[i];
128 | for (int j = A_columnpointer[index]; j < A_columnpointer[index + 1]; j++)
129 | {
130 | a_value[j] = C_b[i * n + A_rowidx[j]];
131 | }
132 | }
133 |
134 | pangulu_free(__FILE__, __LINE__, Spointer);
135 | pangulu_free(__FILE__, __LINE__, C_b);
136 | pangulu_free(__FILE__, __LINE__, D_x);
137 | }
138 | void pangulu_tstrf_fp64_cpu_3(pangulu_smatrix *a,
139 | pangulu_smatrix *x,
140 | pangulu_smatrix *u)
141 | {
142 |
143 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer;
144 | pangulu_inblock_idx *A_rowidx = a->columnindex;
145 |
146 | calculate_type *a_value = a->value;
147 |
148 | pangulu_inblock_ptr *L_columnpointer = u->rowpointer;
149 | pangulu_inblock_idx *L_rowidx = u->columnindex;
150 | calculate_type *L_value = u->value;
151 |
152 | pangulu_int64_t n = a->row;
153 |
154 | calculate_type *C_b = (calculate_type *)pangulu_malloc(__FILE__, __LINE__, sizeof(calculate_type) * n * n);
155 |
156 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
157 |     for (int i = 0; i < n; i++) // ith row of a (CSR view)
158 | {
159 | for (int j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++)
160 | {
161 | int idx = A_rowidx[j];
162 |             C_b[i * n + idx] = a_value[j]; // transform sparse to dense, values only
163 | }
164 | }
165 |
166 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
167 | for (pangulu_int64_t i = 0; i < n; i++)
168 | {
169 | for (pangulu_int64_t j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++)
170 | {
171 | C_b[i * n + A_rowidx[j]] /= L_value[L_columnpointer[A_rowidx[j]]];
172 | pangulu_inblock_idx idx = A_rowidx[j];
173 | for (pangulu_int64_t k = L_columnpointer[idx] + 1; k < L_columnpointer[idx + 1]; k++)
174 | {
175 | C_b[i * n + L_rowidx[k]] -= L_value[k] * C_b[i * n + A_rowidx[j]];
176 | }
177 | }
178 | }
179 |
180 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
181 | for (int i = 0; i < n; i++)
182 | {
183 | for (int j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++)
184 | {
185 | int idx = A_rowidx[j];
186 | a_value[j] = C_b[i * n + idx];
187 | }
188 | }
189 | pangulu_free(__FILE__, __LINE__, C_b);
190 | }
191 | void pangulu_tstrf_fp64_cpu_4(pangulu_smatrix *a,
192 | pangulu_smatrix *x,
193 | pangulu_smatrix *u)
194 | {
195 |
196 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer;
197 | pangulu_inblock_idx *A_rowidx = a->columnindex;
198 |
199 | calculate_type *a_value = a->value;
200 |
201 | pangulu_inblock_ptr *L_columnpointer = u->rowpointer;
202 | pangulu_inblock_idx *L_rowidx = u->columnindex;
203 | calculate_type *L_value = u->value;
204 |
205 | pangulu_int64_t n = a->row;
206 |
207 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
208 | for (pangulu_int64_t i = 0; i < n; i++)
209 | {
210 | for (pangulu_int64_t j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++)
211 | {
212 | pangulu_inblock_idx idx = A_rowidx[j];
213 | a_value[j] /= L_value[L_columnpointer[idx]];
214 | for (pangulu_int64_t k = L_columnpointer[idx] + 1, p = j + 1; k < L_columnpointer[idx + 1] && p < A_columnpointer[i + 1]; k++, p++)
215 | {
216 | if (L_rowidx[k] == A_rowidx[p])
217 | {
218 | a_value[p] -= L_value[k] * a_value[j];
219 | }
220 | else
221 | {
222 | k--;
223 | }
224 | }
225 | }
226 | }
227 | }
228 | void pangulu_tstrf_fp64_cpu_5(pangulu_smatrix *a,
229 | pangulu_smatrix *x,
230 | pangulu_smatrix *u)
231 | {
232 |
233 | pangulu_inblock_ptr *A_rowpointer = a->columnpointer;
234 | pangulu_inblock_idx *A_colindex = a->rowindex;
235 | calculate_type *a_value = x->value_csc;
236 |
237 | pangulu_inblock_ptr *L_colpointer = u->rowpointer;
238 | pangulu_inblock_idx *L_rowindex = u->columnindex;
239 | calculate_type *L_value = u->value;
240 |
241 | pangulu_inblock_ptr *X_rowpointer = a->columnpointer;
242 | pangulu_inblock_idx *X_colindex = a->rowindex;
243 | calculate_type *x_value = a->value_csc;
244 |
245 | pangulu_int64_t n = a->row;
246 |
247 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
248 |     for (int i = 0; i < n; i++) // ith column of a (CSC view)
249 | {
250 | for (int j = A_rowpointer[i]; j < A_rowpointer[i + 1]; j++)
251 | {
252 | pangulu_inblock_idx idx = A_colindex[j];
253 |             temp_a_value[i * n + idx] = a_value[j]; // transform sparse to dense, values only
254 | }
255 | }
256 |
257 | for (pangulu_int64_t i = 0; i < n; i++)
258 | {
259 | // x get value from a
260 | for (pangulu_int64_t k = X_rowpointer[i]; k < X_rowpointer[i + 1]; k++)
261 | {
262 | temp_a_value[i * n + X_colindex[k]] /= L_value[L_colpointer[i]];
263 | x_value[k] = temp_a_value[i * n + X_colindex[k]];
264 | }
265 | // update Value
266 | if (X_rowpointer[i] != X_rowpointer[i + 1])
267 | {
268 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
269 | for (pangulu_int64_t j = L_colpointer[i] + 1; j < L_colpointer[i + 1]; j++)
270 | {
271 | pangulu_inblock_idx idx1 = L_rowindex[j];
272 |
273 | for (pangulu_int64_t p = X_rowpointer[i]; p < X_rowpointer[i + 1]; p++)
274 | {
275 |
276 | pangulu_inblock_idx idx2 = A_colindex[p];
277 | temp_a_value[idx1 * n + idx2] -= L_value[j] * temp_a_value[i * n + idx2];
278 | }
279 | }
280 | }
281 | }
282 | }
283 | void pangulu_tstrf_fp64_cpu_6(pangulu_smatrix *a,
284 | pangulu_smatrix *x,
285 | pangulu_smatrix *u)
286 | {
287 |
288 | pangulu_inblock_ptr *A_columnpointer = a->rowpointer;
289 | pangulu_inblock_idx *A_rowidx = a->columnindex;
290 |
291 | calculate_type *a_value = a->value;
292 |
293 | pangulu_inblock_ptr *L_columnpointer = u->rowpointer;
294 | pangulu_inblock_idx *L_rowidx = u->columnindex;
295 | calculate_type *L_value = u->value;
296 |
297 | pangulu_inblock_ptr n = a->row;
298 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
299 |     for (int i = 0; i < n; i++) // ith row of a (CSR view)
300 | {
301 | for (int j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++)
302 | {
303 | int idx = A_rowidx[j];
304 |             temp_a_value[i * n + idx] = a_value[j]; // transform sparse to dense, values only
305 | }
306 | }
307 |
308 | #pragma omp parallel for num_threads(pangu_omp_num_threads)
309 | for (pangulu_int64_t i = 0; i < n; i++)
310 | {
311 | for (pangulu_int64_t j = A_columnpointer[i]; j < A_columnpointer[i + 1]; j++)
312 | {
313 | pangulu_inblock_idx idx = A_rowidx[j];
314 |
315 | a_value[j] = temp_a_value[i * n + idx] / L_value[L_columnpointer[idx]];
316 | for (pangulu_int64_t k = L_columnpointer[idx] + 1; k < L_columnpointer[idx + 1]; k++)
317 | {
318 | temp_a_value[i * n + L_rowidx[k]] -= L_value[k] * a_value[j];
319 | }
320 | }
321 | }
322 | }
323 | void pangulu_tstrf_interface_cpu_csr(pangulu_smatrix *a,
324 | pangulu_smatrix *x,
325 | pangulu_smatrix *u)
326 | {
327 |
328 | #ifdef OUTPUT_MATRICES
329 | char out_name_B[512];
330 | char out_name_U[512];
331 | sprintf(out_name_B, "%s/%s/%d%s", OUTPUT_FILE, "tstrf", tstrf_number, "_tstrf_B.cbd");
332 | sprintf(out_name_U, "%s/%s/%d%s", OUTPUT_FILE, "tstrf", tstrf_number, "_tstrf_U.cbd");
333 | pangulu_binary_write_csc_pangulu_smatrix(a, out_name_B);
334 | pangulu_binary_write_csc_pangulu_smatrix(u, out_name_U);
335 | tstrf_number++;
336 | #endif
337 | pangulu_tstrf_fp64_cpu_1(a, x, u);
338 | }
339 |
340 | void pangulu_tstrf_interface_cpu_csc(pangulu_smatrix *a,
341 | pangulu_smatrix *x,
342 | pangulu_smatrix *u)
343 | {
344 | pangulu_tstrf_fp64_cpu_6(a, x, u);
345 | }
346 |
347 | void pangulu_tstrf_interface_c_v1(pangulu_smatrix *a,
348 | pangulu_smatrix *x,
349 | pangulu_smatrix *u)
350 | {
351 | #ifdef GPU_OPEN
352 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
353 | #endif
354 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
355 | pangulu_pangulu_smatrix_memcpy_value_csc_copy_length(x, a);
356 | pangulu_tstrf_fp64_cpu_4(a, x, u);
357 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
358 | #ifdef GPU_OPEN
359 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a);
360 | #endif
361 | }
362 | void pangulu_tstrf_interface_c_v2(pangulu_smatrix *a,
363 | pangulu_smatrix *x,
364 | pangulu_smatrix *u)
365 | {
366 | #ifdef GPU_OPEN
367 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
368 | #endif
369 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
370 | pangulu_pangulu_smatrix_memcpy_value_csc_copy_length(x, a);
371 | pangulu_tstrf_fp64_cpu_6(a, x, u);
372 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
373 | #ifdef GPU_OPEN
374 | pangulu_smatrix_cuda_memcpy_to_device_value_csc(a, a);
375 | #endif
376 | }
377 |
378 | pangulu_int64_t TEMP_calculate_type_len = 0;
379 | calculate_type* TEMP_calculate_type = NULL;
380 | pangulu_int64_t TEMP_pangulu_inblock_ptr_len = 0;
381 | pangulu_inblock_ptr* TEMP_pangulu_inblock_ptr = NULL;
382 |
383 | int tstrf_csc_csc(
384 | pangulu_inblock_idx n,
385 | pangulu_inblock_ptr* U_colptr,
386 | pangulu_inblock_idx* U_rowidx,
387 | calculate_type* u_value,
388 | pangulu_inblock_ptr* A_colptr,
389 | pangulu_inblock_idx* A_rowidx,
390 | calculate_type* a_value
391 | ){
392 | if(TEMP_calculate_type_len < n){
393 | calculate_type* TEMP_calculate_type_old = TEMP_calculate_type;
394 | TEMP_calculate_type = (calculate_type*)pangulu_realloc(__FILE__, __LINE__, TEMP_calculate_type, n*sizeof(calculate_type));
395 | if(TEMP_calculate_type == NULL){
396 | pangulu_free(__FILE__, __LINE__, TEMP_calculate_type_old);
397 | TEMP_calculate_type_len = 0;
398 | printf("[ERROR] kernel error : CPU sparse tstrf : realloc TEMP_calculate_type failed.\n");
399 | return 1;
400 | }
401 | TEMP_calculate_type_len = n;
402 | }
403 |
404 | if(TEMP_pangulu_inblock_ptr_len < n){
405 | pangulu_inblock_ptr* TEMP_int64_old = TEMP_pangulu_inblock_ptr;
406 | TEMP_pangulu_inblock_ptr = (pangulu_inblock_ptr*)pangulu_realloc(__FILE__, __LINE__, TEMP_pangulu_inblock_ptr, n*sizeof(pangulu_inblock_ptr));
407 | if(TEMP_pangulu_inblock_ptr == NULL){
408 | pangulu_free(__FILE__, __LINE__, TEMP_int64_old);
409 | TEMP_pangulu_inblock_ptr_len = 0;
410 | printf("[ERROR] kernel error : CPU sparse tstrf : realloc TEMP_int64 failed.\n");
411 | return 2;
412 | }
413 | TEMP_pangulu_inblock_ptr_len = n;
414 | }
415 |
416 | pangulu_inblock_ptr* U_next_array = TEMP_pangulu_inblock_ptr;
417 | calculate_type* A_major_column = TEMP_calculate_type;
418 | memcpy(U_next_array, U_colptr, sizeof(pangulu_inblock_ptr) * n);
419 |     for(pangulu_int64_t i=0;i<n;i++){
420 |         memset(A_major_column, 0, sizeof(calculate_type) * n);
421 |         // Scatter column i of A into the dense working column, dividing by U(i,i).
422 |         for(pangulu_int64_t j=A_colptr[i];j<A_colptr[i+1];j++){
423 |             a_value[j] /= u_value[U_colptr[i+1]-1];
424 |             A_major_column[A_rowidx[j]] = a_value[j];
425 |         }
426 |         // Right-looking update: fold column i into each later column k of A.
427 |         for(pangulu_int64_t k=i+1;k<n;k++){
428 |             if(U_next_array[k] >= U_colptr[k+1]/*U_next_array[k] has run into the next column*/ || U_rowidx[U_next_array[k]] > i/*the next entry to visit in column k of U has a row index greater than A's current major column i*/){
429 | continue;
430 | }
431 |             for(pangulu_int64_t j=A_colptr[k];j<A_colptr[k+1];j++){
432 |                 a_value[j] -= u_value[U_next_array[k]] * A_major_column[A_rowidx[j]];
433 |             }
434 |             U_next_array[k]++;
435 |         }
436 |     }
437 |     return 0;
438 | }
--------------------------------------------------------------------------------
/src/pangulu_tstrf_fp64_cuda.c:
--------------------------------------------------------------------------------
1 | #include "pangulu_common.h"
2 |
3 | #ifdef GPU_OPEN
4 | void pangulu_tstrf_fp64_cuda_v8(pangulu_smatrix *a,
5 |                                 pangulu_smatrix *x,
6 |                                 pangulu_smatrix *u)
7 | {
8 |     pangulu_int64_t n = a->row;
9 | pangulu_int64_t nnzU = u->nnz;
10 | pangulu_int64_t nnzA = a->nnz;
11 |
12 | /*********************************u****************************************/
13 | int *d_graphindegree = u->d_graphindegree;
14 | cudaMemcpy(d_graphindegree, u->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
15 | int *d_id_extractor = u->d_id_extractor;
16 | cudaMemset(d_id_extractor, 0, sizeof(int));
17 | calculate_type *d_left_sum = a->d_left_sum;
18 | cudaMemset(d_left_sum, 0, nnzA * sizeof(calculate_type));
19 | /*****************************************************************************/
20 | pangulu_tstrf_cuda_kernel_v8(n,
21 | nnzU,
22 | d_graphindegree,
23 | d_id_extractor,
24 | d_left_sum,
25 | u->cuda_rowpointer,
26 | u->cuda_columnindex,
27 | u->cuda_value,
28 | a->cuda_rowpointer,
29 | a->cuda_columnindex,
30 | x->cuda_value,
31 | a->cuda_rowpointer,
32 | a->cuda_columnindex,
33 | a->cuda_value);
34 | }
35 |
36 | void pangulu_tstrf_fp64_cuda_v9(pangulu_smatrix *a,
37 | pangulu_smatrix *x,
38 | pangulu_smatrix *u)
39 | {
40 | pangulu_int64_t n = a->row;
41 | pangulu_int64_t nnzU = u->nnz;
42 | pangulu_int64_t nnzA = a->nnz;
43 |
44 | int *d_graphindegree = u->d_graphindegree;
45 | cudaMemcpy(d_graphindegree, u->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
46 | int *d_id_extractor = u->d_id_extractor;
47 | cudaMemset(d_id_extractor, 0, sizeof(int));
48 |
49 | int *d_while_profiler;
50 | cudaMalloc((void **)&d_while_profiler, sizeof(int) * n);
51 | cudaMemset(d_while_profiler, 0, sizeof(int) * n);
52 | pangulu_inblock_ptr *Spointer = (pangulu_inblock_ptr *)pangulu_malloc(__FILE__, __LINE__, sizeof(pangulu_inblock_ptr) * (n + 1));
53 | memset(Spointer, 0, sizeof(pangulu_inblock_ptr) * (n + 1)); /* match the allocation's element type */
54 | pangulu_int64_t rhs = 0; /* counts the nonempty rows of a; each one becomes a right-hand side */
55 | for (int i = 0; i < n; i++)
56 | {
57 | if (a->rowpointer[i] != a->rowpointer[i + 1])
58 | {
59 | Spointer[rhs] = i;
60 | rhs++;
61 | }
62 | }
63 | calculate_type *d_left_sum;
64 | cudaMalloc((void **)&d_left_sum, n * rhs * sizeof(calculate_type));
65 | cudaMemset(d_left_sum, 0, n * rhs * sizeof(calculate_type));
66 |
67 | calculate_type *d_x, *d_b;
68 | cudaMalloc((void **)&d_x, n * rhs * sizeof(calculate_type));
69 | cudaMalloc((void **)&d_b, n * rhs * sizeof(calculate_type));
70 | cudaMemset(d_x, 0, n * rhs * sizeof(calculate_type));
71 | cudaMemset(d_b, 0, n * rhs * sizeof(calculate_type));
72 |
73 | pangulu_inblock_ptr *d_Spointer;
74 | cudaMalloc((void **)&d_Spointer, sizeof(pangulu_inblock_ptr) * (n + 1));
75 | cudaMemset(d_Spointer, 0, sizeof(pangulu_inblock_ptr) * (n + 1));
76 | cudaMemcpy(d_Spointer, Spointer, sizeof(pangulu_inblock_ptr) * (n + 1), cudaMemcpyHostToDevice);
77 |
78 | pangulu_gessm_cuda_kernel_v9(n,
79 | nnzU,
80 | rhs,
81 | nnzA,
82 | d_Spointer,
83 | d_graphindegree,
84 | d_id_extractor,
85 | d_while_profiler,
86 | u->cuda_rowpointer,
87 | u->cuda_columnindex,
88 | u->cuda_value,
89 | a->cuda_rowpointer,
90 | a->cuda_columnindex,
91 | x->cuda_value,
92 |
93 | a->cuda_rowpointer,
94 | a->cuda_columnindex,
95 | a->cuda_value,
96 | d_left_sum,
97 | d_x,
98 | d_b);
99 |
100 | cudaFree(d_x);
101 | cudaFree(d_b);
102 | cudaFree(d_left_sum);
103 | cudaFree(d_while_profiler); pangulu_free(__FILE__, __LINE__, Spointer); /* release the host-side Spointer buffer */
104 | }
105 |
106 | void pangulu_tstrf_fp64_cuda_v7(pangulu_smatrix *a,
107 | pangulu_smatrix *x,
108 | pangulu_smatrix *u)
109 | {
110 | pangulu_int64_t n = a->row;
111 | pangulu_int64_t nnzU = u->nnz;
112 | pangulu_tstrf_cuda_kernel_v7(n,
113 | nnzU,
114 | u->cuda_rowpointer,
115 | u->cuda_columnindex,
116 | u->cuda_value,
117 | a->cuda_rowpointer,
118 | a->cuda_columnindex,
119 | x->cuda_value,
120 | a->cuda_rowpointer,
121 | a->cuda_columnindex,
122 | a->cuda_value);
123 | }
124 |
125 | void pangulu_tstrf_fp64_cuda_v10(pangulu_smatrix *a,
126 | pangulu_smatrix *x,
127 | pangulu_smatrix *u)
128 | {
129 | pangulu_int64_t n = a->row;
130 | pangulu_int64_t nnzU = u->nnz;
131 | pangulu_tstrf_cuda_kernel_v10(n,
132 | nnzU,
133 | u->cuda_rowpointer,
134 | u->cuda_columnindex,
135 | u->cuda_value,
136 | a->cuda_rowpointer,
137 | a->cuda_columnindex,
138 | x->cuda_value,
139 | a->cuda_rowpointer,
140 | a->cuda_columnindex,
141 | a->cuda_value);
142 | }
143 |
144 | void pangulu_tstrf_fp64_cuda_v11(pangulu_smatrix *a,
145 | pangulu_smatrix *x,
146 | pangulu_smatrix *u)
147 | {
148 | pangulu_int64_t n = a->row;
149 | pangulu_int64_t nnzU = u->nnz;
150 | pangulu_int64_t nnzA = a->nnz;
151 |
152 | /*********************************u****************************************/
153 | int *d_graphindegree = u->d_graphindegree;
154 | cudaMemcpy(d_graphindegree, u->graphindegree, n * sizeof(int), cudaMemcpyHostToDevice);
155 | int *d_id_extractor = u->d_id_extractor;
156 | cudaMemset(d_id_extractor, 0, sizeof(int));
157 | calculate_type *d_left_sum = a->d_left_sum;
158 | cudaMemset(d_left_sum, 0, nnzA * sizeof(calculate_type));
159 | /*****************************************************************************/
160 | pangulu_tstrf_cuda_kernel_v11(n,
161 | nnzU,
162 | d_graphindegree,
163 | d_id_extractor,
164 | d_left_sum,
165 | u->cuda_rowpointer,
166 | u->cuda_columnindex,
167 | u->cuda_value,
168 | a->cuda_rowpointer,
169 | a->cuda_columnindex,
170 | x->cuda_value,
171 | a->cuda_rowpointer,
172 | a->cuda_columnindex,
173 | a->cuda_value);
174 | }
175 |
176 | void pangulu_tstrf_interface_G_V1(pangulu_smatrix *a,
177 | pangulu_smatrix *x,
178 | pangulu_smatrix *u)
179 | {
180 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
181 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
182 | pangulu_smatrix_cuda_memcpy_complete_csr(a, a);
183 | pangulu_tstrf_fp64_cuda_v7(a, x, u);
184 | pangulu_smatrix_cuda_memcpy_value_csr(a, x);
185 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
186 | }
187 | void pangulu_tstrf_interface_G_V2(pangulu_smatrix *a,
188 | pangulu_smatrix *x,
189 | pangulu_smatrix *u)
190 | {
191 | pangulu_tstrf_fp64_cuda_v8(a, x, u);
192 | pangulu_smatrix_cuda_memcpy_value_csc(a, x);
193 | }
194 | void pangulu_tstrf_interface_G_V3(pangulu_smatrix *a,
195 | pangulu_smatrix *x,
196 | pangulu_smatrix *u)
197 | {
198 | pangulu_smatrix_cuda_memcpy_value_csc(a, a);
199 | pangulu_transpose_pangulu_smatrix_csc_to_csr(a);
200 | pangulu_smatrix_cuda_memcpy_complete_csr(a, a);
201 | pangulu_tstrf_fp64_cuda_v10(a, x, u);
202 | pangulu_smatrix_cuda_memcpy_value_csr(a, x);
203 | pangulu_transpose_pangulu_smatrix_csr_to_csc(a);
204 | }
205 | #endif
--------------------------------------------------------------------------------
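Note on the kernels above: both the CPU path (tstrf_csc_csc) and the CUDA variants overwrite a in place, and their access pattern (each major column of a is scaled by the matching diagonal entry of u, then used to update later columns) corresponds to solving X * U = A for X with U upper triangular. A minimal dense sketch for cross-checking a small block, assuming that reading; the function name and the column-major layout below are illustrative, not part of PanguLU's API:

    #include <stddef.h>

    /* Dense cross-check sketch: overwrite a (n*n, column-major) with the X
       solving X * U = A, where u is upper triangular with a nonzero diagonal.
       Follows the same sweep order as tstrf_csc_csc. */
    static void tstrf_dense_reference(size_t n, const double *u, double *a)
    {
        for (size_t i = 0; i < n; i++) {
            for (size_t r = 0; r < n; r++) /* X(:,i) = A(:,i) / U(i,i) */
                a[i * n + r] /= u[i * n + i];
            for (size_t k = i + 1; k < n; k++) { /* update later columns of A */
                double uik = u[k * n + i];       /* U(i,k) */
                if (uik == 0.0)
                    continue;
                for (size_t r = 0; r < n; r++)
                    a[k * n + r] -= a[i * n + r] * uik;
            }
        }
    }
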
/src/platforms/02_GPU/01_CUDA/000_CUDA/Makefile:
--------------------------------------------------------------------------------
1 | include ../../../../../make.inc
2 | pangulu_0201000.o:pangulu_cuda.cu
3 | $(NVCC) $(NVCCFLAGS) $(METIS_INC) -Xcompiler -fPIC -c $< -o $@
4 | mv $@ ../../../../../lib
--------------------------------------------------------------------------------
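The rule above takes its toolchain settings from the top-level make.inc; the variables it references are NVCC, NVCCFLAGS and METIS_INC. A minimal sketch of the corresponding make.inc entries (the values are illustrative; the GPU architecture flag and the METIS path depend on the local installation):

    NVCC      = nvcc
    NVCCFLAGS = -O3 -arch=sm_80            # match the target GPU's compute capability
    METIS_INC = -I/path/to/metis/include   # hypothetical METIS install location
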
/src/platforms/platform_list.csv:
--------------------------------------------------------------------------------
1 | 0201000,GPU_CUDA
--------------------------------------------------------------------------------
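The platform code in the first column encodes the directory path under src/platforms: 02_GPU / 01_CUDA / 000_CUDA concatenates to 0201000, which also names the object file pangulu_0201000.o produced by the Makefile above. Under that convention, a hypothetical extra backend placed in src/platforms/02_GPU/02_ROCM/000_ROCM would register itself with an additional line such as:

    0202000,GPU_ROCM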