├── documents └── figs │ ├── Lorentz.png │ └── vanderpol.png ├── Readme.md ├── LICENSE └── grkf45 ├── grkf45_1d.py ├── grkf45_2d.py ├── grkf45_3d.py └── grkf45.py /documents/figs/Lorentz.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HajimeKawahara/grkf45/HEAD/documents/figs/Lorentz.png -------------------------------------------------------------------------------- /documents/figs/vanderpol.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HajimeKawahara/grkf45/HEAD/documents/figs/vanderpol.png -------------------------------------------------------------------------------- /Readme.md: -------------------------------------------------------------------------------- 1 | # GPU RKF45 (Runge–Kutta–Fehlberg) ODE parallel solver written in pycuda 2 | 3 | The Runge–Kutta–Fehlberg method (RKF45) is one of the most widely used ODE solvers. GRKF45 is a GPU-parallel RKF45 solver that integrates the same ODE system for many different parameter sets at once. 4 | 5 | The original RKF45 code was taken from the [C++ code](http://people.sc.fsu.edu/~jburkardt/cpp_src/rkf45/rkf45.html) by John Burkardt (LGPL). 6 | 7 | Currently, you need to define the right-hand sides of the ODEs in the device functions r4_f0, r4_f1, r4_f2, ... in GRKF45. 8 | 9 | 10 | ## Codes 11 | 12 | - grkf45_1d (1D ODE) 13 | - grkf45_2d (2D ODE, the van der Pol oscillator) 14 | 15 | 16 | 17 | - grkf45_3d (3D ODE, the Lorenz attractor) 18 | 19 | 20 | 21 | 22 | 23 | 24 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU LESSER GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 
7 | 8 | 9 | This version of the GNU Lesser General Public License incorporates 10 | the terms and conditions of version 3 of the GNU General Public 11 | License, supplemented by the additional permissions listed below. 12 | 13 | 0. Additional Definitions. 14 | 15 | As used herein, "this License" refers to version 3 of the GNU Lesser 16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU 17 | General Public License. 18 | 19 | "The Library" refers to a covered work governed by this License, 20 | other than an Application or a Combined Work as defined below. 21 | 22 | An "Application" is any work that makes use of an interface provided 23 | by the Library, but which is not otherwise based on the Library. 24 | Defining a subclass of a class defined by the Library is deemed a mode 25 | of using an interface provided by the Library. 26 | 27 | A "Combined Work" is a work produced by combining or linking an 28 | Application with the Library. The particular version of the Library 29 | with which the Combined Work was made is also called the "Linked 30 | Version". 31 | 32 | The "Minimal Corresponding Source" for a Combined Work means the 33 | Corresponding Source for the Combined Work, excluding any source code 34 | for portions of the Combined Work that, considered in isolation, are 35 | based on the Application, and not on the Linked Version. 36 | 37 | The "Corresponding Application Code" for a Combined Work means the 38 | object code and/or source code for the Application, including any data 39 | and utility programs needed for reproducing the Combined Work from the 40 | Application, but excluding the System Libraries of the Combined Work. 41 | 42 | 1. Exception to Section 3 of the GNU GPL. 43 | 44 | You may convey a covered work under sections 3 and 4 of this License 45 | without being bound by section 3 of the GNU GPL. 46 | 47 | 2. Conveying Modified Versions. 
48 | 49 | If you modify a copy of the Library, and, in your modifications, a 50 | facility refers to a function or data to be supplied by an Application 51 | that uses the facility (other than as an argument passed when the 52 | facility is invoked), then you may convey a copy of the modified 53 | version: 54 | 55 | a) under this License, provided that you make a good faith effort to 56 | ensure that, in the event an Application does not supply the 57 | function or data, the facility still operates, and performs 58 | whatever part of its purpose remains meaningful, or 59 | 60 | b) under the GNU GPL, with none of the additional permissions of 61 | this License applicable to that copy. 62 | 63 | 3. Object Code Incorporating Material from Library Header Files. 64 | 65 | The object code form of an Application may incorporate material from 66 | a header file that is part of the Library. You may convey such object 67 | code under terms of your choice, provided that, if the incorporated 68 | material is not limited to numerical parameters, data structure 69 | layouts and accessors, or small macros, inline functions and templates 70 | (ten or fewer lines in length), you do both of the following: 71 | 72 | a) Give prominent notice with each copy of the object code that the 73 | Library is used in it and that the Library and its use are 74 | covered by this License. 75 | 76 | b) Accompany the object code with a copy of the GNU GPL and this license 77 | document. 78 | 79 | 4. Combined Works. 80 | 81 | You may convey a Combined Work under terms of your choice that, 82 | taken together, effectively do not restrict modification of the 83 | portions of the Library contained in the Combined Work and reverse 84 | engineering for debugging such modifications, if you also do each of 85 | the following: 86 | 87 | a) Give prominent notice with each copy of the Combined Work that 88 | the Library is used in it and that the Library and its use are 89 | covered by this License. 
90 | 91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license 92 | document. 93 | 94 | c) For a Combined Work that displays copyright notices during 95 | execution, include the copyright notice for the Library among 96 | these notices, as well as a reference directing the user to the 97 | copies of the GNU GPL and this license document. 98 | 99 | d) Do one of the following: 100 | 101 | 0) Convey the Minimal Corresponding Source under the terms of this 102 | License, and the Corresponding Application Code in a form 103 | suitable for, and under terms that permit, the user to 104 | recombine or relink the Application with a modified version of 105 | the Linked Version to produce a modified Combined Work, in the 106 | manner specified by section 6 of the GNU GPL for conveying 107 | Corresponding Source. 108 | 109 | 1) Use a suitable shared library mechanism for linking with the 110 | Library. A suitable mechanism is one that (a) uses at run time 111 | a copy of the Library already present on the user's computer 112 | system, and (b) will operate properly with a modified version 113 | of the Library that is interface-compatible with the Linked 114 | Version. 115 | 116 | e) Provide Installation Information, but only if you would otherwise 117 | be required to provide such information under section 6 of the 118 | GNU GPL, and only to the extent that such information is 119 | necessary to install and execute a modified version of the 120 | Combined Work produced by recombining or relinking the 121 | Application with a modified version of the Linked Version. (If 122 | you use option 4d0, the Installation Information must accompany 123 | the Minimal Corresponding Source and Corresponding Application 124 | Code. If you use option 4d1, you must provide the Installation 125 | Information in the manner specified by section 6 of the GNU GPL 126 | for conveying Corresponding Source.) 127 | 128 | 5. Combined Libraries. 
129 | 130 | You may place library facilities that are a work based on the 131 | Library side by side in a single library together with other library 132 | facilities that are not Applications and are not covered by this 133 | License, and convey such a combined library under terms of your 134 | choice, if you do both of the following: 135 | 136 | a) Accompany the combined library with a copy of the same work based 137 | on the Library, uncombined with any other library facilities, 138 | conveyed under the terms of this License. 139 | 140 | b) Give prominent notice with the combined library that part of it 141 | is a work based on the Library, and explaining where to find the 142 | accompanying uncombined form of the same work. 143 | 144 | 6. Revised Versions of the GNU Lesser General Public License. 145 | 146 | The Free Software Foundation may publish revised and/or new versions 147 | of the GNU Lesser General Public License from time to time. Such new 148 | versions will be similar in spirit to the present version, but may 149 | differ in detail to address new problems or concerns. 150 | 151 | Each version is given a distinguishing version number. If the 152 | Library as you received it specifies that a certain numbered version 153 | of the GNU Lesser General Public License "or any later version" 154 | applies to it, you have the option of following the terms and 155 | conditions either of that published version or of any later version 156 | published by the Free Software Foundation. If the Library as you 157 | received it does not specify a version number of the GNU Lesser 158 | General Public License, you may choose any version of the GNU Lesser 159 | General Public License ever published by the Free Software Foundation. 
160 | 161 | If the Library as you received it specifies that a proxy can decide 162 | whether future versions of the GNU Lesser General Public License shall 163 | apply, that proxy's public statement of acceptance of any version is 164 | permanent authorization for you to choose that version for the 165 | Library. 166 | -------------------------------------------------------------------------------- /grkf45/grkf45_1d.py: -------------------------------------------------------------------------------- 1 | import pycuda.autoinit 2 | import pycuda.driver as cuda 3 | import pycuda.compiler 4 | from pycuda.compiler import SourceModule 5 | 6 | def grkf45_1d_module(): 7 | source_module = SourceModule(""" 8 | 9 | #include 10 | #include 11 | #include 12 | # define MAXNFE 3000 13 | 14 | using namespace std; 15 | 16 | /* USER GIVEN DEVICE FUNCTION: parameter = a */ 17 | __device__ void r4_f0 ( float t, float y0, float *yp0, float a){ 18 | 19 | *yp0 = 1.0 + y0*y0 + a*sin(t); 20 | 21 | /* printf("(*_*)< t=%2.8f,y0= %2.8f,yp0= %2.8f \\n",t,y0,yp0); */ 22 | 23 | return; 24 | } 25 | 26 | 27 | 28 | 29 | /* DEVICE FUNCTION */ 30 | 31 | __device__ void r4_fehl (float y0, float t, float h, float yp0, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float a){ 32 | 33 | float ch; 34 | int i; 35 | float s0; 36 | 37 | ch = h / 4.0; 38 | 39 | /* for ( i = 0; i < neqn; i++ ){ */ 40 | *f5_0 = y0 + ch * yp0; 41 | /* } */ 42 | 43 | r4_f0 ( t + ch, *f5_0, f1_0, a); 44 | 45 | ch = 3.0 * h / 32.0; 46 | 47 | /* for ( i = 0; i < neqn; i++ ){ */ 48 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 ); 49 | /* } */ 50 | 51 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, f2_0, a); 52 | ch = h / 2197.0; 53 | 54 | /* for ( i = 0; i < neqn; i++ ){ */ 55 | *f5_0 = y0 + ch * 56 | ( 1932.0 * yp0 57 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 ) 58 | ); 59 | /* } */ 60 | 61 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0, f3_0, a); 62 | ch = h / 4104.0; 63 | 64 | /* for ( i = 0; i < neqn; i++ ){ */ 65 | *f5_0 = y0 + ch * 66 | ( 67 | 
( 8341.0 * yp0 - 845.0 * *f3_0 ) 68 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 ) 69 | ); 70 | /* } */ 71 | 72 | r4_f0 ( t + h, *f5_0, f4_0, a); 73 | ch = h / 20520.0; 74 | 75 | /* for ( i = 0; i < neqn; i++ ){ */ 76 | *f1_0 = y0 + ch * 77 | ( 78 | ( -6080.0 * yp0 79 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 ) 80 | ) 81 | + ( 41040.0 * *f1_0 - 28352.0 * *f2_0 ) 82 | ); 83 | /* } */ 84 | 85 | r4_f0 ( t + h / 2.0, *f1_0, f5_0, a); 86 | ch = h / 7618050.0; 87 | 88 | /* for ( i = 0; i < neqn; i++ ){ */ 89 | s0 = y0 + ch * 90 | ( 91 | ( 902880.0 * yp0 92 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) ) 93 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 ) 94 | ); 95 | /* } */ 96 | 97 | *f1_0 = s0; 98 | 99 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 100 | 101 | return; 102 | } 103 | 104 | 105 | /* GLOBAL FUNCTION */ 106 | __global__ void r4_rkf45_1d (int* flagM, float* aM, float *yM, float tM, float toutM, float relerr, float abserr){ 107 | 108 | float ae; 109 | float dt; 110 | float ee; 111 | float eeoet; 112 | const float eps = 1.19209290E-07; 113 | float esttol; 114 | float et; 115 | float f1_0; 116 | float f2_0; 117 | float f3_0; 118 | float f4_0; 119 | float f5_0; 120 | float h = -1.0; 121 | bool hfaild; 122 | float hmin; 123 | int i; 124 | int init = -1000; 125 | int k; 126 | int kop = -1; 127 | int nfe = -1; 128 | float s; 129 | float scale; 130 | float tol; 131 | float toln; 132 | float ypk; 133 | bool output; 134 | 135 | /* user defined parameters */ 136 | float a; 137 | int ib = blockIdx.x; 138 | 139 | a = aM[ib]; 140 | 141 | 142 | /* USE register */ 143 | float t; 144 | float y0; 145 | float yp0; 146 | float tout; 147 | 148 | t = tM; 149 | tout = toutM; 150 | y0 = yM[ib]; 151 | r4_f0 ( t, y0, &yp0, a); 152 | 153 | dt = tout - t; 154 | 155 | if ( init == 0 ){ 156 | 157 | init = 1; 158 | h = abs( dt ); 159 | toln = 0.0; 160 | 161 | /* for ( k = 0; k < neqn; k++ ){ */ 162 | tol = (relerr) * abs( y0 ) + abserr; 163 | 164 | if ( 0.0 
< tol ){ 165 | toln = tol; 166 | ypk = abs( yp0 ); 167 | if ( tol < ypk * pow ( h, 5 ) ) 168 | { 169 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 170 | }} 171 | 172 | /* } */ 173 | 174 | 175 | if ( toln <= 0.0 ){h = 0.0;} 176 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) ); 177 | } 178 | 179 | 180 | /* SIGN(positive/negative -> 1/-1) to signbit(positive/negative -> 0/1) in CUDA math API */ 181 | 182 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h ); 183 | 184 | if ( 2.0 * abs( dt ) <= abs( h ) ){ 185 | kop = kop + 1; 186 | } 187 | 188 | output = false; 189 | scale = 2.0 / (relerr); 190 | ae = scale * abserr; 191 | 192 | for ( ; ; ){ 193 | hfaild = false; 194 | hmin = 26.0 * eps * abs ( t ); 195 | dt = tout - t; 196 | 197 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){ 198 | }else{ 199 | if ( abs ( dt ) <= abs ( h ) ){ 200 | output = true; 201 | h = dt; 202 | }else{ 203 | h = 0.5 * dt; 204 | } 205 | } 206 | 207 | for ( ; ; ){ 208 | 209 | if ( MAXNFE < nfe ){ 210 | 211 | tM = t; 212 | yM[ib] = y0; 213 | flagM[ib]=4; 214 | 215 | printf("(*_*)< t=%2.8f \\n",t); 216 | printf("*WARNING! END MAXNFE < nfe condition! 
\\n"); 217 | 218 | return; 219 | } 220 | 221 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 222 | r4_fehl (y0, t, h, yp0, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, a); 223 | 224 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 225 | 226 | nfe = nfe + 5; 227 | eeoet = 0.0; 228 | 229 | 230 | /* for ( k = 0; k < neqn; k++ ){ */ 231 | et = abs ( y0 ) + abs ( f1_0 ) + ae; 232 | 233 | ee = abs ( ( -2090.0 * yp0 234 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 ) 235 | ) 236 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 ) 237 | ); 238 | 239 | eeoet = max ( eeoet, ee / et ); 240 | 241 | /* } */ 242 | 243 | esttol = abs ( h ) * eeoet * scale / 752400.0; 244 | 245 | if ( esttol <= 1.0 ) 246 | { 247 | break; 248 | } 249 | 250 | hfaild = true; 251 | output = false; 252 | 253 | /* printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */ 254 | 255 | if ( esttol < 59049.0 ){ 256 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 257 | }else{ 258 | s = 0.1; 259 | } 260 | 261 | h = s * h; 262 | } 263 | 264 | t = t + h; 265 | /* for ( i = 0; i < neqn; i++ ){ */ 266 | y0 = f1_0; 267 | /* } */ 268 | 269 | r4_f0 ( t, y0, &yp0, a); 270 | 271 | nfe = nfe + 1; 272 | if ( 0.0001889568 < esttol ) 273 | { 274 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 275 | } 276 | else 277 | { 278 | s = 5.0; 279 | } 280 | 281 | if ( hfaild ) 282 | { 283 | s = min ( s, 1.0 ); 284 | } 285 | 286 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin ); 287 | 288 | if (output){ 289 | 290 | tM = t; 291 | yM[ib] = y0; 292 | flagM[ib]=2; 293 | 294 | /* printf("Normal Exit N=%d\\n",nfe); */ 295 | 296 | return; 297 | } 298 | 299 | } 300 | 301 | 302 | return; 303 | 304 | } 305 | 306 | """,options=['-use_fast_math']) 307 | 308 | return source_module 309 | 310 | 311 | if __name__ == "__main__": 312 | import numpy as np 313 | import matplotlib.pyplot as plt 314 | 315 | print("*******************************************") 316 | 
print("GPU RKF45 1D ODE solver for the following example.") 317 | print("y'=1+y*y+a sin(t), y[0]=0 on t=[0,1.4] ") 318 | print("*******************************************") 319 | 320 | source_module=grkf45_1d_module() 321 | pkernel=source_module.get_function("r4_rkf45_1d") 322 | 323 | eps=1.19209290e-07 324 | relerr=np.sqrt(eps) 325 | abserr=np.sqrt(eps) 326 | print("abserr=",abserr) 327 | print("relerr=",relerr) 328 | 329 | 330 | nw=1 331 | nt=16 332 | nq=1 333 | nb = nw*nt*nq 334 | sharedsize=0 #byte 335 | 336 | #parameters 337 | a=np.linspace(-0.1,0.1,nb) 338 | a=a.astype(np.float32) 339 | dev_a = cuda.mem_alloc(a.nbytes) 340 | cuda.memcpy_htod(dev_a,a) 341 | 342 | y0=np.zeros(nb) 343 | y0=y0.astype(np.float32) 344 | dev_y0 = cuda.mem_alloc(y0.nbytes) 345 | cuda.memcpy_htod(dev_y0,y0) 346 | 347 | flag=np.zeros(nb) 348 | flag=flag.astype(np.int32) 349 | dev_flag = cuda.mem_alloc(flag.nbytes) 350 | cuda.memcpy_htod(dev_flag,flag) 351 | 352 | 353 | t=np.linspace(0.0,1.4,25) 354 | yarr=[] 355 | yarr.append(np.copy(y0)) 356 | for j,tnow in enumerate(t[:-1]): 357 | tin=tnow 358 | tout=t[j+1] 359 | pkernel(dev_flag, dev_a,dev_y0, np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize) 360 | 361 | cuda.memcpy_dtoh(y0, dev_y0) 362 | cuda.memcpy_dtoh(flag, dev_flag) 363 | yarr.append(np.copy(y0)) 364 | 365 | yarr=np.array(yarr) 366 | 367 | plt.plot(t,yarr,".",color="C0") 368 | plt.plot(t,yarr,alpha=0.3) 369 | plt.show() 370 | 371 | -------------------------------------------------------------------------------- /grkf45/grkf45_2d.py: -------------------------------------------------------------------------------- 1 | import pycuda.autoinit 2 | import pycuda.driver as cuda 3 | import pycuda.compiler 4 | from pycuda.compiler import SourceModule 5 | 6 | def grkf45_module(): 7 | source_module = SourceModule(""" 8 | 9 | #include 10 | #include 11 | #include 12 | # define MAXNFE 3000 13 | 14 | using 
namespace std; 15 | 16 | /* USER GIVEN DEVICE FUNCTION: parameter = a */ 17 | __device__ void r4_f0 ( float t, float y0, float y1, float *yp0, float a){ 18 | 19 | *yp0 = y1; 20 | 21 | return; 22 | } 23 | __device__ void r4_f1 ( float t, float y0, float y1, float *yp1, float a){ 24 | 25 | *yp1 = -y0 + a*(1.0 - y0*y0)*y1 ; 26 | 27 | return; 28 | } 29 | 30 | 31 | 32 | 33 | /* DEVICE FUNCTION */ 34 | 35 | __device__ void r4_fehl (float y0, float y1, float t, float h, float yp0, float yp1, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float *f1_1, float *f2_1, float *f3_1, float *f4_1, float *f5_1, float a){ 36 | 37 | float ch; 38 | int i; 39 | float s0; 40 | float s1; 41 | 42 | ch = h / 4.0; 43 | 44 | /* for ( i = 0; i < neqn; i++ ){ */ 45 | *f5_0 = y0 + ch * yp0; 46 | *f5_1 = y1 + ch * yp1; 47 | /* } */ 48 | 49 | r4_f0 ( t + ch, *f5_0, *f5_1, f1_0, a); 50 | r4_f1 ( t + ch, *f5_0, *f5_1, f1_1, a); 51 | 52 | ch = 3.0 * h / 32.0; 53 | 54 | /* for ( i = 0; i < neqn; i++ ){ */ 55 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 ); 56 | *f5_1 = y1 + ch * ( yp1 + 3.0 * *f1_1 ); 57 | /* } */ 58 | 59 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, f2_0, a); 60 | r4_f1 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, f2_1, a); 61 | 62 | ch = h / 2197.0; 63 | 64 | /* for ( i = 0; i < neqn; i++ ){ */ 65 | *f5_0 = y0 + ch * 66 | ( 1932.0 * yp0 67 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 ) 68 | ); 69 | *f5_1 = y1 + ch * 70 | ( 1932.0 * yp1 71 | + ( 7296.0 * *f2_1 - 7200.0 * *f1_1 ) 72 | ); 73 | 74 | /* } */ 75 | 76 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0,*f5_1, f3_0, a); 77 | r4_f1 ( t + 12.0 * h / 13.0, *f5_0,*f5_1, f3_1, a); 78 | 79 | ch = h / 4104.0; 80 | 81 | /* for ( i = 0; i < neqn; i++ ){ */ 82 | *f5_0 = y0 + ch * 83 | ( 84 | ( 8341.0 * yp0 - 845.0 * *f3_0 ) 85 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 ) 86 | ); 87 | *f5_1 = y1 + ch * 88 | ( 89 | ( 8341.0 * yp1 - 845.0 * *f3_1 ) 90 | + ( 29440.0 * *f2_1 - 32832.0 * *f1_1 ) 91 | ); 92 | /* } */ 93 | 94 | r4_f0 ( t + h, *f5_0,*f5_1, f4_0, a); 
95 | r4_f1 ( t + h, *f5_0,*f5_1, f4_1, a); 96 | 97 | ch = h / 20520.0; 98 | 99 | /* for ( i = 0; i < neqn; i++ ){ */ 100 | *f1_0 = y0 + ch * 101 | ( 102 | ( -6080.0 * yp0 103 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 ) 104 | ) 105 | + ( 41040.0 * *f1_0 - 28352.0 * *f2_0 ) 106 | ); 107 | *f1_1 = y1 + ch * 108 | ( 109 | ( -6080.0 * yp1 110 | + ( 9295.0 * *f3_1 - 5643.0 * *f4_1 ) 111 | ) 112 | + ( 41040.0 * *f1_1 - 28352.0 * *f2_1 ) 113 | ); 114 | /* } */ 115 | 116 | r4_f0 ( t + h / 2.0, *f1_0,*f1_1, f5_0, a); 117 | r4_f1 ( t + h / 2.0, *f1_0,*f1_1, f5_1, a); 118 | 119 | ch = h / 7618050.0; 120 | 121 | /* for ( i = 0; i < neqn; i++ ){ */ 122 | s0 = y0 + ch * 123 | ( 124 | ( 902880.0 * yp0 125 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) ) 126 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 ) 127 | ); 128 | s1 = y1 + ch * 129 | ( 130 | ( 902880.0 * yp1 131 | + ( 3855735.0 * *f3_1 - 1371249.0 * *f4_1 ) ) 132 | + ( 3953664.0 * *f2_1 + 277020.0 * *f5_1 ) 133 | ); 134 | /* } */ 135 | 136 | *f1_0 = s0; 137 | *f1_1 = s1; 138 | 139 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 140 | 141 | return; 142 | } 143 | 144 | 145 | /* GLOBAL FUNCTION */ 146 | __global__ void r4_rkf45 (int* flagM, float* aM, float *yM0, float *yM1, float tM, float toutM, float relerr, float abserr){ 147 | 148 | float ae; 149 | float dt; 150 | float ee; 151 | float eeoet; 152 | const float eps = 1.19209290E-07; 153 | float esttol; 154 | float et; 155 | float f1_0; 156 | float f2_0; 157 | float f3_0; 158 | float f4_0; 159 | float f5_0; 160 | float f1_1; 161 | float f2_1; 162 | float f3_1; 163 | float f4_1; 164 | float f5_1; 165 | float h = -1.0; 166 | bool hfaild; 167 | float hmin; 168 | int i; 169 | int init = -1000; 170 | int k; 171 | int kop = -1; 172 | int nfe = -1; 173 | float s; 174 | float scale; 175 | float tol; 176 | float toln; 177 | float ypk; 178 | bool output; 179 | 180 | /* user defined parameters */ 181 | float a; 182 | int ib = blockIdx.x; 183 | 184 | 
a = aM[ib]; 185 | 186 | 187 | /* USE register */ 188 | float t; 189 | float y0; 190 | float yp0; 191 | float y1; 192 | float yp1; 193 | float tout; 194 | 195 | t = tM; 196 | tout = toutM; 197 | y0 = yM0[ib]; 198 | y1 = yM1[ib]; 199 | r4_f0 ( t, y0, y1, &yp0, a); 200 | r4_f1 ( t, y0, y1, &yp1, a); 201 | 202 | 203 | dt = tout - t; 204 | 205 | if ( init == 0 ){ 206 | 207 | init = 1; 208 | h = abs( dt ); 209 | toln = 0.0; 210 | 211 | /* for ( k = 0; k < neqn; k++ ){ */ 212 | tol = (relerr) * abs( y0 ) + abserr; 213 | if ( 0.0 < tol ){ 214 | toln = tol; 215 | ypk = abs( yp0 ); 216 | if ( tol < ypk * pow ( h, 5 ) ) 217 | { 218 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 219 | }} 220 | 221 | tol = (relerr) * abs( y1 ) + abserr; 222 | if ( 0.0 < tol ){ 223 | toln = tol; 224 | ypk = abs( yp1 ); 225 | if ( tol < ypk * pow ( h, 5 ) ) 226 | { 227 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 228 | }} 229 | /* } */ 230 | 231 | 232 | if ( toln <= 0.0 ){h = 0.0;} 233 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) ); 234 | } 235 | 236 | 237 | /* SIGN(positive/negative -> 1/-1) to signbit(positive/negative -> 0/1) in CUDA math API */ 238 | 239 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h ); 240 | 241 | if ( 2.0 * abs( dt ) <= abs( h ) ){ 242 | kop = kop + 1; 243 | } 244 | 245 | output = false; 246 | scale = 2.0 / (relerr); 247 | ae = scale * abserr; 248 | 249 | for ( ; ; ){ 250 | hfaild = false; 251 | hmin = 26.0 * eps * abs ( t ); 252 | dt = tout - t; 253 | 254 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){ 255 | }else{ 256 | if ( abs ( dt ) <= abs ( h ) ){ 257 | output = true; 258 | h = dt; 259 | }else{ 260 | h = 0.5 * dt; 261 | } 262 | } 263 | 264 | for ( ; ; ){ 265 | 266 | if ( MAXNFE < nfe ){ 267 | 268 | tM = t; 269 | yM0[ib] = y0; 270 | yM1[ib] = y1; 271 | 272 | 273 | flagM[ib]=4; 274 | 275 | printf("(*_*)< t=%2.8f \\n",t); 276 | printf("*WARNING! END MAXNFE < nfe condition! 
\\n"); 277 | 278 | return; 279 | } 280 | 281 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 282 | r4_fehl (y0, y1, t, h, yp0, yp1, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, &f1_1, &f2_1, &f3_1, &f4_1, &f5_1, a); 283 | 284 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 285 | 286 | nfe = nfe + 5; 287 | eeoet = 0.0; 288 | 289 | /* for ( k = 0; k < neqn; k++ ){ */ 290 | et = abs ( y0 ) + abs ( f1_0 ) + ae; 291 | ee = abs ( ( -2090.0 * yp0 292 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 ) 293 | ) 294 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 ) 295 | ); 296 | eeoet = max ( eeoet, ee / et ); 297 | 298 | et = abs ( y1 ) + abs ( f1_1 ) + ae; 299 | ee = abs ( ( -2090.0 * yp1 300 | + ( 21970.0 * f3_1 - 15048.0 * f4_1 ) 301 | ) 302 | + ( 22528.0 * f2_1 - 27360.0 * f5_1 ) 303 | ); 304 | eeoet = max ( eeoet, ee / et ); 305 | /* } */ 306 | 307 | esttol = abs ( h ) * eeoet * scale / 752400.0; 308 | 309 | if ( esttol <= 1.0 ) 310 | { 311 | break; 312 | } 313 | 314 | hfaild = true; 315 | output = false; 316 | 317 | /* printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */ 318 | 319 | if ( esttol < 59049.0 ){ 320 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 321 | }else{ 322 | s = 0.1; 323 | } 324 | 325 | h = s * h; 326 | } 327 | 328 | t = t + h; 329 | /* for ( i = 0; i < neqn; i++ ){ */ 330 | y0 = f1_0; 331 | y1 = f1_1; 332 | /* } */ 333 | 334 | r4_f0 ( t, y0, y1, &yp0, a); 335 | r4_f1 ( t, y0, y1, &yp1, a); 336 | 337 | nfe = nfe + 1; 338 | if ( 0.0001889568 < esttol ) 339 | { 340 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 341 | } 342 | else 343 | { 344 | s = 5.0; 345 | } 346 | 347 | if ( hfaild ) 348 | { 349 | s = min ( s, 1.0 ); 350 | } 351 | 352 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin ); 353 | 354 | if (output){ 355 | 356 | tM = t; 357 | yM0[ib] = y0; 358 | yM1[ib] = y1; 359 | 360 | flagM[ib]=2; 361 | 362 | /* printf("Normal Exit N=%d\\n",nfe); */ 363 | 364 | 
return; 365 | } 366 | 367 | } 368 | 369 | 370 | return; 371 | 372 | } 373 | 374 | """,options=['-use_fast_math']) 375 | 376 | return source_module 377 | 378 | 379 | if __name__ == "__main__": 380 | import numpy as np 381 | import matplotlib.pyplot as plt 382 | 383 | print("*******************************************") 384 | print("GPU RKF45 2D ODE solver for the Van der Pol oscillator. ") 385 | print("y0' = y1 on t=[0,20] ") 386 | print("y1' = -y0 + a (1 - y0*y0)*y1 on t=[0,20] ") 387 | print("*******************************************") 388 | 389 | source_module=grkf45_module() 390 | pkernel=source_module.get_function("r4_rkf45") 391 | 392 | eps=1.19209290e-07 393 | relerr=np.sqrt(eps) 394 | abserr=np.sqrt(eps) 395 | print("abserr=",abserr) 396 | print("relerr=",relerr) 397 | 398 | nw=1 399 | nt=16 400 | nq=1 401 | nb = nw*nt*nq 402 | sharedsize=0 #byte 403 | 404 | #parameters 405 | a=np.linspace(0.1,4.0,nb) 406 | a=a.astype(np.float32) 407 | dev_a = cuda.mem_alloc(a.nbytes) 408 | cuda.memcpy_htod(dev_a,a) 409 | 410 | y0=np.ones(nb)*-4.0 411 | y0=y0.astype(np.float32) 412 | dev_y0 = cuda.mem_alloc(y0.nbytes) 413 | cuda.memcpy_htod(dev_y0,y0) 414 | 415 | y1=np.zeros(nb) 416 | y1=y1.astype(np.float32) 417 | dev_y1 = cuda.mem_alloc(y1.nbytes) 418 | cuda.memcpy_htod(dev_y1,y1) 419 | 420 | flag=np.zeros(nb) 421 | flag=flag.astype(np.int32) 422 | dev_flag = cuda.mem_alloc(flag.nbytes) 423 | cuda.memcpy_htod(dev_flag,flag) 424 | 425 | t=np.linspace(0.0,20.0,1000) 426 | yarr0=[] 427 | yarr0.append(np.copy(y0)) 428 | yarr1=[] 429 | yarr1.append(np.copy(y1)) 430 | 431 | for j,tnow in enumerate(t[:-1]): 432 | tin=tnow 433 | tout=t[j+1] 434 | pkernel(dev_flag, dev_a,dev_y0,dev_y1,np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize) 435 | 436 | cuda.memcpy_dtoh(y0, dev_y0) 437 | cuda.memcpy_dtoh(y1, dev_y1) 438 | cuda.memcpy_dtoh(flag, dev_flag) 439 | yarr0.append(np.copy(y0)) 440 | 
yarr1.append(np.copy(y1)) 441 | 442 | yarr0=np.array(yarr0) 443 | yarr1=np.array(yarr1) 444 | 445 | 446 | fig = plt.figure() 447 | ax = fig.add_subplot(111,aspect=1.0) 448 | plt.plot(yarr0,yarr1,alpha=0.5) 449 | plt.savefig("vanderpol.png") 450 | plt.show() 451 | -------------------------------------------------------------------------------- /grkf45/grkf45_3d.py: -------------------------------------------------------------------------------- 1 | import pycuda.autoinit 2 | import pycuda.driver as cuda 3 | import pycuda.compiler 4 | from pycuda.compiler import SourceModule 5 | 6 | def grkf45_module(): 7 | source_module = SourceModule(""" 8 | 9 | #include 10 | #include 11 | #include 12 | # define MAXNFE 10000 13 | 14 | using namespace std; 15 | 16 | /* USER GIVEN DEVICE FUNCTION: parameters = p, r, b */ 17 | __device__ void r4_f0 ( float t, float y0, float y1, float y2, float *yp0, float p, float r, float b){ 18 | 19 | *yp0 = -p*y0 + p*y1; 20 | 21 | return; 22 | } 23 | __device__ void r4_f1 ( float t, float y0, float y1, float y2, float *yp1, float p, float r, float b){ 24 | 25 | *yp1 = -y0*y2 + r*y0 - y1; 26 | 27 | return; 28 | } 29 | __device__ void r4_f2 ( float t, float y0, float y1, float y2, float *yp2, float p, float r, float b){ 30 | 31 | *yp2 = y0*y1 - b*y2 ; 32 | 33 | return; 34 | } 35 | 36 | 37 | 38 | 39 | /* DEVICE FUNCTION */ 40 | 41 | __device__ void r4_fehl (float y0, float y1, float y2, float t, float h, float yp0, float yp1, float yp2, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float *f1_1, float *f2_1, float *f3_1, float *f4_1, float *f5_1,float *f1_2, float *f2_2, float *f3_2, float *f4_2, float *f5_2, float p, float r, float b){ 42 | 43 | float ch; 44 | int i; 45 | float s0; 46 | float s1; 47 | float s2; 48 | 49 | ch = h / 4.0; 50 | 51 | /* for ( i = 0; i < neqn; i++ ){ */ 52 | *f5_0 = y0 + ch * yp0; 53 | *f5_1 = y1 + ch * yp1; 54 | *f5_2 = y2 + ch * yp2; 55 | 56 | /* } */ 57 | 58 | r4_f0 ( t + ch, *f5_0, *f5_1, *f5_2, f1_0, p, r, 
b); 59 | r4_f1 ( t + ch, *f5_0, *f5_1, *f5_2, f1_1, p, r, b); 60 | r4_f2 ( t + ch, *f5_0, *f5_1, *f5_2, f1_2, p, r, b); 61 | 62 | ch = 3.0 * h / 32.0; 63 | 64 | /* for ( i = 0; i < neqn; i++ ){ */ 65 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 ); 66 | *f5_1 = y1 + ch * ( yp1 + 3.0 * *f1_1 ); 67 | *f5_2 = y2 + ch * ( yp2 + 3.0 * *f1_2 ); 68 | /* } */ 69 | 70 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_0, p, r, b); 71 | r4_f1 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_1, p, r, b); 72 | r4_f2 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_2, p, r, b); 73 | 74 | ch = h / 2197.0; 75 | 76 | /* for ( i = 0; i < neqn; i++ ){ */ 77 | *f5_0 = y0 + ch * 78 | ( 1932.0 * yp0 79 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 ) 80 | ); 81 | *f5_1 = y1 + ch * 82 | ( 1932.0 * yp1 83 | + ( 7296.0 * *f2_1 - 7200.0 * *f1_1 ) 84 | ); 85 | *f5_2 = y2 + ch * 86 | ( 1932.0 * yp2 87 | + ( 7296.0 * *f2_2 - 7200.0 * *f1_2 ) 88 | ); 89 | /* } */ 90 | 91 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_0, p, r, b); 92 | r4_f1 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_1, p, r, b); 93 | r4_f2 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_2, p, r, b); 94 | 95 | ch = h / 4104.0; 96 | 97 | /* for ( i = 0; i < neqn; i++ ){ */ 98 | *f5_0 = y0 + ch * 99 | ( 100 | ( 8341.0 * yp0 - 845.0 * *f3_0 ) 101 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 ) 102 | ); 103 | *f5_1 = y1 + ch * 104 | ( 105 | ( 8341.0 * yp1 - 845.0 * *f3_1 ) 106 | + ( 29440.0 * *f2_1 - 32832.0 * *f1_1 ) 107 | ); 108 | *f5_2 = y2 + ch * 109 | ( 110 | ( 8341.0 * yp2 - 845.0 * *f3_2 ) 111 | + ( 29440.0 * *f2_2 - 32832.0 * *f1_2 ) 112 | ); 113 | /* } */ 114 | 115 | r4_f0 ( t + h, *f5_0,*f5_1,*f5_2, f4_0, p, r, b); 116 | r4_f1 ( t + h, *f5_0,*f5_1,*f5_2, f4_1, p, r, b); 117 | r4_f2 ( t + h, *f5_0,*f5_1,*f5_2, f4_2, p, r, b); 118 | 119 | ch = h / 20520.0; 120 | 121 | /* for ( i = 0; i < neqn; i++ ){ */ 122 | *f1_0 = y0 + ch * 123 | ( 124 | ( -6080.0 * yp0 125 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 ) 126 | ) 127 | + ( 41040.0 * *f1_0 - 28352.0 * 
*f2_0 ) 128 | ); 129 | *f1_1 = y1 + ch * 130 | ( 131 | ( -6080.0 * yp1 132 | + ( 9295.0 * *f3_1 - 5643.0 * *f4_1 ) 133 | ) 134 | + ( 41040.0 * *f1_1 - 28352.0 * *f2_1 ) 135 | ); 136 | *f1_2 = y2 + ch * 137 | ( 138 | ( -6080.0 * yp2 139 | + ( 9295.0 * *f3_2 - 5643.0 * *f4_2 ) 140 | ) 141 | + ( 41040.0 * *f1_2 - 28352.0 * *f2_2 ) 142 | ); 143 | /* } */ 144 | 145 | r4_f0 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_0, p,r,b); 146 | r4_f1 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_1, p,r,b); 147 | r4_f2 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_2, p,r,b); 148 | 149 | ch = h / 7618050.0; 150 | 151 | /* for ( i = 0; i < neqn; i++ ){ */ 152 | s0 = y0 + ch * 153 | ( 154 | ( 902880.0 * yp0 155 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) ) 156 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 ) 157 | ); 158 | s1 = y1 + ch * 159 | ( 160 | ( 902880.0 * yp1 161 | + ( 3855735.0 * *f3_1 - 1371249.0 * *f4_1 ) ) 162 | + ( 3953664.0 * *f2_1 + 277020.0 * *f5_1 ) 163 | ); 164 | s2 = y2 + ch * 165 | ( 166 | ( 902880.0 * yp2 167 | + ( 3855735.0 * *f3_2 - 1371249.0 * *f4_2 ) ) 168 | + ( 3953664.0 * *f2_2 + 277020.0 * *f5_2 ) 169 | ); 170 | /* } */ 171 | 172 | *f1_0 = s0; 173 | *f1_1 = s1; 174 | *f1_2 = s2; 175 | 176 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 177 | 178 | return; 179 | } 180 | 181 | 182 | /* GLOBAL FUNCTION */ 183 | __global__ void r4_rkf45 (int* flagM, float* pM,float* rM,float* bM, float *yM0, float *yM1, float *yM2, float tM, float toutM, float relerr, float abserr){ 184 | 185 | float ae; 186 | float dt; 187 | float ee; 188 | float eeoet; 189 | const float eps = 1.19209290E-07; 190 | float esttol; 191 | float et; 192 | float f1_0; 193 | float f2_0; 194 | float f3_0; 195 | float f4_0; 196 | float f5_0; 197 | float f1_1; 198 | float f2_1; 199 | float f3_1; 200 | float f4_1; 201 | float f5_1; 202 | float f1_2; 203 | float f2_2; 204 | float f3_2; 205 | float f4_2; 206 | float f5_2; 207 | float h = -1.0; 208 | bool hfaild; 209 | float hmin; 210 | int 
i; 211 | int init = -1000; 212 | int k; 213 | int kop = -1; 214 | int nfe = -1; 215 | float s; 216 | float scale; 217 | float tol; 218 | float toln; 219 | float ypk; 220 | bool output; 221 | 222 | /* user defined parameters */ 223 | float p; 224 | float r; 225 | float b; 226 | 227 | int ib = blockIdx.x; 228 | 229 | p = pM[ib]; 230 | r = rM[ib]; 231 | b = bM[ib]; 232 | 233 | 234 | /* USE register */ 235 | float t; 236 | float y0; 237 | float yp0; 238 | float y1; 239 | float yp1; 240 | float y2; 241 | float yp2; 242 | float tout; 243 | 244 | t = tM; 245 | tout = toutM; 246 | y0 = yM0[ib]; 247 | y1 = yM1[ib]; 248 | y2 = yM2[ib]; 249 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b); 250 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b); 251 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b); 252 | 253 | dt = tout - t; 254 | 255 | if ( init == 0 ){ 256 | 257 | init = 1; 258 | h = abs( dt ); 259 | toln = 0.0; 260 | 261 | /* for ( k = 0; k < neqn; k++ ){ */ 262 | tol = (relerr) * abs( y0 ) + abserr; 263 | if ( 0.0 < tol ){ 264 | toln = tol; 265 | ypk = abs( yp0 ); 266 | if ( tol < ypk * pow ( h, 5 ) ) 267 | { 268 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 269 | }} 270 | 271 | tol = (relerr) * abs( y1 ) + abserr; 272 | if ( 0.0 < tol ){ 273 | toln = tol; 274 | ypk = abs( yp1 ); 275 | if ( tol < ypk * pow ( h, 5 ) ) 276 | { 277 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 278 | }} 279 | 280 | tol = (relerr) * abs( y2 ) + abserr; 281 | if ( 0.0 < tol ){ 282 | toln = tol; 283 | ypk = abs( yp2 ); 284 | if ( tol < ypk * pow ( h, 5 ) ) 285 | { 286 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 287 | }} 288 | /* } */ 289 | 290 | 291 | if ( toln <= 0.0 ){h = 0.0;} 292 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) ); 293 | } 294 | 295 | 296 | /* SIGN(positive/negative -> 1/-1) to signbit(positive/negative -> 0/1) in CUDA math API */ 297 | 298 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h ); 299 | 300 | if ( 2.0 * abs( dt ) <= abs( h ) ){ 301 | kop = kop + 1; 302 | } 303 | 304 
| output = false; 305 | scale = 2.0 / (relerr); 306 | ae = scale * abserr; 307 | 308 | for ( ; ; ){ 309 | hfaild = false; 310 | hmin = 26.0 * eps * abs ( t ); 311 | dt = tout - t; 312 | 313 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){ 314 | }else{ 315 | if ( abs ( dt ) <= abs ( h ) ){ 316 | output = true; 317 | h = dt; 318 | }else{ 319 | h = 0.5 * dt; 320 | } 321 | } 322 | 323 | for ( ; ; ){ 324 | 325 | if ( MAXNFE < nfe ){ 326 | 327 | tM = t; 328 | yM0[ib] = y0; 329 | yM1[ib] = y1; 330 | yM2[ib] = y2; 331 | 332 | 333 | flagM[ib]=4; 334 | 335 | printf("(*_*)< t=%2.8f \\n",t); 336 | printf("*WARNING! END MAXNFE < nfe condition! \\n"); 337 | 338 | return; 339 | } 340 | 341 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 342 | r4_fehl (y0, y1, y2, t, h, yp0, yp1, yp2, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, &f1_1, &f2_1, &f3_1, &f4_1, &f5_1, &f1_2, &f2_2, &f3_2, &f4_2, &f5_2, p, r, b); 343 | 344 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 345 | 346 | nfe = nfe + 5; 347 | eeoet = 0.0; 348 | 349 | /* for ( k = 0; k < neqn; k++ ){ */ 350 | et = abs ( y0 ) + abs ( f1_0 ) + ae; 351 | ee = abs ( ( -2090.0 * yp0 352 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 ) 353 | ) 354 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 ) 355 | ); 356 | eeoet = max ( eeoet, ee / et ); 357 | 358 | et = abs ( y1 ) + abs ( f1_1 ) + ae; 359 | ee = abs ( ( -2090.0 * yp1 360 | + ( 21970.0 * f3_1 - 15048.0 * f4_1 ) 361 | ) 362 | + ( 22528.0 * f2_1 - 27360.0 * f5_1 ) 363 | ); 364 | eeoet = max ( eeoet, ee / et ); 365 | 366 | et = abs ( y2 ) + abs ( f1_2 ) + ae; 367 | ee = abs ( ( -2090.0 * yp2 368 | + ( 21970.0 * f3_2 - 15048.0 * f4_2 ) 369 | ) 370 | + ( 22528.0 * f2_2 - 27360.0 * f5_2 ) 371 | ); 372 | eeoet = max ( eeoet, ee / et ); 373 | /* } */ 374 | 375 | esttol = abs ( h ) * eeoet * scale / 752400.0; 376 | 377 | if ( esttol <= 1.0 ) 378 | { 379 | break; 380 | } 381 | 382 | hfaild = true; 383 | output = false; 384 | 385 | /* 
printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */ 386 | 387 | if ( esttol < 59049.0 ){ 388 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 389 | }else{ 390 | s = 0.1; 391 | } 392 | 393 | h = s * h; 394 | } 395 | 396 | t = t + h; 397 | /* for ( i = 0; i < neqn; i++ ){ */ 398 | y0 = f1_0; 399 | y1 = f1_1; 400 | y2 = f1_2; 401 | /* } */ 402 | 403 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b); 404 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b); 405 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b); 406 | 407 | nfe = nfe + 1; 408 | if ( 0.0001889568 < esttol ) 409 | { 410 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 411 | } 412 | else 413 | { 414 | s = 5.0; 415 | } 416 | 417 | if ( hfaild ) 418 | { 419 | s = min ( s, 1.0 ); 420 | } 421 | 422 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin ); 423 | 424 | if (output){ 425 | 426 | tM = t; 427 | yM0[ib] = y0; 428 | yM1[ib] = y1; 429 | yM2[ib] = y2; 430 | 431 | flagM[ib]=2; 432 | 433 | /* printf("Normal Exit N=%d\\n",nfe); */ 434 | 435 | return; 436 | } 437 | 438 | } 439 | 440 | 441 | return; 442 | 443 | } 444 | 445 | """,options=['-use_fast_math']) 446 | 447 | return source_module 448 | 449 | 450 | if __name__ == "__main__": 451 | import numpy as np 452 | import matplotlib.pyplot as plt 453 | 454 | print("*******************************************") 455 | print("GPU RKF45 solver for the following example. 
3D") 456 | print("Lorenz Attractor") 457 | print("*******************************************") 458 | 459 | source_module=grkf45_module() 460 | pkernel=source_module.get_function("r4_rkf45") 461 | 462 | eps=1.19209290e-07 463 | relerr=np.sqrt(eps) 464 | abserr=np.sqrt(eps) 465 | print("abserr=",abserr) 466 | print("relerr=",relerr) 467 | 468 | 469 | nw=1 470 | nt=64 471 | nq=1 472 | nb = nw*nt*nq 473 | sharedsize=0 #byte 474 | 475 | #parameters 476 | p=np.ones(nb)*np.array(10.0) 477 | p=p.astype(np.float32) 478 | dev_p = cuda.mem_alloc(p.nbytes) 479 | cuda.memcpy_htod(dev_p,p) 480 | 481 | r=np.ones(nb)*np.array(28.0) 482 | r=r.astype(np.float32) 483 | dev_r = cuda.mem_alloc(r.nbytes) 484 | cuda.memcpy_htod(dev_r,r) 485 | 486 | b=np.ones(nb)*np.array(8.0/3.0) 487 | b=b.astype(np.float32) 488 | dev_b = cuda.mem_alloc(b.nbytes) 489 | cuda.memcpy_htod(dev_b,b) 490 | 491 | 492 | y0=np.linspace(0,30.0,nb) 493 | y0=y0.astype(np.float32) 494 | dev_y0 = cuda.mem_alloc(y0.nbytes) 495 | cuda.memcpy_htod(dev_y0,y0) 496 | 497 | y1=np.ones(nb)*-7 498 | y1=y1.astype(np.float32) 499 | dev_y1 = cuda.mem_alloc(y1.nbytes) 500 | cuda.memcpy_htod(dev_y1,y1) 501 | 502 | y2=np.ones(nb)*-5 503 | y2=y2.astype(np.float32) 504 | dev_y2 = cuda.mem_alloc(y2.nbytes) 505 | cuda.memcpy_htod(dev_y2,y2) 506 | 507 | flag=np.zeros(nb) 508 | flag=flag.astype(np.int32) 509 | dev_flag = cuda.mem_alloc(flag.nbytes) 510 | cuda.memcpy_htod(dev_flag,flag) 511 | 512 | t=np.linspace(0.0,100.0,10000) 513 | yarr0=[] 514 | yarr0.append(np.copy(y0)) 515 | yarr1=[] 516 | yarr1.append(np.copy(y1)) 517 | yarr2=[] 518 | yarr2.append(np.copy(y2)) 519 | 520 | for j,tnow in enumerate(t[:-1]): 521 | tin=tnow 522 | tout=t[j+1] 523 | pkernel(dev_flag, dev_p,dev_r,dev_b,dev_y0,dev_y1,dev_y2,np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize) 524 | 525 | cuda.memcpy_dtoh(y0, dev_y0) 526 | cuda.memcpy_dtoh(y1, dev_y1) 527 | cuda.memcpy_dtoh(y2, 
dev_y2) 528 | 529 | cuda.memcpy_dtoh(flag, dev_flag) 530 | #print("T=",tnow,"RKF45: y=",y0) 531 | #print("FLAG=",flag) 532 | yarr0.append(np.copy(y0)) 533 | yarr1.append(np.copy(y1)) 534 | yarr2.append(np.copy(y2)) 535 | 536 | 537 | yarr0=np.array(yarr0) 538 | yarr1=np.array(yarr1) 539 | yarr2=np.array(yarr2) 540 | 541 | print("END") 542 | colors=plt.cm.Spectral(np.linspace(0,1,30)) 543 | cm = plt.get_cmap('magma') 544 | fig = plt.figure() 545 | ax = fig.add_subplot(121,aspect=1.0) 546 | ax.set_prop_cycle('color',colors) 547 | ax.plot(yarr0[:,:],yarr1[:,:],alpha=0.5) 548 | plt.ylim(-40,40) 549 | plt.xlim(-30,40) 550 | 551 | ax = fig.add_subplot(122,aspect=1.0) 552 | ax.set_prop_cycle('color',colors) 553 | ax.plot(yarr2[:,:],yarr1[:,:],alpha=0.5) 554 | plt.ylim(-40,40) 555 | plt.xlim(-10,60) 556 | plt.savefig("Lorentz.png") 557 | plt.show() 558 | -------------------------------------------------------------------------------- /grkf45/grkf45.py: -------------------------------------------------------------------------------- 1 | import pycuda.autoinit 2 | import pycuda.driver as cuda 3 | import pycuda.compiler 4 | from pycuda.compiler import SourceModule 5 | 6 | def grkf45_module(): 7 | source_module = SourceModule(""" 8 | 9 | #include 10 | #include 11 | #include 12 | # define MAXNFE 10000 13 | 14 | using namespace std; 15 | 16 | /* USER GIVEN DEVICE FUNCTION: parameter = a */ 17 | __device__ void r4_f0 ( float t, float y0, float y1, float y2, float *yp0, float p, float r, float b){ 18 | 19 | *yp0 = -p*y0 + p*y1; 20 | 21 | return; 22 | } 23 | __device__ void r4_f1 ( float t, float y0, float y1, float y2, float *yp1, float p, float r, float b){ 24 | 25 | *yp1 = -y0*y2 + r*y0 - y1; 26 | 27 | return; 28 | } 29 | __device__ void r4_f2 ( float t, float y0, float y1, float y2, float *yp2, float p, float r, float b){ 30 | 31 | *yp2 = y0*y1 - b*y2 ; 32 | 33 | return; 34 | } 35 | 36 | 37 | 38 | 39 | /* DEVICE FUNCTION */ 40 | 41 | __device__ void r4_fehl (float y0, float y1, 
float y2, float t, float h, float yp0, float yp1, float yp2, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float *f1_1, float *f2_1, float *f3_1, float *f4_1, float *f5_1,float *f1_2, float *f2_2, float *f3_2, float *f4_2, float *f5_2, float p, float r, float b){ 42 | 43 | float ch; 44 | int i; 45 | float s0; 46 | float s1; 47 | float s2; 48 | 49 | ch = h / 4.0; 50 | 51 | /* for ( i = 0; i < neqn; i++ ){ */ 52 | *f5_0 = y0 + ch * yp0; 53 | *f5_1 = y1 + ch * yp1; 54 | *f5_2 = y2 + ch * yp2; 55 | 56 | /* } */ 57 | 58 | r4_f0 ( t + ch, *f5_0, *f5_1, *f5_2, f1_0, p, r, b); 59 | r4_f1 ( t + ch, *f5_0, *f5_1, *f5_2, f1_1, p, r, b); 60 | r4_f2 ( t + ch, *f5_0, *f5_1, *f5_2, f1_2, p, r, b); 61 | 62 | ch = 3.0 * h / 32.0; 63 | 64 | /* for ( i = 0; i < neqn; i++ ){ */ 65 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 ); 66 | *f5_1 = y1 + ch * ( yp1 + 3.0 * *f1_1 ); 67 | *f5_2 = y2 + ch * ( yp2 + 3.0 * *f1_2 ); 68 | /* } */ 69 | 70 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_0, p, r, b); 71 | r4_f1 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_1, p, r, b); 72 | r4_f2 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_2, p, r, b); 73 | 74 | ch = h / 2197.0; 75 | 76 | /* for ( i = 0; i < neqn; i++ ){ */ 77 | *f5_0 = y0 + ch * 78 | ( 1932.0 * yp0 79 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 ) 80 | ); 81 | *f5_1 = y1 + ch * 82 | ( 1932.0 * yp1 83 | + ( 7296.0 * *f2_1 - 7200.0 * *f1_1 ) 84 | ); 85 | *f5_2 = y2 + ch * 86 | ( 1932.0 * yp2 87 | + ( 7296.0 * *f2_2 - 7200.0 * *f1_2 ) 88 | ); 89 | /* } */ 90 | 91 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_0, p, r, b); 92 | r4_f1 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_1, p, r, b); 93 | r4_f2 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_2, p, r, b); 94 | 95 | ch = h / 4104.0; 96 | 97 | /* for ( i = 0; i < neqn; i++ ){ */ 98 | *f5_0 = y0 + ch * 99 | ( 100 | ( 8341.0 * yp0 - 845.0 * *f3_0 ) 101 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 ) 102 | ); 103 | *f5_1 = y1 + ch * 104 | ( 105 | ( 8341.0 * yp1 - 845.0 * *f3_1 ) 
106 | + ( 29440.0 * *f2_1 - 32832.0 * *f1_1 ) 107 | ); 108 | *f5_2 = y2 + ch * 109 | ( 110 | ( 8341.0 * yp2 - 845.0 * *f3_2 ) 111 | + ( 29440.0 * *f2_2 - 32832.0 * *f1_2 ) 112 | ); 113 | /* } */ 114 | 115 | r4_f0 ( t + h, *f5_0,*f5_1,*f5_2, f4_0, p, r, b); 116 | r4_f1 ( t + h, *f5_0,*f5_1,*f5_2, f4_1, p, r, b); 117 | r4_f2 ( t + h, *f5_0,*f5_1,*f5_2, f4_2, p, r, b); 118 | 119 | ch = h / 20520.0; 120 | 121 | /* for ( i = 0; i < neqn; i++ ){ */ 122 | *f1_0 = y0 + ch * 123 | ( 124 | ( -6080.0 * yp0 125 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 ) 126 | ) 127 | + ( 41040.0 * *f1_0 - 28352.0 * *f2_0 ) 128 | ); 129 | *f1_1 = y1 + ch * 130 | ( 131 | ( -6080.0 * yp1 132 | + ( 9295.0 * *f3_1 - 5643.0 * *f4_1 ) 133 | ) 134 | + ( 41040.0 * *f1_1 - 28352.0 * *f2_1 ) 135 | ); 136 | *f1_2 = y2 + ch * 137 | ( 138 | ( -6080.0 * yp2 139 | + ( 9295.0 * *f3_2 - 5643.0 * *f4_2 ) 140 | ) 141 | + ( 41040.0 * *f1_2 - 28352.0 * *f2_2 ) 142 | ); 143 | /* } */ 144 | 145 | r4_f0 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_0, p,r,b); 146 | r4_f1 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_1, p,r,b); 147 | r4_f2 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_2, p,r,b); 148 | 149 | ch = h / 7618050.0; 150 | 151 | /* for ( i = 0; i < neqn; i++ ){ */ 152 | s0 = y0 + ch * 153 | ( 154 | ( 902880.0 * yp0 155 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) ) 156 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 ) 157 | ); 158 | s1 = y1 + ch * 159 | ( 160 | ( 902880.0 * yp1 161 | + ( 3855735.0 * *f3_1 - 1371249.0 * *f4_1 ) ) 162 | + ( 3953664.0 * *f2_1 + 277020.0 * *f5_1 ) 163 | ); 164 | s2 = y2 + ch * 165 | ( 166 | ( 902880.0 * yp2 167 | + ( 3855735.0 * *f3_2 - 1371249.0 * *f4_2 ) ) 168 | + ( 3953664.0 * *f2_2 + 277020.0 * *f5_2 ) 169 | ); 170 | /* } */ 171 | 172 | *f1_0 = s0; 173 | *f1_1 = s1; 174 | *f1_2 = s2; 175 | 176 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 177 | 178 | return; 179 | } 180 | 181 | 182 | /* GLOBAL FUNCTION */ 183 | __global__ void r4_rkf45 (int* flagM, float* 
pM,float* rM,float* bM, float *yM0, float *yM1, float *yM2, float tM, float toutM, float relerr, float abserr){ 184 | 185 | float ae; 186 | float dt; 187 | float ee; 188 | float eeoet; 189 | const float eps = 1.19209290E-07; 190 | float esttol; 191 | float et; 192 | float f1_0; 193 | float f2_0; 194 | float f3_0; 195 | float f4_0; 196 | float f5_0; 197 | float f1_1; 198 | float f2_1; 199 | float f3_1; 200 | float f4_1; 201 | float f5_1; 202 | float f1_2; 203 | float f2_2; 204 | float f3_2; 205 | float f4_2; 206 | float f5_2; 207 | float h = -1.0; 208 | bool hfaild; 209 | float hmin; 210 | int i; 211 | int init = -1000; 212 | int k; 213 | int kop = -1; 214 | int nfe = -1; 215 | float s; 216 | float scale; 217 | float tol; 218 | float toln; 219 | float ypk; 220 | bool output; 221 | 222 | /* user defined parameters */ 223 | float p; 224 | float r; 225 | float b; 226 | 227 | int ib = blockIdx.x; 228 | 229 | p = pM[ib]; 230 | r = rM[ib]; 231 | b = bM[ib]; 232 | 233 | 234 | /* USE register */ 235 | float t; 236 | float y0; 237 | float yp0; 238 | float y1; 239 | float yp1; 240 | float y2; 241 | float yp2; 242 | float tout; 243 | 244 | t = tM; 245 | tout = toutM; 246 | y0 = yM0[ib]; 247 | y1 = yM1[ib]; 248 | y2 = yM2[ib]; 249 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b); 250 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b); 251 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b); 252 | 253 | dt = tout - t; 254 | 255 | if ( init == 0 ){ 256 | 257 | init = 1; 258 | h = abs( dt ); 259 | toln = 0.0; 260 | 261 | /* for ( k = 0; k < neqn; k++ ){ */ 262 | tol = (relerr) * abs( y0 ) + abserr; 263 | if ( 0.0 < tol ){ 264 | toln = tol; 265 | ypk = abs( yp0 ); 266 | if ( tol < ypk * pow ( h, 5 ) ) 267 | { 268 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 269 | }} 270 | 271 | tol = (relerr) * abs( y1 ) + abserr; 272 | if ( 0.0 < tol ){ 273 | toln = tol; 274 | ypk = abs( yp1 ); 275 | if ( tol < ypk * pow ( h, 5 ) ) 276 | { 277 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 278 | }} 279 | 280 | tol = 
(relerr) * abs( y2 ) + abserr; 281 | if ( 0.0 < tol ){ 282 | toln = tol; 283 | ypk = abs( yp2 ); 284 | if ( tol < ypk * pow ( h, 5 ) ) 285 | { 286 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 ); 287 | }} 288 | /* } */ 289 | 290 | 291 | if ( toln <= 0.0 ){h = 0.0;} 292 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) ); 293 | } 294 | 295 | 296 | /* SIGN(positive/negative -> 1/-1) to signbit(positive/negative -> 0/1) in CUDA math API */ 297 | 298 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h ); 299 | 300 | if ( 2.0 * abs( dt ) <= abs( h ) ){ 301 | kop = kop + 1; 302 | } 303 | 304 | output = false; 305 | scale = 2.0 / (relerr); 306 | ae = scale * abserr; 307 | 308 | for ( ; ; ){ 309 | hfaild = false; 310 | hmin = 26.0 * eps * abs ( t ); 311 | dt = tout - t; 312 | 313 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){ 314 | }else{ 315 | if ( abs ( dt ) <= abs ( h ) ){ 316 | output = true; 317 | h = dt; 318 | }else{ 319 | h = 0.5 * dt; 320 | } 321 | } 322 | 323 | for ( ; ; ){ 324 | 325 | if ( MAXNFE < nfe ){ 326 | 327 | tM = t; 328 | yM0[ib] = y0; 329 | yM1[ib] = y1; 330 | yM2[ib] = y2; 331 | 332 | 333 | flagM[ib]=4; 334 | 335 | printf("(*_*)< t=%2.8f \\n",t); 336 | printf("*WARNING! END MAXNFE < nfe condition! 
\\n"); 337 | 338 | return; 339 | } 340 | 341 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 342 | r4_fehl (y0, y1, y2, t, h, yp0, yp1, yp2, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, &f1_1, &f2_1, &f3_1, &f4_1, &f5_1, &f1_2, &f2_2, &f3_2, &f4_2, &f5_2, p, r, b); 343 | 344 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */ 345 | 346 | nfe = nfe + 5; 347 | eeoet = 0.0; 348 | 349 | /* for ( k = 0; k < neqn; k++ ){ */ 350 | et = abs ( y0 ) + abs ( f1_0 ) + ae; 351 | ee = abs ( ( -2090.0 * yp0 352 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 ) 353 | ) 354 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 ) 355 | ); 356 | eeoet = max ( eeoet, ee / et ); 357 | 358 | et = abs ( y1 ) + abs ( f1_1 ) + ae; 359 | ee = abs ( ( -2090.0 * yp1 360 | + ( 21970.0 * f3_1 - 15048.0 * f4_1 ) 361 | ) 362 | + ( 22528.0 * f2_1 - 27360.0 * f5_1 ) 363 | ); 364 | eeoet = max ( eeoet, ee / et ); 365 | 366 | et = abs ( y2 ) + abs ( f1_2 ) + ae; 367 | ee = abs ( ( -2090.0 * yp2 368 | + ( 21970.0 * f3_2 - 15048.0 * f4_2 ) 369 | ) 370 | + ( 22528.0 * f2_2 - 27360.0 * f5_2 ) 371 | ); 372 | eeoet = max ( eeoet, ee / et ); 373 | /* } */ 374 | 375 | esttol = abs ( h ) * eeoet * scale / 752400.0; 376 | 377 | if ( esttol <= 1.0 ) 378 | { 379 | break; 380 | } 381 | 382 | hfaild = true; 383 | output = false; 384 | 385 | /* printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */ 386 | 387 | if ( esttol < 59049.0 ){ 388 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 389 | }else{ 390 | s = 0.1; 391 | } 392 | 393 | h = s * h; 394 | } 395 | 396 | t = t + h; 397 | /* for ( i = 0; i < neqn; i++ ){ */ 398 | y0 = f1_0; 399 | y1 = f1_1; 400 | y2 = f1_2; 401 | /* } */ 402 | 403 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b); 404 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b); 405 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b); 406 | 407 | nfe = nfe + 1; 408 | if ( 0.0001889568 < esttol ) 409 | { 410 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 ); 411 | } 412 | else 
413 | { 414 | s = 5.0; 415 | } 416 | 417 | if ( hfaild ) 418 | { 419 | s = min ( s, 1.0 ); 420 | } 421 | 422 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin ); 423 | 424 | if (output){ 425 | 426 | tM = t; 427 | yM0[ib] = y0; 428 | yM1[ib] = y1; 429 | yM2[ib] = y2; 430 | 431 | flagM[ib]=2; 432 | 433 | /* printf("Normal Exit N=%d\\n",nfe); */ 434 | 435 | return; 436 | } 437 | 438 | } 439 | 440 | 441 | return; 442 | 443 | } 444 | 445 | """,options=['-use_fast_math']) 446 | 447 | return source_module 448 | 449 | 450 | if __name__ == "__main__": 451 | import numpy as np 452 | import matplotlib.pyplot as plt 453 | import time 454 | import sys 455 | 456 | print("*******************************************") 457 | print("GPU RKF45 solver for the following example. 3D") 458 | print("Lorenz Attractor") 459 | print("*******************************************") 460 | tstart=time.time() 461 | source_module=grkf45_module() 462 | pkernel=source_module.get_function("r4_rkf45") 463 | 464 | eps=1.19209290e-07 465 | relerr=np.sqrt(eps) 466 | abserr=np.sqrt(eps) 467 | print("abserr=",abserr) 468 | print("relerr=",relerr) 469 | 470 | 471 | nw=1 472 | nt=10000 473 | nq=1 474 | nb = nw*nt*nq 475 | sharedsize=0 #byte 476 | 477 | #parameters 478 | p=np.ones(nb)*np.array(10.0) 479 | p=p.astype(np.float32) 480 | dev_p = cuda.mem_alloc(p.nbytes) 481 | cuda.memcpy_htod(dev_p,p) 482 | 483 | r=np.ones(nb)*np.array(28.0) 484 | r=r.astype(np.float32) 485 | dev_r = cuda.mem_alloc(r.nbytes) 486 | cuda.memcpy_htod(dev_r,r) 487 | 488 | b=np.ones(nb)*np.array(8.0/3.0) 489 | b=b.astype(np.float32) 490 | dev_b = cuda.mem_alloc(b.nbytes) 491 | cuda.memcpy_htod(dev_b,b) 492 | 493 | 494 | y0=np.linspace(0,30.0,nb) 495 | y0=y0.astype(np.float32) 496 | dev_y0 = cuda.mem_alloc(y0.nbytes) 497 | cuda.memcpy_htod(dev_y0,y0) 498 | 499 | y1=np.ones(nb)*-7 500 | y1=y1.astype(np.float32) 501 | dev_y1 = cuda.mem_alloc(y1.nbytes) 502 | cuda.memcpy_htod(dev_y1,y1) 503 | 504 | y2=np.ones(nb)*-5 505 | 
y2=y2.astype(np.float32) 506 | dev_y2 = cuda.mem_alloc(y2.nbytes) 507 | cuda.memcpy_htod(dev_y2,y2) 508 | 509 | flag=np.zeros(nb) 510 | flag=flag.astype(np.int32) 511 | dev_flag = cuda.mem_alloc(flag.nbytes) 512 | cuda.memcpy_htod(dev_flag,flag) 513 | 514 | t=np.linspace(0.0,100.0,10000) 515 | yarr0=[] 516 | yarr0.append(np.copy(y0)) 517 | yarr1=[] 518 | yarr1.append(np.copy(y1)) 519 | yarr2=[] 520 | yarr2.append(np.copy(y2)) 521 | 522 | for j,tnow in enumerate(t[:-1]): 523 | tin=tnow 524 | tout=t[j+1] 525 | pkernel(dev_flag, dev_p,dev_r,dev_b,dev_y0,dev_y1,dev_y2,np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize) 526 | 527 | cuda.memcpy_dtoh(y0, dev_y0) 528 | cuda.memcpy_dtoh(y1, dev_y1) 529 | cuda.memcpy_dtoh(y2, dev_y2) 530 | 531 | cuda.memcpy_dtoh(flag, dev_flag) 532 | #print("T=",tnow,"RKF45: y=",y0) 533 | #print("FLAG=",flag) 534 | yarr0.append(np.copy(y0)) 535 | yarr1.append(np.copy(y1)) 536 | yarr2.append(np.copy(y2)) 537 | 538 | 539 | yarr0=np.array(yarr0) 540 | yarr1=np.array(yarr1) 541 | yarr2=np.array(yarr2) 542 | 543 | print("END") 544 | tend=time.time() 545 | print("T=",tend-tstart) 546 | 547 | sys.exit() 548 | colors=plt.cm.Spectral(np.linspace(0,1,30)) 549 | cm = plt.get_cmap('magma') 550 | fig = plt.figure() 551 | ax = fig.add_subplot(121,aspect=1.0) 552 | ax.set_prop_cycle('color',colors) 553 | ax.plot(yarr0[:,:],yarr1[:,:],alpha=0.5) 554 | plt.ylim(-40,40) 555 | plt.xlim(-30,40) 556 | 557 | ax = fig.add_subplot(122,aspect=1.0) 558 | ax.set_prop_cycle('color',colors) 559 | ax.plot(yarr2[:,:],yarr1[:,:],alpha=0.5) 560 | plt.ylim(-40,40) 561 | plt.xlim(-10,60) 562 | plt.savefig("Lorentz.png") 563 | plt.show() 564 | --------------------------------------------------------------------------------