├── documents
│   └── figs
│       ├── Lorentz.png
│       └── vanderpol.png
├── Readme.md
├── LICENSE
└── grkf45
    ├── grkf45_1d.py
    ├── grkf45_2d.py
    ├── grkf45_3d.py
    └── grkf45.py
/documents/figs/Lorentz.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HajimeKawahara/grkf45/HEAD/documents/figs/Lorentz.png
--------------------------------------------------------------------------------
/documents/figs/vanderpol.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/HajimeKawahara/grkf45/HEAD/documents/figs/vanderpol.png
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # GPU RKF45 (Runge–Kutta–Fehlberg) parallel ODE solver written in PyCUDA
2 |
3 | The Runge–Kutta–Fehlberg method (RKF45) is one of the most widely used ODE solvers. GRKF45 is a parallel RKF45 solver that integrates the same ODE system for many different parameter sets at once on the GPU.
4 |
5 | The original RKF45 code was adapted from the [C++ code](http://people.sc.fsu.edu/~jburkardt/cpp_src/rkf45/rkf45.html) by John Burkardt (LGPL).
6 |
7 | Currently, you need to define the right-hand sides of your ODEs in the device functions r4_f0, r4_f1, r4_f2, ... in GRKF45.
8 |
9 |
10 | ## Codes
11 |
12 | - grkf45_1d (1D ODE)
13 | - grkf45_2d (2D ODE, the van der Pol oscillator)
14 |
15 |
16 |
17 | - grkf45_3d (3D ODE, the Lorenz attractor)
18 |
19 |
20 |
21 |
22 |
23 |
24 |
--------------------------------------------------------------------------------
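The README names the method but does not spell out the step itself. As a rough CPU-side illustration (not part of the repository), a single scalar RKF45 step can be sketched in plain Python. The function name `rkf45_step` and its signature are assumptions made for this sketch. The coefficients are the standard Fehlberg tableau, which is what the integer constants in `r4_fehl` reduce to once divided by their common denominators.

```python
import math


def rkf45_step(f, t, y, h):
    """One Runge-Kutta-Fehlberg step for a scalar ODE y' = f(t, y).

    Returns the fifth-order estimate of y(t + h) and the difference between
    the fifth- and fourth-order estimates, which serves as the error estimate.
    Hypothetical helper for illustration; it is not part of grkf45.
    """
    k1 = f(t, y)
    k2 = f(t + h / 4, y + h * k1 / 4)
    k3 = f(t + 3 * h / 8, y + h * (3 * k1 + 9 * k2) / 32)
    k4 = f(t + 12 * h / 13,
           y + h * (1932 * k1 - 7200 * k2 + 7296 * k3) / 2197)
    k5 = f(t + h,
           y + h * (439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513
                    - 845 * k4 / 4104))
    k6 = f(t + h / 2,
           y + h * (-8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                    + 1859 * k4 / 4104 - 11 * k5 / 40))
    # Fifth-order solution (the value the kernels propagate).
    y5 = y + h * (16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
                  - 9 * k5 / 50 + 2 * k6 / 55)
    # Fifth-order minus fourth-order solution.
    err = h * (k1 / 360 - 128 * k3 / 4275 - 2197 * k4 / 75240
               + k5 / 50 + 2 * k6 / 55)
    return y5, err


if __name__ == "__main__":
    # y' = y with y(0) = 1, so y(0.1) should be close to exp(0.1).
    y_new, err = rkf45_step(lambda t, y: y, 0.0, 1.0, 0.1)
    print(y_new, math.exp(0.1), err)
```

The GPU kernels evaluate the same six stages, but unrolled per component (`f1_0`, `f1_1`, ...) so that each state variable stays in a register.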
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU LESSER GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 |
9 | This version of the GNU Lesser General Public License incorporates
10 | the terms and conditions of version 3 of the GNU General Public
11 | License, supplemented by the additional permissions listed below.
12 |
13 | 0. Additional Definitions.
14 |
15 | As used herein, "this License" refers to version 3 of the GNU Lesser
16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU
17 | General Public License.
18 |
19 | "The Library" refers to a covered work governed by this License,
20 | other than an Application or a Combined Work as defined below.
21 |
22 | An "Application" is any work that makes use of an interface provided
23 | by the Library, but which is not otherwise based on the Library.
24 | Defining a subclass of a class defined by the Library is deemed a mode
25 | of using an interface provided by the Library.
26 |
27 | A "Combined Work" is a work produced by combining or linking an
28 | Application with the Library. The particular version of the Library
29 | with which the Combined Work was made is also called the "Linked
30 | Version".
31 |
32 | The "Minimal Corresponding Source" for a Combined Work means the
33 | Corresponding Source for the Combined Work, excluding any source code
34 | for portions of the Combined Work that, considered in isolation, are
35 | based on the Application, and not on the Linked Version.
36 |
37 | The "Corresponding Application Code" for a Combined Work means the
38 | object code and/or source code for the Application, including any data
39 | and utility programs needed for reproducing the Combined Work from the
40 | Application, but excluding the System Libraries of the Combined Work.
41 |
42 | 1. Exception to Section 3 of the GNU GPL.
43 |
44 | You may convey a covered work under sections 3 and 4 of this License
45 | without being bound by section 3 of the GNU GPL.
46 |
47 | 2. Conveying Modified Versions.
48 |
49 | If you modify a copy of the Library, and, in your modifications, a
50 | facility refers to a function or data to be supplied by an Application
51 | that uses the facility (other than as an argument passed when the
52 | facility is invoked), then you may convey a copy of the modified
53 | version:
54 |
55 | a) under this License, provided that you make a good faith effort to
56 | ensure that, in the event an Application does not supply the
57 | function or data, the facility still operates, and performs
58 | whatever part of its purpose remains meaningful, or
59 |
60 | b) under the GNU GPL, with none of the additional permissions of
61 | this License applicable to that copy.
62 |
63 | 3. Object Code Incorporating Material from Library Header Files.
64 |
65 | The object code form of an Application may incorporate material from
66 | a header file that is part of the Library. You may convey such object
67 | code under terms of your choice, provided that, if the incorporated
68 | material is not limited to numerical parameters, data structure
69 | layouts and accessors, or small macros, inline functions and templates
70 | (ten or fewer lines in length), you do both of the following:
71 |
72 | a) Give prominent notice with each copy of the object code that the
73 | Library is used in it and that the Library and its use are
74 | covered by this License.
75 |
76 | b) Accompany the object code with a copy of the GNU GPL and this license
77 | document.
78 |
79 | 4. Combined Works.
80 |
81 | You may convey a Combined Work under terms of your choice that,
82 | taken together, effectively do not restrict modification of the
83 | portions of the Library contained in the Combined Work and reverse
84 | engineering for debugging such modifications, if you also do each of
85 | the following:
86 |
87 | a) Give prominent notice with each copy of the Combined Work that
88 | the Library is used in it and that the Library and its use are
89 | covered by this License.
90 |
91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license
92 | document.
93 |
94 | c) For a Combined Work that displays copyright notices during
95 | execution, include the copyright notice for the Library among
96 | these notices, as well as a reference directing the user to the
97 | copies of the GNU GPL and this license document.
98 |
99 | d) Do one of the following:
100 |
101 | 0) Convey the Minimal Corresponding Source under the terms of this
102 | License, and the Corresponding Application Code in a form
103 | suitable for, and under terms that permit, the user to
104 | recombine or relink the Application with a modified version of
105 | the Linked Version to produce a modified Combined Work, in the
106 | manner specified by section 6 of the GNU GPL for conveying
107 | Corresponding Source.
108 |
109 | 1) Use a suitable shared library mechanism for linking with the
110 | Library. A suitable mechanism is one that (a) uses at run time
111 | a copy of the Library already present on the user's computer
112 | system, and (b) will operate properly with a modified version
113 | of the Library that is interface-compatible with the Linked
114 | Version.
115 |
116 | e) Provide Installation Information, but only if you would otherwise
117 | be required to provide such information under section 6 of the
118 | GNU GPL, and only to the extent that such information is
119 | necessary to install and execute a modified version of the
120 | Combined Work produced by recombining or relinking the
121 | Application with a modified version of the Linked Version. (If
122 | you use option 4d0, the Installation Information must accompany
123 | the Minimal Corresponding Source and Corresponding Application
124 | Code. If you use option 4d1, you must provide the Installation
125 | Information in the manner specified by section 6 of the GNU GPL
126 | for conveying Corresponding Source.)
127 |
128 | 5. Combined Libraries.
129 |
130 | You may place library facilities that are a work based on the
131 | Library side by side in a single library together with other library
132 | facilities that are not Applications and are not covered by this
133 | License, and convey such a combined library under terms of your
134 | choice, if you do both of the following:
135 |
136 | a) Accompany the combined library with a copy of the same work based
137 | on the Library, uncombined with any other library facilities,
138 | conveyed under the terms of this License.
139 |
140 | b) Give prominent notice with the combined library that part of it
141 | is a work based on the Library, and explaining where to find the
142 | accompanying uncombined form of the same work.
143 |
144 | 6. Revised Versions of the GNU Lesser General Public License.
145 |
146 | The Free Software Foundation may publish revised and/or new versions
147 | of the GNU Lesser General Public License from time to time. Such new
148 | versions will be similar in spirit to the present version, but may
149 | differ in detail to address new problems or concerns.
150 |
151 | Each version is given a distinguishing version number. If the
152 | Library as you received it specifies that a certain numbered version
153 | of the GNU Lesser General Public License "or any later version"
154 | applies to it, you have the option of following the terms and
155 | conditions either of that published version or of any later version
156 | published by the Free Software Foundation. If the Library as you
157 | received it does not specify a version number of the GNU Lesser
158 | General Public License, you may choose any version of the GNU Lesser
159 | General Public License ever published by the Free Software Foundation.
160 |
161 | If the Library as you received it specifies that a proxy can decide
162 | whether future versions of the GNU Lesser General Public License shall
163 | apply, that proxy's public statement of acceptance of any version is
164 | permanent authorization for you to choose that version for the
165 | Library.
166 |
--------------------------------------------------------------------------------
/grkf45/grkf45_1d.py:
--------------------------------------------------------------------------------
1 | import pycuda.autoinit
2 | import pycuda.driver as cuda
3 | import pycuda.compiler
4 | from pycuda.compiler import SourceModule
5 |
6 | def grkf45_1d_module():
7 |     source_module = SourceModule("""
8 |
9 | #include
10 | #include
11 | #include
12 | # define MAXNFE 3000
13 |
14 | using namespace std;
15 |
16 | /* USER GIVEN DEVICE FUNCTION: parameter = a */
17 | __device__ void r4_f0 ( float t, float y0, float *yp0, float a){
18 |
19 | *yp0 = 1.0 + y0*y0 + a*sin(t);
20 |
21 | /* printf("(*_*)< t=%2.8f,y0= %2.8f,yp0= %2.8f \\n",t,y0,yp0); */
22 |
23 | return;
24 | }
25 |
26 |
27 |
28 |
29 | /* DEVICE FUNCTION */
30 |
31 | __device__ void r4_fehl (float y0, float t, float h, float yp0, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float a){
32 |
33 | float ch;
34 | int i;
35 | float s0;
36 |
37 | ch = h / 4.0;
38 |
39 | /* for ( i = 0; i < neqn; i++ ){ */
40 | *f5_0 = y0 + ch * yp0;
41 | /* } */
42 |
43 | r4_f0 ( t + ch, *f5_0, f1_0, a);
44 |
45 | ch = 3.0 * h / 32.0;
46 |
47 | /* for ( i = 0; i < neqn; i++ ){ */
48 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 );
49 | /* } */
50 |
51 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, f2_0, a);
52 | ch = h / 2197.0;
53 |
54 | /* for ( i = 0; i < neqn; i++ ){ */
55 | *f5_0 = y0 + ch *
56 | ( 1932.0 * yp0
57 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 )
58 | );
59 | /* } */
60 |
61 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0, f3_0, a);
62 | ch = h / 4104.0;
63 |
64 | /* for ( i = 0; i < neqn; i++ ){ */
65 | *f5_0 = y0 + ch *
66 | (
67 | ( 8341.0 * yp0 - 845.0 * *f3_0 )
68 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 )
69 | );
70 | /* } */
71 |
72 | r4_f0 ( t + h, *f5_0, f4_0, a);
73 | ch = h / 20520.0;
74 |
75 | /* for ( i = 0; i < neqn; i++ ){ */
76 | *f1_0 = y0 + ch *
77 | (
78 | ( -6080.0 * yp0
79 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 )
80 | )
81 | + ( 41040.0 * *f1_0 - 28352.0 * *f2_0 )
82 | );
83 | /* } */
84 |
85 | r4_f0 ( t + h / 2.0, *f1_0, f5_0, a);
86 | ch = h / 7618050.0;
87 |
88 | /* for ( i = 0; i < neqn; i++ ){ */
89 | s0 = y0 + ch *
90 | (
91 | ( 902880.0 * yp0
92 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) )
93 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 )
94 | );
95 | /* } */
96 |
97 | *f1_0 = s0;
98 |
99 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
100 |
101 | return;
102 | }
103 |
104 |
105 | /* GLOBAL FUNCTION */
106 | __global__ void r4_rkf45_1d (int* flagM, float* aM, float *yM, float tM, float toutM, float relerr, float abserr){
107 |
108 | float ae;
109 | float dt;
110 | float ee;
111 | float eeoet;
112 | const float eps = 1.19209290E-07;
113 | float esttol;
114 | float et;
115 | float f1_0;
116 | float f2_0;
117 | float f3_0;
118 | float f4_0;
119 | float f5_0;
120 | float h = -1.0;
121 | bool hfaild;
122 | float hmin;
123 | int i;
124 | int init = -1000;
125 | int k;
126 | int kop = -1;
127 | int nfe = -1;
128 | float s;
129 | float scale;
130 | float tol;
131 | float toln;
132 | float ypk;
133 | bool output;
134 |
135 | /* user defined parameters */
136 | float a;
137 | int ib = blockIdx.x;
138 |
139 | a = aM[ib];
140 |
141 |
142 | /* USE register */
143 | float t;
144 | float y0;
145 | float yp0;
146 | float tout;
147 |
148 | t = tM;
149 | tout = toutM;
150 | y0 = yM[ib];
151 | r4_f0 ( t, y0, &yp0, a);
152 |
153 | dt = tout - t;
154 |
155 | if ( init == 0 ){
156 |
157 | init = 1;
158 | h = abs( dt );
159 | toln = 0.0;
160 |
161 | /* for ( k = 0; k < neqn; k++ ){ */
162 | tol = (relerr) * abs( y0 ) + abserr;
163 |
164 | if ( 0.0 < tol ){
165 | toln = tol;
166 | ypk = abs( yp0 );
167 | if ( tol < ypk * pow ( h, 5 ) )
168 | {
169 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
170 | }}
171 |
172 | /* } */
173 |
174 |
175 | if ( toln <= 0.0 ){h = 0.0;}
176 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) );
177 | }
178 |
179 |
180 | /* SIGN(positive/negative -> 1/-1) to signbit(positive/negative -> 0/1) in CUDA math API */
181 |
182 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h );
183 |
184 | if ( 2.0 * abs( dt ) <= abs( h ) ){
185 | kop = kop + 1;
186 | }
187 |
188 | output = false;
189 | scale = 2.0 / (relerr);
190 | ae = scale * abserr;
191 |
192 | for ( ; ; ){
193 | hfaild = false;
194 | hmin = 26.0 * eps * abs ( t );
195 | dt = tout - t;
196 |
197 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){
198 | }else{
199 | if ( abs ( dt ) <= abs ( h ) ){
200 | output = true;
201 | h = dt;
202 | }else{
203 | h = 0.5 * dt;
204 | }
205 | }
206 |
207 | for ( ; ; ){
208 |
209 | if ( MAXNFE < nfe ){
210 |
211 | tM = t;
212 | yM[ib] = y0;
213 | flagM[ib]=4;
214 |
215 | printf("(*_*)< t=%2.8f \\n",t);
216 | printf("*WARNING! END MAXNFE < nfe condition! \\n");
217 |
218 | return;
219 | }
220 |
221 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
222 | r4_fehl (y0, t, h, yp0, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, a);
223 |
224 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
225 |
226 | nfe = nfe + 5;
227 | eeoet = 0.0;
228 |
229 |
230 | /* for ( k = 0; k < neqn; k++ ){ */
231 | et = abs ( y0 ) + abs ( f1_0 ) + ae;
232 |
233 | ee = abs ( ( -2090.0 * yp0
234 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 )
235 | )
236 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 )
237 | );
238 |
239 | eeoet = max ( eeoet, ee / et );
240 |
241 | /* } */
242 |
243 | esttol = abs ( h ) * eeoet * scale / 752400.0;
244 |
245 | if ( esttol <= 1.0 )
246 | {
247 | break;
248 | }
249 |
250 | hfaild = true;
251 | output = false;
252 |
253 | /* printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */
254 |
255 | if ( esttol < 59049.0 ){
256 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
257 | }else{
258 | s = 0.1;
259 | }
260 |
261 | h = s * h;
262 | }
263 |
264 | t = t + h;
265 | /* for ( i = 0; i < neqn; i++ ){ */
266 | y0 = f1_0;
267 | /* } */
268 |
269 | r4_f0 ( t, y0, &yp0, a);
270 |
271 | nfe = nfe + 1;
272 | if ( 0.0001889568 < esttol )
273 | {
274 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
275 | }
276 | else
277 | {
278 | s = 5.0;
279 | }
280 |
281 | if ( hfaild )
282 | {
283 | s = min ( s, 1.0 );
284 | }
285 |
286 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin );
287 |
288 | if (output){
289 |
290 | tM = t;
291 | yM[ib] = y0;
292 | flagM[ib]=2;
293 |
294 | /* printf("Normal Exit N=%d\\n",nfe); */
295 |
296 | return;
297 | }
298 |
299 | }
300 |
301 |
302 | return;
303 |
304 | }
305 |
306 |     """,options=['-use_fast_math'])
307 |
308 |     return source_module
309 |
310 |
311 | if __name__ == "__main__":
312 |     import numpy as np
313 |     import matplotlib.pyplot as plt
314 |
315 |     print("*******************************************")
316 |     print("GPU RKF45 1D ODE solver for the following example.")
317 |     print("y'=1+y*y+a sin(t), y[0]=0 on t=[0,1.4] ")
318 |     print("*******************************************")
319 |
320 |     source_module=grkf45_1d_module()
321 |     pkernel=source_module.get_function("r4_rkf45_1d")
322 |
323 |     eps=1.19209290e-07
324 |     relerr=np.sqrt(eps)
325 |     abserr=np.sqrt(eps)
326 |     print("abserr=",abserr)
327 |     print("relerr=",relerr)
328 |
329 |
330 |     nw=1
331 |     nt=16
332 |     nq=1
333 |     nb = nw*nt*nq
334 |     sharedsize=0 #byte
335 |
336 |     #parameters
337 |     a=np.linspace(-0.1,0.1,nb)
338 |     a=a.astype(np.float32)
339 |     dev_a = cuda.mem_alloc(a.nbytes)
340 |     cuda.memcpy_htod(dev_a,a)
341 |
342 |     y0=np.zeros(nb)
343 |     y0=y0.astype(np.float32)
344 |     dev_y0 = cuda.mem_alloc(y0.nbytes)
345 |     cuda.memcpy_htod(dev_y0,y0)
346 |
347 |     flag=np.zeros(nb)
348 |     flag=flag.astype(np.int32)
349 |     dev_flag = cuda.mem_alloc(flag.nbytes)
350 |     cuda.memcpy_htod(dev_flag,flag)
351 |
352 |
353 |     t=np.linspace(0.0,1.4,25)
354 |     yarr=[]
355 |     yarr.append(np.copy(y0))
356 |     for j,tnow in enumerate(t[:-1]):
357 |         tin=tnow
358 |         tout=t[j+1]
359 |         pkernel(dev_flag, dev_a,dev_y0, np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize)
360 |
361 |         cuda.memcpy_dtoh(y0, dev_y0)
362 |         cuda.memcpy_dtoh(flag, dev_flag)
363 |         yarr.append(np.copy(y0))
364 |
365 |     yarr=np.array(yarr)
366 |
367 |     plt.plot(t,yarr,".",color="C0")
368 |     plt.plot(t,yarr,alpha=0.3)
369 |     plt.show()
370 |
371 |
--------------------------------------------------------------------------------
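When checking the GPU output of `grkf45_1d.py`, a CPU reference can be handy. The following is a minimal sketch in plain Python, assuming forward integration (`tout > t0`) and a scalar state. The names `rkf45_step` and `integrate` are hypothetical and do not exist in the repository. It follows the kernel's accept/reject logic and step rescaling, but omits the `hmin` floor and the `flagM` status codes, and it runs in double precision rather than `float`.

```python
import math


def rkf45_step(f, t, y, h):
    """Single scalar RKF45 step: returns (fifth-order value, error estimate)."""
    k1 = f(t, y)
    k2 = f(t + h / 4, y + h * k1 / 4)
    k3 = f(t + 3 * h / 8, y + h * (3 * k1 + 9 * k2) / 32)
    k4 = f(t + 12 * h / 13,
           y + h * (1932 * k1 - 7200 * k2 + 7296 * k3) / 2197)
    k5 = f(t + h,
           y + h * (439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513
                    - 845 * k4 / 4104))
    k6 = f(t + h / 2,
           y + h * (-8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                    + 1859 * k4 / 4104 - 11 * k5 / 40))
    y5 = y + h * (16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
                  - 9 * k5 / 50 + 2 * k6 / 55)
    err = h * (k1 / 360 - 128 * k3 / 4275 - 2197 * k4 / 75240
               + k5 / 50 + 2 * k6 / 55)
    return y5, err


def integrate(f, t0, y0, tout, relerr=1e-8, abserr=1e-8, max_nfe=3000):
    """Adaptive integration from t0 to tout (assumes tout > t0).

    Returns (y(tout), number of derivative evaluations).
    """
    t, y, h = t0, y0, tout - t0
    nfe = 0
    while True:
        last = h >= tout - t      # this step would reach or pass tout
        if last:
            h = tout - t
        failed = False
        while True:
            if nfe > max_nfe:
                raise RuntimeError("too many derivative evaluations")
            y_new, err = rkf45_step(f, t, y, h)
            nfe += 6              # this sketch recomputes k1 on every attempt
            # Mixed relative/absolute tolerance, as in the kernel's et/ae terms.
            tol = relerr * 0.5 * (abs(y) + abs(y_new)) + abserr
            esttol = abs(err) / tol
            if esttol <= 1.0:
                break
            # Step rejected: shrink h (by at most a factor of 10) and retry.
            failed = True
            last = False
            s = 0.9 / esttol ** 0.2 if esttol < 59049.0 else 0.1
            h *= s
        y = y_new
        if last:
            return y, nfe
        t += h
        # Step accepted: grow h (by at most a factor of 5).
        s = 0.9 / esttol ** 0.2 if esttol > 0.0001889568 else 5.0
        if failed:
            s = min(s, 1.0)       # no growth right after a rejection
        h *= s


if __name__ == "__main__":
    a = 0.0  # with a = 0 the exact solution is y = tan(t)
    y_end, nfe = integrate(lambda t, y: 1.0 + y * y + a * math.sin(t),
                           0.0, 0.0, 1.4)
    print(y_end, math.tan(1.4), nfe)
```

For `a = 0` the ODE reduces to `y' = 1 + y*y`, whose solution `tan(t)` gives an independent check. For nonzero `a`, the sketch can be compared against the corresponding column of `yarr` from the GPU run, keeping in mind that the kernel works in single precision with looser tolerances.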
/grkf45/grkf45_2d.py:
--------------------------------------------------------------------------------
1 | import pycuda.autoinit
2 | import pycuda.driver as cuda
3 | import pycuda.compiler
4 | from pycuda.compiler import SourceModule
5 |
6 | def grkf45_module():
7 |     source_module = SourceModule("""
8 |
9 | #include
10 | #include
11 | #include
12 | # define MAXNFE 3000
13 |
14 | using namespace std;
15 |
16 | /* USER GIVEN DEVICE FUNCTION: parameter = a */
17 | __device__ void r4_f0 ( float t, float y0, float y1, float *yp0, float a){
18 |
19 | *yp0 = y1;
20 |
21 | return;
22 | }
23 | __device__ void r4_f1 ( float t, float y0, float y1, float *yp1, float a){
24 |
25 | *yp1 = -y0 + a*(1.0 - y0*y0)*y1 ;
26 |
27 | return;
28 | }
29 |
30 |
31 |
32 |
33 | /* DEVICE FUNCTION */
34 |
35 | __device__ void r4_fehl (float y0, float y1, float t, float h, float yp0, float yp1, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float *f1_1, float *f2_1, float *f3_1, float *f4_1, float *f5_1, float a){
36 |
37 | float ch;
38 | int i;
39 | float s0;
40 | float s1;
41 |
42 | ch = h / 4.0;
43 |
44 | /* for ( i = 0; i < neqn; i++ ){ */
45 | *f5_0 = y0 + ch * yp0;
46 | *f5_1 = y1 + ch * yp1;
47 | /* } */
48 |
49 | r4_f0 ( t + ch, *f5_0, *f5_1, f1_0, a);
50 | r4_f1 ( t + ch, *f5_0, *f5_1, f1_1, a);
51 |
52 | ch = 3.0 * h / 32.0;
53 |
54 | /* for ( i = 0; i < neqn; i++ ){ */
55 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 );
56 | *f5_1 = y1 + ch * ( yp1 + 3.0 * *f1_1 );
57 | /* } */
58 |
59 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, f2_0, a);
60 | r4_f1 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, f2_1, a);
61 |
62 | ch = h / 2197.0;
63 |
64 | /* for ( i = 0; i < neqn; i++ ){ */
65 | *f5_0 = y0 + ch *
66 | ( 1932.0 * yp0
67 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 )
68 | );
69 | *f5_1 = y1 + ch *
70 | ( 1932.0 * yp1
71 | + ( 7296.0 * *f2_1 - 7200.0 * *f1_1 )
72 | );
73 |
74 | /* } */
75 |
76 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0,*f5_1, f3_0, a);
77 | r4_f1 ( t + 12.0 * h / 13.0, *f5_0,*f5_1, f3_1, a);
78 |
79 | ch = h / 4104.0;
80 |
81 | /* for ( i = 0; i < neqn; i++ ){ */
82 | *f5_0 = y0 + ch *
83 | (
84 | ( 8341.0 * yp0 - 845.0 * *f3_0 )
85 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 )
86 | );
87 | *f5_1 = y1 + ch *
88 | (
89 | ( 8341.0 * yp1 - 845.0 * *f3_1 )
90 | + ( 29440.0 * *f2_1 - 32832.0 * *f1_1 )
91 | );
92 | /* } */
93 |
94 | r4_f0 ( t + h, *f5_0,*f5_1, f4_0, a);
95 | r4_f1 ( t + h, *f5_0,*f5_1, f4_1, a);
96 |
97 | ch = h / 20520.0;
98 |
99 | /* for ( i = 0; i < neqn; i++ ){ */
100 | *f1_0 = y0 + ch *
101 | (
102 | ( -6080.0 * yp0
103 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 )
104 | )
105 | + ( 41040.0 * *f1_0 - 28352.0 * *f2_0 )
106 | );
107 | *f1_1 = y1 + ch *
108 | (
109 | ( -6080.0 * yp1
110 | + ( 9295.0 * *f3_1 - 5643.0 * *f4_1 )
111 | )
112 | + ( 41040.0 * *f1_1 - 28352.0 * *f2_1 )
113 | );
114 | /* } */
115 |
116 | r4_f0 ( t + h / 2.0, *f1_0,*f1_1, f5_0, a);
117 | r4_f1 ( t + h / 2.0, *f1_0,*f1_1, f5_1, a);
118 |
119 | ch = h / 7618050.0;
120 |
121 | /* for ( i = 0; i < neqn; i++ ){ */
122 | s0 = y0 + ch *
123 | (
124 | ( 902880.0 * yp0
125 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) )
126 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 )
127 | );
128 | s1 = y1 + ch *
129 | (
130 | ( 902880.0 * yp1
131 | + ( 3855735.0 * *f3_1 - 1371249.0 * *f4_1 ) )
132 | + ( 3953664.0 * *f2_1 + 277020.0 * *f5_1 )
133 | );
134 | /* } */
135 |
136 | *f1_0 = s0;
137 | *f1_1 = s1;
138 |
139 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
140 |
141 | return;
142 | }
143 |
144 |
145 | /* GLOBAL FUNCTION */
146 | __global__ void r4_rkf45 (int* flagM, float* aM, float *yM0, float *yM1, float tM, float toutM, float relerr, float abserr){
147 |
148 | float ae;
149 | float dt;
150 | float ee;
151 | float eeoet;
152 | const float eps = 1.19209290E-07;
153 | float esttol;
154 | float et;
155 | float f1_0;
156 | float f2_0;
157 | float f3_0;
158 | float f4_0;
159 | float f5_0;
160 | float f1_1;
161 | float f2_1;
162 | float f3_1;
163 | float f4_1;
164 | float f5_1;
165 | float h = -1.0;
166 | bool hfaild;
167 | float hmin;
168 | int i;
169 | int init = -1000;
170 | int k;
171 | int kop = -1;
172 | int nfe = -1;
173 | float s;
174 | float scale;
175 | float tol;
176 | float toln;
177 | float ypk;
178 | bool output;
179 |
180 | /* user defined parameters */
181 | float a;
182 | int ib = blockIdx.x;
183 |
184 | a = aM[ib];
185 |
186 |
187 | /* USE register */
188 | float t;
189 | float y0;
190 | float yp0;
191 | float y1;
192 | float yp1;
193 | float tout;
194 |
195 | t = tM;
196 | tout = toutM;
197 | y0 = yM0[ib];
198 | y1 = yM1[ib];
199 | r4_f0 ( t, y0, y1, &yp0, a);
200 | r4_f1 ( t, y0, y1, &yp1, a);
201 |
202 |
203 | dt = tout - t;
204 |
205 | if ( init == 0 ){
206 |
207 | init = 1;
208 | h = abs( dt );
209 | toln = 0.0;
210 |
211 | /* for ( k = 0; k < neqn; k++ ){ */
212 | tol = (relerr) * abs( y0 ) + abserr;
213 | if ( 0.0 < tol ){
214 | toln = tol;
215 | ypk = abs( yp0 );
216 | if ( tol < ypk * pow ( h, 5 ) )
217 | {
218 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
219 | }}
220 |
221 | tol = (relerr) * abs( y1 ) + abserr;
222 | if ( 0.0 < tol ){
223 | toln = tol;
224 | ypk = abs( yp1 );
225 | if ( tol < ypk * pow ( h, 5 ) )
226 | {
227 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
228 | }}
229 | /* } */
230 |
231 |
232 | if ( toln <= 0.0 ){h = 0.0;}
233 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) );
234 | }
235 |
236 |
237 | /* SIGN(positive/negative -> 1/-1) to signbit(positive/negative -> 0/1) in CUDA math API */
238 |
239 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h );
240 |
241 | if ( 2.0 * abs( dt ) <= abs( h ) ){
242 | kop = kop + 1;
243 | }
244 |
245 | output = false;
246 | scale = 2.0 / (relerr);
247 | ae = scale * abserr;
248 |
249 | for ( ; ; ){
250 | hfaild = false;
251 | hmin = 26.0 * eps * abs ( t );
252 | dt = tout - t;
253 |
254 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){
255 | }else{
256 | if ( abs ( dt ) <= abs ( h ) ){
257 | output = true;
258 | h = dt;
259 | }else{
260 | h = 0.5 * dt;
261 | }
262 | }
263 |
264 | for ( ; ; ){
265 |
266 | if ( MAXNFE < nfe ){
267 |
268 | tM = t;
269 | yM0[ib] = y0;
270 | yM1[ib] = y1;
271 |
272 |
273 | flagM[ib]=4;
274 |
275 | printf("(*_*)< t=%2.8f \\n",t);
276 | printf("*WARNING! END MAXNFE < nfe condition! \\n");
277 |
278 | return;
279 | }
280 |
281 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
282 | r4_fehl (y0, y1, t, h, yp0, yp1, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, &f1_1, &f2_1, &f3_1, &f4_1, &f5_1, a);
283 |
284 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
285 |
286 | nfe = nfe + 5;
287 | eeoet = 0.0;
288 |
289 | /* for ( k = 0; k < neqn; k++ ){ */
290 | et = abs ( y0 ) + abs ( f1_0 ) + ae;
291 | ee = abs ( ( -2090.0 * yp0
292 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 )
293 | )
294 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 )
295 | );
296 | eeoet = max ( eeoet, ee / et );
297 |
298 | et = abs ( y1 ) + abs ( f1_1 ) + ae;
299 | ee = abs ( ( -2090.0 * yp1
300 | + ( 21970.0 * f3_1 - 15048.0 * f4_1 )
301 | )
302 | + ( 22528.0 * f2_1 - 27360.0 * f5_1 )
303 | );
304 | eeoet = max ( eeoet, ee / et );
305 | /* } */
306 |
307 | esttol = abs ( h ) * eeoet * scale / 752400.0;
308 |
309 | if ( esttol <= 1.0 )
310 | {
311 | break;
312 | }
313 |
314 | hfaild = true;
315 | output = false;
316 |
317 | /* printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */
318 |
319 | if ( esttol < 59049.0 ){
320 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
321 | }else{
322 | s = 0.1;
323 | }
324 |
325 | h = s * h;
326 | }
327 |
328 | t = t + h;
329 | /* for ( i = 0; i < neqn; i++ ){ */
330 | y0 = f1_0;
331 | y1 = f1_1;
332 | /* } */
333 |
334 | r4_f0 ( t, y0, y1, &yp0, a);
335 | r4_f1 ( t, y0, y1, &yp1, a);
336 |
337 | nfe = nfe + 1;
338 | if ( 0.0001889568 < esttol )
339 | {
340 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
341 | }
342 | else
343 | {
344 | s = 5.0;
345 | }
346 |
347 | if ( hfaild )
348 | {
349 | s = min ( s, 1.0 );
350 | }
351 |
352 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin );
353 |
354 | if (output){
355 |
356 | tM = t;
357 | yM0[ib] = y0;
358 | yM1[ib] = y1;
359 |
360 | flagM[ib]=2;
361 |
362 | /* printf("Normal Exit N=%d\\n",nfe); */
363 |
364 | return;
365 | }
366 |
367 | }
368 |
369 |
370 | return;
371 |
372 | }
373 |
374 |     """,options=['-use_fast_math'])
375 |
376 |     return source_module
377 |
378 |
379 | if __name__ == "__main__":
380 |     import numpy as np
381 |     import matplotlib.pyplot as plt
382 |
383 |     print("*******************************************")
384 |     print("GPU RKF45 2D ODE solver for the Van der Pol oscillator. ")
385 |     print("y0' = y1 on t=[0,20] ")
386 |     print("y1' = -y0 + a (1 - y0*y0)*y1 on t=[0,20] ")
387 |     print("*******************************************")
388 |
389 |     source_module=grkf45_module()
390 |     pkernel=source_module.get_function("r4_rkf45")
391 |
392 |     eps=1.19209290e-07
393 |     relerr=np.sqrt(eps)
394 |     abserr=np.sqrt(eps)
395 |     print("abserr=",abserr)
396 |     print("relerr=",relerr)
397 |
398 |     nw=1
399 |     nt=16
400 |     nq=1
401 |     nb = nw*nt*nq
402 |     sharedsize=0 #byte
403 |
404 |     #parameters
405 |     a=np.linspace(0.1,4.0,nb)
406 |     a=a.astype(np.float32)
407 |     dev_a = cuda.mem_alloc(a.nbytes)
408 |     cuda.memcpy_htod(dev_a,a)
409 |
410 |     y0=np.ones(nb)*-4.0
411 |     y0=y0.astype(np.float32)
412 |     dev_y0 = cuda.mem_alloc(y0.nbytes)
413 |     cuda.memcpy_htod(dev_y0,y0)
414 |
415 |     y1=np.zeros(nb)
416 |     y1=y1.astype(np.float32)
417 |     dev_y1 = cuda.mem_alloc(y1.nbytes)
418 |     cuda.memcpy_htod(dev_y1,y1)
419 |
420 |     flag=np.zeros(nb)
421 |     flag=flag.astype(np.int32)
422 |     dev_flag = cuda.mem_alloc(flag.nbytes)
423 |     cuda.memcpy_htod(dev_flag,flag)
424 |
425 |     t=np.linspace(0.0,20.0,1000)
426 |     yarr0=[]
427 |     yarr0.append(np.copy(y0))
428 |     yarr1=[]
429 |     yarr1.append(np.copy(y1))
430 |
431 |     for j,tnow in enumerate(t[:-1]):
432 |         tin=tnow
433 |         tout=t[j+1]
434 |         pkernel(dev_flag, dev_a,dev_y0,dev_y1,np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize)
435 |
436 |         cuda.memcpy_dtoh(y0, dev_y0)
437 |         cuda.memcpy_dtoh(y1, dev_y1)
438 |         cuda.memcpy_dtoh(flag, dev_flag)
439 |         yarr0.append(np.copy(y0))
440 |         yarr1.append(np.copy(y1))
441 |
442 |     yarr0=np.array(yarr0)
443 |     yarr1=np.array(yarr1)
444 |
445 |
446 |     fig = plt.figure()
447 |     ax = fig.add_subplot(111,aspect=1.0)
448 |     plt.plot(yarr0,yarr1,alpha=0.5)
449 |     plt.savefig("vanderpol.png")
450 |     plt.show()
451 |
--------------------------------------------------------------------------------
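The kernels above rescale the step size with `s = 0.9 / esttol^0.2` and use the literal thresholds `59049.0` and `0.0001889568` without comment. A short sketch in plain Python shows where they come from: they are the values of `esttol` at which the formula meets the clamps `0.1` and `5.0`. The helper name `step_scale` is an assumption for illustration. In the kernels the same logic is written inline, with the extra `hfaild` cap (`s <= 1` after a rejection) applied separately.

```python
def step_scale(esttol, accepted):
    """Factor applied to h after an error test (hypothetical helper).

    accepted=False: the step was rejected, so h shrinks (never below 0.1 * h).
    accepted=True:  the step was accepted, so h may grow (never above 5 * h).
    """
    if accepted:
        return 0.9 / esttol ** 0.2 if esttol > 0.0001889568 else 5.0
    return 0.9 / esttol ** 0.2 if esttol < 59049.0 else 0.1


if __name__ == "__main__":
    # 59049 = 9**5, so 0.9 / 59049**0.2 equals the lower clamp 0.1.
    # 0.0001889568 = (0.9 / 5)**5, so the formula equals the upper clamp 5.
    print(9 ** 5, (0.9 / 5.0) ** 5)
    print(step_scale(59049.0, accepted=False))
    print(step_scale(0.0001889568, accepted=True))
    print(step_scale(1.0, accepted=True))
```

The exponent `0.2 = 1/5` reflects that the error estimate scales with the fifth power of `h`, and the factor `0.9` is a safety margin so that the next attempt is likely to pass the test.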
/grkf45/grkf45_3d.py:
--------------------------------------------------------------------------------
1 | import pycuda.autoinit
2 | import pycuda.driver as cuda
3 | import pycuda.compiler
4 | from pycuda.compiler import SourceModule
5 |
6 | def grkf45_module():
7 |     source_module = SourceModule("""
8 |
9 | #include
10 | #include
11 | #include
12 | # define MAXNFE 10000
13 |
14 | using namespace std;
15 |
16 | /* USER GIVEN DEVICE FUNCTION: parameters = p, r, b */
17 | __device__ void r4_f0 ( float t, float y0, float y1, float y2, float *yp0, float p, float r, float b){
18 |
19 | *yp0 = -p*y0 + p*y1;
20 |
21 | return;
22 | }
23 | __device__ void r4_f1 ( float t, float y0, float y1, float y2, float *yp1, float p, float r, float b){
24 |
25 | *yp1 = -y0*y2 + r*y0 - y1;
26 |
27 | return;
28 | }
29 | __device__ void r4_f2 ( float t, float y0, float y1, float y2, float *yp2, float p, float r, float b){
30 |
31 | *yp2 = y0*y1 - b*y2 ;
32 |
33 | return;
34 | }
35 |
36 |
37 |
38 |
39 | /* DEVICE FUNCTION */
40 |
41 | __device__ void r4_fehl (float y0, float y1, float y2, float t, float h, float yp0, float yp1, float yp2, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float *f1_1, float *f2_1, float *f3_1, float *f4_1, float *f5_1,float *f1_2, float *f2_2, float *f3_2, float *f4_2, float *f5_2, float p, float r, float b){
42 |
43 | float ch;
44 | int i;
45 | float s0;
46 | float s1;
47 | float s2;
48 |
49 | ch = h / 4.0;
50 |
51 | /* for ( i = 0; i < neqn; i++ ){ */
52 | *f5_0 = y0 + ch * yp0;
53 | *f5_1 = y1 + ch * yp1;
54 | *f5_2 = y2 + ch * yp2;
55 |
56 | /* } */
57 |
58 | r4_f0 ( t + ch, *f5_0, *f5_1, *f5_2, f1_0, p, r, b);
59 | r4_f1 ( t + ch, *f5_0, *f5_1, *f5_2, f1_1, p, r, b);
60 | r4_f2 ( t + ch, *f5_0, *f5_1, *f5_2, f1_2, p, r, b);
61 |
62 | ch = 3.0 * h / 32.0;
63 |
64 | /* for ( i = 0; i < neqn; i++ ){ */
65 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 );
66 | *f5_1 = y1 + ch * ( yp1 + 3.0 * *f1_1 );
67 | *f5_2 = y2 + ch * ( yp2 + 3.0 * *f1_2 );
68 | /* } */
69 |
70 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_0, p, r, b);
71 | r4_f1 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_1, p, r, b);
72 | r4_f2 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_2, p, r, b);
73 |
74 | ch = h / 2197.0;
75 |
76 | /* for ( i = 0; i < neqn; i++ ){ */
77 | *f5_0 = y0 + ch *
78 | ( 1932.0 * yp0
79 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 )
80 | );
81 | *f5_1 = y1 + ch *
82 | ( 1932.0 * yp1
83 | + ( 7296.0 * *f2_1 - 7200.0 * *f1_1 )
84 | );
85 | *f5_2 = y2 + ch *
86 | ( 1932.0 * yp2
87 | + ( 7296.0 * *f2_2 - 7200.0 * *f1_2 )
88 | );
89 | /* } */
90 |
91 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_0, p, r, b);
92 | r4_f1 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_1, p, r, b);
93 | r4_f2 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_2, p, r, b);
94 |
95 | ch = h / 4104.0;
96 |
97 | /* for ( i = 0; i < neqn; i++ ){ */
98 | *f5_0 = y0 + ch *
99 | (
100 | ( 8341.0 * yp0 - 845.0 * *f3_0 )
101 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 )
102 | );
103 | *f5_1 = y1 + ch *
104 | (
105 | ( 8341.0 * yp1 - 845.0 * *f3_1 )
106 | + ( 29440.0 * *f2_1 - 32832.0 * *f1_1 )
107 | );
108 | *f5_2 = y2 + ch *
109 | (
110 | ( 8341.0 * yp2 - 845.0 * *f3_2 )
111 | + ( 29440.0 * *f2_2 - 32832.0 * *f1_2 )
112 | );
113 | /* } */
114 |
115 | r4_f0 ( t + h, *f5_0,*f5_1,*f5_2, f4_0, p, r, b);
116 | r4_f1 ( t + h, *f5_0,*f5_1,*f5_2, f4_1, p, r, b);
117 | r4_f2 ( t + h, *f5_0,*f5_1,*f5_2, f4_2, p, r, b);
118 |
119 | ch = h / 20520.0;
120 |
121 | /* for ( i = 0; i < neqn; i++ ){ */
122 | *f1_0 = y0 + ch *
123 | (
124 | ( -6080.0 * yp0
125 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 )
126 | )
127 | + ( 41040.0 * *f1_0 - 28352.0 * *f2_0 )
128 | );
129 | *f1_1 = y1 + ch *
130 | (
131 | ( -6080.0 * yp1
132 | + ( 9295.0 * *f3_1 - 5643.0 * *f4_1 )
133 | )
134 | + ( 41040.0 * *f1_1 - 28352.0 * *f2_1 )
135 | );
136 | *f1_2 = y2 + ch *
137 | (
138 | ( -6080.0 * yp2
139 | + ( 9295.0 * *f3_2 - 5643.0 * *f4_2 )
140 | )
141 | + ( 41040.0 * *f1_2 - 28352.0 * *f2_2 )
142 | );
143 | /* } */
144 |
145 | r4_f0 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_0, p,r,b);
146 | r4_f1 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_1, p,r,b);
147 | r4_f2 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_2, p,r,b);
148 |
149 | ch = h / 7618050.0;
150 |
151 | /* for ( i = 0; i < neqn; i++ ){ */
152 | s0 = y0 + ch *
153 | (
154 | ( 902880.0 * yp0
155 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) )
156 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 )
157 | );
158 | s1 = y1 + ch *
159 | (
160 | ( 902880.0 * yp1
161 | + ( 3855735.0 * *f3_1 - 1371249.0 * *f4_1 ) )
162 | + ( 3953664.0 * *f2_1 + 277020.0 * *f5_1 )
163 | );
164 | s2 = y2 + ch *
165 | (
166 | ( 902880.0 * yp2
167 | + ( 3855735.0 * *f3_2 - 1371249.0 * *f4_2 ) )
168 | + ( 3953664.0 * *f2_2 + 277020.0 * *f5_2 )
169 | );
170 | /* } */
171 |
172 | *f1_0 = s0;
173 | *f1_1 = s1;
174 | *f1_2 = s2;
175 |
176 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
177 |
178 | return;
179 | }
180 |
181 |
182 | /* GLOBAL FUNCTION */
183 | __global__ void r4_rkf45 (int* flagM, float* pM,float* rM,float* bM, float *yM0, float *yM1, float *yM2, float tM, float toutM, float relerr, float abserr){
184 |
185 | float ae;
186 | float dt;
187 | float ee;
188 | float eeoet;
189 | const float eps = 1.19209290E-07;
190 | float esttol;
191 | float et;
192 | float f1_0;
193 | float f2_0;
194 | float f3_0;
195 | float f4_0;
196 | float f5_0;
197 | float f1_1;
198 | float f2_1;
199 | float f3_1;
200 | float f4_1;
201 | float f5_1;
202 | float f1_2;
203 | float f2_2;
204 | float f3_2;
205 | float f4_2;
206 | float f5_2;
207 | float h = -1.0;
208 | bool hfaild;
209 | float hmin;
210 | int i;
211 | int init = -1000;
212 | int k;
213 | int kop = -1;
214 | int nfe = -1;
215 | float s;
216 | float scale;
217 | float tol;
218 | float toln;
219 | float ypk;
220 | bool output;
221 |
222 | /* user defined parameters */
223 | float p;
224 | float r;
225 | float b;
226 |
227 | int ib = blockIdx.x;
228 |
229 | p = pM[ib];
230 | r = rM[ib];
231 | b = bM[ib];
232 |
233 |
234 | /* USE register */
235 | float t;
236 | float y0;
237 | float yp0;
238 | float y1;
239 | float yp1;
240 | float y2;
241 | float yp2;
242 | float tout;
243 |
244 | t = tM;
245 | tout = toutM;
246 | y0 = yM0[ib];
247 | y1 = yM1[ib];
248 | y2 = yM2[ib];
249 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b);
250 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b);
251 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b);
252 |
253 | dt = tout - t;
254 |
255 | if ( init == 0 ){
256 |
257 | init = 1;
258 | h = abs( dt );
259 | toln = 0.0;
260 |
261 | /* for ( k = 0; k < neqn; k++ ){ */
262 | tol = (relerr) * abs( y0 ) + abserr;
263 | if ( 0.0 < tol ){
264 | toln = tol;
265 | ypk = abs( yp0 );
266 | if ( tol < ypk * pow ( h, 5 ) )
267 | {
268 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
269 | }}
270 |
271 | tol = (relerr) * abs( y1 ) + abserr;
272 | if ( 0.0 < tol ){
273 | toln = tol;
274 | ypk = abs( yp1 );
275 | if ( tol < ypk * pow ( h, 5 ) )
276 | {
277 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
278 | }}
279 |
280 | tol = (relerr) * abs( y2 ) + abserr;
281 | if ( 0.0 < tol ){
282 | toln = tol;
283 | ypk = abs( yp2 );
284 | if ( tol < ypk * pow ( h, 5 ) )
285 | {
286 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
287 | }}
288 | /* } */
289 |
290 |
291 | if ( toln <= 0.0 ){h = 0.0;}
292 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) );
293 | }
294 |
295 |
296 | /* The SIGN function of the original code (+1 for positive, -1 for negative) is replaced by signbit from the CUDA math API (0 for positive, nonzero for negative): sign = 1 - 2*signbit */
297 |
298 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h );
299 |
300 | if ( 2.0 * abs( dt ) <= abs( h ) ){
301 | kop = kop + 1;
302 | }
303 |
304 | output = false;
305 | scale = 2.0 / (relerr);
306 | ae = scale * abserr;
307 |
308 | for ( ; ; ){
309 | hfaild = false;
310 | hmin = 26.0 * eps * abs ( t );
311 | dt = tout - t;
312 |
313 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){
314 | }else{
315 | if ( abs ( dt ) <= abs ( h ) ){
316 | output = true;
317 | h = dt;
318 | }else{
319 | h = 0.5 * dt;
320 | }
321 | }
322 |
323 | for ( ; ; ){
324 |
325 | if ( MAXNFE < nfe ){
326 |
327 | tM = t;
328 | yM0[ib] = y0;
329 | yM1[ib] = y1;
330 | yM2[ib] = y2;
331 |
332 |
333 | flagM[ib]=4;
334 |
335 | printf("(*_*)< t=%2.8f \\n",t);
336 | printf("*WARNING! END MAXNFE < nfe condition! \\n");
337 |
338 | return;
339 | }
340 |
341 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
342 | r4_fehl (y0, y1, y2, t, h, yp0, yp1, yp2, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, &f1_1, &f2_1, &f3_1, &f4_1, &f5_1, &f1_2, &f2_2, &f3_2, &f4_2, &f5_2, p, r, b);
343 |
344 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
345 |
346 | nfe = nfe + 5;
347 | eeoet = 0.0;
348 |
349 | /* for ( k = 0; k < neqn; k++ ){ */
350 | et = abs ( y0 ) + abs ( f1_0 ) + ae;
351 | ee = abs ( ( -2090.0 * yp0
352 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 )
353 | )
354 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 )
355 | );
356 | eeoet = max ( eeoet, ee / et );
357 |
358 | et = abs ( y1 ) + abs ( f1_1 ) + ae;
359 | ee = abs ( ( -2090.0 * yp1
360 | + ( 21970.0 * f3_1 - 15048.0 * f4_1 )
361 | )
362 | + ( 22528.0 * f2_1 - 27360.0 * f5_1 )
363 | );
364 | eeoet = max ( eeoet, ee / et );
365 |
366 | et = abs ( y2 ) + abs ( f1_2 ) + ae;
367 | ee = abs ( ( -2090.0 * yp2
368 | + ( 21970.0 * f3_2 - 15048.0 * f4_2 )
369 | )
370 | + ( 22528.0 * f2_2 - 27360.0 * f5_2 )
371 | );
372 | eeoet = max ( eeoet, ee / et );
373 | /* } */
374 |
375 | esttol = abs ( h ) * eeoet * scale / 752400.0;
376 |
377 | if ( esttol <= 1.0 )
378 | {
379 | break;
380 | }
381 |
382 | hfaild = true;
383 | output = false;
384 |
385 | /* printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */
386 |
387 | if ( esttol < 59049.0 ){
388 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
389 | }else{
390 | s = 0.1;
391 | }
392 |
393 | h = s * h;
394 | }
395 |
396 | t = t + h;
397 | /* for ( i = 0; i < neqn; i++ ){ */
398 | y0 = f1_0;
399 | y1 = f1_1;
400 | y2 = f1_2;
401 | /* } */
402 |
403 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b);
404 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b);
405 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b);
406 |
407 | nfe = nfe + 1;
408 | if ( 0.0001889568 < esttol )
409 | {
410 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
411 | }
412 | else
413 | {
414 | s = 5.0;
415 | }
416 |
417 | if ( hfaild )
418 | {
419 | s = min ( s, 1.0 );
420 | }
421 |
422 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin );
423 |
424 | if (output){
425 |
426 | tM = t;
427 | yM0[ib] = y0;
428 | yM1[ib] = y1;
429 | yM2[ib] = y2;
430 |
431 | flagM[ib]=2;
432 |
433 | /* printf("Normal Exit N=%d\\n",nfe); */
434 |
435 | return;
436 | }
437 |
438 | }
439 |
440 |
441 | return;
442 |
443 | }
444 |
445 | """,options=['-use_fast_math'])
446 |
447 | return source_module
448 |
449 |
450 | if __name__ == "__main__":
451 | import numpy as np
452 | import matplotlib.pyplot as plt
453 |
454 | print("*******************************************")
455 | print("GPU RKF45 solver for the following example. 3D")
456 | print("Lorenz Attractor")
457 | print("*******************************************")
458 |
459 | source_module=grkf45_module()
460 | pkernel=source_module.get_function("r4_rkf45")
461 |
462 | eps=1.19209290e-07
463 | relerr=np.sqrt(eps)
464 | abserr=np.sqrt(eps)
465 | print("abserr=",abserr)
466 | print("relerr=",relerr)
467 |
468 |
469 | nw=1
470 | nt=64
471 | nq=1
472 | nb = nw*nt*nq
473 | sharedsize=0 #byte
474 |
475 | #parameters
476 | p=np.ones(nb)*np.array(10.0)
477 | p=p.astype(np.float32)
478 | dev_p = cuda.mem_alloc(p.nbytes)
479 | cuda.memcpy_htod(dev_p,p)
480 |
481 | r=np.ones(nb)*np.array(28.0)
482 | r=r.astype(np.float32)
483 | dev_r = cuda.mem_alloc(r.nbytes)
484 | cuda.memcpy_htod(dev_r,r)
485 |
486 | b=np.ones(nb)*np.array(8.0/3.0)
487 | b=b.astype(np.float32)
488 | dev_b = cuda.mem_alloc(b.nbytes)
489 | cuda.memcpy_htod(dev_b,b)
490 |
491 |
492 | y0=np.linspace(0,30.0,nb)
493 | y0=y0.astype(np.float32)
494 | dev_y0 = cuda.mem_alloc(y0.nbytes)
495 | cuda.memcpy_htod(dev_y0,y0)
496 |
497 | y1=np.ones(nb)*-7
498 | y1=y1.astype(np.float32)
499 | dev_y1 = cuda.mem_alloc(y1.nbytes)
500 | cuda.memcpy_htod(dev_y1,y1)
501 |
502 | y2=np.ones(nb)*-5
503 | y2=y2.astype(np.float32)
504 | dev_y2 = cuda.mem_alloc(y2.nbytes)
505 | cuda.memcpy_htod(dev_y2,y2)
506 |
507 | flag=np.zeros(nb)
508 | flag=flag.astype(np.int32)
509 | dev_flag = cuda.mem_alloc(flag.nbytes)
510 | cuda.memcpy_htod(dev_flag,flag)
511 |
512 | t=np.linspace(0.0,100.0,10000)
513 | yarr0=[]
514 | yarr0.append(np.copy(y0))
515 | yarr1=[]
516 | yarr1.append(np.copy(y1))
517 | yarr2=[]
518 | yarr2.append(np.copy(y2))
519 |
520 | for j,tnow in enumerate(t[:-1]):
521 | tin=tnow
522 | tout=t[j+1]
523 | pkernel(dev_flag, dev_p,dev_r,dev_b,dev_y0,dev_y1,dev_y2,np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize)
524 |
525 | cuda.memcpy_dtoh(y0, dev_y0)
526 | cuda.memcpy_dtoh(y1, dev_y1)
527 | cuda.memcpy_dtoh(y2, dev_y2)
528 |
529 | cuda.memcpy_dtoh(flag, dev_flag)
530 | #print("T=",tnow,"RKF45: y=",y0)
531 | #print("FLAG=",flag)
532 | yarr0.append(np.copy(y0))
533 | yarr1.append(np.copy(y1))
534 | yarr2.append(np.copy(y2))
535 |
536 |
537 | yarr0=np.array(yarr0)
538 | yarr1=np.array(yarr1)
539 | yarr2=np.array(yarr2)
540 |
541 | print("END")
542 | colors=plt.cm.Spectral(np.linspace(0,1,30))
543 | cm = plt.get_cmap('magma')
544 | fig = plt.figure()
545 | ax = fig.add_subplot(121,aspect=1.0)
546 | ax.set_prop_cycle('color',colors)
547 | ax.plot(yarr0[:,:],yarr1[:,:],alpha=0.5)
548 | plt.ylim(-40,40)
549 | plt.xlim(-30,40)
550 |
551 | ax = fig.add_subplot(122,aspect=1.0)
552 | ax.set_prop_cycle('color',colors)
553 | ax.plot(yarr2[:,:],yarr1[:,:],alpha=0.5)
554 | plt.ylim(-40,40)
555 | plt.xlim(-10,60)
556 | plt.savefig("Lorentz.png")
557 | plt.show()
558 |
--------------------------------------------------------------------------------
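The step-size control used inside `r4_rkf45` above can be restated on the host in a few lines of Python. This is only a sketch for reading the kernel, not part of the repository: the function name `step_scale` and its arguments are hypothetical. The thresholds are the ones in the kernel; 59049 equals 9^5 and 0.0001889568 equals 0.18^5, so they are exactly the points where `0.9 / esttol**0.2` reaches the caps 0.1 and 5.0.

```python
def step_scale(esttol, step_failed):
    """Factor applied to the step h (hypothetical helper mirroring r4_rkf45).

    esttol      -- scaled error estimate; the step is accepted when esttol <= 1
    step_failed -- True if this step was already rejected at least once
    """
    if esttol > 1.0:
        # Rejected step: shrink h, but by no more than a factor of 10.
        return 0.9 / esttol ** 0.2 if esttol < 59049.0 else 0.1
    # Accepted step: grow h, but by no more than a factor of 5.
    s = 0.9 / esttol ** 0.2 if esttol > 0.0001889568 else 5.0
    if step_failed:
        # Do not grow the step right after a rejection.
        s = min(s, 1.0)
    return s


# Example: a badly failed step, a very accurate step, and a retry.
print(step_scale(1.0e6, False), step_scale(1.0e-9, False), step_scale(1.0e-9, True))
```

In the kernel the accepted-step factor is additionally bounded below through `hmin`, which this sketch leaves out.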
/grkf45/grkf45.py:
--------------------------------------------------------------------------------
1 | import pycuda.autoinit
2 | import pycuda.driver as cuda
3 | import pycuda.compiler
4 | from pycuda.compiler import SourceModule
5 |
6 | def grkf45_module():
7 | source_module = SourceModule("""
8 |
9 | #include
10 | #include
11 | #include
12 | # define MAXNFE 10000
13 |
14 | using namespace std;
15 |
16 | /* USER GIVEN DEVICE FUNCTIONS: parameters = p, r, b */
17 | __device__ void r4_f0 ( float t, float y0, float y1, float y2, float *yp0, float p, float r, float b){
18 |
19 | *yp0 = -p*y0 + p*y1;
20 |
21 | return;
22 | }
23 | __device__ void r4_f1 ( float t, float y0, float y1, float y2, float *yp1, float p, float r, float b){
24 |
25 | *yp1 = -y0*y2 + r*y0 - y1;
26 |
27 | return;
28 | }
29 | __device__ void r4_f2 ( float t, float y0, float y1, float y2, float *yp2, float p, float r, float b){
30 |
31 | *yp2 = y0*y1 - b*y2 ;
32 |
33 | return;
34 | }
35 |
36 |
37 |
38 |
39 | /* DEVICE FUNCTION */
40 |
41 | __device__ void r4_fehl (float y0, float y1, float y2, float t, float h, float yp0, float yp1, float yp2, float *f1_0, float *f2_0, float *f3_0, float *f4_0, float *f5_0, float *f1_1, float *f2_1, float *f3_1, float *f4_1, float *f5_1,float *f1_2, float *f2_2, float *f3_2, float *f4_2, float *f5_2, float p, float r, float b){
42 |
43 | float ch;
44 | int i;
45 | float s0;
46 | float s1;
47 | float s2;
48 |
49 | ch = h / 4.0;
50 |
51 | /* for ( i = 0; i < neqn; i++ ){ */
52 | *f5_0 = y0 + ch * yp0;
53 | *f5_1 = y1 + ch * yp1;
54 | *f5_2 = y2 + ch * yp2;
55 |
56 | /* } */
57 |
58 | r4_f0 ( t + ch, *f5_0, *f5_1, *f5_2, f1_0, p, r, b);
59 | r4_f1 ( t + ch, *f5_0, *f5_1, *f5_2, f1_1, p, r, b);
60 | r4_f2 ( t + ch, *f5_0, *f5_1, *f5_2, f1_2, p, r, b);
61 |
62 | ch = 3.0 * h / 32.0;
63 |
64 | /* for ( i = 0; i < neqn; i++ ){ */
65 | *f5_0 = y0 + ch * ( yp0 + 3.0 * *f1_0 );
66 | *f5_1 = y1 + ch * ( yp1 + 3.0 * *f1_1 );
67 | *f5_2 = y2 + ch * ( yp2 + 3.0 * *f1_2 );
68 | /* } */
69 |
70 | r4_f0 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_0, p, r, b);
71 | r4_f1 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_1, p, r, b);
72 | r4_f2 ( t + 3.0 * h / 8.0, *f5_0, *f5_1, *f5_2, f2_2, p, r, b);
73 |
74 | ch = h / 2197.0;
75 |
76 | /* for ( i = 0; i < neqn; i++ ){ */
77 | *f5_0 = y0 + ch *
78 | ( 1932.0 * yp0
79 | + ( 7296.0 * *f2_0 - 7200.0 * *f1_0 )
80 | );
81 | *f5_1 = y1 + ch *
82 | ( 1932.0 * yp1
83 | + ( 7296.0 * *f2_1 - 7200.0 * *f1_1 )
84 | );
85 | *f5_2 = y2 + ch *
86 | ( 1932.0 * yp2
87 | + ( 7296.0 * *f2_2 - 7200.0 * *f1_2 )
88 | );
89 | /* } */
90 |
91 | r4_f0 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_0, p, r, b);
92 | r4_f1 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_1, p, r, b);
93 | r4_f2 ( t + 12.0 * h / 13.0, *f5_0,*f5_1,*f5_2, f3_2, p, r, b);
94 |
95 | ch = h / 4104.0;
96 |
97 | /* for ( i = 0; i < neqn; i++ ){ */
98 | *f5_0 = y0 + ch *
99 | (
100 | ( 8341.0 * yp0 - 845.0 * *f3_0 )
101 | + ( 29440.0 * *f2_0 - 32832.0 * *f1_0 )
102 | );
103 | *f5_1 = y1 + ch *
104 | (
105 | ( 8341.0 * yp1 - 845.0 * *f3_1 )
106 | + ( 29440.0 * *f2_1 - 32832.0 * *f1_1 )
107 | );
108 | *f5_2 = y2 + ch *
109 | (
110 | ( 8341.0 * yp2 - 845.0 * *f3_2 )
111 | + ( 29440.0 * *f2_2 - 32832.0 * *f1_2 )
112 | );
113 | /* } */
114 |
115 | r4_f0 ( t + h, *f5_0,*f5_1,*f5_2, f4_0, p, r, b);
116 | r4_f1 ( t + h, *f5_0,*f5_1,*f5_2, f4_1, p, r, b);
117 | r4_f2 ( t + h, *f5_0,*f5_1,*f5_2, f4_2, p, r, b);
118 |
119 | ch = h / 20520.0;
120 |
121 | /* for ( i = 0; i < neqn; i++ ){ */
122 | *f1_0 = y0 + ch *
123 | (
124 | ( -6080.0 * yp0
125 | + ( 9295.0 * *f3_0 - 5643.0 * *f4_0 )
126 | )
127 | + ( 41040.0 * *f1_0 - 28352.0 * *f2_0 )
128 | );
129 | *f1_1 = y1 + ch *
130 | (
131 | ( -6080.0 * yp1
132 | + ( 9295.0 * *f3_1 - 5643.0 * *f4_1 )
133 | )
134 | + ( 41040.0 * *f1_1 - 28352.0 * *f2_1 )
135 | );
136 | *f1_2 = y2 + ch *
137 | (
138 | ( -6080.0 * yp2
139 | + ( 9295.0 * *f3_2 - 5643.0 * *f4_2 )
140 | )
141 | + ( 41040.0 * *f1_2 - 28352.0 * *f2_2 )
142 | );
143 | /* } */
144 |
145 | r4_f0 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_0, p,r,b);
146 | r4_f1 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_1, p,r,b);
147 | r4_f2 ( t + h / 2.0, *f1_0,*f1_1,*f1_2, f5_2, p,r,b);
148 |
149 | ch = h / 7618050.0;
150 |
151 | /* for ( i = 0; i < neqn; i++ ){ */
152 | s0 = y0 + ch *
153 | (
154 | ( 902880.0 * yp0
155 | + ( 3855735.0 * *f3_0 - 1371249.0 * *f4_0 ) )
156 | + ( 3953664.0 * *f2_0 + 277020.0 * *f5_0 )
157 | );
158 | s1 = y1 + ch *
159 | (
160 | ( 902880.0 * yp1
161 | + ( 3855735.0 * *f3_1 - 1371249.0 * *f4_1 ) )
162 | + ( 3953664.0 * *f2_1 + 277020.0 * *f5_1 )
163 | );
164 | s2 = y2 + ch *
165 | (
166 | ( 902880.0 * yp2
167 | + ( 3855735.0 * *f3_2 - 1371249.0 * *f4_2 ) )
168 | + ( 3953664.0 * *f2_2 + 277020.0 * *f5_2 )
169 | );
170 | /* } */
171 |
172 | *f1_0 = s0;
173 | *f1_1 = s1;
174 | *f1_2 = s2;
175 |
176 | /* printf("(*_*)< +++++ %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
177 |
178 | return;
179 | }
180 |
181 |
182 | /* GLOBAL FUNCTION */
183 | __global__ void r4_rkf45 (int* flagM, float* pM,float* rM,float* bM, float *yM0, float *yM1, float *yM2, float tM, float toutM, float relerr, float abserr){
184 |
185 | float ae;
186 | float dt;
187 | float ee;
188 | float eeoet;
189 | const float eps = 1.19209290E-07;
190 | float esttol;
191 | float et;
192 | float f1_0;
193 | float f2_0;
194 | float f3_0;
195 | float f4_0;
196 | float f5_0;
197 | float f1_1;
198 | float f2_1;
199 | float f3_1;
200 | float f4_1;
201 | float f5_1;
202 | float f1_2;
203 | float f2_2;
204 | float f3_2;
205 | float f4_2;
206 | float f5_2;
207 | float h = -1.0;
208 | bool hfaild;
209 | float hmin;
210 | int i;
211 | int init = -1000;
212 | int k;
213 | int kop = -1;
214 | int nfe = -1;
215 | float s;
216 | float scale;
217 | float tol;
218 | float toln;
219 | float ypk;
220 | bool output;
221 |
222 | /* user defined parameters */
223 | float p;
224 | float r;
225 | float b;
226 |
227 | int ib = blockIdx.x;
228 |
229 | p = pM[ib];
230 | r = rM[ib];
231 | b = bM[ib];
232 |
233 |
234 | /* USE register */
235 | float t;
236 | float y0;
237 | float yp0;
238 | float y1;
239 | float yp1;
240 | float y2;
241 | float yp2;
242 | float tout;
243 |
244 | t = tM;
245 | tout = toutM;
246 | y0 = yM0[ib];
247 | y1 = yM1[ib];
248 | y2 = yM2[ib];
249 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b);
250 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b);
251 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b);
252 |
253 | dt = tout - t;
254 |
255 | if ( init == 0 ){
256 |
257 | init = 1;
258 | h = abs( dt );
259 | toln = 0.0;
260 |
261 | /* for ( k = 0; k < neqn; k++ ){ */
262 | tol = (relerr) * abs( y0 ) + abserr;
263 | if ( 0.0 < tol ){
264 | toln = tol;
265 | ypk = abs( yp0 );
266 | if ( tol < ypk * pow ( h, 5 ) )
267 | {
268 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
269 | }}
270 |
271 | tol = (relerr) * abs( y1 ) + abserr;
272 | if ( 0.0 < tol ){
273 | toln = tol;
274 | ypk = abs( yp1 );
275 | if ( tol < ypk * pow ( h, 5 ) )
276 | {
277 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
278 | }}
279 |
280 | tol = (relerr) * abs( y2 ) + abserr;
281 | if ( 0.0 < tol ){
282 | toln = tol;
283 | ypk = abs( yp2 );
284 | if ( tol < ypk * pow ( h, 5 ) )
285 | {
286 | h = ( float ) pow ( ( double ) ( tol / ypk ), 0.2 );
287 | }}
288 | /* } */
289 |
290 |
291 | if ( toln <= 0.0 ){h = 0.0;}
292 | h = max ( h, 26.0 * eps * max ( abs ( t ), abs ( dt ) ) );
293 | }
294 |
295 |
296 | /* The SIGN function of the original code (+1 for positive, -1 for negative) is replaced by signbit from the CUDA math API (0 for positive, nonzero for negative): sign = 1 - 2*signbit */
297 |
298 | h = ( - 2.0* signbit( dt ) + 1.0 ) *abs( h );
299 |
300 | if ( 2.0 * abs( dt ) <= abs( h ) ){
301 | kop = kop + 1;
302 | }
303 |
304 | output = false;
305 | scale = 2.0 / (relerr);
306 | ae = scale * abserr;
307 |
308 | for ( ; ; ){
309 | hfaild = false;
310 | hmin = 26.0 * eps * abs ( t );
311 | dt = tout - t;
312 |
313 | if ( 2.0 * abs ( h ) <= abs ( dt ) ){
314 | }else{
315 | if ( abs ( dt ) <= abs ( h ) ){
316 | output = true;
317 | h = dt;
318 | }else{
319 | h = 0.5 * dt;
320 | }
321 | }
322 |
323 | for ( ; ; ){
324 |
325 | if ( MAXNFE < nfe ){
326 |
327 | tM = t;
328 | yM0[ib] = y0;
329 | yM1[ib] = y1;
330 | yM2[ib] = y2;
331 |
332 |
333 | flagM[ib]=4;
334 |
335 | printf("(*_*)< t=%2.8f \\n",t);
336 | printf("*WARNING! END MAXNFE < nfe condition! \\n");
337 |
338 | return;
339 | }
340 |
341 | /* printf("(*_*)< >>>>> %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
342 | r4_fehl (y0, y1, y2, t, h, yp0, yp1, yp2, &f1_0, &f2_0, &f3_0, &f4_0, &f5_0, &f1_1, &f2_1, &f3_1, &f4_1, &f5_1, &f1_2, &f2_2, &f3_2, &f4_2, &f5_2, p, r, b);
343 |
344 | /* printf("(*_*)< <<<<< %2.8f,%2.8f,%2.8f,%2.8f, %2.8f \\n",f1_0,f2_0,f3_0,f4_0,f5_0); */
345 |
346 | nfe = nfe + 5;
347 | eeoet = 0.0;
348 |
349 | /* for ( k = 0; k < neqn; k++ ){ */
350 | et = abs ( y0 ) + abs ( f1_0 ) + ae;
351 | ee = abs ( ( -2090.0 * yp0
352 | + ( 21970.0 * f3_0 - 15048.0 * f4_0 )
353 | )
354 | + ( 22528.0 * f2_0 - 27360.0 * f5_0 )
355 | );
356 | eeoet = max ( eeoet, ee / et );
357 |
358 | et = abs ( y1 ) + abs ( f1_1 ) + ae;
359 | ee = abs ( ( -2090.0 * yp1
360 | + ( 21970.0 * f3_1 - 15048.0 * f4_1 )
361 | )
362 | + ( 22528.0 * f2_1 - 27360.0 * f5_1 )
363 | );
364 | eeoet = max ( eeoet, ee / et );
365 |
366 | et = abs ( y2 ) + abs ( f1_2 ) + ae;
367 | ee = abs ( ( -2090.0 * yp2
368 | + ( 21970.0 * f3_2 - 15048.0 * f4_2 )
369 | )
370 | + ( 22528.0 * f2_2 - 27360.0 * f5_2 )
371 | );
372 | eeoet = max ( eeoet, ee / et );
373 | /* } */
374 |
375 | esttol = abs ( h ) * eeoet * scale / 752400.0;
376 |
377 | if ( esttol <= 1.0 )
378 | {
379 | break;
380 | }
381 |
382 | hfaild = true;
383 | output = false;
384 |
385 | /* printf("(*_*)< h = %2.8f, esttol= %2.8f \\n",h, esttol); */
386 |
387 | if ( esttol < 59049.0 ){
388 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
389 | }else{
390 | s = 0.1;
391 | }
392 |
393 | h = s * h;
394 | }
395 |
396 | t = t + h;
397 | /* for ( i = 0; i < neqn; i++ ){ */
398 | y0 = f1_0;
399 | y1 = f1_1;
400 | y2 = f1_2;
401 | /* } */
402 |
403 | r4_f0 ( t, y0, y1, y2, &yp0, p,r,b);
404 | r4_f1 ( t, y0, y1, y2, &yp1, p,r,b);
405 | r4_f2 ( t, y0, y1, y2, &yp2, p,r,b);
406 |
407 | nfe = nfe + 1;
408 | if ( 0.0001889568 < esttol )
409 | {
410 | s = 0.9 / ( float ) pow ( ( double ) esttol, 0.2 );
411 | }
412 | else
413 | {
414 | s = 5.0;
415 | }
416 |
417 | if ( hfaild )
418 | {
419 | s = min ( s, 1.0 );
420 | }
421 |
422 | h = ( - 2.0* signbit( h ) + 1.0 ) * max ( s * abs ( h ), hmin );
423 |
424 | if (output){
425 |
426 | tM = t;
427 | yM0[ib] = y0;
428 | yM1[ib] = y1;
429 | yM2[ib] = y2;
430 |
431 | flagM[ib]=2;
432 |
433 | /* printf("Normal Exit N=%d\\n",nfe); */
434 |
435 | return;
436 | }
437 |
438 | }
439 |
440 |
441 | return;
442 |
443 | }
444 |
445 | """,options=['-use_fast_math'])
446 |
447 | return source_module
448 |
449 |
450 | if __name__ == "__main__":
451 | import numpy as np
452 | import matplotlib.pyplot as plt
453 | import time
454 | import sys
455 |
456 | print("*******************************************")
457 | print("GPU RKF45 solver for the following example. 3D")
458 | print("Lorenz Attractor")
459 | print("*******************************************")
460 | tstart=time.time()
461 | source_module=grkf45_module()
462 | pkernel=source_module.get_function("r4_rkf45")
463 |
464 | eps=1.19209290e-07
465 | relerr=np.sqrt(eps)
466 | abserr=np.sqrt(eps)
467 | print("abserr=",abserr)
468 | print("relerr=",relerr)
469 |
470 |
471 | nw=1
472 | nt=10000
473 | nq=1
474 | nb = nw*nt*nq
475 | sharedsize=0 #byte
476 |
477 | #parameters
478 | p=np.ones(nb)*np.array(10.0)
479 | p=p.astype(np.float32)
480 | dev_p = cuda.mem_alloc(p.nbytes)
481 | cuda.memcpy_htod(dev_p,p)
482 |
483 | r=np.ones(nb)*np.array(28.0)
484 | r=r.astype(np.float32)
485 | dev_r = cuda.mem_alloc(r.nbytes)
486 | cuda.memcpy_htod(dev_r,r)
487 |
488 | b=np.ones(nb)*np.array(8.0/3.0)
489 | b=b.astype(np.float32)
490 | dev_b = cuda.mem_alloc(b.nbytes)
491 | cuda.memcpy_htod(dev_b,b)
492 |
493 |
494 | y0=np.linspace(0,30.0,nb)
495 | y0=y0.astype(np.float32)
496 | dev_y0 = cuda.mem_alloc(y0.nbytes)
497 | cuda.memcpy_htod(dev_y0,y0)
498 |
499 | y1=np.ones(nb)*-7
500 | y1=y1.astype(np.float32)
501 | dev_y1 = cuda.mem_alloc(y1.nbytes)
502 | cuda.memcpy_htod(dev_y1,y1)
503 |
504 | y2=np.ones(nb)*-5
505 | y2=y2.astype(np.float32)
506 | dev_y2 = cuda.mem_alloc(y2.nbytes)
507 | cuda.memcpy_htod(dev_y2,y2)
508 |
509 | flag=np.zeros(nb)
510 | flag=flag.astype(np.int32)
511 | dev_flag = cuda.mem_alloc(flag.nbytes)
512 | cuda.memcpy_htod(dev_flag,flag)
513 |
514 | t=np.linspace(0.0,100.0,10000)
515 | yarr0=[]
516 | yarr0.append(np.copy(y0))
517 | yarr1=[]
518 | yarr1.append(np.copy(y1))
519 | yarr2=[]
520 | yarr2.append(np.copy(y2))
521 |
522 | for j,tnow in enumerate(t[:-1]):
523 | tin=tnow
524 | tout=t[j+1]
525 | pkernel(dev_flag, dev_p,dev_r,dev_b,dev_y0,dev_y1,dev_y2,np.float32(tin),np.float32(tout),np.float32(relerr),np.float32(abserr),block=(int(nw),1,1), grid=(int(nt),int(nq)),shared=sharedsize)
526 |
527 | cuda.memcpy_dtoh(y0, dev_y0)
528 | cuda.memcpy_dtoh(y1, dev_y1)
529 | cuda.memcpy_dtoh(y2, dev_y2)
530 |
531 | cuda.memcpy_dtoh(flag, dev_flag)
532 | #print("T=",tnow,"RKF45: y=",y0)
533 | #print("FLAG=",flag)
534 | yarr0.append(np.copy(y0))
535 | yarr1.append(np.copy(y1))
536 | yarr2.append(np.copy(y2))
537 |
538 |
539 | yarr0=np.array(yarr0)
540 | yarr1=np.array(yarr1)
541 | yarr2=np.array(yarr2)
542 |
543 | print("END")
544 | tend=time.time()
545 | print("T=",tend-tstart)
546 |
547 | sys.exit()
548 | colors=plt.cm.Spectral(np.linspace(0,1,30))
549 | cm = plt.get_cmap('magma')
550 | fig = plt.figure()
551 | ax = fig.add_subplot(121,aspect=1.0)
552 | ax.set_prop_cycle('color',colors)
553 | ax.plot(yarr0[:,:],yarr1[:,:],alpha=0.5)
554 | plt.ylim(-40,40)
555 | plt.xlim(-30,40)
556 |
557 | ax = fig.add_subplot(122,aspect=1.0)
558 | ax.set_prop_cycle('color',colors)
559 | ax.plot(yarr2[:,:],yarr1[:,:],alpha=0.5)
560 | plt.ylim(-40,40)
561 | plt.xlim(-10,60)
562 | plt.savefig("Lorentz.png")
563 | plt.show()
564 |
--------------------------------------------------------------------------------
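For cross-checking the GPU results on the CPU, a single Fehlberg step can be written with NumPy. The sketch below is not part of the repository; `lorenz` and `fehl_step` are hypothetical names. It uses the same coefficients as `r4_fehl`, written as fractions, where `k2` to `k6` correspond to the kernel's `f1` to `f5`. The returned error is the plain difference between the fifth- and fourth-order solutions, without the `relerr`/`abserr` scaling that `r4_rkf45` applies.

```python
import numpy as np


def lorenz(t, y, p=10.0, r=28.0, b=8.0 / 3.0):
    """Lorenz right-hand side, matching r4_f0, r4_f1 and r4_f2."""
    return np.array([p * (y[1] - y[0]),
                     -y[0] * y[2] + r * y[0] - y[1],
                     y[0] * y[1] - b * y[2]])


def fehl_step(f, t, y, h):
    """One Fehlberg 4(5) step: returns (fifth-order y, |y5 - y4|)."""
    k1 = f(t, y)
    k2 = f(t + h / 4, y + h * k1 / 4)
    k3 = f(t + 3 * h / 8, y + h * (3 * k1 + 9 * k2) / 32)
    k4 = f(t + 12 * h / 13,
           y + h * (1932 * k1 - 7200 * k2 + 7296 * k3) / 2197)
    k5 = f(t + h,
           y + h * (8341 * k1 - 32832 * k2 + 29440 * k3 - 845 * k4) / 4104)
    k6 = f(t + h / 2,
           y + h * (-6080 * k1 + 41040 * k2 - 28352 * k3
                    + 9295 * k4 - 5643 * k5) / 20520)
    # Fifth-order solution (the value r4_fehl stores in f1_*).
    y5 = y + h * (902880 * k1 + 3953664 * k3 + 3855735 * k4
                  - 1371249 * k5 + 277020 * k6) / 7618050
    # Fourth-order solution, used only for the error estimate.
    y4 = y + h * (25 / 216 * k1 + 1408 / 2565 * k3
                  + 2197 / 4104 * k4 - k5 / 5)
    return y5, np.abs(y5 - y4)


# One step from the first initial condition used in the example script.
y_next, err = fehl_step(lorenz, 0.0, np.array([0.0, -7.0, -5.0]), 0.01)
print(y_next, err)
```

Because the kernel runs in single precision with `-use_fast_math`, agreement with this double-precision reference should only be expected to a few significant digits, and trajectories of the chaotic Lorenz system will diverge over long integration times.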