├── .gitignore ├── LICENSE ├── README.md ├── benchmark.sh ├── m_xoroshiro128plus.f90 ├── pi_monte_carlo_co_sum.f90 ├── pi_monte_carlo_co_sum_openmp.f90 ├── pi_monte_carlo_co_sum_steady.f90 ├── pi_monte_carlo_coarrays.f90 ├── pi_monte_carlo_coarrays_steady.f90 ├── pi_monte_carlo_openmp.f90 └── pi_monte_carlo_serial.f90 /.gitignore: -------------------------------------------------------------------------------- 1 | *.txt 2 | *.mod 3 | *.out 4 | *.backup 5 | *.1 6 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Vincent Magnin 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # exploring_coarrays 2 | 3 | Let's explore the coarray features of modern Fortran for parallel programming: coarrays, images, etc. 4 | 5 | ## Computing Pi by Monte Carlo 6 | 7 | ### The algorithm 8 | 9 | Imagine a disk of radius R=1 inside a square of side 2*R, and draw N random points inside the square. Count K, the number of points that fall inside the disk. The larger N is, the closer `4*K/N` gets to Pi, because `K/N` tends to the ratio between the area of the disk `Pi*R**2` and the area of the square `(2*R)**2`, which is Pi/4. The programs simplify this slightly by considering only a quarter disk inside a square of side 1. 10 | 11 | The advantage of Monte Carlo algorithms is that they are naturally parallel ("embarrassingly parallel"), each point being independent of the others. 12 | 13 | Warnings: 14 | 15 | * This is an inefficient method to compute Pi, as one more digit of precision requires 100 times more points! 16 | * If the pseudo-random generator is biased, it can be a problem if our objective is really to compute Pi precisely. But our objective here is just to burn the CPU! 17 | 18 | ### The programs 19 | 20 | In this repository, you will find: 21 | 22 | * A serial version of the algorithm. 23 | * A parallel version using OpenMP. 24 | * A parallel version using Coarrays. 25 | * Another coarrays version that steadily prints intermediate results. **Bug: the intermediate results are not correct.** 26 | * Versions using co_sum() instead of coarrays. 27 | 28 | Concerning the pseudo-random number generator, we use a [Fortran implementation](https://github.com/jannisteunissen/xoroshiro128plus_fortran) (public domain) of the xoroshiro128+ algorithm. See also the page ["xoshiro / xoroshiro generators and the PRNG shootout"](https://prng.di.unimi.it/).
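As a rough illustration of the quarter-disk estimate described under "The algorithm", here is a minimal sketch in bash and awk. It is not one of the repository's programs: the function name `estimate_pi`, the sample size and the seed are assumptions chosen for the example, and awk's built-in `rand()` stands in for the xoroshiro128+ generator.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): estimates Pi by drawing
# n points in the unit square and counting those inside the quarter disk.
# awk's built-in rand() stands in for the xoroshiro128+ generator.
estimate_pi() {
    local n="$1" seed="$2"
    awk -v n="$n" -v seed="$seed" 'BEGIN {
        srand(seed)
        k = 0
        for (i = 1; i <= n; i++) {
            x = rand()
            y = rand()
            # Is the point inside the quarter disk (R=1, center=origin)?
            if (x*x + y*y < 1.0) k++
        }
        printf "%.4f\n", 4 * k / n
    }'
}

# Assumed sample size and seed, for illustration only:
estimate_pi 200000 42
```

The statistical error of such an estimate shrinks like `1/sqrt(N)`, which is why one more correct digit costs about 100 times more points. The exact value printed depends on the awk implementation's random generator.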
29 | 30 | 31 | ### Compilation 32 | 33 | Each program is compiled with the `-O3` optimization flag, using GFortran and the Intel compilers ifort and ifx (the new Intel compiler, based on LLVM). 34 | 35 | The OpenMP version is compiled with the `-fopenmp` flag with gfortran or `-qopenmp` with the Intel compilers. The number of threads is set via the `OMP_NUM_THREADS` environment variable. 36 | 37 | For gfortran, OpenCoarrays was installed with the MPICH library. The coarray versions are compiled and run with commands like: 38 | 39 | ```bash 40 | $ caf -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && cafrun -n 2 ./a.out 41 | ``` 42 | 43 | And for ifort: 44 | 45 | ```bash 46 | $ export FOR_COARRAY_NUM_IMAGES=2 47 | $ ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && ./a.out 48 | ``` 49 | 50 | For ifx: 51 | 52 | ```bash 53 | $ ifx -O3 -coarray=shared -coarray-num-images=2 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 54 | ``` 55 | 56 | ### Methodology 57 | 58 | The values are the means over 10 runs, computed by: 59 | 60 | ```bash 61 | $ ./benchmark.sh 62 | ``` 63 | 64 | Warning: this benchmark is valid for these programs, on these machines, with these compiler and library versions and these compiler options. The results cannot easily be generalized to other situations. Just try and see with your own programs. 65 | 66 | ### Results #1 (May 2021) 67 | 68 | The compiler versions are: 69 | 70 | * ifort 2021.2.0. 71 | * ifx 2021.2.0 Beta (ifx does not yet support `-coarray`). 72 | * gfortran 10.2.0. 73 | 74 | on an Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, under Ubuntu 20.10.
75 | 76 | 77 | CPU time in seconds with 2 images/threads (except of course Serial): 78 | 79 | | Version | gfortran | ifort | ifx | 80 | | -------------------- | -------- | ------- | ------- | 81 | | Serial | 10.77 | 18.77 | 14.66 | 82 | | OpenMP | 5.75 | 9.32 | 60.30 | 83 | | Coarrays | 13.21 | 9.79 | | 84 | | Coarrays steady | 21.80 | 27.83 | | 85 | | Co_sum | 5.58 | 9.98 | | 86 | | Co_sum steady | 9.18 | 12.71 | | 87 | 88 | With 4 images/threads (except of course Serial): 89 | 90 | | Version | gfortran | ifort | ifx | 91 | | -------------------- | -------- | ------- | ------- | 92 | | Serial | 10.77 | 18.77 | 14.66 | 93 | | OpenMP | 4.36 | 8.42 | 43.21 | 94 | | Coarrays | 9.47 | 9.12 | | 95 | | Coarrays steady | 19.41 | 24.78 | | 96 | | Co_sum | 4.16 | 9.29 | | 97 | | Co_sum steady | 8.18 | 10.94 | | 98 | 99 | 100 | ### Results #2 (January 2024) 101 | 102 | The compiler versions are: 103 | * gfortran 11.4.0 104 | * ifort 2021.11.1 105 | * ifx 2024.0.2 106 | 107 | on a 13th Gen Intel(R) Core(TM) i5-13500, under Ubuntu 22.04. 108 | 109 | With 2 images/threads (except of course Serial) with additional co_sum and openMP benchmark. The gfortran `co_sum` method includes the `-flto` flag as below. 110 | 111 | 112 | | Version | gfortran | ifort | ifx | 113 | | -------------------- | -------- | ------- | ------- | 114 | | Serial | 11.11 | 28.02 | 14.26 | 115 | | OpenMP | 7.86 | 14.40 | 5.37 | 116 | | Coarrays | 8.06 | 10.42 | 7.29 | 117 | | Coarrays steady | 8.98 | 16.85 | 14.38 | 118 | | Co_sum | 2.12 | 10.45 | 6.99 | 119 | | Co_sum steady | 3.37 | 10.93 | 10.93 | 120 | | Co_sum & openMP | 1.12 | 7.59 | 2.72 | 121 | 122 | 123 | ### Further optimization 124 | 125 | With gfortran, the `-flto` *([standard link-time optimizer](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html))* compilation option has a strong effect on this algorithm: for example, with the `co_sum` version the CPU time with 4 images falls from 4.16 s to 2.38 s (results #1)! 
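To put the `-flto` figure in perspective, the ratio between two timings can be computed directly. The sketch below uses only numbers quoted in this README (results #1, gfortran, 4 images); the helper name `ratio` is an assumption made for the example and is not part of `benchmark.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the ratio of two timings with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# co_sum, 4 images, gfortran (results #1): time without -flto over time with -flto
echo "Gain from -flto: x$(ratio 4.16 2.38)"

# Serial gfortran time over co_sum time with 4 images (both without -flto)
echo "Serial / co_sum (4 images): x$(ratio 10.77 4.16)"
```

With these inputs the script prints `x1.75` for the `-flto` gain and `x2.59` for the serial versus `co_sum` ratio. As the Methodology warning says, such ratios only describe this program on this machine.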
126 | 127 | 128 | # Bibliography 129 | 130 | * Curcic, Milan. [Modern Fortran - Building efficient parallel applications](https://learning.oreilly.com/library/view/-/9781617295287/?ar), Manning Publications, 1st edition, November 2020, ISBN 978-1-61729-528-7. 131 | * Metcalf, Michael, John Ker Reid, and Malcolm Cohen. *[Modern Fortran Explained: Incorporating Fortran 2018.](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198811893.001.0001/oso-9780198811893)* Numerical Mathematics and Scientific Computation. Oxford (England): Oxford University Press, 2018, ISBN 978-0-19-185002-8. 132 | * Thomas Koenig, [coarray-tutorial](https://github.com/tkoenig1/coarray-tutorial/blob/main/tutorial.md). 133 | -------------------------------------------------------------------------------- /benchmark.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | # Vincent Magnin, 2021-05-09 3 | # Ryan Bignell, 2024-01-17 4 | # Last modification: 2024-09-04 5 | # Launch the Pi Monte Carlo benchmark 6 | # MIT license 7 | # Verified with shellcheck 8 | 9 | # Strict mode: 10 | set -uo pipefail 11 | 12 | # Launches the given executable N times and appends its output to a file. 13 | launch_N_times() 14 | { 15 | # Input parameters (local -r makes them read-only locals): 16 | local -r N="$1" 17 | local -r filename="$2" 18 | local -r executable="$3" 19 | 20 | for ((i = 1 ; i <= N ; i++)) ; do 21 | ${executable} | tee -a "$filename" 22 | done 23 | } 24 | 25 | # Compute the mean computation time from the times in the file.
26 | mean_time() 27 | { 28 | # Input parameters: 29 | local -r testname="$1" 30 | # We grep real numbers with 3 decimals followed by a space (CPU times) 31 | # and we compute their mean value using awk: 32 | echo $(grep -oE '[0-9]+\.[0-9]{3} ' "${testname}.txt" | awk '{ total += $1 } END { printf "%5.2f", total/NR }') 33 | } 34 | 35 | #*************** 36 | # Main program: 37 | #*************** 38 | readonly runs=10 39 | readonly threads=2 40 | 41 | # Environment variables for OpenMP and Coarrays (ifort): 42 | export OMP_NUM_THREADS="${threads}" 43 | export FOR_COARRAY_NUM_IMAGES="${threads}" 44 | 45 | # Cleanup: 46 | rm -f gfortran*.txt 47 | rm -f ifort*.txt 48 | rm -f ifx*.txt 49 | 50 | # All examples are compiled and launched several times, and the results are 51 | # copied into a txt file: 52 | 53 | test_name="gfortran_serial" 54 | echo "$test_name" 55 | gfortran -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 56 | 57 | test_name="ifort_serial" 58 | echo "$test_name" 59 | ifort -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 60 | 61 | test_name="ifx_serial" 62 | echo "$test_name" 63 | ifx -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 64 | 65 | test_name="gfortran_openmp" 66 | echo "$test_name" 67 | gfortran -O3 -fopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 68 | 69 | test_name="ifort_openmp" 70 | echo "$test_name" 71 | ifort -O3 -qopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 72 | 73 | test_name="ifx_openmp" 74 | echo "$test_name" 75 | ifx -O3 -qopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 76 | 77 | test_name="gfortran_coarrays" 78 | echo "$test_name" 79 | caf -O3 m_xoroshiro128plus.f90
pi_monte_carlo_coarrays.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out" 80 | 81 | test_name="ifort_coarrays" 82 | echo "$test_name" 83 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 84 | 85 | test_name="ifx_coarrays" 86 | echo "$test_name" 87 | ifx -O3 -coarray=shared -coarray-num-images=${threads} m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 88 | 89 | test_name="gfortran_coarrays_steady" 90 | echo "$test_name" 91 | caf -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out" 92 | 93 | test_name="ifort_coarrays_steady" 94 | echo "$test_name" 95 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 96 | 97 | test_name="ifx_coarrays_steady" 98 | echo "$test_name" 99 | ifx -O3 -coarray=shared -coarray-num-images=${threads} m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 100 | 101 | test_name="gfortran_co_sum" 102 | echo "$test_name" 103 | caf -O3 -flto m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out" 104 | 105 | test_name="ifort_co_sum" 106 | echo "$test_name" 107 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 108 | 109 | test_name="ifx_co_sum" 110 | echo "$test_name" 111 | ifx -coarray=shared -coarray-num-images=${threads} -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 112 | 113 | 114 | test_name="gfortran_co_sum_steady" 115 | echo "$test_name" 116 | caf -O3 m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n 
${threads} ./a.out" 117 | 118 | test_name="ifort_co_sum_steady" 119 | echo "$test_name" 120 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 121 | 122 | test_name="ifx_co_sum_steady" 123 | echo "$test_name" 124 | ifx -O3 -coarray=shared -coarray-num-images=${threads} m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 125 | 126 | test_name="gfortran_co_sum_openmp" 127 | echo "$test_name" 128 | caf -O3 -fopenmp -flto m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out" 129 | 130 | test_name="ifort_co_sum_openmp" 131 | echo "$test_name" 132 | ifort -O3 -qopenmp -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 133 | 134 | test_name="ifx_co_sum_openmp" 135 | echo "$test_name" 136 | ifx -coarray=shared -coarray-num-images=${threads} -qopenmp -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out" 137 | 138 | 139 | # The CPU times mean values are computed with each txt file: 140 | echo '****************************************' 141 | echo ' STATISTICS (Markdown table)' 142 | echo '****************************************' 143 | 144 | echo '| Version | gfortran | ifort | ifx |' 145 | echo '| -------------------- | -------- | ------- | ------- |' 146 | echo "| Serial | $(mean_time 'gfortran_serial') | $(mean_time 'ifort_serial') | $(mean_time 'ifx_serial') |" 147 | echo "| OpenMP | $(mean_time 'gfortran_openmp') | $(mean_time 'ifort_openmp') | $(mean_time 'ifx_openmp') |" 148 | echo "| Coarrays | $(mean_time 'gfortran_coarrays') | $(mean_time 'ifort_coarrays') | $(mean_time 'ifx_coarrays') |" 149 | echo "| Coarrays steady | $(mean_time 'gfortran_coarrays_steady') | $(mean_time 'ifort_coarrays_steady') | $(mean_time 
'ifx_coarrays_steady') |" 150 | echo "| Co_sum | $(mean_time 'gfortran_co_sum') | $(mean_time 'ifort_co_sum') | $(mean_time 'ifx_co_sum') |" 151 | echo "| Co_sum steady | $(mean_time 'gfortran_co_sum_steady') | $(mean_time 'ifort_co_sum_steady') | $(mean_time 'ifx_co_sum_steady') |" 152 | echo "| Co_sum & openMP | $(mean_time 'gfortran_co_sum_openmp') | $(mean_time 'ifort_co_sum_openmp') | $(mean_time 'ifx_co_sum_openmp') |" 153 | echo 154 | 155 | echo "Compiler versions:" 156 | echo "------------------" 157 | gfortran --version 158 | ifort --version 159 | ifx --version 160 | -------------------------------------------------------------------------------- /m_xoroshiro128plus.f90: -------------------------------------------------------------------------------- 1 | ! Written in 2016 by David Blackman and Sebastiano Vigna (vigna@acm.org) 2 | ! Translated to Fortran 2008 by Jannis Teunissen 3 | 4 | ! To the extent possible under law, the author has dedicated all copyright 5 | ! and related and neighboring rights to this software to the public domain 6 | ! worldwide. This software is distributed without any warranty. 7 | 8 | ! See . 9 | 10 | ! This is the successor to xorshift128+. It is the fastest full-period 11 | ! generator passing BigCrush without systematic failures, but due to the 12 | ! relatively short period it is acceptable only for applications with a 13 | ! mild amount of parallelism; otherwise, use a xorshift1024* generator. 14 | 15 | ! Beside passing BigCrush, this generator passes the PractRand test suite 16 | ! up to (and included) 16TB, with the exception of binary rank tests, 17 | ! which fail due to the lowest bit being an LFSR; all other bits pass all 18 | ! tests. We suggest to use a sign test to extract a random Boolean value. 19 | 20 | ! Note that the generator uses a simulated rotate operation, which most C 21 | ! compilers will turn into a single instruction. In Java, you can use 22 | ! Long.rotateLeft().
In languages that do not make low-level rotation 23 | ! instructions accessible xorshift128+ could be faster. 24 | 25 | ! The state must be seeded so that it is not everywhere zero. If you have 26 | ! a 64-bit seed, we suggest to seed a splitmix64 generator and use its 27 | ! output to fill s. 28 | 29 | ! Usage example: 30 | ! 31 | ! use m_xoroshiro128plus 32 | ! 33 | ! type(rng_t) :: rng 34 | ! call rng%seed((/1337_i8, 31337_i8/)) 35 | ! 36 | ! print *, rng%next() 37 | ! print *, rng%U01() 38 | 39 | module m_xoroshiro128plus 40 | implicit none 41 | private 42 | 43 | ! This defines a 64 bit integer type 44 | integer, parameter :: i8 = selected_int_kind(18) 45 | 46 | ! This defines a 64 bit floating point type (Double Precision) 47 | integer, parameter :: dp = kind(0.0d0) 48 | 49 | ! A type/class to store the RNG state 50 | type rng_t 51 | ! The default seed (arbitrarily chosen) 52 | integer(i8) :: s(2) = (/123456789_i8, 987654321_i8/) 53 | contains ! Methods: 54 | procedure, non_overridable :: next ! Get next random number 55 | procedure, non_overridable :: U01 ! Get next random float [0,1) 56 | procedure, non_overridable :: seed ! Seed the generator 57 | procedure, non_overridable :: jump ! Jump function (see below) 58 | end type rng_t 59 | 60 | ! List of public types 61 | public :: i8, dp 62 | public :: rng_t 63 | 64 | contains 65 | 66 | ! For internal use 67 | pure function rotl(x, k) result(res) 68 | integer(i8), intent(in) :: x 69 | integer, intent(in) :: k 70 | integer(i8) :: res 71 | 72 | res = ior(shiftl(x, k), shiftr(x, 64 - k)) 73 | end function rotl 74 | 75 | ! Get the next value (returned as 64 bit signed integer) 76 | function next(self) result(res) 77 | class(rng_t), intent(inout) :: self 78 | integer(i8) :: res 79 | integer(i8) :: t(2) 80 | 81 | t = self%s 82 | res = t(1) + t(2) 83 | t(2) = ieor(t(1), t(2)) 84 | self%s(1) = ieor(ieor(rotl(t(1), 55), t(2)), shiftl(t(2), 14)) 85 | self%s(2) = rotl(t(2), 36) 86 | end function next 87 | 88 | ! 
Get a uniform [0,1) random real (double precision) 89 | function U01(self) result(res) 90 | class(rng_t), intent(inout) :: self 91 | real(dp) :: res 92 | integer(i8) :: x 93 | 94 | x = self%next() 95 | x = ior(shiftl(1023_i8, 52), shiftr(x, 12)) 96 | res = transfer(x, res) - 1.0_dp 97 | end function U01 98 | 99 | ! Set a seed for the RNG 100 | subroutine seed(self, the_seed) 101 | class(rng_t), intent(inout) :: self 102 | integer(i8), intent(in) :: the_seed(2) 103 | 104 | self%s = the_seed 105 | end subroutine seed 106 | 107 | ! This is the jump function for the generator. It is equivalent 108 | ! to 2^64 calls to next(); it can be used to generate 2^64 109 | ! non-overlapping subsequences for parallel computations. 110 | subroutine jump(self) 111 | class(rng_t), intent(inout) :: self 112 | integer :: i, b 113 | integer(i8) :: t(2), dummy 114 | 115 | ! The signed equivalent of the unsigned constants 116 | integer(i8), parameter :: jmp_c(2) = & 117 | (/-4707382666127344949_i8, -2852180941702784734_i8/) 118 | 119 | t = 0 120 | do i = 1, 2 121 | do b = 0, 63 122 | if (iand(jmp_c(i), shiftl(1_i8, b)) /= 0) then 123 | t = ieor(t, self%s) 124 | end if 125 | dummy = self%next() 126 | end do 127 | end do 128 | 129 | self%s = t 130 | end subroutine jump 131 | 132 | end module m_xoroshiro128plus 133 | -------------------------------------------------------------------------------- /pi_monte_carlo_co_sum.f90: -------------------------------------------------------------------------------- 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm 2 | ! Co_sum with final results 3 | ! Vincent Magnin, 2021-04-22 4 | ! and Brad Richardson 5 | ! Last modification: 2024-09-03 6 | ! MIT license 7 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 8 | ! $ cafrun -n 4 ./a.out 9 | ! or with ifx : 10 | ! 
$ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 11 | 12 | program pi_monte_carlo_co_sum 13 | use, intrinsic :: iso_fortran_env, only: wp=>real64, int64 14 | use m_xoroshiro128plus 15 | implicit none 16 | type(rng_t) :: rng ! xoroshiro128+ pseudo-random number generator 17 | real(wp) :: x, y ! Coordinates of a point 18 | integer(int64) :: n ! Total number of points 19 | integer(int64) :: k ! Points inside the quarter disk 20 | integer(int64) :: i ! Loop counter 21 | integer(int64) :: n_per_image ! Number of points per image 22 | integer :: t1, t2 ! Clock ticks 23 | real :: count_rate ! Clock ticks per second 24 | 25 | n = 1000000000 26 | k = 0 27 | 28 | ! Each image gets its own RNG sequence, thanks to rng%jump() which 29 | ! generates non-overlapping subsequences for parallel computations: 30 | call rng%seed([ -1337_i8, 9812374_i8 ]) 31 | if (this_image() /= 1) then 32 | do i = 2, this_image() 33 | call rng%jump() 34 | end do 35 | end if 36 | 37 | x = rng%U01() 38 | 39 | call system_clock(t1, count_rate) 40 | 41 | n_per_image = n / num_images() 42 | write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images() 43 | write(*, '(a, i11, a)') " will compute", n_per_image, " points" 44 | 45 | do i = 1, n_per_image 46 | ! Computing a random point (x,y) in the square 0<=x<1, 0<=y<1: 47 | x = rng%U01() 48 | y = rng%U01() 49 | 50 | ! Is it in the quarter disk (R=1, center=origin) ? 51 | if ((x**2 + y**2) < 1.0_wp) k = k + 1 52 | end do 53 | 54 | !
At the end: 55 | call co_sum(k, result_image = 1) 56 | if (this_image() == 1) then 57 | write(*,*) 58 | write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n 59 | write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n 60 | 61 | call system_clock(t2) 62 | write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s" 63 | write(*,'(a)') "---------------------------------------------------" 64 | end if 65 | 66 | end program pi_monte_carlo_co_sum 67 | -------------------------------------------------------------------------------- /pi_monte_carlo_co_sum_openmp.f90: -------------------------------------------------------------------------------- 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm 2 | ! Co_sum with final results 3 | ! Vincent Magnin, 2021-04-22 4 | ! and Brad Richardson 5 | ! and Ryan Bignell 6 | ! Last modification: 2024-09-04 7 | ! MIT license 8 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 -fopenmp m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 9 | ! $ cafrun -n 4 ./a.out 10 | ! or with ifx : 11 | ! $ ifx -O3 -qopenmp -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 12 | 13 | program pi_monte_carlo_co_sum_openmp 14 | use, intrinsic :: iso_fortran_env, only: wp=>real64, int64 15 | use m_xoroshiro128plus 16 | use omp_lib, only: omp_get_thread_num, omp_get_num_threads 17 | implicit none 18 | type(rng_t) :: rng ! xoroshiro128+ pseudo-random number generator 19 | real(wp) :: x, y ! Coordinates of a point 20 | integer(int64) :: n ! Total number of points 21 | integer(int64) :: k ! Points into the quarter disk 22 | integer(int64) :: i ! Loop counter 23 | integer(int64) :: n_per_image ! Number of parallel images 24 | integer :: t1, t2 ! Clock ticks 25 | real :: count_rate ! Clock ticks per second 26 | integer :: thread ! 
OpenMP thread number 27 | integer :: nth 28 | 29 | n = 1000000000 30 | k = 0 31 | 32 | call system_clock(t1, count_rate) 33 | 34 | !$OMP PARALLEL DEFAULT(NONE) SHARED(n_per_image,n) PRIVATE(thread, nth, i, x, y, rng) REDUCTION(+: k) 35 | thread = omp_get_thread_num() 36 | 37 | ! Each image have its own RNG seed, thanks to rng%jump() which 38 | ! generates non-overlapping subsequences for parallel computations: 39 | call rng%seed([ -1337_i8, 9812374_i8 ]) 40 | ! Threads are numbered from 0, images from 1. 41 | ! We compute a unique number for each task, starting from 1: 42 | nth = (this_image() - 1) * omp_get_num_threads() + (thread + 1) 43 | if (nth /= 1) then 44 | do i = 2, nth 45 | call rng%jump() 46 | end do 47 | end if 48 | 49 | x = rng%U01() 50 | 51 | n_per_image = n / num_images() 52 | write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images() 53 | write(*, '(a, i11, a)') " will compute", n_per_image, " points" 54 | 55 | !$OMP DO SCHEDULE(STATIC) 56 | do i = 1, n_per_image 57 | ! Computing a random point (x,y) into the square 0<=x<1, 0<=y<1: 58 | x = rng%U01() 59 | y = rng%U01() 60 | 61 | ! Is it in the quarter disk (R=1, center=origin) ? 62 | if ((x**2 + y**2) < 1.0_wp) k = k + 1 63 | end do 64 | !$OMP END DO 65 | !$OMP END PARALLEL 66 | 67 | ! At the end: 68 | call co_sum(k, result_image = 1) 69 | if (this_image() == 1) then 70 | write(*,*) 71 | write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n 72 | write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n 73 | 74 | call system_clock(t2) 75 | write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s" 76 | write(*,'(a)') "---------------------------------------------------" 77 | end if 78 | 79 | end program pi_monte_carlo_co_sum_openmp 80 | -------------------------------------------------------------------------------- /pi_monte_carlo_co_sum_steady.f90: -------------------------------------------------------------------------------- 1 | ! 
Computes an approximation of Pi with a Monte Carlo algorithm 2 | ! Co_sum version with steady results 3 | ! Vincent Magnin, 2021-04-22 4 | ! and Brad Richardson 5 | ! Last modification: 2024-09-03 6 | ! MIT license 7 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 8 | ! $ cafrun -n 4 ./a.out 9 | ! or with ifx : 10 | ! $ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 11 | 12 | program pi_monte_carlo_co_sum_steady 13 | use, intrinsic :: iso_fortran_env, only: wp=>real64, int64 14 | use m_xoroshiro128plus 15 | implicit none 16 | type(rng_t) :: rng ! xoroshiro128+ pseudo-random number generator 17 | real(wp) :: x, y ! Coordinates of a point 18 | integer(int64) :: n ! Total number of points 19 | integer(int64) :: k ! Points inside the quarter disk 20 | integer(int64) :: kt, it ! Total k and i 21 | integer(int64) :: i ! Loop counter 22 | integer(int64) :: n_per_image ! Number of points per image 23 | integer :: t1, t2 ! Clock ticks 24 | real :: count_rate ! Clock ticks per second 25 | 26 | n = 1000000000 27 | k = 0 28 | 29 | ! Each image gets its own RNG sequence, thanks to rng%jump() which 30 | ! generates non-overlapping subsequences for parallel computations: 31 | call rng%seed([ -1337_i8, 9812374_i8 ]) 32 | if (this_image() /= 1) then 33 | do i = 2, this_image() 34 | call rng%jump() 35 | end do 36 | end if 37 | 38 | x = rng%U01() 39 | 40 | call system_clock(t1, count_rate) 41 | 42 | n_per_image = n / num_images() 43 | write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images() 44 | write(*, '(a, i11, a)') " will compute", n_per_image, " points" 45 | 46 | do i = 1, n_per_image 47 | ! Computing a random point (x,y) in the square 0<=x<1, 0<=y<1: 48 | x = rng%U01() 49 | y = rng%U01() 50 | 51 | ! Is it in the quarter disk (R=1, center=origin) ? 52 | if ((x**2 + y**2) < 1.0_wp) k = k + 1 53 | 54 | !
Once in a while (20 times): 55 | if (mod(i, n_per_image/20) == 0) then 56 | kt = k 57 | call co_sum(kt, result_image = 1) 58 | if (this_image() == 1) then 59 | it = i*num_images() 60 | write(*, '(a, i0, a, i0, a, F17.15)') "4 * ", kt, " / ", it, " = ", (4.0_wp * kt) / it 61 | end if 62 | end if 63 | end do 64 | 65 | if (this_image() == 1) then 66 | call system_clock(t2) 67 | write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s" 68 | write(*,'(a)') "---------------------------------------------------" 69 | end if 70 | 71 | end program pi_monte_carlo_co_sum_steady 72 | -------------------------------------------------------------------------------- /pi_monte_carlo_coarrays.f90: -------------------------------------------------------------------------------- 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm 2 | ! Coarrays with final results 3 | ! Vincent Magnin, 2021-04-22 4 | ! Last modification: 2024-09-03 5 | ! MIT license 6 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 7 | ! $ cafrun -n 4 ./a.out 8 | ! or with ifx : 9 | ! $ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 10 | 11 | program pi_monte_carlo_coarrays_steady 12 | use, intrinsic :: iso_fortran_env, only: wp=>real64, int64 13 | use m_xoroshiro128plus 14 | implicit none 15 | type(rng_t) :: rng ! xoroshiro128+ pseudo-random number generator 16 | real(wp) :: x, y ! Coordinates of a point 17 | integer(int64) :: n ! Total number of points 18 | integer(int64) :: k[*] ! Points into the quarter disk 19 | integer(int64) :: kt ! Total k 20 | integer(int64) :: i, j ! Loops counters 21 | integer(int64) :: n_per_image ! Number of parallel images 22 | integer :: t1, t2 ! Clock ticks 23 | real :: count_rate ! Clock ticks per second 24 | 25 | n = 1000000000 26 | k = 0 27 | 28 | ! Each image have its own RNG seed, thanks to rng%jump() which 29 | ! 
generates non-overlapping subsequences for parallel computations: 30 | call rng%seed([ -1337_i8, 9812374_i8 ]) 31 | if (this_image() /= 1) then 32 | do i = 2, this_image() 33 | call rng%jump() 34 | end do 35 | end if 36 | 37 | x = rng%U01() 38 | 39 | call system_clock(t1, count_rate) 40 | 41 | n_per_image = n / num_images() 42 | write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images() 43 | write(*, '(a, i11, a)') " will compute", n_per_image, " points" 44 | 45 | do i = 1, n_per_image 46 | ! Computing a random point (x,y) into the square 0<=x<1, 0<=y<1: 47 | x = rng%U01() 48 | y = rng%U01() 49 | 50 | ! Is it in the quarter disk (R=1, center=origin) ? 51 | if ((x**2 + y**2) < 1.0_wp) k = k + 1 52 | end do 53 | 54 | ! At the end: 55 | sync all 56 | if (this_image() == 1) then 57 | kt = 0 58 | do j = 1, num_images() 59 | kt = kt + k[j] 60 | end do 61 | 62 | write(*,*) 63 | write(*, '(a, i0, a, i0)', advance='no') "4 * ", kt, " / ", n 64 | write(*, '(a, f17.15)') " = ", (4.0_wp * kt) / n 65 | 66 | call system_clock(t2) 67 | write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s" 68 | write(*,'(a)') "---------------------------------------------------" 69 | end if 70 | 71 | end program pi_monte_carlo_coarrays_steady 72 | -------------------------------------------------------------------------------- /pi_monte_carlo_coarrays_steady.f90: -------------------------------------------------------------------------------- 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm 2 | ! Coarrays version with steady results 3 | ! Vincent Magnin, 2021-04-22 4 | ! Last modification: 2024-09-03 5 | ! MIT license 6 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 7 | ! $ cafrun -n 4 ./a.out 8 | ! or with ifx : 9 | ! 
$ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 10 | 11 | program pi_monte_carlo_coarrays_steady 12 | use, intrinsic :: iso_fortran_env, only: wp=>real64, int64 13 | use m_xoroshiro128plus 14 | implicit none 15 | type(rng_t) :: rng ! xoroshiro128+ pseudo-random number generator 16 | real(wp) :: x, y ! Coordinates of a point 17 | integer(int64) :: n ! Total number of points 18 | integer(int64) :: k[*] ! Points inside the quarter disk 19 | integer(int64) :: kt, it ! Total k and i 20 | integer(int64) :: i, j ! Loop counters 21 | integer(int64) :: n_per_image ! Number of points per image 22 | integer :: t1, t2 ! Clock ticks 23 | real :: count_rate ! Clock ticks per second 24 | 25 | n = 1000000000 26 | k = 0 27 | 28 | ! Each image gets its own RNG sequence, thanks to rng%jump() which 29 | ! generates non-overlapping subsequences for parallel computations: 30 | call rng%seed([ -1337_i8, 9812374_i8 ]) 31 | if (this_image() /= 1) then 32 | do i = 2, this_image() 33 | call rng%jump() 34 | end do 35 | end if 36 | 37 | x = rng%U01() 38 | 39 | call system_clock(t1, count_rate) 40 | 41 | n_per_image = n / num_images() 42 | write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images() 43 | write(*, '(a, i11, a)') " will compute", n_per_image, " points" 44 | 45 | do i = 1, n_per_image 46 | ! Computing a random point (x,y) in the square 0<=x<1, 0<=y<1: 47 | x = rng%U01() 48 | y = rng%U01() 49 | 50 | ! Is it in the quarter disk (R=1, center=origin) ? 51 | if ((x**2 + y**2) < 1.0_wp) k = k + 1 52 | 53 | !
Once in a while (20 times): 54 | if (mod(i, n_per_image/20) == 0) then 55 | sync all 56 | if (this_image() == 1) then 57 | kt = 0 58 | do j = 1, num_images() 59 | kt = kt + k[j] 60 | end do 61 | it = i*num_images() 62 | 63 | write(*, '(a, i0, a, i0, a, F17.15)') "4 * ", kt, " / ", it, " = ", (4.0_wp * kt) / it 64 | end if 65 | end if 66 | end do 67 | 68 | if (this_image() == 1) then 69 | call system_clock(t2) 70 | write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s" 71 | write(*,'(a)') "---------------------------------------------------" 72 | end if 73 | 74 | end program pi_monte_carlo_coarrays_steady 75 | -------------------------------------------------------------------------------- /pi_monte_carlo_openmp.f90: -------------------------------------------------------------------------------- 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm 2 | ! OpenMP version 3 | ! Vincent Magnin, 2021-04-22 4 | ! Last modification: 2024-09-03 5 | ! MIT license 6 | ! $ gfortran -Wall -Wextra -std=f2018 -pedantic -O3 -fopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 7 | ! $ ifx -O3 -qopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 8 | 9 | program pi_monte_carlo_openmp 10 | use, intrinsic :: iso_fortran_env, only: wp=>real64, int64 11 | use m_xoroshiro128plus 12 | use omp_lib, only: omp_get_thread_num 13 | implicit none 14 | type(rng_t) :: rng ! xoroshiro128+ pseudo-random number generator 15 | real(wp) :: x, y ! Coordinates of a point 16 | integer(int64) :: n ! Total number of points 17 | integer(int64) :: k = 0 ! Points into the quarter disk 18 | integer(int64) :: i ! Loop counter 19 | integer :: t1, t2 ! Clock ticks 20 | real :: count_rate ! Clock ticks per second 21 | integer :: thread ! OpenMP thread number 22 | 23 | n = 1000000000 24 | 25 | call system_clock(t1, count_rate) 26 | 27 | !$OMP PARALLEL DEFAULT(NONE) SHARED(n) PRIVATE(thread, i, x, y, rng) REDUCTION(+: k) 28 | thread = omp_get_thread_num() 29 | 30 | ! 
Each image have its own RNG seed, thanks to rng%jump() which 31 | ! generates non-overlapping subsequences for parallel computations: 32 | call rng%seed([ -1337_i8, 9812374_i8 ]) 33 | ! Threads are numbered from 0 34 | if (thread+1 /= 1) then 35 | do i = 2, thread+1 36 | call rng%jump() 37 | end do 38 | end if 39 | 40 | x = rng%U01() 41 | 42 | !$OMP DO SCHEDULE(STATIC) 43 | do i = 1, n 44 | ! Computing a random point (x,y) into the square 0<=x<1, 0<=y<1: 45 | x = rng%U01() 46 | y = rng%U01() 47 | 48 | ! Is it in the quarter disk (R=1, center=origin) ? 49 | if ((x**2 + y**2) < 1.0_wp) k = k + 1 50 | end do 51 | !$OMP END DO 52 | print '(a, i0, a, i0)', "k", thread, " = ", k 53 | !$OMP END PARALLEL 54 | 55 | write(*,*) 56 | write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n 57 | write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n 58 | 59 | call system_clock(t2) 60 | write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s" 61 | write(*,'(a)') "---------------------------------------------------" 62 | end program pi_monte_carlo_openmp 63 | -------------------------------------------------------------------------------- /pi_monte_carlo_serial.f90: -------------------------------------------------------------------------------- 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm 2 | ! Serial version 3 | ! Vincent Magnin, 2021-04-22 4 | ! Last modification: 2021-05-09 5 | ! MIT license 6 | ! $ gfortran -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 7 | ! $ ifx -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 8 | 9 | program pi_monte_carlo_serial 10 | use, intrinsic :: iso_fortran_env, only: wp=>real64, int64 11 | use m_xoroshiro128plus 12 | implicit none 13 | type(rng_t) :: rng ! xoroshiro128+ pseudo-random number generator 14 | real(wp) :: x, y ! Coordinates of a point 15 | integer(int64) :: n ! Total number of points 16 | integer(int64) :: k = 0 ! 
Points into the quarter disk 17 | integer(int64) :: i ! Loop counter 18 | integer :: t1, t2 ! Clock ticks 19 | real :: count_rate ! Clock ticks per second 20 | 21 | n = 1000000000 22 | 23 | ! Set the seed of the RNG: 24 | call rng%seed([ -1337_i8, 9812374_i8 ]) 25 | x = rng%U01() 26 | 27 | call system_clock(t1, count_rate) 28 | 29 | do i = 1, n 30 | ! Computing a random point (x,y) into the square 0<=x<1, 0<=y<1: 31 | x = rng%U01() 32 | y = rng%U01() 33 | 34 | ! Is it in the quarter disk (R=1, center=origin) ? 35 | if ((x**2 + y**2) < 1.0_wp) k = k + 1 36 | end do 37 | 38 | write(*,*) 39 | write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n 40 | write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n 41 | 42 | call system_clock(t2) 43 | write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s" 44 | write(*,'(a)') "---------------------------------------------------" 45 | end program pi_monte_carlo_serial 46 | --------------------------------------------------------------------------------