├── .gitignore
├── LICENSE
├── README.md
├── benchmark.sh
├── m_xoroshiro128plus.f90
├── pi_monte_carlo_co_sum.f90
├── pi_monte_carlo_co_sum_openmp.f90
├── pi_monte_carlo_co_sum_steady.f90
├── pi_monte_carlo_coarrays.f90
├── pi_monte_carlo_coarrays_steady.f90
├── pi_monte_carlo_openmp.f90
└── pi_monte_carlo_serial.f90


/.gitignore:
--------------------------------------------------------------------------------
1 | *.txt
2 | *.mod
3 | *.out
4 | *.backup
5 | *.1
6 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2021 Vincent Magnin
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # exploring_coarrays
  2 | 
  3 | Let's explore the coarray features of modern Fortran for parallel programming: coarrays, images, etc.
  4 | 
  5 | ## Computing Pi by Monte Carlo
  6 | 
  7 | ### The algorithm
  8 | 
  9 | Imagine a disk of radius R=1 inside a square of side 2*R, and draw N random points uniformly inside the square. Count K, the number of points that fall inside the disk. The larger N, the closer `4*K/N` is to Pi, because `K/N` tends to the ratio between the area of the disk `Pi*R**2` and the area of the square `(2*R)**2`, which is `Pi/4`. The programs simplify this slightly by considering only a quarter disk inside a square of side 1, which gives the same ratio.
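
The limit behind the `4*K/N` estimate can be written out explicitly. This is only a sketch of the reasoning, assuming the N points are independent and uniformly distributed in the square:

$$
\frac{K}{N} \;\xrightarrow[N \to \infty]{}\; \frac{\text{area of the disk}}{\text{area of the square}} = \frac{\pi R^2}{(2R)^2} = \frac{\pi}{4}
\qquad\Longrightarrow\qquad
\pi \approx \frac{4K}{N}
$$

For the quarter disk in a square of side 1, the ratio is unchanged: $(\pi R^2/4) / R^2 = \pi/4$ with $R = 1$.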
 10 | 
 11 | The advantage of Monte Carlo algorithms is that they are naturally parallel ("embarrassingly parallel"), each point being independent of the others.
 12 | 
 13 | Warnings:
 14 | 
 15 | * This is an inefficient method to compute Pi, as each additional digit of precision requires 100 times more points!
 16 | * If the pseudo-random generator is biased, it can be a problem if our objective is really to compute Pi precisely. But our objective here is just to burn the CPU!
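
The factor of 100 in the first warning follows from the statistics of the estimate. As a sketch, assuming independent uniform points, K follows a binomial distribution with success probability $p = \pi/4$, so the standard deviation of the estimate is:

$$
\sigma\!\left(\frac{4K}{N}\right) = 4\sqrt{\frac{p\,(1-p)}{N}} \approx \frac{1.64}{\sqrt{N}}, \qquad p = \frac{\pi}{4}
$$

Dividing $\sigma$ by 10 (one more digit) therefore requires multiplying N by 100. With the $N = 10^9$ points used in the programs, $\sigma$ is about $5 \times 10^{-5}$, so roughly four decimal places of the result can be trusted.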
 17 | 
 18 | ### The programs
 19 | 
 20 | In this repository, you will find:
 21 | 
 22 | * A serial version of the algorithm.
 23 | * A parallel version using OpenMP.
 24 | * A parallel version using Coarrays.
 25 | * Another coarrays version that steadily prints intermediate results. **Bug: the intermediate results are not correct.**
 26 | * Versions using the `co_sum()` collective subroutine instead of explicit coarray variables.
 27 | 
 28 | Concerning the pseudo-random number generator, we use a [Fortran implementation](https://github.com/jannisteunissen/xoroshiro128plus_fortran) (public domain) of the xoroshiro128+ algorithm. See also the page ["xoshiro / xoroshiro generators and the PRNG shootout"](https://prng.di.unimi.it/).
 29 | 
 30 | 
 31 | ### Compilation
 32 | 
 33 | Each program will be compiled with the `-O3` flag for optimization, with GFortran and Intel compilers ifort and ifx (the new Intel compiler, based on LLVM). 
 34 | 
 35 | The OpenMP version will be compiled with the `-fopenmp` flag with gfortran or `-qopenmp` with Intel compilers. The number of threads is set via the `OMP_NUM_THREADS` environment variable.
 36 | 
 37 | For gfortran, OpenCoarrays was installed with the MPICH library. The coarray versions will be compiled and run with commands like:
 38 | 
 39 | ```bash
 40 | $ caf -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && cafrun -n 2 ./a.out
 41 | ```
 42 | 
 43 | And for ifort:
 44 | 
 45 | ```bash
 46 | $ export FOR_COARRAY_NUM_IMAGES=2
 47 | $ ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && ./a.out
 48 | ```
 49 | 
 50 | For ifx:
 51 | 
 52 | ```bash
 53 | $ ifx -O3 -coarray=shared -coarray-num-images=2 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90
 54 | ```
 55 | 
 56 | ### Methodology
 57 | 
 58 | The values are the mean values obtained with 10 runs, computed by:
 59 | 
 60 | ```bash
 61 | $ ./benchmark.sh
 62 | ```
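
The averaging step can be tried in isolation. The following is a minimal sketch that mirrors the `grep` and `awk` pipeline of `mean_time` in `benchmark.sh`; the temporary file and the three timings are made up for illustration and are not benchmark results:

```bash
#!/bin/bash
# Hypothetical sample output of three runs (all values are made up).
tmpfile="$(mktemp)"
cat > "$tmpfile" <<'EOF'
4 * 785401234 / 1000000000 = 3.141604936000000
Execution time:  5.100 s
Execution time:  5.300 s
Execution time:  5.200 s
EOF

# Keep only the numbers with exactly three decimals followed by a space
# (the execution times, not the Pi estimate), then average them with awk.
mean="$(grep -oE '[0-9]+\.[0-9]{3} ' "$tmpfile" \
        | awk '{ total += $1 } END { if (NR > 0) printf "%5.2f", total/NR }')"

echo "Mean time: ${mean} s"
rm -f "$tmpfile"
```

The Pi estimate is ignored because its 15 decimals are not followed by a space after the third one, so only the three timings are averaged, giving a mean of 5.20 s here.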
 63 | 
 64 | Warning: this benchmark is valid for these programs, on these machines, with these compiler and library versions, and with these compiler options. The results cannot easily be generalized to other situations. Just try and see with your own programs.
 65 | 
 66 | ### Results #1 (May 2021)
 67 | 
 68 | The compiler versions are:
 69 | 
 70 | * ifort 2021.2.0.
 71 | * ifx 2021.2.0 Beta (ifx does not yet support `-coarray`).
 72 | * gfortran 10.2.0.
 73 | 
 74 | on an Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz, under Ubuntu 20.10.
 75 | 
 76 | 
 77 | CPU time in seconds with 2 images/threads (except of course Serial):
 78 | 
 79 | | Version              | gfortran | ifort   | ifx     |
 80 | | -------------------- | -------- | ------- | ------- |
 81 | | Serial               |  10.77   | 18.77   | 14.66   |
 82 | | OpenMP               |   5.75   |  9.32   | 60.30   |
 83 | | Coarrays             |  13.21   |  9.79   |         |
 84 | | Coarrays steady      |  21.80   | 27.83   |         |
 85 | | Co_sum               |   5.58   |  9.98   |         |
 86 | | Co_sum steady        |   9.18   | 12.71   |         |
 87 | 
 88 | With 4 images/threads (except of course Serial):
 89 | 
 90 | | Version              | gfortran | ifort   | ifx     |
 91 | | -------------------- | -------- | ------- | ------- |
 92 | | Serial               |  10.77   | 18.77   | 14.66   |
 93 | | OpenMP               |   4.36   |  8.42   | 43.21   |
 94 | | Coarrays             |   9.47   |  9.12   |         |
 95 | | Coarrays steady      |  19.41   | 24.78   |         |
 96 | | Co_sum               |   4.16   |  9.29   |         |
 97 | | Co_sum steady        |   8.18   | 10.94   |         |
 98 | 
 99 | 
100 | ### Results #2 (January 2024)
101 | 
102 | The compiler versions are:
103 | * gfortran 11.4.0
104 | * ifort 2021.11.1
105 | * ifx 2024.0.2
106 | 
107 | on a 13th Gen Intel(R) Core(TM) i5-13500, under Ubuntu 22.04.
108 | 
109 | With 2 images/threads (except of course Serial), with an additional combined co_sum and OpenMP benchmark. The gfortran `co_sum` version is compiled with the `-flto` flag, discussed below.
110 | 
111 | 
112 | | Version              | gfortran | ifort   | ifx     |
113 | | -------------------- | -------- | ------- | ------- |
114 | | Serial               |  11.11   | 28.02   | 14.26   |
115 | | OpenMP               |  7.86    | 14.40   | 5.37    |
116 | | Coarrays             |  8.06    | 10.42   | 7.29    |
117 | | Coarrays steady      |  8.98    | 16.85   | 14.38   |
118 | | Co_sum               |  2.12    | 10.45   | 6.99    |
119 | | Co_sum steady        |  3.37    | 10.93   | 10.93   |
120 | | Co_sum & openMP      |  1.12    | 7.59    | 2.72    |
121 | 
122 | 
123 | ### Further optimization
124 | 
125 | With gfortran, the `-flto` *([standard link-time optimizer](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html))* compilation option has a strong effect on this algorithm: for example, with the `co_sum` version the CPU time with 4 images falls from 4.16 s to 2.38 s (results #1)! 
126 | 
127 | 
128 | # Bibliography
129 | 
130 | * Curcic, Milan. [Modern Fortran - Building efficient parallel applications](https://learning.oreilly.com/library/view/-/9781617295287/?ar), Manning Publications, 1st edition, November 2020, ISBN 978-1-61729-528-7.
131 | * Metcalf, Michael, John Ker Reid, and Malcolm Cohen. *[Modern Fortran Explained: Incorporating Fortran 2018.](https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198811893.001.0001/oso-9780198811893)* Numerical Mathematics and Scientific Computation. Oxford (England): Oxford University Press, 2018, ISBN 978-0-19-185002-8.
132 | * Thomas Koenig, [coarray-tutorial](https://github.com/tkoenig1/coarray-tutorial/blob/main/tutorial.md).
133 | 


--------------------------------------------------------------------------------
/benchmark.sh:
--------------------------------------------------------------------------------
  1 | #! /bin/bash
  2 | # Vincent Magnin, 2021-05-09
  3 | # Ryan Bignell, 2024-01-17
  4 | # Last modification: 2024-09-04
  5 | # Launch the Pi Monte Carlo benchmark
  6 | # MIT license
  7 | # Verified with shellcheck
  8 | 
  9 | # Strict mode:
 10 | set -uo pipefail
 11 | 
 12 | # Launches the executable N times and appends its output to a file.
 13 | launch_N_times()
 14 | {
 15 |     # Input parameters:
 16 |     local -r N="$1"
 17 |     local -r filename="$2"
 18 |     local -r executable="$3"
 19 | 
 20 |     for ((i = 1 ; i <= N ; i++)) ; do
 21 |         ${executable} | tee -a "$filename"
 22 |     done
 23 | }
 24 | 
 25 | # Computes the mean execution time from the times stored in the file.
 26 | mean_time()
 27 | {
 28 |     # Input parameters:
 29 |     local -r testname="$1"
 30 |     # We grep real numbers with 3 decimals followed by a space (CPU times)
 31 |     # and we compute their mean value using awk (nothing is printed if no time was found):
 32 |     grep -oE '[0-9]+\.[0-9]{3} ' "${testname}.txt" | awk '{ total += $1 } END { if (NR > 0) printf "%5.2f", total/NR }'
 33 | }
 34 | 
 35 | #***************
 36 | # Main program:
 37 | #***************
 38 | readonly runs=10
 39 | readonly threads=2
 40 | 
 41 | # Environment variables for OpenMP and Coarrays (ifort):
 42 | export OMP_NUM_THREADS="${threads}"
 43 | export FOR_COARRAY_NUM_IMAGES="${threads}"
 44 | 
 45 | # Cleanup:
 46 | rm -f gfortran*.txt
 47 | rm -f ifort*.txt
 48 | rm -f ifx*.txt
 49 | 
 50 | # All examples are compiled and launched several times, and the results are
 51 | # copied into a txt file:
 52 | 
 53 | test_name="gfortran_serial"
 54 | echo "$test_name"
 55 | gfortran -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 56 | 
 57 | test_name="ifort_serial"
 58 | echo "$test_name"
 59 | ifort -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 60 | 
 61 | test_name="ifx_serial"
 62 | echo "$test_name"
 63 | ifx -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 64 | 
 65 | test_name="gfortran_openmp"
 66 | echo "$test_name"
 67 | gfortran -O3 -fopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 68 | 
 69 | test_name="ifort_openmp"
 70 | echo "$test_name"
 71 | ifort -O3 -qopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 72 | 
 73 | test_name="ifx_openmp"
 74 | echo "$test_name"
 75 | ifx -O3 -qopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 76 | 
 77 | test_name="gfortran_coarrays"
 78 | echo "$test_name"
 79 | caf -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out"
 80 | 
 81 | test_name="ifort_coarrays"
 82 | echo "$test_name"
 83 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 84 | 
 85 | test_name="ifx_coarrays"
 86 | echo "$test_name"
 87 | ifx -O3 -coarray=shared -coarray-num-images=${threads} m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 88 | 
 89 | test_name="gfortran_coarrays_steady"
 90 | echo "$test_name"
 91 | caf -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out"
 92 | 
 93 | test_name="ifort_coarrays_steady"
 94 | echo "$test_name"
 95 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
 96 | 
 97 | test_name="ifx_coarrays_steady"
 98 | echo "$test_name"
 99 | ifx -O3 -coarray=shared -coarray-num-images=${threads} m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
100 | 
101 | test_name="gfortran_co_sum"
102 | echo "$test_name"
103 | caf -O3 -flto m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out"
104 | 
105 | test_name="ifort_co_sum"
106 | echo "$test_name"
107 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
108 | 
109 | test_name="ifx_co_sum"
110 | echo "$test_name"
111 | ifx -coarray=shared -coarray-num-images=${threads} -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
112 | 
113 | 
114 | test_name="gfortran_co_sum_steady"
115 | echo "$test_name"
116 | caf -O3 m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out"
117 | 
118 | test_name="ifort_co_sum_steady"
119 | echo "$test_name"
120 | ifort -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
121 | 
122 | test_name="ifx_co_sum_steady"
123 | echo "$test_name"
124 | ifx -O3 -coarray=shared -coarray-num-images=${threads} m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
125 | 
126 | test_name="gfortran_co_sum_openmp"
127 | echo "$test_name"
128 | caf -O3 -fopenmp -flto m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "cafrun -n ${threads} ./a.out"
129 | 
130 | test_name="ifort_co_sum_openmp"
131 | echo "$test_name"
132 | ifort -O3 -qopenmp -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
133 | 
134 | test_name="ifx_co_sum_openmp"
135 | echo "$test_name"
136 | ifx -coarray=shared -coarray-num-images=${threads} -qopenmp -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90 && launch_N_times "$runs" "$test_name.txt" "./a.out"
137 | 
138 | 
139 | # The CPU times mean values are computed with each txt file:
140 | echo '****************************************'
141 | echo ' STATISTICS (Markdown table)'
142 | echo '****************************************'
143 | 
144 | echo '| Version              | gfortran | ifort   | ifx     |'
145 | echo '| -------------------- | -------- | ------- | ------- |'
146 | echo "| Serial               |  $(mean_time 'gfortran_serial')   | $(mean_time 'ifort_serial') | $(mean_time 'ifx_serial') |"
147 | echo "| OpenMP               |  $(mean_time 'gfortran_openmp')  | $(mean_time 'ifort_openmp') | $(mean_time 'ifx_openmp') |"
148 | echo "| Coarrays             |  $(mean_time 'gfortran_coarrays')  | $(mean_time 'ifort_coarrays') | $(mean_time 'ifx_coarrays')          |"
149 | echo "| Coarrays steady      |  $(mean_time 'gfortran_coarrays_steady')  | $(mean_time 'ifort_coarrays_steady') | $(mean_time 'ifx_coarrays_steady')          |"
150 | echo "| Co_sum               |  $(mean_time 'gfortran_co_sum')  | $(mean_time 'ifort_co_sum') | $(mean_time 'ifx_co_sum')          |"
151 | echo "| Co_sum steady        |  $(mean_time 'gfortran_co_sum_steady')  | $(mean_time 'ifort_co_sum_steady')   | $(mean_time 'ifx_co_sum_steady')        |"
152 | echo "| Co_sum & openMP      |  $(mean_time 'gfortran_co_sum_openmp')  | $(mean_time 'ifort_co_sum_openmp') | $(mean_time 'ifx_co_sum_openmp')          |"
153 | echo
154 | 
155 | echo "Compilers versions:"
156 | echo "-------------------"
157 | gfortran --version
158 | ifort --version
159 | ifx --version
160 | 


--------------------------------------------------------------------------------
/m_xoroshiro128plus.f90:
--------------------------------------------------------------------------------
  1 | ! Written in 2016 by David Blackman and Sebastiano Vigna (vigna@acm.org)
  2 | ! Translated to Fortran 2008 by Jannis Teunissen
  3 | 
  4 | ! To the extent possible under law, the author has dedicated all copyright
  5 | ! and related and neighboring rights to this software to the public domain
  6 | ! worldwide. This software is distributed without any warranty.
  7 | 
  8 | ! See <http://creativecommons.org/publicdomain/zero/1.0/>.
  9 | 
 10 | ! This is the successor to xorshift128+. It is the fastest full-period
 11 | ! generator passing BigCrush without systematic failures, but due to the
 12 | ! relatively short period it is acceptable only for applications with a
 13 | ! mild amount of parallelism; otherwise, use a xorshift1024* generator.
 14 | 
 15 | ! Beside passing BigCrush, this generator passes the PractRand test suite
 16 | ! up to (and included) 16TB, with the exception of binary rank tests,
 17 | ! which fail due to the lowest bit being an LFSR; all other bits pass all
 18 | ! tests. We suggest to use a sign test to extract a random Boolean value.
 19 | 
 20 | ! Note that the generator uses a simulated rotate operation, which most C
 21 | ! compilers will turn into a single instruction. In Java, you can use
 22 | ! Long.rotateLeft(). In languages that do not make low-level rotation
 23 | ! instructions accessible xorshift128+ could be faster.
 24 | 
 25 | ! The state must be seeded so that it is not everywhere zero. If you have
 26 | ! a 64-bit seed, we suggest to seed a splitmix64 generator and use its
 27 | ! output to fill s.
 28 | 
 29 | ! Usage example:
 30 | !
 31 | ! use m_xoroshiro128plus
 32 | !
 33 | ! type(rng_t) :: rng
 34 | ! call rng%seed((/1337_i8, 31337_i8/))
 35 | !
 36 | ! print *, rng%next()
 37 | ! print *, rng%U01()
 38 | 
 39 | module m_xoroshiro128plus
 40 |   implicit none
 41 |   private
 42 | 
 43 |   ! This defines a 64 bit integer type
 44 |   integer, parameter :: i8 = selected_int_kind(18)
 45 | 
 46 |   ! This defines a 64 bit floating point type (Double Precision)
 47 |   integer, parameter :: dp = kind(0.0d0)
 48 | 
 49 |   ! A type/class to store the RNG state
 50 |   type rng_t
 51 |      ! The default seed (arbitrarily chosen)
 52 |      integer(i8)                :: s(2) = (/123456789_i8, 987654321_i8/)
 53 |    contains                             ! Methods:
 54 |      procedure, non_overridable :: next ! Get next random number
 55 |      procedure, non_overridable :: U01  ! Get next random float [0,1)
 56 |      procedure, non_overridable :: seed ! Seed the generator
 57 |      procedure, non_overridable :: jump ! Jump function (see below)
 58 |   end type rng_t
 59 | 
 60 |   ! List of public types
 61 |   public :: i8, dp
 62 |   public :: rng_t
 63 | 
 64 | contains
 65 | 
 66 |   ! For internal use
 67 |   pure function rotl(x, k) result(res)
 68 |     integer(i8), intent(in) :: x
 69 |     integer, intent(in)     :: k
 70 |     integer(i8)             :: res
 71 | 
 72 |     res = ior(shiftl(x, k), shiftr(x, 64 - k))
 73 |   end function rotl
 74 | 
 75 |   ! Get the next value (returned as 64 bit signed integer)
 76 |   function next(self) result(res)
 77 |     class(rng_t), intent(inout) :: self
 78 |     integer(i8)                 :: res
 79 |     integer(i8)                 :: t(2)
 80 | 
 81 |     t         = self%s
 82 |     res       = t(1) + t(2)
 83 |     t(2)      = ieor(t(1), t(2))
 84 |     self%s(1) = ieor(ieor(rotl(t(1), 55), t(2)), shiftl(t(2), 14))
 85 |     self%s(2) = rotl(t(2), 36)
 86 |   end function next
 87 | 
 88 |   ! Get a uniform [0,1) random real (double precision)
 89 |   function U01(self) result(res)
 90 |     class(rng_t), intent(inout) :: self
 91 |     real(dp)                    :: res
 92 |     integer(i8)                 :: x
 93 | 
 94 |     x   = self%next()
 95 |     x   = ior(shiftl(1023_i8, 52), shiftr(x, 12))
 96 |     res = transfer(x, res) - 1.0_dp
 97 |   end function U01
 98 | 
 99 |   ! Set a seed for the RNG
100 |   subroutine seed(self, the_seed)
101 |     class(rng_t), intent(inout) :: self
102 |     integer(i8), intent(in)     :: the_seed(2)
103 | 
104 |     self%s = the_seed
105 |   end subroutine seed
106 | 
107 |   ! This is the jump function for the generator. It is equivalent
108 |   ! to 2^64 calls to next(); it can be used to generate 2^64
109 |   ! non-overlapping subsequences for parallel computations.
110 |   subroutine jump(self)
111 |     class(rng_t), intent(inout) :: self
112 |     integer                     :: i, b
113 |     integer(i8)                 :: t(2), dummy
114 | 
115 |     ! The signed equivalent of the unsigned constants
116 |     integer(i8), parameter      :: jmp_c(2) = &
117 |          (/-4707382666127344949_i8, -2852180941702784734_i8/)
118 | 
119 |     t = 0
120 |     do i = 1, 2
121 |        do b = 0, 63
122 |           if (iand(jmp_c(i), shiftl(1_i8, b)) /= 0) then
123 |              t = ieor(t, self%s)
124 |           end if
125 |           dummy = self%next()
126 |        end do
127 |     end do
128 | 
129 |     self%s = t
130 |   end subroutine jump
131 | 
132 | end module m_xoroshiro128plus
133 | 


--------------------------------------------------------------------------------
/pi_monte_carlo_co_sum.f90:
--------------------------------------------------------------------------------
 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm
 2 | ! Co_sum with final results
 3 | ! Vincent Magnin, 2021-04-22
 4 | ! and Brad Richardson
 5 | ! Last modification: 2024-09-03
 6 | ! MIT license
 7 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90
 8 | ! $ cafrun -n 4 ./a.out
 9 | ! or with ifx :
10 | ! $ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum.f90
11 | 
12 | program pi_monte_carlo_co_sum
13 |     use, intrinsic :: iso_fortran_env, only: wp=>real64, int64
14 |     use m_xoroshiro128plus
15 |     implicit none
16 |     type(rng_t)     :: rng          ! xoroshiro128+ pseudo-random number generator
17 |     real(wp)        :: x, y         ! Coordinates of a point
18 |     integer(int64)  :: n            ! Total number of points
19 |     integer(int64)  :: k            ! Points inside the quarter disk
20 |     integer(int64)  :: i            ! Loop counter
21 |     integer(int64)  :: n_per_image  ! Number of points per image
22 |     integer         :: t1, t2       ! Clock ticks
23 |     real            :: count_rate   ! Clock ticks per second
24 | 
25 |     n = 1000000000
26 |     k = 0
27 | 
28 |     ! Each image has its own RNG state, thanks to rng%jump(), which
29 |     ! generates non-overlapping subsequences for parallel computations:
30 |     call rng%seed([ -1337_i8, 9812374_i8 ])
31 |     if (this_image() /= 1) then
32 |         do i = 2, this_image()
33 |             call rng%jump()
34 |         end do
35 |     end if
36 | 
37 |     x = rng%U01()
38 | 
39 |     call system_clock(t1, count_rate)
40 | 
41 |     n_per_image = n / num_images()
42 |     write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images()
43 |     write(*, '(a, i11, a)') " will compute", n_per_image, " points"
44 | 
45 |     do i = 1, n_per_image
46 |         ! Computing a random point (x,y) into the square 0<=x<1, 0<=y<1:
47 |         x = rng%U01()
48 |         y = rng%U01()
49 | 
50 |         ! Is it in the quarter disk (R=1, center=origin) ?
51 |         if ((x**2 + y**2) < 1.0_wp) k = k + 1
52 |     end do
53 | 
54 |     ! At the end:
55 |     call co_sum(k, result_image = 1)
56 |     if (this_image() == 1) then
57 |         write(*,*)
58 |         write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n
59 |         write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n
60 | 
61 |         call system_clock(t2)
62 |         write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s"
63 |         write(*,'(a)') "---------------------------------------------------"
64 |     end if
65 | 
66 | end program pi_monte_carlo_co_sum
67 | 


--------------------------------------------------------------------------------
/pi_monte_carlo_co_sum_openmp.f90:
--------------------------------------------------------------------------------
 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm
 2 | ! Co_sum combined with OpenMP, with final results
 3 | ! Vincent Magnin, 2021-04-22
 4 | ! and Brad Richardson
 5 | ! and Ryan Bignell
 6 | ! Last modification: 2024-09-04
 7 | ! MIT license
 8 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 -fopenmp m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90
 9 | ! $ cafrun -n 4 ./a.out
10 | ! or with ifx :
11 | ! $ ifx -O3 -qopenmp -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_openmp.f90
12 | 
13 | program pi_monte_carlo_co_sum_openmp
14 |     use, intrinsic :: iso_fortran_env, only: wp=>real64, int64
15 |     use m_xoroshiro128plus
16 |     use omp_lib, only: omp_get_thread_num, omp_get_num_threads
17 |     implicit none
18 |     type(rng_t)     :: rng          ! xoroshiro128+ pseudo-random number generator
19 |     real(wp)        :: x, y         ! Coordinates of a point
20 |     integer(int64)  :: n            ! Total number of points
21 |     integer(int64)  :: k            ! Points inside the quarter disk
22 |     integer(int64)  :: i            ! Loop counter
23 |     integer(int64)  :: n_per_image  ! Number of points per image
24 |     integer         :: t1, t2       ! Clock ticks
25 |     real            :: count_rate   ! Clock ticks per second
26 |     integer         :: thread       ! OpenMP thread number
27 |     integer         :: nth          ! Unique task number (over images and threads)
28 | 
29 |     n = 1000000000
30 |     k = 0
31 | 
32 |     call system_clock(t1, count_rate)
33 | 
34 |     !$OMP PARALLEL DEFAULT(NONE) SHARED(n) PRIVATE(thread, nth, i, x, y, rng, n_per_image) REDUCTION(+: k)
35 |     thread = omp_get_thread_num()
36 | 
37 |     ! Each task (image and thread) has its own RNG state, thanks to rng%jump(), which
38 |     ! generates non-overlapping subsequences for parallel computations:
39 |     call rng%seed([ -1337_i8, 9812374_i8 ])
40 |     ! Threads are numbered from 0, images from 1.
41 |     ! We compute a unique number for each task, starting from 1:
42 |     nth = (this_image() - 1) * omp_get_num_threads() + (thread + 1)
43 |     if (nth /= 1) then
44 |         do i = 2, nth
45 |             call rng%jump()
46 |         end do
47 |     end if
48 | 
49 |     x = rng%U01()
50 | 
51 |     n_per_image = n / num_images()
52 |     write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images()
53 |     write(*, '(a, i11, a)') " will compute", n_per_image, " points"
54 | 
55 |     !$OMP DO SCHEDULE(STATIC)
56 |     do i = 1, n_per_image
57 |         ! Computing a random point (x,y) into the square 0<=x<1, 0<=y<1:
58 |         x = rng%U01()
59 |         y = rng%U01()
60 | 
61 |         ! Is it in the quarter disk (R=1, center=origin) ?
62 |         if ((x**2 + y**2) < 1.0_wp) k = k + 1
63 |     end do
64 |     !$OMP END DO
65 |     !$OMP END PARALLEL
66 | 
67 |     ! At the end:
68 |     call co_sum(k, result_image = 1)
69 |     if (this_image() == 1) then
70 |         write(*,*)
71 |         write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n
72 |         write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n
73 | 
74 |         call system_clock(t2)
75 |         write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s"
76 |         write(*,'(a)') "---------------------------------------------------"
77 |     end if
78 | 
79 | end program pi_monte_carlo_co_sum_openmp
80 | 


--------------------------------------------------------------------------------
/pi_monte_carlo_co_sum_steady.f90:
--------------------------------------------------------------------------------
 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm
 2 | ! Co_sum version with steady results
 3 | ! Vincent Magnin, 2021-04-22
 4 | ! and Brad Richardson
 5 | ! Last modification: 2024-09-03
 6 | ! MIT license
 7 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90
 8 | ! $ cafrun -n 4 ./a.out
 9 | ! or with ifx :
10 | ! $ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_co_sum_steady.f90
11 | 
12 | program pi_monte_carlo_co_sum_steady
13 |     use, intrinsic :: iso_fortran_env, only: wp=>real64, int64
14 |     use m_xoroshiro128plus
15 |     implicit none
16 |     type(rng_t)     :: rng          ! xoroshiro128+ pseudo-random number generator
17 |     real(wp)        :: x, y         ! Coordinates of a point
18 |     integer(int64)  :: n            ! Total number of points
19 |     integer(int64)  :: k            ! Points inside the quarter disk
20 |     integer(int64)  :: kt, it       ! Totals of k and i over all images
21 |     integer(int64)  :: i            ! Loop counter
22 |     integer(int64)  :: n_per_image  ! Number of points per image
23 |     integer         :: t1, t2       ! Clock ticks
24 |     real            :: count_rate   ! Clock ticks per second
25 | 
26 |     n = 1000000000
27 |     k = 0
28 | 
29 |     ! Each image has its own RNG state, thanks to rng%jump(), which
30 |     ! generates non-overlapping subsequences for parallel computations:
31 |     call rng%seed([ -1337_i8, 9812374_i8 ])
32 |     if (this_image() /= 1) then
33 |         do i = 2, this_image()
34 |             call rng%jump()
35 |         end do
36 |     end if
37 | 
38 |     x = rng%U01()
39 | 
40 |     call system_clock(t1, count_rate)
41 | 
42 |     n_per_image = n / num_images()
43 |     write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images()
44 |     write(*, '(a, i11, a)') " will compute", n_per_image, " points"
45 | 
46 |     do i = 1, n_per_image
47 |         ! Computing a random point (x,y) into the square 0<=x<1, 0<=y<1:
48 |         x = rng%U01()
49 |         y = rng%U01()
50 | 
51 |         ! Is it in the quarter disk (R=1, center=origin) ?
52 |         if ((x**2 + y**2) < 1.0_wp) k = k + 1
53 | 
54 |         ! Once in a while (20 times):
55 |         if (mod(i, n_per_image/20) == 0) then
56 |             kt = k
57 |             call co_sum(kt, result_image = 1)
58 |             if (this_image() == 1) then
59 |                 it = i*num_images()
60 |                 write(*, '(a, i0, a, i0, a, F17.15)') "4 * ", kt, " / ", it, " = ", (4.0_wp * kt) / it
61 |             end if
62 |         end if
63 |     end do
64 | 
65 |     if (this_image() == 1) then
66 |         call system_clock(t2)
67 |         write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s"
68 |         write(*,'(a)') "---------------------------------------------------"
69 |     end if
70 | 
71 | end program pi_monte_carlo_co_sum_steady
72 | 


--------------------------------------------------------------------------------
/pi_monte_carlo_coarrays.f90:
--------------------------------------------------------------------------------
 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm
 2 | ! Coarrays with final results
 3 | ! Vincent Magnin, 2021-04-22
 4 | ! Last modification: 2024-09-03
 5 | ! MIT license
 6 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90
 7 | ! $ cafrun -n 4 ./a.out
 8 | ! or with ifx :
 9 | ! $ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays.f90
10 | 
11 | program pi_monte_carlo_coarrays_steady
12 |     use, intrinsic :: iso_fortran_env, only: wp=>real64, int64
13 |     use m_xoroshiro128plus
14 |     implicit none
15 |     type(rng_t)     :: rng          ! xoroshiro128+ pseudo-random number generator
16 |     real(wp)        :: x, y         ! Coordinates of a point
17 |     integer(int64)  :: n            ! Total number of points
18 |     integer(int64)  :: k[*]         ! Points inside the quarter disk
19 |     integer(int64)  :: kt           ! Total k
20 |     integer(int64)  :: i, j         ! Loop counters
21 |     integer(int64)  :: n_per_image  ! Number of points per image
22 |     integer         :: t1, t2       ! Clock ticks
23 |     real            :: count_rate   ! Clock ticks per second
24 | 
25 |     n = 1000000000
26 |     k = 0
27 | 
28 |     ! Each image has its own RNG seed, thanks to rng%jump() which
29 |     ! generates non-overlapping subsequences for parallel computations:
30 |     call rng%seed([ -1337_i8, 9812374_i8 ])
31 |     if (this_image() /= 1) then
32 |         do i = 2, this_image()
33 |             call rng%jump()
34 |         end do
35 |     end if
36 | 
37 |     x = rng%U01()
38 | 
39 |     call system_clock(t1, count_rate)
40 | 
41 |     n_per_image = n / num_images()
42 |     write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images()
43 |     write(*, '(a, i11, a)') " will compute", n_per_image, " points"
44 | 
45 |     do i = 1, n_per_image
46 |         ! Computing a random point (x,y) in the square 0<=x<1, 0<=y<1:
47 |         x = rng%U01()
48 |         y = rng%U01()
49 | 
50 |         ! Is it in the quarter disk (R=1, center=origin) ?
51 |         if ((x**2 + y**2) < 1.0_wp) k = k + 1
52 |     end do
53 | 
54 |     ! At the end:
55 |     sync all
56 |     if (this_image() == 1) then
57 |         kt = 0
58 |         do j = 1, num_images()
59 |             kt = kt + k[j]
60 |         end do
61 | 
62 |         write(*,*)
63 |         write(*, '(a, i0, a, i0)', advance='no') "4 * ", kt, " / ", n
64 |         write(*, '(a, f17.15)') " = ", (4.0_wp * kt) / n
65 | 
66 |         call system_clock(t2)
67 |         write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s"
68 |         write(*,'(a)') "---------------------------------------------------"
69 |     end if
70 | 
71 | end program pi_monte_carlo_coarrays_steady
72 | 


--------------------------------------------------------------------------------
/pi_monte_carlo_coarrays_steady.f90:
--------------------------------------------------------------------------------
 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm
 2 | ! Coarrays version with steady results
 3 | ! Vincent Magnin, 2021-04-22
 4 | ! Last modification: 2024-09-03
 5 | ! MIT license
 6 | ! $ caf -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90
 7 | ! $ cafrun -n 4 ./a.out
 8 | ! or with ifx :
 9 | ! $ ifx -O3 -coarray m_xoroshiro128plus.f90 pi_monte_carlo_coarrays_steady.f90
10 | 
11 | program pi_monte_carlo_coarrays_steady
12 |     use, intrinsic :: iso_fortran_env, only: wp=>real64, int64
13 |     use m_xoroshiro128plus
14 |     implicit none
15 |     type(rng_t)     :: rng          ! xoroshiro128+ pseudo-random number generator
16 |     real(wp)        :: x, y         ! Coordinates of a point
17 |     integer(int64)  :: n            ! Total number of points
18 |     integer(int64)  :: k[*]         ! Points inside the quarter disk
19 |     integer(int64)  :: kt, it       ! Total k and i
20 |     integer(int64)  :: i, j         ! Loop counters
21 |     integer(int64)  :: n_per_image  ! Number of points per image
22 |     integer         :: t1, t2       ! Clock ticks
23 |     real            :: count_rate   ! Clock ticks per second
24 | 
25 |     n = 1000000000
26 |     k = 0
27 | 
28 |     ! Each image has its own RNG seed, thanks to rng%jump() which
29 |     ! generates non-overlapping subsequences for parallel computations:
30 |     call rng%seed([ -1337_i8, 9812374_i8 ])
31 |     if (this_image() /= 1) then
32 |         do i = 2, this_image()
33 |             call rng%jump()
34 |         end do
35 |     end if
36 | 
37 |     x = rng%U01()
38 | 
39 |     call system_clock(t1, count_rate)
40 | 
41 |     n_per_image = n / num_images()
42 |     write(*, '(a, i3, a, i3)', advance='no') "Image ", this_image(), "/", num_images()
43 |     write(*, '(a, i11, a)') " will compute", n_per_image, " points"
44 | 
45 |     do i = 1, n_per_image
46 |         ! Computing a random point (x,y) in the square 0<=x<1, 0<=y<1:
47 |         x = rng%U01()
48 |         y = rng%U01()
49 | 
50 |         ! Is it in the quarter disk (R=1, center=origin) ?
51 |         if ((x**2 + y**2) < 1.0_wp) k = k + 1
52 | 
53 |         ! Once in a while (20 times):
54 |         if (mod(i, n_per_image/20) == 0) then
55 |             sync all
56 |             if (this_image() == 1) then
57 |                 kt = 0
58 |                 do j = 1, num_images()
59 |                     kt = kt + k[j]
60 |                 end do
61 |                 it = i*num_images()
62 | 
63 |                 write(*, '(a, i0, a, i0, a, F17.15)') "4 * ", kt, " / ", it, " = ", (4.0_wp * kt) / it
64 |             end if
65 |         end if
66 |     end do
67 | 
68 |     if (this_image() == 1) then
69 |         call system_clock(t2)
70 |         write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s"
71 |         write(*,'(a)') "---------------------------------------------------"
72 |     end if
73 | 
74 | end program pi_monte_carlo_coarrays_steady
75 | 


--------------------------------------------------------------------------------
/pi_monte_carlo_openmp.f90:
--------------------------------------------------------------------------------
 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm
 2 | ! OpenMP version
 3 | ! Vincent Magnin, 2021-04-22
 4 | ! Last modification: 2024-09-03
 5 | ! MIT license
 6 | ! $ gfortran -Wall -Wextra -std=f2018 -pedantic -O3 -fopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90
 7 | ! $ ifx -O3 -qopenmp m_xoroshiro128plus.f90 pi_monte_carlo_openmp.f90
 8 | 
 9 | program pi_monte_carlo_openmp
10 |     use, intrinsic :: iso_fortran_env, only: wp=>real64, int64
11 |     use m_xoroshiro128plus
12 |     use omp_lib, only: omp_get_thread_num
13 |     implicit none
14 |     type(rng_t)     :: rng          ! xoroshiro128+ pseudo-random number generator
15 |     real(wp)        :: x, y         ! Coordinates of a point
16 |     integer(int64)  :: n            ! Total number of points
17 |     integer(int64)  :: k = 0        ! Points inside the quarter disk
18 |     integer(int64)  :: i            ! Loop counter
19 |     integer         :: t1, t2       ! Clock ticks
20 |     real            :: count_rate   ! Clock ticks per second
21 |     integer         :: thread       ! OpenMP thread number
22 | 
23 |     n = 1000000000
24 | 
25 |     call system_clock(t1, count_rate)
26 | 
27 |     !$OMP PARALLEL DEFAULT(NONE) SHARED(n) PRIVATE(thread, i, x, y, rng) REDUCTION(+: k)
28 |     thread = omp_get_thread_num()
29 | 
30 |     ! Each thread has its own RNG seed, thanks to rng%jump() which
31 |     ! generates non-overlapping subsequences for parallel computations:
32 |     call rng%seed([ -1337_i8, 9812374_i8 ])
33 |     ! Threads are numbered from 0
34 |     if (thread+1 /= 1) then
35 |         do i = 2, thread+1
36 |             call rng%jump()
37 |         end do
38 |     end if
39 | 
40 |     x = rng%U01()
41 | 
42 |     !$OMP DO SCHEDULE(STATIC)
43 |     do i = 1, n
44 |         ! Computing a random point (x,y) in the square 0<=x<1, 0<=y<1:
45 |         x = rng%U01()
46 |         y = rng%U01()
47 | 
48 |         ! Is it in the quarter disk (R=1, center=origin) ?
49 |         if ((x**2 + y**2) < 1.0_wp) k = k + 1
50 |     end do
51 |     !$OMP END DO
52 |     print '(a, i0, a, i0)', "k", thread, " = ", k
53 |     !$OMP END PARALLEL
54 | 
55 |     write(*,*)
56 |     write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n
57 |     write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n
58 | 
59 |     call system_clock(t2)
60 |     write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s"
61 |     write(*,'(a)') "---------------------------------------------------"
62 | end program pi_monte_carlo_openmp
63 | 


--------------------------------------------------------------------------------
/pi_monte_carlo_serial.f90:
--------------------------------------------------------------------------------
 1 | ! Computes an approximation of Pi with a Monte Carlo algorithm
 2 | ! Serial version
 3 | ! Vincent Magnin, 2021-04-22
 4 | ! Last modification: 2021-05-09
 5 | ! MIT license
 6 | ! $ gfortran -Wall -Wextra -std=f2018 -pedantic -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90
 7 | ! $ ifx -O3 m_xoroshiro128plus.f90 pi_monte_carlo_serial.f90
 8 | 
 9 | program pi_monte_carlo_serial
10 |     use, intrinsic :: iso_fortran_env, only: wp=>real64, int64
11 |     use m_xoroshiro128plus
12 |     implicit none
13 |     type(rng_t)     :: rng          ! xoroshiro128+ pseudo-random number generator
14 |     real(wp)        :: x, y         ! Coordinates of a point
15 |     integer(int64)  :: n            ! Total number of points
16 |     integer(int64)  :: k = 0        ! Points inside the quarter disk
17 |     integer(int64)  :: i            ! Loop counter
18 |     integer         :: t1, t2       ! Clock ticks
19 |     real            :: count_rate   ! Clock ticks per second
20 | 
21 |     n = 1000000000
22 | 
23 |     ! Set the seed of the RNG:
24 |     call rng%seed([ -1337_i8, 9812374_i8 ])
25 |     x = rng%U01()
26 | 
27 |     call system_clock(t1, count_rate)
28 | 
29 |     do i = 1, n
30 |         ! Computing a random point (x,y) in the square 0<=x<1, 0<=y<1:
31 |         x = rng%U01()
32 |         y = rng%U01()
33 | 
34 |         ! Is it in the quarter disk (R=1, center=origin) ?
35 |         if ((x**2 + y**2) < 1.0_wp) k = k + 1
36 |     end do
37 | 
38 |     write(*,*)
39 |     write(*, '(a, i0, a, i0)', advance='no') "4 * ", k, " / ", n
40 |     write(*, '(a, f17.15)') " = ", (4.0_wp * k) / n
41 | 
42 |     call system_clock(t2)
43 |     write(*,'(a, f6.3, a)') "Execution time: ", (t2 - t1) / count_rate, " s"
44 |     write(*,'(a)') "---------------------------------------------------"
45 | end program pi_monte_carlo_serial
46 | 


--------------------------------------------------------------------------------