├── 2024-ΑΙ.pdf ├── OpenMP-set-2019.pdf ├── OpenMP-set-2020.pdf ├── README.md ├── _config.yml ├── hello-world-parallel-id-func.c ├── hello-world-parallel-id-scope.c ├── hello-world-parallel-id.c ├── hello-world-parallel.c ├── laplace2D.c ├── pi_mc ├── pi_mc.c ├── pi_mc.ipynb ├── pi_mc.jl ├── pi_mc.py ├── pi_mc_julia.ipynb ├── pi_mc_python_multi-convergence.ipynb └── pi_mc_python_multi.ipynb ├── poisson-SOR.c ├── table-add1-combined.c ├── table-add1-manual.c ├── table-add1-wrong.c ├── table-add1.c ├── table-implicit-notpar.c ├── table-sum-wrong.c └── table-sum.c /2024-ΑΙ.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/niksterg/openmp-course/c6a8e1288fbd44a0f8a0a11f356a0bc9b880c0e1/2024-ΑΙ.pdf -------------------------------------------------------------------------------- /OpenMP-set-2019.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/niksterg/openmp-course/c6a8e1288fbd44a0f8a0a11f356a0bc9b880c0e1/OpenMP-set-2019.pdf -------------------------------------------------------------------------------- /OpenMP-set-2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/niksterg/openmp-course/c6a8e1288fbd44a0f8a0a11f356a0bc9b880c0e1/OpenMP-set-2020.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## OpenMP Examples 2 | ### Prof. N. Stergioulas 3 | ### Aristotle University of Thessaloniki 4 | 5 | 6 | ### Example programs 7 | 8 | The following example programs introduce the main concepts of OpenMP step by step. 9 | 10 | 11 | 1. [**hello-world-parallel.c**](https://github.com/niksterg/openmp-course/blob/master/hello-world-parallel.c) (the most basic hello world executed in parallel) 12 | 13 | 2. 
[**hello-world-parallel-id.c**](https://github.com/niksterg/openmp-course/blob/master/hello-world-parallel-id.c) (similar to 1. but the id of each thread and the total number of threads are also printed) 14 | 15 | 3. [**hello-world-parallel-id-func.c**](https://github.com/niksterg/openmp-course/blob/master/hello-world-parallel-id-func.c) (same result as 2. but a function is called to print inside the parallel region) 16 | 17 | 4. [**hello-world-parallel-id-scope.c**](https://github.com/niksterg/openmp-course/blob/master/hello-world-parallel-id-scope.c) (same result as 2. but data scope is used in the parallel region) 18 | 19 | 5. [**table-add1-manual.c**](https://github.com/niksterg/openmp-course/blob/master/table-add1-manual.c) (add a number to each element of a table, using manual parallelization) 20 | 21 | 6. [**table-add1.c**](https://github.com/niksterg/openmp-course/blob/master/table-add1.c) (same result as 5. but with automatic work scheduling) 22 | 23 | 7. [**table-add1-combined.c**](https://github.com/niksterg/openmp-course/blob/master/table-add1-combined.c) (same result as 6. but with combined parallel region and for construct) 24 | 25 | 8. [**table-add1-wrong.c**](https://github.com/niksterg/openmp-course/blob/master/table-add1-wrong.c) (similar to 6. and 7. but giving wrong answer - find the error in the code!) 26 | 27 | 9. [**table-implicit-notpar.c**](https://github.com/niksterg/openmp-course/blob/master/table-implicit-notpar.c) (parallel execution gives wrong answer - find out why!) 28 | 29 | 10. [**table-sum.c**](https://github.com/niksterg/openmp-course/blob/master/table-sum.c) (computing the sum of all elements in a table) 30 | 31 | 11. [**table-sum-wrong.c**](https://github.com/niksterg/openmp-course/blob/master/table-sum-wrong.c) (similar to 10. but gives wrong answer - find the error in the code!) 
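Examples 5 to 11 revolve around two loop patterns: updating every element of a table and summing its elements. The following is a minimal sketch of both patterns using the combined `parallel for` construct. It is not taken from the repository files: the function names `table_add` and `table_sum` are hypothetical and chosen only for illustration.

```c
/* Hypothetical helper (not a function from the repository): adds `value`
 * to every element of a[]. Each iteration touches a different element,
 * so the iterations can be shared among threads without any locking. */
void table_add(int *a, int n, int value)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] += value;
    }
}

/* Hypothetical helper: sums the n entries of a[]. The reduction clause
 * gives every thread a private partial sum and adds the partial sums
 * together at the end of the loop, which avoids a race on `sum`. */
long table_sum(const int *a, int n)
{
    long sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }

    return sum;
}
```

Compile with `gcc -fopenmp`; without that flag the pragmas are ignored and both functions run serially with the same result. For a table holding 0, 1, ..., 11, `table_sum(a, 12)` returns 66, and after `table_add(a, 12, 1)` it returns 78.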
32 | 33 | ### Exercises 34 | 35 | [**2020 Homework on 2D wave equation**](https://github.com/niksterg/openmp-course/blob/master/OpenMP-set-2020.pdf) 36 | 37 | [**2019 Homework on Poisson solvers**](https://github.com/niksterg/openmp-course/blob/master/OpenMP-set-2019.pdf) (see [this figure](https://www.researchgate.net/profile/Chhote-Shah/publication/336512640/figure/fig2/AS:922552943792128@1596965170203/Red-Black-ordering-technique-and-implementation-of-the-SOR-algorithm-a-updating-the.png)) 38 | 39 | [**poisson-SOR.c**](https://github.com/niksterg/openmp-course/blob/master/poisson-SOR.c) Example: ./poisson-SOR -N 400 -M 400 -a 1e-6 -o 1.9 40 | 41 | 42 | **2023/24 Homework:** 43 | 44 | **Set #1:** 45 | - Run the examples 1 - 10. 46 | - Find and correct the mistakes in examples 8, 9 and 11. 47 | 48 | **Set #2:** 49 | 50 | - Write a program in C/C++ that calculates π using a Monte-Carlo method. Use up to 10^10 points. 51 | - Parallelize the code using OpenMP. Run the code using 1, 2, 4, etc. threads (up to twice the physical number of cores, try at least up to 8). 52 | - Create a Jupyter Python notebook within which you automatically run the C/C++ code using different numbers of cores (use a list) and plot the execution time as a function of the number of cores. Alternatively, run the C/C++ code using a script, save the results in a file and load them into a Jupyter notebook to make the plot. 53 | - In the same notebook, create a second plot of parallel speedup vs. number of cores. 54 | - Use [Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law) to fit the resulting curve and find the proportion p of the code that benefits from parallelization and the maximum possible speedup in the limit of 10000 cores. (You can do the fit with e.g. SciPy's `curve_fit`.) 55 | - Run the code for 10^2, 10^3, 10^4, ... , 10^11 points and, in a second notebook, calculate the convergence rate of your Monte-Carlo implementation. 
You can do this by fitting a line to a log-log plot of the error in calculating π vs. the number of points. Try to find what the theoretical expectation is and compare your result to it. 56 | 57 | Use a logarithmic scale wherever the numerical values change by orders of magnitude! 58 | 59 | (Bonus track: repeat with Python+Numba and/or Julia) 60 | 61 | **2024/25** 62 | 63 | Example code: [2D Laplace equation](https://github.com/niksterg/openmp-course/blob/master/laplace2D.c) with OpenMP 64 | 65 | Example codes: [Calculate pi using a Monte Carlo method, with OpenMP, but also in Python and Julia](https://github.com/niksterg/openmp-course/tree/master/pi_mc) 66 | 67 | Example code: [Multi-dimensional regression using ANN](https://www.kaggle.com/code/nikolaosstergioulas/house-prices-ann) 68 | 69 | **2024/25 Homework (due Feb. 28):** 70 | 71 | **Set #1:** 72 | - Run the examples 1 - 10. 73 | - Find and correct the mistakes in examples 8, 9 and 11. 74 | 75 | **Set #2:** 76 | 77 | OpenMP: Decide on your own problem to parallelize with OpenMP and write a report similar to the one for Set #2 of 2023/24, following the guidelines in that problem set. 78 | 79 | **Set #3:** 80 | 81 | Select a multidimensional regression problem (real data or synthetic data that you create) and write an ANN to predict y(X). Perform many training runs, varying the hyperparameters, and write a report with your findings. You can use data from Kaggle or from other sources or from your own experiments or create your own synthetic data. Submit a) your codes, b) a link to the data and c) a detailed report of your findings. 82 | 83 | ### Tutorials 84 | 85 | 1. [Tutorial by N. Trifonidis (part 1)](http://www.astro.auth.gr/~niksterg/courses/progtools/1-OpenMP-tutorial.pdf) 86 | 2. [Tutorial by N. Trifonidis (part 2)](http://www.astro.auth.gr/~niksterg/courses/progtools/2-OpenMP-tutorial.pdf) 87 | 3. [A brief introduction by A. Kiessling](http://www.roe.ac.uk/ifa/postgrad/pedagogy/2009_kiessling.pdf) 88 | 4. [Tutorial by S.C. 
Huang](https://idre.ucla.edu/sites/default/files/intro-openmp-2013-02-11.pdf) 89 | 5. [Tutorial by Texas A&M](https://people.math.umass.edu/~johnston/PHI_WG_2014/OpenMPSlides_tamu_sc.pdf) 90 | 6. [Tutorial by T. Mattson and L. Meadows](http://www.openmp.org/wp-content/uploads/omp-hands-on-SC08.pdf) 91 | 7. [Tutorial by Y. W. Li (includes Vtune examples)](https://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-20-23416) 92 | 8. [Online tutorial by B. Barney](https://computing.llnl.gov/tutorials/openMP/) 93 | 9. [Online tutorial by Y. Yliluoma](https://bisqwit.iki.fi/story/howto/openmp/) 94 | 10. [Online list of potential mistakes](https://www.viva64.com/en/a/0054/) 95 | 11. [Video tutorial by C. Terboven (part 1)](https://www.youtube.com/watch?v=6FMn7M5jxrM) 96 | 12. [Video tutorial by C. Terboven (part 2)](https://www.youtube.com/watch?v=Whq28OaPW08) 97 | 13. [Video channel by PPCES](https://www.youtube.com/channel/UCtdrEoe46tD2IvJJRs_JH1A) 98 | 14. [Additional resources](https://www.openmp.org/resources/tutorials-articles/) 99 | 15. [OpenMP 3.1 Quick Reference Card](https://www.openmp.org//wp-content/uploads/OpenMP3.1-CCard.pdf) 100 | 101 | 102 | 103 | ### License 104 | 105 | ##### Content provided under a Creative Commons Attribution license, [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/); code under [MIT License](https://opensource.org/licenses/MIT). 
(c)2018 [Nikolaos Stergioulas](http://www.astro.auth.gr/~niksterg/) 106 | 107 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-cayman -------------------------------------------------------------------------------- /hello-world-parallel-id-func.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <omp.h> 3 | 4 | void printmessage(int myid) 5 | { 6 | printf("Thread %d says: Hello World!\n",myid); 7 | } 8 | 9 | int main(void) { 10 | 11 | int nThreads; 12 | 13 | #pragma omp parallel 14 | { 15 | int myid = omp_get_thread_num(); 16 | 17 | printmessage(myid); 18 | 19 | if(myid==0) nThreads = omp_get_num_threads(); 20 | } 21 | 22 | printf("The total number of threads used was %d.\n",nThreads); 23 | 24 | return 0; 25 | } 26 | -------------------------------------------------------------------------------- /hello-world-parallel-id-scope.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <omp.h> 3 | 4 | int main(void) { 5 | 6 | int myid, 7 | nThreads; 8 | 9 | #pragma omp parallel shared(nThreads) private(myid) default(none) 10 | { 11 | myid = omp_get_thread_num(); 12 | 13 | printf("Thread %d says: Hello World!\n",myid); 14 | 15 | if(myid==0) nThreads = omp_get_num_threads(); 16 | } 17 | 18 | printf("The total number of threads used was %d.\n",nThreads); 19 | 20 | return 0; 21 | } 22 | -------------------------------------------------------------------------------- /hello-world-parallel-id.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <omp.h> 3 | 4 | int main(void) { 5 | 6 | int nThreads; 7 | 8 | #pragma omp parallel 9 | { 10 | int myid = omp_get_thread_num(); 11 | printf("Thread %d says: Hello World!\n",myid); 12 | 13 | if(myid == 0) nThreads = omp_get_num_threads(); 14 | 
} 15 | 16 | printf("The total number of threads used was %d.\n",nThreads); 17 | 18 | return 0; 19 | } 20 | -------------------------------------------------------------------------------- /hello-world-parallel.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <omp.h> 3 | 4 | int main(void) { 5 | 6 | #pragma omp parallel 7 | { 8 | printf("Hello World!\n"); 9 | } 10 | 11 | return 0; 12 | 13 | } 14 | -------------------------------------------------------------------------------- /laplace2D.c: -------------------------------------------------------------------------------- 1 | /* laplace2D.c: Solving Laplace's equation in 2D by Jacobi iteration using OpenMP. 2 | * 3 | * ∇²u(x,y) = 0 on a rectangular domain. 4 | * 5 | * We set up a grid (nx x ny) with Dirichlet boundary conditions (here we fix 6 | * the top boundary to 1.0 and the other boundaries to 0.0). Then we iteratively 7 | * update the interior points with the average of their four neighbors. 8 | * 9 | * To compile: 10 | * gcc -O3 -fopenmp laplace2D.c -o laplace -lm 11 | * 12 | * To run (example with grid size 10000x10000): 13 | * ./laplace 10000 10000 14 | * 15 | * You can control the number of threads via the environment variable OMP_NUM_THREADS. 16 | * 17 | * This code is intended as a demonstration for MSc computational physics courses. 18 | */ 19 | 20 | #include <stdio.h> 21 | #include <stdlib.h> 22 | #include <math.h> 23 | #include <omp.h> 24 | 25 | int main(int argc, char *argv[]) { 26 | // Grid dimensions (default: 1000 x 1000; increase for larger memory usage / longer runtime) 27 | int nx = 1000; 28 | int ny = 1000; 29 | if (argc >= 3) { 30 | nx = atoi(argv[1]); 31 | ny = atoi(argv[2]); 32 | } 33 | 34 | // Total number of grid points. 35 | int size = nx * ny; 36 | 37 | // Allocate two 1D arrays to represent the 2D grid (row-major order). 
38 | double *u = (double*) malloc(sizeof(double) * size); 39 | double *u_new = (double*) malloc(sizeof(double) * size); 40 | if (!u || !u_new) { 41 | fprintf(stderr, "Error allocating memory.\n"); 42 | return 1; 43 | } 44 | 45 | // Initialize arrays: interior points start at 0.0. 46 | for (int i = 0; i < size; i++){ 47 | u[i] = 0.0; 48 | u_new[i] = 0.0; 49 | } 50 | 51 | // Set boundary conditions. 52 | // Here, we fix the top boundary (last row) to 1.0 and all others to 0.0. 53 | for (int i = 0; i < nx; i++){ 54 | int idx = i * ny + (ny - 1); // index for top boundary in each column 55 | u[idx] = 1.0; 56 | u_new[idx] = 1.0; 57 | } 58 | 59 | // Convergence parameters. 60 | double tol = 1e-6; // convergence tolerance 61 | int max_iter = 1000000; // maximum number of iterations 62 | int iter = 0; 63 | double diff = 0.0; // will hold maximum difference between iterations 64 | 65 | // Timing the computation 66 | double start_time = omp_get_wtime(); 67 | 68 | // Main Jacobi iteration loop. 69 | while (iter < max_iter) { 70 | diff = 0.0; 71 | 72 | // Update the interior points. 73 | // Note: The update at each point depends only on values from the previous iteration, 74 | // so we can parallelize the two nested loops using OpenMP. 75 | #pragma omp parallel for reduction(max:diff) schedule(static) 76 | for (int i = 1; i < nx - 1; i++) { 77 | for (int j = 1; j < ny - 1; j++) { 78 | int idx = i * ny + j; 79 | int idx_up = (i - 1) * ny + j; 80 | int idx_down = (i + 1) * ny + j; 81 | int idx_left = i * ny + (j - 1); 82 | int idx_right = i * ny + (j + 1); 83 | 84 | // Compute the new value as the average of the four neighbors. 85 | u_new[idx] = 0.25 * (u[idx_up] + u[idx_down] + u[idx_left] + u[idx_right]); 86 | 87 | // Compute the local difference. 88 | double local_diff = fabs(u_new[idx] - u[idx]); 89 | if (local_diff > diff) 90 | diff = local_diff; 91 | } 92 | } 93 | 94 | // Copy new values back into u (this loop can also be parallelized). 
95 | #pragma omp parallel for schedule(static) 96 | for (int i = 1; i < nx - 1; i++) { 97 | for (int j = 1; j < ny - 1; j++) { 98 | int idx = i * ny + j; 99 | u[idx] = u_new[idx]; 100 | } 101 | } 102 | 103 | iter++; 104 | if (iter % 100 == 0) { 105 | printf("Iteration %d: max diff = %e\n", iter, diff); 106 | } 107 | if (diff < tol) 108 | break; 109 | } 110 | 111 | double end_time = omp_get_wtime(); 112 | printf("Converged after %d iterations with max diff = %e\n", iter, diff); 113 | printf("Total runtime = %f seconds\n", end_time - start_time); 114 | 115 | // Clean up. 116 | free(u); 117 | free(u_new); 118 | 119 | return 0; 120 | } 121 | -------------------------------------------------------------------------------- /pi_mc/pi_mc.c: -------------------------------------------------------------------------------- 1 | // pi_mc.c 2 | // 3 | // This program calculates the value of pi using the Monte Carlo method. 4 | // It uses OpenMP for parallelization. 5 | // The number of threads is set by the environment variable OMP_NUM_THREADS. 6 | // The code is written for unsigned long int, which is 64 bits on a 64-bit machine. 7 | // The random number generator is rand_long, which is thread-safe and works with unsigned long int. 8 | // The random number generator is seeded with the time in seconds times the thread number. 9 | // 10 | // Nikolaos Stergioulas, Aristotle University of Thessaloniki 11 | // 12 | // Content provided under a Creative Commons Attribution license, CC BY-NC-SA 4.0; code under GNU GPLv3 License. 13 | // (c)2024 Nikolaos Stergioulas 14 | 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | 22 | 23 | unsigned long int rand_long(unsigned int *seed) { 24 | 25 | // This function defines a random number generator that returns an unsigned long int 26 | // between 0 and UINT_MAX*UINT_MAX, given a seed. 
It calls rand_r twice; each call returns a 27 | // random number between 0 and RAND_MAX, and the two are combined into a single random number between 28 | // 0 and (RAND_MAX << 32) | RAND_MAX. 29 | 30 | unsigned long int result = rand_r(seed); 31 | result <<= 32; 32 | result |= rand_r(seed); 33 | return result; 34 | } 35 | 36 | int main(int argc, char *argv[]) { 37 | 38 | int opt; // Command line option 39 | 40 | long int count = 0; 41 | 42 | long int i, // Loop counter 43 | N = 1000000000; // How many random numbers to generate (default value) 44 | 45 | double x, // Random number 46 | y, // Random number 47 | pi, // Estimate of pi 48 | scale_factor; // Scale factor for random numbers 49 | 50 | unsigned int seed; // Declare the seed variable 51 | 52 | unsigned long int unsigned_long_scale_max; // Maximum value returned by rand_long 53 | 54 | // Parse the command line arguments to set how many random numbers to generate 55 | while ((opt = getopt(argc, argv, "n:")) != -1) { 56 | switch (opt) { 57 | case 'n': 58 | N = atol(optarg); 59 | break; 60 | default: 61 | fprintf(stderr, "Usage: %s [-n N]\n", argv[0]); 62 | exit(EXIT_FAILURE); 63 | } 64 | } 65 | 66 | 67 | // This is the maximum value of unsigned long int produced by rand_long 68 | unsigned_long_scale_max = ((unsigned long int)RAND_MAX << 32) | RAND_MAX; 69 | 70 | scale_factor = 1.0/ unsigned_long_scale_max; // Scale the random numbers to the range [0, 1] 71 | 72 | #pragma omp parallel private(x, y, seed) reduction(+:count) 73 | { 74 | seed = (unsigned int) time(NULL) * (omp_get_thread_num() + 1); // Private seed for each thread (thread number + 1, so that thread 0 does not always get seed 0) 75 | 76 | #pragma omp for // Loop over the number of random numbers to generate 77 | for(i = 0; i < N; i++) { 78 | x = rand_long(&seed) *scale_factor * 2.0 - 1.0; // Generate a random number between -1 and 1 79 | y = rand_long(&seed) *scale_factor* 2.0 - 1.0; // Generate a random number between -1 and 1 80 | 81 | if(x * x + y * y <= 1.0) { // If the point (x, y) lies inside the unit circle 82 | count++; // increment the count 83 | } 84 | 
} 85 | } 86 | 87 | pi = 4.0 * count / N; // Calculate the estimate of pi 88 | 89 | printf("Estimated value of Pi = %f\n", pi); 90 | 91 | return 0; 92 | } -------------------------------------------------------------------------------- /pi_mc/pi_mc.jl: -------------------------------------------------------------------------------- 1 | # pi_mc.jl 2 | # 3 | # This program calculates the value of pi using the Monte Carlo method. 4 | # It uses Julia's thread parallelization. 5 | # The number of random trials is set by command-line argument -n. 6 | # 7 | # Nikolaos Stergioulas, Aristotle University of Thessaloniki 8 | # 9 | # Content provided under a Creative Commons Attribution license, CC BY-NC-SA 4.0; code under GNU GPLv3 License. 10 | # (c)2024 Nikolaos Stergioulas 11 | 12 | # in the Julia REPL install the ArgParse package with 13 | # import Pkg; Pkg.add("ArgParse") 14 | 15 | using ArgParse 16 | using Random 17 | 18 | function count_hits(random_trials::Int, seed::Int) 19 | rng = MersenneTwister(seed) 20 | count = 0 21 | for _ in 1:random_trials 22 | # Julia generates random numbers in the interval [0,1). 23 | # To generate random numbers in the interval [-1,1) we use 24 | # 2*(rand(rng) - 0.5): subtracting 0.5 shifts the random 25 | # numbers to the interval [-0.5,0.5) and multiplying 26 | # by 2 stretches it to [-1,1). 27 | x, y = 2*(rand(rng) - 0.5), 2*(rand(rng) - 0.5) 28 | if x * x + y * y <= 1.0 29 | count += 1 30 | end 31 | end 32 | return count 33 | end 34 | 35 | # function count_hits(random_trials::Int, seed::Int) 36 | # rng = MersenneTwister(seed) 37 | # xs = 2.0 * (rand(rng, random_trials) .- 0.5) 38 | # ys = 2.0 * (rand(rng, random_trials) .- 0.5) 39 | # count = 0 40 | # @inbounds for i in 1:random_trials 41 | # if xs[i]^2 + ys[i]^2 <= 1.0 42 | # count += 1 43 | # end 44 | # end 45 | # return count 46 | # end 47 | 48 | function main() 49 | s = ArgParseSettings() 50 | @add_arg_table! 
s begin 51 | "--random_trials", "-n" 52 | arg_type = Int 53 | required = true 54 | help = "number of random trials" 55 | "--seed", "-s" 56 | arg_type = Int 57 | default = trunc(Int, time()) 58 | help = "random seed" 59 | "--threads", "-p" 60 | arg_type = Int 61 | default = 1 62 | help = "number of threads" 63 | end 64 | 65 | parsed_args = parse_args(ARGS, s) 66 | random_trials = parsed_args["random_trials"] 67 | seed = parsed_args["seed"] 68 | threads = parsed_args["threads"] 69 | 70 | # Note: Julia fixes its thread count at startup (julia --threads N, or JULIA_NUM_THREADS set beforehand); setting the variable here does not change it for the running session 71 | ENV["JULIA_NUM_THREADS"] = threads 72 | 73 | trials_per_thread = random_trials ÷ threads 74 | 75 | counts = zeros(Int, threads) 76 | Threads.@threads for i in 1:threads 77 | counts[i] = count_hits(trials_per_thread, seed + i) 78 | end 79 | 80 | count = sum(counts) 81 | 82 | pi_estimate = 4.0 * count / random_trials 83 | println("Pi estimate = ", pi_estimate) 84 | end 85 | 86 | main() -------------------------------------------------------------------------------- /pi_mc/pi_mc.py: -------------------------------------------------------------------------------- 1 | # pi_mc.py 2 | # 3 | # This program calculates the value of pi using the Monte Carlo method. 4 | # It uses Python multiprocessing for parallelization. 5 | # The number of processes is set by command-line argument -p. 6 | # The number of random trials is set by command-line argument -n. 7 | # 8 | # Nikolaos Stergioulas, Aristotle University of Thessaloniki 9 | # 10 | # Content provided under a Creative Commons Attribution license, CC BY-NC-SA 4.0; code under GNU GPLv3 License. 
11 | # (c)2024 Nikolaos Stergioulas 12 | 13 | 14 | import argparse 15 | import random 16 | import multiprocessing 17 | import time 18 | from numba import njit 19 | from numba.typed import List 20 | from numba import prange 21 | from numba import jit 22 | import argparse 23 | import numpy as np 24 | 25 | 26 | @njit(fastmath=False) 27 | def count_hits(random_trials, seed=None): 28 | random.seed(seed) 29 | count = 0 30 | for _ in range(random_trials): 31 | x, y = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0) 32 | if x * x + y * y <= 1.0: 33 | count += 1 34 | return count 35 | 36 | # @jit(nopython=True, parallel=True) 37 | # def count_hits(random_trials, seed=None): 38 | # np.random.seed(seed) 39 | # count = 0 40 | # for _ in prange(random_trials): 41 | # x, y = np.random.uniform(-1.0, 1.0), np.random.uniform(-1.0, 1.0) 42 | # if x * x + y * y <= 1.0: 43 | # count += 1 44 | # return count 45 | 46 | 47 | if __name__ == "__main__": 48 | 49 | parser = argparse.ArgumentParser(description='Calculate Pi using Monte Carlo method.') 50 | parser.add_argument('-p', '--processes', type=int, default=multiprocessing.cpu_count(), 51 | help='Number of processes to use') 52 | parser.add_argument('-n', '--N', type=int, default=100000000, 53 | help='Number of random trials') 54 | args = parser.parse_args() 55 | 56 | num_processes = args.processes #multiprocessing.cpu_count() 57 | 58 | N = args.N 59 | 60 | pool = multiprocessing.Pool(processes=num_processes) 61 | 62 | start_time = time.time() 63 | 64 | random_trials_per_process = N // num_processes 65 | 66 | #counts = pool.map(count_hits, [random_trials_per_process] * num_processes) 67 | 68 | #seeds = [i for i in range(num_processes)] 69 | #seeds = [random.randint(0, 1e9) for _ in range(num_processes)] 70 | seeds = [int(time.time()) + i for i in range(num_processes)] 71 | 72 | counts = pool.starmap(count_hits, [(random_trials_per_process, seed) for seed in seeds]) 73 | 74 | pi_estimate = 4 * float(sum(counts)) / float(N) 75 | 76 | 
end_time = time.time() 77 | print(f"Estimated value of Pi = {pi_estimate:.12f}") -------------------------------------------------------------------------------- /pi_mc/pi_mc_python_multi-convergence.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 37, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Estimated value of Pi = 2.960000000000\n", 13 | "\n", 14 | "Estimated value of Pi = 3.116000000000\n", 15 | "\n", 16 | "Estimated value of Pi = 3.157600000000\n", 17 | "\n", 18 | "Estimated value of Pi = 3.150760000000\n", 19 | "\n", 20 | "Estimated value of Pi = 3.140860000000\n", 21 | "\n", 22 | "Estimated value of Pi = 3.142372800000\n", 23 | "\n", 24 | "Estimated value of Pi = 3.141508280000\n", 25 | "\n", 26 | "Estimated value of Pi = 3.141616232000\n", 27 | "\n", 28 | "Estimated value of Pi = 3.141586098400\n", 29 | "\n", 30 | "Estimated value of Pi = 3.141589239680\n", 31 | "\n", 32 | "Estimated value of Pi = 3.141591983784\n", 33 | "\n" 34 | ] 35 | } 36 | ], 37 | "source": [ 38 | "import subprocess\n", 39 | "import os\n", 40 | "import time\n", 41 | "import matplotlib.pyplot as plt\n", 42 | "import numpy as np\n", 43 | "\n", 44 | "# Set the OMP_NUM_THREADS environment variable\n", 45 | "num_threads = 32\n", 46 | "os.environ[\"OMP_NUM_THREADS\"] = str(num_threads)\n", 47 | "\n", 48 | "Nmin = 100\n", 49 | "Nmax = 10**12\n", 50 | "\n", 51 | "num_list = []\n", 52 | "value = Nmin\n", 53 | "\n", 54 | "while value <= Nmax:\n", 55 | " num_list.append(value)\n", 56 | " value *= 10\n", 57 | "\n", 58 | "# List to store execution times\n", 59 | "execution_times = []\n", 60 | "output_list = []\n", 61 | "\n", 62 | "for N in num_list:\n", 63 | "\n", 64 | " # Run the Python code and capture the output\n", 65 | " start_time = time.time()\n", 66 | " output = subprocess.check_output([\"python\", 
\"pi_mc.py\", \"-p\", str(num_threads), \"-n\", str(N)], universal_newlines=True)\n", 67 | "\n", 68 | " # Append the output to the list\n", 69 | " output_list.append(output)\n", 70 | " print(output)\n", 71 | "\n", 72 | " end_time = time.time()\n", 73 | "\n", 74 | " # Calculate and store the execution time\n", 75 | " execution_time = end_time - start_time\n", 76 | " execution_times.append(execution_time)\n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 38, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "[2.96, 3.116, 3.1576, 3.15076, 3.14086, 3.1423728, 3.14150828, 3.141616232, 3.1415860984, 3.14158923968, 3.141591983784]\n" 89 | ] 90 | } 91 | ], 92 | "source": [ 93 | "numerical_values = [float(s.split('=')[1]) for s in output_list]\n", 94 | "print(numerical_values)" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 46, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "name": "stdout", 104 | "output_type": "stream", 105 | "text": [ 106 | " Number of threads Pi\n", 107 | " 100 2.9600000000000000\n", 108 | " 1000 3.1160000000000001\n", 109 | " 10000 3.1576000000000000\n", 110 | " 100000 3.1507600000000000\n", 111 | " 1000000 3.1408600000000000\n", 112 | " 10000000 3.1423728000000000\n", 113 | " 100000000 3.1415082800000000\n", 114 | " 1000000000 3.1416162320000001\n", 115 | " 10000000000 3.1415860983999999\n", 116 | " 100000000000 3.1415892396800000\n", 117 | " 1000000000000 3.1415919837840001\n" 118 | ] 119 | } 120 | ], 121 | "source": [ 122 | "import pandas as pd\n", 123 | "\n", 124 | "# Create a DataFrame from the list of lists\n", 125 | "table_data = []\n", 126 | "for i in range(len(num_list)):\n", 127 | " table_data.append([num_list[i], numerical_values[i]])\n", 128 | "\n", 129 | "df = pd.DataFrame(table_data, columns=['Number of threads', 'Pi'])\n", 130 | "\n", 131 | "# Set the option to display 15 digits accuracy\n", 132 | 
"pd.set_option('display.precision', 16)\n", 133 | "\n", 134 | "# Print the table\n", 135 | "print(df.to_string(index=False))\n" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 40, 141 | "metadata": {}, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "[-0.18159265358979315, -0.025592653589793013, 0.016007346410206846, 0.009167346410206889, -0.0007326535897931308, 0.00078014641020685, -8.437358979307419e-05, 2.3578410206948064e-05, -6.5551897932003556e-06, -3.4139097930818707e-06, -6.698057930520918e-07]\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "difference = [value - np.pi for value in numerical_values]\n", 153 | "print(difference)\n" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 44, 159 | "metadata": {}, 160 | "outputs": [ 161 | { 162 | "data": { 163 | "image/png": "", 164 | "text/plain": [ 165 | "
" 166 | ] 167 | }, 168 | "metadata": {}, 169 | "output_type": "display_data" 170 | } 171 | ], 172 | "source": [ 173 | "plt.plot(num_list, np.abs(difference), marker='o')\n", 174 | "plt.xscale('log') # Set the x-axis scale to logarithmic\n", 175 | "plt.yscale('log') # Set the y-axis scale to logarithmic\n", 176 | "plt.xlabel('N')\n", 177 | "plt.ylabel('Difference')\n", 178 | "plt.title('Difference vs N')\n", 179 | "plt.grid(True)\n", 180 | "\n", 181 | "# Calculate the coefficients of the line\n", 182 | "coefficients = np.polyfit(np.log10(num_list), np.log10(np.abs(difference)), 1)\n", 183 | "slope = coefficients[0]\n", 184 | "intercept = coefficients[1]\n", 185 | "\n", 186 | "# Create the x and y values for the line\n", 187 | "x_line = np.logspace(np.log10(min(num_list)), np.log10(max(num_list)), 100)\n", 188 | "y_line = 10**(slope * np.log10(x_line) + intercept)\n", 189 | "\n", 190 | "# Plot the line\n", 191 | "plt.plot(x_line, y_line, color='red', label='fit')\n", 192 | "\n", 193 | "plt.legend()\n", 194 | "plt.show()" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 45, 200 | "metadata": {}, 201 | "outputs": [ 202 | { 203 | "name": "stdout", 204 | "output_type": "stream", 205 | "text": [ 206 | "-0.5358777509670519\n" 207 | ] 208 | } 209 | ], 210 | "source": [ 211 | "convergence_rate = slope\n", 212 | "print(convergence_rate)\n" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "The result confirms the anticipate convergence rate of Monte Carlo\n", 220 | "$$ {\\rm error} \\sim O(N^{-1/2})$$" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [] 227 | } 228 | ], 229 | "metadata": { 230 | "kernelspec": { 231 | "display_name": "igwn310", 232 | "language": "python", 233 | "name": "python3" 234 | }, 235 | "language_info": { 236 | "codemirror_mode": { 237 | "name": "ipython", 238 | "version": 3 239 | }, 240 | "file_extension": ".py", 241 | 
"mimetype": "text/x-python", 242 | "name": "python", 243 | "nbconvert_exporter": "python", 244 | "pygments_lexer": "ipython3", 245 | "version": "3.10.10" 246 | } 247 | }, 248 | "nbformat": 4, 249 | "nbformat_minor": 2 250 | } 251 | -------------------------------------------------------------------------------- /poisson-SOR.c: -------------------------------------------------------------------------------- 1 | /***************************************************************************** 2 | * 3 | * Poisson solver for red-black SOR method in OpenMP 4 | * 5 | * Author: Nikolaos Stergioulas, Aristotle University of Thessaloniki 6 | * 7 | * Content provided under a Creative Commons Attribution license, 8 | * CC BY-NC-SA 4.0; code under GNU GPLv3 License. (c)2020 Nikolaos Stergioulas 9 | *****************************************************************************/ 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | 16 | int main(int argc, 17 | char **argv) { 18 | 19 | int i, j, k=0, 20 | shift=0, 21 | *shift0, 22 | *shift1, 23 | N=200, /* default value */ 24 | M=200; /* default value */ 25 | 26 | double 27 | **uold, 28 | **unew, 29 | *x, 30 | *y, 31 | test = 0.0, 32 | oldavg, 33 | newavg, 34 | dx, 35 | dy, 36 | omega=1.9, /* default value */ 37 | accuracy = 1e-4, /* default value */ 38 | fTimeStart, 39 | fTimeEnd; 40 | 41 | // Record start time 42 | 43 | fTimeStart = omp_get_wtime(); 44 | 45 | // Read grid size, accuracy and omega from command line 46 | 47 | for(i=1;i accuracy); 180 | 181 | } 182 | 183 | // Free memory 184 | 185 | for (i = 0; i < N; i++){ 186 | free(unew[i] ); 187 | } 188 | free(unew); 189 | 190 | for (i = 0; i < N; i++){ 191 | free(uold[i] ); 192 | } 193 | free(uold); 194 | 195 | free(shift0); 196 | free(shift1); 197 | free(x); 198 | free(y); 199 | 200 | // Record end time 201 | 202 | fTimeEnd = omp_get_wtime(); 203 | 204 | // Print elapsed time 205 | 206 | printf("wall clock time = %.20f\n", fTimeEnd - fTimeStart); 207 | 208 | return 
0; 209 | } 210 | -------------------------------------------------------------------------------- /table-add1-combined.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #define N 12 5 | 6 | int main(void) { 7 | 8 | int i, 9 | A[N]; 10 | 11 | for(i=0;i 2 | #include 3 | 4 | #define N 12 5 | 6 | int main(void) { 7 | 8 | int myid, 9 | nThreads, 10 | i, 11 | iStart, 12 | iEnd, 13 | A[N]; 14 | 15 | for(i=0;i 2 | #include 3 | 4 | #define N 12 5 | 6 | int main(void) { 7 | 8 | int i, 9 | A[N]; 10 | 11 | for(i=0;i 2 | #include 3 | 4 | #define N 12 5 | 6 | int main(void) { 7 | 8 | int i, 9 | A[N]; 10 | 11 | for(i=0;i 2 | #include 3 | 4 | #define N 12 5 | 6 | int main(void) { 7 | 8 | int i, 9 | A[N]; 10 | 11 | for(i=0;i 2 | #include 3 | 4 | #define N 10 5 | 6 | int main(void) { 7 | 8 | int i, 9 | sum, 10 | a[N]; 11 | 12 | for(i=0;i 2 | #include 3 | 4 | #define N 10 5 | 6 | int main(void) { 7 | 8 | int i, 9 | sum, 10 | a[N]; 11 | 12 | for(i=0;i