├── hpc.png └── README.md /hpc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trevor-vincent/awesome-high-performance-computing/HEAD/hpc.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

2 | 3 |

4 |

5 | A curated list of awesome high performance computing resources. 6 |

7 |

8 | 9 | 10 | 11 |

12 | 13 | ## Table of Contents 14 | 15 | - [General Info](#general-info) 16 | - [Software](#software) 17 | - [Hardware](#hardware) 18 | - [People](#people) 19 | - [Resources](#resources) 20 | - [Other Curated Lists](#other-curated-lists) 21 | - [Acknowledgements](#acknowledgements) 22 | 23 | ## General Info 24 | 25 | ### Most Recent List of the Top500 Supercomputers 26 | - [Top500 (Nov. 2025)](https://www.top500.org/lists/top500/2025/11/) 27 | - [HPCG Top500 (Nov. 2025)](https://www.top500.org/lists/hpcg/2025/11/) 28 | - [Green500 (Nov. 2025)](https://www.top500.org/lists/green500/2025/11/) 29 | - [io500](https://io500.org/) 30 | 31 | ### History 32 | - [History of Supercomputing (Wikipedia)](https://en.wikipedia.org/wiki/History_of_supercomputing) 33 | - [History of Parallel Computing (Wikipedia)](https://en.wikipedia.org/wiki/Parallel_computing#History) 34 | - [History of the Top500 (Wikipedia)](https://en.wikipedia.org/wiki/TOP500) 35 | - [History of LLNL Computing](https://computing.llnl.gov/about/machine-history) 36 | - [The Supermen: The Story of Seymour Cray ... 
(1997)](https://www.amazon.ca/Supermen-Seymour-Technical-Wizards-Supercomputer/dp/0471048852/ref=sr_1_1?crid=1IOWC3IOYWPOP&keywords=seymour+cray&qid=1690959561&sprefix=seymour+cray%2Caps%2C88&sr=8-1) 37 | - [Unmatched - 50 Years of Supercomputing (2023)](https://www.routledge.com/Unmatched-50-Years-of-Supercomputing/Barkai/p/book/9780367479619) 38 | 39 | ### Trends 40 | - [Trends in HPC for AI workloads](https://epochai.org/trends) 41 | 42 | ## Software 43 | 44 | #### Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators 45 | - [alpaka](https://github.com/alpaka-group/alpaka) - The alpaka library is a header-only C++17 abstraction library for accelerator development 46 | - [async-rdma](https://github.com/datenlord/async-rdma) - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs 47 | - [CANN](https://developer.huawei.com/consumer/en/doc/hiai-guides/introduction-0000001051486804) - Compute Architecture for Neural Networks for Huawei Ascend GPUs 48 | - [CAF](https://github.com/actor-framework/actor-framework) - An Open Source Implementation of the Actor Model in C++ 49 | - [Chapel](https://chapel-lang.org/) - A Programming Language for Productive Parallel Computing on Large-scale Systems 50 | - [Charm++](http://charm.cs.illinois.edu/research/charm) - Parallel Programming with Migratable Objects 51 | - [Cilk Plus](https://www.cilkplus.org/) - C/C++ Extension for Data and Task Parallelism 52 | - [Codon](https://github.com/exaloop/codon) - high-performance Python compiler that compiles Python code to native machine code without any runtime overhead 53 | - [CUDA](https://developer.nvidia.com/cuda-toolkit) - High performance NVIDIA GPU acceleration 54 | - [dask](https://dask.org) - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love 55 | - [DeepSpeed](https://github.com/microsoft/DeepSpeed) - An easy-to-use deep learning optimization software suite that enables 
unprecedented scale and speed for Deep Learning Training and Inference 56 | - [DeterminedAI](https://www.determined.ai/) - Distributed deep learning 57 | - [Dispenso](https://github.com/facebookincubator/dispenso) - Meta/facebook C++ Task Library 58 | - [FastFlow](https://github.com/fastflow/fastflow) - High-performance Parallel Patterns in C++ 59 | - [Galois](https://github.com/IntelligentSoftwareSystems/Galois) - A C++ Library to Ease Parallel Programming with Irregular Parallelism 60 | - [Halide](https://halide-lang.org/index.html#gettingstarted) - A language for fast, portable computation on images and tensors 61 | - [Heteroflow](https://github.com/Heteroflow/Heteroflow) - Concurrent CPU-GPU Task Programming using Modern C++ 62 | - [highway](https://github.com/google/highway) - Performance portable SIMD intrinsics 63 | - [HIP](https://github.com/ROCm-Developer-Tools/HIP) - HIP is a C++ Runtime API and Kernel Language for AMD/Nvidia GPU 64 | - [HPC-X](https://developer.nvidia.com/networking/hpc-x) - Nvidia implementation of MPI 65 | - [HPX](https://github.com/STEllAR-GROUP/hpx) - A C++ Standard Library for Concurrency and Parallelism 66 | - [Horovod](https://github.com/horovod/horovod) - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet 67 | - [ISPC](https://ispc.github.io/) - An open-source compiler for high-performance SIMD programming on the CPU and GPU 68 | - [Intel ISPC](https://github.com/ispc/ispc) - SPMD compiler 69 | - [Intel TBB](https://www.threadingbuildingblocks.org/) - Threading Building Blocks 70 | - [joblib](https://joblib.readthedocs.io/en/latest/why.html) - Data-flow programming for performance (python) 71 | - [Kompute](https://github.com/KomputeProject/kompute) - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends) 72 | - [Kokkos](https://github.com/kokkos/kokkos) - A C++ Programming Model for Writing Performance Portable Applications on HPC 
platforms 73 | - [Kubeflow MPI Operator](https://github.com/kubeflow/mpi-operator) - MPI Operator for Kubeflow 74 | - [Legate](https://github.com/nv-legate/legate.numpy) - Nvidia replacement for numpy based on Legion 75 | - [Legion](https://github.com/StanfordLegion/legion) - Distributed heterogeneous programming library 76 | - [MAGMA](https://developer.nvidia.com/magma) - Next generation linear algebra (LA) GPU accelerated libraries 77 | - [Merlin](https://merlin.readthedocs.io/en/latest/) - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations 78 | - [Metal](https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu) - Apple's GPU API 79 | - [Microsoft MPI](https://docs.microsoft.com/en-us/message-passing-interface/microsoft-mpi) - Microsoft's implementation of MPI 80 | - [MOGSLib](https://github.com/ECLScheduling/MOGSLib) - User defined schedulers 81 | - [mpi4jax](https://github.com/mpi4jax/mpi4jax) - Zero-copy MPI for JAX arrays 82 | - [mpi4py](https://mpi4py.readthedocs.io/en/stable/) - Python bindings for MPI 83 | - [Open MPI](https://www.open-mpi.org/) - Open MPI implementation of the Message Passing Interface 84 | - [MPICH](https://www.mpich.org/) - MPICH implementation of the Message Passing Interface 85 | - [MPI Standardization Forum](https://www.mpi-forum.org/) - Forum for MPI standardization 86 | - [MVAPICH](https://mvapich.cse.ohio-state.edu/) - Implementation of MPI 87 | - [NCCL](https://developer.nvidia.com/nccl) - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication 88 | - [NVSHMEM](https://developer.nvidia.com/nvshmem) - GPU-accelerated implementation of the OpenSHMEM programming model developed by NVIDIA 89 | - [cuNumeric](https://developer.nvidia.com/cunumeric) - GPU drop-in for numpy 90 | - [stdpar](https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/) - GPU accelerated C++ from NVIDIA 91 | - 
[numba](https://numba.pydata.org/) - A JIT compiler that translates a subset of Python into fast machine code 92 | - [oneAPI](https://www.oneapi.io/) - A unified, multiarchitecture, multi-vendor programming model 93 | - [OpenACC](https://www.openacc.org/) - "OpenMP for GPUs" 94 | - [OpenCilk](https://www.opencilk.org/) - MIT continuation of Cilk Plus 95 | - [OpenMP](https://www.openmp.org/) - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran 96 | - [OpenSHMEM](http://openshmem.org/site/About) - OpenSHMEM is a one-sided, PGAS-based parallel programming model enabling direct remote memory access for high-performance computing 97 | - [PVM](https://www.csm.ornl.gov/pvm/) - Parallel Virtual Machine: A predecessor to MPI for distributed computing 98 | - [PMIX](https://pmix.github.io/standard) - Standard for process management 99 | - [Pollux](https://github.com/polluxio/pollux-payload) - Message Passing Cloud orchestrator 100 | - [Pyfi](https://github.com/radiantone/pyfi) - Distributed flow and computation system 101 | - [Pyper](https://github.com/pyper-dev/pyper) - concurrent python made simple 102 | - [RAJA](https://github.com/LLNL/RAJA) - Architecture and programming model portability for HPC applications 103 | - [RaftLib](https://github.com/RaftLib/RaftLib) - A C++ Library for Enabling Stream and Dataflow Parallel Computation 104 | - [ray](https://www.ray.io/) - Scale AI and Python workloads from reinforcement learning to deep learning 105 | - [ROCM](https://rocmdocs.com/en/latest/) - First open-source software development platform for HPC/Hyperscale-class GPU computing 106 | - [RS MPI](https://rsmpi.github.io/rsmpi/mpi/index.html) - Rust bindings for MPI 107 | - [Scalix](https://github.com/NAGAGroup/Scalix) - Data parallel computing framework 108 | - [Simgrid](https://simgrid.org/) - Simulate cluster/HPC environments 109 | - [SkelCL](https://skelcl.github.io/) - A Skeleton Library for Heterogeneous Systems 110 | - 
[STAPL](https://parasol.tamu.edu/stapl/) - Standard Template Adaptive Parallel Programming Library in C++ 111 | - [STLab](http://stlab.cc/libraries/concurrency/) - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention 112 | - [SYCL](https://www.khronos.org/sycl/) - C++ Abstraction layer for heterogeneous devices 113 | - [Taichi](https://github.com/taichi-dev/taichi) - Parallel programming language for high-performance numerical computations in Python 114 | - [Taskflow](https://github.com/taskflow/taskflow) - A Modern C++ Parallel Task Programming Library 115 | - [The Open Community Runtime](https://wiki.modelado.org/Open_Community_Runtime) - Specification for Asynchronous Many Task systems 116 | - [Transwarp](https://github.com/bloomen/transwarp) - A Header-only C++ Library for Task Concurrency 117 | - [Triton](https://triton-lang.org/main/index.html) - Triton is a language and compiler for parallel programming 118 | - [Tuplex](https://tuplex.cs.brown.edu/) - Blazing fast Python data science 119 | - [UCX](https://github.com/openucx/ucx#using-ucx) - Optimized, production-proven communication framework 120 | - [ZLUDA](https://github.com/vosen/ZLUDA) - Run unmodified CUDA applications with near-native performance on Intel/AMD GPUs. 121 | - [HyperQueue](https://github.com/It4innovations/hyperqueue) - HyperQueue is a tool designed to simplify execution of large workflows (task graphs) on HPC clusters. 122 | 123 | #### Cluster Hardware Discovery Tools 124 | - [cpuid](https://en.wikipedia.org/wiki/CPUID) - A software instruction available on Intel, AMD, and other processors that can be used to determine processor type and features. 125 | - [cpuid instruction note](https://www.scss.tcd.ie/~jones/CS4021/processor-identification-cpuid-instruction-note.pdf) - A detailed note on the CPUID instruction used for processor identification. 126 | - [cpufetch](https://github.com/Dr-Noob/cpufetch) - A simple yet fancy CPU architecture fetching tool. 
127 | - [gpufetch](https://github.com/Dr-Noob/gpufetch) - A tool similar to cpufetch, but for fetching GPU architecture. 128 | - [intel cpuinfo](https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/command-reference/cpuinfo.html) - Intel tool providing information about the characteristics of Intel CPUs. 129 | - [Likwid](https://github.com/RRZE-HPC/likwid) - Provides all information about the supercomputer/cluster. 130 | - [LIKWID.jl](https://juliaperf.github.io/LIKWID.jl/dev/) - Julia wrapper for LIKWID. 131 | - [openmpi hwloc](https://www.open-mpi.org/projects/hwloc/) - Portable Hardware Locality (hwloc) software project. 132 | - [PRK - Parallel Research Kernels](https://github.com/ParRes/Kernels) - A collection of kernels for parallel programming research. 133 | 134 | #### Cluster Management/Tools/Schedulers/Stacks 135 | - [ClusterVisor](https://www.advancedclustering.com/products/software/clustervisor-2/) - Cluster management tool by Advanced Clustering. 136 | - [BeeGFS](http://beegfs.io/docs/whitepapers/Introduction_to_BeeGFS_by_ThinkParQ.pdf) - A parallel file system designed for performance-critical environments. 137 | - [Bluebanquise](https://github.com/bluebanquise/bluebanquise) - An open-source cluster management tool. 138 | - [NVIDIA Base Command Manager (formerly Bright Cluster Manager)](https://docs.nvidia.com/base-command-manager/index.html) - Software for deploying and managing HPC and AI server clusters. 139 | - [Ceph](https://ceph.io/en/) - An open-source distributed storage system. 140 | - [DeepOps](https://github.com/NVIDIA/deepops) - Nvidia's GPU infrastructure and automation tools for Kubernetes and Slurm clusters. 141 | - [E4S - The Extreme Scale HPC Scientific Stack](https://e4s-project.github.io/) - A collection of open-source software packages for HPC environments. 142 | - [Easybuild](https://docs.easybuild.io/en/latest/) - A package manager for HPC/supercomputers. 
143 | - [EESSI](https://www.eessi.io) - A shared stack of scientific software installations. 144 | - [Flux framework](https://flux-framework.org/) - A framework for high-performance computing clusters. 145 | - [fpsync](http://www.fpart.org/fpsync/) - A tool for fast parallel data transfer using fpart and rsync. 146 | - [GPFS](https://en.wikipedia.org/wiki/GPFS) - A high-performance parallel file system developed by IBM. 147 | - [Guix](https://hpc.guix.info/) - A package manager for HPC/supercomputers. 148 | - [Intel DAOS](https://daos.io) - A software-defined scale-out object store for HPC applications. 149 | - [LSF](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsf-batch-jobs-tasks) - A batch system for HPC and distributed computing environments. 150 | - [Lmod](https://lmod.readthedocs.io/en/latest/) - A Lua-based module system for software environment management on HPC systems. 151 | - [Lustre Parallel File System](https://www.lustre.org/) - A high-performance distributed filesystem for large-scale cluster computing. 152 | - [moosefs](https://moosefs.com/) - A fault-tolerant, highly available, distributed file system. 153 | - [NetApp](https://www.netapp.com) - Intelligent data infrastructure for various workloads. 154 | - [OKA](https://oka.how) - Analytics and reporting tool for HPC schedulers: helps administrators and users understand how compute resources are used and optimize their usage. 155 | - [Open Cluster Scheduler](https://github.com/hpc-gridware/clusterscheduler/) - A scalable HPC/AI workload manager based on SGE. 156 | - [OpenHPC](https://openhpc.community/) - A community-led set of HPC components. 157 | - [OpenOnDemand](https://openondemand.org/) - A web portal for accessing supercomputing resources. 158 | - [OpenPBS](https://www.openpbs.org/) - Software for workload management and job scheduling. 159 | - [OpenXdMod](https://open.xdmod.org/7.5/index.html) - A tool for managing high-performance computing resources. 
160 | - [RADIUSS](https://computing.llnl.gov/projects/radiuss) - Rapid Application Development via an Institutional Universal Software Stack. 161 | - [rocks](http://www.rocksclusters.org/) - An open-source Linux cluster distribution. 162 | - [Ruse](https://github.com/JanneM/Ruse) - A command-line utility that periodically measures the resource use of a process and its subprocesses. 163 | - [SGE](http://star.mit.edu/cluster/docs/0.93.3/guides/sge.html) - Resource management software for large clusters of computers. 164 | - [Slurm](https://slurm.schedmd.com/overview.html) - A cluster management and job scheduling system for Linux clusters. 165 | - [Spectrum LSF](https://www.ibm.com/products/hpc-workload-management) - Workload management platform and job scheduler for distributed high performance computing (HPC) 166 | - [Spack](https://spack.io/) - A package manager for HPC/supercomputers. 167 | - [sstack](https://gitlab.com/nmsu_hpc/sstack) - A tool to install multiple software stacks such as Spack, EasyBuild, and Conda. 168 | - [Starfish](https://starfishstorage.com/) - Unstructured data management and metadata solution for files and objects. 169 | - [Warewulf](https://warewulf.lbl.gov/) - An operating system provisioning system and cluster management tool. 170 | - [Velda](https://github.com/velda-io/velda) - A modern cluster manager and job scheduler, with personalizable dev-containers and scale-to-cloud capabilities. 171 | - [xCat](https://xcat.org/) - A distributed computing management and provisioning tool. 172 | - [XDMoD](https://supremm.xdmod.org/10.0/supremm-overview.html) - An open-source tool for managing high-performance computing resources. 173 | - [Globus Connect](https://www.globus.org/globus-connect) - A fast data transfer tool between supercomputers. 174 | - [Slurm Web](https://slurm-web.com/) - Open source web dashboard for Slurm HPC clusters. 
175 | 176 | #### HPC-specific Operating Systems 177 | - [Kitten](https://www.sandia.gov/app/uploads/sites/210/2022/11/pedretti_lanl11.pdf) - A lightweight kernel designed for high-performance computing. It focuses on providing low noise and predictable performance for HPC applications. 178 | - [McKernel](https://github.com/RIKEN-SysSoft/mckernel) - A hybrid kernel that combines Linux and a lightweight kernel designed to provide high performance for HPC applications. 179 | - [mOS](http://cs.iit.edu/~khale/docs/mos.pdf) - A specialized operating system for high-performance computing, designed to support large-scale, manycore processors. 180 | 181 | #### Development/Workflow/Monitoring Tools for HPC 182 | 183 | - [Apache Airflow](https://airflow.apache.org/) - A platform to programmatically author, schedule, and monitor workflows. 184 | - [Apptainer (formerly Singularity)](https://singularity.lbl.gov/) - Container platform designed for scientific and high-performance computing (HPC) environments. 185 | - [arbiter2](https://github.com/CHPC-UofU/arbiter2) - Monitors and protects interactive nodes with cgroups. 186 | - [Charliecloud](https://charliecloud.io/) - Lightweight container solution for high-performance computing (HPC). 187 | - [Docker](https://www.docker.com/) - A set of platform as a service products that use OS-level virtualization to deliver software in packages called containers. 188 | - [genv](https://github.com/run-ai/genv) - GPU Environment Management for managing and scheduling GPU resources. 189 | - [Grafana](https://github.com/grafana/grafana) - Open-source platform for monitoring and observability, visualizing metrics. 190 | - [grpc](https://grpc.io/) - A high-performance, open-source universal RPC framework. 191 | - [HPC Rocket](https://github.com/SvenMarcus/hpc-rocket) - Allows submitting Slurm jobs in Continuous Integration (CI) pipelines. 
192 | - [HTCondor](https://research.cs.wisc.edu/htcondor/) - An open-source high-throughput computing software framework. 193 | - [Jacamar-ci](https://gitlab.com/ecp-ci/jacamar-ci/-/blob/develop/README.md) - CI/CD tool designed for HPC and scientific computing workflows. 194 | - [Kubernetes](https://kubernetes.io/) - An open-source system for automating deployment, scaling, and management of containerized applications. 195 | - [nextflow](https://www.nextflow.io/) - A workflow framework to deploy data-driven computational pipelines. 196 | - [perun](https://github.com/Helmholtz-AI-Energy/perun) - Energy monitor for HPC systems, focusing on performance and energy efficiency. 197 | - [Prefect](https://www.prefect.io/) - A workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine. 198 | - [Prometheus](https://prometheus.io/) - An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach. 199 | - [redun](https://github.com/insitro/redun) - Workflow engine that emphasizes simplicity, reliability, and scalability. 200 | - [remora](https://github.com/TACC/remora) - Tool for monitoring and reporting the performance of batch jobs on HPC systems. 201 | - [ruptime](https://github.com/alexmyczko/ruptime) - A utility for monitoring the status of computational jobs and systems. 202 | - [Slurmvision slurm dashboard](https://github.com/Ruunyox/slurmvision) - A dashboard for monitoring and managing Slurm jobs. 203 | - [slurm docker cluster](https://github.com/giovtorres/slurm-docker-cluster) - A Slurm cluster implemented using Docker containers, for development and testing. 204 | - [snakemake](https://snakemake.readthedocs.io/en/stable/) - A workflow management system that reduces the complexity of creating reproducible and scalable data analyses. 
205 | - [Stui slurm dashboard for the terminal](https://github.com/mil-ad/stui) - A terminal-based UI for managing and monitoring Slurm clusters. 206 | - [Vaex](https://github.com/vaexio/vaex) - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. 207 | 208 | 209 | #### Debugging Tools for HPC 210 | 211 | - [ddt](https://www.arm.com/products/development-tools/server-and-hpc/forge/ddt) - A powerful debugger designed for developers to solve complex problems on multi-threaded and multi-process environments in HPC. 212 | - [marmot MPI checker](https://www.lrz.de/services/software/parallel/marmot/) - A tool for detecting and reporting issues in MPI (Message Passing Interface) applications. 213 | - [python debugging tools](https://wiki.python.org/moin/PythonDebuggingTools) - A collection of tools for debugging Python applications, including pdb and other utilities. 214 | - [seer modern gui for gdb](https://github.com/epasveer/seer) - A graphical user interface for GDB, aiming to improve the debugging experience with modern features and visuals. 215 | - [Summary of C/C++ debugging tools](http://pramodkumbhar.com/2018/06/summary-of-debugging-tools/) - An overview of various debugging tools available for C/C++ applications, focusing on HPC environments. 216 | - [totalview](https://totalview.io/) - A comprehensive source code analysis and debugging tool designed for complex software running on HPC systems, supporting a wide range of languages and architectures. 217 | 218 | 219 | #### Performance/Benchmark Tools for HPC 220 | 221 | - [demonspawn](https://github.com/TACC/demonspawn) - A framework for automated execution of benchmarks and simulations, designed for HPC environments. 222 | - [Google benchmark](https://github.com/google/benchmark) - A microbenchmark support library for C++ that tracks performance over time. 
223 | - [HPL benchmark](https://www.netlib.org/benchmark/hpl/) - The High Performance Linpack Benchmark for measuring floating-point computing power of systems. 224 | - [kerncraft](https://github.com/RRZE-HPC/kerncraft) - A tool for analytical modeling of loop performance and cache behavior on HPC systems. 225 | - [NASA parallel benchmark suite](https://www.nas.nasa.gov/software/npb.html) - A set of benchmarks designed to evaluate the performance of parallel supercomputers. 226 | - [papi](https://icl.utk.edu/papi/) - Provides standard APIs for accessing hardware performance counters available on modern microprocessors. 227 | - [scalasca](https://www.scalasca.org/) - A software tool that supports performance analysis of large-scale parallel applications. 228 | - [scalene](https://github.com/plasma-umass/scalene) - A high-performance, high-precision CPU, GPU, and memory profiler for Python. 229 | - [Summary of code performance analysis tools](https://doku.lrz.de/display/PUBLIC/Performance+and+Code+Analysis+Tools+for+HPC) - An overview of tools for analyzing HPC application performance. 230 | - [Summary of profiling tools](https://pramodkumbhar.com/2017/04/summary-of-profiling-tools/) - A comprehensive list of profiling tools for performance analysis in HPC. 231 | - [tau](https://www.cs.uoregon.edu/research/tau/home.php) - TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs. 232 | - [The Bandwidth Benchmark](https://github.com/RRZE-HPC/TheBandwidthBenchmark/) - A tool for measuring memory bandwidth across various CPUs and systems. 233 | - [vampir](https://vampir.eu/) - A tool for detailed analysis of MPI program executions by visualizing their event traces. 234 | - [bytehound memory profiler](https://github.com/koute/bytehound) - A detailed memory profiler for tracking down memory issues and leaks. 
235 | - [Flamegraphs](https://www.brendangregg.com/flamegraphs.html) - Visualization tool for profiling software, allowing quick identification of performance bottlenecks. 236 | - [fio](https://linux.die.net/man/1/fio) - Flexible I/O tester for benchmarking and stress/hardware verification. 237 | - [IBM Spectrum Scale Key Performance Indicators (KPI)](https://github.com/IBM/SpectrumScale_NETWORK_READINESS) - Provides key performance indicators for IBM Spectrum Scale, aiding in performance tuning and monitoring. 238 | - [IOR](https://github.com/hpc/ior) - A parallel file system I/O benchmarking tool used widely in HPC for testing storage systems. 239 | - [stress-ng](https://github.com/ColinIanKing/stress-ng) - A versatile tool for stressing various subsystems of a computer to find hardware faults or to benchmark performance. 240 | - [Hotspot](https://github.com/KDAB/hotspot/) - The Linux perf GUI for in-depth performance analysis and visualization of software behavior. 241 | - [mixbench](https://github.com/ekondis/mixbench) - A benchmark suite designed to evaluate CPUs and GPUs across different compute and memory operations. 242 | - [pmu-tools (toplev)](https://github.com/andikleen/pmu-tools) - Performance monitoring tools for modern Intel CPUs, offering detailed insights into hardware and application performance. 243 | - [SPEC CPU Benchmark](https://www.spec.org/benchmarks.html) - A benchmark suite designed to provide a comparative measure of compute-intensive performance across the widest practical range of hardware. 244 | - [STREAM Memory Bandwidth Benchmark](https://www.cs.virginia.edu/stream/) - Measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. 245 | - [Intel MPI benchmarks](https://www.intel.com/content/www/us/en/docs/mpi-library/user-guide-benchmarks/2021-2/overview.html) - A set of benchmarks designed to measure the performance and scalability of MPI implementations on Intel architectures. 
246 | - [Ohio state MPI benchmarks](https://mvapich.cse.ohio-state.edu/benchmarks/) - A comprehensive suite of benchmarks for evaluating MPI performance across a variety of message passing patterns and communication protocols. 247 | - [hpctoolkit](http://hpctoolkit.org/man/hpctoolkit.html) - An integrated suite of tools for measurement and analysis of program performance on computers ranging from desktops to supercomputers. 248 | - [core-to-core-latency](https://github.com/nviennot/core-to-core-latency) - A diagnostic tool designed to measure and report the latency between CPU cores, aiding in the optimization of parallel computing tasks. 249 | - [speedscope](https://github.com/jlfwong/speedscope) - An interactive, web-based viewer for performance profiles of software. It supports various formats and provides a flamegraph visualization to identify hot paths efficiently. 250 | - [Differential Flamegraphs](https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html) - A visualization technique developed by Brendan Gregg that highlights differences between performance profiles, making it easier to spot performance regressions or improvements. 251 | - [Hyperfine](https://github.com/sharkdp/hyperfine) - A command-line benchmarking tool that provides a simple and user-friendly means to compare the performance of commands, featuring statistical analysis across multiple runs. 252 | - [Openfoam HPC benchmark](https://develop.openfoam.com/committees/hpc/-/wikis/home) - A benchmarking suite for evaluating the High Performance Computing capabilities of OpenFOAM, an open-source CFD software, under various computational loads. 253 | - [OSU microbenchmarks](https://mvapich.cse.ohio-state.edu/benchmarks/) - A collection of microbenchmarks designed to evaluate the performance of MPI implementations across various communication protocols and message sizes. 
254 | - [fio flexible I/O tester](https://fio.readthedocs.io/) - A versatile tool for I/O workload simulation and benchmarking, capable of testing a wide array of storage and filesystem configurations. 255 | - [vftrace](https://github.com/SX-Aurora/Vftrace) - A tracing tool specifically designed for the NEC SX-Aurora TSUBASA Vector Engine, enabling detailed performance analysis of vectorized code. 256 | - [tinymembench](https://github.com/ssvb/tinymembench) - A simple memory benchmark tool, focusing on benchmarking memory bandwidth and latency with minimal dependencies, suitable for various platforms. 257 | - [Geekbench](https://www.geekbench.com/) - Cross platform benchmarking tool 258 | - [Empirical Roofline Tool (ERT)](https://crd.lbl.gov/divisions/amcr/computer-science-amcr/par/research/roofline/software/ert/) - Create empirical roofline plots, alternative to intel vtune for any machine 259 | - [Roofline Visualizer for ERT](https://crd.lbl.gov/divisions/amcr/computer-science-amcr/par/research/roofline/software/roofline-visualizer/) - Visualizer for ERT 260 | - [Caliper](https://github.com/LLNL/Caliper) - A Performance Analysis Toolbox in a Library 261 | - [KDiskMark](https://github.com/JonMagon/KDiskMark) - Benchmarking Tool For SSD/HDD Drives 262 | - [OpenBenchmarking](https://openbenchmarking.org/) - Open benchmarks on a variety of algorithms and hardware 263 | - [Phoronix Test Suite](https://github.com/phoronix-test-suite/phoronix-test-suite) - Benchmarking suite for Linux 264 | - [Palanteer Python/C++ Profiler](https://github.com/dfeneyrou/palanteer) - Profiler for both Python and C++ 265 | 266 | #### IO/Visualization Tools for HPC 267 | - [ADIOS2](https://github.com/ornladios/ADIOS2) - The Adaptable IO System version 2, designed for flexible and efficient I/O for scientific data, supporting a wide range of HPC simulations. 
268 | - [Amira](https://www.thermofisher.com/ca/en/home/electron-microscopy/products/software-em-3d-vis/amira-software.html) - A powerful, multifaceted 3D software platform for visualizing, manipulating, and understanding Life Science and bio-medical data coming from all types of sources. 269 | - [hdf5](https://www.hdfgroup.org/solutions/hdf5/) - The Hierarchical Data Format version 5 (HDF5), is an open source file format that supports large, complex, heterogeneous data. 270 | - [paraview](https://www.paraview.org/) - An open-source, multi-platform data analysis and visualization application. 271 | - [Scientific Visualization Wiki](https://en.wikipedia.org/wiki/Scientific_visualization) - A comprehensive guide to the field of scientific visualization, detailing techniques, tools, and applications. 272 | - [the yt project](https://yt-project.org/) - An open-source, Python-based package for analyzing and visualizing volumetric data. 273 | - [vedo](https://vedo.embl.es/) - A lightweight and powerful python module for scientific analysis and visualization of 3D objects and point clouds based on VTK. 274 | - [visit](https://wci.llnl.gov/simulation/computer-codes/visit) - An Open Source, interactive, scalable, visualization, animation and analysis tool. 275 | - [WebDataset](https://huggingface.co/docs/hub/datasets-webdataset) - library for writing I/O pipelines for large datasets. 
276 | 277 | #### General Purpose Scientific Computing Libraries for HPC 278 | - [petsc](https://petsc.org/release/) 279 | - [ginkgo](https://ginkgo-project.github.io/) 280 | - [GSL](https://www.gnu.org/software/gsl/) 281 | - [ScaLAPACK](https://netlib.org/scalapack/) 282 | - [rapids.ai](https://rapids.ai) - collection of libraries for executing end-to-end data science pipelines entirely on the GPU 283 | - [trilinos](https://trilinos.github.io/) 284 | - [tnl project](https://tnl-project.org/) 285 | - [RunMat](https://github.com/runmat-org/runmat) - MATLAB-syntax runtime with automatic CPU/GPU execution and fused array math kernels. 286 | 287 | #### Misc. 288 | - [mimalloc memory allocator](https://github.com/microsoft/mimalloc) 289 | - [jemalloc memory allocator](https://github.com/jemalloc/jemalloc) 290 | - [tcmalloc memory allocator](https://github.com/google/tcmalloc) 291 | - [Hoard memory allocator](https://github.com/emeryberger/Hoard) 292 | - [Software utilization at UK National Supercomputing Service, ARCHER2](https://www.archer2.ac.uk/support-access/status.html#software-usage-data) 293 | - [SIMD Info](https://simd.info) 294 | 295 | #### Wikis 296 | - [Comparison of cluster software](https://en.wikipedia.org/wiki/Comparison_of_cluster_software) 297 | - [List of cluster management software](https://en.wikipedia.org/wiki/List_of_cluster_management_software) 298 | 299 | ## Hardware 300 | 301 | ### Interconnects/Topology 302 | 303 | - [Ethernet](https://en.wikipedia.org/wiki/Ethernet) 304 | - [Infiniband](https://en.wikipedia.org/wiki/InfiniBand) 305 | - [Network topologies](https://www.hpcwire.com/2019/07/15/super-connecting-the-supercomputers-innovations-through-network-topologies/) 306 | - [Battle of the infinibands - Omnipath vs Infiniband](https://www.nextplatform.com/2017/11/29/the-battle-of-the-infinibands/) 307 | - [Mellanox infiniband cluster config](https://www.mellanox.com/clusterconfig/) 308 | - [RoCE - RDMA Over Converged 
Ethernet](https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet) 309 | - [Slingshot interconnect](https://www.hpe.com/ca/en/compute/hpc/slingshot-interconnect.html) 310 | - [CXL - Compute Express Link](https://www.computeexpresslink.org/) 311 | - [Infiniband Essentials](https://academy.nvidia.com/en/course/infiniband-essentials/?cm=244) 312 | - [NVLink](https://en.wikipedia.org/wiki/NVLink) 313 | - [List of LAN-based interconnect bit rates](https://en.wikipedia.org/wiki/List_of_interface_bit_rates) 314 | - [List of internet-based interconnect bit rates](https://en.wikipedia.org/wiki/Bandwidth_(computing)#Internet_connection_bandwidths) 315 | 316 | ### CPU 317 | - [Wikichip](https://en.wikichip.org/wiki/WikiChip) 318 | - [Microarchitecture of Intel/AMD CPUs](https://www.agner.org/optimize/microarchitecture.pdf) 319 | - [Apple M1](https://en.wikipedia.org/wiki/Apple_M1) 320 | - [Apple M2](https://en.wikipedia.org/wiki/Apple_M2) 321 | - [Apple M2 Teardown](https://www.ifixit.com/News/62674/m2-macbook-air-teardown-apple-forgot-the-heatsink) 322 | - [Apple M1/M2 AMX](https://github.com/corsix/amx) 323 | - [Apple M3](https://en.wikipedia.org/wiki/Apple_M3) 324 | - [List of Intel processors](https://en.wikipedia.org/wiki/List_of_Intel_processors) 325 | - [List of Intel microarchitectures](https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures) 326 | - [Comparison of Intel processors](https://en.wikipedia.org/wiki/Comparison_of_Intel_processors) 327 | - [Comparison of Apple processors](https://en.wikipedia.org/wiki/Apple-designed_processors) 328 | - [List of AMD processors](https://en.wikipedia.org/wiki/List_of_AMD_processors) 329 | - [List of AMD CPU microarchitectures](https://en.wikipedia.org/wiki/List_of_AMD_CPU_microarchitectures) 330 | - [Comparison of AMD architectures](https://en.wikipedia.org/wiki/Table_of_AMD_processors) 331 | 332 | ### GPU 333 | 334 | - [Inside NVIDIA GPUs: Anatomy of high performance matmul 
kernels](https://www.aleksagordic.com/blog/matmul) 335 | - [GPU Architecture Analysis](https://graphicscodex.courses.nvidia.com/app.html?page=_rn_parallel) 336 | - [A trip through the Graphics Pipeline](https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/) 337 | - [A100 Whitepaper](https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf) 338 | - [MIG](https://www.nvidia.com/en-us/technologies/multi-instance-gpu/) 339 | - [Gentle Intro to GPU Inner Workings](https://vksegfault.github.io/posts/gentle-intro-gpu-inner-workings/) 340 | - [AMD Instinct GPUs](https://en.wikipedia.org/wiki/AMD_Instinct_accelerators) 341 | - [AMD GPU ROCm Support and OS Compatibility](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) 342 | - [List of AMD GPUs](https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units) 343 | - [Comparison of CUDA architectures](https://en.wikipedia.org/wiki/CUDA) 344 | - [Tales of the M1 GPU](https://asahilinux.org/2022/11/tales-of-the-m1-gpu/) 345 | - [List of Intel GPUs](https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units) 346 | - [Performance of DGX Cluster](https://www.computer.org/csdl/proceedings-article/cloudcom/2022/636700a170/1JNqFu7QdTG) 347 | - [CUDA Ontology](https://jamesakl.com/posts/cuda-ontology/) 348 | 349 | ### TPU/Tensor Cores 350 | 351 | - [Google TPU](https://thechipletter.substack.com/p/googles-first-tpu-architecture) 352 | - [TPU Wiki](https://en.wikipedia.org/wiki/Tensor_Processing_Unit) 353 | - [NVIDIA Tensor Cores](https://www.nvidia.com/en-us/data-center/tensor-cores/) 354 | 355 | ### Many integrated core processor (MIC) 356 | 357 | - [Xeon Phi](https://en.wikipedia.org/wiki/Xeon_Phi) 358 | 359 | ### Cloud 360 | 361 | - [Awesome Cloud HPC](https://github.com/kjrstory/awesome-cloud-hpc) 362 | 363 | #### Vendors 364 | 365 | - [Official NVIDIA 
Vendors](https://marketplace.nvidia.com/en-us/enterprise/cloud-solutions/?limit=15) 366 | - [AWS HPC](https://aws.amazon.com/hpc/) 367 | - [Azure HPC](https://azure.microsoft.com/en-us/solutions/high-performance-computing/#intro) 368 | - [rescale](https://rescale.com/) 369 | - [vast.ai](https://vast.ai/) 370 | - [vultr - cheap bare metal CPU, GPU, DGX servers](https://vultr.com) 371 | - [hetzner - cheap servers incl. 80-core ARM](https://www.hetzner.com/) 372 | - [Ampere ARM cloud-native processors](https://amperecomputing.com/) 373 | - [Scaleway](https://www.scaleway.com/en/) 374 | - [Chameleon Cloud](https://www.chameleoncloud.org/) 375 | - [Lambda Labs](https://lambdalabs.com/) 376 | - [Runpod](https://www.runpod.io/) 377 | 378 | #### Articles/Papers 379 | - [The use of Microsoft Azure for high performance cloud computing – A case study](https://www.diva-portal.org/smash/get/diva2:1704798/FULLTEXT01.pdf) 380 | - [AWS Cluster in the cloud](https://cluster-in-the-cloud.readthedocs.io/en/latest/aws-infrastructure.html) 381 | - [AWS Parallel Cluster](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials-running-your-first-job-on-version-3.html) 382 | - [AWS HPC Workshop](https://www.hpcworkshops.com/) 383 | - [An Empirical Study of Containerized MPI and GUI Application on HPC in the Cloud](https://ieeexplore.ieee.org/abstract/document/10046607) 384 | 385 | ### Custom/FPGA/ASIC/APU 386 | 387 | - [OpenPiton](http://parallel.princeton.edu/openpiton/) 388 | - [Parallella](https://www.parallella.org/) 389 | - [AMD APU](https://en.wikipedia.org/wiki/AMD_Accelerated_Processing_Unit) 390 | 391 | ### Certification 392 | 393 | - [Intel Cluster Ready](https://en.wikipedia.org/wiki/Intel_Cluster_Ready) 394 | 395 | ### Student Opportunities / Workshops 396 | 397 | - [Supercomputing Conference Student Opportunities](https://sc21.supercomputing.org/program/studentssc/) 398 | - [SCC Student cluster competition](https://www.studentclustercompetition.us/) 399 | - [Winter Classic 
Invitational](https://www.winterclassicinvitational.com/) 400 | - [Linux Cluster Institute](https://linuxclustersinstitute.org/) 401 | 402 | ### Other/Wikis 403 | 404 | - [Supercomputer](https://en.wikipedia.org/wiki/Supercomputer) 405 | - [Supercomputer architecture](https://en.wikipedia.org/wiki/Supercomputer_architecture) 406 | - [Beowulf cluster](https://en.wikipedia.org/wiki/Beowulf_cluster) 407 | - [Computer cluster](https://en.wikipedia.org/wiki/Computer_cluster) 408 | - [Comparison of Intel processors](https://en.wikipedia.org/wiki/Comparison_of_Intel_processors) 409 | - [Comparison of Apple processors](https://en.wikipedia.org/wiki/Apple-designed_processors) 410 | - [Comparison of AMD architectures](https://en.wikipedia.org/wiki/Table_of_AMD_processors) 411 | - [Comparison of CUDA architectures](https://en.wikipedia.org/wiki/CUDA) 412 | - [Cache](https://en.wikipedia.org/wiki/Cache_(computing)) 413 | - [Google TPU](https://en.wikipedia.org/wiki/Tensor_Processing_Unit) 414 | - [IPMI](https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface) 415 | - [FRU](https://en.wikipedia.org/wiki/Field-replaceable_unit) 416 | - [Disk Arrays](https://en.wikipedia.org/wiki/Disk_array) 417 | - [RAID](https://en.wikipedia.org/wiki/RAID) 418 | - [Cray](https://en.wikipedia.org/wiki/Cray) 419 | - [Digital Signal Processors](https://en.wikipedia.org/wiki/Digital_signal_processor) 420 | - [Vector Processor](https://en.wikipedia.org/wiki/Vector_processor) 421 | 422 | ## People 423 | 424 | - [Jack Dongarra - 2021 Turing Award - LINPACK, BLAS, LAPACK, MPI](https://www.nature.com/articles/s43588-022-00245-w) 425 | - [Bill Gropp - 2010 IEEE TCSC Medal for Excellence in Scalable Computing](https://en.wikipedia.org/wiki/Bill_Gropp) 426 | - [David Bader - built the first Linux supercomputer](https://en.wikipedia.org/wiki/David_Bader_(computer_scientist)) 427 | - [Thomas Sterling - "Father of Beowulf clusters", 
ParalleX/HPX](https://en.wikipedia.org/wiki/Thomas_Sterling_(computing)) 428 | - [Seymour Cray - Inventor of the Cray Supercomputer](https://en.wikipedia.org/wiki/Seymour_Cray) 429 | - [Larry Smarr - HPC Application Pioneer](https://en.wikipedia.org/wiki/Larry_Smarr) 430 | - [Donald Becker - Beowulf cluster software, Gordon Bell Prize Winner](https://en.wikipedia.org/wiki/Donald_Becker) 431 | - [HPCWire Class of 2025](https://www.hpcwire.com/35-hpc-legends/) 432 | - [HPCWire Class of 2024](https://www.hpcwire.com/35-hpc-legends-class-of-2024/) 433 | 434 | ## Resources 435 | 436 | #### Books/Manuals 437 | - [Free Modern HPC Books by Victor Eijkhout](https://theartofhpc.com/) 438 | - [High Performance Parallel Runtimes](https://www.amazon.com/High-Performance-Parallel-Runtimes-Implementation-ebook/dp/B08WH82KF9/ref=sr_1_1?keywords=High+Performance+Parallel+Runtimes&qid=1689287759&sr=8-1) 439 | - [The OpenMP Common Core: Making OpenMP Simple Again](https://www.amazon.com/OpenMP-Common-Core-Engineering-Computation/dp/0262538865/ref=d_pd_sbs_sccl_2_1/130-5660046-7109016?pd_rd_w=Cqnxw&content-id=amzn1.sym.3676f086-9496-4fd7-8490-77cf7f43f846&pf_rd_p=3676f086-9496-4fd7-8490-77cf7f43f846&pf_rd_r=HG04QQS87WDHAGV578EE&pd_rd_wg=u0csS&pd_rd_r=8a6a0024-5dec-4934-8fa5-99e24d9fc4bd&pd_rd_i=0262538865&psc=1) 440 | - [Parallel and High Performance Computing](https://www.manning.com/books/parallel-and-high-performance-computing) 441 | - [Algorithms for Modern Hardware](https://en.algorithmica.org/hpc/) 442 | - [High Performance Computing: Modern Systems and Practices](https://www.amazon.ca/High-Performance-Computing-Systems-Practices/dp/012420158X) - Thomas Sterling, Maciej Brodowicz, Matthew Anderson 2017 443 | - [Introduction to High Performance Computing for Scientists and 
Engineers](https://www.amazon.ca/Introduction-Performance-Computing-Scientists-Engineers/dp/143981192X/ref=sr_1_1?crid=1L276HPEB8K7I&keywords=Introduction+to+High+Performance+Computing+for+Scientists+and+Engineers&qid=1645137608&s=books&sprefix=introduction+to+high+performance+computing+for+scientists+and+engineers%2Cstripbooks%2C46&sr=1-1) - Hager 2010 444 | - [Computer Organization and Design](https://www.amazon.ca/Computer-Organization-Design-RISC-V-Interface/dp/0128203315/ref=sr_1_1?crid=1XLX1HWLGRVO6&keywords=Computer+Organization+and+Design&qid=1645137443&s=books&sprefix=computer+organization+and+design%2Cstripbooks%2C48&sr=1-1) 445 | - [Optimizing HPC Applications with Intel Cluster Tools: Hunting Petaflops](C+Applications+with+Intel+Cluster+Tools&qid=1645137507&s=books&sprefix=optimizing+hpc+applications+with+intel+cluster+tools%2Cstripbooks%2C80&sr=1-1) 446 | - [Introduction to High Performance Scientific Computing](https://web.corral.tacc.utexas.edu/CompEdu/pdf/stc/EijkhoutIntroToHPC.pdf) - Victor Eijkhout 2021 447 | - [Parallel Programming for Science and Engineering](https://web.corral.tacc.utexas.edu/CompEdu/pdf/pcse/EijkhoutParallelProgramming.pdf) - Victor Eijkhout 2021 448 | - [Parallel Programming for Science and Engineering - HTML Version](https://pages.tacc.utexas.edu/~eijkhout/pcse/html/) 449 | - [C++ High Performance](https://www.amazon.ca/High-Performance-Master-optimizing-functioning/dp/1839216549/ref=sr_1_1?crid=31OVX4VQ6Z84X&keywords=C%2B%2B+high+performance&qid=1640671313&sprefix=c%2B%2B+high+performance%2Caps%2C99&sr=8-1) 450 | - [Data Parallel C++ Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL](https://www.apress.com/gp/book/9781484255735) 451 | - [High Performance Python](https://www.amazon.ca/High-Performance-Python-Performant-Programming/dp/1449361595) 452 | - [C++ Concurrency in Action: Practical Multithreading](https://www.manning.com/books/c-plus-plus-concurrency-in-action) - Anthony Williams 2012 453 | 
- [The Art of Multiprocessor Programming](https://www.amazon.com/Art-Multiprocessor-Programming-Revised-Reprint/dp/0123973376/ref=sr_1_1?ie=UTF8&qid=1438003865&sr=8-1&keywords=maurice+herlihy) - Maurice Herlihy 2012 454 | - [Parallel Computing: Theory and Practice](http://www.cs.cmu.edu/afs/cs/academic/class/15210-f15/www/tapp.html#ch:work-stealing) - Umut A. Acar 2016 455 | - [Introduction to Parallel Computing](https://www.amazon.ca/Introduction-Parallel-Computing-Zbigniew-Czech/dp/1107174392/ref=sr_1_7?dchild=1&keywords=parallel+computing&qid=1625711415&sr=8-7) - Zbigniew J. Czech 456 | - [Practical guide to bare metal C++](https://arobenko.github.io/bare_metal_cpp/) 457 | - [Optimizing software in C++](https://www.agner.org/optimize/optimizing_cpp.pdf) 458 | - [Optimizing subroutines in assembly code](https://www.agner.org/optimize/optimizing_assembly.pdf) 459 | - [Microarchitecture of Intel/AMD CPUs](https://www.agner.org/optimize/microarchitecture.pdf) 460 | - [Parallel Programming with MPI](https://www.cs.usfca.edu/~peter/ppmpi/) 461 | - [HPC, Big Data, AI Convergence Towards Exascale: Challenge and Vision](https://www.taylorfrancis.com/books/edit/10.1201/9781003176664/hpc-big-data-ai-convergence-towards-exascale-olivier-terzo-jan-martinovi%C4%8D?refId=2cd8b0ad-d63d-42fa-9c3e-fe47fbbe0e29&context=ubx) 462 | - [Introduction to parallel computing](https://www.amazon.com/Introduction-Parallel-Computing-Ananth-Grama/dp/0201648652/ref=sr_1_1?crid=LE1VD245VDX5&keywords=Ananth+Grama+-+Introduction+to+parallel+computing&qid=1644907263&sprefix=ananth+grama+-+introduction+to+parallel+computing%2Caps%2C43&sr=8-1) - Ananth Grama 463 | - [The Student Supercomputer Challenge Guide](https://www.amazon.ca/Student-Supercomputer-Challenge-Guide-Supercomputing/dp/9811338310/ref=sr_1_1?crid=2J5374I76RP2Y&keywords=The+student+supercomputer+challenge&qid=1657060946&sprefix=the+student+supercomputer+challenge%2Caps%2C53&sr=8-1) 464 | - [The Rust Performance 
Book](https://nnethercote.github.io/perf-book/introduction.html) 465 | - [E-Zines on Bash, Linux, Perf, etc - Julia Evans](https://wizardzines.com/) 466 | - [The Art of Writing Efficient Programs: An Advanced Programmer's Guide to Efficient Hardware Utilization and Compiler Optimizations Using C++ Examples](https://www.amazon.ca/Art-Writing-Efficient-Programs-optimizations/dp/1800208111) 467 | - [OpenMP Examples - openmp.org](https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf) 468 | - [Latest books on OpenMP - openmp.org](https://www.openmp.org/resources/openmp-books/) 469 | - [Programming Massively Parallel Processors 4th Edition 2023](https://www.amazon.ca/Programming-Massively-Parallel-Processors-Hands/dp/0128119861/ref=sr_1_1?crid=18EW0LVO2VFMC&keywords=Programming+Massively+Parallel+Processors+4th+Edition+2023&qid=1695110729&s=books&sprefix=programming+massively+parallel+processors+4th+edition+2023%2Cstripbooks%2C88&sr=1-1) 470 | - [Software Optimization Cookbook](https://www.amazon.ca/Software-Optimization-Cookbook-Performance-Platforms/dp/0976483211) 471 | - [Power and Performance: Software Analysis and Optimization](https://www.amazon.ca/Power-Performance-Software-Analysis-Optimization-ebook/dp/B00WZ1AX6S/ref=sr_1_1?crid=22HMPRFCYAXC0&keywords=Power+and+Performance_+Software+Analysis+and+Optimization&qid=1695111518&s=books&sprefix=power+and+performance_+software+analysis+and+optimization%2Cstripbooks%2C85&sr=1-1) 472 | - [Gropp books on MPI](https://wgropp.cs.illinois.edu/usingmpiweb/) 473 | - [Performance Analysis and Tuning on Modern CPUs](https://book.easyperf.net/perf_book) 474 | - [High Performance Computing in Biomimetics Modeling, Architecture and Applications](https://link.springer.com/book/10.1007/978-981-97-1017-1) 475 | - [Systems Performance - Brendan Gregg](https://www.amazon.com/Systems-Performance-Brendan-Gregg/dp/0136820158) 476 | - [Is Parallel Programming Hard, And, If So, What Can You Do About It? - Paul E. 
McKenney](https://cdn.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html) 477 | - [The Little Book of Semaphores](https://greenteapress.com/wp/semaphores/) 478 | - [Building Clustered Linux Systems](https://www.amazon.com/Building-Clustered-Linux-Systems-Robert/dp/0131448536) 479 | - [Computer Architecture: A Quantitative Approach](https://www.amazon.com/Computer-Architecture-Quantitative-John-Hennessy/dp/012383872X) 480 | - [Supercomputers for Linux SysAdmins](https://link.springer.com/book/10.1007/979-8-8688-1600-0) 481 | - [Raspberry Pi Supercomputing and Scientific Programming](https://link.springer.com/book/10.1007/978-1-4842-2878-4) 482 | - [MPI with Python (free)](https://cloudmesh.github.io/cloudmesh-mpi/report-mpi.pdf) 483 | 484 | #### Courses 485 | - [HPC Carpentry](https://www.hpc-carpentry.org/) 486 | - [Berkeley: Applications of Parallel Computers](https://sites.google.com/lbl.gov/cs267-spr2019/) - Detailed course on HPC 487 | - [CS6290 High-performance Computer Architecture](https://www.udacity.com/course/high-performance-computer-architecture--ud007) - Milos Prvulovic and Catherine Gamboa at Georgia Tech 488 | - [Udacity High Performance Computing](https://www..com/playlist?list=PLAwxTw4SYaPk8NaXIiFQXWK6VPnrtMRXC) 489 | - [Parallel Numerical Algorithms](https://solomonik.cs.illinois.edu/teaching/cs554/index.html) 490 | - [Vanderbilt - Intro to HPC](https://github.com/vanderbiltscl/SC3260_HPC) 491 | - [Illinois - Intro to HPC](https://andreask.cs.illinois.edu/Teaching/HPCFall2012/) - Taught by the creator of PyCUDA 492 | - [Archer1 Courses](http://www.archer.ac.uk/training/past_courses.php) 493 | - [TACC tutorials](https://portal.tacc.utexas.edu/tutorials) 494 | - [Livermore training materials](https://hpc.llnl.gov/training/tutorials) 495 | - [Xsede training materials](https://www.hpc-training.org/xsede/moodle/) 496 | - [Parallel Computation Math](https://www.cct.lsu.edu/~pdiehl/teaching/2021/4997/) 497 | - [Introduction to High-Performance and 
Parallel Computing - Coursera](https://www.coursera.org/learn/introduction-high-performance-computing) 498 | - [Foundations of HPC 2020/2021](https://github.com/Foundations-of-HPC) 499 | - [Principles of Distributed Computing](https://disco.ethz.ch/courses/podc_allstars/) 500 | - [High Performance Visualization](https://www.uni-bremen.de/ag-high-performance-visualization) 501 | - [Temple course on building/maintaining a cluster](https://www.hpc.temple.edu/mhpc/2021/hpc-technology/index.html) 502 | - [Nvidia Deep Learning Course](https://www.nvidia.com/en-us/training/online/) 503 | - [Coursera GPU Programming Specialization](https://www.coursera.org/specializations/gpu-programming) 504 | - [Coursera Fundamentals of Parallelism on Intel Architecture](https://www.coursera.org/learn/parallelism-ia) 505 | - [Coursera Introduction to High Performance Computing](https://www.coursera.org/learn/introduction-high-performance-computing) 506 | - [Archer2 Shared Memory Programming with OpenMP](https://www.archer2.ac.uk/training/courses/210000-openmp-self-service/) 507 | - [Archer2 Message-Passing Programming with MPI](https://www.archer2.ac.uk/training/courses/210000-mpi-self-service/) 508 | - [HetSys 2022 Course](https://www.youtube.com/playlist?list=PL5Q2soXY2Zi9XrgXR38IM_FTjmY6h7Gzm) 509 | - [Edukamu Introduction to Supercomputing](https://edukamu.fi/elements-of-supercomputing) 510 | - [Heterogeneous Parallel Programming by S K](https://www.youtube.com/channel/UCbD5dhBi6DBSvCTgEDFz7uA/videos) 511 | - [NCSA HPC Training Moodle](https://www.hpc-training.org/xsede/moodle/) 512 | - [Supercomputing in plain english](http://www.oscer.ou.edu/education.php) 513 | - [Cornell workshop](https://cvw.cac.cornell.edu/topics) 514 | - [Carpentries Incubator HPC Intro](https://carpentries-incubator.github.io/hpc-intro/) 515 | - [UL HPC School](https://ulhpc-tutorials.readthedocs.io/en/latest/hpc-school/) 516 | - [Introduction to High-Performance Parallel Distributed Computing using Chapel, 
UPC++ and Coarray Fortran](https://bitbucket.org/berkeleylab/upcxx/wiki/events/CUF23) 517 | - [Performance Engineering of Software Systems (MIT-OCW)](https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/video_galleries/lecture-videos/) 518 | - [Introduction to Parallel Computing (CMSC 498X/818X)](https://www.cs.umd.edu/class/fall2020/cmsc498x/lectures.shtml) 519 | - [Infiniband Essentials](https://academy.nvidia.com/en/course/infiniband-essentials/?cm=244) 520 | - [Performance Ninja Optimization Course](https://github.com/dendibakh/perf-ninja) 521 | - [HPC Administration Virtual Residency 2024](https://www.youtube.com/@VirtualResidency2024/videos) 522 | - [Programming Parallel Computers](https://ppc-exercises.cs.aalto.fi/courses) 523 | - [High Performance Machine Learning - Columbia University](https://www.cs.columbia.edu/~aa4870/high-performance-machine-learning/) 524 | - [HPC.NRW: Various tutorials about Linux, GPU programming, performance tools, and research data management](https://www.youtube.com/@HPCNRW/playlists) 525 | - [Introduction to HPC - IIT Bombay](https://www.youtube.com/playlist?list=PLOzRYVm0a65fSrgx3kroerFJtFjGGIcXm) 526 | - [Aalto University course CS-E4580 Programming Parallel Computers](https://ppc.cs.aalto.fi/) 527 | - [HLRS Stuttgart HPC Courses](https://www.hlrs.de/de/training/uebersicht) 528 | 529 | #### Tutorials/Guides/Articles 530 | ##### General 531 | - [MpiTutorial](https://mpitutorial.com) - A fantastic MPI tutorial 532 | - [Beginners Guide to HPC](http://www.shodor.org/petascale/materials/UPModules/beginnersGuideHPC/) 533 | - [Rookie HPC Guide](https://rookiehpc.github.io/index.html) 534 | - [RedHat High Performance Computing 101](https://www.redhat.com/en/blog/high-performance-computing-101) 535 | - [Parallel Computing Training Tutorials](https://hpc.llnl.gov/training/tutorials) - Lawrence Livermore National Laboratory 536 | - [Foundations of Multithreaded, Parallel, and Distributed 
Programming](https://www.amazon.com/Foundations-Multithreaded-Parallel-Distributed-Programming/dp/B00F4I7HM2/ref=sr_1_2?dchild=1&keywords=Gregory+R.+Andrews+Distributed+Programming&qid=1625766665&s=books&sr=1-2) 537 | - [Building pipelines using Slurm dependencies](https://hpc.nih.gov/docs/job_dependencies.html) 538 | - [Writing Slurm scripts in Python, R, and Bash](https://vsoch.github.io/lessons/sherlock-jobs/) 539 | - [Xsede new user tutorials](https://portal.xsede.org/online-training) 540 | - [Supercomputing in plain english](http://www.oscer.ou.edu/education.php) 541 | - [Improving Performance with SIMD intrinsics](https://stackoverflow.blog/2020/07/08/improving-performance-with-simd-intrinsics-in-three-use-cases/) 542 | - [Want speed? Pass by value](https://web.archive.org/web/20140205194657/http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/) 543 | - [Introduction to low level bit hacks](https://catonmat.net/low-level-bit-hacks) 544 | - [How to write fast numerical code: An Introduction](https://users.ece.cmu.edu/~franzf/papers/gttse07.pdf) 545 | - [Lecture notes on Loop optimizations](https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/17-loopopt.pdf) 546 | - [A practical approach to code optimization](https://www.einfochips.com/wp-content/uploads/resources/a-practical-approach-to-optimize-code-implementation.pdf) 547 | - [Software optimization manuals](https://www.agner.org/optimize/) 548 | - [Guide into OpenMP: Easy multithreading programming for C++](https://bisqwit.iki.fi/story/howto/openmp/) 549 | - [An Introduction to the Partitioned Global Address Space (PGAS) Programming Model](https://cnx.org/contents/gtg1AzdI@7/An-Introduction-to-the-Partitioned-Global-Address-Space-PGAS-Programming-Model) 550 | - [Jax in 2022](https://www.assemblyai.com/blog/why-you-should-or-shouldnt-be-using-jax-in-2022/) 551 | - [C++ Benchmarking for beginners](https://unum.cloud/post/2022-03-04-gbench/) 552 | - [Mapping MPI ranks to multiple cuda 
GPU](https://github.com/olcf-tutorials/local_mpi_to_gpu) 553 | - [Oak Ridge National Lab Tutorials](https://github.com/olcf-tutorials) 554 | - [How to perform large scale data processing in bioinformatics](https://medium.com/dnanexus/how-to-perform-large-scale-data-processing-in-bioinformatics-4006e8088af2) 555 | - [Step by step SGEMM in OpenCL](https://cnugteren.github.io/tutorial/pages/page1.html) 556 | - [Frontier User Guide](https://docs.olcf.ornl.gov/systems/frontier_user_guide.html) 557 | - [Allocating large blocks of memory in bare-metal C programming](https://lemire.me/blog/2020/01/17/allocating-large-blocks-of-memory-bare-metal-c-speeds/) 558 | - [Hashmap benchmarks 2022](https://martin.ankerl.com/2022/08/27/hashmap-bench-01/) 559 | - [LLNL HPC Tutorials](https://hpc.llnl.gov/documentation/tutorials) 560 | - [The dirty secret of high performance computing](https://www.techradar.com/news/the-dirty-secret-of-high-performance-computing) 561 | - [Multiple GPUs with pytorch](https://www.run.ai/guides/multi-gpu/pytorch-multi-gpu-4-techniques-explained) 562 | - [Brendan Gregg on Linux Performance](https://www.brendangregg.com/linuxperf.html) 563 | - [Automatic Slurm build scripts](https://www.ni-sp.com/slurm-build-script-and-container-commercial-support/#h-automatic-slurm-build-script-for-rh-centos-7-8-and-9) 564 | - [Fastest unordered_map implementation / benchmarks](https://martin.ankerl.com/2022/08/27/hashmap-bench-01/) 565 | - [Memory bandwidth napkin math](https://www.forrestthewoods.com/blog/memory-bandwidth-napkin-math/) 566 | - [Avoiding Instruction Cache Misses](https://paweldziepak.dev/2019/06/21/avoiding-icache-misses/) 567 | - [Multi-GPU Programming with Standard Parallel C++](https://developer.nvidia.com/blog/multi-gpu-programming-with-standard-parallel-c-part-1/) 568 | - [EuroCC National Competence Center Sweden (ENCCS) HPC tutorials](https://enccs.se/lessons/) 569 | - [LLNL hpc tutorials](https://hpc-tutorials.llnl.gov/) 570 | - [python.org Python 
Performance Tips](https://wiki.python.org/moin/PythonSpeed/PerformanceTips) 571 | - [HPC toolset tutorial (cluster management)](https://github.com/ubccr/hpc-toolset-tutorial) 572 | - [OpenMP tutorials](https://www.openmp.org/resources/tutorials-articles/) 573 | - [CUDA best practices guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html) 574 | - [Understanding CPU Architecture And Performance Using LIKWID](https://pramodkumbhar.com/2020/03/architectural-optimisations-using-likwid-profiler/) 575 | - [32 OpenMP Traps For C++ Developers](https://pvs-studio.com/en/blog/posts/cpp/a0054/#ID0EWEAC) 576 | - [Best practices for running jobs on a HPC cluster](https://hpc.dccn.nl/docs/cluster_howto/best_practices.html) 577 | - [Glossary of HPC related terms](https://www.gigabyte.com/Glossary?lan=en) 578 | - [Setting the record straight: What is HPC?](https://www.gigabyte.com/Article/setting-the-record-straight-what-is-hpc-a-tech-guide-by-gigabyte?lan=en) 579 | - [Atomic operations and contention](https://fgiesen.wordpress.com/2014/08/18/atomics-and-contention/) 580 | - [A concurrency cost hierarchy](https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html) 581 | - [hpc-wiki.info - Tutorials and articles for HPC users, developers, administrators and specific HPC systems](https://hpc-wiki.info) 582 | - [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM) 583 | - [An Introduction to Parallel Programming with MPI and Python](https://materials.jeremybejarano.com/MPIwithPython/) 584 | - [Performance Hints](https://abseil.io/fast/hints.html) 585 | 586 | ##### Machine Learning Related 587 | - [Best practices for machine learning with HPC](https://info.gwdg.de/news/en/best-practices-for-machine-learning-with-hpc/) 588 | - [How to pick the right hardware for AI - Gigabyte - Part 1](https://www.gigabyte.com/Article/how-to-pick-the-right-server-for-ai-part-one-cpu-gpu) 589 | - [A 
practitioner's guide to testing and running large GPU clusters for training generative AI models](https://www.together.ai/blog/a-practitioners-guide-to-testing-and-running-large-gpu-clusters-for-training-generative-ai-models) 590 | - [AWS HPC Workshop](https://www.hpcworkshops.com/) 591 | - [Hardware Acceleration of LLMs: A comprehensive survey and comparison](https://news.ycombinator.com/item?id=41470074) 592 | - [The Ultra-Scale Playbook - Training LLMs on GPU Clusters](https://huggingface.co/spaces/nanotron/ultrascale-playbook) 593 | 594 | #### Review Papers/Articles 595 | - [Interactive and Urgent HPC Challenges (2024)](https://arxiv.org/pdf/2401.14550.pdf) 596 | - [The Landscape of Exascale Research: A Data-Driven Literature Analysis (2020)](https://dl.acm.org/doi/pdf/10.1145/3372390) 597 | - [The Landscape of Parallel Computing Research: A View from Berkeley](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf) 598 | - [Extreme Heterogeneity 2018: Productive Computational Science in the Era of Extreme Heterogeneity](references/2018-Extreme-Heterogeneity-DoE.pdf) 599 | - [Programming for Exascale Computers - Will Gropp, Marc Snir](https://snir.cs.illinois.edu/listed/J55.pdf) 600 | - [On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems (2020)](https://www.mcs.anl.gov/research/projects/argo/publications/2020-sbacpad-peng.pdf) 601 | - [Advances in Parallel & Distributed Processing, and Applications (conference proceedings)](https://link.springer.com/book/10.1007/978-3-030-69984-0) 602 | - [Designing Heterogeneous Systems: Large Scale Architectural Exploration Via Simulation](https://ieeexplore.ieee.org/abstract/document/9651152) 603 | - [Reinventing High Performance Computing: Challenges and Opportunities (2022)](https://arxiv.org/pdf/2203.02544.pdf) 604 | - [Challenges in Heterogeneous HPC White Paper (2022)](https://www.etp4hpc.eu/pujades/files/ETP4HPC_WP_Heterogeneous-HPC_20220216.pdf) 605 | - [An Evolutionary Technical 
& Conceptual Review on High Performance Computing Systems (Dec 2021)](https://kalaharijournals.com/resources/DEC_597.pdf) 606 | - [New Horizons for High-Performance Computing (2022)](https://csdl-downloads.ieeecomputer.org/mags/co/2022/12/09963771.pdf?Expires=1669702667&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jc2RsLWRvd25sb2Fkcy5pZWVlY29tcHV0ZXIub3JnL21hZ3MvY28vMjAyMi8xMi8wOTk2Mzc3MS5wZGYiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2Njk3MDI2Njd9fX1dfQ__&Signature=s3K~-JXyED6vMVT9IKGj7LOhR75CrkQXiqAEsAEQt4zRqTbUFywmSoT10th1CdAaZcfZFuMsg23o2e719FRkCD6flVNB55d5tKyMUp7jUbkUtxnatOWLAKXfE4yQ-zrYQQEWBhtpSLKrTAS1oVmJ00YwkWqLYqCjhFIjW9La5od2SGQZEFZ136bbaGzxLZlED3JlMCMLB54YXKr-Ng1rngV4I9Wi-wSTFyLiA92~fUlk1KPQKU0XjtsMyYMYlt06Ze5H6jcQw4ytJ6c7r7qNJ43ifnsZepWmBywA8lVy2g3joOvZJtVjl~S91R8EZbiyWlYdWBGrO7pPdO6hH48~NQ__&Key-Pair-Id=K12PMWTCQBDMDT) 607 | - [Confidential High-Performance Computing in the Public Cloud](https://arxiv.org/pdf/2212.02378.pdf) 608 | - [Containerisation for High Performance Computing Systems: Survey and Prospects](https://ieeexplore.ieee.org/abstract/document/9985426) 609 | - [Heterogeneous Computing Systems (2023)](https://arxiv.org/pdf/2212.14418.pdf) 610 | - [Myths and Legends in High-Performance Computing](https://arxiv.org/pdf/2301.02432.pdf) 611 | - [Energy-Aware Scheduling for High-Performance Computing Systems: A Survey](https://www.mdpi.com/1996-1073/16/2/890) 612 | - [Ultimate Physical limits to computation - Seth Lloyd](https://arxiv.org/abs/quant-ph/9908043) 613 | - [Myths and Legends in High-Performance Computing](https://arxiv.org/abs/2301.02432) 614 | - [Abstract Machine Models and Proxy Architectures for Exascale Computing, 2014, Sandia National Laboratories and Lawrence Berkeley National Laboratory](https://www.osti.gov/servlets/purl/1561498) 615 | - [Some thoughts on the environmental impact of High Performance Computing](https://sifflez.org/publications/environment-hpc/) 616 | - [A Research Retrospective 
on AMD's Exascale Computing Journey](https://dl.acm.org/doi/abs/10.1145/3579371.3589349)

#### News
- [InsideHPC](https://insidehpc.com/)
- [HPCWire](https://www.hpcwire.com/)
- [NextPlatform](https://www.nextplatform.com)
- [Datacenter Dynamics](https://www.datacenterdynamics.com/en/)
- [Admin Magazine HPC](https://www.admin-magazine.com/HPC/News)
- [Tom's Hardware](https://www.tomshardware.com/)
- [Tech Radar](https://www.techradar.com/)
- [Phoronix](https://www.phoronix.com/)
- [The Register](https://www.theregister.com/on_prem/hpc/)

#### Podcasts
- [This Week in HPC](https://soundcloud.com/this-week-in-hpc)
- [Preparing Applications for Aurora in the Exascale Era](https://connectedsocialmedia.com/20114/preparing-applications-for-aurora-in-the-exascale-era/)
- [Slurm podcast](https://www.rce-cast.com/index.php/Podcast/rce-10-slurm.html)
- [HPCPodcast](https://insidehpc.com/category/resources/hpc-podcast/)
- [Developer Stories - The path to a career in high performance computing is not always equitable or clear.](https://rseng.github.io/devstories/2024/jay-lofstead/)
- [Developer Stories - HPCToolkit](https://rseng.github.io/devstories/2024/wileam-phan/)

#### Video Presentations/Courses/Channels
- [Argonne lectures on Extreme Scale Computing 2022](https://www.youtube.com/playlist?list=PLcbxjEfgjpO9OeDu--H9_XqyxPj3MkjdN)
- [Argonne supercomputer tour](https://www.youtube.com/watch?v=UT9HCgp2X3A)
- [Containers in HPC - what they fix and what they break](https://youtube.com/watch?v=WQTrA4-9ZXk&feature=share)
- [HPC Tech Shorts](https://www.youtube.com/channel/UChSIn5kcWQvJxW17KIjdLVw)
- [CppCon](https://www.youtube.com/user/CppCon/videos)
- [Create a clustering server](https://www.youtube.com/watch?v=4LyL4sNZ1u4)
- [Argonne National Lab](https://www.youtube.com/channel/UCfwgjtIQB3puojz_N9ly_Ag)
- [Oak Ridge National Lab](https://www.youtube.com/user/OakRidgeNationalLab)
- [Concurrency in C++20 and Beyond](https://www.youtube.com/watch?v=jozHW_B3D4U) - A. Williams
- [Is Parallel Programming still Hard?](https://www.youtube.com/watch?v=YM8Xy6oKVQg) - P. McKenney, M. Michael, and M. Wong at CppCon 2017
- [The Speed of Concurrency: Is Lock-free Faster?](https://www.youtube.com/watch?v=9hJkWwHDDxs) - Fedor G Pikus at CppCon 2016
- [Expressing Parallelism in C++ with Threading Building Blocks](https://www.youtube.com/watch?v=9Otq_fcUnPE) - Mike Voss at Intel Webinar 2018
- [A Work-stealing Runtime for Rust](https://www.youtube.com/watch?v=4DQakkJ8XLI) - Aaron Todd at Air Mozilla 2017
- [C++11/14/17 atomics and memory model: Before the story consumes you](https://www.youtube.com/watch?v=DS2m7T6NKZQ) - Michael Wong at CppCon 2015
- [The C++ Memory Model](https://www.youtube.com/watch?v=gpsz8sc6mNU) - Valentin Ziegler at C++ Meeting 2014
- [Sharcnet HPC](https://www.youtube.com/channel/UCCRmb5_GMWT2hSlALHlwIMg)
- [Low Latency C++ for fun and profit](https://www.youtube.com/watch?v=BxfT9fiUsZ4)
- [Scalene Python profiler](https://youtu.be/5iEf-_7mM1k)
- [Kokkos lectures](https://www.youtube.com/watch?v=rUIcWtFU5qM&t=698s)
- [EasyBuild Tech Talk I - The ABCs of Open MPI, part 1 (by Jeff Squyres & Ralph Castain)](https://www.youtube.com/watch?v=WpVbcYnFJmQ)
- [The Spack 2022 Roadmap](https://www.youtube.com/watch?v=HyA7RpjoY1k)
- [A Not So Simple Matter of Software | Talk by Turing Award Winner Prof.
Jack Dongarra](https://youtu.be/QBCX3Oxp3vw)
- [Vectorization/SIMD intrinsics](https://www.youtube.com/watch?v=x9Scb5Mku1g)
- [New Silicon for Supercomputers: A Guide for Software Engineers](https://www.youtube.com/watch?v=w3xNLj6nRgs&t=197s)
- [TechTechPotato Channel](TechTechPotato)
- [How to write the perfect hash table](https://www.youtube.com/watch?v=DMQ_HcNSOAI)
- [FOSDEM 2024 HPC Big Data Conference videos](https://fosdem.org/2024/schedule/track/hpc-big-data-data-science/)
- [Bright Computing Cluster Management Technical Overview](https://www.youtube.com/watch?v=0AxzcZuviW0)
- [What is HPC? An introduction by Canonical](https://www.youtube.com/watch?v=tGIobcyKViI)
- [Slurm job scheduler basics](https://www.youtube.com/watch?v=Juo_mb3otJ0)
- [Warewulf HPC YouTube Channel](https://www.youtube.com/@WarewulfHPC)
- [Scott Meyers: CPU Caches and Why You Care](https://www.youtube.com/watch?v=WDIkqP4JbkE&t=1s)

#### Presentation Slides
- [Task based Parallelism and why it's awesome](https://www.fz-juelich.de/ias/jsc/EN/Expertise/Workshops/Conferences/CSAM-2015/Programme/lecture7a_gonnet-pdf.pdf?__blob=publicationFile) - Pedro Gonnet
- [Tuning Slurm Scheduling for Optimal Responsiveness and Utilization](https://slurm.schedmd.com/SUG14/sched_tutorial.pdf)
- [Parallel Programming Models Overview (2020)](https://www.researchgate.net/publication/348187154_Parallel_programming_models_overview_2020)
- [Comparative Analysis of Kokkos and SYCL (Jeff Hammond)](https://www.iwocl.org/wp-content/uploads/iwocl-2019-dhpcc-jeff-hammond-a-comparitive-analysis-of-kokkos-and-sycl.pdf)
- [Hybrid OpenMP/MPI Programming](https://www.nersc.gov/assets/Uploads/NUG2013hybridMPIOpenMP2.pdf)
- [Designs, Lessons and Advice from Building Large
Distributed Systems - Jeff Dean (Google)](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
- [Practical Debugging and Performance Engineering](https://orbilu.uni.lu/bitstream/10993/55305/1/Practical_Debugging_and_Performance_Engineering_for_HPC.pdf)

#### Optimization Case Studies
- [Optimizing a Math Expression Parser in Rust](https://rpallas.xyz/math-parser/)
- [Performance Hints of the Week](https://abseil.io/fast/)

#### Building Clusters/Virtual Clusters
- [Resources for learning about HPC networks and storage r/HPC](https://www.reddit.com/r/HPC/comments/17o0q5d/resources_for_learning_about_hpc_networks_and/)
- [Slurm for dummies guide](https://github.com/SergioMEV/slurm-for-dummies)
- [Build a cluster under 50k](https://www.reddit.com/r/HPC/comments/srssrt/build_a_minicluster_under_50000/)
- [Build a Beowulf cluster](https://github.com/darshanmandge/Cluster)
- [Build a Raspberry Pi Cluster](https://www.raspberrypi.com/tutorials/cluster-raspberry-pi-tutorial/)
- [Puget Systems](https://www.pugetsystems.com/)
- [Lambda Systems](https://lambdalabs.com/)
- [Titan Computers](https://www.titancomputers.com)
- [Temple course on building/maintaining a cluster](https://www.hpc.temple.edu/mhpc/2021/hpc-technology/index.html)
- [Detailed Reddit discussion on setting up a small cluster](https://www.reddit.com/r/HPC/comments/xeipt7/setting_up_a_small_hpc_cluster/)
- [Tiny Titan - build a really cool Pi supercomputer](https://github.com/tinytitan)
- [Turing Pi - mini Pi cluster off the shelf](https://turingpi.com/product/turing-pi-2-5/)
- [Building an Intel HPC cluster with OpenHPC](https://cdrdv2-public.intel.com/671501/installguide-openhpc2-centos8-18jul21.pdf)
- [Reddit r/HPC post on building
clusters](https://www.reddit.com/r/HPC/comments/11azmhy/wanting_to_setup_a_cluster/)
- [Build a virtual cluster with PelicanHPC](https://sourceforge.net/projects/pelicanhpc/)
- [Building a High-performance Computing Cluster Using FreeBSD](https://people.freebsd.org/~brooks/papers/bsdcon2003/fbsdcluster/)
- [Supermicro GPU racks](https://www.supermicro.com/en/products/gpu)
- [VirtualOrfeo - Virtual HPC Cluster](https://gitlab.com/area7/datacenter/codes/virtualorfeo)
- [Is there ever a reason to build a Raspberry Pi cluster?](https://www.reddit.com/r/HPC/comments/1bfywk8/is_there_ever_a_reason_to_build_a_raspberry_pi/)
- [Building an NVIDIA Jetson Cluster](https://www.hackster.io/shahizat/how-to-build-nvidia-jetson-hpc-cluster-using-slurm-ed61a7)
- [Building your own HPC using eBay parts](https://www.reddit.com/r/HPC/comments/1m7la7z/building_my_own_hpc_using_ebay_parts_beginner_tips/)
- [Magic Castle - Terraform modules to replicate HPC in cloud](https://github.com/ComputeCanada/magic_castle)

#### Forums
- [r/hpc](https://www.reddit.com/r/HPC/)
- [r/homelab](https://www.reddit.com/r/homelab/)
- [r/slurm](https://www.reddit.com/r/SLURM/)

#### Careers/Jobs
- [HPC University Careers search](http://hpcuniversity.org/careers/)
- [HPCwire career site](https://careers.hpcwire.com/)
- [HPCwire job postings](https://jobs.hpcwire.com/)
- [HPC certification](https://www.hpc-certification.org/)
- [HPC SysAdmin Jobs (reddit)](https://www.reddit.com/r/HPC/comments/w5eu66/systems_administrator_systems_engineer_jobs/)
- [The United States Research Software Engineer Association](https://us-rse.org/)
- [NCSA Internship](https://wiki.ncsa.illinois.edu/display/NCSACIP/NCSA+Internship+Program+for+CI+Professionals+Home)
- [AI and Future HPC Job Prospects](https://www.reddit.com/r/HPC/comments/12anrgq/hpc_future_career_prospects/)
- [HPC sys admin
career (reddit)](https://www.reddit.com/r/HPC/comments/16jkqlv/it_support_for_an_academic_hpc_cluster_as_a_career/)

#### Membership Clubs
- [Association for Computing Machinery](https://www.acm.org/)
- [ETP4HPC](https://www.etp4hpc.eu/)
- [The SIGHPC Systems Professionals](https://sighpc-syspros.org/)

#### Blogs
- [1024 Cores](http://www.1024cores.net/) - Dmitry Vyukov
- [The Black Art of Concurrency](https://www.internalpointers.com/post-group/black-art-concurrency) - Internal Pointers
- [Cluster Monkey](https://www.clustermonkey.net/)
- [Jonathan Dursi](https://www.dursi.ca/)
- [Arm Vendor HPC blog](https://community.arm.com/developer/tools-software/hpc/b/hpc-blog)
- [HPC Notes](https://www.hpcnotes.com/)
- [Brendan Gregg Performance Blog](https://www.brendangregg.com/blog/index.html)
- [Performance engineering blog](https://pramodkumbhar.com)
- [Concurrency Freaks](https://concurrencyfreaks.blogspot.com/)
- [Servers@Home](https://servers.hydrology.cc/blog/)
- [Dr. Bandwidth Blog](https://sites.utexas.edu/jdm4372/2010/10/01/welcome-to-dr-bandwidths-blog/)
- [Johnny's Software Lab](https://johnnysswlab.com/)
- [Daniel Lemire Blog](https://lemire.me/blog/)
- [Gigabyte HPC Blog](https://www.gigabyte.com/)

#### Journals
- [IEEE Transactions on Parallel and Distributed Systems (TPDS)](https://www.computer.org/csdl/journal/td)
- [Journal of Parallel and Distributed Computing](https://www.journals.elsevier.com/journal-of-parallel-and-distributed-computing)

#### Conferences
- [ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)](https://ppopp19.sigplan.org/home)
- [ACM Symposium on Parallel Algorithms and Architectures (SPAA)](https://spaa.acm.org/)
- [SC conference (SC)](https://supercomputing.org/)
- [IEEE International Parallel and Distributed Processing Symposium
(IPDPS)](http://www.ipdps.org/)
- [International Conference on Parallel Processing (ICPP)](https://www.hpcs.cs.tsukuba.ac.jp/icpp2019/)
- [IEEE High Performance Extreme Computing Conference (HPEC)](https://ieee-hpec.org/cfp.htm)
- [FOSDEM](https://fosdem.org/)
- [Energy HPC Conference](https://www.energyhpc.rice.edu/)

#### Communities/Chat Groups
- [HPC Social Discord server](https://hpc.social/projects/chat/)
- [HPC Social Slack group](https://hpcsocial.slack.com/)
- [HPC Social](https://hpc.social/)
- [Beowulf Mailing List](https://www.beowulf.org/)
- [Society of Research Software Engineering](https://society-rse.org/get-involved/)
- [Women In HPC](https://womeninhpc.org/)
- [HPC Hallway](https://hpc-hallway.github.io/The-Hallway/)
- [The High Performance Computing Special Interest Group](https://hpc-sig.org.uk/)
- [SIGHPC](https://www.sighpc.org/)

#### Twitters
- [Top500](https://twitter.com/top500supercomp?s=20)
- [HPE HPC](https://twitter.com/hpe_hpc)
- [HPC Wire](https://twitter.com/HPCwire)
- [Rookie HPC](https://twitter.com/RookieHPC?s=20)
- [HPC_Guru](https://twitter.com/HPC_Guru?s=20&t=jHjVtUaZhz4s6Rq62IAmYg)
- [Jeff Hammond](https://twitter.com/science_dot)

#### Consulting
- [Advanced Clustering](https://www.advancedclustering.com/)
- [Redline Performance](https://redlineperf.com/)
- [R Systems](http://rsystemsinc.com/)

#### Interview Preparation
- [Reddit Entry Level HPC interview help](https://www.reddit.com/r/HPC/comments/nhpdfb/entrylevel_hpc_job_interview/)
- [Reddit HPC Admin Interview help](https://www.reddit.com/r/HPC/comments/1lq3pjp/need_advice_upcoming_hpc_admin_interview/)

#### Organizations
- [PRACE](https://prace-ri.eu/)
- [XSEDE](https://www.xsede.org/)
- [Compute Canada](https://www.computecanada.ca/)
- [Riken
R-CCS](https://www.riken.jp/en/research/labs/r-ccs/)
- [Pawsey](https://pawsey.org.au/)
- [International Data Corporation](https://www.idc.com/)
- [List of Federally funded research and development centers](https://en.wikipedia.org/wiki/Federally_funded_research_and_development_centers)
- [The HPC.NRW competence network of North-Rhine-Westphalia](https://hpc.dh.nrw/)

#### Interesting r/HPC posts
- [Finding a supercomputer to use for research](https://www.reddit.com/r/HPC/comments/19e58z7/how_do_i_go_about_finding_a_supercomputer_to_use/)

#### Misc. Wikis
- [Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law)
- [HPC Wiki](https://hpc-wiki.info/hpc/HPC_Wiki)
- [FLOPS](https://en.wikipedia.org/wiki/FLOPS)
- [Computational complexity of math operations](https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations)
- [Many Task Computing](https://en.wikipedia.org/wiki/Many-task_computing)
- [High Throughput Computing](https://en.wikipedia.org/wiki/High-throughput_computing)
- [Parallel Virtual Machine](https://en.wikipedia.org/wiki/Parallel_Virtual_Machine)
- [OSI Model](https://en.wikipedia.org/wiki/OSI_model)
- [Workflow management](https://en.wikipedia.org/wiki/Scientific_workflow_system)
- [Compute Canada Documentation](https://docs.computecanada.ca/wiki/Compute_Canada_Documentation)
- [Network Interface Controller (NIC)](https://en.wikipedia.org/wiki/Network_interface_controller)
- [Just-in-time compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation)
- [List of distributed computing projects](https://en.wikipedia.org/wiki/List_of_distributed_computing_projects)
- [Computer cluster](https://en.wikipedia.org/wiki/Computer_cluster)
- [Quasi-opportunistic supercomputing](https://en.wikipedia.org/wiki/Quasi-opportunistic_supercomputing)
- [Limits of
Computation](https://en.wikipedia.org/wiki/Limits_of_computation)
- [Bremermann's Limit](https://en.wikipedia.org/wiki/Bremermann%27s_limit)
- [Concurrency patterns](https://en.wikipedia.org/wiki/Concurrency_pattern)
- [Parallel Computing](https://en.wikipedia.org/wiki/Parallel_computing)
- [Server Management](https://wiki.hydrology.cc/en/home)

#### Misc. Papers/Articles
- [Advanced Parallel Programming in C++](https://www.diehlpk.de/assets/modern_cpp.pdf)
- [Tools for scientific computing](https://arxiv.org/pdf/2108.13053.pdf)
- [Quantum Computing for High Performance Computing](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9537178)
- [Benchmarking data science: Twelve ways to lie with statistics and performance on parallel computers.](http://ww.unixer.de/publications/img/hoefler-12-ways-data-science-preprint.pdf)
- [Establishing the IO500 Benchmark](https://www.vi4io.org/_media/io500/about/io500-establishing.pdf)
- [NVIDIA High Performance Computing articles](https://research.nvidia.com/research-area/high-performance-computing)
- [Let's write a superoptimizer](https://austinhenley.com/blog/superoptimizer.html)
- [Why I think C++ is still a desirable coding platform compared to Rust](https://lucisqr.substack.com/p/why-i-think-c-is-still-a-very-attractive)
- [The State of Fortran (arXiv paper 2022)](https://arxiv.org/abs/2203.15110)
- [50 years later, is two-phase locking still the best?](https://concurrencyfreaks.blogspot.com/2023/09/50-years-later-is-two-phase-locking.html)
- [Estimating your memory bandwidth](https://lemire.me/blog/2024/01/13/estimating-your-memory-bandwidth/)

#### Misc. Repos
- [Build a Beowulf cluster](https://github.com/darshanmandge/Cluster)
- [libsc - Supercomputing library](https://github.com/cburstedde/libsc)
- [xbyak JIT assembler](https://github.com/herumi/xbyak)
- [cpufetch - pretty CPU info fetcher](https://github.com/Dr-Noob/cpufetch)
- [RRZE-HPC](https://github.com/RRZE-HPC)
- [Argonne GitHub](https://github.com/Argonne-National-Laboratory)
- [Argonne Leadership Computing Facility](https://github.com/argonne-lcf)
- [Oak Ridge National Lab GitHub](https://github.com/ORNL)
- [Compute Canada](https://github.com/ComputeCanada)
- [HPCInfo by Jeff Hammond](https://github.com/jeffhammond/HPCInfo)
- [Texas Advanced Computing Center (TACC) GitHub](https://github.com/TACC)
- [LANL HPC GitHub](https://github.com/hpc)
- [Rust in HPC](https://github.com/westernmagic/rust-in-hpc)
- [University at Buffalo - Center for Computational Research](https://github.com/ubccr)
- [Center for High Performance Computing - University of Utah](https://github.com/CHPC-UofU)
- [Top500 Supercomputer Data Analysis](https://github.com/glennklockwood/top500-data)

#### Misc. Theses
- [Rust programming language in the high-performance computing environment](https://www.research-collection.ethz.ch/handle/20.500.11850/474922)

#### Misc.
- [Exascale Project](https://www.exascaleproject.org/)
- [Pocket HPC Survival Guide](https://tin6150.github.io/psg/lsf.html)
- [HPC Summer school](https://www.ihpcss.org/)
- [Overview of all linear algebra packages](http://www.netlib.org/utk/people/JackDongarra/la-sw.html)
- [Latency numbers](http://norvig.com/21-days.html#answers)
- [NVIDIA HPC benchmarks](https://ngc.nvidia.com/catalog/containers/nvidia:hpc-benchmarks)
- [Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#)
- [AWS Cloud calculator](https://calculator.aws/)
- [Quickly benchmark C++ functions](https://quick-bench.com/)
- [LLNL Software repository](https://software.llnl.gov/)
- [BOINC - volunteer computing projects](https://boinc.berkeley.edu/projects.php)
- [PRACE Training Events](https://events.prace-ri.eu/category/2/)
- [Nice discussion on FlameGraph profiling](https://stackoverflow.com/questions/27842281/unknown-events-in-nodejs-v8-flamegraph-using-perf-events/27867426#27867426)
- [Nice discussion on parts of a supercomputer on Reddit](https://www.reddit.com/r/HPC/comments/11elh93/job_node_socket_task_runner_device_thread_logical/)
- [Technical Report on C++ performance](https://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf)
- [BOINC Compute for science](https://boinc.berkeley.edu/)
- [Count prime numbers using MPI](https://people.sc.fsu.edu/~jburkardt/c_src/prime_mpi/prime_mpi.html)
- [How to build your LEGO Scafell Pike Supercomputer](https://www.youtube.com/watch?v=m499o5rLh38)

#### Games/Challenges
- [Deadlock Empire - practice concurrency](https://github.com/deadlockempire/deadlockempire.github.io)
- [Sad Servers - practice Linux server management](https://sadservers.com/scenarios)
- [Vim Adventures](https://vim-adventures.com/)

## Other Curated Lists
- [Awesome Cloud
HPC](https://github.com/kjrstory/awesome-cloud-hpc)
- [Parallel Computing Guide](https://github.com/mikeroyal/Parallel-Computing-Guide)
- [Awesome Parallel Computing](https://github.com/taskflow/awesome-parallel-computing)
- [Princeton resources on OpenMP](https://researchcomputing.princeton.edu/education/external-online-resources/openmp)
- [Awesome HPC](https://github.com/dstdev/awesome-hpc/)
- [Sig HPC Education](https://sighpceducation.acm.org/resources/hpcresources/)
- [Fortran Codes On Github](https://github.com/Beliavsky/Fortran-code-on-GitHub)
- [Fortran Tools](https://github.com/Beliavsky/Fortran-Tools)

## Acknowledgements

This repo started from the great curated list https://github.com/taskflow/awesome-parallel-computing

--------------------------------------------------------------------------------