├── hpc.png
└── README.md
/hpc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/trevor-vincent/awesome-high-performance-computing/HEAD/hpc.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | A curated list of awesome high performance computing resources.
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 | ## Table of Contents
14 |
15 | - [General Info](#general-info)
16 | - [Software](#software)
17 | - [Hardware](#hardware)
18 | - [People](#people)
19 | - [Resources](#resources)
20 | - [Other Curated Lists](#other-curated-lists)
21 | - [Acknowledgements](#acknowledgements)
22 |
23 | ## General Info
24 |
25 | ### Most Recent List of the Top500 Supercomputers
26 | - [Top500 (Nov. 2025)](https://www.top500.org/lists/top500/2025/11/)
27 | - [HPCG Top500 (Nov. 2025)](https://www.top500.org/lists/hpcg/2025/11/)
28 | - [Green500 (Nov. 2025)](https://www.top500.org/lists/green500/2025/11/)
29 | - [io500](https://io500.org/)
30 |
31 | ### History
32 | - [History of Supercomputing (Wikipedia)](https://en.wikipedia.org/wiki/History_of_supercomputing)
33 | - [History of Parallel Computing (Wikipedia)](https://en.wikipedia.org/wiki/Parallel_computing#History)
34 | - [History of the Top500 (Wikipedia)](https://en.wikipedia.org/wiki/TOP500)
35 | - [History of LLNL Computing](https://computing.llnl.gov/about/machine-history)
36 | - [The Supermen: The Story of Seymour Cray ... (1997)](https://www.amazon.ca/Supermen-Seymour-Technical-Wizards-Supercomputer/dp/0471048852/ref=sr_1_1?crid=1IOWC3IOYWPOP&keywords=seymour+cray&qid=1690959561&sprefix=seymour+cray%2Caps%2C88&sr=8-1)
37 | - [Unmatched - 50 Years of Supercomputing (2023)](https://www.routledge.com/Unmatched-50-Years-of-Supercomputing/Barkai/p/book/9780367479619)
38 |
39 | ### Trends
40 | - [Trends in HPC for AI workloads](https://epochai.org/trends)
41 |
42 | ## Software
43 |
44 | #### Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators
45 | - [alpaka](https://github.com/alpaka-group/alpaka) - The alpaka library is a header-only C++17 abstraction library for accelerator development
46 | - [async-rdma](https://github.com/datenlord/async-rdma) - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs
47 | - [CANN](https://developer.huawei.com/consumer/en/doc/hiai-guides/introduction-0000001051486804) - Compute Architecture for Neural Networks for Huawei Ascend AI processors (NPUs)
48 | - [CAF](https://github.com/actor-framework/actor-framework) - An Open Source Implementation of the Actor Model in C++
49 | - [Chapel](https://chapel-lang.org/) - A Programming Language for Productive Parallel Computing on Large-scale Systems
50 | - [Charm++](http://charm.cs.illinois.edu/research/charm) - Parallel Programming with Migratable Objects
51 | - [Cilk Plus](https://www.cilkplus.org/) - C/C++ Extension for Data and Task Parallelism
52 | - [Codon](https://github.com/exaloop/codon) - A high-performance Python compiler that compiles Python code to native machine code without any runtime overhead
53 | - [CUDA](https://developer.nvidia.com/cuda-toolkit) - High performance NVIDIA GPU acceleration
54 | - [dask](https://dask.org) - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
55 | - [DeepSpeed](https://github.com/microsoft/DeepSpeed) - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference
56 | - [DeterminedAI](https://www.determined.ai/) - Distributed deep learning
57 | - [Dispenso](https://github.com/facebookincubator/dispenso) - Meta/Facebook C++ Task Library
58 | - [FastFlow](https://github.com/fastflow/fastflow) - High-performance Parallel Patterns in C++
59 | - [Galois](https://github.com/IntelligentSoftwareSystems/Galois) - A C++ Library to Ease Parallel Programming with Irregular Parallelism
60 | - [Halide](https://halide-lang.org/index.html#gettingstarted) - A language for fast, portable computation on images and tensors
61 | - [Heteroflow](https://github.com/Heteroflow/Heteroflow) - Concurrent CPU-GPU Task Programming using Modern C++
62 | - [highway](https://github.com/google/highway) - Performance portable SIMD intrinsics
63 | - [HIP](https://github.com/ROCm-Developer-Tools/HIP) - HIP is a C++ Runtime API and Kernel Language for AMD/Nvidia GPU
64 | - [HPC-X](https://developer.nvidia.com/networking/hpc-x) - NVIDIA's HPC software toolkit, which includes an MPI implementation based on Open MPI
65 | - [HPX](https://github.com/STEllAR-GROUP/hpx) - A C++ Standard Library for Concurrency and Parallelism
66 | - [Horovod](https://github.com/horovod/horovod) - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
67 | - [ISPC](https://ispc.github.io/) - An open-source compiler for high-performance SIMD programming on the CPU and GPU
68 | - [Intel ISPC](https://github.com/ispc/ispc) - SPMD compiler
69 | - [Intel TBB](https://www.threadingbuildingblocks.org/) - Threading Building Blocks
70 | - [joblib](https://joblib.readthedocs.io/en/latest/why.html) - Data-flow programming for performance (python)
71 | - [Kompute](https://github.com/KomputeProject/kompute) - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
72 | - [Kokkos](https://github.com/kokkos/kokkos) - A C++ Programming Model for Writing Performance Portable Applications on HPC platforms
73 | - [Kubeflow MPI Operator](https://github.com/kubeflow/mpi-operator) - MPI Operator for Kubeflow
74 | - [Legate](https://github.com/nv-legate/legate.numpy) - Nvidia replacement for numpy based on Legion
75 | - [Legion](https://github.com/StanfordLegion/legion) - Distributed heterogeneous programming library
76 | - [MAGMA](https://developer.nvidia.com/magma) - Next generation linear algebra (LA) GPU accelerated libraries
77 | - [Merlin](https://merlin.readthedocs.io/en/latest/) - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations
78 | - [Metal](https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu) - Apple's GPU API
79 | - [Microsoft MPI](https://docs.microsoft.com/en-us/message-passing-interface/microsoft-mpi) - Microsoft's implementation of MPI
80 | - [MOGSLib](https://github.com/ECLScheduling/MOGSLib) - User defined schedulers
81 | - [mpi4jax](https://github.com/mpi4jax/mpi4jax) - Zero-copy MPI for JAX arrays
82 | - [mpi4py](https://mpi4py.readthedocs.io/en/stable/) - Python bindings for MPI
83 | - [Open MPI](https://www.open-mpi.org/) - Open MPI implementation of the Message Passing Interface
84 | - [MPICH](https://www.mpich.org/) - MPICH implementation of the Message Passing Interface
85 | - [MPI Standardization Forum](https://www.mpi-forum.org/) - Forum for MPI standardization
86 | - [MVAPICH](https://mvapich.cse.ohio-state.edu/) - Implementation of MPI
87 | - [NCCL](https://developer.nvidia.com/nccl) - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication
88 | - [NVSHMEM](https://developer.nvidia.com/nvshmem) - GPU-accelerated implementation of the OpenSHMEM programming model developed by NVIDIA
89 | - [cuNumeric](https://developer.nvidia.com/cunumeric) - GPU drop-in for numpy
90 | - [stdpar](https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/) - GPU accelerated C++ from NVIDIA
91 | - [numba](https://numba.pydata.org/) - A JIT compiler that translates a subset of Python into fast machine code
92 | - [oneAPI](https://www.oneapi.io/) - A unified, multiarchitecture, multi-vendor programming model
93 | - [OpenACC](https://www.openacc.org/) - "OpenMP for GPUs"
94 | - [OpenCilk](https://www.opencilk.org/) - MIT continuation of Cilk Plus
95 | - [OpenMP](https://www.openmp.org/) - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran
96 | - [OpenSHMEM](http://openshmem.org/site/About) - OpenSHMEM is a one-sided, PGAS-based parallel programming model enabling direct remote memory access for high-performance computing
97 | - [PVM](https://www.csm.ornl.gov/pvm/) - Parallel Virtual Machine: A predecessor to MPI for distributed computing
98 | - [PMIX](https://pmix.github.io/standard) - Standard for process management
99 | - [Pollux](https://github.com/polluxio/pollux-payload) - Message Passing Cloud orchestrator
100 | - [Pyfi](https://github.com/radiantone/pyfi) - Distributed flow and computation system
101 | - [Pyper](https://github.com/pyper-dev/pyper) - Concurrent Python made simple
102 | - [RAJA](https://github.com/LLNL/RAJA) - Architecture and programming model portability for HPC applications
103 | - [RaftLib](https://github.com/RaftLib/RaftLib) - A C++ Library for Enabling Stream and Dataflow Parallel Computation
104 | - [ray](https://www.ray.io/) - Scale AI and Python workloads from reinforcement learning to deep learning
105 | - [ROCm](https://rocmdocs.com/en/latest/) - First open-source software development platform for HPC/Hyperscale-class GPU computing
106 | - [RS MPI](https://rsmpi.github.io/rsmpi/mpi/index.html) - Rust bindings for MPI
107 | - [Scalix](https://github.com/NAGAGroup/Scalix) - Data parallel computing framework
108 | - [Simgrid](https://simgrid.org/) - Simulate cluster/HPC environments
109 | - [SkelCL](https://skelcl.github.io/) - A Skeleton Library for Heterogeneous Systems
110 | - [STAPL](https://parasol.tamu.edu/stapl/) - Standard Template Adaptive Parallel Programming Library in C++
111 | - [STLab](http://stlab.cc/libraries/concurrency/) - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention
112 | - [SYCL](https://www.khronos.org/sycl/) - C++ Abstraction layer for heterogeneous devices
113 | - [Taichi](https://github.com/taichi-dev/taichi) - Parallel programming language for high-performance numerical computations in Python
114 | - [Taskflow](https://github.com/taskflow/taskflow) - A Modern C++ Parallel Task Programming Library
115 | - [The Open Community Runtime](https://wiki.modelado.org/Open_Community_Runtime) - Specification for Asynchronous Many Task systems
116 | - [Transwarp](https://github.com/bloomen/transwarp) - A Header-only C++ Library for Task Concurrency
117 | - [Triton](https://triton-lang.org/main/index.html) - Triton is a language and compiler for parallel programming
118 | - [Tuplex](https://tuplex.cs.brown.edu/) - Blazing fast python data science
119 | - [UCX](https://github.com/openucx/ucx#using-ucx) - Optimized, production-proven communication framework
120 | - [ZLUDA](https://github.com/vosen/ZLUDA) - Run unmodified CUDA applications with near-native performance on Intel and AMD GPUs.
121 | - [HyperQueue](https://github.com/It4innovations/hyperqueue) - HyperQueue is a tool designed to simplify execution of large workflows (task graphs) on HPC clusters.
122 |
123 | #### Cluster Hardware Discovery Tools
124 | - [cpuid](https://en.wikipedia.org/wiki/CPUID) - A software instruction available on Intel, AMD, and other processors that can be used to determine processor type and features.
125 | - [cpuid instruction note](https://www.scss.tcd.ie/~jones/CS4021/processor-identification-cpuid-instruction-note.pdf) - A detailed note on the CPUID instruction used for processor identification.
126 | - [cpufetch](https://github.com/Dr-Noob/cpufetch) - A simple yet fancy CPU architecture fetching tool.
127 | - [gpufetch](https://github.com/Dr-Noob/gpufetch) - A tool similar to cpufetch, but for fetching GPU architecture.
128 | - [intel cpuinfo](https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/command-reference/cpuinfo.html) - Intel tool providing information about the characteristics of Intel CPUs.
129 | - [Likwid](https://github.com/RRZE-HPC/likwid) - Command-line tools for probing hardware topology and measuring hardware performance counters on cluster nodes.
130 | - [LIKWID.jl](https://juliaperf.github.io/LIKWID.jl/dev/) - Julia wrapper for LIKWID.
131 | - [openmpi hwloc](https://www.open-mpi.org/projects/hwloc/) - Portable Hardware Locality (hwloc) software project.
132 | - [PRK - Parallel Research Kernels](https://github.com/ParRes/Kernels) - A collection of kernels for parallel programming research.
133 |
134 | #### Cluster Management/Tools/Schedulers/Stacks
135 | - [ClusterVisor](https://www.advancedclustering.com/products/software/clustervisor-2/) - Cluster management tool by Advanced Clustering.
136 | - [BeeGFS](http://beegfs.io/docs/whitepapers/Introduction_to_BeeGFS_by_ThinkParQ.pdf) - A parallel file system designed for performance-critical environments.
137 | - [Bluebanquise](https://github.com/bluebanquise/bluebanquise) - An open-source cluster management tool.
138 | - [NVIDIA Base Command Manager (formerly Bright Cluster Manager)](https://docs.nvidia.com/base-command-manager/index.html) - Software for deploying and managing HPC and AI server clusters.
139 | - [Ceph](https://ceph.io/en/) - An open-source distributed storage system.
140 | - [DeepOps](https://github.com/NVIDIA/deepops) - Nvidia's GPU infrastructure and automation tools for Kubernetes and Slurm clusters.
141 | - [E4S - The Extreme Scale HPC Scientific Stack](https://e4s-project.github.io/) - A collection of open-source software packages for HPC environments.
142 | - [Easybuild](https://docs.easybuild.io/en/latest/) - A package manager for HPC/supercomputers.
143 | - [EESSI](https://www.eessi.io) - A shared stack of scientific software installations.
144 | - [Flux framework](https://flux-framework.org/) - A framework for high-performance computing clusters.
145 | - [fpsync](http://www.fpart.org/fpsync/) - A tool for fast parallel data transfer using fpart and rsync.
146 | - [GPFS](https://en.wikipedia.org/wiki/GPFS) - A high-performance parallel file system developed by IBM.
147 | - [Guix](https://hpc.guix.info/) - A package manager for HPC/supercomputers.
148 | - [Intel DAOS](https://daos.io) - A software-defined scale-out object store for HPC applications.
149 | - [LSF](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsf-batch-jobs-tasks) - A batch system for HPC and distributed computing environments.
150 | - [Lmod](https://lmod.readthedocs.io/en/latest/) - A Lua-based module system for software environment management on HPC systems.
151 | - [Lustre Parallel File System](https://www.lustre.org/) - A high-performance distributed filesystem for large-scale cluster computing.
152 | - [moosefs](https://moosefs.com/) - A fault-tolerant, highly available, distributed file system.
153 | - [NetApp](https://www.netapp.com) - Intelligent data infrastructure for various workloads.
154 | - [OKA](https://oka.how) - Analytics and reporting tool for HPC schedulers: helps administrators and users to understand how compute resources are used, and optimize their usage.
155 | - [Open Cluster Scheduler](https://github.com/hpc-gridware/clusterscheduler/) - A scalable HPC/AI workload manager based on SGE.
156 | - [OpenHPC](https://openhpc.community/) - A community-led set of HPC components.
157 | - [OpenOnDemand](https://openondemand.org/) - A web portal for accessing supercomputing resources.
158 | - [OpenPBS](https://www.openpbs.org/) - A software for workload management and job scheduling.
159 | - [Open XDMoD](https://open.xdmod.org/7.5/index.html) - An open-source tool for collecting and reporting utilization metrics of high-performance computing resources.
160 | - [RADIUSS](https://computing.llnl.gov/projects/radiuss) - Rapid Application Development via an Institutional Universal Software Stack.
161 | - [rocks](http://www.rocksclusters.org/) - An open-source Linux cluster distribution.
162 | - [Ruse](https://github.com/JanneM/Ruse) - A command-line utility that periodically measures the resource use (such as memory and runtime) of a process and its subprocesses.
163 | - [SGE](http://star.mit.edu/cluster/docs/0.93.3/guides/sge.html) - A resource management software for large clusters of computers.
164 | - [Slurm](https://slurm.schedmd.com/overview.html) - A cluster management and job scheduling system for Linux clusters.
165 | - [Spectrum LSF](https://www.ibm.com/products/hpc-workload-management) - Workload management platform and job scheduler for distributed high performance computing (HPC)
166 | - [Spack](https://spack.io/) - A package manager for HPC/supercomputers.
167 | - [sstack](https://gitlab.com/nmsu_hpc/sstack) - A tool to install multiple software stacks such as Spack, EasyBuild, and Conda.
168 | - [Starfish](https://starfishstorage.com/) - Unstructured data management and metadata solution for files and objects.
169 | - [Warewulf](https://warewulf.lbl.gov/) - An operating system provisioning system and cluster management tool.
170 | - [Velda](https://github.com/velda-io/velda) - A modern cluster management and job scheduler, with personalizable dev-containers and scale-to-cloud capabilities.
171 | - [xCat](https://xcat.org/) - A distributed computing management and provisioning tool.
172 | - [XDMoD](https://supremm.xdmod.org/10.0/supremm-overview.html) - An open-source tool for managing high-performance computing resources.
173 | - [Globus Connect](https://www.globus.org/globus-connect) - A fast data transfer tool between supercomputers.
174 | - [Slurm Web](https://slurm-web.com/) - Open source web dashboard for Slurm HPC clusters.
175 |
176 | #### HPC-specific Operating Systems
177 | - [Kitten](https://www.sandia.gov/app/uploads/sites/210/2022/11/pedretti_lanl11.pdf) - A lightweight kernel designed for high-performance computing. It focuses on providing low noise and predictable performance for HPC applications.
178 | - [McKernel](https://github.com/RIKEN-SysSoft/mckernel) - A hybrid kernel that combines Linux and a lightweight kernel designed to provide high performance for HPC applications.
179 | - [mOS](http://cs.iit.edu/~khale/docs/mos.pdf) - A specialized operating system for high-performance computing, designed to support large-scale, manycore processors.
180 |
181 | #### Development/Workflow/Monitoring Tools for HPC
182 |
183 | - [Apache Airflow](https://airflow.apache.org/) - A platform to programmatically author, schedule, and monitor workflows.
184 | - [Apptainer (formerly Singularity)](https://singularity.lbl.gov/) - Container platform designed for scientific and high-performance computing (HPC) environments.
185 | - [arbiter2](https://github.com/CHPC-UofU/arbiter2) - Monitors and protects interactive nodes with cgroups.
186 | - [Charliecloud](https://charliecloud.io/) - Lightweight container solution for high-performance computing (HPC).
187 | - [Docker](https://www.docker.com/) - A set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.
188 | - [genv](https://github.com/run-ai/genv) - GPU Environment Management for managing and scheduling GPU resources.
189 | - [Grafana](https://github.com/grafana/grafana) - Open-source platform for monitoring and observability, visualizing metrics.
190 | - [grpc](https://grpc.io/) - A high-performance, open-source universal RPC framework.
191 | - [HPC Rocket](https://github.com/SvenMarcus/hpc-rocket) - Allows submitting Slurm jobs in Continuous Integration (CI) pipelines.
192 | - [HTCondor](https://research.cs.wisc.edu/htcondor/) - An open-source high-throughput computing software framework.
193 | - [Jacamar-ci](https://gitlab.com/ecp-ci/jacamar-ci/-/blob/develop/README.md) - CI/CD tool designed for HPC and scientific computing workflows.
194 | - [Kubernetes](https://kubernetes.io/) - An open-source system for automating deployment, scaling, and management of containerized applications.
195 | - [nextflow](https://www.nextflow.io/) - A workflow framework to deploy data-driven computational pipelines.
196 | - [perun](https://github.com/Helmholtz-AI-Energy/perun) - Energy monitor for HPC systems, focusing on performance and energy efficiency.
197 | - [Prefect](https://www.prefect.io/) - A workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
198 | - [Prometheus](https://prometheus.io/) - An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
199 | - [redun](https://github.com/insitro/redun) - Workflow engine that emphasizes simplicity, reliability, and scalability.
200 | - [remora](https://github.com/TACC/remora) - Tool for monitoring and reporting the performance of batch jobs on HPC systems.
201 | - [ruptime](https://github.com/alexmyczko/ruptime) - A utility for monitoring the status of computational jobs and systems.
202 | - [Slurmvision slurm dashboard](https://github.com/Ruunyox/slurmvision) - A dashboard for monitoring and managing Slurm jobs.
203 | - [slurm docker cluster](https://github.com/giovtorres/slurm-docker-cluster) - A Slurm cluster implemented using Docker containers, for development and testing.
204 | - [snakemake](https://snakemake.readthedocs.io/en/stable/) - A workflow management system that reduces the complexity of creating reproducible and scalable data analyses.
205 | - [Stui slurm dashboard for the terminal](https://github.com/mil-ad/stui) - A terminal-based UI for managing and monitoring Slurm clusters.
206 | - [Vaex](https://github.com/vaexio/vaex) - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.
207 |
208 |
209 | #### Debugging Tools for HPC
210 |
211 | - [ddt](https://www.arm.com/products/development-tools/server-and-hpc/forge/ddt) - A powerful debugger designed for developers to solve complex problems on multi-threaded and multi-process environments in HPC.
212 | - [marmot MPI checker](https://www.lrz.de/services/software/parallel/marmot/) - A tool for detecting and reporting issues in MPI (Message Passing Interface) applications.
213 | - [python debugging tools](https://wiki.python.org/moin/PythonDebuggingTools) - A collection of tools for debugging Python applications, including pdb and other utilities.
214 | - [seer modern gui for gdb](https://github.com/epasveer/seer) - A graphical user interface for GDB, aiming to improve the debugging experience with modern features and visuals.
215 | - [Summary of C/C++ debugging tools](http://pramodkumbhar.com/2018/06/summary-of-debugging-tools/) - An overview of various debugging tools available for C/C++ applications, focusing on HPC environments.
216 | - [totalview](https://totalview.io/) - A comprehensive source code analysis and debugging tool designed for complex software running on HPC systems, supporting a wide range of languages and architectures.
217 |
218 |
219 | #### Performance/Benchmark Tools for HPC
220 |
221 | - [demonspawn](https://github.com/TACC/demonspawn) - A framework for automated execution of benchmarks and simulations, designed for HPC environments.
222 | - [Google benchmark](https://github.com/google/benchmark) - A microbenchmark support library for C++ that tracks performance over time.
223 | - [HPL benchmark](https://www.netlib.org/benchmark/hpl/) - The High Performance Linpack Benchmark for measuring floating-point computing power of systems.
224 | - [kerncraft](https://github.com/RRZE-HPC/kerncraft) - A tool for analytical modeling of loop performance and cache behavior on HPC systems.
225 | - [NAS Parallel Benchmarks (NPB)](https://www.nas.nasa.gov/software/npb.html) - A set of benchmarks from NASA designed to evaluate the performance of parallel supercomputers.
226 | - [papi](https://icl.utk.edu/papi/) - Provides standard APIs for accessing hardware performance counters available on modern microprocessors.
227 | - [scalasca](https://www.scalasca.org/) - A software tool that supports performance analysis of large-scale parallel applications.
228 | - [scalene](https://github.com/plasma-umass/scalene) - A high-performance, high-precision CPU, GPU, and memory profiler for Python.
229 | - [Summary of code performance analysis tools](https://doku.lrz.de/display/PUBLIC/Performance+and+Code+Analysis+Tools+for+HPC) - An overview of tools for analyzing HPC application performance.
230 | - [Summary of profiling tools](https://pramodkumbhar.com/2017/04/summary-of-profiling-tools/) - A comprehensive list of profiling tools for performance analysis in HPC.
231 | - [tau](https://www.cs.uoregon.edu/research/tau/home.php) - TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs.
232 | - [The Bandwidth Benchmark](https://github.com/RRZE-HPC/TheBandwidthBenchmark/) - A tool for measuring memory bandwidth across various CPUs and systems.
233 | - [vampir](https://vampir.eu/) - A tool for detailed analysis of MPI program executions by visualizing their event traces.
234 | - [bytehound memory profiler](https://github.com/koute/bytehound) - A detailed memory profiler for tracking down memory issues and leaks.
235 | - [Flamegraphs](https://www.brendangregg.com/flamegraphs.html) - Visualization tool for profiling software, allowing quick identification of performance bottlenecks.
236 | - [fio](https://linux.die.net/man/1/fio) - Flexible I/O tester for benchmarking and stress/hardware verification.
237 | - [IBM Spectrum Scale Key Performance Indicators (KPI)](https://github.com/IBM/SpectrumScale_NETWORK_READINESS) - Provides key performance indicators for IBM Spectrum Scale, aiding in performance tuning and monitoring.
238 | - [Ior](https://github.com/hpc/ior) - A parallel file system I/O benchmarking tool used widely in HPC for testing storage systems.
239 | - [stress-ng](https://github.com/ColinIanKing/stress-ng) - A versatile tool for stressing various subsystems of a computer to find hardware faults or to benchmark performance.
240 | - [Hotspot](https://github.com/KDAB/hotspot/) - The Linux perf GUI for in-depth performance analysis and visualization of software behavior.
241 | - [mixbench](https://github.com/ekondis/mixbench) - A benchmark suite designed to evaluate CPUs and GPUs across different compute and memory operations.
242 | - [pmu-tools (toplev)](https://github.com/andikleen/pmu-tools) - Performance monitoring tools for modern Intel CPUs, offering detailed insights into hardware and application performance.
243 | - [SPEC CPU Benchmark](https://www.spec.org/benchmarks.html) - A benchmark suite designed to provide a comparative measure of compute-intensive performance across the widest practical range of hardware.
244 | - [STREAM Memory Bandwidth Benchmark](https://www.cs.virginia.edu/stream/) - Measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
245 | - [Intel MPI benchmarks](https://www.intel.com/content/www/us/en/docs/mpi-library/user-guide-benchmarks/2021-2/overview.html) - A set of benchmarks designed to measure the performance and scalability of MPI implementations on Intel architectures.
246 | - [Ohio State MPI benchmarks](https://mvapich.cse.ohio-state.edu/benchmarks/) - A comprehensive suite of benchmarks for evaluating MPI performance across a variety of message passing patterns and communication protocols.
247 | - [hpctoolkit](http://hpctoolkit.org/man/hpctoolkit.html) - An integrated suite of tools for measurement and analysis of program performance on computers ranging from desktops to supercomputers.
248 | - [core-to-core-latency](https://github.com/nviennot/core-to-core-latency) - A diagnostic tool designed to measure and report the latency between CPU cores, aiding in the optimization of parallel computing tasks.
249 | - [speedscope](https://github.com/jlfwong/speedscope) - An interactive, web-based viewer for performance profiles of software. It supports various formats and provides a flamegraph visualization to identify hot paths efficiently.
250 | - [Differential Flamegraphs](https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html) - A visualization technique developed by Brendan Gregg that highlights differences between performance profiles, making it easier to spot performance regressions or improvements.
251 | - [Hyperfine](https://github.com/sharkdp/hyperfine) - A command-line benchmarking tool that provides a simple and user-friendly means to compare the performance of commands, featuring statistical analysis across multiple runs.
252 | - [OpenFOAM HPC benchmark](https://develop.openfoam.com/committees/hpc/-/wikis/home) - A benchmarking suite for evaluating the High Performance Computing capabilities of OpenFOAM, an open-source CFD software, under various computational loads.
253 | - [OSU microbenchmarks](https://mvapich.cse.ohio-state.edu/benchmarks/) - A collection of microbenchmarks designed to evaluate the performance of MPI implementations across various communication protocols and message sizes.
254 | - [fio flexible I/O tester](https://fio.readthedocs.io/) - A versatile tool for I/O workload simulation and benchmarking, capable of testing a wide array of storage and filesystem configurations.
255 | - [vftrace](https://github.com/SX-Aurora/Vftrace) - A tracing tool specifically designed for the NEC SX-Aurora TSUBASA Vector Engine, enabling detailed performance analysis of vectorized code.
256 | - [tinymembench](https://github.com/ssvb/tinymembench) - A simple memory benchmark tool, focusing on benchmarking memory bandwidth and latency with minimal dependencies, suitable for various platforms.
257 | - [Geekbench](https://www.geekbench.com/) - Cross platform benchmarking tool
258 | - [Empirical Roofline Tool (ERT)](https://crd.lbl.gov/divisions/amcr/computer-science-amcr/par/research/roofline/software/ert/) - Creates empirical roofline plots for any machine; an alternative to Intel VTune
259 | - [Roofline Visualizer for ERT](https://crd.lbl.gov/divisions/amcr/computer-science-amcr/par/research/roofline/software/roofline-visualizer/) - Visualizer for ERT
260 | - [Caliper](https://github.com/LLNL/Caliper) - A Performance Analysis Toolbox in a Library
261 | - [KDiskMark](https://github.com/JonMagon/KDiskMark) - Benchmarking Tool For SSD/HDD Drives
262 | - [OpenBenchmarking](https://openbenchmarking.org/) - Open benchmarks on a variety of algorithms and hardware
263 | - [Phoronix Test Suite](https://github.com/phoronix-test-suite/phoronix-test-suite) - Benchmarking suite for Linux
264 | - [Palanteer Python/C++ Profiler](https://github.com/dfeneyrou/palanteer) - Profiler for both Python and C++
265 |
266 | #### IO/Visualization Tools for HPC
267 | - [ADIOS2](https://github.com/ornladios/ADIOS2) - The Adaptable IO System version 2, designed for flexible and efficient I/O for scientific data, supporting a wide range of HPC simulations.
268 | - [Amira](https://www.thermofisher.com/ca/en/home/electron-microscopy/products/software-em-3d-vis/amira-software.html) - A powerful, multifaceted 3D software platform for visualizing, manipulating, and understanding Life Science and bio-medical data coming from all types of sources.
269 | - [hdf5](https://www.hdfgroup.org/solutions/hdf5/) - The Hierarchical Data Format version 5 (HDF5), is an open source file format that supports large, complex, heterogeneous data.
270 | - [paraview](https://www.paraview.org/) - An open-source, multi-platform data analysis and visualization application.
271 | - [Scientific Visualization Wiki](https://en.wikipedia.org/wiki/Scientific_visualization) - A comprehensive guide to the field of scientific visualization, detailing techniques, tools, and applications.
272 | - [the yt project](https://yt-project.org/) - An open-source, Python-based package for analyzing and visualizing volumetric data.
273 | - [vedo](https://vedo.embl.es/) - A lightweight and powerful python module for scientific analysis and visualization of 3D objects and point clouds based on VTK.
274 | - [visit](https://wci.llnl.gov/simulation/computer-codes/visit) - An Open Source, interactive, scalable, visualization, animation and analysis tool.
275 | - [WebDataset](https://huggingface.co/docs/hub/datasets-webdataset) - A library for writing I/O pipelines for large datasets.
276 |
277 | #### General Purpose Scientific Computing Libraries for HPC
278 | - [petsc](https://petsc.org/release/)
279 | - [ginkgo](https://ginkgo-project.github.io/)
280 | - [GSL](https://www.gnu.org/software/gsl/)
281 | - [Scalapack](https://netlib.org/scalapack/)
282 | - [rapids.ai](https://rapids.ai) - A collection of libraries for executing end-to-end data science pipelines entirely on the GPU
283 | - [trilinos](https://trilinos.github.io/)
284 | - [tnl project](https://tnl-project.org/)
285 | - [RunMat](https://github.com/runmat-org/runmat) - MATLAB-syntax runtime with automatic CPU/GPU execution and fused array math kernels.
286 |
287 | #### Misc.
288 | - [mimalloc memory allocator](https://github.com/microsoft/mimalloc)
289 | - [jemalloc memory allocator](https://github.com/jemalloc/jemalloc)
290 | - [tcmalloc memory allocator](https://github.com/google/tcmalloc)
291 | - [Hoard memory allocator](https://github.com/emeryberger/Hoard)
292 | - [Software utilization at UK National Supercomputing Service, ARCHER2](https://www.archer2.ac.uk/support-access/status.html#software-usage-data)
293 | - [SIMD Info](https://simd.info)
294 |
295 | #### Wikis
296 | - [Comparison of cluster software](https://en.wikipedia.org/wiki/Comparison_of_cluster_software)
297 | - [List of cluster management software](https://en.wikipedia.org/wiki/List_of_cluster_management_software)
298 |
299 | ## Hardware
300 |
301 | ### Interconnects/Topology
302 |
303 | - [Ethernet](https://en.wikipedia.org/wiki/Ethernet)
304 | - [Infiniband](https://en.wikipedia.org/wiki/InfiniBand)
305 | - [Network topologies](https://www.hpcwire.com/2019/07/15/super-connecting-the-supercomputers-innovations-through-network-topologies/)
306 | - [Battle of the infinibands - Omnipath vs Infiniband](https://www.nextplatform.com/2017/11/29/the-battle-of-the-infinibands/)
307 | - [Mellanox infiniband cluster config](https://www.mellanox.com/clusterconfig/)
308 | - [RoCE - RDMA Over Converged Ethernet](https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet)
309 | - [Slingshot interconnect](https://www.hpe.com/ca/en/compute/hpc/slingshot-interconnect.html)
310 | - [CXL - Compute Express Link](https://www.computeexpresslink.org/)
311 | - [Infiniband Essentials](https://academy.nvidia.com/en/course/infiniband-essentials/?cm=244)
312 | - [NVlink](https://en.wikipedia.org/wiki/NVLink)
313 | - [List of LAN-based interconnect bit rates](https://en.wikipedia.org/wiki/List_of_interface_bit_rates)
314 | - [List of internet-based interconnect bit rates](https://en.wikipedia.org/wiki/Bandwidth_(computing)#Internet_connection_bandwidths)
315 |
316 | ### CPU
317 | - [Wikichip](https://en.wikichip.org/wiki/WikiChip)
318 | - [Microarchitecture of Intel/AMD CPUs](https://www.agner.org/optimize/microarchitecture.pdf)
319 | - [Apple M1](https://en.wikipedia.org/wiki/Apple_M1)
320 | - [Apple M2](https://en.wikipedia.org/wiki/Apple_M2)
321 | - [Apple M2 Teardown](https://www.ifixit.com/News/62674/m2-macbook-air-teardown-apple-forgot-the-heatsink)
322 | - [Apple M1/M2 AMX](https://github.com/corsix/amx)
323 | - [Apple M3](https://en.wikipedia.org/wiki/Apple_M3)
324 | - [List of Intel processors](https://en.wikipedia.org/wiki/List_of_Intel_processors)
325 | - [List of Intel micro architectures](https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures)
326 | - [Comparison of Intel processors](https://en.wikipedia.org/wiki/Comparison_of_Intel_processors)
327 | - [Comparison of Apple processors](https://en.wikipedia.org/wiki/Apple-designed_processors)
328 | - [List of AMD processors](https://en.wikipedia.org/wiki/List_of_AMD_processors)
329 | - [List of AMD CPU micro architectures](https://en.wikipedia.org/wiki/List_of_AMD_CPU_microarchitectures)
330 | - [Comparison of AMD architectures](https://en.wikipedia.org/wiki/Table_of_AMD_processors)
331 |
332 | ### GPU
333 |
334 | - [Inside NVIDIA GPUs: Anatomy of high performance matmul kernels](https://www.aleksagordic.com/blog/matmul)
335 | - [GPU Architecture Analysis](https://graphicscodex.courses.nvidia.com/app.html?page=_rn_parallel)
336 | - [A trip through the Graphics Pipeline](https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/)
337 | - [A100 Whitepaper](https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf)
338 | - [MIG](https://www.nvidia.com/en-us/technologies/multi-instance-gpu/)
339 | - [Gentle Intro to GPU Inner Workings](https://vksegfault.github.io/posts/gentle-intro-gpu-inner-workings/)
340 | - [AMD Instinct GPUs](https://en.wikipedia.org/wiki/AMD_Instinct_accelerators)
341 | - [AMD GPU ROCm Support and OS Compatibility](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html)
342 | - [List of AMD GPUs](https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units)
343 | - [Comparison of CUDA architectures](https://en.wikipedia.org/wiki/CUDA)
344 | - [Tales of the M1 GPU](https://asahilinux.org/2022/11/tales-of-the-m1-gpu/)
345 | - [List of Intel GPUs](https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units)
346 | - [Performance of DGX Cluster](https://www.computer.org/csdl/proceedings-article/cloudcom/2022/636700a170/1JNqFu7QdTG)
347 | - [CUDA Ontology](https://jamesakl.com/posts/cuda-ontology/)
348 |
349 | ### TPU/Tensor Cores
350 |
351 | - [Google TPU](https://thechipletter.substack.com/p/googles-first-tpu-architecture)
352 | - [TPU Wiki](https://en.wikipedia.org/wiki/Tensor_Processing_Unit)
353 | - [NVIDIA Tensor Cores](https://www.nvidia.com/en-us/data-center/tensor-cores/)
354 |
355 | ### Many integrated core processor (MIC)
356 |
357 | - [Xeon Phi](https://en.wikipedia.org/wiki/Xeon_Phi)
358 |
359 | ### Cloud
360 |
361 | - [Awesome Cloud HPC](https://github.com/kjrstory/awesome-cloud-hpc)
362 |
363 | #### Vendors
364 |
365 | - [Official NVIDIA Vendors](https://marketplace.nvidia.com/en-us/enterprise/cloud-solutions/?limit=15)
366 | - [AWS HPC](https://aws.amazon.com/hpc/)
367 | - [Azure HPC](https://azure.microsoft.com/en-us/solutions/high-performance-computing/#intro)
368 | - [rescale](https://rescale.com/)
369 | - [vast.ai](https://vast.ai/)
370 | - [vultr - cheap bare metal CPU, GPU, DGX servers](https://www.vultr.com)
371 | - [hetzner - cheap servers incl. 80-core ARM](https://www.hetzner.com/)
372 | - [Ampere ARM cloud-native processors](https://amperecomputing.com/)
373 | - [Scaleway](https://www.scaleway.com/en/)
374 | - [Chameleon Cloud](https://www.chameleoncloud.org/)
375 | - [Lambda Labs](https://lambdalabs.com/)
376 | - [Runpod](https://www.runpod.io/)
377 |
378 | #### Articles/Papers
379 | - [The use of Microsoft Azure for high performance cloud computing – A case study](https://www.diva-portal.org/smash/get/diva2:1704798/FULLTEXT01.pdf)
380 | - [AWS Cluster in the cloud](https://cluster-in-the-cloud.readthedocs.io/en/latest/aws-infrastructure.html)
381 | - [AWS Parallel Cluster](https://docs.aws.amazon.com/parallelcluster/latest/ug/tutorials-running-your-first-job-on-version-3.html)
382 | - [AWS HPC Workshop](https://www.hpcworkshops.com/)
383 | - [An Empirical Study of Containerized MPI and GUI Application on HPC in the Cloud](https://ieeexplore.ieee.org/abstract/document/10046607)
384 |
385 | ### Custom/FPGA/ASIC/APU
386 |
387 | - [OpenPiton](http://parallel.princeton.edu/openpiton/)
388 | - [Parallella](https://www.parallella.org/)
389 | - [AMD APU](https://en.wikipedia.org/wiki/AMD_Accelerated_Processing_Unit)
390 |
391 | ### Certification
392 |
393 | - [Intel Cluster Ready](https://en.wikipedia.org/wiki/Intel_Cluster_Ready)
394 |
395 | ### Student Opportunities / Workshops
396 |
397 | - [Supercomputing Conference Student Opportunities](https://sc21.supercomputing.org/program/studentssc/)
398 | - [SCC Student cluster competition](https://www.studentclustercompetition.us/)
399 | - [Winter Classic Invitational](https://www.winterclassicinvitational.com/)
400 | - [Linux Clusters Institute](https://linuxclustersinstitute.org/)
401 |
402 | ### Other/Wikis
403 |
404 | - [Supercomputer](https://en.wikipedia.org/wiki/Supercomputer)
405 | - [Supercomputer architecture](https://en.wikipedia.org/wiki/Supercomputer_architecture)
406 | - [Beowulf cluster](https://en.wikipedia.org/wiki/Beowulf_cluster)
407 | - [Computer cluster](https://en.wikipedia.org/wiki/Computer_cluster)
408 | - [Comparison of Intel processors](https://en.wikipedia.org/wiki/Comparison_of_Intel_processors)
409 | - [Comparison of Apple processors](https://en.wikipedia.org/wiki/Apple-designed_processors)
410 | - [Comparison of AMD architectures](https://en.wikipedia.org/wiki/Table_of_AMD_processors)
411 | - [Comparison of CUDA architectures](https://en.wikipedia.org/wiki/CUDA)
412 | - [Cache](https://en.wikipedia.org/wiki/Cache_(computing))
413 | - [Google TPU](https://en.wikipedia.org/wiki/Tensor_Processing_Unit)
414 | - [IPMI](https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface)
415 | - [FRU](https://en.wikipedia.org/wiki/Field-replaceable_unit)
416 | - [Disk Arrays](https://en.wikipedia.org/wiki/Disk_array)
417 | - [RAID](https://en.wikipedia.org/wiki/RAID)
418 | - [Cray](https://en.wikipedia.org/wiki/Cray)
419 | - [Digital Signal Processors](https://en.wikipedia.org/wiki/Digital_signal_processor)
420 | - [Vector Processor](https://en.wikipedia.org/wiki/Vector_processor)
421 |
422 | ## People
423 |
424 | - [Jack Dongarra - 2021 Turing Award - LINPACK, BLAS, LAPACK, MPI](https://www.nature.com/articles/s43588-022-00245-w)
425 | - [Bill Gropp - 2010 IEEE TCSC Medal for Excellence in Scalable Computing](https://en.wikipedia.org/wiki/Bill_Gropp)
426 | - [David Bader - built the first Linux supercomputer](https://en.wikipedia.org/wiki/David_Bader_(computer_scientist))
427 | - [Thomas Sterling - "Father of Beowulf clusters", ParalleX/HPX](https://en.wikipedia.org/wiki/Thomas_Sterling_(computing))
428 | - [Seymour Cray - Inventor of the Cray Supercomputer](https://en.wikipedia.org/wiki/Seymour_Cray)
429 | - [Larry Smarr - HPC Application Pioneer](https://en.wikipedia.org/wiki/Larry_Smarr)
430 | - [Donald Becker - Beowulf cluster software, Gordon Bell Prize Winner](https://en.wikipedia.org/wiki/Donald_Becker)
431 | - [HPCWire Class of 2025](https://www.hpcwire.com/35-hpc-legends/)
432 | - [HPCWire Class of 2024](https://www.hpcwire.com/35-hpc-legends-class-of-2024/)
433 |
434 | ## Resources
435 |
436 | #### Books/Manuals
437 | - [Free Modern HPC Books by Victor Eijkhout](https://theartofhpc.com/)
438 | - [High Performance Parallel Runtimes](https://www.amazon.com/High-Performance-Parallel-Runtimes-Implementation-ebook/dp/B08WH82KF9/ref=sr_1_1?keywords=High+Performance+Parallel+Runtimes&qid=1689287759&sr=8-1)
439 | - [The OpenMP Common Core: Making OpenMP Simple Again](https://www.amazon.com/OpenMP-Common-Core-Engineering-Computation/dp/0262538865/ref=d_pd_sbs_sccl_2_1/130-5660046-7109016?pd_rd_w=Cqnxw&content-id=amzn1.sym.3676f086-9496-4fd7-8490-77cf7f43f846&pf_rd_p=3676f086-9496-4fd7-8490-77cf7f43f846&pf_rd_r=HG04QQS87WDHAGV578EE&pd_rd_wg=u0csS&pd_rd_r=8a6a0024-5dec-4934-8fa5-99e24d9fc4bd&pd_rd_i=0262538865&psc=1)
440 | - [Parallel and High Performance Computing](https://www.manning.com/books/parallel-and-high-performance-computing)
441 | - [Algorithms for Modern Hardware](https://en.algorithmica.org/hpc/)
442 | - [High Performance Computing: Modern Systems and Practices](https://www.amazon.ca/High-Performance-Computing-Systems-Practices/dp/012420158X) - Thomas Sterling, Maciej Brodowicz, Matthew Anderson 2017
443 | - [Introduction to High Performance Computing for Scientists and Engineers](https://www.amazon.ca/Introduction-Performance-Computing-Scientists-Engineers/dp/143981192X/ref=sr_1_1?crid=1L276HPEB8K7I&keywords=Introduction+to+High+Performance+Computing+for+Scientists+and+Engineers&qid=1645137608&s=books&sprefix=introduction+to+high+performance+computing+for+scientists+and+engineers%2Cstripbooks%2C46&sr=1-1) - Hager 2010
444 | - [Computer Organization and Design](https://www.amazon.ca/Computer-Organization-Design-RISC-V-Interface/dp/0128203315/ref=sr_1_1?crid=1XLX1HWLGRVO6&keywords=Computer+Organization+and+Design&qid=1645137443&s=books&sprefix=computer+organization+and+design%2Cstripbooks%2C48&sr=1-1)
445 | - [Optimizing HPC Applications with Intel Cluster Tools: Hunting Petaflops](C+Applications+with+Intel+Cluster+Tools&qid=1645137507&s=books&sprefix=optimizing+hpc+applications+with+intel+cluster+tools%2Cstripbooks%2C80&sr=1-1)
446 | - [Introduction to High Performance Scientific Computing](https://web.corral.tacc.utexas.edu/CompEdu/pdf/stc/EijkhoutIntroToHPC.pdf) - Victor Eijkhout 2021
447 | - [Parallel Programming for Science and Engineering](https://web.corral.tacc.utexas.edu/CompEdu/pdf/pcse/EijkhoutParallelProgramming.pdf) - Victor Eijkhout 2021
448 | - [Parallel Programming for Science and Engineering - HTML Version](https://pages.tacc.utexas.edu/~eijkhout/pcse/html/)
449 | - [C++ High Performance](https://www.amazon.ca/High-Performance-Master-optimizing-functioning/dp/1839216549/ref=sr_1_1?crid=31OVX4VQ6Z84X&keywords=C%2B%2B+high+performance&qid=1640671313&sprefix=c%2B%2B+high+performance%2Caps%2C99&sr=8-1)
450 | - [Data Parallel C++ Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL](https://www.apress.com/gp/book/9781484255735)
451 | - [High Performance Python](https://www.amazon.ca/High-Performance-Python-Performant-Programming/dp/1449361595)
452 | - [C++ Concurrency in Action: Practical Multithreading](https://www.manning.com/books/c-plus-plus-concurrency-in-action) - Anthony Williams 2012
453 | - [The Art of Multiprocessor Programming](https://www.amazon.com/Art-Multiprocessor-Programming-Revised-Reprint/dp/0123973376/ref=sr_1_1?ie=UTF8&qid=1438003865&sr=8-1&keywords=maurice+herlihy) - Maurice Herlihy 2012
454 | - [Parallel Computing: Theory and Practice](http://www.cs.cmu.edu/afs/cs/academic/class/15210-f15/www/tapp.html#ch:work-stealing) - Umut A. Acar 2016
455 | - [Introduction to Parallel Computing](https://www.amazon.ca/Introduction-Parallel-Computing-Zbigniew-Czech/dp/1107174392/ref=sr_1_7?dchild=1&keywords=parallel+computing&qid=1625711415&sr=8-7) - Zbigniew J. Czech
456 | - [Practical guide to bare metal C++](https://arobenko.github.io/bare_metal_cpp/)
457 | - [Optimizing software in C++](https://www.agner.org/optimize/optimizing_cpp.pdf)
458 | - [Optimizing subroutines in assembly code](https://www.agner.org/optimize/optimizing_assembly.pdf)
459 | - [Microarchitecture of Intel/AMD CPUs](https://www.agner.org/optimize/microarchitecture.pdf)
460 | - [Parallel Programming with MPI](https://www.cs.usfca.edu/~peter/ppmpi/)
461 | - [HPC, Big Data, AI Convergence Towards Exascale: Challenge and Vision](https://www.taylorfrancis.com/books/edit/10.1201/9781003176664/hpc-big-data-ai-convergence-towards-exascale-olivier-terzo-jan-martinovi%C4%8D?refId=2cd8b0ad-d63d-42fa-9c3e-fe47fbbe0e29&context=ubx)
462 | - [Introduction to parallel computing](https://www.amazon.com/Introduction-Parallel-Computing-Ananth-Grama/dp/0201648652/ref=sr_1_1?crid=LE1VD245VDX5&keywords=Ananth+Grama+-+Introduction+to+parallel+computing&qid=1644907263&sprefix=ananth+grama+-+introduction+to+parallel+computing%2Caps%2C43&sr=8-1) - Ananth Grama
463 | - [The Student Supercomputer Challenge Guide](https://www.amazon.ca/Student-Supercomputer-Challenge-Guide-Supercomputing/dp/9811338310/ref=sr_1_1?crid=2J5374I76RP2Y&keywords=The+student+supercomputer+challenge&qid=1657060946&sprefix=the+student+supercomputer+challenge%2Caps%2C53&sr=8-1)
464 | - [The Rust Performance Book](https://nnethercote.github.io/perf-book/introduction.html)
465 | - [E-Zines on Bash, Linux, Perf, etc - Julia Evans](https://wizardzines.com/)
466 | - [The Art of Writing Efficient Programs: An Advanced Programmer's Guide to Efficient Hardware Utilization and Compiler Optimizations Using C++ Examples](https://www.amazon.ca/Art-Writing-Efficient-Programs-optimizations/dp/1800208111)
467 | - [OpenMP Examples - openmp.org](https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf)
468 | - [Latest books on OpenMP - openmp.org](https://www.openmp.org/resources/openmp-books/)
469 | - [Programming Massively Parallel Processors 4th Edition 2023](https://www.amazon.ca/Programming-Massively-Parallel-Processors-Hands/dp/0128119861/ref=sr_1_1?crid=18EW0LVO2VFMC&keywords=Programming+Massively+Parallel+Processors+4th+Edition+2023&qid=1695110729&s=books&sprefix=programming+massively+parallel+processors+4th+edition+2023%2Cstripbooks%2C88&sr=1-1)
470 | - [Software Optimization Cookbook](https://www.amazon.ca/Software-Optimization-Cookbook-Performance-Platforms/dp/0976483211)
471 | - [Power and Performance: Software Analysis and Optimization](https://www.amazon.ca/Power-Performance-Software-Analysis-Optimization-ebook/dp/B00WZ1AX6S/ref=sr_1_1?crid=22HMPRFCYAXC0&keywords=Power+and+Performance_+Software+Analysis+and+Optimization&qid=1695111518&s=books&sprefix=power+and+performance_+software+analysis+and+optimization%2Cstripbooks%2C85&sr=1-1)
472 | - [Gropp books on MPI](https://wgropp.cs.illinois.edu/usingmpiweb/)
473 | - [Performance Analysis and Tuning on Modern CPUs](https://book.easyperf.net/perf_book)
474 | - [High Performance Computing in Biomimetics Modeling, Architecture and Applications](https://link.springer.com/book/10.1007/978-981-97-1017-1)
475 | - [Systems Performance - Brendan Gregg](https://www.amazon.com/Systems-Performance-Brendan-Gregg/dp/0136820158)
476 | - [Is Parallel Programming Hard, And, If So, What Can You Do About It? - Paul E. McKenney](https://cdn.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html)
477 | - [The Little Book of Semaphores](https://greenteapress.com/wp/semaphores/)
478 | - [Building Clustered Linux Systems](https://www.amazon.com/Building-Clustered-Linux-Systems-Robert/dp/0131448536)
479 | - [Computer Architecture: A Quantitative Approach](https://www.amazon.com/Computer-Architecture-Quantitative-John-Hennessy/dp/012383872X)
480 | - [Supercomputers for Linux SysAdmins](https://link.springer.com/book/10.1007/979-8-8688-1600-0)
481 | - [Raspberry Pi Supercomputing and Scientific Programming](https://link.springer.com/book/10.1007/978-1-4842-2878-4)
482 | - [MPI with Python (free)](https://cloudmesh.github.io/cloudmesh-mpi/report-mpi.pdf)
483 |
484 | #### Courses
485 | - [HPC Carpentry](https://www.hpc-carpentry.org/)
486 | - [Berkeley: Applications of Parallel Computers](https://sites.google.com/lbl.gov/cs267-spr2019/) - Detailed course on HPC
487 | - [CS6290 High-performance Computer Architecture](https://www.udacity.com/course/high-performance-computer-architecture--ud007) - Milos Prvulovic and Catherine Gamboa at Georgia Tech
488 | - [Udacity High Performance Computing](https://www..com/playlist?list=PLAwxTw4SYaPk8NaXIiFQXWK6VPnrtMRXC)
489 | - [Parallel Numerical Algorithms](https://solomonik.cs.illinois.edu/teaching/cs554/index.html)
490 | - [Vanderbilt - Intro to HPC](https://github.com/vanderbiltscl/SC3260_HPC)
491 | - [Illinois - Intro to HPC](https://andreask.cs.illinois.edu/Teaching/HPCFall2012/) - Creator of PyCuda
492 | - [Archer1 Courses](http://www.archer.ac.uk/training/past_courses.php)
493 | - [TACC tutorials](https://portal.tacc.utexas.edu/tutorials)
494 | - [Livermore training materials](https://hpc.llnl.gov/training/tutorials)
495 | - [Xsede training materials](https://www.hpc-training.org/xsede/moodle/)
496 | - [Parallel Computation Math](https://www.cct.lsu.edu/~pdiehl/teaching/2021/4997/)
497 | - [Introduction to High-Performance and Parallel Computing - Coursera](https://www.coursera.org/learn/introduction-high-performance-computing)
498 | - [Foundations of HPC 2020/2021](https://github.com/Foundations-of-HPC)
499 | - [Principles of Distributed Computing](https://disco.ethz.ch/courses/podc_allstars/)
500 | - [High Performance Visualization](https://www.uni-bremen.de/ag-high-performance-visualization)
501 | - [Temple course on building/maintaining a cluster](https://www.hpc.temple.edu/mhpc/2021/hpc-technology/index.html)
502 | - [Nvidia Deep Learning Course](https://www.nvidia.com/en-us/training/online/)
503 | - [Coursera GPU Programming Specialization](https://www.coursera.org/specializations/gpu-programming)
504 | - [Coursera Fundamentals of Parallelism on Intel Architecture](https://www.coursera.org/learn/parallelism-ia)
506 | - [Archer2 Shared Memory Programming with OpenMP](https://www.archer2.ac.uk/training/courses/210000-openmp-self-service/)
507 | - [Archer2 Message-Passing Programming with MPI](https://www.archer2.ac.uk/training/courses/210000-mpi-self-service/)
508 | - [HetSys 2022 Course](https://www.youtube.com/playlist?list=PL5Q2soXY2Zi9XrgXR38IM_FTjmY6h7Gzm)
509 | - [Edukamu Introduction to Supercomputing](https://edukamu.fi/elements-of-supercomputing)
510 | - [Heterogeneous Parallel Programming by S K](https://www.youtube.com/channel/UCbD5dhBi6DBSvCTgEDFz7uA/videos)
511 | - [NCSA HPC Training Moodle](https://www.hpc-training.org/xsede/moodle/)
512 | - [Supercomputing in plain english](http://www.oscer.ou.edu/education.php)
513 | - [Cornell workshop](https://cvw.cac.cornell.edu/topics)
514 | - [Carpentries Incubator HPC Intro](https://carpentries-incubator.github.io/hpc-intro/)
515 | - [UL HPC School](https://ulhpc-tutorials.readthedocs.io/en/latest/hpc-school/)
516 | - [Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran](https://bitbucket.org/berkeleylab/upcxx/wiki/events/CUF23)
517 | - [Performance Engineering of Software Systems (MIT-OCW)](https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/video_galleries/lecture-videos/)
518 | - [Introduction to Parallel Computing (CMSC 498X/818X)](https://www.cs.umd.edu/class/fall2020/cmsc498x/lectures.shtml)
519 | - [Infiniband Essentials](https://academy.nvidia.com/en/course/infiniband-essentials/?cm=244)
520 | - [Performance Ninja Optimization Course](https://github.com/dendibakh/perf-ninja)
521 | - [HPC Administration Virtual Residency 2024](https://www.youtube.com/@VirtualResidency2024/videos)
522 | - [Programming Parallel Computers](https://ppc-exercises.cs.aalto.fi/courses)
523 | - [High Performance Machine Learning - Columbia University](https://www.cs.columbia.edu/~aa4870/high-performance-machine-learning/)
524 | - [HPC.NRW: Various tutorials about Linux, GPU programming, performance tools, and research data management](https://www.youtube.com/@HPCNRW/playlists)
525 | - [Introduction to HPC - IIT Bombay](https://www.youtube.com/playlist?list=PLOzRYVm0a65fSrgx3kroerFJtFjGGIcXm)
526 | - [Aalto University course CS-E4580 Programming Parallel Computers](https://ppc.cs.aalto.fi/)
527 | - [HLRS Stuttgart HPC Courses](https://www.hlrs.de/de/training/uebersicht)
528 |
529 | #### Tutorials/Guides/Articles
530 | ##### General
531 | - [MpiTutorial](https://mpitutorial.com) - A fantastic MPI tutorial
532 | - [Beginners Guide to HPC](http://www.shodor.org/petascale/materials/UPModules/beginnersGuideHPC/)
533 | - [Rookie HPC Guide](https://rookiehpc.github.io/index.html)
534 | - [RedHat High Performance Computing 101](https://www.redhat.com/en/blog/high-performance-computing-101)
535 | - [Parallel Computing Training Tutorials](https://hpc.llnl.gov/training/tutorials) - Lawrence Livermore National Laboratory
536 | - [Foundations of Multithreaded, Parallel, and Distributed Programming](https://www.amazon.com/Foundations-Multithreaded-Parallel-Distributed-Programming/dp/B00F4I7HM2/ref=sr_1_2?dchild=1&keywords=Gregory+R.+Andrews+Distributed+Programming&qid=1625766665&s=books&sr=1-2)
537 | - [Building pipelines using slurm dependencies](https://hpc.nih.gov/docs/job_dependencies.html)
538 | - [Writing Slurm scripts in Python, R, and Bash](https://vsoch.github.io/lessons/sherlock-jobs/)
539 | - [Xsede new user tutorials](https://portal.xsede.org/online-training)
540 | - [Supercomputing in plain english](http://www.oscer.ou.edu/education.php)
541 | - [Improving Performance with SIMD intrinsics](https://stackoverflow.blog/2020/07/08/improving-performance-with-simd-intrinsics-in-three-use-cases/)
542 | - [Want speed? Pass by value](https://web.archive.org/web/20140205194657/http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/)
543 | - [Introduction to low level bit hacks](https://catonmat.net/low-level-bit-hacks)
544 | - [How to write fast numerical code: An Introduction](https://users.ece.cmu.edu/~franzf/papers/gttse07.pdf)
545 | - [Lecture notes on Loop optimizations](https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/17-loopopt.pdf)
546 | - [A practical approach to code optimization](https://www.einfochips.com/wp-content/uploads/resources/a-practical-approach-to-optimize-code-implementation.pdf)
547 | - [Software optimization manuals](https://www.agner.org/optimize/)
548 | - [Guide into OpenMP: Easy multithreading programming for C++](https://bisqwit.iki.fi/story/howto/openmp/)
549 | - [An Introduction to the Partitioned Global Address Space (PGAS) Programming Model](https://cnx.org/contents/gtg1AzdI@7/An-Introduction-to-the-Partitioned-Global-Address-Space-PGAS-Programming-Model)
550 | - [Jax in 2022](https://www.assemblyai.com/blog/why-you-should-or-shouldnt-be-using-jax-in-2022/)
551 | - [C++ Benchmarking for beginners](https://unum.cloud/post/2022-03-04-gbench/)
552 | - [Mapping MPI ranks to multiple cuda GPU](https://github.com/olcf-tutorials/local_mpi_to_gpu)
553 | - [Oak Ridge National Lab Tutorials](https://github.com/olcf-tutorials)
554 | - [How to perform large scale data processing in bioinformatics](https://medium.com/dnanexus/how-to-perform-large-scale-data-processing-in-bioinformatics-4006e8088af2)
555 | - [Step by step SGEMM in OpenCL](https://cnugteren.github.io/tutorial/pages/page1.html)
556 | - [Frontier User Guide](https://docs.olcf.ornl.gov/systems/frontier_user_guide.html)
557 | - [Allocating large blocks of memory in bare-metal C programming](https://lemire.me/blog/2020/01/17/allocating-large-blocks-of-memory-bare-metal-c-speeds/)
558 | - [Hashmap benchmarks 2022](https://martin.ankerl.com/2022/08/27/hashmap-bench-01/)
559 | - [LLNL HPC Tutorials](https://hpc.llnl.gov/documentation/tutorials)
560 | - [The dirty secret of high performance computing](https://www.techradar.com/news/the-dirty-secret-of-high-performance-computing)
561 | - [Multiple GPUs with pytorch](https://www.run.ai/guides/multi-gpu/pytorch-multi-gpu-4-techniques-explained)
562 | - [Brendan Gregg on Linux Performance](https://www.brendangregg.com/linuxperf.html)
563 | - [Automatic Slurm build scripts](https://www.ni-sp.com/slurm-build-script-and-container-commercial-support/#h-automatic-slurm-build-script-for-rh-centos-7-8-and-9)
564 | - [Fastest unordered_map implementation / benchmarks](https://martin.ankerl.com/2022/08/27/hashmap-bench-01/)
565 | - [Memory bandwidth napkin math](https://www.forrestthewoods.com/blog/memory-bandwidth-napkin-math/)
566 | - [Avoiding Instruction Cache Misses](https://paweldziepak.dev/2019/06/21/avoiding-icache-misses/)
567 | - [Multi-GPU Programming with Standard Parallel C++](https://developer.nvidia.com/blog/multi-gpu-programming-with-standard-parallel-c-part-1/)
568 | - [EuroCC National Competence Center Sweden (ENCCS) HPC tutorials](https://enccs.se/lessons/)
569 | - [LLNL hpc tutorials](https://hpc-tutorials.llnl.gov/)
570 | - [python.org Python Performance Tips](https://wiki.python.org/moin/PythonSpeed/PerformanceTips)
571 | - [HPC toolset tutorial (cluster management)](https://github.com/ubccr/hpc-toolset-tutorial)
572 | - [OpenMP tutorials](https://www.openmp.org/resources/tutorials-articles/)
573 | - [CUDA best practices guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
574 | - [Understanding CPU Architecture And Performance Using LIKWID](https://pramodkumbhar.com/2020/03/architectural-optimisations-using-likwid-profiler/)
575 | - [32 OpenMP Traps For C++ Developers](https://pvs-studio.com/en/blog/posts/cpp/a0054/#ID0EWEAC)
576 | - [Best practices for running jobs on a HPC cluster](https://hpc.dccn.nl/docs/cluster_howto/best_practices.html)
577 | - [Glossary of HPC related terms](https://www.gigabyte.com/Glossary?lan=en)
578 | - [Setting the record straight: What is HPC?](https://www.gigabyte.com/Article/setting-the-record-straight-what-is-hpc-a-tech-guide-by-gigabyte?lan=en)
579 | - [Atomic operations and contention](https://fgiesen.wordpress.com/2014/08/18/atomics-and-contention/)
580 | - [A concurrency cost hierarchy](https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html)
581 | - [hpc-wiki.info - Tutorials and articles for HPC users, developers, administrators and specific HPC systems](https://hpc-wiki.info)
582 | - [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM)
583 | - [An Introduction to Parallel Programming with MPI and Python](https://materials.jeremybejarano.com/MPIwithPython/)
584 | - [Performance Hints](https://abseil.io/fast/hints.html)
585 |
586 | ##### Machine Learning Related
587 | - [Best practices for machine learning with HPC](https://info.gwdg.de/news/en/best-practices-for-machine-learning-with-hpc/)
588 | - [How to pick the right hardware for AI - Gigabyte - Part 1](https://www.gigabyte.com/Article/how-to-pick-the-right-server-for-ai-part-one-cpu-gpu)
589 | - [A practitioner's guide to testing and running large GPU clusters for training generative AI models](https://www.together.ai/blog/a-practitioners-guide-to-testing-and-running-large-gpu-clusters-for-training-generative-ai-models)
590 | - [AWS HPC Workshop](https://www.hpcworkshops.com/)
591 | - [Hardware Acceleration of LLMs: A comprehensive survey and comparison](https://news.ycombinator.com/item?id=41470074)
592 | - [The Ultrascale Playbook - Training LLMs on GPU Clusters](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
593 |
594 | #### Review Papers/Articles
595 | - [Interactive and Urgent HPC Challenges (2024)](https://arxiv.org/pdf/2401.14550.pdf)
596 | - [The Landscape of Exascale Research: A Data-Driven Literature Analysis (2020)](https://dl.acm.org/doi/pdf/10.1145/3372390)
597 | - [The Landscape of Parallel Computing Research: A View from Berkeley](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf)
598 | - [Extreme Heterogeneity 2018: Productive Computational Science in the Era of Extreme Heterogeneity](references/2018-Extreme-Heterogeneity-DoE.pdf)
599 | - [Programming for Exascale Computers - Will Gropp, Marc Snir](https://snir.cs.illinois.edu/listed/J55.pdf)
600 | - [On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems (2020)](https://www.mcs.anl.gov/research/projects/argo/publications/2020-sbacpad-peng.pdf)
601 | - [Advances in Parallel & Distributed Processing, and Applications (conference proceedings)](https://link.springer.com/book/10.1007/978-3-030-69984-0)
602 | - [Designing Heterogeneous Systems: Large Scale Architectural Exploration Via Simulation](https://ieeexplore.ieee.org/abstract/document/9651152)
603 | - [Reinventing High Performance Computing: Challenges and Opportunities (2022)](https://arxiv.org/pdf/2203.02544.pdf)
604 | - [Challenges in Heterogeneous HPC White Paper (2022)](https://www.etp4hpc.eu/pujades/files/ETP4HPC_WP_Heterogeneous-HPC_20220216.pdf)
605 | - [An Evolutionary Technical & Conceptual Review on High Performance Computing Systems (Dec 2021)](https://kalaharijournals.com/resources/DEC_597.pdf)
606 | - [New Horizons for High-Performance Computing (2022)](https://csdl-downloads.ieeecomputer.org/mags/co/2022/12/09963771.pdf?Expires=1669702667&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jc2RsLWRvd25sb2Fkcy5pZWVlY29tcHV0ZXIub3JnL21hZ3MvY28vMjAyMi8xMi8wOTk2Mzc3MS5wZGYiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2Njk3MDI2Njd9fX1dfQ__&Signature=s3K~-JXyED6vMVT9IKGj7LOhR75CrkQXiqAEsAEQt4zRqTbUFywmSoT10th1CdAaZcfZFuMsg23o2e719FRkCD6flVNB55d5tKyMUp7jUbkUtxnatOWLAKXfE4yQ-zrYQQEWBhtpSLKrTAS1oVmJ00YwkWqLYqCjhFIjW9La5od2SGQZEFZ136bbaGzxLZlED3JlMCMLB54YXKr-Ng1rngV4I9Wi-wSTFyLiA92~fUlk1KPQKU0XjtsMyYMYlt06Ze5H6jcQw4ytJ6c7r7qNJ43ifnsZepWmBywA8lVy2g3joOvZJtVjl~S91R8EZbiyWlYdWBGrO7pPdO6hH48~NQ__&Key-Pair-Id=K12PMWTCQBDMDT)
607 | - [Confidential High-Performance Computing in the Public Cloud](https://arxiv.org/pdf/2212.02378.pdf)
608 | - [Containerisation for High Performance Computing Systems: Survey and Prospects](https://ieeexplore.ieee.org/abstract/document/9985426)
609 | - [Heterogeneous Computing Systems (2023)](https://arxiv.org/pdf/2212.14418.pdf)
610 | - [Myths and Legends in High-Performance Computing](https://arxiv.org/pdf/2301.02432.pdf)
611 | - [Energy-Aware Scheduling for High-Performance Computing Systems: A Survey](https://www.mdpi.com/1996-1073/16/2/890)
612 | - [Ultimate Physical limits to computation - Seth Lloyd](https://arxiv.org/abs/quant-ph/9908043)
614 | - [Abstract Machine Models and Proxy Architectures for Exascale Computing, 2014, Sandia National Laboratories and Lawrence Berkeley National Laboratory](https://www.osti.gov/servlets/purl/1561498)
615 | - [Some thoughts on the environmental impact of High Performance Computing](https://sifflez.org/publications/environment-hpc/)
616 | - [A Research Retrospective on AMD's Exascale Computing Journey](https://dl.acm.org/doi/abs/10.1145/3579371.3589349)
617 |
618 | #### News
619 | - [InsideHPC](https://insidehpc.com/)
620 | - [HPCWire](https://www.hpcwire.com/)
621 | - [NextPlatform](https://www.nextplatform.com)
622 | - [Datacenter Dynamics](https://www.datacenterdynamics.com/en/)
623 | - [Admin Magazine HPC](https://www.admin-magazine.com/HPC/News)
624 | - [Tom's Hardware](https://www.tomshardware.com/)
625 | - [Tech Radar](https://www.techradar.com/)
626 | - [Phoronix](https://www.phoronix.com/)
627 | - [The Register](https://www.theregister.com/on_prem/hpc/)
628 |
629 | #### Podcasts
630 | - [This week in HPC](https://soundcloud.com/this-week-in-hpc)
631 | - [Preparing Applications for Aurora in the Exascale Era](https://connectedsocialmedia.com/20114/preparing-applications-for-aurora-in-the-exascale-era/)
632 | - [Slurm podcast](https://www.rce-cast.com/index.php/Podcast/rce-10-slurm.html)
633 | - [HPCPodcast](https://insidehpc.com/category/resources/hpc-podcast/)
634 | - [Developer Stories - The path to a career in high performance computing is not always equitable or clear.](https://rseng.github.io/devstories/2024/jay-lofstead/)
635 | - [Developer Stories - HPCToolkit](https://rseng.github.io/devstories/2024/wileam-phan/)
636 |
637 | #### Video Presentations/Courses/Channels
638 | - [Argonne lectures on Extreme Scale Computing 2022](https://www.youtube.com/playlist?list=PLcbxjEfgjpO9OeDu--H9_XqyxPj3MkjdN)
639 | - [Argonne supercomputer tour](https://www.youtube.com/watch?v=UT9HCgp2X3A)
640 | - [Containers in HPC - what they fix and what they break](https://youtube.com/watch?v=WQTrA4-9ZXk&feature=share)
641 | - [HPC Tech Shorts](https://www.youtube.com/channel/UChSIn5kcWQvJxW17KIjdLVw)
642 | - [CppCon](https://www.youtube.com/user/CppCon/videos)
643 | - [Create a clustering server](https://www.youtube.com/watch?v=4LyL4sNZ1u4)
644 | - [Argonne National Lab](https://www.youtube.com/channel/UCfwgjtIQB3puojz_N9ly_Ag)
645 | - [Oak Ridge National Lab](https://www.youtube.com/user/OakRidgeNationalLab)
646 | - [Concurrency in C++20 and Beyond](https://www.youtube.com/watch?v=jozHW_B3D4U) - A. Williams
647 | - [Is Parallel Programming still Hard?](https://www.youtube.com/watch?v=YM8Xy6oKVQg) - P. McKenney, M. Michael, and M. Wong at CppCon 2017
648 | - [The Speed of Concurrency: Is Lock-free Faster?](https://www.youtube.com/watch?v=9hJkWwHDDxs) - Fedor G Pikus in CppCon 2016
649 | - [Expressing Parallelism in C++ with Threading Building Blocks](https://www.youtube.com/watch?v=9Otq_fcUnPE) - Mike Voss at Intel Webinar 2018
650 | - [A Work-stealing Runtime for Rust](https://www.youtube.com/watch?v=4DQakkJ8XLI) - Aaron Todd in Air Mozilla 2017
651 | - [C++11/14/17 atomics and memory model: Before the story consumes you](https://www.youtube.com/watch?v=DS2m7T6NKZQ) - Michael Wong in CppCon 2015
652 | - [The C++ Memory Model](https://www.youtube.com/watch?v=gpsz8sc6mNU) - Valentin Ziegler at C++ Meeting 2014
653 | - [Sharcnet HPC](https://www.youtube.com/channel/UCCRmb5_GMWT2hSlALHlwIMg)
654 | - [Low Latency C++ for fun and profit](https://www.youtube.com/watch?v=BxfT9fiUsZ4)
655 | - [Scalene Python profiler](https://youtu.be/5iEf-_7mM1k)
656 | - [Kokkos lectures](https://www.youtube.com/watch?v=rUIcWtFU5qM&t=698s)
657 | - [EasyBuild Tech Talk I - The ABCs of Open MPI, part 1 (by Jeff Squyres & Ralph Castain)](https://www.youtube.com/watch?v=WpVbcYnFJmQ)
658 | - [The Spack 2022 Roadmap](https://www.youtube.com/watch?v=HyA7RpjoY1k)
659 | - [A Not So Simple Matter of Software | Talk by Turing Award Winner Prof. Jack Dongarra](https://youtu.be/QBCX3Oxp3vw)
660 | - [Vectorization/SIMD intrinsics](https://www.youtube.com/watch?v=x9Scb5Mku1g)
661 | - [New Silicon for Supercomputers: A Guide for Software Engineers](https://www.youtube.com/watch?v=w3xNLj6nRgs&t=197s)
662 | - [TechTechPotato Channel](https://www.youtube.com/@TechTechPotato)
663 | - [How to write the perfect hash table](https://www.youtube.com/watch?v=DMQ_HcNSOAI)
664 | - [FOSDEM 2024 HPC, Big Data and Data Science track videos](https://fosdem.org/2024/schedule/track/hpc-big-data-data-science/)
665 | - [Bright Computing Cluster Management Technical Overview](https://www.youtube.com/watch?v=0AxzcZuviW0)
666 | - [What is HPC? An introduction by Canonical](https://www.youtube.com/watch?v=tGIobcyKViI)
667 | - [Slurm job scheduler basics](https://www.youtube.com/watch?v=Juo_mb3otJ0)
669 | - [Warewulf HPC Youtube Channel](https://www.youtube.com/@WarewulfHPC)
670 | - [Scott Meyers: CPU Caches and Why You Care](https://www.youtube.com/watch?v=WDIkqP4JbkE&t=1s)
671 |
672 | #### Presentation Slides
673 | - [Task based Parallelism and why it's awesome](https://www.fz-juelich.de/ias/jsc/EN/Expertise/Workshops/Conferences/CSAM-2015/Programme/lecture7a_gonnet-pdf.pdf?__blob=publicationFile) - Pedro Gonnet
674 | - [Tuning Slurm Scheduling for Optimal Responsiveness and Utilization](https://slurm.schedmd.com/SUG14/sched_tutorial.pdf)
675 | - [Parallel Programming Models Overview (2020)](https://www.researchgate.net/publication/348187154_Parallel_programming_models_overview_2020)
676 | - [A Comparative Analysis of Kokkos and SYCL (Jeff Hammond)](https://www.iwocl.org/wp-content/uploads/iwocl-2019-dhpcc-jeff-hammond-a-comparitive-analysis-of-kokkos-and-sycl.pdf)
677 | - [Hybrid OpenMP/MPI Programming](https://www.nersc.gov/assets/Uploads/NUG2013hybridMPIOpenMP2.pdf)
678 | - [Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean (Google)](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf)
679 | - [Practical Debugging and Performance Engineering](https://orbilu.uni.lu/bitstream/10993/55305/1/Practical_Debugging_and_Performance_Engineering_for_HPC.pdf)
680 |
681 | #### Optimization Case Studies
682 | - [Optimizing a Math Expression Parser in Rust](https://rpallas.xyz/math-parser/)
683 | - [Performance Hints of the Week](https://abseil.io/fast/)
684 |
685 | #### Building Clusters/Virtual Clusters
686 | - [Resources for learning about HPC networks and storage r/HPC](https://www.reddit.com/r/HPC/comments/17o0q5d/resources_for_learning_about_hpc_networks_and/)
687 | - [Slurm for dummies guide](https://github.com/SergioMEV/slurm-for-dummies)
688 | - [Build a cluster under 50k](https://www.reddit.com/r/HPC/comments/srssrt/build_a_minicluster_under_50000/)
689 | - [Build a Beowulf cluster](https://github.com/darshanmandge/Cluster)
690 | - [Build a Raspberry Pi Cluster](https://www.raspberrypi.com/tutorials/cluster-raspberry-pi-tutorial/)
691 | - [Puget Systems](https://www.pugetsystems.com/)
692 | - [Lambda Systems](https://lambdalabs.com/)
693 | - [Titan computers](https://www.titancomputers.com)
694 | - [Temple course on building/maintaining a cluster](https://www.hpc.temple.edu/mhpc/2021/hpc-technology/index.html)
695 | - [Detailed reddit discussion on setting up a small cluster](https://www.reddit.com/r/HPC/comments/xeipt7/setting_up_a_small_hpc_cluster/)
696 | - [Tiny titan - build a really cool pi supercomputer](https://github.com/tinytitan)
697 | - [Turing PI - mini PI cluster off the shelf](https://turingpi.com/product/turing-pi-2-5/)
699 | - [Building an Intel HPC cluster with OpenHPC](https://cdrdv2-public.intel.com/671501/installguide-openhpc2-centos8-18jul21.pdf)
700 | - [Reddit r/HPC post on building clusters](https://www.reddit.com/r/HPC/comments/11azmhy/wanting_to_setup_a_cluster/)
701 | - [Build a virtual cluster with PelicanHPC](https://sourceforge.net/projects/pelicanhpc/)
702 | - [Building a High-performance Computing Cluster Using FreeBSD](https://people.freebsd.org/~brooks/papers/bsdcon2003/fbsdcluster/)
703 | - [Supermicro GPU racks](https://www.supermicro.com/en/products/gpu)
704 | - [VirtualOrfeo - Virtual HPC Cluster](https://gitlab.com/area7/datacenter/codes/virtualorfeo)
705 | - [Is there ever a reason to build a Raspberry Pi cluster?](https://www.reddit.com/r/HPC/comments/1bfywk8/is_there_ever_a_reason_to_build_a_raspberry_pi/)
706 | - [Building a NVIDIA Jetson Cluster](https://www.hackster.io/shahizat/how-to-build-nvidia-jetson-hpc-cluster-using-slurm-ed61a7)
707 | - [Building your own HPC using ebay parts](https://www.reddit.com/r/HPC/comments/1m7la7z/building_my_own_hpc_using_ebay_parts_beginner_tips/)
708 | - [Magic Castle - Terraform modules to replicate HPC in cloud](https://github.com/ComputeCanada/magic_castle)
709 |
710 | #### Forums
711 | - [r/hpc](https://www.reddit.com/r/HPC/)
712 | - [r/homelab](https://www.reddit.com/r/homelab/)
713 | - [r/slurm](https://www.reddit.com/r/SLURM/)
714 |
715 | #### Careers/Jobs
716 | - [HPC University Careers search](http://hpcuniversity.org/careers/)
717 | - [HPCwire career site](https://careers.hpcwire.com/)
718 | - [HPCwire job postings](https://jobs.hpcwire.com/)
719 | - [HPC certification](https://www.hpc-certification.org/)
720 | - [HPC SysAdmin Jobs (reddit)](https://www.reddit.com/r/HPC/comments/w5eu66/systems_administrator_systems_engineer_jobs/)
721 | - [The United States Research Software Engineer Association](https://us-rse.org/)
722 | - [NCSA Internship](https://wiki.ncsa.illinois.edu/display/NCSACIP/NCSA+Internship+Program+for+CI+Professionals+Home)
723 | - [AI and future HPC job prospects (reddit)](https://www.reddit.com/r/HPC/comments/12anrgq/hpc_future_career_prospects/)
724 | - [HPC sys admin career (reddit)](https://www.reddit.com/r/HPC/comments/16jkqlv/it_support_for_an_academic_hpc_cluster_as_a_career/)
725 |
726 | #### Membership Clubs
727 | - [Association for Computing Machinery](https://www.acm.org/)
728 | - [ETP4HPC](https://www.etp4hpc.eu/)
729 | - [The SIGHPC Systems Professionals](https://sighpc-syspros.org/)
730 |
731 | #### Blogs
732 | - [1024 Cores](http://www.1024cores.net/) - Dmitry Vyukov
733 | - [The Black Art of Concurrency](https://www.internalpointers.com/post-group/black-art-concurrency) - Internal Pointers
734 | - [Cluster Monkey](https://www.clustermonkey.net/)
735 | - [Jonathan Dursi](https://www.dursi.ca/)
736 | - [Arm Vendor HPC blog](https://community.arm.com/developer/tools-software/hpc/b/hpc-blog)
737 | - [HPC Notes](https://www.hpcnotes.com/)
738 | - [Brendan Gregg Performance Blog](https://www.brendangregg.com/blog/index.html)
739 | - [Performance engineering blog](https://pramodkumbhar.com)
740 | - [Concurrency Freaks](https://concurrencyfreaks.blogspot.com/)
741 | - [Servers@Home](https://servers.hydrology.cc/blog/)
742 | - [Dr. Bandwidth's Blog](https://sites.utexas.edu/jdm4372/2010/10/01/welcome-to-dr-bandwidths-blog/)
743 | - [Johnny's Software Lab](https://johnnysswlab.com/)
744 | - [Daniel Lemire Blog](https://lemire.me/blog/)
745 | - [Gigabyte HPC Blog](https://www.gigabyte.com/)
746 |
747 | #### Journals
748 | - [IEEE Transactions on Parallel and Distributed Systems (TPDS)](https://www.computer.org/csdl/journal/td)
749 | - [Journal of Parallel and Distributed Computing](https://www.journals.elsevier.com/journal-of-parallel-and-distributed-computing)
750 |
751 | #### Conferences
752 | - [ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)](https://ppopp19.sigplan.org/home)
753 | - [ACM Symposium on Parallel Algorithms and Architectures (SPAA)](https://spaa.acm.org/)
754 | - [SC conference (SC)](https://supercomputing.org/)
755 | - [IEEE International Parallel and Distributed Processing Symposium (IPDPS)](http://www.ipdps.org/)
756 | - [International Conference on Parallel Processing (ICPP)](https://www.hpcs.cs.tsukuba.ac.jp/icpp2019/)
757 | - [IEEE High Performance Extreme Computing Conference (HPEC)](https://ieee-hpec.org/cfp.htm)
758 | - [FOSDEM](https://fosdem.org/)
759 | - [Energy HPC Conference](https://www.energyhpc.rice.edu/)
760 |
761 | #### Communities/Chat Groups
762 | - [HPC Social Discord server](https://hpc.social/projects/chat/)
763 | - [HPC Social slack group](https://hpcsocial.slack.com/)
764 | - [HPC Social](https://hpc.social/)
765 | - [Beowulf Mailing List](https://www.beowulf.org/)
766 | - [Society of Research Software Engineering](https://society-rse.org/get-involved/)
767 | - [Women In HPC](https://womeninhpc.org/)
768 | - [HPC Hallway](https://hpc-hallway.github.io/The-Hallway/)
769 | - [The High Performance Computing Special Interest Group](https://hpc-sig.org.uk/)
770 | - [SIGHPC](https://www.sighpc.org/)
771 |
772 | #### Twitters
773 | - [Top500](https://twitter.com/top500supercomp?s=20)
774 | - [HPE HPC](https://twitter.com/hpe_hpc)
775 | - [HPC Wire](https://twitter.com/HPCwire)
776 | - [Rookie HPC](https://twitter.com/RookieHPC?s=20)
777 | - [HPC_Guru](https://twitter.com/HPC_Guru?s=20&t=jHjVtUaZhz4s6Rq62IAmYg)
778 | - [Jeff Hammond](https://twitter.com/science_dot)
779 |
780 | #### Consulting
781 | - [Advanced Clustering](https://www.advancedclustering.com/)
782 | - [Redline Performance](https://redlineperf.com/)
783 | - [R systems](http://rsystemsinc.com/)
784 |
785 | #### Interview Preparation
786 | - [Reddit Entry Level HPC interview help](https://www.reddit.com/r/HPC/comments/nhpdfb/entrylevel_hpc_job_interview/)
787 | - [Reddit HPC Admin Interview help](https://www.reddit.com/r/HPC/comments/1lq3pjp/need_advice_upcoming_hpc_admin_interview/)
788 |
789 | #### Organizations
790 | - [PRACE](https://prace-ri.eu/)
791 | - [XSEDE](https://www.xsede.org/)
792 | - [Compute Canada](https://www.computecanada.ca/)
793 | - [RIKEN R-CCS](https://www.riken.jp/en/research/labs/r-ccs/)
794 | - [Pawsey](https://pawsey.org.au/)
795 | - [International Data Corporation](https://www.idc.com/)
796 | - [List of Federally funded research and development centers](https://en.wikipedia.org/wiki/Federally_funded_research_and_development_centers)
797 | - [The HPC.NRW competence network of North-Rhine-Westphalia](https://hpc.dh.nrw/)
798 |
799 | #### Interesting r/HPC posts
800 | - [finding a supercomputer to use for research](https://www.reddit.com/r/HPC/comments/19e58z7/how_do_i_go_about_finding_a_supercomputer_to_use/)
801 |
802 | #### Misc. Wikis
803 | - [Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law)
804 | - [HPC Wiki](https://hpc-wiki.info/hpc/HPC_Wiki)
805 | - [FLOPS](https://en.wikipedia.org/wiki/FLOPS)
806 | - [Computational complexity of math operations](https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations)
807 | - [Many Task Computing](https://en.wikipedia.org/wiki/Many-task_computing)
808 | - [High Throughput Computing](https://en.wikipedia.org/wiki/High-throughput_computing)
809 | - [Parallel Virtual Machine](https://en.wikipedia.org/wiki/Parallel_Virtual_Machine)
810 | - [OSI Model](https://en.wikipedia.org/wiki/OSI_model)
811 | - [Workflow management](https://en.wikipedia.org/wiki/Scientific_workflow_system)
812 | - [Compute Canada Documentation](https://docs.computecanada.ca/wiki/Compute_Canada_Documentation)
813 | - [Network Interface Controller (NIC)](https://en.wikipedia.org/wiki/Network_interface_controller)
814 | - [Just in time compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation)
815 | - [List of distributed computing projects](https://en.wikipedia.org/wiki/List_of_distributed_computing_projects)
816 | - [Computer cluster](https://en.wikipedia.org/wiki/Computer_cluster)
817 | - [Quasi-opportunistic supercomputing](https://en.wikipedia.org/wiki/Quasi-opportunistic_supercomputing)
818 | - [Limits of Computation](https://en.wikipedia.org/wiki/Limits_of_computation)
819 | - [Bremermann's Limit](https://en.wikipedia.org/wiki/Bremermann%27s_limit)
820 | - [Concurrency patterns](https://en.wikipedia.org/wiki/Concurrency_pattern)
821 | - [Parallel Computing](https://en.wikipedia.org/wiki/Parallel_computing)
822 | - [Server Management](https://wiki.hydrology.cc/en/home)
823 |
824 | #### Misc. Papers/Articles
825 | - [Advanced Parallel Programming in C++](https://www.diehlpk.de/assets/modern_cpp.pdf)
826 | - [Tools for scientific computing](https://arxiv.org/pdf/2108.13053.pdf)
827 | - [Quantum Computing for High Performance Computing](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9537178)
828 | - [Benchmarking data science: Twelve ways to lie with statistics and performance on parallel computers](http://www.unixer.de/publications/img/hoefler-12-ways-data-science-preprint.pdf)
829 | - [Establishing the IO500 Benchmark](https://www.vi4io.org/_media/io500/about/io500-establishing.pdf)
830 | - [NVIDIA High Performance Computing articles](https://research.nvidia.com/research-area/high-performance-computing)
831 | - [Let's write a superoptimizer](https://austinhenley.com/blog/superoptimizer.html)
832 | - [Why I think C++ is still a desirable coding platform compared to Rust](https://lucisqr.substack.com/p/why-i-think-c-is-still-a-very-attractive)
833 | - [The State of Fortran (arxiv paper 2022)](https://arxiv.org/abs/2203.15110)
834 | - [50 years later, is two phase locking still the best](https://concurrencyfreaks.blogspot.com/2023/09/50-years-later-is-two-phase-locking.html)
835 | - [Estimating your memory bandwidth](https://lemire.me/blog/2024/01/13/estimating-your-memory-bandwidth/)
836 |
837 | #### Misc. Repos
838 | - [Build a Beowulf cluster](https://github.com/darshanmandge/Cluster)
839 | - [libsc - Supercomputing library](https://github.com/cburstedde/libsc)
840 | - [xbyak jit assembler](https://github.com/herumi/xbyak)
841 | - [cpufetch - pretty cpu info fetcher](https://github.com/Dr-Noob/cpufetch)
842 | - [RRZE-HPC](https://github.com/RRZE-HPC)
843 | - [Argonne Github](https://github.com/Argonne-National-Laboratory)
844 | - [Argonne Leadership Computing Facility](https://github.com/argonne-lcf)
845 | - [Oak Ridge National Lab Github](https://github.com/ORNL)
846 | - [Compute Canada](https://github.com/ComputeCanada)
847 | - [HPCInfo by Jeff Hammond](https://github.com/jeffhammond/HPCInfo)
848 | - [Texas Advanced Computing Center (TACC) Github](https://github.com/TACC)
849 | - [LANL HPC Github](https://github.com/hpc)
850 | - [Rust in HPC](https://github.com/westernmagic/rust-in-hpc)
851 | - [University at Buffalo - Center for Computational Research](https://github.com/ubccr)
852 | - [Center for High Performance Computing - University of Utah](https://github.com/CHPC-UofU)
853 | - [Top500 Supercomputer Data Analysis](https://github.com/glennklockwood/top500-data)
854 |
855 | #### Misc. Theses
856 | - [Rust programming language in the high-performance computing environment](https://www.research-collection.ethz.ch/handle/20.500.11850/474922)
857 |
858 | #### Misc.
859 | - [Exascale Project](https://www.exascaleproject.org/)
860 | - [Pocket HPC Survival Guide](https://tin6150.github.io/psg/lsf.html)
861 | - [HPC Summer school](https://www.ihpcss.org/)
862 | - [Overview of all linear algebra packages](http://www.netlib.org/utk/people/JackDongarra/la-sw.html)
863 | - [Latency numbers](http://norvig.com/21-days.html#answers)
864 | - [Nvidia HPC benchmarks](https://ngc.nvidia.com/catalog/containers/nvidia:hpc-benchmarks)
865 | - [Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#)
866 | - [AWS Cloud calculator](https://calculator.aws/)
867 | - [Quickly benchmark C++ functions](https://quick-bench.com/)
868 | - [LLNL Software repository](https://software.llnl.gov/)
869 | - [BOINC - volunteer computing projects](https://boinc.berkeley.edu/projects.php)
870 | - [Prace Training Events](https://events.prace-ri.eu/category/2/)
871 | - [Nice discussion on FlameGraph profiling](https://stackoverflow.com/questions/27842281/unknown-events-in-nodejs-v8-flamegraph-using-perf-events/27867426#27867426)
872 | - [Nice discussion on parts of a supercomputer on reddit](https://www.reddit.com/r/HPC/comments/11elh93/job_node_socket_task_runner_device_thread_logical/)
873 | - [Technical Report on C++ performance](https://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf)
874 | - [BOINC Compute for science](https://boinc.berkeley.edu/)
875 | - [Count prime numbers using MPI](https://people.sc.fsu.edu/~jburkardt/c_src/prime_mpi/prime_mpi.html)
876 | - [How to build your LEGO Scafell Pike Supercomputer](https://www.youtube.com/watch?v=m499o5rLh38)
877 |
878 | #### Games/Challenges
879 | - [Deadlock empire - practice concurrency](https://github.com/deadlockempire/deadlockempire.github.io)
880 | - [SadServers - practice Linux server management](https://sadservers.com/scenarios)
881 | - [Vim Adventures](https://vim-adventures.com/)
882 |
883 | ## Other Curated Lists
884 | - [Awesome Cloud HPC](https://github.com/kjrstory/awesome-cloud-hpc)
885 | - [Parallel Computing Guide](https://github.com/mikeroyal/Parallel-Computing-Guide)
886 | - [Awesome Parallel Computing](https://github.com/taskflow/awesome-parallel-computing)
887 | - [Princeton resources on OpenMP](https://researchcomputing.princeton.edu/education/external-online-resources/openmp)
888 | - [Awesome HPC](https://github.com/dstdev/awesome-hpc/)
889 | - [SIGHPC Education](https://sighpceducation.acm.org/resources/hpcresources/)
890 | - [Fortran Codes On Github](https://github.com/Beliavsky/Fortran-code-on-GitHub)
891 | - [Fortran Tools](https://github.com/Beliavsky/Fortran-Tools)
892 |
893 | ## Acknowledgements
894 |
895 | This repo started from the great curated list https://github.com/taskflow/awesome-parallel-computing
896 |
897 |
898 |
--------------------------------------------------------------------------------