├── README.md
├── microarchitecture-cheatsheet.pdf
├── references.txt
└── snapshot.png

/README.md:
--------------------------------------------------------------------------------
Modern CPUs are very complex beasts, and the information about them is spread across so many different areas that it can be overwhelming.
The microarchitecture cheat sheet aims to provide an organised collection of overviews about x86 CPUs that developers should keep in mind when thinking about performance.

Last update date: 09 Mar, 2025


It divides that huge subject into the following realms:

- Pipeline realm
- Arithmetic realm
- Branch prediction realm
- Load store realm
- Cache memory realm
- System memory realm
- Virtual memory realm
- Multicore realm
- MultiCPU realm
- "Across realms" realm

## **Major resources used:**

Intel® 64 and IA-32 Architectures Software Developer’s Manual
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

Intel® 64 and IA-32 Architectures Optimization Reference Manual
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html

AMD64 Architecture Programmer’s Manual: Volumes 1-5
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf

The microarchitecture of Intel, AMD, and VIA CPUs, by Agner Fog
https://www.agner.org/optimize/microarchitecture.pdf

Instruction tables, by Agner Fog
https://www.agner.org/optimize/instruction_tables.pdf

What every programmer should know about memory, by Ulrich Drepper
https://akkadia.org/drepper/cpumemory.pdf

## **Licences:**
For information and visuals that come from other sources, I have provided references with links.

Everything else is completely free.

## **Contact & feedback:**
I don't claim that every piece of information is exact or expressed in the best way. Some of the topics are ones I had the chance to get my hands on during projects; others come from personal research out of curiosity.

If you find a problem or have a suggestion, please feel free to raise an issue.

My email is akin_ocal@hotmail.com.

## **Misc:**
- I used diagrams.net (which is the open-source version of drawio): https://app.diagrams.net/

--------------------------------------------------------------------------------
/microarchitecture-cheatsheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/akhin/microarchitecture-cheatsheet/cb01f1410453a68c48eefb8395761f8467f376b7/microarchitecture-cheatsheet.pdf

--------------------------------------------------------------------------------
/references.txt:
--------------------------------------------------------------------------------
UICA Online Microarchitecture Analysis Tool
https://uica.uops.info/

Perf Wiki Tutorial
https://perf.wiki.kernel.org/index.php/Tutorial

Denis Bakhvalov's article: Visualizing Performance-Critical Dependency Chains
https://easyperf.net/blog/2022/05/11/Visualizing-Performance-Critical-Dependency-Chains

Intel's "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures" documentation
https://www.yumpu.com/en/document/view/27787317/how-to-benchmark-code-execution-times-on-intel-ia-32-and-ia-64-

ACPI on Wikipedia
https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface

AMD Turbo Core page
https://en.wikipedia.org/wiki/AMD_Turbo_Core

AMD cache injection
https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/white-papers/58725.pdf

AMD Ryzen Processor Software Optimisation by Ken Mitchell, GDC2022
https://gpuopen.com/gdc-presentations/2022/GDC_AMD_Ryzen_Processor_Software_Optimization.pdf

SIMD: GCC auto vectorisation
https://gcc.gnu.org/projects/tree-ssa/vectorization.html

Daniel Lemire's article: AVX-512: when and how to use these new instructions
https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/

SIMD JSON on Github
https://github.com/simdjson/simdjson

Memory disambiguation on Wikipedia
https://en.wikipedia.org/wiki/Memory_disambiguation

Memory disambiguation - Store to load forwarding on Wikipedia
https://en.wikipedia.org/wiki/Memory_disambiguation#Store_to_load_forwarding

Load Hit Store on Wikipedia
https://en.wikipedia.org/wiki/Load-Hit-Store

__restrict__ / Restricting Pointer Aliasing on GCC docs site
https://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html

Microsoft's Xbox 360, Sony's PS3 - A Hardware Discussion on Anandtech
https://www.anandtech.com/show/1719/5

Elan Ruskin's article: LOAD-HIT-STORES AND THE __RESTRICT KEYWORD
https://web.archive.org/web/20210120214304/http://assemblyrequired.crashworks.org/load-hit-stores-and-the-__restrict-keyword/

Subnormal numbers on Wikipedia
https://en.wikipedia.org/wiki/Subnormal_number

Bruce Dawson's article: That’s Not Normal–the Performance of Odd Floats
https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/

Intel Intrinsics Guide
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html

Intel AVX10 Specs
https://www.intel.com/content/www/us/en/content-details/828964/intel-advanced-vector-extensions-10-1-intel-avx10-1-architecture-specification.html

Intel AES on Wikipedia
https://en.wikipedia.org/wiki/AES_instruction_set

Intel Vector Neural Network Instructions
https://en.wikipedia.org/wiki/AVX-512#VNNI

Perceptrons on Wikipedia
https://en.wikipedia.org/wiki/Perceptron

Dynamic Branch Prediction with Perceptrons, by Daniel A. Jimenez, Calvin Lin
https://www.cs.utexas.edu/~lin/papers/hpca01.pdf

Kernel.org page: The kernel’s command-line parameters to disable security mitigations
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

Controlling the Performance Impact of Microcode and Security Patches for CVE-2017-5754 CVE-2017-5715 and CVE-2017-5753 using Red Hat Enterprise Linux Tunables
https://access.redhat.com/articles/3311301

Meltdown paper
https://meltdownattack.com/meltdown.pdf

Spectre paper
https://spectreattack.com/spectre.pdf

Marek Majkowski's article: Branch predictor: How many "if"s are too many? Including x86 and M1 benchmarks!
https://blog.cloudflare.com/branch-predictor/

Broadwell microarchitecture
https://en.wikipedia.org/wiki/Broadwell_(microarchitecture)

Cache prefetching on Wikipedia
https://en.wikipedia.org/wiki/Cache_prefetching

Intel® Data Direct I/O Technology (Intel® DDIO): A Primer
https://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/data-direct-i-o-technology-brief.pdf

Raymond Chen's article: Why is address space allocation granularity 64KB?
https://devblogs.microsoft.com/oldnewthing/20031008-00/?p=42223

Why DDR5 is the Industry’s Powerful Next-gen Memory?
https://news.skhynix.com/why-ddr5-is-the-industrys-powerful-next-gen-memory/

DIMM image source
https://pixabay.com/vectors/dimm-ram-memory-ram-computer-23265/

Intel hybrid architecture
https://www.intel.com/content/www/us/en/developer/articles/technical/hybrid-architecture.html

Intel Alder Lake e-cores sharing L2 cache on Wikipedia
https://en.wikipedia.org/wiki/Alder_Lake

AMD Phoenix2 hybrid CPUs
https://www.tomshardware.com/news/amd-phoenix-2-review-evaluates-zen-4-zen-4c-performance

AMD CCX
https://www.tomshardware.com/reviews/amd-ccx-definition-cpu-core-explained,6338.html

MESI protocol on Wikipedia
https://en.wikipedia.org/wiki/MESI_protocol

Intel MESIF protocol on Wikipedia
https://en.wikipedia.org/wiki/MESIF_protocol

AMD MOESI protocol on Wikipedia
https://en.wikipedia.org/wiki/MOESI_protocol

Erik Rigtorp's article: Optimising a ring buffer for throughput
https://rigtorp.se/ringbuffer/

Jeff Preshing's article: Memory Ordering at Compile Time
https://preshing.com/20120625/memory-ordering-at-compile-time/

Detecting and handling split locks (in Linux kernel) on lwn.net
https://lwn.net/Articles/790464/

CAS / Compare and swap on Wikipedia
https://en.wikipedia.org/wiki/Compare-and-swap

Test-And-Set
https://en.wikipedia.org/wiki/Test-and-set

Intel sticks another nail in the coffin of TSX with feature-disabling microcode update
https://www.theregister.com/2021/06/29/intel_tsx_disabled/

AMD's Advanced Synchronization Facility (transactional memory) on Wikipedia
https://en.wikipedia.org/wiki/Advanced_Synchronization_Facility

Intel Cache Allocation Technology page
https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-cache-allocation-technology.html

Intel Code and Data Prioritisation page
https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-code-and-data-prioritization-with-usage-models.html

AMD64 Technology Platform Quality of Service Extensions
https://kib.kiev.ua/x86docs/AMD/MISC/56375_1.00_PUB.pdf

Intel Memory Bandwidth Allocation page
https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-memory-bandwidth-allocation.html

lstopo on linux.die.net
https://linux.die.net/man/1/lstopo

Intel memory latency checker
https://www.intel.com/content/www/us/en/developer/articles/tool/intelr-memory-latency-checker.html

Dell's AMD NUMA node per socket article
https://infohub.delltechnologies.com/l/cpu-best-practices-3/poweredge-numa-nodes-per-socket-1#:~:text=AMD%20servers%20provide%20the%20ability,bank%20into%20two%20equal%20parts.

Intel VTune
https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html

Andi Kleen's PMU tools / Toplev on Github
https://github.com/andikleen/pmu-tools

How TMA Addresses Challenges in Modern Servers and Enhancements Coming in IceLake by Ahmad Yasin
https://dyninst.github.io/scalable_tools_workshop/petascale2018/assets/slides/TMA%20addressing%20challenges%20in%20Icelake%20-%20Ahmad%20Yasin.pdf

Infographics: Operation Costs in CPU Clock Cycles on IT Hare
http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/

--------------------------------------------------------------------------------
/snapshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/akhin/microarchitecture-cheatsheet/cb01f1410453a68c48eefb8395761f8467f376b7/snapshot.png

--------------------------------------------------------------------------------