├── README.md
├── microarchitecture-cheatsheet.pdf
├── references.txt
└── snapshot.png
/README.md:
--------------------------------------------------------------------------------
1 | Modern CPUs are very complex beasts, and there is so much information about them scattered across different areas that it can be overwhelming.
2 | The microarchitecture cheat sheet aims to provide an organised collection of overviews of x86 CPUs that developers should keep in mind when thinking about performance.
3 |
4 | Last updated: 09 Mar 2025
5 |
6 |
7 |
8 |
9 |
10 | It divides that huge subject into the following realms:
11 |
12 | - Pipeline realm
13 | - Arithmetic realm
14 | - Branch prediction realm
15 | - Load store realm
16 | - Cache memory realm
17 | - System memory realm
18 | - Virtual memory realm
19 | - Multicore realm
20 | - MultiCPU realm
21 | - "Across realms" realm
22 |
23 | ## **Major resources used:**
24 |
25 | Intel® 64 and IA-32 Architectures Software Developer’s Manual
26 | https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
27 |
28 | Intel® 64 and IA-32 Architectures Optimization Reference Manual
29 | https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
30 |
31 | AMD64 Architecture Programmer’s Manual: Volumes 1-5
32 | https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
33 |
34 | The microarchitecture of Intel, AMD, and VIA CPUs, by Agner Fog
35 | https://www.agner.org/optimize/microarchitecture.pdf
36 |
37 | Instruction tables, by Agner Fog
38 | https://www.agner.org/optimize/instruction_tables.pdf
39 |
40 | What every programmer should know about memory, by Ulrich Drepper
41 | https://akkadia.org/drepper/cpumemory.pdf
42 |
43 | ## **Licences:**
44 | For information and visuals that come from other sources, I have provided references with links.
45 |
46 | Everything else is completely free.
47 |
48 | ## **Contact & feedback:**
49 | I don't claim that all of the information is exact or expressed in the best way. Some of the topics are ones I had the chance to get my hands on during projects; others come from personal research out of curiosity.
50 |
51 | If you spot a problem or have a suggestion, please feel free to raise an issue.
52 |
53 | My email is akin_ocal@hotmail.com.
54 |
55 | ## **Misc:**
56 | - I used diagrams.net (the open source version of drawio): https://app.diagrams.net/
--------------------------------------------------------------------------------
/microarchitecture-cheatsheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/akhin/microarchitecture-cheatsheet/cb01f1410453a68c48eefb8395761f8467f376b7/microarchitecture-cheatsheet.pdf
--------------------------------------------------------------------------------
/references.txt:
--------------------------------------------------------------------------------
1 | uiCA Online Microarchitecture Analysis Tool
2 | https://uica.uops.info/
3 |
4 | Perf Wiki Tutorial
5 | https://perf.wiki.kernel.org/index.php/Tutorial
6 |
7 | Denis Bakhvalov's article: Visualizing Performance-Critical Dependency Chains
8 | https://easyperf.net/blog/2022/05/11/Visualizing-Performance-Critical-Dependency-Chains
9 |
10 | Intel's "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures" documentation
11 | https://www.yumpu.com/en/document/view/27787317/how-to-benchmark-code-execution-times-on-intel-ia-32-and-ia-64-
12 |
13 | ACPI on Wikipedia
14 | https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface
15 |
16 | AMD Turbo Core on Wikipedia
17 | https://en.wikipedia.org/wiki/AMD_Turbo_Core
18 |
19 | AMD cache injection
20 | https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/white-papers/58725.pdf
21 |
22 | AMD Ryzen Processor Software optimisation by Ken Mitchell, GDC 2022
23 | https://gpuopen.com/gdc-presentations/2022/GDC_AMD_Ryzen_Processor_Software_Optimization.pdf
24 |
25 | SIMD: GCC auto-vectorisation
26 | https://gcc.gnu.org/projects/tree-ssa/vectorization.html
27 |
28 | Daniel Lemire's article: AVX-512: when and how to use these new instructions
29 | https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/
30 |
31 | simdjson on GitHub
32 | https://github.com/simdjson/simdjson
33 |
34 | Memory disambiguation on Wikipedia
35 | https://en.wikipedia.org/wiki/Memory_disambiguation
36 |
37 | Memory disambiguation - Store to load forwarding on Wikipedia
38 | https://en.wikipedia.org/wiki/Memory_disambiguation#Store_to_load_forwarding
39 |
40 | Load Hit Store on Wikipedia
41 | https://en.wikipedia.org/wiki/Load-Hit-Store
42 |
43 | __restrict__ / Restricting Pointer Aliasing on GCC docs site
44 | https://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html
45 |
46 | Microsoft's Xbox 360, Sony's PS3 - A Hardware Discussion on Anandtech
47 | https://www.anandtech.com/show/1719/5
48 |
49 | Elan Ruskin's article: LOAD-HIT-STORES AND THE __RESTRICT KEYWORD
50 | https://web.archive.org/web/20210120214304/http://assemblyrequired.crashworks.org/load-hit-stores-and-the-__restrict-keyword/
51 |
52 | Subnormal numbers on Wikipedia
53 | https://en.wikipedia.org/wiki/Subnormal_number
54 |
55 | Bruce Dawson's article: That’s Not Normal–the Performance of Odd Floats
56 | https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/
57 |
58 | Intel Intrinsics Guide
59 | https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
60 |
61 | Intel AVX10 Specs
62 | https://www.intel.com/content/www/us/en/content-details/828964/intel-advanced-vector-extensions-10-1-intel-avx10-1-architecture-specification.html
63 |
64 | Intel AES on Wikipedia
65 | https://en.wikipedia.org/wiki/AES_instruction_set
66 |
67 | Intel Vector Neural Network Instructions
68 | https://en.wikipedia.org/wiki/AVX-512#VNNI
69 |
70 | Perceptrons on Wikipedia
71 | https://en.wikipedia.org/wiki/Perceptron
72 |
73 | Dynamic Branch Prediction with Perceptrons, by Daniel A. Jimenez and Calvin Lin
74 | https://www.cs.utexas.edu/~lin/papers/hpca01.pdf
75 |
76 | Kernel.org page: The kernel’s command-line parameters to disable security mitigations
77 | https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
78 |
79 | Controlling the Performance Impact of Microcode and Security Patches for CVE-2017-5754 CVE-2017-5715 and CVE-2017-5753 using Red Hat Enterprise Linux Tunables
80 | https://access.redhat.com/articles/3311301
81 |
82 | Meltdown paper
83 | https://meltdownattack.com/meltdown.pdf
84 |
85 | Spectre paper
86 | https://spectreattack.com/spectre.pdf
87 |
88 | Marek Majkowski's article: Branch predictor: How many "if"s are too many? Including x86 and M1 benchmarks!
89 | https://blog.cloudflare.com/branch-predictor/
90 |
91 | Broadwell microarchitecture
92 | https://en.wikipedia.org/wiki/Broadwell_(microarchitecture)
93 |
94 | Cache prefetching on Wikipedia
95 | https://en.wikipedia.org/wiki/Cache_prefetching
96 |
97 | Intel® Data Direct I/O Technology (Intel® DDIO): A Primer
98 | https://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/data-direct-i-o-technology-brief.pdf
99 |
100 | Raymond Chen's article: Why is address space allocation granularity 64KB?
101 | https://devblogs.microsoft.com/oldnewthing/20031008-00/?p=42223
102 |
103 | Why DDR5 is the Industry’s Powerful Next-gen Memory?
104 | https://news.skhynix.com/why-ddr5-is-the-industrys-powerful-next-gen-memory/
105 |
106 | DIMM image source
107 | https://pixabay.com/vectors/dimm-ram-memory-ram-computer-23265/
108 |
109 | Intel hybrid architecture
110 | https://www.intel.com/content/www/us/en/developer/articles/technical/hybrid-architecture.html
111 |
112 | Intel Alder Lake E-cores sharing L2 cache on Wikipedia
113 | https://en.wikipedia.org/wiki/Alder_Lake
114 |
115 | AMD Phoenix2 hybrid cpus
116 | https://www.tomshardware.com/news/amd-phoenix-2-review-evaluates-zen-4-zen-4c-performance
117 |
118 | AMD CCX
119 | https://www.tomshardware.com/reviews/amd-ccx-definition-cpu-core-explained,6338.html
120 |
121 | MESI protocol on Wikipedia
122 | https://en.wikipedia.org/wiki/MESI_protocol
123 |
124 | Intel MESIF protocol on Wikipedia
125 | https://en.wikipedia.org/wiki/MESIF_protocol
126 |
127 | AMD MOESI protocol on Wikipedia
128 | https://en.wikipedia.org/wiki/MOESI_protocol
129 |
130 | Erik Rigtorp's article: Optimising a ring buffer for throughput
131 | https://rigtorp.se/ringbuffer/
132 |
133 | Jeff Preshing's article: Memory Ordering at Compile Time
134 | https://preshing.com/20120625/memory-ordering-at-compile-time/
135 |
136 | Detecting and handling split locks (in Linux kernel) on lwn.net
137 | https://lwn.net/Articles/790464/
138 |
139 | CAS / Compare and swap on Wikipedia
140 | https://en.wikipedia.org/wiki/Compare-and-swap
141 |
142 | Test-And-Set
143 | https://en.wikipedia.org/wiki/Test-and-set
144 |
145 | Intel sticks another nail in the coffin of TSX with feature-disabling microcode update
146 | https://www.theregister.com/2021/06/29/intel_tsx_disabled/
147 |
148 | AMD's Advanced Synchronisation Facility (transactional memory) on Wikipedia
149 | https://en.wikipedia.org/wiki/Advanced_Synchronization_Facility
150 |
151 | Intel Cache Allocation Technology page
152 | https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-cache-allocation-technology.html
153 |
154 | Intel Code and Data Prioritisation page
155 | https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-code-and-data-prioritization-with-usage-models.html
156 |
157 | AMD64 Technology Platform Quality of Service Extensions
158 | https://kib.kiev.ua/x86docs/AMD/MISC/56375_1.00_PUB.pdf
159 |
160 | Intel Memory Bandwidth Allocation page
161 | https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-memory-bandwidth-allocation.html
162 |
163 | lstopo on linux.die.net
164 | https://linux.die.net/man/1/lstopo
165 |
166 | Intel memory latency checker
167 | https://www.intel.com/content/www/us/en/developer/articles/tool/intelr-memory-latency-checker.html
168 |
169 | Dell's article on AMD NUMA nodes per socket
170 | https://infohub.delltechnologies.com/l/cpu-best-practices-3/poweredge-numa-nodes-per-socket-1#:~:text=AMD%20servers%20provide%20the%20ability,bank%20into%20two%20equal%20parts.
171 |
172 | Intel VTune
173 | https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
174 |
175 | Andi Kleen's PMU tools / Toplev on GitHub
176 | https://github.com/andikleen/pmu-tools
177 |
178 | How TMA Addresses Challenges in Modern Servers and Enhancements Coming in IceLake by Ahmad Yasin
179 | https://dyninst.github.io/scalable_tools_workshop/petascale2018/assets/slides/TMA%20addressing%20challenges%20in%20Icelake%20-%20Ahmad%20Yasin.pdf
180 |
181 | Infographics: Operation Costs in CPU Clock Cycles on IT Hare
182 | http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/
--------------------------------------------------------------------------------
/snapshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/akhin/microarchitecture-cheatsheet/cb01f1410453a68c48eefb8395761f8467f376b7/snapshot.png
--------------------------------------------------------------------------------