├── pdf_ange_albertini.png
└── README.md

/pdf_ange_albertini.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zbetcheckin/PDF_analysis/HEAD/pdf_ange_albertini.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PDF Analysis
2 |
3 | >Many PDF analyses have already been published; I gathered a lot of them here, along with additional tips & tools
4 |
5 |
6 | - [PDF format](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#pdf-format-page_facing_up)
7 | - [Tools list](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#tools-list-wrench)
8 | - [Quick Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#quick-analysis-rocket)
9 | - [Complete Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#complete-analysis-mag_right)
10 | - [Basic informations](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#basic-informations-1)
11 | - [Metadata](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#metadata)
12 | - [Search for older versions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#search-for-older-versions)
13 | - [Online Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#online-analysis-1)
14 | - [Statistics](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#statistics)
15 | - [Visual analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#visual-analysis)
16 | - [Go deeper in the analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#go-deeper-in-the-analysis)
17 | - [Displaying objects and actions structure](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#displaying-objects-and-actions-structure-1)
18 | - [Map of the objects flows](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#map-of-the-objects-flows)
19 | - [Actions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#actions)
20 | - [Compression](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#compression)
21 | - [Embeded files](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#embeded-files)
22 | - [Extract files / scripts / objects](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#extract-files--scripts--objects-1)
23 | - [Conversion](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#conversion)
24 | - [Encryption](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#encryption)
25 | - [Javascript](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#javascript)
26 | - [Flash](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#flash)
27 | - [Sources](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#sources-information_source)
28 |
29 |
30 | ## PDF Format :page_facing_up:
31 |
32 |
33 | ![pdf_ange_albertini.png](pdf_ange_albertini.png)
34 |
35 |
36 | https://www.adobe.com/devnet/pdf/pdf_reference.html
37 | https://blog.didierstevens.com/2008/04/09/quickpost-about-the-physical-and-logical-structure-of-pdf-files/
38 | https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
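As a rough illustration of the physical layout described in the links above (header, body, cross-reference data, trailer), here is a minimal Python sketch. The function name `pdf_layout_summary` is hypothetical and not part of any tool listed in this README; it only looks for a few well-known markers and is no substitute for a real parser.

```python
import re


def pdf_layout_summary(data: bytes) -> dict:
    """Report a few physical-structure markers of a PDF (hypothetical helper)."""
    # The header is expected near the start of the file, e.g. b"%PDF-1.7".
    header = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    # Each "%%EOF" closes one section of the file; more than one often means
    # the file was saved with incremental updates (or is linearized).
    eof_count = data.count(b"%%EOF")
    # "startxref" is followed by the byte offset of the cross-reference data;
    # the last occurrence in the file is the one a reader starts from.
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "eof_markers": eof_count,
        "last_startxref": int(offsets[-1]) if offsets else None,
    }
```

To try it, read a file in binary mode and pass the bytes, for example `pdf_layout_summary(open("file.pdf", "rb").read())`. An `eof_markers` value above one is only a hint that older revisions may be present, which is what `pdfresurrect` (see "Search for older versions") looks for.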
39 |
40 | ## Tools list :wrench:
41 |
42 | Tool | URL
43 | ------------------------------------ | ---------------------------------------------
44 | AnalyzePDF.py | https://github.com/hiddenillusion/AnalyzePDF
45 | ByteForce | https://github.com/weaknetlabs/ByteForce
46 | Caradoc | https://github.com/ANSSI-FR/caradoc
47 | Didier Stevens suite | https://github.com/DidierStevens/DidierStevensSuite
48 | dumppdf | https://packages.debian.org/jessie/python-pdfminer
49 | forensics-all | https://packages.debian.org/jessie-backports/forensics-all
50 | Origami | https://code.google.com/archive/p/origami-pdf/
51 | ParanoiDF | https://github.com/patrickdw123/ParanoiDF
52 | peepdf | https://github.com/jesparza/peepdf
53 | PDF Xray | https://github.com/9b/pdfxray_public
54 | pdf-parser | http://didierstevens.com/files/software/pdf-parser_V0_6_4.zip
55 | pdf2john.py | https://github.com/magnumripper/JohnTheRipper/blob/unstable-jumbo/run/pdf2john.py
56 | pdfcrack | https://packages.debian.org/jessie/pdfcrack
57 | pdfextract | https://github.com/CrossRef/pdfextract
58 | pdfobjflow.py | https://bitbucket.org/sebastiendamaye/pdfobjflow
59 | pdfresurrect | https://packages.debian.org/jessie/pdfresurrect
60 | PdfStreamDumper.exe | http://sandsprite.com/CodeStuff/PDFStreamDumper_Setup.exe
61 | pdftk | https://packages.debian.org/en/jessie/pdftk
62 | pdfxray_lite.py | https://github.com/9b/pdfxray_lite
63 | poppler-utils | https://packages.debian.org/en/jessie/poppler-utils (pdftotext, pdfimages, pdftohtml, pdftops, pdfinfo, pdffonts, pdfdetach, pdfseparate, pdfsig, pdftocairo, pdftoppm, pdfunite)
64 | pyew | https://packages.debian.org/en/jessie/pyew
65 | qpdf | https://packages.debian.org/jessie/qpdf
66 | swf_mastah.py | https://github.com/9b/pdfxray_public/blob/master/builder/swf_mastah.py
67 |
68 |
69 | #### Existing list
70 | http://blog.didierstevens.com/programs/pdf-tools/
71 | https://github.com/sans-dfir/sift-files/tree/master/pdf-tools
72 |
73 |
74 | ## Quick Analysis :rocket:
75 |
76 |
77 | #### Basic informations
78 | ```
79 | $ file file.pdf
80 | $ pdfinfo -box -meta -js -rawdates file.pdf
81 | ```
82 |
83 |
84 | #### Displaying objects and actions structure
85 | ```
86 | $ python pdfid.py -aefv file.pdf
87 | ```
88 |
89 |
90 | #### Search for /OpenAction /AA /Launch /GoTo /GoToR /SubmitForm /RichMedia (for Flash) /JS /JavaScript /URI - Encode - Cipher - Shell code - Obfuscation...
91 | Automatically with ParanoiDF
92 | ```
93 | $ python paranoiDF.py -fl file.pdf
94 | ```
95 | Or with pdf-parser
96 | ```
97 | $ python pdf-parser.py -v file.pdf
98 | ```
99 | With a hexadecimal editor
100 | ```
101 | $ bless file.pdf
102 | ```
103 |
104 |
105 | #### Extract files / scripts / Objects
106 | Use pdf-parser to extract a JavaScript object, for example
107 | ```
108 | $ pdf-parser --object 32 --raw file.pdf > extractedObject.js
109 | ```
110 | pdfextract from Origami
111 | ```
112 | $ pdfextract file.pdf
113 | ```
114 |
115 |
116 | #### Online analysis
117 | *Be careful not to leak any important, professional, or personal data, or to expose your research*
118 | https://www.hybrid-analysis.com/
119 |
120 |
121 | ## Complete Analysis :mag_right:
122 |
123 |
124 | ### Basic informations
125 | ```
126 | $ file file.pdf
127 | $ pdfinfo file.pdf
128 | $ pdfinfo -box -meta -js -rawdates file.pdf
129 | ```
130 |
131 |
132 | ### Powerful Python tool to analyze PDFs and exploits
133 | ```
134 | $ pyew file.pdf
135 | ```
136 |
137 |
138 | ### Another Python tool to explore PDFs
139 | ```
140 | $ peepdf -fl file.pdf
141 | $ peepdf --interactive file.pdf
142 | ```
143 |
144 |
145 | #### Analysis under Windows
146 | PDF Stream Dumper
147 | https://github.com/dzzie/pdfstreamdumper
148 |
149 |
150 |
151 | ### Metadata
152 | Get metadata
153 | ```
154 | $ exiftool -a -u -g2 file.pdf
155 | ```
156 |
157 | Get metadata recursively from the current directory
158 | ```
159 | $ exiftool -r -ext pdf .
160 | ```
161 |
162 | Change an element
163 | ```
164 | $ exiftool -Title="New title" file.pdf
165 | ```
166 |
167 | Remove metadata
168 | ```
169 | $ exiftool -all= file.pdf && exiftool -all:all= file.pdf && qpdf --linearize file.pdf filewithoutmeta.pdf
170 | $ mat file.pdf # latest version of mat doesn't support pdf format anymore...
171 | ```
172 |
173 | Remove metadata recursively from the current directory:
174 | *Very dirty, but it works well*
175 | *Filenames must not contain spaces for now; the command will be optimized*
176 | ```
177 | $ find . -name "*.pdf" -print0 | while read -d $'\0' file; do echo ${file:2} && mv ${file:2} ${file:2}.pdf && exiftool -all= ${file:2}.pdf && exiftool -all:all= ${file:2}.pdf && qpdf --linearize ${file:2}.pdf ${file:2} && rm ${file:2}.pdf && rm ${file:2}.pdf_original; done
178 | ```
179 |
180 |
181 | ### Search for older versions
182 | Search for older "hidden" versions
183 | ```
184 | $ pdfresurrect file.pdf -i
185 | $ exiftool -pdf-update:all= file.pdf
186 | ```
187 |
188 |
189 | ### Online Analysis
190 | Name | URL
191 | ------------------------------------ | ---------------------------------------------
192 | Malwr | https://malwr.com/submission/
193 | Hybrid analysis | https://www.hybrid-analysis.com/
194 | Malware Tracker | https://www.malwaretracker.com/pdf.php
195 | VirusTotal | http://www.virustotal.com/
196 | PDF examiner | http://www.pdfexaminer.com/
197 | Document Analyzer | http://www.document-analyzer.net/
198 | Jotti | https://virusscan.jotti.org/
199 | PDF X-ray | http://www.pdfxray.com/
200 | PDF Online | https://www.pdf-online.com/
201 | Extract PDF | http://www.extractpdf.com
202 | Char conversion | https://kt.pe/tools.html#conv/
203 |
204 |
205 | ### Statistics
206 | Compute byte statistics (minimum and maximum entropy, ASCII count, ...) for a PDF
207 | ```
208 | $ python byte-stats.py file.pdf
209 | ```
210 |
211 |
212 | ### Visual analysis
213 | Visual analysis of a PDF or a binary file
214 | http://binvis.io
215 |
216 |
217 | ## Go deeper in the analysis
218 |
219 | ### Displaying objects and actions structure
220 | ```
221 | $ python pdfid.py --all --extra --force --verbose file.pdf
222 | ```
223 |
224 | ### Map of the objects flows
225 | ```
226 | $ pdf-parser file.pdf | ./pdfobjflow
227 | $ eog pdfobjflow.png
228 | ```
229 |
230 |
231 | ### Actions
232 | Search for :
233 | /OpenAction /AA specify the script or action to run automatically.
234 | /Names /AcroForm /Action can also specify and launch scripts or actions.
235 | /JavaScript specifies JavaScript to run.
236 | /GoTo changes the view to a specified destination within the PDF or in another PDF file.
237 | /Launch launches a program or opens a document.
238 | /URI accesses a resource by its URL.
239 | /SubmitForm /GoToR can send data to a URL.
240 | /RichMedia can be used to embed Flash in a PDF.
241 | /ObjStm can hide objects inside an Object Stream.
242 | /JavaScript > /J#61vaScript Beware of obfuscation techniques that use hex codes.
243 |
244 |
245 | With ParanoiDF
246 | ```
247 | $ python paranoiDF.py -fl file.pdf
248 | ```
249 | With pdf-parser
250 | ```
251 | $ python pdf-parser.py -v file.pdf
252 | ```
253 | With a hexadecimal editor
254 | ```
255 | $ bless file.pdf
256 | ```
257 | With dumppdf
258 | ```
259 | $ dumppdf -a file.pdf
260 | ```
261 |
262 |
263 |
264 |
265 |
266 |
267 |
268 | ### Compression
269 | Search for compression
270 | ```
271 | $ strings file.pdf | grep --color "/Filter"
272 | ```
273 |
274 | 2 ways to decompress a PDF
275 | ```
276 | $ pdftk compressed.pdf output uncompressed.pdf uncompress
277 | $ qpdf --stream-data=uncompress compressed.pdf uncompressed.pdf
278 | ```
279 |
280 |
281 |
282 | ### Embeded files
283 | 4 ways to search for embedded files/scripts inside a PDF
284 | ```
285 | $ binwalk file.pdf
286 | $ foremost -a -v file.pdf
287 | $ hachoir-subfile file.pdf
288 | $ scalpel file.pdf
289 | ```
290 |
291 |
292 | ### Extract files / scripts / objects
293 | Extract the file corresponding to an object ID, for example a JPG
294 | ```
295 | $ dumppdf.py -i 32 -r file.pdf > image.jpg
296 | ```
297 | Extract JavaScript from an object, for example
298 | ```
299 | $ pdf-parser --object 32 --raw file.pdf > extractedObject.js
300 | ```
301 | pdfextract from Origami
302 | ```
303 | $ pdfextract file.pdf
304 | ```
305 |
306 |
307 |
308 | ### Conversion
309 | PDF to Postscript
310 | ```
311 | $ pdftops file.pdf
312 | ```
313 | PDF to TXT
314 | ```
315 | $ pdftotext file.pdf
316 | ```
317 | PDF to JPG
318 | ```
319 | $ convert file.pdf image.jpg
320 | ```
321 | This is a non-exhaustive list of possible conversions.
322 |
323 |
324 |
325 |
326 | ### LZWDecode filter
327 | Convert a PDF to Postscript without the LZWDecode filter
328 | ```
329 | $ qpdf --stream-data=uncompress original.pdf decoded.pdf # Decompress it
330 | $ pdftops decoded.pdf decoded.ps # Convert it
331 | ```
332 |
333 |
334 | ### Encryption
335 | PDF supports RC4 encryption (40- to 128-bit keys) and AES (128-bit keys, or 256-bit keys with Extension Level 3).
336 | Beware of empty passwords.
337 |
338 |
339 | #### Password recovery
340 | Brute force a PDF with pdfcrack
341 | ```
342 | $ pdfcrack -w yourDictionary.txt file.pdf
343 | ```
344 | With john
345 | ```
346 | $ pdf2john.py file.pdf > x.hash
347 | $ john --wordlist=yourDictionary.txt x.hash
348 | ```
349 |
350 |
351 | ### Javascript
352 |
353 | 2 ways to search for JavaScript
354 | ```
355 | $ pdf-parser --search=JavaScript file.pdf
356 | $ pdfinfo -js file.pdf
357 | ```
358 |
359 |
360 | Extract an object
361 | With jsunpack
362 | ```
363 | $ jsunpack-extractjs file.pdf
364 | ```
365 | With pdf-parser
366 | ```
367 | $ pdf-parser --object 32 --raw file.pdf > file.js
368 | ```
369 | With pdfextract from Origami
370 | ```
371 | $ pdfextract --js file.pdf
372 | ```
373 |
374 |
375 | #### De-obfuscate
376 | https://github.com/urule99/jsunpack-n
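The hex-code name obfuscation noted under Actions (`/J#61vaScript` for `/JavaScript`) can be undone before searching for keywords. The sketch below is a minimal Python illustration, assuming the raw bytes of the file are searched without decompressing streams, so anything hidden in compressed objects is missed. The names `decode_pdf_names` and `count_keywords` are hypothetical and not taken from any tool listed in this README.

```python
import re

# A subset of the keywords listed in the Actions section.
KEYWORDS = [b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/Launch",
            b"/GoTo", b"/GoToR", b"/URI", b"/SubmitForm", b"/RichMedia",
            b"/ObjStm"]


def decode_pdf_names(data: bytes) -> bytes:
    """Replace #xx escapes inside PDF names with the byte they encode."""
    def fix_name(match):
        return re.sub(rb"#([0-9A-Fa-f]{2})",
                      lambda m: bytes([int(m.group(1), 16)]),
                      match.group(0))
    # A name starts with "/" and runs until whitespace or a delimiter.
    return re.sub(rb"/[^\s/<>\[\]\(\)\{\}%]+", fix_name, data)


def count_keywords(data: bytes) -> dict:
    """Count each keyword as a whole name in the de-obfuscated bytes."""
    clean = decode_pdf_names(data)
    counts = {}
    for keyword in KEYWORDS:
        # The lookahead avoids counting /GoTo inside /GoToR, for instance.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, clean))
    return counts
```

For example, `count_keywords(open("file.pdf", "rb").read())` returns a dictionary of counts. A non-zero count is only a starting point for a closer look with pdf-parser or peepdf.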
377 |
378 | Online :
379 | http://jsunpack.jeek.org/java/
380 |
381 | Malzilla and SpiderMonkey can also help deobfuscate JavaScript.
382 | Malzilla :
383 | http://www.malzilla.org/downloads.html
384 | SpiderMonkey :
385 | http://www.didierstevens.com/files/software/js-1.7.0-mod.tar.gz
386 | More details coming soon.
387 |
388 |
389 | #### Add Javascript to PDF
390 | https://didierstevens.com/files/software/make-pdf_V0_1_6.zip
391 | https://neonprimetime.blogspot.fr/2015/03/how-to-add-javascript-to-pdf.html
392 |
393 |
394 | #### Disarming a PDF
395 | ```
396 | $ python pdfid.py --disarm file.pdf
397 | ```
398 |
399 |
400 | ### Flash
401 |
402 | Search for Flash
403 | ```
404 | $ python pdf-parser.py --search flash file.pdf
405 | ```
406 |
407 | Extract Flash with swf_mastah
408 | ```
409 | $ python swf_mastah.py -f file.pdf -o ./
410 | $ file *.swf
411 | ```
412 | With pdf-parser
413 | ```
414 | $ pdf-parser.py --object 32 --filter --raw file.pdf > flashFile.swf
415 | $ file flashFile.swf
416 | ```
417 |
418 | Analyzing a Flash program
419 | ```
420 | $ swfdump -Ddu flashFile.swf > flashFile.txt
421 | ```
422 | More details coming soon.
423 |
424 |
425 | ## Sources :information_source:
426 |
427 | https://blog.didierstevens.com/category/pdf/
428 | http://www.decalage.info/file_formats_security/pdf
429 | https://zeltser.com/analyzing-malicious-documents/
430 | https://code.google.com/archive/p/corkami/wikis/PDFTricks.wiki
431 | https://www.sans.org/reading-room/whitepapers/malicious/owned-malicious-pdf-analysis-33443
432 | https://digital-forensics.sans.org/blog/2009/12/14/pdf-malware-analysis/
433 | http://fileformats.archiveteam.org/wiki/PDF
434 | 435 | --------------------------------------------------------------------------------