├── pdf_ange_albertini.png
└── README.md


/pdf_ange_albertini.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zbetcheckin/PDF_analysis/HEAD/pdf_ange_albertini.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # PDF Analysis
  2 | 
  3 | > Several PDF analyses have already been published; I have gathered many of them here, along with additional tips and tools.
  4 | 
  5 | 
  6 | - [PDF format](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#pdf-format-page_facing_up)<br />
  7 | - [Tools list](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#tools-list-wrench)<br />
  8 | - [Quick Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#quick-analysis-rocket)<br />
  9 | - [Complete Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#complete-analysis-mag_right)<br />
 10 | 	- [Basic informations](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#basic-informations-1)<br />
 11 | 	- [Metadata](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#metadata)<br />
 12 | 	- [Search for older versions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#search-for-older-versions)<br />
 13 | 	- [Online Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#online-analysis-1)<br />
 14 | 	- [Statistics](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#statistics)<br />
 15 | 	- [Visual analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#visual-analysis)<br />
 16 | 	- [Go deeper in the analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#go-deeper-in-the-analysis)<br />
 17 | 	- [Displaying objects and actions structure](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#displaying-objects-and-actions-structure-1)<br />
 18 | 	- [Map of the objects flows](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#map-of-the-objects-flows)<br />
 19 | 	- [Actions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#actions)<br />
 20 | 	- [Compression](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#compression)<br />
 21 | 	- [Embeded files](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#embeded-files)<br />
 22 | 	- [Extract files / scripts / objects](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#extract-files--scripts--objects-1)<br />
 23 | 	- [Conversion](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#conversion)<br />
 24 | 	- [Encryption](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#encryption)<br />
 25 | 	- [Javascript](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#javascript)<br />
 26 | 	- [Flash](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#flash)<br />
 27 | - [Sources](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#sources-information_source)<br />
 28 | 
 29 | 
 30 | ## PDF Format :page_facing_up:
 31 | 
 32 | <p align="center">
 33 |   <img src="https://github.com/zbetcheckin/PDF_analysis/blob/master/pdf_ange_albertini.png" alt="PDF format diagram by Ange Albertini" width="580" height="403">
 34 | </p>
 35 | <br />
 36 | https://www.adobe.com/devnet/pdf/pdf_reference.html <br />
 37 | https://blog.didierstevens.com/2008/04/09/quickpost-about-the-physical-and-logical-structure-of-pdf-files/ <br />
 38 | https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF <br />
 39 | 
 40 | ## Tools list :wrench:
 41 | 
 42 | Tool | URL
 43 | ------------------------------------ | ---------------------------------------------
 44 | AnalyzePDF.py | https://github.com/hiddenillusion/AnalyzePDF
 45 | ByteForce | https://github.com/weaknetlabs/ByteForce
 46 | Caradoc | https://github.com/ANSSI-FR/caradoc
 47 | Didier Stevens suite | https://github.com/DidierStevens/DidierStevensSuite
 48 | dumppdf | https://packages.debian.org/jessie/python-pdfminer
 49 | forensics-all | https://packages.debian.org/jessie-backports/forensics-all
 50 | Origami | https://code.google.com/archive/p/origami-pdf/
 51 | ParanoiDF | https://github.com/patrickdw123/ParanoiDF
 52 | peepdf | https://github.com/jesparza/peepdf
 53 | PDF Xray | https://github.com/9b/pdfxray_public
 54 | pdf-parser | http://didierstevens.com/files/software/pdf-parser_V0_6_4.zip
 55 | pdf2john.py | https://github.com/magnumripper/JohnTheRipper/blob/unstable-jumbo/run/pdf2john.py
 56 | pdfcrack | https://packages.debian.org/jessie/pdfcrack
 57 | pdfextract | https://github.com/CrossRef/pdfextract
 58 | pdfobjflow.py | https://bitbucket.org/sebastiendamaye/pdfobjflow
 59 | pdfresurrect | https://packages.debian.org/jessie/pdfresurrect
 60 | PdfStreamDumper.exe | http://sandsprite.com/CodeStuff/PDFStreamDumper_Setup.exe
 61 | pdftk | https://packages.debian.org/en/jessie/pdftk
 62 | pdfxray_lite.py | https://github.com/9b/pdfxray_lite
 63 | poppler-utils | https://packages.debian.org/en/jessie/poppler-utils (pdftotext, pdfimages, pdftohtml, pdftops, pdfinfo, pdffonts, pdfdetach, pdfseparate, pdfsig, pdftocairo, pdftoppm, pdfunite)
 64 | pyew | https://packages.debian.org/en/jessie/pyew
 65 | qpdf | https://packages.debian.org/jessie/qpdf
 66 | swf_mastah.py | https://github.com/9b/pdfxray_public/blob/master/builder/swf_mastah.py
 67 | 
 68 | 
 69 | #### Existing list
 70 | http://blog.didierstevens.com/programs/pdf-tools/ <br />
 71 | https://github.com/sans-dfir/sift-files/tree/master/pdf-tools
 72 | 
 73 | 
 74 | ## Quick Analysis :rocket:
 75 | 
 76 | 
 77 | #### Basic informations
 78 | ```
 79 | $ file file.pdf
 80 | $ pdfinfo -box -meta -js -rawdates file.pdf
 81 | ```
 82 | 
 83 | 
 84 | #### Displaying objects and actions structure 
 85 | ```
 86 | $ python pdfid.py -aefv file.pdf
 87 | ```
 88 | 
 89 | 
 90 | #### Search for /OpenAction /AA /Launch /GoTo /GoToR /SubmitForm /RichMedia (for Flash) /JS /JavaScript /URI, encoding, ciphers, shellcode, obfuscation...
 91 | Automatically with ParanoiDF
 92 | ```
 93 | $ python paranoiDF.py -fl file.pdf
 94 | ```
 95 | Or with pdf-parser
 96 | ```
 97 | $ python pdf-parser.py -v file.pdf
 98 | ```
 99 | With a hexadecimal editor
100 | ```
101 | $ bless file.pdf
102 | ```
103 | 
104 | 
105 | #### Extract files / scripts / Objects
106 | Use pdf-parser to extract a JavaScript object, for example
107 | ```
108 | $ pdf-parser --object 32 --raw file.pdf > extractedObject.js
109 | ```
110 | pdfextract from Origami
111 | ```
112 | $ pdfextract file.pdf
113 | ```
114 | 
115 | 
116 | #### Online analysis<br />
117 | *Be careful not to leak any sensitive, professional, or personal data, or to expose your research*<br />
118 | https://www.hybrid-analysis.com/
119 | 
120 | 
121 | ## Complete Analysis :mag_right:
122 | 
123 | 
124 | ### Basic informations
125 | ```
126 | $ file file.pdf
127 | $ pdfinfo file.pdf
128 | $ pdfinfo -box -meta -js -rawdates file.pdf
129 | ```
130 | 
131 | 
132 | ### Powerful Python tool to analyze PDFs and exploits
133 | ```
134 | $ pyew file.pdf 	
135 | ```
136 | 
137 | 
138 | ### Another Python tool to explore PDFs
139 | ```
140 | $ peepdf -fl file.pdf
141 | $ peepdf --interactive file.pdf
142 | ```
143 | 
144 | 
145 | #### Analysis under Windows
146 | PDF Stream Dumper<br />
147 | https://github.com/dzzie/pdfstreamdumper
148 | 
149 | 
150 | 
151 | ### Metadata
152 | Get metadata
153 | ```
154 | $ exiftool -a -u -g2 file.pdf
155 | ```
156 | 
157 | Get metadata recursively from the current directory
158 | ```
159 | $ exiftool -r -ext pdf .
160 | ```
161 | 
162 | Change an element
163 | ```
164 | $ exiftool -Title="New title" file.pdf
165 | ```
166 | 
167 | Remove metadata
168 | ```
169 | $ exiftool -all= file.pdf && exiftool -all:all= file.pdf && qpdf --linearize file.pdf filewithoutmeta.pdf
170 | $ mat file.pdf # latest version of mat doesn't support pdf format anymore...
171 | ```
172 | 
173 | Remove metadata recursively from the current directory:
174 | *Quick and dirty, but it works well.*
175 | *File names must not contain spaces for now; the command will be improved.*
176 | ```
177 | $ find . -name "*.pdf" -print0 | while read -d $'\0' file; do echo ${file:2} && mv ${file:2} ${file:2}.pdf && exiftool -all= ${file:2}.pdf && exiftool -all:all= ${file:2}.pdf && qpdf --linearize ${file:2}.pdf ${file:2} && rm ${file:2}.pdf && rm ${file:2}.pdf_original; done
178 | ```
179 | 
180 | 
181 | ### Search for older versions
182 | Search for older "hidden" versions
183 | ```
184 | $ pdfresurrect file.pdf -i
185 | $ exiftool -pdf-update:all= file.pdf
186 | ```
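
As a rough complement to the commands above, a short Python sketch can count the `%%EOF` markers in a file. Each incremental save normally appends a new one, so more than one marker suggests that older versions may still be present. The function name `count_eof_markers` is hypothetical, and the count is only a hint: a linearized PDF can contain more than one marker without ever having been edited.

```python
import re


def count_eof_markers(data: bytes) -> int:
    """Count the %%EOF markers found in the raw bytes of a PDF."""
    return len(re.findall(rb"%%EOF", data))


# Small in-memory example: an original body followed by one incremental update.
sample = b"%PDF-1.4\n(original body)\n%%EOF\n(incremental update)\n%%EOF\n"
print(count_eof_markers(sample), "%%EOF marker(s) found")
```

To try it on a real file, read it in binary mode first, for example `count_eof_markers(open("file.pdf", "rb").read())`.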
187 | 
188 | 
189 | ### Online Analysis 
190 | Name | URL
191 | ------------------------------------ | ---------------------------------------------
192 | Malwr | https://malwr.com/submission/
193 | Hybrid analysis | https://www.hybrid-analysis.com/
194 | Malware Tracker | https://www.malwaretracker.com/pdf.php
195 | VirusTotal | http://www.virustotal.com/
196 | PDF examiner | http://www.pdfexaminer.com/
197 | Document Analyzer | http://www.document-analyzer.net/
198 | Jotti | https://virusscan.jotti.org/
199 | PDF X-ray | http://www.pdfxray.com/
200 | PDF Online | https://www.pdf-online.com/
201 | Extract PDF | http://www.extractpdf.com
202 | Char conversion | https://kt.pe/tools.html#conv/
203 | 
204 | 
205 | ### Statistics 
206 | Compute byte statistics (minimum and maximum entropy, ASCII count, ...) for a PDF
207 | ```
208 | $ python byte-stats.py file.pdf
209 | ```
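
To give an idea of what such figures mean, here is a minimal Python sketch that computes the Shannon entropy (in bits per byte) and the share of printable ASCII bytes of a buffer. It is not the implementation of byte-stats.py; the function names `shannon_entropy` and `ascii_ratio` are hypothetical. Values close to 8 usually indicate compressed or encrypted data.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def ascii_ratio(data: bytes) -> float:
    """Return the fraction of bytes that are printable ASCII (0x20 to 0x7E)."""
    if not data:
        return 0.0
    printable = sum(1 for byte in data if 32 <= byte < 127)
    return printable / len(data)


# In-memory example; for a real file use open("file.pdf", "rb").read().
sample = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(f"entropy: {shannon_entropy(sample):.2f} bits/byte")
print(f"printable ASCII: {ascii_ratio(sample):.0%}")
```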
210 | 
211 | 
212 | ### Visual analysis
213 | Visual analysis of a PDF or a binary file<br />
214 | http://binvis.io
215 | 
216 | 
217 | ## Go deeper in the analysis
218 | 
219 | ### Displaying objects and actions structure 
220 | ```
221 | $ python pdfid.py --all --extra --force --verbose file.pdf
222 | ```
223 | 
224 | ### Map of the objects flows
225 | ```
226 | $ pdf-parser file.pdf | ./pdfobjflow.py
227 | $ eog pdfobjflow.png
228 | ```
229 | 
230 | 
231 | ### Actions
232 | Search for: <br />
233 | /OpenAction and /AA specify the script or action to run automatically.<br />
234 | /Names, /AcroForm and /Action can also specify and launch scripts or actions.<br />
235 | /JavaScript specifies JavaScript to run.<br />
236 | /GoTo and /GoToR change the view to a specified destination within the PDF or in another PDF file.<br />
237 | /Launch launches a program or opens a document.<br />
238 | /URI accesses a resource by its URL.<br />
239 | /SubmitForm and /GoToR can send data to a URL.<br />
240 | /RichMedia can be used to embed Flash in a PDF.<br />
241 | /ObjStm can hide objects inside an object stream.<br />
242 | /JavaScript can also be written /J#61vaScript: beware of name obfuscation with hex codes.
243 | 
244 | 
245 | With ParanoiDF
246 | ```
247 | $ python paranoiDF.py -fl file.pdf
248 | ```
249 | With pdf-parser
250 | ```
251 | $ python pdf-parser.py -v file.pdf
252 | ```
253 | With a hexadecimal editor
254 | ```
255 | $ bless file.pdf
256 | ```
257 | With dumppdf
258 | ```
259 | $ dumppdf -a file.pdf
260 | ```
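
With a short Python script: the sketch below counts the keywords listed above in the raw bytes of a file, decoding `#xx` hex escapes first so that `/J#61vaScript` is counted as `/JavaScript`. It is only an illustration, not how the tools above work internally; `KEYWORDS`, `decode_name` and `count_keywords` are hypothetical names, and names hidden inside compressed object streams (/ObjStm) are not seen unless the file is decompressed first.

```python
import re
from collections import Counter

# Keywords taken from the list above.
KEYWORDS = {
    b"/OpenAction", b"/AA", b"/Names", b"/AcroForm", b"/Action",
    b"/JavaScript", b"/JS", b"/GoTo", b"/GoToR", b"/Launch",
    b"/URI", b"/SubmitForm", b"/RichMedia", b"/ObjStm",
}

# A PDF name: '/' followed by characters that are neither whitespace nor delimiters.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(name: bytes) -> bytes:
    """Replace every #xx escape in a PDF name with the byte it encodes."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_keywords(data: bytes) -> Counter:
    """Count exact occurrences of the keywords among the decoded names."""
    counts = Counter()
    for match in NAME_RE.finditer(data):
        name = decode_name(match.group(0))
        if name in KEYWORDS:
            counts[name.decode("ascii")] += 1
    return counts


# In-memory example; for a real file use open("file.pdf", "rb").read().
sample = b"<< /Type /Catalog /OpenAction 5 0 R /J#61vaScript (x) /JS 6 0 R >>"
for keyword, count in sorted(count_keywords(sample).items()):
    print(keyword, count)
```

Names are matched exactly after decoding, so `/JS` is not counted inside a longer name.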
261 | 
262 | 
263 | 
264 | 
265 | 
266 | 
267 | 
268 | ### Compression
269 | Search for compression
270 | ```
271 | $ strings file.pdf | grep --color "/Filter"
272 | ```
273 | 
274 | Two ways to decompress a PDF
275 | ```
276 | $ pdftk compressed.pdf output uncompressed.pdf uncompress
277 | $ qpdf --stream-data=uncompress compressed.pdf uncompressed.pdf 
278 | ```
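
To illustrate what these tools do for the most common filter (/FlateDecode, which is zlib compression), here is a minimal Python sketch that locates `stream ... endstream` blocks and tries to inflate each one. The name `inflate_streams` is hypothetical, and the approach is deliberately naive: a real parser relies on the /Length entry and on the declared /Filter chain, and other filters (LZWDecode, ASCII85Decode, ...) are simply skipped here.

```python
import re
import zlib
from typing import List

# Naive stream locator: everything between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> List[bytes]:
    """Return the content of every stream that inflates cleanly with zlib."""
    results = []
    for match in STREAM_RE.finditer(data):
        inflater = zlib.decompressobj()
        try:
            # Bytes after the end of the zlib data (the EOL) go to unused_data.
            output = inflater.decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data: another filter or an uncompressed stream
        if inflater.eof:  # keep only complete zlib streams
            results.append(output)
    return results


# In-memory example with one compressed stream.
payload = b"BT /F1 12 Tf (hello) Tj ET"
sample = (
    b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(payload)
    + b"\nendstream\nendobj\n"
)
for content in inflate_streams(sample):
    print(content)
```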
279 | 
280 | 
281 | 
282 | ### Embeded files
283 | Four ways to search for embedded files/scripts inside a PDF
284 | ```
285 | $ binwalk file.pdf
286 | $ foremost -a -v file.pdf
287 | $ hachoir-subfile file.pdf
288 | $ scalpel file.pdf
289 | ```
290 | 
291 | 
292 | ### Extract files / scripts / objects
293 | Extract the file corresponding to an object ID (a JPG, for example)
294 | ```
295 | $ dumppdf.py -i 32 -r file.pdf > image.jpg
296 | ```
297 | Extract JavaScript from an object, for example
298 | ```
299 | $ pdf-parser --object 32 --raw file.pdf > extractedObject.js
300 | ```
301 | pdfextract from Origami
302 | ```
303 | $ pdfextract file.pdf
304 | ```
305 | 
306 | 
307 | 
308 | ### Conversion
309 | PDF to PostScript
310 | ```
311 | $ pdftops file.pdf
312 | ```
313 | PDF to TXT
314 | ```
315 | $ pdftotext file.pdf
316 | ```
317 | PDF to JPG
318 | ```
319 | $ convert file.pdf image.jpg
320 | ```
321 | This is a non-exhaustive list of possible conversions.
322 | 
323 | 
324 | 
325 | 
326 | ### LZWDecode filter
327 | Convert a PDF to PostScript without the LZWDecode filter
328 | ```
329 | $ qpdf --stream-data=uncompress original.pdf decoded.pdf # Decompress it
330 | $ pdftops decoded.pdf decoded.ps # Convert it
331 | ```
332 | 
333 | 
334 | ### Encryption
335 | PDF supports RC4 encryption (40- to 128-bit keys) and AES encryption (128-bit keys, or 256-bit keys with Extension Level 3). <br />
336 | Beware of empty passwords.
337 | 
338 | 
339 | #### Password recovery
340 | Run a dictionary attack on a PDF with pdfcrack
341 | ```
342 | $ pdfcrack -w yourDictionary.txt file.pdf
343 | ```
344 | With john
345 | ```
346 | $ pdf2john.py file.pdf > x.hash
347 | $ john --wordlist=yourDictionary.txt x.hash
348 | ```
349 | 
350 | 
351 | ### Javascript
352 | 
353 | Two ways to search for JavaScript
354 | ```
355 | $ pdf-parser --search=JavaScript file.pdf 
356 | $ pdfinfo -js file.pdf
357 | ```
358 | 
359 | 
360 | Extract an object
361 | With jsunpack
362 | ```
363 | $ jsunpack-extractjs file.pdf
364 | ```
365 | With pdf-parser
366 | ```
367 | $ pdf-parser --object 32 --raw file.pdf > file.js
368 | ```
369 | With pdfextract from Origami
370 | ```
371 | $ pdfextract --js file.pdf
372 | ```
373 | 
374 | 
375 | #### De-obfuscate
376 | https://github.com/urule99/jsunpack-n <br />
377 | 
378 | Online :<br />
379 | http://jsunpack.jeek.org/java/ <br />
380 | 
381 | Malzilla and SpiderMonkey can also help deobfuscate JavaScript.<br />
382 | Malzilla : <br />
383 | http://www.malzilla.org/downloads.html <br />
384 | SpiderMonkey : <br />
385 | http://www.didierstevens.com/files/software/js-1.7.0-mod.tar.gz <br />
386 | More details coming soon.
387 | 
388 | 
389 | #### Add Javascript to PDF
390 | https://didierstevens.com/files/software/make-pdf_V0_1_6.zip <br />
391 | https://neonprimetime.blogspot.fr/2015/03/how-to-add-javascript-to-pdf.html <br />
392 | 
393 | 
394 | #### Disarming a PDF
395 | ```
396 | $ python pdfid.py --disarm file.pdf
397 | ```
398 | 
399 | 
400 | ### Flash
401 | 
402 | Search for Flash
403 | ```
404 | $ python pdf-parser.py --search flash file.pdf
405 | ```
406 | 
407 | Extract Flash with swf_mastah
408 | ```
409 | $ python swf_mastah.py -f file.pdf -o ./
410 | $ file *.swf
411 | ```
412 | With pdf-parser
413 | ```
414 | $ pdf-parser.py --object 32 --filter --raw file.pdf > flashFile.swf
415 | $ file flashFile.swf
416 | ```
417 | 
418 | Analyze the Flash program
419 | ```
420 | $ swfdump -Ddu flashFile.swf > flashFile.txt
421 | ```
422 | More details coming soon.
423 | 
424 | 
425 | ## Sources :information_source:
426 | 
427 | https://blog.didierstevens.com/category/pdf/ <br />
428 | http://www.decalage.info/file_formats_security/pdf <br />
429 | https://zeltser.com/analyzing-malicious-documents/ <br />
430 | https://code.google.com/archive/p/corkami/wikis/PDFTricks.wiki <br />
431 | https://www.sans.org/reading-room/whitepapers/malicious/owned-malicious-pdf-analysis-33443 <br />
432 | https://digital-forensics.sans.org/blog/2009/12/14/pdf-malware-analysis/ <br />
433 | http://fileformats.archiveteam.org/wiki/PDF <br />
434 | 
435 | 


--------------------------------------------------------------------------------