├── pdf_ange_albertini.png
└── README.md
/pdf_ange_albertini.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zbetcheckin/PDF_analysis/HEAD/pdf_ange_albertini.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PDF Analysis
2 |
3 | >Many PDF analyses have already been published; I gathered a lot of them here, along with additional tips & tools.
4 |
5 |
6 | - [PDF format](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#pdf-format-page_facing_up)
7 | - [Tools list](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#tools-list-wrench)
8 | - [Quick Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#quick-analysis-rocket)
9 | - [Complete Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#complete-analysis-mag_right)
10 | - [Basic informations](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#basic-informations-1)
11 | - [Metadata](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#metadata)
12 | - [Search for older versions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#search-for-older-versions)
13 | - [Online Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#online-analysis-1)
14 | - [Statistics](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#statistics)
15 | - [Visual analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#visual-analysis)
16 | - [Go deeper in the analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#go-deeper-in-the-analysis)
17 | - [Displaying objects and actions structure](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#displaying-objects-and-actions-structure-1)
18 | - [Map of the objects flows](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#map-of-the-objects-flows)
19 | - [Actions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#actions)
20 | - [Compression](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#compression)
21 | - [Embeded files](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#embeded-files)
22 | - [Extract files / scripts / objects](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#extract-files--scripts--objects-1)
23 | - [Conversion](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#conversion)
24 | - [Encryption](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#encryption)
25 | - [Javascript](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#javascript)
26 | - [Flash](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#flash)
27 | - [Sources](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#sources-information_source)
28 |
29 |
30 | ## PDF Format :page_facing_up:
31 |
32 |
33 |
34 |
35 |
36 | https://www.adobe.com/devnet/pdf/pdf_reference.html
37 | https://blog.didierstevens.com/2008/04/09/quickpost-about-the-physical-and-logical-structure-of-pdf-files/
38 | https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
39 |
40 | ## Tools list :wrench:
41 |
42 | Tool | URL
43 | ------------------------------------ | ---------------------------------------------
44 | AnalyzePDF.py | https://github.com/hiddenillusion/AnalyzePDF
45 | ByteForce | https://github.com/weaknetlabs/ByteForce
46 | Caradoc | https://github.com/ANSSI-FR/caradoc
47 | Didier Stevens suite | https://github.com/DidierStevens/DidierStevensSuite
48 | dumppdf | https://packages.debian.org/jessie/python-pdfminer
49 | forensics-all | https://packages.debian.org/jessie-backports/forensics-all
50 | Origami | https://code.google.com/archive/p/origami-pdf/
51 | ParanoiDF | https://github.com/patrickdw123/ParanoiDF
52 | peepdf | https://github.com/jesparza/peepdf
53 | PDF Xray | https://github.com/9b/pdfxray_public
54 | pdf-parser | http://didierstevens.com/files/software/pdf-parser_V0_6_4.zip
55 | pdf2john.py | https://github.com/magnumripper/JohnTheRipper/blob/unstable-jumbo/run/pdf2john.py
56 | pdfcrack | https://packages.debian.org/jessie/pdfcrack
57 | pdfextract | https://github.com/CrossRef/pdfextract
58 | pdfobjflow.py | https://bitbucket.org/sebastiendamaye/pdfobjflow
59 | pdfresurrect | https://packages.debian.org/jessie/pdfresurrect
60 | PdfStreamDumper.exe | http://sandsprite.com/CodeStuff/PDFStreamDumper_Setup.exe
61 | pdftk | https://packages.debian.org/en/jessie/pdftk
62 | pdfxray_lite.py | https://github.com/9b/pdfxray_lite
63 | poppler-utils | https://packages.debian.org/en/jessie/poppler-utils (pdftotext, pdfimages, pdftohtml, pdftops, pdfinfo, pdffonts, pdfdetach, pdfseparate, pdfsig, pdftocairo, pdftoppm, pdfunite)
64 | pyew | https://packages.debian.org/en/jessie/pyew
65 | qpdf | https://packages.debian.org/jessie/qpdf
66 | swf_mastah.py | https://github.com/9b/pdfxray_public/blob/master/builder/swf_mastah.py
67 |
68 |
69 | #### Existing list
70 | http://blog.didierstevens.com/programs/pdf-tools/
71 | https://github.com/sans-dfir/sift-files/tree/master/pdf-tools
72 |
73 |
74 | ## Quick Analysis :rocket:
75 |
76 |
77 | #### Basic informations
78 | ```
79 | $ file file.pdf
80 | $ pdfinfo -box -meta -js -rawdates file.pdf
81 | ```
82 |
83 |
84 | #### Displaying objects and actions structure
85 | ```
86 | $ python pdfid.py -aefv file.pdf
87 | ```
88 |
89 |
90 | #### Search for /OpenAction /AA /Launch /GoTo /GoToR /SubmitForm /RichMedia (for Flash) /JS /JavaScript /URI - Encoding - Ciphers - Shellcode - Obfuscation...
91 | Automatically with ParanoiDF
92 | ```
93 | $ python paranoiDF.py -fl file.pdf
94 | ```
95 | Or with pdf-parser
96 | ```
97 | $ python pdf-parser.py -v file.pdf
98 | ```
99 | With a hexadecimal editor
100 | ```
101 | $ bless file.pdf
102 | ```
103 |
104 |
105 | #### Extract files / scripts / Objects
106 | Use pdf-parser to extract a JavaScript object, for example
107 | ```
108 | $ pdf-parser --object 32 --raw file.pdf > extractedObject.js
109 | ```
110 | pdfextract from Origami
111 | ```
112 | $ pdfextract file.pdf
113 | ```
114 |
115 |
116 | #### Online analysis
117 | *Be careful not to leak any important, professional, or personal data, or to expose your research*
118 | https://www.hybrid-analysis.com/
119 |
120 |
121 | ## Complete Analysis :mag_right:
122 |
123 |
124 | ### Basic informations
125 | ```
126 | $ file file.pdf
127 | $ pdfinfo file.pdf
128 | $ pdfinfo -box -meta -js -rawdates file.pdf
129 | ```
130 |
131 |
132 | ### Powerful Python tool to analyze PDFs and exploits
133 | ```
134 | $ pyew file.pdf
135 | ```
136 |
137 |
138 | ### Another Python tool to explore PDFs
139 | ```
140 | $ peepdf -fl file.pdf
141 | $ peepdf --interactive file.pdf
142 | ```
143 |
144 |
145 | #### Analysis under Windows
146 | PDF Stream Dumper
147 | https://github.com/dzzie/pdfstreamdumper
148 |
149 |
150 |
151 | ### Metadata
152 | Get metadata
153 | ```
154 | $ exiftool -a -u -g2 file.pdf
155 | ```
156 |
157 | Get metadata recursively from the current directory
158 | ```
159 | $ exiftool -r -ext pdf .
160 | ```
161 |
162 | Change an element
163 | ```
164 | $ exiftool -Title="New title" file.pdf
165 | ```
166 |
167 | Remove metadata
168 | ```
169 | $ exiftool -all= file.pdf && exiftool -all:all= file.pdf && qpdf --linearize file.pdf filewithoutmeta.pdf
170 | $ mat file.pdf # latest version of mat doesn't support pdf format anymore...
171 | ```
172 |
173 | Remove metadata recursively from the current directory:
174 | *Quick and dirty, but it works well*
175 | *Paths are quoted so that file names containing spaces are handled*
176 | ```
177 | $ find . -name "*.pdf" -print0 | while IFS= read -r -d '' file; do f="${file#./}"; echo "$f" && mv "$f" "$f.pdf" && exiftool -all= "$f.pdf" && exiftool -all:all= "$f.pdf" && qpdf --linearize "$f.pdf" "$f" && rm "$f.pdf" "$f.pdf_original"; done
178 | ```
179 |
180 |
181 | ### Search for older versions
182 | Search for older "hidden" versions
183 | ```
184 | $ pdfresurrect file.pdf -i
185 | $ exiftool -pdf-update:all= file.pdf
186 | ```
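
Older revisions usually survive because an incremental save appends a new section to the file, and each section normally ends with its own `%%EOF` marker. The sketch below only illustrates that idea; it is not how pdfresurrect works internally, and the function name `revision_ends` is made up for this example. Some files (linearized ones, for instance) can contain more than one marker without ever having been edited, and a marker inside stream data would also be counted.

```python
from typing import List

EOF_MARKER = b"%%EOF"


def revision_ends(data: bytes) -> List[int]:
    """Return the offset just past every %%EOF marker found in the raw bytes."""
    ends = []
    pos = data.find(EOF_MARKER)
    while pos != -1:
        ends.append(pos + len(EOF_MARKER))
        pos = data.find(EOF_MARKER, pos + 1)
    return ends


def carve_revision(data: bytes, index: int) -> bytes:
    """Return the bytes up to the end of the given revision (0 = oldest)."""
    return data[: revision_ends(data)[index]]
```

Reading a file with `open("file.pdf", "rb").read()` and passing the bytes to `revision_ends` gives a rough count of candidate revisions; more than one entry is a hint to run pdfresurrect on the file.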
187 |
188 |
189 | ### Online Analysis
190 | Name | URL
191 | ------------------------------------ | ---------------------------------------------
192 | Malwr | https://malwr.com/submission/
193 | Hybrid analysis | https://www.hybrid-analysis.com/
194 | Malware Tracker | https://www.malwaretracker.com/pdf.php
195 | VirusTotal | http://www.virustotal.com/
196 | PDF examiner | http://www.pdfexaminer.com/
197 | Document Analyzer | http://www.document-analyzer.net/
198 | Jotti | https://virusscan.jotti.org/
199 | PDF X-ray | http://www.pdfxray.com/
200 | PDF Online | https://www.pdf-online.com/
201 | Extract PDF | http://www.extractpdf.com
202 | Char conversion | https://kt.pe/tools.html#conv/
203 |
204 |
205 | ### Statistics
206 | Compute byte statistics (minimum and maximum entropy, ASCII count, ...) for a PDF
207 | ```
208 | $ python byte-stats.py file.pdf
209 | ```
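
To make the numbers easier to interpret, here is a minimal sketch of whole-file byte statistics in Python. It is not the implementation of byte-stats.py: the function name `byte_stats` and the returned keys are assumptions for this example, and it computes a single Shannon entropy value for the whole input rather than a minimum and maximum.

```python
import math
from collections import Counter


def byte_stats(data: bytes) -> dict:
    """Return the size, distinct byte count, entropy and printable ASCII count."""
    total = len(data)
    counts = Counter(data)
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)  # Shannon entropy, in bits per byte
    printable = sum(n for b, n in counts.items() if 32 <= b < 127)
    return {
        "size": total,
        "distinct_bytes": len(counts),
        "entropy_bits_per_byte": entropy,
        "printable_ascii": printable,
    }
```

An entropy close to 8 bits per byte suggests compressed or encrypted content, while plain text objects score much lower. Typical use would be `byte_stats(open("file.pdf", "rb").read())`.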
210 |
211 |
212 | ### Visual analysis
213 | Visual analysis of a PDF or a binary file
214 | http://binvis.io
215 |
216 |
217 | ## Go deeper in the analysis
218 |
219 | ### Displaying objects and actions structure
220 | ```
221 | $ python pdfid.py --all --extra --force --verbose file.pdf
222 | ```
223 |
224 | ### Map of the objects flows
225 | ```
226 | $ pdf-parser file.pdf | ./pdfobjflow.py
227 | $ eog pdfobjflow.png
228 | ```
229 |
230 |
231 | ### Actions
232 | Search for:
233 | /OpenAction /AA specify the script or action to run automatically.
234 | /Names /AcroForm /Action can also specify and launch scripts or actions.
235 | /JavaScript specifies JavaScript to run.
236 | /GoTo changes the view to a specified destination within the PDF or in another PDF file.
237 | /Launch launches a program or opens a document.
238 | /URI accesses a resource by its URL.
239 | /SubmitForm /GoToR can send data to a URL.
240 | /RichMedia can be used to embed Flash in a PDF.
241 | /ObjStm can hide objects inside an object stream.
242 | /JavaScript > /J#61vaScript Beware of obfuscation with hex codes: `#61` is the hex code for `a`.
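
The keyword search above, including the `#xx` hex obfuscation, can be sketched in a few lines of Python. This is only an illustration and not a replacement for pdfid or pdf-parser: the names `decode_names` and `count_keywords` are made up for this example, the keyword list is just the one from this section, the decoding is applied blindly to the whole file (so stream data can cause false hits), and nothing inside compressed streams is seen.

```python
import re

# Keywords taken from the list above.
KEYWORDS = [
    b"/OpenAction", b"/AA", b"/AcroForm", b"/JavaScript", b"/JS",
    b"/Launch", b"/URI", b"/SubmitForm", b"/GoToR", b"/RichMedia", b"/ObjStm",
]


def decode_names(data: bytes) -> bytes:
    """Replace every #xx escape with the byte it encodes (/J#61va -> /Java)."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


def count_keywords(data: bytes) -> dict:
    """Count each keyword in the de-obfuscated bytes."""
    decoded = decode_names(data)
    result = {}
    for key in KEYWORDS:
        # The lookahead stops /AA from matching a longer name such as /AAB.
        pattern = re.escape(key) + rb"(?![A-Za-z0-9])"
        result[key.decode()] = len(re.findall(pattern, decoded))
    return result
```

Any non-zero count is only a starting point: the object that holds the keyword still has to be inspected with one of the tools below.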
243 |
244 |
245 | With ParanoiDF
246 | ```
247 | $ python paranoiDF.py -fl file.pdf
248 | ```
249 | With pdf-parser
250 | ```
251 | $ python pdf-parser.py -v file.pdf
252 | ```
253 | With a hexadecimal editor
254 | ```
255 | $ bless file.pdf
256 | ```
257 | With dumppdf
258 | ```
259 | $ dumppdf -a file.pdf
260 | ```
261 |
262 |
263 |
264 |
265 |
266 |
267 |
268 | ### Compression
269 | Search for compression
270 | ```
271 | $ strings file.pdf | grep --color "/Filter"
272 | ```
273 |
274 | 2 ways to decompress a PDF
275 | ```
276 | $ pdftk compressed.pdf output uncompressed.pdf uncompress
277 | $ qpdf --stream-data=uncompress compressed.pdf uncompressed.pdf
278 | ```
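
When neither tool is available, streams compressed with `/FlateDecode` (zlib) can be inflated by hand. The following is a rough sketch, not what pdftk or qpdf do: the function name `inflate_streams` is an assumption, the code ignores the `/Filter` and `/Length` entries and simply tries zlib on every `stream ... endstream` body, and it does not handle other filters, filter chains, predictors or encrypted files.

```python
import re
import zlib

# Body of each "stream ... endstream" pair; the lookbehind skips "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return the decompressed content of every stream that zlib accepts."""
    results = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj stops at the end of the zlib data, so the
            # end-of-line bytes before "endstream" are ignored.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            continue  # not zlib data (another filter, or no filter at all)
    return results
```

Each returned item is the raw content of one stream, which can then be searched for JavaScript or other keywords.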
279 |
280 |
281 |
282 | ### Embeded files
283 | Four ways to search for embedded files/scripts inside a PDF
284 | ```
285 | $ binwalk file.pdf
286 | $ foremost -a -v file.pdf
287 | $ hachoir-subfile file.pdf
288 | $ scalpel file.pdf
289 | ```
290 |
291 |
292 | ### Extract files / scripts / objects
293 | Extract the file corresponding to an object ID, a JPG for example
294 | ```
295 | $ dumppdf.py -i 32 -r file.pdf > image.jpg
296 | ```
297 | Extract JavaScript from an object, for example
298 | ```
299 | $ pdf-parser --object 32 --raw file.pdf > extractedObject.js
300 | ```
301 | pdfextract from Origami
302 | ```
303 | $ pdfextract file.pdf
304 | ```
305 |
306 |
307 |
308 | ### Conversion
309 | PDF to Postscript
310 | ```
311 | $ pdftops file.pdf
312 | ```
313 | PDF to TXT
314 | ```
315 | $ pdftotext file.pdf
316 | ```
317 | PDF to JPG
318 | ```
319 | $ convert file.pdf image.jpg
320 | ```
321 | This is a non-exhaustive list of possible conversions.
322 |
323 |
324 |
325 |
326 | ### LZWDecode filter
327 | Convert a PDF to Postscript without the LZWDecode filter
328 | ```
329 | $ qpdf --stream-data=uncompress original.pdf decoded.pdf # Decompress it
330 | $ pdftops decoded.pdf decoded.ps # Convert it
331 | ```
332 |
333 |
334 | ### Encryption
335 | PDF supports RC4 encryption (40- to 128-bit keys) and AES encryption (128-bit keys, or 256-bit keys with Adobe Extension Level 3).
336 | Beware of empty passwords: a PDF can be encrypted with an empty user password.
337 |
338 |
339 | #### Password recovery
340 | Dictionary attack on a PDF with pdfcrack
341 | ```
342 | $ pdfcrack -w yourDictionary.txt file.pdf
343 | ```
344 | With john
345 | ```
346 | $ pdf2john.py file.pdf > x.hash
347 | $ john --wordlist=yourDictionary.txt x.hash
348 | ```
349 |
350 |
351 | ### Javascript
352 |
353 | Two ways to search for JavaScript
354 | ```
355 | $ pdf-parser --search=JavaScript file.pdf
356 | $ pdfinfo -js file.pdf
357 | ```
358 |
359 |
360 | Extract an object
361 | With jsunpack
362 | ```
363 | $ jsunpack-extractjs file.pdf
364 | ```
365 | With pdf-parser
366 | ```
367 | $ pdf-parser --object 32 --raw file.pdf > file.js
368 | ```
369 | With pdfextract from Origami
370 | ```
371 | $ pdfextract --js file.pdf
372 | ```
373 |
374 |
375 | #### De-obfuscate
376 | https://github.com/urule99/jsunpack-n
377 |
378 | Online :
379 | http://jsunpack.jeek.org/java/
380 |
381 | Malzilla and SpiderMonkey can also help deobfuscate JavaScript.
382 | Malzilla :
383 | http://www.malzilla.org/downloads.html
384 | SpiderMonkey :
385 | http://www.didierstevens.com/files/software/js-1.7.0-mod.tar.gz
386 | More details coming soon.
387 |
388 |
389 | #### Add Javascript to PDF
390 | https://didierstevens.com/files/software/make-pdf_V0_1_6.zip
391 | https://neonprimetime.blogspot.fr/2015/03/how-to-add-javascript-to-pdf.html
392 |
393 |
394 | #### Disarming a PDF
395 | ```
396 | $ python pdfid.py --disarm file.pdf
397 | ```
398 |
399 |
400 | ### Flash
401 |
402 | Search for flash
403 | ```
404 | $ python pdf-parser.py --search flash file.pdf
405 | ```
406 |
407 | Extract flash with swf_mastah
408 | ```
409 | $ python swf_mastah.py -f file.pdf -o ./
410 | $ file *.swf
411 | ```
412 | With pdf-parser
413 | ```
414 | $ pdf-parser.py --object 32 --filter --raw file.pdf > flashFile.swf
415 | $ file flashFile.swf
416 | ```
417 |
418 | Analyzing a Flash program
419 | ```
420 | $ swfdump -Ddu flashFile.swf > flashFile.txt
421 | ```
422 | More details coming soon.
423 |
424 |
425 | ## Sources :information_source:
426 |
427 | https://blog.didierstevens.com/category/pdf/
428 | http://www.decalage.info/file_formats_security/pdf
429 | https://zeltser.com/analyzing-malicious-documents/
430 | https://code.google.com/archive/p/corkami/wikis/PDFTricks.wiki
431 | https://www.sans.org/reading-room/whitepapers/malicious/owned-malicious-pdf-analysis-33443
432 | https://digital-forensics.sans.org/blog/2009/12/14/pdf-malware-analysis/
433 | http://fileformats.archiveteam.org/wiki/PDF
434 |
435 |
--------------------------------------------------------------------------------