├── Dockerfile
├── LICENSE.txt
├── README.md
├── extract.sh
└── extractor.py


/Dockerfile:
--------------------------------------------------------------------------------
 1 | FROM ubuntu:20.04
 2 | 
 3 | WORKDIR /root
 4 | 
 5 | ENV DEBIAN_FRONTEND=noninteractive
 6 | RUN apt-get update && apt-get upgrade -y && \
 7 |     apt-get install -y \
 8 |       build-essential \
 9 |       git-core \
10 |       liblzma-dev \
11 |       liblzo2-dev \
12 |       python3-pip \
13 |       unrar-free \
14 |       wget \
15 |       zlib1g-dev && \
16 |     update-alternatives --install /usr/bin/python python /usr/bin/python3 10 # python should be py3
17 | 
18 | RUN git clone -q --depth=1 https://github.com/devttys0/binwalk.git /root/binwalk && \
19 |     cd /root/binwalk && \
20 |     ./deps.sh --yes && \
21 |     python3 ./setup.py install && \
22 |     pip3 install git+https://github.com/ahupp/python-magic && \
23 |     pip3 install git+https://github.com/sviehb/jefferson && \
24 |     pip3 install pylzma # jefferson dependency, needs build-essential
25 | 
26 | COPY extractor.py /root/
27 | WORKDIR /root/
28 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) 2015 - 2016, Daming Dominic Chen
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | Introduction
 2 | ============
 3 | 
 4 | This is a recursive firmware extractor that aims to extract a kernel image 
 5 | and/or compressed filesystem from a Linux-based firmware image. A number of 
 6 | heuristics are included to avoid extraction of certain blacklisted file types, 
 7 | and to avoid unproductive extraction beyond certain breadth and depth 
 8 | limitations.
 9 | 
 10 | Firmware images with multiple filesystems are not fully supported; this tool
 11 | cannot reassemble them and will instead extract the first filesystem that
 12 | contains at least four UNIX-like root directories (e.g. `/bin`, `/etc`).
13 | 
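The root filesystem heuristic can be sketched roughly as follows. This is a simplified illustration that assumes the directory list and threshold from `extractor.py`; the function names `count_unix_dirs` and `looks_like_rootfs` are hypothetical, and the real `io_find_rootfs` additionally descends into single-directory chains and one level of subdirectories.

```python
import os

# Directory names and threshold copied from Extractor.UNIX_DIRS and
# Extractor.UNIX_THRESHOLD in extractor.py
UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root",
             "run", "sbin", "tmp", "usr", "var"]
UNIX_THRESHOLD = 4


def count_unix_dirs(path):
    """Count entries in 'path' that are directories with a UNIX-like name."""
    return sum(1 for name in os.listdir(path)
               if name in UNIX_DIRS and os.path.isdir(os.path.join(path, name)))


def looks_like_rootfs(path):
    """Hypothetical helper: True if 'path' has enough UNIX-like directories."""
    return count_unix_dirs(path) >= UNIX_THRESHOLD
```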
 14 | For the impatient: Dockerize all the things!
 15 | ============================================
16 | 1. Install [Docker](https://docs.docker.com/engine/getstarted/)
17 | 2. Run the dockerized extractor
18 | ```
19 | git clone https://github.com/firmadyne/extractor.git
20 | cd extractor
21 | ./extract.sh path/to/firmware.img path/to/output/directory
22 | ```
23 | 
24 | Dependencies
25 | ============
26 | * [fakeroot](https://fakeroot.alioth.debian.org)
27 | * [psycopg2](http://initd.org/psycopg/)
28 | * [binwalk](https://github.com/devttys0/binwalk)
29 | * [python-magic](https://github.com/ahupp/python-magic)
30 | 
31 | Please use the latest version of `binwalk`. Note that there are two
32 | Python modules that both share the name `python-magic`; both should be usable,
33 | but only the one linked above has been tested extensively.
34 | 
35 | Binwalk
36 | -------
37 | 
38 | * [jefferson](https://github.com/sviehb/jefferson)
39 | * [sasquatch](https://github.com/firmadyne/sasquatch) (optional)
40 | 
 41 | When installing `binwalk`, you can optionally use the forked version of the
 42 | `sasquatch` tool, which has been modified to make SquashFS file extraction
 43 | errors fatal in order to prevent false positives.
44 | 
45 | Usage
46 | =====
47 | 
 48 | During execution, the extractor will temporarily extract files into `/tmp`
 49 | while recursing. Since firmware images can be large, it is preferable to mount
 50 | `/tmp` as a `tmpfs` backed by a large amount of memory to improve
 51 | performance.
52 | 
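As a rough guide to sizing that `tmpfs`, the sketch below computes half of the host's total memory, which is the same rule `extract.sh` applies before passing `--tmpfs /tmp:rw,size=...` to Docker. It assumes a Linux host where `/proc/meminfo` is available; the function names `half_of_memory_kib` and `tmpfs_option` are hypothetical and not part of this repository.

```python
def half_of_memory_kib(meminfo_text):
    """Return half of MemTotal in KiB, parsed from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:       16384000 kB"
            return int(line.split()[1]) // 2
    raise ValueError("MemTotal not found")


def tmpfs_option(meminfo_text, mountpoint="/tmp"):
    """Hypothetical helper: build the value for Docker's --tmpfs flag."""
    return "%s:rw,size=%dk" % (mountpoint, half_of_memory_kib(meminfo_text))


# Possible usage on a Linux host (not executed here):
#   with open("/proc/meminfo") as fp:
#       print(tmpfs_option(fp.read()))
```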
53 | To preserve filesystem permissions during extraction, while avoiding execution
54 | with root privileges, wrap execution of this extractor within `fakeroot`. This
55 | will emulate privileged operations.
56 | 
57 | `fakeroot python3 ./extractor.py -np <infile> <outdir>`
58 | 
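When the extractor is driven from another script, the command above can be assembled programmatically. The following is a minimal sketch under stated assumptions: `build_command` and its `have_fakeroot` parameter are hypothetical, the `-np` flag is simply copied from the command line above, and running without `fakeroot` is assumed to lose the permission emulation described in this section.

```python
import shutil


def build_command(infile, outdir, script="./extractor.py", have_fakeroot=None):
    """Hypothetical helper: build the argv list for one extractor run.

    If 'have_fakeroot' is None, fakeroot is looked up on PATH. When it is
    missing, the command is returned without the wrapper, so privileged
    operations are no longer emulated.
    """
    if have_fakeroot is None:
        have_fakeroot = shutil.which("fakeroot") is not None
    cmd = ["python3", script, "-np", infile, outdir]
    if have_fakeroot:
        cmd = ["fakeroot"] + cmd
    return cmd


# Possible usage (not executed here):
#   import subprocess
#   subprocess.run(build_command("firmware.img", "out"), check=True)
```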
59 | Notes
60 | =====
61 | 
62 | This tool is beta quality. In particular, it was written before the 
63 | `binwalk` API was updated to provide an interface for accessing information
64 | about the extraction of each signature match. As a result, it walks the
65 | filesystem to identify the extracted files that correspond to a given
66 | signature match. Additionally, parallel operation has not been thoroughly
67 | tested.
68 | 
69 | Pull requests are greatly appreciated!
70 | 


--------------------------------------------------------------------------------
/extract.sh:
--------------------------------------------------------------------------------
 1 | #! /bin/bash
 2 | 
 3 | infile=$1
 4 | outdir=$2
 5 | 
 6 | # override 65535k docker default size with tmpfs default
 7 | mem=$(($(free | awk '/^Mem:/{print $2}') / 2))k
 8 | 
 9 | indir=$(realpath "$(dirname "${infile}")")
10 | outdir=$(realpath "${outdir}")
11 | infilebn=$(basename "${infile}")
12 | 
13 | docker run --rm -t -i --tmpfs /tmp:rw,size=${mem} \
14 |   -v "${indir}":/firmware-in:ro \
15 |   -v "${outdir}":/firmware-out \
16 |   "ddcc/firmadyne-extractor:latest" \
17 |   fakeroot /home/extractor/extractor/extractor.py \
18 |   -np \
19 |   /firmware-in/"${infilebn}" \
20 |   /firmware-out
21 | 


--------------------------------------------------------------------------------
/extractor.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | 
  3 | """
  4 | Module that performs extraction. For usage, refer to documentation for the class
  5 | 'Extractor'. This module can also be executed directly,
  6 | e.g. 'extractor.py <input> <output>'.
  7 | """
  8 | 
  9 | import argparse
 10 | import hashlib
 11 | import multiprocessing
 12 | import os
 13 | from stat import S_ISREG
 14 | import shutil
 15 | import tempfile
 16 | import traceback
 17 | 
 18 | import magic
 19 | import binwalk
 20 | 
 21 | class Extractor(object):
 22 |     """
 23 |     Class that extracts kernels and filesystems from firmware images, given an
 24 |     input file or directory and output directory.
 25 |     """
 26 | 
 27 |     # Directories that define the root of a UNIX filesystem, and the
 28 |     # appropriate threshold condition
 29 |     UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root",
 30 |                  "run", "sbin", "tmp", "usr", "var"]
 31 |     UNIX_THRESHOLD = 4
 32 | 
 33 |     # Lock to prevent concurrent access to visited set. Unfortunately, must be
 34 |     # static because it cannot be pickled or passed as instance attribute.
 35 |     visited_lock = multiprocessing.Lock()
 36 | 
 37 |     def __init__(self, indir, outdir=None, rootfs=True, kernel=True,
 38 |                  numproc=True, server=None, brand=None):
 39 |         # Input firmware update file or directory
 40 |         self._input = os.path.abspath(indir)
 41 |         # Output firmware directory
 42 |         self.output_dir = os.path.abspath(outdir) if outdir else None
 43 | 
 44 |         # Whether to attempt to extract kernel
 45 |         self.do_kernel = kernel
 46 | 
 47 |         # Whether to attempt to extract root filesystem
 48 |         self.do_rootfs = rootfs
 49 | 
 50 |         # Brand of the firmware
 51 |         self.brand = brand
 52 | 
 53 |         # Hostname of SQL server
 54 |         self.database = server
 55 | 
 56 |         # Worker pool.
 57 |         self._pool = multiprocessing.Pool() if numproc else None
 58 | 
 59 |         # Set containing MD5 checksums of visited items
 60 |         self.visited = set()
 61 | 
 62 |         # List containing tagged items to extract as 2-tuple: (tag [e.g. MD5],
 63 |         # path)
 64 |         self._list = list()
 65 | 
 66 |     def __getstate__(self):
 67 |         """
 68 |         Eliminate attributes that should not be pickled.
 69 |         """
 70 |         self_dict = self.__dict__.copy()
 71 |         del self_dict["_pool"]
 72 |         del self_dict["_list"]
 73 |         return self_dict
 74 | 
 75 |     @staticmethod
 76 |     def io_dd(indir, offset, size, outdir):
 77 |         """
 78 |         Given a path to a target file, extract size bytes from specified offset
 79 |         to given output file.
 80 |         """
 81 |         if not size:
 82 |             return
 83 | 
 84 |         with open(indir, "rb") as ifp:
 85 |             with open(outdir, "wb") as ofp:
 86 |                 ifp.seek(offset, 0)
 87 |                 ofp.write(ifp.read(size))
 88 | 
 89 |     @staticmethod
 90 |     def magic(indata, mime=False):
 91 |         """
 92 |         Performs file magic while maintaining compatibility with different
 93 |         libraries.
 94 |         """
 95 | 
 96 |         try:
 97 |             if mime:
 98 |                 mymagic = magic.open(magic.MAGIC_MIME_TYPE)
 99 |             else:
100 |                 mymagic = magic.open(magic.MAGIC_NONE)
101 |             mymagic.load()
102 |         except AttributeError:
103 |             mymagic = magic.Magic(mime)
104 |             mymagic.file = mymagic.from_file
105 |         return mymagic.file(indata)
106 | 
107 |     @staticmethod
108 |     def io_md5(target):
109 |         """
110 |         Performs MD5 in 64 KiB blocks; non-regular files hash their path.
111 |         """
112 |         blocksize = 65536
113 |         hasher = hashlib.md5()
114 | 
115 |         stat = os.stat(target)
116 |         if not S_ISREG(stat.st_mode):
117 |             hasher.update(target.encode('utf-8'))
118 |         else:
119 |             with open(target, 'rb') as ifp:
120 |                 buf = ifp.read(blocksize)
121 |                 while buf:
122 |                     hasher.update(buf)
123 |                     buf = ifp.read(blocksize)
124 |         return hasher.hexdigest()
125 | 
126 |     @staticmethod
127 |     def io_rm(target):
128 |         """
129 |         Attempts to recursively delete a directory.
130 |         """
131 |         shutil.rmtree(target, ignore_errors=False, onerror=Extractor._io_err)
132 | 
133 |     @staticmethod
134 |     def _io_err(function, path, excinfo):
135 |         """
136 |         Internal function used by 'io_rm' to print out errors.
137 |         """
138 |         print(("!! %s: Cannot delete %s!\n%s" % (function, path, excinfo)))
139 | 
140 |     @staticmethod
141 |     def io_find_rootfs(start, recurse=True):
142 |         """
143 |         Attempts to find a Linux root directory.
144 |         """
145 | 
146 |         # Recurse into single directory chains, e.g. jffs2-root/fs_1/.../
147 |         path = start
148 |         while (len(os.listdir(path)) == 1 and
149 |                os.path.isdir(os.path.join(path, os.listdir(path)[0]))):
150 |             path = os.path.join(path, os.listdir(path)[0])
151 | 
152 |         # count number of unix-like directories
153 |         count = 0
154 |         for subdir in os.listdir(path):
155 |             if subdir in Extractor.UNIX_DIRS and \
156 |                 os.path.isdir(os.path.join(path, subdir)):
157 |                 count += 1
158 | 
159 |         # check for extracted filesystem, otherwise update queue
160 |         if count >= Extractor.UNIX_THRESHOLD:
161 |             return (True, path)
162 | 
163 |         # in some cases, multiple filesystems may be extracted, so recurse to
164 |         # find best one
165 |         if recurse:
166 |             for subdir in os.listdir(path):
167 |                 if os.path.isdir(os.path.join(path, subdir)):
168 |                     res = Extractor.io_find_rootfs(os.path.join(path, subdir),
169 |                                                    False)
170 |                     if res[0]:
171 |                         return res
172 | 
173 |         return (False, start)
174 | 
175 |     def extract(self):
176 |         """
177 |         Perform extraction of firmware updates from input to tarballs in output
178 |         directory using a process pool when enabled.
179 |         """
180 |         if os.path.isdir(self._input):
181 |             for path, _, files in os.walk(self._input):
182 |                 for item in files:
183 |                     self._list.append(os.path.join(path, item))
184 |         elif os.path.isfile(self._input):
185 |             self._list.append(self._input)
186 |         else:
187 |             print("!! Cannot read file: %s" % (self._input,))
188 | 
189 |         if self.output_dir and not os.path.isdir(self.output_dir):
190 |             os.makedirs(self.output_dir)
191 | 
192 |         if self._pool:
193 |             self._pool.map(self._extract_item, self._list)
194 |         else:
195 |             for item in self._list:
196 |                 self._extract_item(item)
197 | 
198 |     def _extract_item(self, path):
199 |         """
200 |         Wrapper function that creates an ExtractionItem and calls the extract()
201 |         method.
202 |         """
203 | 
204 |         ExtractionItem(self, path, 0).extract()
205 | 
206 | class ExtractionItem(object):
207 |     """
208 |     Class that encapsulates the state of a single item that is being extracted.
209 |     """
210 | 
211 |     # Maximum recursion breadth and depth
212 |     RECURSION_BREADTH = 5
213 |     RECURSION_DEPTH = 3
214 | 
215 |     def __init__(self, extractor, path, depth, tag=None):
216 |         # Temporary directory
217 |         self.temp = None
218 | 
219 |         # Recursion depth counter
220 |         self.depth = depth
221 | 
222 |         # Reference to parent extractor object
223 |         self.extractor = extractor
224 | 
225 |         # File path
226 |         self.item = path
227 | 
228 |         # Database connection
229 |         if self.extractor.database:
230 |             import psycopg2
231 |             self.database = psycopg2.connect(database="firmware",
232 |                                              user="firmadyne",
233 |                                              password="firmadyne",
234 |                                              host=self.extractor.database)
235 |         else:
236 |             self.database = None
237 | 
238 |         # Checksum
239 |         self.checksum = Extractor.io_md5(path)
240 | 
241 |         # Tag
242 |         self.tag = tag if tag else self.generate_tag()
243 | 
244 |         # Output file path and filename prefix
245 |         self.output = os.path.join(self.extractor.output_dir, self.tag) if \
246 |                                    self.extractor.output_dir else None
247 | 
248 |         # Status, with terminate indicating early termination for this item
249 |         self.terminate = False
250 |         self.status = None
251 |         self.update_status()
252 | 
253 |     def __del__(self):
254 |         if self.database:
255 |             self.database.close()
256 | 
257 |         if self.temp:
258 |             self.printf(">> Cleaning up %s..." % self.temp)
259 |             Extractor.io_rm(self.temp)
260 | 
261 |     def printf(self, fmt):
262 |         """
263 |         Prints output string with appropriate depth indentation.
264 |         """
265 |         print(("\t" * self.depth + fmt))
266 | 
267 |     def generate_tag(self):
268 |         """
269 |         Generate the filename tag.
270 |         """
271 |         if not self.database:
272 |             return os.path.basename(self.item) + "_" + self.checksum
273 | 
274 |         try:
275 |             image_id = cur = None
276 |             cur = self.database.cursor()
277 |             if self.extractor.brand:
278 |                 brand = self.extractor.brand
279 |             else:
280 |                 brand = os.path.relpath(self.item).split(os.path.sep)[0]
281 |             cur.execute("SELECT id FROM brand WHERE name=%s", (brand, ))
282 |             brand_id = cur.fetchone()
283 |             if not brand_id:
284 |                 cur.execute("INSERT INTO brand (name) VALUES (%s) RETURNING id",
285 |                             (brand, ))
286 |                 brand_id = cur.fetchone()
287 |             if brand_id:
288 |                 cur.execute("SELECT id FROM image WHERE hash=%s",
289 |                             (self.checksum, ))
290 |                 image_id = cur.fetchone()
291 |                 if not image_id:
292 |                     cur.execute("INSERT INTO image (filename, brand_id, hash) \
293 |                                 VALUES (%s, %s, %s) RETURNING id",
294 |                                 (os.path.basename(self.item), brand_id[0],
295 |                                  self.checksum))
296 |                     image_id = cur.fetchone()
297 |             self.database.commit()
298 |         except BaseException:
299 |             traceback.print_exc()
300 |             self.database.rollback()
301 |         finally:
302 |             if cur:
303 |                 cur.close()
304 | 
305 |         if image_id:
306 |             self.printf(">> Database Image ID: %s" % image_id[0])
307 | 
308 |         return str(image_id[0]) if \
309 |                image_id else os.path.basename(self.item) + "_" + self.checksum
310 | 
311 |     def get_kernel_status(self):
312 |         """
313 |         Get the flag corresponding to the kernel status.
314 |         """
315 |         return self.status[0]
316 | 
317 |     def get_rootfs_status(self):
318 |         """
319 |         Get the flag corresponding to the root filesystem status.
320 |         """
321 |         return self.status[1]
322 | 
323 |     def update_status(self):
324 |         """
325 |         Updates the status flags using the tag to determine completion status.
326 |         """
327 |         kernel_done = os.path.isfile(self.get_kernel_path()) if \
328 |             self.extractor.do_kernel and self.output else \
329 |             not self.extractor.do_kernel
330 |         rootfs_done = os.path.isfile(self.get_rootfs_path()) if \
331 |             self.extractor.do_rootfs and self.output else \
332 |             not self.extractor.do_rootfs
333 |         self.status = (kernel_done, rootfs_done)
334 | 
335 |         if self.database and kernel_done and self.extractor.do_kernel:
336 |             self.update_database("kernel_extracted", "True")
337 | 
338 |         if self.database and rootfs_done and self.extractor.do_rootfs:
339 |             self.update_database("rootfs_extracted", "True")
340 | 
341 |         return self.get_status()
342 | 
343 |     def update_database(self, field, value):
344 |         """
345 |         Update a given field in the database.
346 |         """
347 |         ret, cur = True, None
348 |         if self.database:
349 |             try:
350 |                 cur = self.database.cursor()
351 |                 cur.execute("UPDATE image SET " + field + "=%s WHERE id=%s",
352 |                             (value, self.tag))
353 |                 self.database.commit()
354 |             except BaseException:
355 |                 ret = False
356 |                 traceback.print_exc()
357 |                 self.database.rollback()
358 |             finally:
359 |                 if cur:
360 |                     cur.close()
361 |         return ret
362 | 
363 |     def get_status(self):
364 |         """
365 |         Returns True if early terminate signaled, extraction is complete,
366 |         otherwise False.
367 |         """
368 |         return bool(self.terminate or all(self.status))
369 | 
370 |     def get_kernel_path(self):
371 |         """
372 |         Return the full path (including filename) to the output kernel file.
373 |         """
374 |         return self.output + ".kernel" if self.output else None
375 | 
376 |     def get_rootfs_path(self):
377 |         """
378 |         Return the full path (including filename) to the output root filesystem
379 |         file.
380 |         """
381 |         return self.output + ".tar.gz" if self.output else None
382 | 
383 |     def extract(self):
384 |         """
385 |         Perform the actual extraction of firmware updates, recursively. Returns
386 |         True if extraction complete, otherwise False.
387 |         """
388 |         self.printf("\n" + self.item.encode("utf-8", "replace").decode("utf-8"))
389 | 
390 |         # check if item is complete
391 |         if self.get_status():
392 |             self.printf(">> Skipping: completed!")
393 |             return True
394 | 
395 |         # check if exceeding recursion depth
396 |         if self.depth > ExtractionItem.RECURSION_DEPTH:
397 |             self.printf(">> Skipping: recursion depth %d" % self.depth)
398 |             return self.get_status()
399 | 
400 |         # check if checksum is in visited set
401 |         self.printf(">> MD5: %s" % self.checksum)
402 |         with Extractor.visited_lock:
403 |             if self.checksum in self.extractor.visited:
404 |                 self.printf(">> Skipping: %s..." % self.checksum)
405 |                 return self.get_status()
406 |             else:
407 |                 self.extractor.visited.add(self.checksum)
408 | 
409 |         # check if filetype is blacklisted
410 |         if self._check_blacklist():
411 |             return self.get_status()
412 | 
413 |         # create working directory
414 |         self.temp = tempfile.mkdtemp()
415 | 
416 |         try:
417 |             self.printf(">> Tag: %s" % self.tag)
418 |             self.printf(">> Temp: %s" % self.temp)
419 |             self.printf(">> Status: Kernel: %s, Rootfs: %s, Do_Kernel: %s, \
420 |                 Do_Rootfs: %s" % (self.get_kernel_status(),
421 |                                   self.get_rootfs_status(),
422 |                                   self.extractor.do_kernel,
423 |                                   self.extractor.do_rootfs))
424 | 
425 |             for analysis in [self._check_archive, self._check_encryption, self._check_firmware,
426 |                              self._check_kernel, self._check_rootfs,
427 |                              self._check_compressed]:
428 |                 # Move to temporary directory so binwalk does not write to input
429 |                 os.chdir(self.temp)
430 | 
431 |                 # Update status only if analysis changed state
432 |                 if analysis():
433 |                     if self.update_status():
434 |                         self.printf(">> Skipping: completed!")
435 |                         return True
436 | 
437 |         except Exception:
438 |             traceback.print_exc()
439 | 
440 |         return False
441 | 
442 |     def _check_blacklist(self):
443 |         """
444 |         Check if this file is blacklisted for analysis based on file type.
445 |         """
446 |         # First, use MIME-type to exclude large categories of files
447 |         filetype = Extractor.magic(self.item.encode("utf-8", "surrogateescape"),
448 |                                    mime=True)
449 |         if any(s in filetype for s in ["application/x-executable",
450 |                                        "application/x-dosexec",
451 |                                        "application/x-object",
452 |                                        "application/pdf",
453 |                                        "application/msword",
454 |                                        "image/", "text/", "video/"]):
455 |             self.printf(">> Skipping: %s..." % filetype)
456 |             return True
457 | 
458 |         # Next, check for specific file types that have MIME-type
459 |         # 'application/octet-stream'
460 |         filetype = Extractor.magic(self.item.encode("utf-8", "surrogateescape"))
461 |         if any(s in filetype for s in ["executable", "universal binary",
462 |                                        "relocatable", "bytecode", "applet"]):
463 |             self.printf(">> Skipping: %s..." % filetype)
464 |             return True
465 | 
466 |         # Finally, check for specific file extensions that would be incorrectly
467 |         # identified
468 |         if self.item.endswith(".dmg"):
469 |             self.printf(">> Skipping: %s..." % (self.item))
470 |             return True
471 | 
472 |         return False
473 | 
474 |     def _check_archive(self):
475 |         """
476 |         If this file is an archive, recurse over its contents, unless it matches
477 |         an extracted root filesystem.
478 |         """
479 |         return self._check_recursive("archive")
480 | 
481 |     def _check_encryption(self):
482 |         header = b""
483 |         with open(self.item, "rb") as f:
484 |             header = f.read(4)
485 | 
486 |         if header == b"SHRS":
487 |             print(">>>> Found D-Link encrypted firmware in %s!" % (self.item))
488 | 
489 |             # Source: https://github.com/0xricksanchez/dlink-decrypt
490 |             command = 'dd if=%s skip=1756 iflag=skip_bytes status=none | openssl aes-128-cbc -d -nopad -nosalt -K "c05fbf1936c99429ce2a0781f08d6ad8" -iv "67c6697351ff4aec29cdbaabf2fbe346" -in /dev/stdin -out %s > /dev/null 2>&1' % (self.item, os.path.join(self.temp, "dlink_decrypt"))
491 |             os.system(command)
492 |             return True
493 |         return False
494 | 
495 |     def _check_firmware(self):
496 |         """
497 |         If this file is of a known firmware type, directly attempt to extract
498 |         the kernel and root filesystem.
499 |         """
500 |         for module in binwalk.scan(self.item, "-y", "header", "--run-as=root", "--preserve-symlinks",
501 |                                    signature=True, quiet=True):
502 |             for entry in module.results:
503 |                 # uImage
504 |                 if "uImage header" in entry.description:
505 |                     if not self.get_kernel_status() and \
506 |                         "OS Kernel Image" in entry.description:
507 |                         kernel_offset = entry.offset + 64
508 |                         kernel_size = 0
509 | 
510 |                         for stmt in entry.description.split(','):
511 |                             if "image size:" in stmt:
512 |                                 kernel_size = int(''.join(
513 |                                     i for i in stmt if i.isdigit()), 10)
514 | 
515 |                         if kernel_size != 0 and kernel_offset + kernel_size \
516 |                             <= os.path.getsize(self.item):
517 |                             self.printf(">>>> %s" % entry.description)
518 | 
519 |                             tmp_fd, tmp_path = tempfile.mkstemp(dir=self.temp)
520 |                             os.close(tmp_fd)
521 |                             Extractor.io_dd(self.item, kernel_offset,
522 |                                             kernel_size, tmp_path)
523 |                             kernel = ExtractionItem(self.extractor, tmp_path,
524 |                                                     self.depth, self.tag)
525 | 
526 |                             return kernel.extract()
527 |                     # elif "RAMDisk Image" in entry.description:
528 |                     #     self.printf(">>>> %s" % entry.description)
529 |                     #     self.printf(">>>> Skipping: RAMDisk / initrd")
530 |                     #     self.terminate = True
531 |                     #     return True
532 | 
533 |                 # TP-Link or TRX
534 |                 elif not self.get_kernel_status() and \
535 |                     not self.get_rootfs_status() and \
536 |                     "rootfs offset: " in entry.description and \
537 |                     "kernel offset: " in entry.description:
538 |                     kernel_offset = 0
539 |                     kernel_size = 0
540 |                     rootfs_offset = 0
541 |                     rootfs_size = 0
542 | 
543 |                     for stmt in entry.description.split(','):
544 |                         if "kernel offset:" in stmt:
545 |                             kernel_offset = int(stmt.split(':')[1], 16)
546 |                         elif "kernel length:" in stmt:
547 |                             kernel_size = int(stmt.split(':')[1], 16)
548 |                         elif "rootfs offset:" in stmt:
549 |                             rootfs_offset = int(stmt.split(':')[1], 16)
550 |                         elif "rootfs length:" in stmt:
551 |                             rootfs_size = int(stmt.split(':')[1], 16)
552 | 
553 |                     # compute sizes if only offsets provided
554 |                     if kernel_offset != rootfs_offset and kernel_size == 0 and \
555 |                         rootfs_size == 0:
556 |                         kernel_size = rootfs_offset - kernel_offset
557 |                         rootfs_size = os.path.getsize(self.item) - rootfs_offset
558 | 
559 |                     # ensure that computed values are sensible
560 |                     if (kernel_size > 0 and kernel_offset + kernel_size \
561 |                         <= os.path.getsize(self.item)) and \
562 |                         (rootfs_size != 0 and rootfs_offset + rootfs_size \
563 |                             <= os.path.getsize(self.item)):
564 |                         self.printf(">>>> %s" % entry.description)
565 | 
566 |                         tmp_fd, tmp_path = tempfile.mkstemp(dir=self.temp)
567 |                         os.close(tmp_fd)
568 |                         Extractor.io_dd(self.item, kernel_offset, kernel_size,
569 |                                         tmp_path)
570 |                         kernel = ExtractionItem(self.extractor, tmp_path,
571 |                                                 self.depth, self.tag)
572 |                         kernel.extract()
573 | 
574 |                         tmp_fd, tmp_path = tempfile.mkstemp(dir=self.temp)
575 |                         os.close(tmp_fd)
576 |                         Extractor.io_dd(self.item, rootfs_offset, rootfs_size,
577 |                                         tmp_path)
578 |                         rootfs = ExtractionItem(self.extractor, tmp_path,
579 |                                                 self.depth, self.tag)
580 |                         rootfs.extract()
581 | 
582 |                         return self.update_status()
583 |         return False
584 | 
585 |     def _check_kernel(self):
586 |         """
587 |         If this file contains a kernel version string, assume it is a kernel.
588 |         Only Linux kernels are currently extracted.
589 |         """
590 |         if not self.get_kernel_status():
591 |             for module in binwalk.scan(self.item, "-y", "kernel", "--run-as=root", "--preserve-symlinks",
592 |                                        signature=True, quiet=True):
593 |                 for entry in module.results:
594 |                     if "kernel version" in entry.description:
595 |                         self.update_database("kernel_version",
596 |                                              entry.description)
597 |                         if "Linux" in entry.description:
598 |                             if self.get_kernel_path():
599 |                                 shutil.copy(self.item, self.get_kernel_path())
600 |                             else:
601 |                                 self.extractor.do_kernel = False
602 |                             self.printf(">>>> %s" % entry.description)
603 |                             return True
604 |                         # VxWorks, etc
605 |                         else:
606 |                             self.printf(">>>> Ignoring: %s" % entry.description)
607 |                             return False
608 |                 return False
609 |         return False
610 | 
611 |     def _check_rootfs(self):
612 |         """
613 |         If this file contains a known filesystem type, extract it.
614 |         """
615 | 
616 |         if not self.get_rootfs_status():
617 |             # work-around issue with binwalk signature definitions for ubi
618 |             for module in binwalk.scan(self.item, "-e", "-r", "-y",
619 |                                        "filesystem", "-y", "ubi", "--run-as=root", "--preserve-symlinks",
620 |                                        signature=True, quiet=True):
621 |                 for entry in module.results:
622 |                     self.printf(">>>> %s" % entry.description)
623 |                     break
624 | 
625 |                 if module.extractor.directory:
626 |                     unix = Extractor.io_find_rootfs(module.extractor.directory)
627 | 
628 |                     if not unix[0]:
629 |                         return False
630 | 
631 |                     self.printf(">>>> Found Linux filesystem in %s!" % unix[1])
632 |                     if self.output:
633 |                         shutil.make_archive(self.output, "gztar",
634 |                                             root_dir=unix[1])
635 |                     else:
636 |                         self.extractor.do_rootfs = False
637 |                     return True
638 |         return False
639 | 
640 |     def _check_compressed(self):
641 |         """
642 |         If this file appears to be compressed, decompress it and recurse over
643 |         its contents.
644 |         """
645 |         return self._check_recursive("compressed")
646 | 
647 |     # treat both archived and compressed files using the same pathway. this is
648 |     # because certain files may appear as e.g. "xz compressed data" but still
649 |     # extract into a root filesystem.
650 |     def _check_recursive(self, fmt):
651 |         """
652 |         Unified implementation for checking both "archive" and "compressed"
653 |         items.
654 |         """
655 |         desc = None
656 |         # perform extraction
657 |         for module in binwalk.scan(self.item, "-e", "-r", "-y", fmt, "--run-as=root", "--preserve-symlinks",
658 |                                    signature=True, quiet=True):
659 |             for entry in module.results:
660 |                 # skip cpio/initrd files since they should be included with
661 |                 # kernel
662 |                 # if "cpio archive" in entry.description:
663 |                 #     self.printf(">> Skipping: cpio: %s" % entry.description)
664 |                 #     self.terminate = True
665 |                 #     return True
666 |                 desc = entry.description
667 |                 self.printf(">>>> %s" % entry.description)
668 |                 break
669 | 
670 |             if module.extractor.directory:
671 |                 unix = Extractor.io_find_rootfs(module.extractor.directory)
672 | 
673 |                 # check for extracted filesystem, otherwise update queue
674 |                 if unix[0]:
675 |                     self.printf(">>>> Found Linux filesystem in %s!" % unix[1])
676 |                     if self.output:
677 |                         shutil.make_archive(self.output, "gztar",
678 |                                             root_dir=unix[1])
679 |                     else:
680 |                         self.extractor.do_rootfs = False
681 |                     return True
682 |                 else:
683 |                     count = 0
684 |                     self.printf(">> Recursing into %s ..." % fmt)
685 |                     for root, _, files in os.walk(module.extractor.directory):
686 |                         # sort by increasing length, breaking ties in
687 |                         # ascending alphabetical order (list.sort is stable)
688 |                         files.sort()
689 |                         files.sort(key=len)
690 | 
691 |                         # handle case where original file name is restored; put
692 |                         # it to front of queue
693 |                         if desc and "original file name:" in desc:
694 |                             orig = None
695 |                             for stmt in desc.split(","):
696 |                                 if "original file name:" in stmt:
697 |                                     orig = stmt.split("\"")[1]
698 |                             if orig and orig in files:
699 |                                 files.remove(orig)
700 |                                 files.insert(0, orig)
701 | 
702 |                         for filename in files:
703 |                             if count > ExtractionItem.RECURSION_BREADTH:
704 |                                 self.printf(">> Skipping: recursion breadth %d"\
705 |                                     % ExtractionItem.RECURSION_BREADTH)
706 |                                 self.terminate = True
707 |                                 return True
708 |                             else:
709 |                                 new_item = ExtractionItem(self.extractor,
710 |                                                           os.path.join(root,
711 |                                                                        filename),
712 |                                                           self.depth + 1,
713 |                                                           self.tag)
714 |                                 if new_item.extract():
715 |                                     # check that we are actually done before
716 |                                     # performing early termination. for example,
717 |                                     # we might decide to skip on one subitem,
718 |                                     # but we still haven't finished
719 |                                     if self.update_status():
720 |                                         return True
721 |                             count += 1
722 |         return False
723 | 
724 | def main():
725 |     parser = argparse.ArgumentParser(description="Extracts filesystem and "
726 |                                      "kernel from Linux-based firmware images")
727 |     parser.add_argument("input", action="store", help="Input file or directory")
728 |     parser.add_argument("output", action="store", nargs="?", default="images",
729 |                         help="Output directory for extracted firmware")
730 |     parser.add_argument("-sql", dest="sql", action="store", default=None,
731 |                         help="Hostname of SQL server")
732 |     parser.add_argument("-nf", dest="rootfs", action="store_false",
733 |                         default=True, help="Disable extraction of root "
734 |                         "filesystem (may decrease extraction time)")
735 |     parser.add_argument("-nk", dest="kernel", action="store_false",
736 |                         default=True, help="Disable extraction of kernel "
737 |                         "(may decrease extraction time)")
738 |     parser.add_argument("-np", dest="parallel", action="store_false",
739 |                         default=True, help="Disable parallel operation "
740 |                         "(may increase extraction time)")
741 |     parser.add_argument("-b", dest="brand", action="store", default=None,
742 |                         help="Brand of the firmware image")
743 |     result = parser.parse_args()
744 | 
745 |     extract = Extractor(result.input, result.output, result.rootfs,
746 |                         result.kernel, result.parallel, result.sql,
747 |                         result.brand)
748 |     extract.extract()
749 | 
750 | if __name__ == "__main__":
751 |     main()
752 | 


--------------------------------------------------------------------------------
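Note on the recursion order used by `_check_recursive` in extractor.py: extracted files are visited shortest name first, ties broken alphabetically, and a file whose name matches the "original file name" reported in the binwalk description is moved to the front. The sketch below isolates that ordering as a standalone function so it can be checked without binwalk. The function name `order_candidates` and the sample description string are illustrative assumptions, not part of the repository.

```python
def order_candidates(files, desc=None):
    """Return file names in the order the recursion would visit them.

    Hypothetical helper that mirrors the ordering logic in
    ExtractionItem._check_recursive; it is not part of extractor.py.
    """
    ordered = sorted(files)   # ascending alphabetical
    ordered.sort(key=len)     # stable sort: shorter names first, ties stay alphabetical

    if desc and "original file name:" in desc:
        orig = None
        for stmt in desc.split(","):
            if "original file name:" in stmt:
                parts = stmt.split('"')
                # guard against a description without quotes
                if len(parts) > 1:
                    orig = parts[1]
        if orig and orig in ordered:
            # the restored original file is the most likely payload
            ordered.remove(orig)
            ordered.insert(0, orig)
    return ordered


# Assumed example of a description carrying an original file name.
sample_desc = 'gzip compressed data, has original file name: "rootfs.bin", from Unix'
print(order_candidates(["rootfs.bin", "b", "a", "kernel"], sample_desc))
```

Unlike the in-place code, this version copies the list and tolerates a description with no quoted name; both are choices made for the sketch rather than behavior of the original.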