├── Dockerfile ├── LICENSE.txt ├── README.md ├── extract.sh └── extractor.py /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM ubuntu:20.04 2 | 3 | WORKDIR /root 4 | 5 | ENV DEBIAN_FRONTEND=noninteractive 6 | RUN apt-get update && apt-get upgrade -y && \ 7 | apt-get install -y \ 8 | build-essential \ 9 | git-core \ 10 | liblzma-dev \ 11 | liblzo2-dev \ 12 | python3-pip \ 13 | unrar-free \ 14 | wget \ 15 | zlib1g-dev && \ 16 | update-alternatives --install /usr/bin/python python /usr/bin/python3 10 # python should be py3 17 | 18 | RUN git clone -q --depth=1 https://github.com/devttys0/binwalk.git /root/binwalk && \ 19 | cd /root/binwalk && \ 20 | ./deps.sh --yes && \ 21 | python3 ./setup.py install && \ 22 | pip3 install git+https://github.com/ahupp/python-magic && \ 23 | pip3 install git+https://github.com/sviehb/jefferson && \ 24 | pip3 install pylzma # jefferson dependency, needs build-essential 25 | 26 | COPY extractor.py /root/ 27 | WORKDIR /root/ 28 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 - 2016, Daming Dominic Chen 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Introduction 2 | ============ 3 | 4 | This is a recursive firmware extractor that aims to extract a kernel image 5 | and/or compressed filesystem from a Linux-based firmware image. A number of 6 | heuristics are included to avoid extraction of certain blacklisted file types, 7 | and to avoid unproductive extraction beyond certain breadth and depth 8 | limitations. 9 | 10 | Firmware images with multiple filesystems are not fully supported; this tool 11 | cannot reassemble them and will instead extract the first filesystem that has 12 | sufficient UNIX-like root directories (e.g. `/bin`, `/etc`, etc.). 13 | 14 | For the impatient: Dockerize all the things! 15 | ============================================ 16 | 1. Install [Docker](https://docs.docker.com/engine/getstarted/) 17 | 2. Run the dockerized extractor 18 | ``` 19 | git clone https://github.com/firmadyne/extractor.git 20 | cd extractor 21 | ./extract.sh path/to/firmware.img path/to/output/directory 22 | ``` 23 | 24 | Dependencies 25 | ============ 26 | * [fakeroot](https://fakeroot.alioth.debian.org) 27 | * [psycopg2](http://initd.org/psycopg/) 28 | * [binwalk](https://github.com/devttys0/binwalk) 29 | * [python-magic](https://github.com/ahupp/python-magic) 30 | 31 | Please use the latest version of `binwalk`. 
Note that there are two 32 | Python modules that both share the name `python-magic`; both should be usable, 33 | but only the one linked above has been tested extensively. 34 | 35 | Binwalk 36 | ------- 37 | 38 | * [jefferson](https://github.com/sviehb/jefferson) 39 | * [sasquatch](https://github.com/firmadyne/sasquatch) (optional) 40 | 41 | When installing `binwalk`, you may optionally use the forked version of the 42 | `sasquatch` tool, which has been modified to make SquashFS file extraction 43 | errors fatal to prevent false positives. 44 | 45 | Usage 46 | ===== 47 | 48 | During execution, the extractor will temporarily extract files into `/tmp` 49 | while recursing. Since firmware images can be large, it is preferable to mount 50 | `/tmp` as `tmpfs` backed by a large amount of memory to optimize 51 | performance. 52 | 53 | To preserve filesystem permissions during extraction, while avoiding execution 54 | with root privileges, wrap execution of this extractor within `fakeroot`. This 55 | will emulate privileged operations. 56 | 57 | `fakeroot python3 ./extractor.py -np ` 58 | 59 | Notes 60 | ===== 61 | 62 | This tool is beta quality. In particular, it was written before the 63 | `binwalk` API was updated to provide an interface for accessing information 64 | about the extraction of each signature match. As a result, it walks the 65 | filesystem to identify the extracted files that correspond to a given 66 | signature match. Additionally, parallel operation has not been thoroughly 67 | tested. 68 | 69 | Pull requests are greatly appreciated! 70 | -------------------------------------------------------------------------------- /extract.sh: -------------------------------------------------------------------------------- 1 | #! 
/bin/bash 2 | 3 | infile=$1 4 | outdir=$2 5 | 6 | # override docker's 65535k default tmpfs size with half of total memory 7 | mem=$(($(free | awk '/^Mem:/{print $2}') / 2))k 8 | 9 | indir=$(realpath "$(dirname "${infile}")") 10 | outdir=$(realpath "${outdir}") 11 | infilebn=$(basename "${infile}") 12 | 13 | docker run --rm -t -i --tmpfs /tmp:rw,size=${mem} \ 14 | -v "${indir}":/firmware-in:ro \ 15 | -v "${outdir}":/firmware-out \ 16 | "ddcc/firmadyne-extractor:latest" \ 17 | fakeroot /home/extractor/extractor/extractor.py \ 18 | -np \ 19 | /firmware-in/"${infilebn}" \ 20 | /firmware-out 21 | -------------------------------------------------------------------------------- /extractor.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """ 4 | Module that performs extraction. For usage, refer to documentation for the class 5 | 'Extractor'. This module can also be executed directly, 6 | e.g. 'extractor.py '. 7 | """ 8 | 9 | import argparse 10 | import hashlib 11 | import multiprocessing 12 | import os 13 | from stat import S_ISREG 14 | import shutil 15 | import tempfile 16 | import traceback 17 | 18 | import magic 19 | import binwalk 20 | 21 | class Extractor(object): 22 | """ 23 | Class that extracts kernels and filesystems from firmware images, given an 24 | input file or directory and output directory. 25 | """ 26 | 27 | # Directories that define the root of a UNIX filesystem, and the 28 | # appropriate threshold condition 29 | UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root", 30 | "run", "sbin", "tmp", "usr", "var"] 31 | UNIX_THRESHOLD = 4 32 | 33 | # Lock to prevent concurrent access to visited set. Unfortunately, must be 34 | # static because it cannot be pickled or passed as instance attribute. 
35 | visited_lock = multiprocessing.Lock() 36 | 37 | def __init__(self, indir, outdir=None, rootfs=True, kernel=True, 38 | numproc=True, server=None, brand=None): 39 | # Input firmware update file or directory 40 | self._input = os.path.abspath(indir) 41 | # Output firmware directory 42 | self.output_dir = os.path.abspath(outdir) if outdir else None 43 | 44 | # Whether to attempt to extract kernel 45 | self.do_kernel = kernel 46 | 47 | # Whether to attempt to extract root filesystem 48 | self.do_rootfs = rootfs 49 | 50 | # Brand of the firmware 51 | self.brand = brand 52 | 53 | # Hostname of SQL server 54 | self.database = server 55 | 56 | # Worker pool. 57 | self._pool = multiprocessing.Pool() if numproc else None 58 | 59 | # Set containing MD5 checksums of visited items 60 | self.visited = set() 61 | 62 | # List containing tagged items to extract as 2-tuple: (tag [e.g. MD5], 63 | # path) 64 | self._list = list() 65 | 66 | def __getstate__(self): 67 | """ 68 | Eliminate attributes that should not be pickled. 69 | """ 70 | self_dict = self.__dict__.copy() 71 | del self_dict["_pool"] 72 | del self_dict["_list"] 73 | return self_dict 74 | 75 | @staticmethod 76 | def io_dd(indir, offset, size, outdir): 77 | """ 78 | Given a path to a target file, extract size bytes from specified offset 79 | to given output file. 80 | """ 81 | if not size: 82 | return 83 | 84 | with open(indir, "rb") as ifp: 85 | with open(outdir, "wb") as ofp: 86 | ifp.seek(offset, 0) 87 | ofp.write(ifp.read(size)) 88 | 89 | @staticmethod 90 | def magic(indata, mime=False): 91 | """ 92 | Performs file magic while maintaining compatibility with different 93 | libraries. 
94 | """ 95 | 96 | try: 97 | if mime: 98 | mymagic = magic.open(magic.MAGIC_MIME_TYPE) 99 | else: 100 | mymagic = magic.open(magic.MAGIC_NONE) 101 | mymagic.load() 102 | except AttributeError: 103 | mymagic = magic.Magic(mime) 104 | mymagic.file = mymagic.from_file 105 | return mymagic.file(indata) 106 | 107 | @staticmethod 108 | def io_md5(target): 109 | """ 110 | Computes the MD5 digest in 64 KiB blocks; non-regular files are hashed by path. 111 | """ 112 | blocksize = 65536 113 | hasher = hashlib.md5() 114 | 115 | stat = os.stat(target) 116 | if not S_ISREG(stat.st_mode): 117 | hasher.update(target.encode('utf-8')) 118 | else: 119 | with open(target, 'rb') as ifp: 120 | buf = ifp.read(blocksize) 121 | while buf: 122 | hasher.update(buf) 123 | buf = ifp.read(blocksize) 124 | return hasher.hexdigest() 125 | 126 | @staticmethod 127 | def io_rm(target): 128 | """ 129 | Attempts to recursively delete a directory. 130 | """ 131 | shutil.rmtree(target, ignore_errors=False, onerror=Extractor._io_err) 132 | 133 | @staticmethod 134 | def _io_err(function, path, excinfo): 135 | """ 136 | Internal function used by 'io_rm' to print out errors. 137 | """ 138 | print(("!! %s: Cannot delete %s!\n%s" % (function, path, excinfo))) 139 | 140 | @staticmethod 141 | def io_find_rootfs(start, recurse=True): 142 | """ 143 | Attempts to find a Linux root directory. 144 | """ 145 | 146 | # Recurse into single directory chains, e.g. 
jffs2-root/fs_1/.../ 147 | path = start 148 | while (len(os.listdir(path)) == 1 and 149 | os.path.isdir(os.path.join(path, os.listdir(path)[0]))): 150 | path = os.path.join(path, os.listdir(path)[0]) 151 | 152 | # count number of unix-like directories 153 | count = 0 154 | for subdir in os.listdir(path): 155 | if subdir in Extractor.UNIX_DIRS and \ 156 | os.path.isdir(os.path.join(path, subdir)): 157 | count += 1 158 | 159 | # enough unix-like directories: treat this path as the root filesystem 160 | if count >= Extractor.UNIX_THRESHOLD: 161 | return (True, path) 162 | 163 | # in some cases, multiple filesystems may be extracted, so recurse one 164 | # level and return the first match 165 | if recurse: 166 | for subdir in os.listdir(path): 167 | if os.path.isdir(os.path.join(path, subdir)): 168 | res = Extractor.io_find_rootfs(os.path.join(path, subdir), 169 | False) 170 | if res[0]: 171 | return res 172 | 173 | return (False, start) 174 | 175 | def extract(self): 176 | """ 177 | Perform extraction of firmware updates from input to tarballs in output 178 | directory using a process pool. 179 | """ 180 | if os.path.isdir(self._input): 181 | for path, _, files in os.walk(self._input): 182 | for item in files: 183 | self._list.append(os.path.join(path, item)) 184 | elif os.path.isfile(self._input): 185 | self._list.append(self._input) 186 | else: 187 | print("!! Cannot read file: %s" % (self._input,)) 188 | 189 | if self.output_dir and not os.path.isdir(self.output_dir): 190 | os.makedirs(self.output_dir) 191 | 192 | if self._pool: 193 | self._pool.map(self._extract_item, self._list) 194 | else: 195 | for item in self._list: 196 | self._extract_item(item) 197 | 198 | def _extract_item(self, path): 199 | """ 200 | Wrapper function that creates an ExtractionItem and calls the extract() 201 | method. 202 | """ 203 | 204 | ExtractionItem(self, path, 0).extract() 205 | 206 | class ExtractionItem(object): 207 | """ 208 | Class that encapsulates the state of a single item that is being extracted. 
209 | """ 210 | 211 | # Maximum recursion breadth and depth 212 | RECURSION_BREADTH = 5 213 | RECURSION_DEPTH = 3 214 | 215 | def __init__(self, extractor, path, depth, tag=None): 216 | # Temporary directory 217 | self.temp = None 218 | 219 | # Recursion depth counter 220 | self.depth = depth 221 | 222 | # Reference to parent extractor object 223 | self.extractor = extractor 224 | 225 | # File path 226 | self.item = path 227 | 228 | # Database connection 229 | if self.extractor.database: 230 | import psycopg2 231 | self.database = psycopg2.connect(database="firmware", 232 | user="firmadyne", 233 | password="firmadyne", 234 | host=self.extractor.database) 235 | else: 236 | self.database = None 237 | 238 | # Checksum 239 | self.checksum = Extractor.io_md5(path) 240 | 241 | # Tag 242 | self.tag = tag if tag else self.generate_tag() 243 | 244 | # Output file path and filename prefix 245 | self.output = os.path.join(self.extractor.output_dir, self.tag) if \ 246 | self.extractor.output_dir else None 247 | 248 | # Status, with terminate indicating early termination for this item 249 | self.terminate = False 250 | self.status = None 251 | self.update_status() 252 | 253 | def __del__(self): 254 | if self.database: 255 | self.database.close() 256 | 257 | if self.temp: 258 | self.printf(">> Cleaning up %s..." % self.temp) 259 | Extractor.io_rm(self.temp) 260 | 261 | def printf(self, fmt): 262 | """ 263 | Prints output string with appropriate depth indentation. 264 | """ 265 | print(("\t" * self.depth + fmt)) 266 | 267 | def generate_tag(self): 268 | """ 269 | Generate the filename tag. 
270 | """ 271 | if not self.database: 272 | return os.path.basename(self.item) + "_" + self.checksum 273 | cur = None # bound up front so the 'finally' clause never sees an unbound name 274 | try: 275 | image_id = None 276 | cur = self.database.cursor() 277 | if self.extractor.brand: 278 | brand = self.extractor.brand 279 | else: 280 | brand = os.path.relpath(self.item).split(os.path.sep)[0] 281 | cur.execute("SELECT id FROM brand WHERE name=%s", (brand, )) 282 | brand_id = cur.fetchone() 283 | if not brand_id: 284 | cur.execute("INSERT INTO brand (name) VALUES (%s) RETURNING id", 285 | (brand, )) 286 | brand_id = cur.fetchone() 287 | if brand_id: 288 | cur.execute("SELECT id FROM image WHERE hash=%s", 289 | (self.checksum, )) 290 | image_id = cur.fetchone() 291 | if not image_id: 292 | cur.execute("INSERT INTO image (filename, brand_id, hash) \ 293 | VALUES (%s, %s, %s) RETURNING id", 294 | (os.path.basename(self.item), brand_id[0], 295 | self.checksum)) 296 | image_id = cur.fetchone() 297 | self.database.commit() 298 | except BaseException: 299 | traceback.print_exc() 300 | self.database.rollback() 301 | finally: 302 | if cur: 303 | cur.close() 304 | 305 | if image_id: 306 | self.printf(">> Database Image ID: %s" % image_id[0]) 307 | 308 | return str(image_id[0]) if \ 309 | image_id else os.path.basename(self.item) + "_" + self.checksum 310 | 311 | def get_kernel_status(self): 312 | """ 313 | Get the flag corresponding to the kernel status. 314 | """ 315 | return self.status[0] 316 | 317 | def get_rootfs_status(self): 318 | """ 319 | Get the flag corresponding to the root filesystem status. 320 | """ 321 | return self.status[1] 322 | 323 | def update_status(self): 324 | """ 325 | Updates the status flags using the tag to determine completion status. 
326 | """ 327 | kernel_done = os.path.isfile(self.get_kernel_path()) if \ 328 | self.extractor.do_kernel and self.output else \ 329 | not self.extractor.do_kernel 330 | rootfs_done = os.path.isfile(self.get_rootfs_path()) if \ 331 | self.extractor.do_rootfs and self.output else \ 332 | not self.extractor.do_rootfs 333 | self.status = (kernel_done, rootfs_done) 334 | 335 | if self.database and kernel_done and self.extractor.do_kernel: 336 | self.update_database("kernel_extracted", "True") 337 | 338 | if self.database and rootfs_done and self.extractor.do_rootfs: 339 | self.update_database("rootfs_extracted", "True") 340 | 341 | return self.get_status() 342 | 343 | def update_database(self, field, value): 344 | """ 345 | Update a given field (a trusted column name) in the database. 346 | """ 347 | ret, cur = True, None 348 | if self.database: 349 | try: 350 | cur = self.database.cursor() 351 | cur.execute("UPDATE image SET " + field + "=%s WHERE id=%s", 352 | (value, self.tag)) 353 | self.database.commit() 354 | except BaseException: 355 | ret = False 356 | traceback.print_exc() 357 | self.database.rollback() 358 | finally: 359 | if cur: 360 | cur.close() 361 | return ret 362 | 363 | def get_status(self): 364 | """ 365 | Returns True if early termination was signaled or extraction is complete, 366 | otherwise False. 367 | """ 368 | return bool(self.terminate or all(self.status)) 369 | 370 | def get_kernel_path(self): 371 | """ 372 | Return the full path (including filename) to the output kernel file. 373 | """ 374 | return self.output + ".kernel" if self.output else None 375 | 376 | def get_rootfs_path(self): 377 | """ 378 | Return the full path (including filename) to the output root filesystem 379 | file. 380 | """ 381 | return self.output + ".tar.gz" if self.output else None 382 | 383 | def extract(self): 384 | """ 385 | Perform the actual extraction of firmware updates, recursively. Returns 386 | True if extraction complete, otherwise False. 
387 | """ 388 | self.printf("\n" + self.item.encode("utf-8", "replace").decode("utf-8")) 389 | 390 | # check if item is complete 391 | if self.get_status(): 392 | self.printf(">> Skipping: completed!") 393 | return True 394 | 395 | # check if exceeding recursion depth 396 | if self.depth > ExtractionItem.RECURSION_DEPTH: 397 | self.printf(">> Skipping: recursion depth %d" % self.depth) 398 | return self.get_status() 399 | 400 | # check if checksum is in visited set 401 | self.printf(">> MD5: %s" % self.checksum) 402 | with Extractor.visited_lock: 403 | if self.checksum in self.extractor.visited: 404 | self.printf(">> Skipping: %s..." % self.checksum) 405 | return self.get_status() 406 | else: 407 | self.extractor.visited.add(self.checksum) 408 | 409 | # check if filetype is blacklisted 410 | if self._check_blacklist(): 411 | return self.get_status() 412 | 413 | # create working directory 414 | self.temp = tempfile.mkdtemp() 415 | 416 | try: 417 | self.printf(">> Tag: %s" % self.tag) 418 | self.printf(">> Temp: %s" % self.temp) 419 | self.printf(">> Status: Kernel: %s, Rootfs: %s, Do_Kernel: %s, \ 420 | Do_Rootfs: %s" % (self.get_kernel_status(), 421 | self.get_rootfs_status(), 422 | self.extractor.do_kernel, 423 | self.extractor.do_rootfs)) 424 | 425 | for analysis in [self._check_archive, self._check_encryption, self._check_firmware, 426 | self._check_kernel, self._check_rootfs, 427 | self._check_compressed]: 428 | # Move to temporary directory so binwalk does not write to input 429 | os.chdir(self.temp) 430 | 431 | # Update status only if analysis changed state 432 | if analysis(): 433 | if self.update_status(): 434 | self.printf(">> Skipping: completed!") 435 | return True 436 | 437 | except Exception: 438 | traceback.print_exc() 439 | 440 | return False 441 | 442 | def _check_blacklist(self): 443 | """ 444 | Check if this file is blacklisted for analysis based on file type. 
445 | """ 446 | # First, use MIME-type to exclude large categories of files 447 | filetype = Extractor.magic(self.item.encode("utf-8", "surrogateescape"), 448 | mime=True) 449 | if any(s in filetype for s in ["application/x-executable", 450 | "application/x-dosexec", 451 | "application/x-object", 452 | "application/pdf", 453 | "application/msword", 454 | "image/", "text/", "video/"]): 455 | self.printf(">> Skipping: %s..." % filetype) 456 | return True 457 | 458 | # Next, check for specific file types that have MIME-type 459 | # 'application/octet-stream' 460 | filetype = Extractor.magic(self.item.encode("utf-8", "surrogateescape")) 461 | if any(s in filetype for s in ["executable", "universal binary", 462 | "relocatable", "bytecode", "applet"]): 463 | self.printf(">> Skipping: %s..." % filetype) 464 | return True 465 | 466 | # Finally, check for specific file extensions that would be incorrectly 467 | # identified 468 | if self.item.endswith(".dmg"): 469 | self.printf(">> Skipping: %s..." % (self.item)) 470 | return True 471 | 472 | return False 473 | 474 | def _check_archive(self): 475 | """ 476 | If this file is an archive, recurse over its contents, unless it matches 477 | an extracted root filesystem. 478 | """ 479 | return self._check_recursive("archive") 480 | 481 | def _check_encryption(self): 482 | header = b"" 483 | with open(self.item, "rb") as f: 484 | header = f.read(4) 485 | 486 | if header == b"SHRS": 487 | print(">>>> Found D-Link encrypted firmware in %s!" 
% (self.item)) 488 | 489 | # Source: https://github.com/0xricksanchez/dlink-decrypt 490 | command = 'dd if=%s skip=1756 iflag=skip_bytes status=none | openssl aes-128-cbc -d -nopad -nosalt -K "c05fbf1936c99429ce2a0781f08d6ad8" -iv "67c6697351ff4aec29cdbaabf2fbe346" --nosalt -in /dev/stdin -out %s > /dev/null 2>&1' % (self.item, os.path.join(self.temp, "dlink_decrypt")) 491 | os.system(command) 492 | return True 493 | return False 494 | 495 | def _check_firmware(self): 496 | """ 497 | If this file is of a known firmware type, directly attempt to extract 498 | the kernel and root filesystem. 499 | """ 500 | for module in binwalk.scan(self.item, "-y", "header", "--run-as=root", "--preserve-symlinks", 501 | signature=True, quiet=True): 502 | for entry in module.results: 503 | # uImage 504 | if "uImage header" in entry.description: 505 | if not self.get_kernel_status() and \ 506 | "OS Kernel Image" in entry.description: 507 | kernel_offset = entry.offset + 64 508 | kernel_size = 0 509 | 510 | for stmt in entry.description.split(','): 511 | if "image size:" in stmt: 512 | kernel_size = int(''.join( 513 | i for i in stmt if i.isdigit()), 10) 514 | 515 | if kernel_size != 0 and kernel_offset + kernel_size \ 516 | <= os.path.getsize(self.item): 517 | self.printf(">>>> %s" % entry.description) 518 | 519 | tmp_fd, tmp_path = tempfile.mkstemp(dir=self.temp) 520 | os.close(tmp_fd) 521 | Extractor.io_dd(self.item, kernel_offset, 522 | kernel_size, tmp_path) 523 | kernel = ExtractionItem(self.extractor, tmp_path, 524 | self.depth, self.tag) 525 | 526 | return kernel.extract() 527 | # elif "RAMDisk Image" in entry.description: 528 | # self.printf(">>>> %s" % entry.description) 529 | # self.printf(">>>> Skipping: RAMDisk / initrd") 530 | # self.terminate = True 531 | # return True 532 | 533 | # TP-Link or TRX 534 | elif not self.get_kernel_status() and \ 535 | not self.get_rootfs_status() and \ 536 | "rootfs offset: " in entry.description and \ 537 | "kernel offset: " in 
entry.description: 538 | kernel_offset = 0 539 | kernel_size = 0 540 | rootfs_offset = 0 541 | rootfs_size = 0 542 | 543 | for stmt in entry.description.split(','): 544 | if "kernel offset:" in stmt: 545 | kernel_offset = int(stmt.split(':')[1], 16) 546 | elif "kernel length:" in stmt: 547 | kernel_size = int(stmt.split(':')[1], 16) 548 | elif "rootfs offset:" in stmt: 549 | rootfs_offset = int(stmt.split(':')[1], 16) 550 | elif "rootfs length:" in stmt: 551 | rootfs_size = int(stmt.split(':')[1], 16) 552 | 553 | # compute sizes if only offsets provided 554 | if kernel_offset != rootfs_offset and kernel_size == 0 and \ 555 | rootfs_size == 0: 556 | kernel_size = rootfs_offset - kernel_offset 557 | rootfs_size = os.path.getsize(self.item) - rootfs_offset 558 | 559 | # ensure that computed values are sensible 560 | if (kernel_size > 0 and kernel_offset + kernel_size \ 561 | <= os.path.getsize(self.item)) and \ 562 | (rootfs_size > 0 and rootfs_offset + rootfs_size \ 563 | <= os.path.getsize(self.item)): 564 | self.printf(">>>> %s" % entry.description) 565 | 566 | tmp_fd, tmp_path = tempfile.mkstemp(dir=self.temp) 567 | os.close(tmp_fd) 568 | Extractor.io_dd(self.item, kernel_offset, kernel_size, 569 | tmp_path) 570 | kernel = ExtractionItem(self.extractor, tmp_path, 571 | self.depth, self.tag) 572 | kernel.extract() 573 | 574 | tmp_fd, tmp_path = tempfile.mkstemp(dir=self.temp) 575 | os.close(tmp_fd) 576 | Extractor.io_dd(self.item, rootfs_offset, rootfs_size, 577 | tmp_path) 578 | rootfs = ExtractionItem(self.extractor, tmp_path, 579 | self.depth, self.tag) 580 | rootfs.extract() 581 | 582 | return self.update_status() 583 | return False 584 | 585 | def _check_kernel(self): 586 | """ 587 | If this file contains a kernel version string, assume it is a kernel. 588 | Only Linux kernels are currently extracted. 
589 | """ 590 | if not self.get_kernel_status(): 591 | for module in binwalk.scan(self.item, "-y", "kernel", "--run-as=root", "--preserve-symlinks", 592 | signature=True, quiet=True): 593 | for entry in module.results: 594 | if "kernel version" in entry.description: 595 | self.update_database("kernel_version", 596 | entry.description) 597 | if "Linux" in entry.description: 598 | if self.get_kernel_path(): 599 | shutil.copy(self.item, self.get_kernel_path()) 600 | else: 601 | self.extractor.do_kernel = False 602 | self.printf(">>>> %s" % entry.description) 603 | return True 604 | # VxWorks, etc 605 | else: 606 | self.printf(">>>> Ignoring: %s" % entry.description) 607 | return False 608 | return False 609 | return False 610 | 611 | def _check_rootfs(self): 612 | """ 613 | If this file contains a known filesystem type, extract it. 614 | """ 615 | 616 | if not self.get_rootfs_status(): 617 | # work-around issue with binwalk signature definitions for ubi 618 | for module in binwalk.scan(self.item, "-e", "-r", "-y", 619 | "filesystem", "-y", "ubi", "--run-as=root", "--preserve-symlinks", 620 | signature=True, quiet=True): 621 | for entry in module.results: 622 | self.printf(">>>> %s" % entry.description) 623 | break 624 | 625 | if module.extractor.directory: 626 | unix = Extractor.io_find_rootfs(module.extractor.directory) 627 | 628 | if not unix[0]: 629 | return False 630 | 631 | self.printf(">>>> Found Linux filesystem in %s!" % unix[1]) 632 | if self.output: 633 | shutil.make_archive(self.output, "gztar", 634 | root_dir=unix[1]) 635 | else: 636 | self.extractor.do_rootfs = False 637 | return True 638 | return False 639 | 640 | def _check_compressed(self): 641 | """ 642 | If this file appears to be compressed, decompress it and recurse over 643 | its contents. 644 | """ 645 | return self._check_recursive("compressed") 646 | 647 | # treat both archived and compressed files using the same pathway. this is 648 | # because certain files may appear as e.g. 
"xz compressed data" but still 649 | # extract into a root filesystem. 650 | def _check_recursive(self, fmt): 651 | """ 652 | Unified implementation for checking both "archive" and "compressed" 653 | items. 654 | """ 655 | desc = None 656 | # perform extraction 657 | for module in binwalk.scan(self.item, "-e", "-r", "-y", fmt, "--run-as=root", "--preserve-symlinks", 658 | signature=True, quiet=True): 659 | for entry in module.results: 660 | # skip cpio/initrd files since they should be included with 661 | # kernel 662 | # if "cpio archive" in entry.description: 663 | # self.printf(">> Skipping: cpio: %s" % entry.description) 664 | # self.terminate = True 665 | # return True 666 | desc = entry.description 667 | self.printf(">>>> %s" % entry.description) 668 | break 669 | 670 | if module.extractor.directory: 671 | unix = Extractor.io_find_rootfs(module.extractor.directory) 672 | 673 | # check for extracted filesystem, otherwise update queue 674 | if unix[0]: 675 | self.printf(">>>> Found Linux filesystem in %s!" % unix[1]) 676 | if self.output: 677 | shutil.make_archive(self.output, "gztar", 678 | root_dir=unix[1]) 679 | else: 680 | self.extractor.do_rootfs = False 681 | return True 682 | else: 683 | count = 0 684 | self.printf(">> Recursing into %s ..." 
% fmt) 685 | for root, _, files in os.walk(module.extractor.directory): 686 | # sort alphabetically, then (stably) by increasing 687 | # name length 688 | files.sort() 689 | files.sort(key=len) 690 | 691 | # handle case where original file name is restored; put 692 | # it at the front of the queue 693 | if desc and "original file name:" in desc: 694 | orig = None 695 | for stmt in desc.split(","): 696 | if "original file name:" in stmt: 697 | orig = stmt.split("\"")[1] 698 | if orig and orig in files: 699 | files.remove(orig) 700 | files.insert(0, orig) 701 | 702 | for filename in files: 703 | if count > ExtractionItem.RECURSION_BREADTH: 704 | self.printf(">> Skipping: recursion breadth %d"\ 705 | % ExtractionItem.RECURSION_BREADTH) 706 | self.terminate = True 707 | return True 708 | else: 709 | new_item = ExtractionItem(self.extractor, 710 | os.path.join(root, 711 | filename), 712 | self.depth + 1, 713 | self.tag) 714 | if new_item.extract(): 715 | # check that we are actually done before 716 | # performing early termination. 
for example, 717 | # we might decide to skip on one subitem, 718 | # but we still haven't finished 719 | if self.update_status(): 720 | return True 721 | count += 1 722 | return False 723 | 724 | def main(): 725 | parser = argparse.ArgumentParser(description="Extracts filesystem and \ 726 | kernel from Linux-based firmware images") 727 | parser.add_argument("input", action="store", help="Input file or directory") 728 | parser.add_argument("output", action="store", nargs="?", default="images", 729 | help="Output directory for extracted firmware") 730 | parser.add_argument("-sql", dest="sql", action="store", default=None, 731 | help="Hostname of SQL server") 732 | parser.add_argument("-nf", dest="rootfs", action="store_false", 733 | default=True, help="Disable extraction of root \ 734 | filesystem (may decrease extraction time)") 735 | parser.add_argument("-nk", dest="kernel", action="store_false", 736 | default=True, help="Disable extraction of kernel \ 737 | (may decrease extraction time)") 738 | parser.add_argument("-np", dest="parallel", action="store_false", 739 | default=True, help="Disable parallel operation \ 740 | (may increase extraction time)") 741 | parser.add_argument("-b", dest="brand", action="store", default=None, 742 | help="Brand of the firmware image") 743 | result = parser.parse_args() 744 | 745 | extract = Extractor(result.input, result.output, result.rootfs, 746 | result.kernel, result.parallel, result.sql, 747 | result.brand) 748 | extract.extract() 749 | 750 | if __name__ == "__main__": 751 | main() 752 | --------------------------------------------------------------------------------
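
The root-filesystem heuristic described in the README ("sufficient UNIX-like root directories") can be tried without `binwalk` installed. The following is a minimal standalone sketch, not the extractor's own code: `looks_like_rootfs` and `demo` are hypothetical helpers introduced here for illustration. They reuse the `UNIX_DIRS` list and `UNIX_THRESHOLD` value from `extractor.py`, and they leave out the single-directory-chain descent and the one-level recursion that `Extractor.io_find_rootfs` performs.

```python
import os
import tempfile

# Values copied from Extractor.UNIX_DIRS / Extractor.UNIX_THRESHOLD in extractor.py
UNIX_DIRS = ["bin", "etc", "dev", "home", "lib", "mnt", "opt", "root",
             "run", "sbin", "tmp", "usr", "var"]
UNIX_THRESHOLD = 4


def looks_like_rootfs(path):
    """Hypothetical helper: True if `path` has enough UNIX-like subdirectories."""
    count = sum(1 for name in os.listdir(path)
                if name in UNIX_DIRS and os.path.isdir(os.path.join(path, name)))
    return count >= UNIX_THRESHOLD


def demo():
    """Build two throwaway directory trees and classify each of them."""
    with tempfile.TemporaryDirectory() as rootfs, \
            tempfile.TemporaryDirectory() as other:
        # Four recognised directories: meets the threshold
        for name in ("bin", "etc", "lib", "usr"):
            os.mkdir(os.path.join(rootfs, name))
        # Only one recognised directory: below the threshold
        for name in ("bin", "docs"):
            os.mkdir(os.path.join(other, name))
        return looks_like_rootfs(rootfs), looks_like_rootfs(other)


if __name__ == "__main__":
    print(demo())
```

Under these assumptions, a tree needs at least four of the listed directory names directly beneath it to count as a root filesystem, which is why a partial extraction containing only `bin` would be passed over.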