├── .gitignore
├── .gitmodules
├── LICENSE.txt
├── README.md
├── analyzers
    ├── __init__.py
    ├── aws_finder.py
    ├── bugsniffer.py
    ├── elf_files.py
    ├── private_keys.py
    ├── silverpush.py
    ├── so_census.py
    └── utils.py
├── apkminer.py
├── requirements.txt
├── setup.py
└── test
    ├── example_decomp.py
    ├── test_parse.py
    ├── test_silver.py
    └── testbug.py

/.gitignore:
--------------------------------------------------------------------------------
1 | apks/*
2 | output/*
3 | *.pyc
4 |
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "androguard"]
2 |     path = androguard
3 |     url = https://github.com/androguard/androguard.git
4 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Copyright 2017 W. Parker Thompson
2 |
3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
4 |
5 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
6 |
7 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
8 |
9 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
10 |
11 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # apkminer
2 |
3 | A simple program for mining through APKs at high speed. It takes a modular approach, calling a specific analyzer on each APK provided.
4 |
5 | ## Setup
6 |
7 | ```bash
8 | git submodule init
9 | git submodule update
10 | ```
11 |
12 | Standard CPython works fine, but I highly recommend PyPy; I have seen 70% faster runs with it.
13 |
14 | ## Usage
15 |
16 | ```
17 | usage: apkminer.py [-h] [-i IN_DIR] [-o LOG_FILE] [-c CORES] [-a ANALYZER]
18 |                    [-l]
19 |
20 | analyzer of APKs
21 |
22 | optional arguments:
23 |   -h, --help            show this help message and exit
24 |   -i IN_DIR, --in_dir IN_DIR
25 |                         directory of apk files to analyze
26 |   -o LOG_FILE, --log_file LOG_FILE
27 |                         log file to write to
28 |   -c CORES, --cores CORES
29 |                         force a number of cores to use
30 |   -a ANALYZER, --analyzer ANALYZER
31 |                         Select the analyzer you want to use.
32 |   -l, --list_analyzers  List the possible analyzers
33 | ```
34 |
35 | ## Analyzers
36 |
37 | ```
38 | private_keys - Find private keys in files or dex strings
39 | elf_files    - Report string data from specific sections of elf files
40 | aws_finder   - Find AWS key pairs in files and dex strings
41 | so_census    - Report on data about .so's in APKs
42 | silverpush   - Finds apks that contain the silverpush library
43 | ```
44 |
45 | ## Dependencies
46 |
47 | - pyelftools
48 |
49 |
50 | ## Writing an analyzer
51 |
52 | Below I lay out the steps for writing an analyzer and the components of apkminer that an analyzer developer should understand.
53 |
54 | ### Analyzer template
55 |
56 | ```python
57 | # import the utils.py file for helper functions and the Logger object
58 | from utils import *
59 |
60 | # Define the analyze() function. The function name must be the same in every analyzer
61 | # because apkminer searches for this function name.
62 | def analyze(args, apk_queue, res_queue, output_data):
63 |     # The Logger class uses a multiprocessing Queue to perform atomic writes to the defined log file;
64 |     # this is helpful for debugging and for logging any errors that might occur during the run.
65 |     log = Logger(args.log_file, res_queue)
66 |
67 |     # Continually check the input 'apk_queue' for new file names
68 |     while True:
69 |         # break the loop if the queue is empty
70 |         if apk_queue.empty():
71 |             return
72 |         else:
73 |             # fetch the file off the queue
74 |             apk_file = apk_queue.get()
75 |
76 |         # Logging works similarly to stdout / stderr:
77 |         # the log() function writes to an internal buffer (newline delimited),
78 |         # then flush() pushes the data to the actual logging process.
79 |         log.log(apk_file)
80 |         log.flush()
81 |
82 |         # write analyzer here.
83 | ```
84 |
85 | To register an analyzer inside of apkminer, save the analyzer as a .py file in the analyzers/ directory and then edit analyzers/__init__.py to include the name of your analyzer.
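
How a registered analyzer gets picked up can be sketched as follows. This is a minimal, self-contained illustration, not the project's code: `discover_analyzers`, `fake_pkg` and `helper` are hypothetical names, and the fake package stands in for the real `analyzers` package so the snippet runs without androguard. It mirrors the check in `main()` of apkminer.py, which keeps every public attribute of `analyzers` that has an `analyze` attribute.

```python
import types


def discover_analyzers(package):
    """Return {name: module} for each public attribute of `package` that has analyze()."""
    found = {}
    for name in dir(package):
        # skip private and dunder attributes, as apkminer.py does
        if name.startswith('_'):
            continue
        obj = getattr(package, name)
        if hasattr(obj, "analyze"):
            found[name] = obj
    return found


# Hypothetical stand-in for the real `analyzers` package.
fake_pkg = types.ModuleType("analyzers")

# A module that defines analyze() with the signature the template uses.
test_analyzer = types.ModuleType("analyzers.test_analyzer")
test_analyzer.analyze = lambda args, apk_queue, res_queue, output_data: None
fake_pkg.test_analyzer = test_analyzer

# A module without analyze(); it is ignored by the discovery step.
helper = types.ModuleType("analyzers.helper")
fake_pkg.helper = helper

registry = discover_analyzers(fake_pkg)
print(sorted(registry))
```

Under this scheme, a module only shows up if it is an attribute of the package, which is what the import line in `analyzers/__init__.py` provides.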
86 |
87 | For example:
88 |
89 | ```
90 | analyzers/test_analyzer.py
91 | ```
92 |
93 | Then add "test_analyzer" to the import line in __init__.py.
94 |
95 | Check out aws_finder.py or the other analyzers for examples. Also spend some time looking at the helper functions in utils.py.
96 |
97 | ### Optional features for analyzers
98 |
99 | To enable structured output that is separate from the log file, an analyzer writer can define two other functions in their .py file:
100 |
101 | 1. output_results - Used for bulk writes after completion of all input APKs.
102 | 2. stream_results - Used for streaming results as they are generated.
103 |
104 | ### output_results example
105 |
106 | ```python
107 | import pickle
108 |
109 | def output_results(output_data):
110 |     fd = open("output.pick", "wb")
111 |     pickle.dump(output_data, fd)
112 |     fd.close()
113 | ```
114 |
115 | ### stream_results example
116 |
117 | ```python
118 | import csv
119 | import Queue
120 |
121 | def stream_results(output_queue, end_event):
122 |     csv_fd = open('test.csv', 'wb')
123 |     datawriter = csv.writer(csv_fd)
124 |
125 |     while not end_event.is_set():
126 |         try:
127 |             data = output_queue.get(True, 1)
128 |             datawriter.writerow(data)
129 |
130 |         except Queue.Empty:
131 |             continue
132 | ```
133 |
--------------------------------------------------------------------------------
/analyzers/__init__.py:
--------------------------------------------------------------------------------
1 | from analyzers import elf_files, private_keys, silverpush, aws_finder, so_census, bugsniffer
--------------------------------------------------------------------------------
/analyzers/aws_finder.py:
--------------------------------------------------------------------------------
1 | from utils import *
2 |
3 | def analyze(args, apk_queue, res_queue, output_data):
4 |     log = Logger(args.log_file, res_queue)
5 |     while True:
6 |         if apk_queue.empty():
7 |             return
8 |         else:
9 |             apk_file = apk_queue.get()
10 | 
file_path = args.in_dir + "/" + apk_file 11 | log.log("Checking: %s\n" % file_path) 12 | 13 | try: 14 | a = apk.APK(file_path) 15 | except: 16 | log.log("ERROR parsing apk\n") 17 | log.flush() 18 | continue 19 | 20 | found_aws = False 21 | main_act = a.get_main_activity() 22 | if not main_act: 23 | log.log("NO ACTIVITY: %s" % file_path) 24 | # fall back to just the apk file name 25 | main_act = apk_file 26 | 27 | # try and skip any com.amazon.* apps 28 | if re.search(".com.amazon.", main_act): 29 | log.log("skipping: %s\n" % main_act) 30 | log.flush() 31 | continue 32 | 33 | d = dvm.DalvikVMFormat(a.get_dex()) 34 | 35 | for current_class in d.get_classes(): 36 | # log.log(current_class.get_name()) 37 | if re.search(".amazon.", current_class.get_name(), re.IGNORECASE): 38 | found_aws = True 39 | break 40 | 41 | if found_aws: 42 | assets = get_asset_files(a) 43 | found = regex_apk_files(a, assets, AWS_KEY_C) 44 | 45 | log.log("asset KEYS:") 46 | for data, file in found: 47 | log.log(" %s: %s" % (file, data)) 48 | 49 | 50 | d = dvm.DalvikVMFormat(a.get_dex()) 51 | dx = analysis.newVMAnalysis( d ) 52 | d.set_vmanalysis( dx ) 53 | dx.create_xref() 54 | 55 | 56 | fp_detect = FPDetect(a,d) 57 | 58 | log.log("\nDalvik keys:") 59 | for str_val, ref_obj in dx.get_strings_analysis().iteritems(): 60 | found_key = re.findall(AWS_KEY_C, str_val) 61 | 62 | for res in found_key: 63 | #bail out on FP hits 64 | if fp_detect.is_sec_fp(res): 65 | continue 66 | 67 | log.log(" %s" % res) 68 | for ref_class, ref_method in ref_obj.get_xref_from(): 69 | if fp_detect.is_xref_fp(ref_method.get_class_name()): 70 | continue 71 | 72 | log.log(" REF: %s->%s%s" % (ref_method.get_class_name(), 73 | ref_method.get_name(), 74 | ref_method.get_descriptor())) 75 | 76 | log.log("\n") 77 | log.flush() 78 | -------------------------------------------------------------------------------- /analyzers/bugsniffer.py: -------------------------------------------------------------------------------- 1 | from 
utils import * 2 | 3 | def analyze(args, apk_queue, res_queue, output_data): 4 | log = Logger(args.log_file, res_queue) 5 | while True: 6 | if apk_queue.empty(): 7 | return 8 | else: 9 | apk_file = apk_queue.get() 10 | 11 | file_path = args.in_dir + "/" + apk_file 12 | log.log("Checking: %s\n" % file_path) 13 | 14 | try: 15 | a = apk.APK(file_path) 16 | except: 17 | log.log("ERROR parsing apk\n") 18 | log.flush() 19 | continue 20 | log.log("Parsed APK") 21 | 22 | d = dvm.DalvikVMFormat(a.get_dex()) 23 | log.log("Parsed Dalvik") 24 | 25 | dx = analysis.newVMAnalysis(d) 26 | d.set_vmanalysis(dx) 27 | dx.create_xref() 28 | log.log("Completed VM analysis") 29 | 30 | # Check for WebView 31 | find_call(dx, log, "Landroid/webkit/WebView;", "addJavascriptInterface") 32 | 33 | # # Check for Runtime.exec() 34 | find_call(dx, log, "Ljava/lang/Runtime;", "exec") 35 | 36 | # Check for pinning 37 | find_implements(dx, log, "Ljavax/net/ssl/X509TrustManager;") 38 | 39 | find_methods(dx, log, ["checkServerTrusted"]) 40 | 41 | log.log("\n") 42 | log.flush() 43 | -------------------------------------------------------------------------------- /analyzers/elf_files.py: -------------------------------------------------------------------------------- 1 | from elftools.elf.elffile import ELFFile 2 | 3 | from utils import * 4 | 5 | def analyze(args, apk_queue, res_queue, output_data): 6 | log = Logger(args.log_file, res_queue) 7 | while True: 8 | if apk_queue.empty(): 9 | return 10 | else: 11 | apk_file = apk_queue.get() 12 | 13 | file_path = args.in_dir + "/" + apk_file 14 | log.log("Checking: %s\n" % file_path) 15 | a = apk.APK(file_path) 16 | 17 | so_files = [] 18 | for file in a.get_files(): 19 | exten = file.split('.')[-1] 20 | if exten == "so": 21 | so_files.append(file) 22 | 23 | if len(so_files) == 0: 24 | continue 25 | 26 | log.log("Found %d .so files" % len(so_files)) 27 | 28 | for so_file in so_files: 29 | elf_data = a.get_file(so_file) 30 | elf_stream = 
cStringIO.StringIO(elf_data) 31 | try: 32 | elf = ELFFile(elf_stream) 33 | except: 34 | log.log("ERROR: bad elf file") 35 | log.flush() 36 | continue 37 | 38 | log.log(" File: %s" % so_file) 39 | log.log(" Elf sections:") 40 | for section in elf.iter_sections(): 41 | log.log("\t%s" % section.name) 42 | if section.name == ".comment" or section.name == ".conststring": 43 | log.log("\t\t%s" % section.data().replace("\x00", "\n")) 44 | 45 | log.log("\n\n") 46 | log.flush() 47 | 48 | -------------------------------------------------------------------------------- /analyzers/private_keys.py: -------------------------------------------------------------------------------- 1 | from utils import * 2 | 3 | def analyze(args, apk_queue, res_queue, output_data): 4 | log = Logger(args.log_file, res_queue) 5 | while True: 6 | if apk_queue.empty(): 7 | return 8 | else: 9 | apk_file = apk_queue.get() 10 | file_path = args.in_dir + "/" + apk_file 11 | log.log("Checking: %s\n" % file_path) 12 | 13 | try: 14 | a = apk.APK(file_path) 15 | except Exception as err: 16 | log.log("ERROR parsing apk: %s\n" % err) 17 | log.flush() 18 | continue 19 | 20 | PRIV_KEY_PAT = ".PRIVATE KEY-----" 21 | dex_files = [] 22 | for file in a.get_files(): 23 | file_data = a.get_file(file) 24 | if re.search(PRIV_KEY_PAT, file_data): 25 | log.log(" FOUND %s" % file) 26 | if file[-4:] == ".dex": 27 | dex_files.append(file) 28 | #log.log(" FOUND %s:\n%s" % (file.decode('utf-8', 'ignore'), file_data.decode('utf-8', 'ignore'))) 29 | 30 | for dex_file in dex_files: 31 | d = dvm.DalvikVMFormat(a.get_file(dex_file)) 32 | dx = analysis.newVMAnalysis( d ) 33 | d.set_vmanalysis( dx ) 34 | dx.create_xref() 35 | 36 | for str_val, ref_obj in dx.get_strings_analysis().iteritems(): 37 | found_key = re.findall(PRIV_KEY_PAT, str_val) 38 | 39 | for res in found_key: 40 | log.log(" %s" % str_val) 41 | for ref_class, ref_method in ref_obj.get_xref_from(): 42 | log.log(" REF: %s->%s%s" % (ref_method.get_class_name(), 43 | 
ref_method.get_name(), 44 | ref_method.get_descriptor())) 45 | 46 | log.log("\n\n") 47 | log.flush() 48 | -------------------------------------------------------------------------------- /analyzers/silverpush.py: -------------------------------------------------------------------------------- 1 | from utils import * 2 | 3 | def analyze(args, apk_queue, res_queue, output_data): 4 | log = Logger(args.log_file, res_queue) 5 | while True: 6 | if apk_queue.empty(): 7 | return 8 | else: 9 | apk_file = apk_queue.get() 10 | 11 | file_path = args.in_dir + "/" + apk_file 12 | log.log("Checking: %s\n" % file_path) 13 | 14 | try: 15 | a = apk.APK(file_path) 16 | except: 17 | log.log("ERROR parsing apk\n") 18 | log.flush() 19 | continue 20 | 21 | record_perm = "android.permission.RECORD_AUDIO" in a.get_permissions() 22 | 23 | try: 24 | if "com.silverpush.sdk.android.SPService" in a.get_services() or "com.silverpush.sdk.android.BR_CallState" in a.get_receivers(): 25 | log.log("found silverpush, can record: %s" % str(record_perm)) 26 | log.flush() 27 | continue 28 | except: 29 | log.log("BAD APK DATA: %s" % apk_file) 30 | 31 | log.log("\n") 32 | log.flush() 33 | -------------------------------------------------------------------------------- /analyzers/so_census.py: -------------------------------------------------------------------------------- 1 | import hashlib 2 | 3 | from utils import * 4 | 5 | try: 6 | import cPickle as pickle 7 | except: 8 | import pickle 9 | 10 | def get_base_name(elf_file): 11 | if elf_file.find("/") != -1: 12 | return elf_file.split("/")[-1] 13 | else: 14 | return elf_file 15 | 16 | 17 | def output_results(output_data): 18 | final_data = {} 19 | for element in output_data: 20 | cur_name = element["name"] 21 | if cur_name in final_data: 22 | final_data[cur_name]["count"] += 1 23 | final_data[cur_name]["sha_hashes"].append(element["sha_hash"]) 24 | final_data[cur_name]["apk_files"].append(element["apk_file"]) 25 | else: 26 | final_data[cur_name] = {} 27 | 
final_data[cur_name]["count"] = 1 28 | final_data[cur_name]["sha_hashes"] = [element["sha_hash"]] 29 | final_data[cur_name]["apk_files"] = [element["apk_file"]] 30 | 31 | fd = open("output.pick", "wb") 32 | pickle.dump(final_data, fd ) 33 | fd.close() 34 | 35 | # for name, obj in final_data.iteritems(): 36 | # print " %s, %d, %s, %s" % (name, obj["count"], str(obj["sha_hashes"]), str(obj["apk_files"])) 37 | 38 | def analyze(args, apk_queue, res_queue, output_data): 39 | log = Logger(args.log_file, res_queue) 40 | while True: 41 | if apk_queue.empty(): 42 | return 43 | else: 44 | apk_file = apk_queue.get() 45 | 46 | file_path = args.in_dir + "/" + apk_file 47 | log.log("Checking: %s\n" % file_path) 48 | 49 | try: 50 | a = apk.APK(file_path) 51 | except: 52 | log.log("ERROR parsing apk\n") 53 | log.flush() 54 | continue 55 | 56 | so_files = [] 57 | for file in a.get_files(): 58 | exten = file.split('.')[-1] 59 | if exten == "so": 60 | so_files.append(file) 61 | 62 | if len(so_files) == 0: 63 | continue 64 | 65 | log.log("Found %d .so files" % len(so_files)) 66 | 67 | for elf_file in so_files: 68 | cur_hash = hashlib.sha1(a.get_file(elf_file)).hexdigest() 69 | 70 | base_name = get_base_name(elf_file) 71 | 72 | output_data.put({"name": base_name, "sha_hash": cur_hash, "apk_file": apk_file}) 73 | 74 | log.log("\n\n") 75 | log.flush() 76 | -------------------------------------------------------------------------------- /analyzers/utils.py: -------------------------------------------------------------------------------- 1 | from androguard.core.bytecodes import apk 2 | from androguard.core.bytecodes import dvm 3 | from androguard.core.analysis import analysis 4 | 5 | import re 6 | 7 | BLACKLIST_FILETYPES = [ 8 | "jpg", 9 | "jet", 10 | "css", 11 | "js", 12 | "ttf", 13 | "fbstr", 14 | "svg", 15 | "png", 16 | "otf", 17 | "mp3"] 18 | 19 | 20 | AWS_ID_PAT = "(? 
p cur_class.orig_class.get_interfaces() 73 | # ['Ljavax/net/ssl/X509TrustManager;'] 74 | 75 | # log.log("%s implements: %s" % (name, class_name)) 76 | # if name == "Lo/md;": 77 | # import ipdb; ipdb.set_trace(); 78 | 79 | def find_methods(dx, log, methods): 80 | for name, cur_class in dx.classes.items(): 81 | for method in cur_class.get_methods(): 82 | if method.method.name in methods: 83 | log.log("%s implements: %s" % (name, method.method.name)) 84 | 85 | 86 | 87 | def is_blacklist_filetype(file): 88 | exten = file.split('.')[-1] 89 | for b_extenion in BLACKLIST_FILETYPES: 90 | if exten == b_extenion: 91 | return True 92 | return False 93 | 94 | def get_asset_files(a): 95 | started = False 96 | # break once we are at the end of assets, get_files is alphabetical. 97 | ret = [] 98 | for file in a.get_files(): 99 | if file[:7] == 'assets/': 100 | started = True 101 | if not is_blacklist_filetype(file): 102 | ret.append(file) 103 | 104 | elif started == True: 105 | break 106 | return ret 107 | 108 | def regex_apk_files(a, files, pat): 109 | results = [] 110 | for file in files: 111 | data = a.get_file(file) 112 | found = re.findall(pat, data) 113 | for find in found: 114 | results.append([find, file]) 115 | return results 116 | 117 | class FPDetect(): 118 | def __init__(self, a, d): 119 | self.a = a 120 | self.d = d 121 | # create blob of data to regex, kinda a hack but faster than multiple calls to re.* 122 | self.classes_str = str(self.d.get_classes_names()) 123 | 124 | def _find_in_classes(self, data): 125 | if self.classes_str.find(data) != -1: 126 | return True 127 | 128 | # fall back to case insensitive search. 129 | if re.search(".%s." 
% data, self.classes_str, re.IGNORECASE): 130 | return True 131 | return False 132 | 133 | def is_sec_fp(self, data): 134 | if data[:5] == "/com/": 135 | return True 136 | elif data[:9] == "Landroid/": 137 | return True 138 | elif data[:5] == "Lcom/": 139 | return True 140 | elif data[:6] == "Ljava/": 141 | return True 142 | elif data == "ABCDEFGHJKLMNPQRSTXY": # placeholder AWS_ID 143 | return True 144 | elif data == "DROPPEDSESSIONLENGTH": 145 | return True 146 | elif data == "LAUNCHESAFTERUPGRADE": 147 | return True 148 | elif data == "COMPROMISEDLIBRARIES": 149 | return True 150 | elif data == "========================================": # derp 151 | return True 152 | elif data == "3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f": # MIME boundry 153 | return True 154 | elif data == "5e8f16062ea3cd2c4a0d547876baa6f38cabf625": # FB hash 155 | return True 156 | elif data == "8a3c4b262d721acd49a4bf97d5213199c86fa2b9": # FB hash 157 | return True 158 | elif data == "a4b7452e2ed8f5f191058ca7bbfd26b0d3214bfc": # FB hash 159 | return True 160 | elif data == "bca6990fc3c15a8105800c0673517a4b579634a1": # X-CRASHLYTICS-DEVELOPER-TOKEN 161 | return True 162 | elif data == "registerOnSharedPreferenceChangeListener": # nfc why this is not found 163 | return True 164 | elif data == "setJavaScriptCanOpenWindowsAutomatically": 165 | return True 166 | elif data == "startAppWidgetConfigureActivityForResult": 167 | return True 168 | elif self._find_in_classes(data): # is this string a method 169 | return True 170 | elif self.d.get_method(data): # is this string a class 171 | return True 172 | else: 173 | return False 174 | 175 | def is_xref_fp(self, data): 176 | if data[:14] == "Lmono/android/": 177 | return True 178 | elif data[:25] == "Lcom/twitter/sdk/android/": 179 | return True 180 | elif data[:12] == "Lcom/amazon/": 181 | return True 182 | elif data[:20] == "Lcom/google/android/": 183 | return True 184 | elif data[:18] == "Lorg/spongycastle/": 185 | return True 186 | else: 187 | 
return False 188 | 189 | -------------------------------------------------------------------------------- /apkminer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import sys 4 | import time 5 | import types 6 | import pprint 7 | import signal 8 | import logging 9 | import argparse 10 | import traceback 11 | import cStringIO 12 | 13 | import Queue # just for exceptions 14 | import multiprocessing as mp 15 | 16 | import os 17 | from os import listdir 18 | from os.path import isfile, join 19 | 20 | from androguard.core.bytecodes import apk 21 | from androguard.core.bytecodes import dvm 22 | from androguard.core.analysis import analysis 23 | 24 | import analyzers 25 | 26 | def get_files_in_dir(dir_path): 27 | return [f for f in listdir(dir_path) if isfile(join(dir_path, f))] 28 | 29 | def logger_runner(log_file, res_queue, end_event): 30 | print "started logger" 31 | fd = open(log_file, "a") 32 | while not end_event.is_set(): 33 | try: 34 | log_data = res_queue.get(True, 1) 35 | except Queue.Empty: 36 | continue 37 | 38 | fd.write(log_data) 39 | fd.flush() 40 | 41 | fd.close() 42 | 43 | def output_processor(output_queue, end_event, analyzer_out_func): 44 | data = [] 45 | while not end_event.is_set(): 46 | try: 47 | data.append(output_queue.get(True, 1)) 48 | except Queue.Empty: 49 | continue 50 | analyzer_out_func(data) 51 | 52 | def runner(func, args, queue, res_queue, output_data): 53 | try: 54 | func(args, queue, res_queue, output_data) 55 | except: 56 | raise Exception("".join(traceback.format_exception(*sys.exc_info()))) 57 | 58 | def init_worker(): 59 | signal.signal(signal.SIGINT, signal.SIG_IGN) 60 | 61 | def main(): 62 | parser = argparse.ArgumentParser(description='analyzer of APKs') 63 | parser.add_argument("-i", "--in_dir", type=str, 64 | help="directory of apk files to analyze", default=None) 65 | parser.add_argument("-o", "--log_file", type=str, 66 | help="log file to write to", 
default="OUTPUT.log") 67 | parser.add_argument("-c", "--cores", type=int, 68 | help="force a number of cores to use") 69 | parser.add_argument("-a", "--analyzer", type=str, 70 | help="Select the analyzer you want to use.", default="elf_files") 71 | parser.add_argument("-l", "--list_analyzers", action="store_true", 72 | help="List the possible analyzers") 73 | 74 | args = parser.parse_args() 75 | 76 | publics = (name for name in dir(analyzers) if not name.startswith('_')) 77 | 78 | # dynamically get all analyzers in the directory 79 | analyzer_funcs = {} 80 | selected_output_func = None 81 | selected_stream_func = None 82 | 83 | for name in publics: 84 | obj = getattr(analyzers, name) 85 | if hasattr(obj, "analyze"): 86 | analyzer_funcs[name] = obj 87 | 88 | if args.list_analyzers: 89 | print "Analyzers:" 90 | for func_name, func in analyzer_funcs.iteritems(): 91 | print " %s" % func_name 92 | return 93 | 94 | if not args.in_dir: 95 | print "Please provide a input directory with -i" 96 | return 97 | 98 | selected_analyzer = None 99 | for func_name, obj in analyzer_funcs.iteritems(): 100 | if func_name == args.analyzer: 101 | selected_analyzer = obj 102 | 103 | if hasattr(obj, "output_results"): 104 | selected_output_func = getattr(obj,"output_results") 105 | elif hasattr(obj, "stream_results"): 106 | selected_stream_func = getattr(obj,"stream_results") 107 | break 108 | 109 | if not selected_analyzer: 110 | print "You selected a bad analyzer [%s]" % args.analyzer 111 | print "Analyzers:" 112 | for func_name, func in analyzer_funcs.iteritems(): 113 | print " %s" % func_name 114 | 115 | return 116 | 117 | if args.cores: 118 | cores = args.cores 119 | else: 120 | cores = mp.cpu_count() 121 | 122 | print "Starting '%s' analyzer with %d cores, log file: %s" % (selected_analyzer.__name__, cores, args.log_file) 123 | 124 | apk_files = get_files_in_dir(args.in_dir) 125 | 126 | # Enable for debugging info. 
127 | # mp.log_to_stderr(logging.DEBUG) 128 | manager = mp.Manager() 129 | pool = mp.Pool(cores + 2, init_worker) 130 | 131 | apk_queue = manager.Queue() 132 | 133 | # for logging 134 | res_queue = manager.Queue() 135 | 136 | # for data output 137 | output_data = manager.Queue() 138 | end_event = manager.Event() 139 | 140 | # if we have a small count of APK files, limit our worker count 141 | apk_count = len(apk_files) 142 | if apk_count < cores: 143 | cores = apk_count 144 | 145 | for apk in apk_files: 146 | apk_queue.put(apk) 147 | 148 | try: 149 | # TODO: make the runner handle multiple arg lists? 150 | log_result = pool.apply_async(logger_runner, (args.log_file, res_queue, end_event)) 151 | 152 | if selected_output_func: 153 | print "started output output_processor" 154 | output_res = pool.apply_async(output_processor, (output_data, end_event, selected_output_func)) 155 | elif selected_stream_func: 156 | print "started streaming output processor" 157 | output_res = pool.apply_async(selected_stream_func, (output_data, end_event)) 158 | 159 | worker_results = [] 160 | for i in xrange(0, cores): 161 | worker_results.append(pool.apply_async(runner, (selected_analyzer.analyze, args, apk_queue, res_queue, output_data))) 162 | pool.close() 163 | 164 | while len(worker_results) > 0: 165 | for i, res in enumerate(worker_results): 166 | if res.ready(): 167 | result = res.get() 168 | if not res.successful(): 169 | print "one of the workers failed" 170 | worker_results = [] 171 | break 172 | else: 173 | worker_results.pop(i) 174 | 175 | time.sleep(1) 176 | 177 | print "completed all work" 178 | end_event.set() 179 | pool.join() 180 | 181 | # get the exception if the output func fails. 182 | if selected_output_func or selected_stream_func: 183 | output_res.get() 184 | 185 | pool.terminate() 186 | pool.join() 187 | 188 | except KeyboardInterrupt: 189 | print "Exiting!" 
190 | pool.terminate() 191 | pool.join() 192 | 193 | if __name__ == '__main__': 194 | main() 195 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pyelftools 2 | -e git+https://github.com/androguard/androguard.git#egg=androguard 3 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from setuptools import setup 4 | 5 | import pip 6 | from pip.req import parse_requirements 7 | 8 | install_reqs = parse_requirements("requirements.txt", session=pip.download.PipSession()) 9 | requirements = [str(ir.req) for ir in install_reqs] 10 | 11 | setup(name = 'apkminer', 12 | description = 'Parallel APK vulnerability analyzer', 13 | author = 'mothran', 14 | py_modules = ['apkminer'], 15 | url = 'http://github.com/mothran/apkminer', 16 | install_requires = requirements, 17 | entry_points = { 18 | 'console_scripts': ['apkminer = apkminer:main'] 19 | }, 20 | ) 21 | -------------------------------------------------------------------------------- /test/example_decomp.py: -------------------------------------------------------------------------------- 1 | import re 2 | import pprint 3 | import argparse 4 | 5 | 6 | from androguard.core.bytecodes import apk 7 | from androguard.core.bytecodes import dvm 8 | from androguard.core.analysis import analysis 9 | 10 | from androguard.decompiler.decompiler import * 11 | 12 | 13 | # allow for copy + pasting from the main codebase 14 | class Log(): 15 | def log(self,data): 16 | print data 17 | def flush(self): 18 | pass 19 | 20 | log = Log() 21 | 22 | pp = pprint.PrettyPrinter(indent=4) 23 | 24 | parser = argparse.ArgumentParser(description='analyzer of APKs') 25 | parser.add_argument("-i", "--in_apk", type=str, 26 | help="apk file to analyze", required=True) 
27 | args = parser.parse_args() 28 | apk_file = file_path = args.in_apk 29 | 30 | for one in xrange(0,1): 31 | log.log("starting analysis of %s" % apk_file) 32 | 33 | try: 34 | a = apk.APK(file_path) 35 | except Exception as err: 36 | log.log("ERROR parsing apk: %s\n" % err) 37 | log.flush() 38 | continue 39 | 40 | PRIV_KEY_PAT = ".PRIVATE KEY-----" 41 | dex_files = [] 42 | for file in a.get_files(): 43 | file_data = a.get_file(file) 44 | if re.search(PRIV_KEY_PAT, file_data): 45 | log.log(" FOUND %s" % file) 46 | if file[-4:] == ".dex": 47 | dex_files.append(file) 48 | 49 | for dex_file in dex_files: 50 | d = dvm.DalvikVMFormat(a.get_file(dex_file)) 51 | dx = analysis.newVMAnalysis( d ) 52 | d.set_vmanalysis( dx ) 53 | dx.create_xref() 54 | 55 | d.set_decompiler(DecompilerDAD(d, dx)) 56 | d.set_vmanalysis(dx) 57 | 58 | for str_val, ref_obj in dx.get_strings_analysis().iteritems(): 59 | found_key = re.findall(PRIV_KEY_PAT, str_val) 60 | 61 | for res in found_key: 62 | log.log(" %s" % str_val) 63 | for ref_class, ref_method in ref_obj.get_xref_from(): 64 | log.log(" REF: %s->%s%s" % (ref_method.get_class_name(), 65 | ref_method.get_name(), 66 | ref_method.get_descriptor())) 67 | 68 | 69 | current_class = d.get_class(ref_method.get_class_name()) 70 | if current_class != None: 71 | print current_class.get_source() 72 | else: 73 | print "ref'd class not found" 74 | 75 | xmethod = dx.get_method_analysis_by_name(ref_method.get_class_name(),ref_method.get_name(), ref_method.get_descriptor()) 76 | 77 | for xref_class, xref_method, xoffset in xmethod.get_xref_from(): 78 | log.log(" REF: %s->%s%s\n" % (xref_method.get_class_name(), 79 | xref_method.get_name(), 80 | xref_method.get_descriptor())) 81 | 82 | current_class = d.get_class(xref_method.get_class_name()) 83 | if current_class != None: 84 | print current_class.get_source() 85 | else: 86 | print "ref'd class not found" 87 | -------------------------------------------------------------------------------- 
/test/test_parse.py:
--------------------------------------------------------------------------------
import re
import pprint
import argparse


from androguard.core.bytecodes import apk
from androguard.core.bytecodes import dvm
from androguard.core.analysis import analysis


# allow for copy + pasting from the main codebase
class Log():
    def log(self,data):
        print data
    def flush(self):
        pass

log = Log()

pp = pprint.PrettyPrinter(indent=4)

parser = argparse.ArgumentParser(description='analyzer of APKs')
parser.add_argument("-i", "--in_apk", type=str,
                    help="apk file to analyze", required=True)
args = parser.parse_args()
apk_file = file_path = args.in_apk

for one in xrange(0,1):
    log.log("starting analysis of %s" % apk_file)

    a = apk.APK(file_path)
    d = dvm.DalvikVMFormat(a.get_dex())

    log.log("completed base analysis")

    for method in d.get_methods():
        print method


    dx = analysis.newVMAnalysis( d )
    d.set_vmanalysis( dx )

    log.log("creating xrefs")
    dx.create_xref()

    class_name = "Ljava/lang/Runtime;"
    func_name = "exec"
    func_proto = "(Ljava/lang/String;)V"
    # method = dx.get_method_by_name(class_name, func_name, func_proto)

    break

# print dir()
# for class_str, class_obj in dx.classes.iteritems():
#     print class_str

# print dir(method)
# print method



for key, val in dx.get_strings_analysis().iteritems():
    if key == "logcat -d -f ":
        print "FOUND: %s " % key

        for ref_class, ref_method in val.get_xref_from():
            print type(ref_method)
            pp.pprint(dir(ref_method))
            print type(ref_class)
            pp.pprint(dir(ref_class))

            for classobj, class_set in ref_class.get_xref_to().iteritems():
                class_list = list(class_set)

                print classobj.orig_class
                for obj in class_list:
                    print obj[1].get_name()

            # import pdb; pdb.set_trace()

            # import pdb; pdb.set_trace()
            # print ref_method.class_name
            # print ref_method.proto
            # print ref_method.get_name()

            # print dir(ref_method)
            # print dir(ref_class)
            # # ref_method.show()

        break
--------------------------------------------------------------------------------
/test/test_silver.py:
--------------------------------------------------------------------------------
import re
import sys
import argparse

from androguard.core.bytecodes import apk
from androguard.core.bytecodes import dvm
from androguard.core.analysis import analysis

# allow for copy + pasting from the main codebase
class Log():
    def log(self,data):
        print data
    def flush(self):
        pass

log = Log()

parser = argparse.ArgumentParser(description='analyzer of APKs')
parser.add_argument("-i", "--in_apk", type=str,
                    help="apk file to analyze", required=True)
args = parser.parse_args()
apk_file = file_path = args.in_apk


for one in xrange(0,1):
    log.log("Checking: %s\n" % file_path)
    a = apk.APK(file_path)

    record_perm = "android.permission.RECORD_AUDIO" in a.get_permissions()

    if "com.silverpush.sdk.android.SPService" in a.get_services() or "com.silverpush.sdk.android.BR_CallState" in a.get_receivers():
        log.log("found silverpush, can record: %s" % str(record_perm))
        log.flush()
        # continue

    # elif
    if record_perm:
        dex_files = list(a.get_all_dex())
        if not dex_files:
            log.log("no dex files")
            continue

        for dex in dex_files:
            d = dvm.DalvikVMFormat(dex)
            for data in d.get_strings():
                print data.replace("\x00", "")
                # if re.search("\"silverpush\"", data, re.IGNORECASE):
                #     print data
                #     break
--------------------------------------------------------------------------------
/test/testbug.py:
--------------------------------------------------------------------------------
import re
import sys
import argparse

from androguard.core.bytecodes import apk
from androguard.core.bytecodes import dvm
from androguard.core.analysis import analysis

# allow for copy + pasting from the main codebase
class Log():
    def log(self,data):
        print(data)
    def flush(self):
        pass

log = Log()

parser = argparse.ArgumentParser(description='analyzer of APKs')
parser.add_argument("-i", "--in_apk", type=str,
                    help="apk file to analyze", required=True)
args = parser.parse_args()
apk_file = file_path = args.in_apk


def find_call(dx, class_name, func_name):
    # Log every method whose outgoing xrefs include class_name.func_name
    for name, cur_class in dx.classes.items():
        for method in cur_class.get_methods():
            xref_to = method.get_xref_to()

            for ref_class, ref_method, offset in xref_to:
                ref_class_name = ref_class.orig_class
                # orig_class may be an ExternalClass object rather than a
                # string; in that case compare against its name instead
                if type(ref_class_name) == analysis.ExternalClass:
                    ref_class_name = ref_class_name.name

                ref_method_name = ref_method.get_name()

                if ref_class_name == class_name and ref_method_name == func_name:
                    log.log("%s.%s calls %s.%s()" % (name, method.method.name, class_name, func_name))
                    log.log("")

def find_implements(dx, srch_class_name):
    for name, cur_class in dx.classes.items():
        class_name = cur_class.orig_class
        if type(class_name) == analysis.ExternalClass:
            class_name = class_name.name

        # log.log("%s implements: %s" % (name, class_name))
        if class_name == srch_class_name:
            log.log("%s implements: %s" % (name, class_name))

def find_methods(dx, methods):
    # Log every class that defines a method whose name contains one of `methods`
    for name, cur_class in dx.classes.items():

        # ipdb> p cur_class.orig_class.get_interfaces()
        # ['Ljavax/net/ssl/X509TrustManager;']

        # if name == "Lo/md;":
        #     import ipdb; ipdb.set_trace();

        for method in cur_class.get_methods():
            for wanted in methods:
                if wanted in method.method.name:
                    log.log("%s implements: %s" % (name, wanted))


for one in xrange(0,1):
    log.log("Checking: %s\n" % file_path)

    try:
        a = apk.APK(file_path)
    except Exception:
        log.log("ERROR parsing apk\n")
        log.flush()
        continue
    log.log("Parsed APK")

    d = dvm.DalvikVMFormat(a.get_dex())
    log.log("Parsed Dalvik")

    dx = analysis.newVMAnalysis(d)
    d.set_vmanalysis(dx)
    dx.create_xref()
    log.log("Completed VM analysis")


    # Check for WebView
    find_call(dx, "Landroid/webkit/WebView;", "addJavascriptInterface")

    # Check for Runtime.exec()
    find_call(dx, "Ljava/lang/Runtime;", "exec")

    # Check for pinning
    find_implements(dx, "Ljavax/net/ssl/X509TrustManager;")

    find_methods(dx, ["checkServerTrusted"])

    log.log("\n")
    log.flush()

--------------------------------------------------------------------------------