├── .gitignore
├── .gitmodules
├── LICENSE.txt
├── README.md
├── analyzers
    ├── __init__.py
    ├── aws_finder.py
    ├── bugsniffer.py
    ├── elf_files.py
    ├── private_keys.py
    ├── silverpush.py
    ├── so_census.py
    └── utils.py
├── apkminer.py
├── requirements.txt
├── setup.py
└── test
    ├── example_decomp.py
    ├── test_parse.py
    ├── test_silver.py
    └── testbug.py


/.gitignore:
--------------------------------------------------------------------------------
1 | apks/*
2 | output/*
3 | *.pyc
4 | 


--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "androguard"]
2 | 	path = androguard
3 | 	url = https://github.com/androguard/androguard.git
4 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | Copyright 2017 W. Parker Thompson
 2 | 
 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 4 | 
 5 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
 6 | 
 7 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
 8 | 
 9 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
10 | 
11 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # apkminer
  2 | 
  3 | Simple program to mine through APKs at high speed. It uses a modular design that runs a selected analyzer on each APK provided.
  4 | 
  5 | ## Setup
  6 | 
  7 | ```bash
  8 | git submodule init
  9 | git submodule update
 10 | ```
 11 | 
 12 | Standard CPython works fine, but I highly recommend PyPy; I have seen runs that were 70% faster with it.
 13 | 
 14 | ## Usage
 15 | 
 16 | ```
 17 | usage: apkminer.py [-h] [-i IN_DIR] [-o LOG_FILE] [-c CORES] [-a ANALYZER]
 18 |                    [-l]
 19 | 
 20 | analyzer of APKs
 21 | 
 22 | optional arguments:
 23 |   -h, --help            show this help message and exit
 24 |   -i IN_DIR, --in_dir IN_DIR
 25 |                         directory of apk files to analyze
 26 |   -o LOG_FILE, --log_file LOG_FILE
 27 |                         log file to write to
 28 |   -c CORES, --cores CORES
 29 |                         force a number of cores to use
 30 |   -a ANALYZER, --analyzer ANALYZER
 31 |                         Select the analyzer you want to use.
 32 |   -l, --list_analyzers  List the possible analyzers
 33 | ```
 34 | 
 35 | ## Analyzers
 36 | 
 37 | ```
 38 | private_keys  -  Find private keys in files or dex strings
 39 | elf_files     -  Report string data from specific sections of elf files
 40 | aws_finder    -  Find AWS key pairs in files and dex strings
 41 | so_census     -  Report on data about .so's in APKs
 42 | silverpush    -  Finds apks that contain the silverpush library
 43 | ```
 44 | 
 45 | ## Dependencies
 46 | 
 47 | - pyelftools
 48 | - androguard (pulled in as a git submodule, see Setup)
 49 | 
 50 | ## Writing an analyzer
 51 | 
 52 | Below I lay out the steps for writing an analyzer and the components of apkminer that an analyzer developer should understand.
 53 | 
 54 | ### Analyzer template
 55 | 
 56 | ```python
 57 | # import the utils.py file for helper functions and Logger object
 58 | from utils import *
 59 | 
 60 | # Define the analyze() function. The name must be the same in every analyzer
 61 | # because apkminer looks each analyzer up by this function name.
 62 | def analyze(args, apk_queue, res_queue, output_data):
 63 | 	# The Logger class uses a multiprocessing Queue to perform atomic writes to the defined log file.
 64 | 	# This is helpful for logging debug data and any errors that might occur during the run.
 65 | 	log = Logger(args.log_file, res_queue)
 66 | 
 67 | 	# Continually check the input 'apk_queue' for new file names
 68 | 	while True:
 69 | 		# break the loop if the queue is empty
 70 | 		if apk_queue.empty():
 71 | 			return
 72 | 		else:
 73 | 			# fetch the file off the queue
 74 | 			apk_file = apk_queue.get()
 75 | 
 76 | 			# Logging works similarly to stdout / stderr:
 77 | 			# the log() function writes to an internal buffer (newline delimited),
 78 | 			# then flush() pushes the data to the actual logging process
 79 | 			log.log(apk_file)
 80 | 			log.flush()
 81 | 
 82 | 			# write analyzer here.
 83 | ```
 84 | 
 85 | In order to register an analyzer with apkminer, save it as a .py file in the analyzers/ directory and then edit analyzers/__init__.py to include the name of your analyzer.
 86 | 
 87 | For example:
 88 | 
 89 | ```
 90 | analyzers/test_analyzer.py
 91 | ```
 92 | 
 93 | Then add "test_analyzer" to the import line in __init__.py.
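
The import line matters because apkminer builds its analyzer list by scanning the public attributes of the `analyzers` package and keeping those that expose an `analyze` attribute. A minimal sketch of that lookup, written for Python 3 and using hypothetical stand-in modules (`fake_pkg`, `helper`) instead of the real package:

```python
import types

def discover_analyzers(package):
    """Map name -> module for every public attribute that exposes analyze()."""
    found = {}
    for name in dir(package):
        if name.startswith('_'):
            continue
        obj = getattr(package, name)
        if hasattr(obj, "analyze"):
            found[name] = obj
    return found

# Hypothetical stand-ins for the analyzers package and two of its modules.
fake_pkg = types.ModuleType("analyzers")
test_analyzer = types.ModuleType("test_analyzer")
test_analyzer.analyze = lambda args, apk_queue, res_queue, output_data: None
helper = types.ModuleType("helper")  # defines no analyze(), so it is skipped

fake_pkg.test_analyzer = test_analyzer
fake_pkg.helper = helper

registry = discover_analyzers(fake_pkg)
print(sorted(registry))
```

A module that is not imported in `__init__.py` never becomes an attribute of the package, so this scan cannot see it.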
 94 | 
 95 | Check out aws_finder.py or the other analyzers for examples. Also spend some time looking at the helper functions inside utils.py.
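
As a starting point for reading utils.py, here is a small sketch of how the `Logger` buffer behaves. The class body mirrors the one in analyzers/utils.py; the plain `queue.Queue` is an assumption standing in for the manager queue that apkminer passes to each worker, and the snippet targets Python 3 (the module is named `Queue` under Python 2).

```python
import queue

class Logger(object):
    # Mirrors the Logger in analyzers/utils.py
    def __init__(self, file, res_queue):
        self.file = file
        self.LOG = ""
        self.res_queue = res_queue

    def log(self, data):
        # append one newline-terminated line to the internal buffer
        self.LOG += "%s\n" % data

    def flush(self):
        # hand the whole buffer to the logging process as a single record
        self.res_queue.put(self.LOG)
        self.LOG = ""

res_queue = queue.Queue()  # stand-in for the manager.Queue() created by apkminer
log = Logger("OUTPUT.log", res_queue)

log.log("Checking: apks/example.apk")
log.log("  nothing found")
nothing_sent_yet = res_queue.empty()  # True: log() only buffers

log.flush()
record = res_queue.get()
print(repr(record))
```

Because each flush() sends one string, the lines logged for a single APK stay together in the log file even when several workers are writing.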
 96 | 
 97 | ### Optional features for analyzers
 98 | 
 99 | To enable structured output that is separate from the log file, an analyzer writer can define one of two optional functions in their .py file (if both are defined, apkminer uses output_results):
100 | 
101 | 1. output_results - Used for bulk writes after all input APKs have been processed.
102 | 2. stream_results - Used for streaming results as they are generated.
103 | 
104 | ### output_results example
105 | 
106 | ```python
107 | import pickle
108 | 
109 | def output_results(output_data):
110 | 	fd = open("output.pick", "wb")
111 | 	pickle.dump(output_data, fd)
112 | 	fd.close()
113 | ```
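
Once a run has finished, the pickle can be loaded again for inspection. The sketch below assumes data shaped like the dictionary that so_census.py writes; the library name, hashes and APK names are made up, and a temporary directory is used so the example does not touch a real output.pick. It targets Python 3.

```python
import os
import pickle
import tempfile

# Hypothetical data shaped like the dictionary so_census.py pickles
final_data = {
    "libexample.so": {
        "count": 2,
        "sha_hashes": ["aa11", "bb22"],
        "apk_files": ["one.apk", "two.apk"],
    },
}

path = os.path.join(tempfile.mkdtemp(), "output.pick")
with open(path, "wb") as fd:
    pickle.dump(final_data, fd)

# Read the results back, as you would after an apkminer run
with open(path, "rb") as fd:
    loaded = pickle.load(fd)

for name, info in sorted(loaded.items()):
    print("%s seen %d times in %s" % (name, info["count"], ", ".join(info["apk_files"])))
```

Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.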
114 | 
115 | ### stream_results example
116 | 
117 | ```python
118 | import csv
119 | import Queue
120 | 
121 | def stream_results(output_queue, end_event):
122 | 	csv_fd = open('test.csv', 'wb')
123 | 	datawriter = csv.writer(csv_fd)
124 | 
125 | 	while not end_event.is_set():
126 | 		try:
127 | 			data = output_queue.get(True, 1)
128 | 			datawriter.writerow(data)
129 | 
130 | 		except Queue.Empty:
131 | 			continue
132 | ```
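
The loop above checks `end_event` before every read, so rows queued just before the event is set may be left unread. The following is a sketch of a variant that drains the queue first. It is not how apkminer calls the function: `stream_rows` and its extra `out_fd` parameter are hypothetical, a thread stands in for the pool worker, and the code targets Python 3.

```python
import csv
import io
import queue
import threading

def stream_rows(output_queue, end_event, out_fd):
    # hypothetical variant of stream_results that writes to an open file object
    writer = csv.writer(out_fd)
    while True:
        try:
            writer.writerow(output_queue.get(True, 0.1))
        except queue.Empty:
            # stop only when the queue is empty *and* the run has ended
            if end_event.is_set():
                return

output_queue = queue.Queue()
end_event = threading.Event()
buf = io.StringIO(newline="")  # in-memory stand-in for the CSV file

worker = threading.Thread(target=stream_rows, args=(output_queue, end_event, buf))
worker.start()

# An analyzer would call output_data.put(...) with one row per finding
output_queue.put(["libfoo.so", "abc123", "one.apk"])
output_queue.put(["libbar.so", "def456", "two.apk"])

end_event.set()
worker.join()

rows = buf.getvalue().splitlines()
print(rows)
```

The trade-off is that the function returns up to one timeout interval after the event is set, in exchange for not dropping queued rows.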
133 | 


--------------------------------------------------------------------------------
/analyzers/__init__.py:
--------------------------------------------------------------------------------
1 | from analyzers import elf_files, private_keys, silverpush, aws_finder, so_census, bugsniffer


--------------------------------------------------------------------------------
/analyzers/aws_finder.py:
--------------------------------------------------------------------------------
 1 | from utils import *
 2 | 
 3 | def analyze(args, apk_queue, res_queue, output_data):
 4 | 	log = Logger(args.log_file, res_queue)
 5 | 	while True:
 6 | 		if apk_queue.empty():
 7 | 			return
 8 | 		else:
 9 | 			apk_file = apk_queue.get()
10 | 			file_path = args.in_dir + "/" + apk_file
11 | 			log.log("Checking: %s\n" % file_path)
12 | 
13 | 			try:
14 | 				a = apk.APK(file_path)
15 | 			except:
16 | 				log.log("ERROR parsing apk\n")
17 | 				log.flush()
18 | 				continue
19 | 
20 | 			found_aws = False
21 | 			main_act = a.get_main_activity()
22 | 			if not main_act:
23 | 				log.log("NO ACTIVITY: %s" % file_path)
24 | 				# fall back to just the apk file name
25 | 				main_act = apk_file
26 | 
27 | 			# try and skip any com.amazon.* apps
28 | 			if re.search(r"(^|[./])com[./]amazon[./]", main_act):
29 | 				log.log("skipping: %s\n" % main_act)
30 | 				log.flush()
31 | 				continue
32 | 
33 | 			d = dvm.DalvikVMFormat(a.get_dex())
34 | 
35 | 			for current_class in d.get_classes():
36 | 				# log.log(current_class.get_name())
37 | 				if re.search(".amazon.", current_class.get_name(), re.IGNORECASE):
38 | 					found_aws = True
39 | 					break
40 | 
41 | 			if found_aws:
42 | 				assets = get_asset_files(a)
43 | 				found = regex_apk_files(a, assets, AWS_KEY_C)
44 | 
45 | 				log.log("asset KEYS:")
46 | 				for data, file in found:
47 | 					log.log("  %s: %s" % (file, data))
48 | 
49 | 
50 | 				d = dvm.DalvikVMFormat(a.get_dex())
51 | 				dx = analysis.newVMAnalysis( d )
52 | 				d.set_vmanalysis( dx )
53 | 				dx.create_xref()
54 | 
55 | 
56 | 				fp_detect = FPDetect(a,d)
57 | 
58 | 				log.log("\nDalvik keys:")
59 | 				for str_val, ref_obj in dx.get_strings_analysis().iteritems():
60 | 					found_key = re.findall(AWS_KEY_C, str_val)
61 | 
62 | 					for res in found_key:
63 | 						#bail out on FP hits
64 | 						if fp_detect.is_sec_fp(res):
65 | 							continue
66 | 						
67 | 						log.log("  %s" % res)
68 | 						for ref_class, ref_method in ref_obj.get_xref_from():
69 | 							if fp_detect.is_xref_fp(ref_method.get_class_name()):
70 | 								continue
71 | 
72 | 							log.log("    REF: %s->%s%s" % (ref_method.get_class_name(), 
73 | 														   ref_method.get_name(),
74 | 														   ref_method.get_descriptor()))
75 | 						
76 | 			log.log("\n")
77 | 			log.flush()
78 | 


--------------------------------------------------------------------------------
/analyzers/bugsniffer.py:
--------------------------------------------------------------------------------
 1 | from utils import *
 2 | 
 3 | def analyze(args, apk_queue, res_queue, output_data):
 4 | 	log = Logger(args.log_file, res_queue)
 5 | 	while True:
 6 | 		if apk_queue.empty():
 7 | 			return
 8 | 		else:
 9 | 			apk_file = apk_queue.get()
10 | 
11 | 			file_path = args.in_dir + "/" + apk_file
12 | 			log.log("Checking: %s\n" % file_path)
13 | 
14 | 			try:
15 | 				a = apk.APK(file_path)
16 | 			except:
17 | 				log.log("ERROR parsing apk\n")
18 | 				log.flush()
19 | 				continue
20 | 			log.log("Parsed APK")
21 | 
22 | 			d = dvm.DalvikVMFormat(a.get_dex())
23 | 			log.log("Parsed Dalvik")
24 | 
25 | 			dx = analysis.newVMAnalysis(d)
26 | 			d.set_vmanalysis(dx)
27 | 			dx.create_xref()
28 | 			log.log("Completed VM analysis")
29 | 
30 | 			# Check for WebView
31 | 			find_call(dx, log, "Landroid/webkit/WebView;", "addJavascriptInterface")
32 | 
33 | 			# # Check for Runtime.exec()
34 | 			find_call(dx, log, "Ljava/lang/Runtime;", "exec")
35 | 
36 | 			# Check for pinning
37 | 			find_implements(dx, log, "Ljavax/net/ssl/X509TrustManager;")
38 | 
39 | 			find_methods(dx, log, ["checkServerTrusted"])
40 | 
41 | 			log.log("\n")
42 | 			log.flush()
43 | 


--------------------------------------------------------------------------------
/analyzers/elf_files.py:
--------------------------------------------------------------------------------
 1 | import cStringIO
 2 | from elftools.elf.elffile import ELFFile
 3 | from utils import *
 4 | 
 5 | def analyze(args, apk_queue, res_queue, output_data):
 6 | 	log = Logger(args.log_file, res_queue)
 7 | 	while True:
 8 | 		if apk_queue.empty():
 9 | 			return
10 | 		else:
11 | 			apk_file = apk_queue.get()
12 | 
13 | 			file_path = args.in_dir + "/" + apk_file
14 | 			log.log("Checking: %s\n" % file_path)
15 | 			a = apk.APK(file_path)
16 | 
17 | 			so_files = []
18 | 			for file in a.get_files():
19 | 				exten = file.split('.')[-1]
20 | 				if exten == "so":
21 | 					so_files.append(file)
22 | 
23 | 			if len(so_files) == 0:
24 | 				continue
25 | 
26 | 			log.log("Found %d .so files" % len(so_files))
27 | 
28 | 			for so_file in so_files:
29 | 				elf_data = a.get_file(so_file)
30 | 				elf_stream = cStringIO.StringIO(elf_data)
31 | 				try:
32 | 					elf = ELFFile(elf_stream)
33 | 				except:
34 | 					log.log("ERROR: bad elf file")
35 | 					log.flush()
36 | 					continue
37 | 
38 | 				log.log("  File: %s" % so_file)
39 | 				log.log("  Elf sections:")
40 | 				for section in elf.iter_sections():
41 | 					log.log("\t%s" % section.name)
42 | 					if section.name == ".comment" or section.name == ".conststring":
43 | 						log.log("\t\t%s" % section.data().replace("\x00", "\n"))
44 | 
45 | 			log.log("\n\n")
46 | 			log.flush()
47 | 
48 | 


--------------------------------------------------------------------------------
/analyzers/private_keys.py:
--------------------------------------------------------------------------------
 1 | from utils import *
 2 | 
 3 | def analyze(args, apk_queue, res_queue, output_data):
 4 | 	log = Logger(args.log_file, res_queue)
 5 | 	while True:
 6 | 		if apk_queue.empty():
 7 | 			return
 8 | 		else:
 9 | 			apk_file = apk_queue.get()
10 | 			file_path = args.in_dir + "/" + apk_file
11 | 			log.log("Checking: %s\n" % file_path)
12 | 
13 | 			try:
14 | 				a = apk.APK(file_path)
15 | 			except Exception as err:
16 | 				log.log("ERROR parsing apk: %s\n" % err)
17 | 				log.flush()
18 | 				continue
19 | 
20 | 			PRIV_KEY_PAT = ".PRIVATE KEY-----"
21 | 			dex_files = []
22 | 			for file in a.get_files():
23 | 				file_data = a.get_file(file)
24 | 				if re.search(PRIV_KEY_PAT, file_data):
25 | 					log.log("  FOUND %s" % file)
26 | 					if file[-4:] == ".dex":
27 | 						dex_files.append(file)
28 | 					#log.log("  FOUND  %s:\n%s" % (file.decode('utf-8', 'ignore'), file_data.decode('utf-8', 'ignore')))
29 | 
30 | 			for dex_file in dex_files:
31 | 				d = dvm.DalvikVMFormat(a.get_file(dex_file))
32 | 				dx = analysis.newVMAnalysis( d )
33 | 				d.set_vmanalysis( dx )
34 | 				dx.create_xref()
35 | 
36 | 				for str_val, ref_obj in dx.get_strings_analysis().iteritems():
37 | 					found_key = re.findall(PRIV_KEY_PAT, str_val)
38 | 
39 | 					for res in found_key:
40 | 						log.log("  %s" % str_val)
41 | 						for ref_class, ref_method in ref_obj.get_xref_from():
42 | 							log.log("    REF: %s->%s%s" % (ref_method.get_class_name(), 
43 | 														   ref_method.get_name(),
44 | 														   ref_method.get_descriptor()))
45 | 
46 | 			log.log("\n\n")
47 | 			log.flush()
48 | 


--------------------------------------------------------------------------------
/analyzers/silverpush.py:
--------------------------------------------------------------------------------
 1 | from utils import *
 2 | 
 3 | def analyze(args, apk_queue, res_queue, output_data):
 4 | 	log = Logger(args.log_file, res_queue)
 5 | 	while True:
 6 | 		if apk_queue.empty():
 7 | 			return
 8 | 		else:
 9 | 			apk_file = apk_queue.get()
10 | 
11 | 			file_path = args.in_dir + "/" + apk_file
12 | 			log.log("Checking: %s\n" % file_path)
13 | 
14 | 			try:
15 | 				a = apk.APK(file_path)
16 | 			except:
17 | 				log.log("ERROR parsing apk\n")
18 | 				log.flush()
19 | 				continue
20 | 
21 | 			record_perm = "android.permission.RECORD_AUDIO" in a.get_permissions()
22 | 
23 | 			try:
24 | 				if "com.silverpush.sdk.android.SPService" in a.get_services() or "com.silverpush.sdk.android.BR_CallState" in a.get_receivers():
25 | 					log.log("found silverpush, can record: %s" % str(record_perm))
26 | 					log.flush()
27 | 					continue
28 | 			except:
29 | 				log.log("BAD APK DATA: %s" % apk_file)
30 | 
31 | 			log.log("\n")
32 | 			log.flush()
33 | 


--------------------------------------------------------------------------------
/analyzers/so_census.py:
--------------------------------------------------------------------------------
 1 | import hashlib
 2 | 
 3 | from utils import *
 4 | 
 5 | try:
 6 |    import cPickle as pickle
 7 | except ImportError:
 8 |    import pickle
 9 | 
10 | def get_base_name(elf_file):
11 | 	if elf_file.find("/") != -1:
12 | 		return elf_file.split("/")[-1]
13 | 	else:
14 | 		return elf_file
15 | 
16 | 
17 | def output_results(output_data):
18 | 	final_data = {}
19 | 	for element in output_data:
20 | 		cur_name = element["name"]
21 | 		if cur_name in final_data:
22 | 			final_data[cur_name]["count"] += 1
23 | 			final_data[cur_name]["sha_hashes"].append(element["sha_hash"])
24 | 			final_data[cur_name]["apk_files"].append(element["apk_file"])
25 | 		else:
26 | 			final_data[cur_name] = {}
27 | 			final_data[cur_name]["count"] = 1
28 | 			final_data[cur_name]["sha_hashes"] = [element["sha_hash"]]
29 | 			final_data[cur_name]["apk_files"] = [element["apk_file"]]
30 | 
31 | 	fd = open("output.pick", "wb")
32 | 	pickle.dump(final_data, fd )
33 | 	fd.close()
34 | 
35 | 	# for name, obj in final_data.iteritems():
36 | 	# 	print "  %s, %d, %s, %s" % (name, obj["count"], str(obj["sha_hashes"]), str(obj["apk_files"]))
37 | 
38 | def analyze(args, apk_queue, res_queue, output_data):
39 | 	log = Logger(args.log_file, res_queue)
40 | 	while True:
41 | 		if apk_queue.empty():
42 | 			return
43 | 		else:
44 | 			apk_file = apk_queue.get()
45 | 
46 | 			file_path = args.in_dir + "/" + apk_file
47 | 			log.log("Checking: %s\n" % file_path)
48 | 
49 | 			try:
50 | 				a = apk.APK(file_path)
51 | 			except:
52 | 				log.log("ERROR parsing apk\n")
53 | 				log.flush()
54 | 				continue
55 | 
56 | 			so_files = []
57 | 			for file in a.get_files():
58 | 				exten = file.split('.')[-1]
59 | 				if exten == "so":
60 | 					so_files.append(file)
61 | 
62 | 			if len(so_files) == 0:
63 | 				continue
64 | 
65 | 			log.log("Found %d .so files" % len(so_files))
66 | 
67 | 			for elf_file in so_files:
68 | 				cur_hash = hashlib.sha1(a.get_file(elf_file)).hexdigest()
69 | 
70 | 				base_name = get_base_name(elf_file)
71 | 				
72 | 				output_data.put({"name": base_name, "sha_hash": cur_hash, "apk_file": apk_file})
73 | 
74 | 			log.log("\n\n")
75 | 			log.flush()
76 | 


--------------------------------------------------------------------------------
/analyzers/utils.py:
--------------------------------------------------------------------------------
  1 | from androguard.core.bytecodes import apk
  2 | from androguard.core.bytecodes import dvm
  3 | from androguard.core.analysis import analysis
  4 | 
  5 | import re
  6 | 
  7 | BLACKLIST_FILETYPES = [
  8 | 	"jpg",
  9 | 	"jet",
 10 | 	"css",
 11 | 	"js",
 12 | 	"ttf",
 13 | 	"fbstr",
 14 | 	"svg",
 15 | 	"png",
 16 | 	"otf",
 17 | 	"mp3"]
 18 | 
 19 | 
 20 | AWS_ID_PAT = "(?<![A-Z0-9])[A-Z0-9]{20}(?![A-Z0-9])"
 21 | AWS_SEC_PAT = "(?<![A-Za-z0-9/+])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=;$])"
 22 | AWS_KEY_C = re.compile(AWS_ID_PAT + "|" + AWS_SEC_PAT)
 23 | 
 24 | class Logger():
 25 | 	def __init__(self, file, res_queue):
 26 | 		self.file = file
 27 | 		self.LOG = ""
 28 | 		self.res_queue = res_queue
 29 | 	def log(self, data):
 30 | 		self.LOG += "%s\n" % data
 31 | 	def flush(self):
 32 | 		self.res_queue.put(self.LOG)
 33 | 		self.LOG = ""
 34 | 	def clean(self):
 35 | 		self.LOG = ""
 36 | 
 37 | def find_call(dx, log, class_name, func_name):
 38 | 	for name, cur_class in dx.classes.items():
 39 | 		for method in cur_class.get_methods():
 40 | 			xref_to = method.get_xref_to()
 41 | 
 42 | 			for ref_class, ref_method, offset in xref_to:
 43 | 				ref_class_name = ref_class.orig_class
 44 | 				# orig_class may be an ExternalClass wrapper; compare against its name instead
 45 | 				if type(ref_class_name) == analysis.ExternalClass:
 46 | 					ref_class_name = ref_class_name.name
 47 | 
 48 | 				ref_method_name = ref_method.get_name()
 49 | 
 50 | 				if ref_class_name == class_name and ref_method_name == func_name:
 51 | 					log.log("%s.%s  calls  %s.%s()" % (name, method.method.name, class_name, func_name))
 52 | 					log.log("")
 53 | 
 54 | def find_implements(dx, log, srch_class_name):
 55 | 	for name, cur_class in dx.classes.items():
 56 | 		class_name = cur_class.orig_class
 57 | 
 58 | 		found = False
 59 | 		if type(class_name) == analysis.ExternalClass:
 60 | 			class_name = class_name.name
 61 | 
 62 | 		else:
 63 | 			if srch_class_name in cur_class.orig_class.get_interfaces():
 64 | 				found = True
 65 | 
 66 | 		if class_name == srch_class_name:
 67 | 			found = True
 68 | 
 69 | 		if found:
 70 | 			log.log("%s implements: %s" % (name, srch_class_name))
 71 | 
 72 | 		# ipdb> p cur_class.orig_class.get_interfaces()
 73 | 		# ['Ljavax/net/ssl/X509TrustManager;']
 74 | 
 75 | 		# log.log("%s implements: %s" % (name, class_name))
 76 | 		# if name == "Lo/md;":
 77 | 		# 	import ipdb; ipdb.set_trace();
 78 | 
 79 | def find_methods(dx, log, methods):
 80 | 	for name, cur_class in dx.classes.items():
 81 | 		for method in cur_class.get_methods():
 82 | 			if method.method.name in methods:
 83 | 				log.log("%s implements: %s" % (name, method.method.name))
 84 | 
 85 | 
 86 | 
 87 | def is_blacklist_filetype(file):
 88 | 	exten = file.split('.')[-1]
 89 | 	for b_extension in BLACKLIST_FILETYPES:
 90 | 		if exten == b_extension:
 91 | 			return True
 92 | 	return False
 93 | 
 94 | def get_asset_files(a):
 95 | 	started = False
 96 | 	# break once we are at the end of assets, get_files is alphabetical.
 97 | 	ret = []
 98 | 	for file in a.get_files():
 99 | 		if file[:7] == 'assets/':
100 | 			started = True
101 | 			if not is_blacklist_filetype(file):
102 | 				ret.append(file)
103 | 
104 | 		elif started:
105 | 			break
106 | 	return ret
107 | 
108 | def regex_apk_files(a, files, pat):
109 | 	results = []
110 | 	for file in files:
111 | 		data = a.get_file(file)
112 | 		found = re.findall(pat, data)
113 | 		for find in found:
114 | 			results.append([find, file])
115 | 	return results
116 | 
117 | class FPDetect():
118 | 	def __init__(self, a, d):
119 | 		self.a = a
120 | 		self.d = d
121 | 		# create blob of data to regex, kinda a hack but faster than multiple calls to re.*
122 | 		self.classes_str = str(self.d.get_classes_names())
123 | 
124 | 	def _find_in_classes(self, data):
125 | 		if self.classes_str.find(data) != -1:
126 | 			return True
127 | 
128 | 		# fall back to case insensitive search. 
129 | 		if re.search(".%s." % re.escape(data), self.classes_str, re.IGNORECASE):
130 | 			return True
131 | 		return False
132 | 
133 | 	def is_sec_fp(self, data):
134 | 		if data[:5] == "/com/":
135 | 			return True
136 | 		elif data[:9] == "Landroid/":
137 | 			return True
138 | 		elif data[:5] == "Lcom/":
139 | 			return True
140 | 		elif data[:6] == "Ljava/":
141 | 			return True
142 | 		elif data == "ABCDEFGHJKLMNPQRSTXY": # placeholder AWS_ID
143 | 			return True
144 | 		elif data == "DROPPEDSESSIONLENGTH":
145 | 			return True
146 | 		elif data == "LAUNCHESAFTERUPGRADE":
147 | 			return True
148 | 		elif data == "COMPROMISEDLIBRARIES":
149 | 			return True
150 | 		elif data == "========================================": # derp
151 | 			return True
152 | 		elif data == "3i2ndDfv2rTHiSisAbouNdArYfORhtTPEefj3q2f": # MIME boundary
153 | 			return True
154 | 		elif data == "5e8f16062ea3cd2c4a0d547876baa6f38cabf625": # FB hash
155 | 			return True
156 | 		elif data == "8a3c4b262d721acd49a4bf97d5213199c86fa2b9": # FB hash
157 | 			return True
158 | 		elif data == "a4b7452e2ed8f5f191058ca7bbfd26b0d3214bfc": # FB hash
159 | 			return True
160 | 		elif data == "bca6990fc3c15a8105800c0673517a4b579634a1": # X-CRASHLYTICS-DEVELOPER-TOKEN
161 | 			return True
162 | 		elif data == "registerOnSharedPreferenceChangeListener": # unclear why the class/method checks below miss this
163 | 			return True
164 | 		elif data == "setJavaScriptCanOpenWindowsAutomatically":
165 | 			return True
166 | 		elif data == "startAppWidgetConfigureActivityForResult":
167 | 			return True
168 | 		elif self._find_in_classes(data):     # is this string a class name
169 | 			return True
170 | 		elif self.d.get_method(data): # is this string a method name
171 | 			return True
172 | 		else:
173 | 			return False
174 | 
175 | 	def is_xref_fp(self, data):
176 | 		if data[:14] == "Lmono/android/":
177 | 			return True
178 | 		elif data[:25] == "Lcom/twitter/sdk/android/":
179 | 			return True
180 | 		elif data[:12] == "Lcom/amazon/":
181 | 			return True
182 | 		elif data[:20] == "Lcom/google/android/":
183 | 			return True
184 | 		elif data[:18] == "Lorg/spongycastle/":
185 | 			return True
186 | 		else:
187 | 			return False
188 | 
189 | 


--------------------------------------------------------------------------------
/apkminer.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | 
  3 | import sys
  4 | import time
  5 | import types
  6 | import pprint
  7 | import signal
  8 | import logging
  9 | import argparse
 10 | import traceback
 11 | import cStringIO
 12 | 
 13 | import Queue # just for exceptions
 14 | import multiprocessing as mp
 15 | 
 16 | import os
 17 | from os import listdir
 18 | from os.path import isfile, join
 19 | 
 20 | from androguard.core.bytecodes import apk
 21 | from androguard.core.bytecodes import dvm
 22 | from androguard.core.analysis import analysis
 23 | 
 24 | import analyzers
 25 | 
 26 | def get_files_in_dir(dir_path):
 27 | 	return [f for f in listdir(dir_path) if isfile(join(dir_path, f))]
 28 | 
 29 | def logger_runner(log_file, res_queue, end_event):
 30 | 	print "started logger"
 31 | 	fd = open(log_file, "a")
 32 | 	while not end_event.is_set():
 33 | 		try:
 34 | 			log_data = res_queue.get(True, 1)
 35 | 		except Queue.Empty:
 36 | 			continue
 37 | 
 38 | 		fd.write(log_data)
 39 | 		fd.flush()
 40 | 
 41 | 	fd.close()
 42 | 
 43 | def output_processor(output_queue, end_event, analyzer_out_func):
 44 | 	data = []
 45 | 	while not end_event.is_set():
 46 | 		try:
 47 | 			data.append(output_queue.get(True, 1))
 48 | 		except Queue.Empty:
 49 | 			continue
 50 | 	analyzer_out_func(data)
 51 | 
 52 | def runner(func, args, queue, res_queue, output_data):
 53 | 	try:
 54 | 		func(args, queue, res_queue, output_data)
 55 | 	except:
 56 | 		raise Exception("".join(traceback.format_exception(*sys.exc_info())))
 57 | 
 58 | def init_worker():
 59 | 	signal.signal(signal.SIGINT, signal.SIG_IGN)
 60 | 
 61 | def main():
 62 | 	parser = argparse.ArgumentParser(description='analyzer of APKs')
 63 | 	parser.add_argument("-i", "--in_dir", type=str,
 64 | 						help="directory of apk files to analyze", default=None)
 65 | 	parser.add_argument("-o", "--log_file", type=str,
 66 | 						help="log file to write to", default="OUTPUT.log")
 67 | 	parser.add_argument("-c", "--cores", type=int,
 68 | 						help="force a number of cores to use")
 69 | 	parser.add_argument("-a", "--analyzer", type=str,
 70 | 						help="Select the analyzer you want to use.", default="elf_files")
 71 | 	parser.add_argument("-l", "--list_analyzers", action="store_true",
 72 | 						help="List the possible analyzers")
 73 | 
 74 | 	args = parser.parse_args()
 75 | 
 76 | 	publics = (name for name in dir(analyzers) if not name.startswith('_'))
 77 | 
 78 | 	# dynamically get all analyzers in the directory
 79 | 	analyzer_funcs = {}
 80 | 	selected_output_func = None
 81 | 	selected_stream_func = None
 82 | 
 83 | 	for name in publics:
 84 | 		obj = getattr(analyzers, name)
 85 | 		if hasattr(obj, "analyze"):
 86 | 			analyzer_funcs[name] = obj
 87 | 
 88 | 	if args.list_analyzers:
 89 | 		print "Analyzers:"
 90 | 		for func_name, func in analyzer_funcs.iteritems():
 91 | 			print "  %s" % func_name
 92 | 		return
 93 | 
 94 | 	if not args.in_dir:
 95 | 		print "Please provide an input directory with -i"
 96 | 		return
 97 | 
 98 | 	selected_analyzer = None
 99 | 	for func_name, obj in analyzer_funcs.iteritems():
100 | 		if func_name == args.analyzer:
101 | 			selected_analyzer = obj
102 | 
103 | 			if hasattr(obj, "output_results"):
104 | 				selected_output_func = getattr(obj,"output_results")
105 | 			elif hasattr(obj, "stream_results"):
106 | 				selected_stream_func = getattr(obj,"stream_results")
107 | 			break
108 | 
109 | 	if not selected_analyzer:
110 | 		print "You selected a bad analyzer [%s]" % args.analyzer
111 | 		print "Analyzers:"
112 | 		for func_name, func in analyzer_funcs.iteritems():
113 | 			print "  %s" % func_name
114 | 
115 | 		return
116 | 
117 | 	if args.cores:
118 | 		cores = args.cores
119 | 	else:
120 | 		cores = mp.cpu_count()
121 | 
122 | 	print "Starting '%s' analyzer with %d cores, log file: %s" % (selected_analyzer.__name__, cores, args.log_file)
123 | 
124 | 	apk_files = get_files_in_dir(args.in_dir)
125 | 
126 | 	# Enable for debugging info.
127 | 	# mp.log_to_stderr(logging.DEBUG)
128 | 	manager = mp.Manager()
129 | 	pool = mp.Pool(cores + 2, init_worker)
130 | 
131 | 	apk_queue = manager.Queue()
132 | 	
133 | 	# for logging 
134 | 	res_queue = manager.Queue()
135 | 
136 | 	# for data output
137 | 	output_data = manager.Queue()
138 | 	end_event = manager.Event()
139 | 
140 | 	# if we have a small count of APK files, limit our worker count
141 | 	apk_count = len(apk_files)
142 | 	if apk_count < cores:
143 | 		cores = apk_count
144 | 
145 | 	for apk_file in apk_files:
146 | 		apk_queue.put(apk_file)
147 | 
148 | 	try:
149 | 		# TODO: make the runner handle multiple arg lists?
150 | 		log_result = pool.apply_async(logger_runner, (args.log_file, res_queue, end_event))
151 | 
152 | 		if selected_output_func:
153 | 			print "started output processor"
154 | 			output_res = pool.apply_async(output_processor, (output_data, end_event, selected_output_func))
155 | 		elif selected_stream_func:
156 | 			print "started streaming output processor"
157 | 			output_res = pool.apply_async(selected_stream_func, (output_data, end_event))
158 | 
159 | 		worker_results = []
160 | 		for i in xrange(0, cores):
161 | 			worker_results.append(pool.apply_async(runner, (selected_analyzer.analyze, args, apk_queue, res_queue, output_data)))
162 | 		pool.close()
163 | 
164 | 		while len(worker_results) > 0:
165 | 			for i, res in enumerate(worker_results):
166 | 				if res.ready():
167 | 					result = res.get()
168 | 					if not res.successful():
169 | 						print "one of the workers failed"
170 | 						worker_results = []
171 | 						break
172 | 					else:
173 | 						worker_results.pop(i)
174 | 
175 | 				time.sleep(1)
176 | 
177 | 		print "completed all work"
178 | 		end_event.set()
179 | 		pool.join()
180 | 
181 | 		# get the exception if the output func fails.
182 | 		if selected_output_func or selected_stream_func:
183 | 			output_res.get()
184 | 
185 | 		pool.terminate()
186 | 		pool.join()
187 | 
188 | 	except KeyboardInterrupt:
189 | 		print "Exiting!"
190 | 		pool.terminate()
191 | 		pool.join()
192 | 	
193 | if __name__ == '__main__':
194 | 	main()
195 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pyelftools
2 | -e git+https://github.com/androguard/androguard.git#egg=androguard
3 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | from setuptools import setup
 4 | 
 5 | def read_requirements(path):
 6 |     # pip.req is a private pip API (removed in pip 10), so parse the file directly
 7 |     reqs = []
 8 |     with open(path) as req_file:
 9 |         for line in req_file:
10 |             line = line.strip()
11 |             if line and not line.startswith("#"):
12 |                 # editable VCS lines ("-e git+...#egg=name") reduce to the egg name
13 |                 reqs.append(line.split("#egg=")[-1])
14 |     return reqs
15 | 
16 | requirements = read_requirements("requirements.txt")
17 | 
18 | setup(name              = 'apkminer',
19 |       description       = 'Parallel APK vulnerability analyzer',
20 |       author            = 'mothran',
21 |       py_modules        = ['apkminer'],
22 |       url               = 'http://github.com/mothran/apkminer',
23 |       install_requires  = requirements,
24 |       entry_points  = {
25 |           'console_scripts': ['apkminer = apkminer:main']
26 |           },
27 | )
28 | 


--------------------------------------------------------------------------------
/test/example_decomp.py:
--------------------------------------------------------------------------------
 1 | import re
 2 | import pprint
 3 | import argparse
 4 | 
 5 | 
 6 | from androguard.core.bytecodes import apk
 7 | from androguard.core.bytecodes import dvm
 8 | from androguard.core.analysis import analysis
 9 | 
10 | from androguard.decompiler.decompiler import *
11 | 
12 | 
13 | # allow for copy + pasting from the main codebase
14 | class Log():
15 | 	def log(self,data):
16 | 		print data
17 | 	def flush(self):
18 | 		pass
19 | 
20 | log = Log()
21 | 
22 | pp = pprint.PrettyPrinter(indent=4)
23 | 
24 | parser = argparse.ArgumentParser(description='analyzer of APKs')
25 | parser.add_argument("-i", "--in_apk", type=str,
26 | 					help="apk file to analyze", required=True)
27 | args = parser.parse_args()
28 | apk_file = file_path = args.in_apk
29 | 
30 | for one in xrange(0,1):
31 | 	log.log("starting analysis of %s" % apk_file)
32 | 	
33 | 	try:
34 | 		a = apk.APK(file_path)
35 | 	except Exception as err:
36 | 		log.log("ERROR parsing apk: %s\n" % err)
37 | 		log.flush()
38 | 		continue
39 | 
40 | 	PRIV_KEY_PAT = ".PRIVATE KEY-----"
41 | 	dex_files = []
42 | 	for file in a.get_files():
43 | 		file_data = a.get_file(file)
44 | 		if re.search(PRIV_KEY_PAT, file_data):
45 | 			log.log("  FOUND %s" % file)
46 | 			if file[-4:] == ".dex":
47 | 				dex_files.append(file)
48 | 
49 | 	for dex_file in dex_files:
50 | 		d = dvm.DalvikVMFormat(a.get_file(dex_file))
51 | 		dx = analysis.newVMAnalysis( d )
52 | 		d.set_vmanalysis( dx )
53 | 		dx.create_xref()
54 | 
55 | 		d.set_decompiler(DecompilerDAD(d, dx))
56 | 		d.set_vmanalysis(dx)
57 | 
58 | 		for str_val, ref_obj in dx.get_strings_analysis().iteritems():
59 | 			found_key = re.findall(PRIV_KEY_PAT, str_val)
60 | 
61 | 			for res in found_key:
62 | 				log.log("  %s" % str_val)
63 | 				for ref_class, ref_method in ref_obj.get_xref_from():
64 | 					log.log("    REF: %s->%s%s" % (ref_method.get_class_name(), 
65 | 												   ref_method.get_name(),
66 | 												   ref_method.get_descriptor()))
67 | 
68 | 
69 | 					current_class = d.get_class(ref_method.get_class_name())
70 | 					if current_class is not None:
71 | 						print current_class.get_source()
72 | 					else:
73 | 						print "ref'd class not found"
74 | 
75 | 					xmethod = dx.get_method_analysis_by_name(ref_method.get_class_name(),ref_method.get_name(), ref_method.get_descriptor())
76 | 
77 | 					for xref_class, xref_method, xoffset in xmethod.get_xref_from():
78 | 						log.log("    REF: %s->%s%s\n" % (xref_method.get_class_name(),
79 | 													   xref_method.get_name(),
80 | 													   xref_method.get_descriptor()))
81 | 
82 | 						current_class = d.get_class(xref_method.get_class_name())
83 | 						if current_class is not None:
84 | 							print current_class.get_source()
85 | 						else:
86 | 							print "ref'd class not found"
87 | 


--------------------------------------------------------------------------------
/test/test_parse.py:
--------------------------------------------------------------------------------
 1 | import re
 2 | import pprint
 3 | import argparse
 4 | 
 5 | 
 6 | from androguard.core.bytecodes import apk
 7 | from androguard.core.bytecodes import dvm
 8 | from androguard.core.analysis import analysis
 9 | 
10 | 
11 | # allow for copy + pasting from the main codebase
12 | class Log():
13 | 	def log(self,data):
14 | 		print data
15 | 	def flush(self):
16 | 		pass
17 | 
18 | log = Log()
19 | 
20 | pp = pprint.PrettyPrinter(indent=4)
21 | 
22 | parser = argparse.ArgumentParser(description='analyzer of APKs')
23 | parser.add_argument("-i", "--in_apk", type=str,
24 | 					help="apk file to analyze", required=True)
25 | args = parser.parse_args()
26 | apk_file = file_path = args.in_apk
27 | 
28 | for one in xrange(0,1):
29 | 	log.log("starting analysis of %s" % apk_file)
30 | 	
31 | 	a = apk.APK(file_path)
32 | 	d = dvm.DalvikVMFormat(a.get_dex())
33 | 
34 | 	log.log("completed base analysis")
35 | 
36 | 	for method in d.get_methods():
37 | 		print method
38 | 
39 | 
40 | 	dx = analysis.newVMAnalysis( d )
41 | 	d.set_vmanalysis( dx )
42 | 
43 | 	log.log("creating xrefs")
44 | 	dx.create_xref()
45 | 
46 | 	class_name = "Ljava/lang/Runtime;"
47 | 	func_name = "exec"
48 | 	func_proto = "(Ljava/lang/String;)V"
49 | 	# method = dx.get_method_by_name(class_name, func_name, func_proto)
50 | 
51 | 	break
52 | 
53 | 	# print dir()
54 | 	# for class_str, class_obj in dx.classes.iteritems():
55 | 	# 	print class_str
56 | 
57 | 	# print dir(method)
58 | 	# print method
59 | 
60 | 
61 | 
62 | 	for key, val in dx.get_strings_analysis().iteritems():
63 | 		if key == "logcat -d -f ":
64 | 			print "FOUND: %s " % key
65 | 
66 | 			for ref_class, ref_method in val.get_xref_from():
67 | 				print type(ref_method)
68 | 				pp.pprint(dir(ref_method))
69 | 				print type(ref_class)
70 | 				pp.pprint(dir(ref_class))
71 | 
72 | 				for classobj, class_set in ref_class.get_xref_to().iteritems():
73 | 					class_list = list(class_set)
74 | 					
75 | 					print classobj.orig_class
76 | 					for obj in class_list:
77 | 						print obj[1].get_name()
78 | 
79 | 
80 | 
81 | 
82 | 					# import pdb; pdb.set_trace()
83 | 
84 | 				# import pdb; pdb.set_trace()
85 | 				# print ref_method.class_name
86 | 				# print ref_method.proto
87 | 				# print ref_method.get_name()
88 | 
89 | 				# print dir(ref_method)
90 | 				# print dir(ref_class)
91 | 				# # ref_method.show()
92 | 
93 | 			break


--------------------------------------------------------------------------------
/test/test_silver.py:
--------------------------------------------------------------------------------
 1 | import re
 2 | import sys
 3 | import argparse
 4 | 
 5 | from androguard.core.bytecodes import apk
 6 | from androguard.core.bytecodes import dvm
 7 | from androguard.core.analysis import analysis
 8 | 
 9 | # allow for copy + pasting from the main codebase
10 | class Log():
11 | 	def log(self,data):
12 | 		print data
13 | 	def flush(self):
14 | 		pass
15 | 
16 | log = Log()
17 | 
18 | parser = argparse.ArgumentParser(description='analyzer of APKs')
19 | parser.add_argument("-i", "--in_apk", type=str,
20 | 					help="apk file to analyze", required=True)
21 | args = parser.parse_args()
22 | apk_file = file_path = args.in_apk
23 | 
24 | 
25 | for one in xrange(0,1):
26 | 	log.log("Checking: %s\n" % file_path)
27 | 	a = apk.APK(file_path)
28 | 
29 | 	record_perm = "android.permission.RECORD_AUDIO" in a.get_permissions()
30 | 
31 | 	if "com.silverpush.sdk.android.SPService" in a.get_services() or "com.silverpush.sdk.android.BR_CallState" in a.get_receivers():
32 | 		log.log("found silverpush, can record: %s" % str(record_perm))
33 | 		log.flush()
34 | 		# continue
35 | 
36 | 	# elif
37 | 	if record_perm:
38 | 		dex_files = list(a.get_all_dex())
39 | 		if not dex_files:
40 | 			log.log("no dex files")
41 | 			continue
42 | 
43 | 		for dex in dex_files:
44 | 			d = dvm.DalvikVMFormat(dex)
45 | 			for data in d.get_strings():
46 | 				print data.replace("\x00", "")
47 | 				# if re.search("\"silverpush\"", data, re.IGNORECASE):
48 | 				# 	print data
49 | 				# 	break


--------------------------------------------------------------------------------
/test/testbug.py:
--------------------------------------------------------------------------------
 1 | import re
 2 | import sys
 3 | import argparse
 4 | 
 5 | from androguard.core.bytecodes import apk
 6 | from androguard.core.bytecodes import dvm
 7 | from androguard.core.analysis import analysis
 8 | 
 9 | # allow for copy + pasting from the main codebase
10 | class Log():
11 | 	def log(self,data):
12 | 		print(data)
13 | 	def flush(self):
14 | 		pass
15 | 
16 | log = Log()
17 | 
18 | parser = argparse.ArgumentParser(description='analyzer of APKs')
19 | parser.add_argument("-i", "--in_apk", type=str,
20 | 					help="apk file to analyze", required=True)
21 | args = parser.parse_args()
22 | apk_file = file_path = args.in_apk
23 | 
24 | 
25 | def find_call(dx, class_name, func_name):
26 | 	for name, cur_class in dx.classes.items():
27 | 		for method in cur_class.get_methods():
28 | 			xref_to = method.get_xref_to()
29 | 
30 | 			for ref_class, ref_method, offset in xref_to:
31 | 				ref_class_name = ref_class.orig_class
32 | 				# orig_class may be an ExternalClass object rather than a name string; unwrap it before comparing
33 | 				if isinstance(ref_class_name, analysis.ExternalClass):
34 | 					ref_class_name = ref_class_name.name
35 | 
36 | 				ref_method_name = ref_method.get_name()
37 | 
38 | 				if ref_class_name == class_name and ref_method_name == func_name:
39 | 					log.log("%s.%s  calls  %s.%s()" % (name, method.method.name, class_name, func_name))
40 | 					log.log("")
41 | 
42 | def find_implements(dx, srch_class_name):
43 | 	for name, cur_class in dx.classes.items():
44 | 		class_name = cur_class.orig_class
45 | 		if type(class_name) == analysis.ExternalClass:
46 | 			class_name = class_name.name
47 | 
48 | 		# log.log("%s implements: %s" % (name, class_name))
49 | 		if class_name == srch_class_name:
50 | 			log.log("%s implements: %s" % (name, class_name))
51 | 
52 | def find_methods(dx, methods):
53 | 	for name, cur_class in dx.classes.items():
54 | 
55 | 		# ipdb> p cur_class.orig_class.get_interfaces()
56 | 		# ['Ljavax/net/ssl/X509TrustManager;']
57 | 
58 | 		# debugging breakpoint left disabled (ipdb is not in requirements.txt):
59 | 		# if name == "Lo/md;": import ipdb; ipdb.set_trace()
60 | 
61 | 		for method in cur_class.get_methods():
62 | 			for wanted in [m for m in methods if m in method.method.name]:
63 | 				log.log("%s implements: %s" % (name, wanted))
64 | 
65 | 
66 | for one in xrange(0,1):
67 | 	log.log("Checking: %s\n" % file_path)
68 | 
69 | 	try:
70 | 		a = apk.APK(file_path)
71 | 	except Exception as err:
72 | 		log.log("ERROR parsing apk: %s\n" % err)
73 | 		log.flush()
74 | 		continue
75 | 	log.log("Parsed APK")
76 | 
77 | 	d = dvm.DalvikVMFormat(a.get_dex())
78 | 	log.log("Parsed Dalvik")
79 | 
80 | 	dx = analysis.newVMAnalysis(d)
81 | 	d.set_vmanalysis(dx)
82 | 	dx.create_xref()
83 | 	log.log("Completed VM analysis")
84 | 	
85 | 
86 | 	# Check for WebView
87 | 	find_call(dx, "Landroid/webkit/WebView;", "addJavascriptInterface")
88 | 
89 | 	# Check for Runtime.exec()
90 | 	find_call(dx, "Ljava/lang/Runtime;", "exec")
91 | 
92 | 	# Check for pinning
93 | 	find_implements(dx, "Ljavax/net/ssl/X509TrustManager;")
94 | 
95 | 	find_methods(dx, ["checkServerTrusted"])
96 | 
97 | 	log.log("\n")
98 | 	log.flush()
99 | 


--------------------------------------------------------------------------------