├── README.md ├── LICENSE └── pyarascanner.py /README.md: -------------------------------------------------------------------------------- 1 | # PyaraScanner 2 | 3 | A multithreaded many-rules to many-files YARA scanner for incident response or malware zoos 4 | ## Prerequisites 5 | 6 | YARA installed and Python 3.0-3.5 with the Yara-Python package 7 | 8 | 9 | ``` 10 | pip install yara-python 11 | ``` 12 | 13 | Yara-Python requires Microsoft Visual C++ Build Tools available [here](http://landinghub.visualstudio.com/visual-cpp-build-tools) under 'Build Tools for Visual Studio 2017' 14 | and the Yara binaries, available [here](https://github.com/VirusTotal/yara/releases) or [here](https://www.dropbox.com/sh/umip8ndplytwzj1/AADdLRsrpJL1CM1vPVAxc5JZa?dl=0&lst=) 15 | 16 | Alternatively, you can download an easy installer which should download everything you need for your version of Python [here](https://www.dropbox.com/sh/umip8ndplytwzj1/AADdLRsrpJL1CM1vPVAxc5JZa?dl=0&lst=) (only supports up to Python 3.5) 17 | 18 | 19 | ## Running a scan 20 | 21 | To run with default settings, just specify a folder for .yar rules and a starting point for files to scan. 
All directories for both inputs are scanned recursively. 22 | 23 | ``` 24 | pyarascanner.py C:\Yara_Rules_Path C:\Scan_Directory 25 | ``` 26 | Full syntax: 27 | 28 | ``` 29 | pyarascanner.py [-h] [-e] [-a] [-l LOG] [-m MAXSIZE] [-c CORES] [-x EXISTING_RULES] rules_path scan_path 30 | 31 | ``` 32 | 33 | ### Optional Arguments 34 | 35 | * -h show this help message and exit 36 | * -e Show all errors 37 | * -a Show alerts only 38 | * -l LOG Output to specified log file 39 | * -m MAXSIZE Set maximum file size (MB) 40 | * -c CORES Number of cores to use (defaults to number on system if unspecified) 41 | * -x EXISTING_RULES If specified, look for .rules file in same path as 42 | script 43 | ### Known Problems 44 | 45 | * Problematic files can cause the multiprocessing to hang, since each worker must finish before the scan completes 46 | * Only scan results are logged, no script messages (including yara compiling) 47 | 48 | ## Built With 49 | 50 | * [Yara-Python](https://github.com/VirusTotal/yara-python) - The awesome python implementation of awesome YARA rules -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /pyarascanner.py: -------------------------------------------------------------------------------- 1 | # PyYaraScanner 2 | # https://github.com/nogoodconfig/pyarascanner 3 | 4 | import argparse 5 | import os 6 | import hashlib 7 | import time 8 | import multiprocessing 9 | from datetime import datetime 10 | # yara-python imported later 11 | 12 | # define default configuration 13 | config = {'alerts_only': False, 14 | 'errors': True, 15 | 'log': '', 16 | 'maxsize': 150, 17 | 'rules_path': '', 18 | 'scan_path': '', 19 | 'compiled_path': 'compiled_yara_rules.rules', 20 | 'cores': multiprocessing.cpu_count(), 21 | 'existing_rules': False} 22 | 23 | 24 | class MyError(Exception): 25 | # Basic class to catch all errors and still print error code 26 | def __init__(self, message): 27 | # Call the base class constructor with the parameters it needs 28 | Exception.__init__(self, message) 29 | self.message = message 30 | 31 | 32 | class Messenger: 33 | """ 34 | Print/logging class 35 | """ 36 | 37 | def __init__(self, 
log_file_path="yarascan_{0}.txt".format(datetime.now().strftime('%Y-%m-%d-%H-%M-%S'))): 38 | #Replaced with log_result function 39 | #self.log_file = open(log_file_path, 'w') 40 | pass 41 | 42 | 43 | @staticmethod 44 | 45 | def make_message(code, text): 46 | timestamp = datetime.now().strftime('%Y-%m-%d-%H:%M:%S') 47 | output = '{0}: {1} {2}'.format(timestamp, code, text) 48 | return output 49 | 50 | def output_message(self, message): 51 | print(message) 52 | return message 53 | 54 | def error(self, text, sub_code='GENERAL'): 55 | code = '[ERROR:{0}]'.format(sub_code) 56 | m = self.make_message(code, text) 57 | return self.output_message(m) 58 | 59 | def found(self, text, sub_code=''): 60 | if sub_code == '': 61 | code = '[FOUND]' 62 | else: 63 | code = '[FOUND:{0}]'.format(sub_code) 64 | m = self.make_message(code, text) 65 | return self.output_message(m) 66 | 67 | def info(self, text, sub_code=''): 68 | if sub_code == '': 69 | code = '[INFO]' 70 | else: 71 | code = '[INFO:{0}]'.format(sub_code) 72 | m = self.make_message(code, text) 73 | return self.output_message(m) 74 | 75 | 76 | # Make global messenger class 77 | MSG = Messenger() 78 | 79 | # Try import yara-python, fail if not present 80 | try: 81 | import yara 82 | except Exception as e: 83 | MSG.error("Yara-Python module error! 
Make sure you have 'yara-python' not 'yara'!") 84 | MSG.error(e) 85 | exit(1) 86 | 87 | 88 | def compile_rules(rules_folder, compiled_rules_path='compiled_yara_rules.rules'): 89 | """ 90 | # Reads files in folder of 'directory' 91 | # Finds YARA rules, tests them, compiles them, saves them to file 92 | :param rules_folder: full path to folder containing yara rules 93 | :param compiled_rules_path: relative or full path to save compiled rules to 94 | :return: None 95 | """ 96 | 97 | global MSG 98 | 99 | # Create list of rules in input directory 100 | yara_hashes = [] 101 | yara_filepaths = {} 102 | count_duplicates = 0 103 | MSG.info("Getting rules from {0}...".format(rules_folder)) 104 | for root, directories, file_names in os.walk(rules_folder): 105 | for filename in file_names: 106 | # Check for matching file extension 107 | if filename.endswith(".yar"): 108 | # Hash file then check for duplicates 109 | md5 = md5_hash(os.path.join(root, filename)) 110 | # Check for duplicates... 111 | if md5 in yara_hashes: 112 | count_duplicates += 1 113 | else: 114 | # Add to list of yara rule hashes 115 | yara_hashes.append(md5) 116 | # Add to dictionary of rule names, containing full path to each yara rule 117 | yara_filepaths[filename] = os.path.join(root, filename) 118 | continue 119 | else: 120 | continue 121 | if len(yara_hashes) == 0: 122 | MSG.error("No YARA rules found in directory {0}!".format(rules_folder)) 123 | exit(1) 124 | MSG.info("{0} YARA rules found...".format(len(yara_hashes) + count_duplicates)) 125 | MSG.info("{0} duplicate YARA rules identified and removed from the set...".format(count_duplicates)) 126 | MSG.info("{0} YARA rules prepared to compile...".format(len(yara_hashes))) 127 | 128 | # Compile .yar files into yara compiled objects, store in list, cleanly error bad files 129 | 130 | # First test each rule to see if it compiles. 
131 | rules_to_delete = [] 132 | MSG.info('Testing each yara rule') 133 | for rule, file_path in yara_filepaths.items(): 134 | try: 135 | yara.compile(filepath=file_path) 136 | except yara.SyntaxError as err: 137 | MSG.error('YARA syntax error: {0}'.format(err)) 138 | rules_to_delete.append(rule) 139 | 140 | # Discard those that won't compile 141 | for rule in rules_to_delete: 142 | del yara_filepaths[rule] 143 | 144 | MSG.info('{0} invalid rules deleted from list of rules to compile'.format(len(rules_to_delete))) 145 | 146 | MSG.info('Compiling {0} remaining rules'.format(len(yara_filepaths))) 147 | yara_rules = yara.compile(filepaths=yara_filepaths) 148 | """ 149 | # Old method for compiling lists... 150 | # This does allow larger number of yara rules to scan for 151 | # but can cause issues with multi-threading for now 152 | compile_success = 0 153 | compile_error = 0 154 | for rule in yara_filepaths: 155 | try: 156 | yara_compiled.append(yara.compile(filepath=str(yara_filepaths[rule]))) 157 | compile_success += 1 158 | except yara.SyntaxError as e: 159 | MSG.error('YARA syntax error: {0}'.format(e)) 160 | compile_error += 1 161 | continue 162 | if compile_error > 0: 163 | MSG.error(str(compile_error) + " YARA rules failed to compile...") 164 | MSG.info(str(compile_success) + " YARA rules compiled successfully...") 165 | """ 166 | # Finished compiling 167 | 168 | # Write to file 169 | yara_rules.save(compiled_rules_path) 170 | MSG.info('Compiled rules saved to {0}'.format(compiled_rules_path)) 171 | 172 | 173 | def md5_hash(file): 174 | # https://stackoverflow.com/questions/22058048/hashing-a-file-in-python 175 | buffer_size = 65536 176 | md5 = hashlib.md5() 177 | with open(file, 'rb') as f: 178 | while True: 179 | data = f.read(buffer_size) 180 | if not data: 181 | break 182 | md5.update(data) 183 | 184 | return "{0}".format(md5.hexdigest()) 185 | 186 | 187 | def parse_file(file_path, yara_rules): 188 | """ 189 | 190 | :param file_path: file path to be scanned 
for yara rule matches 191 | :param yara_rules: compiled yara rules object 192 | :return: message string reporting matches, no matches, or an error 193 | """ 194 | # Run yara rules across a file 195 | # print('parsing {0}'.format(file_path)) # For error checking, it's currently printing 'None' for quite a few 196 | 197 | # Don't need yara rules for parsing each file now, as it's passed as an arg 198 | # yara_rules = yara.load(compiled_rules_path) # For multi processing, want to make this global 199 | matches = [] 200 | message = "" 201 | try: 202 | matches = yara_rules.match(file_path) 203 | except yara.Error as err: 204 | message = MSG.error('{0} Yara.Error parsing this file: {1}'.format(file_path, err)) 205 | except Exception as err: 206 | message = MSG.error('{0}: Unknown error: {1}'.format(file_path, err)) 207 | # If any matches found, create one string containing all matches within file 208 | if len(matches) > 0: 209 | str_matches = '' 210 | count = 0 211 | # Run through them, compiling string 212 | for m in matches: 213 | str_matches += str(m) 214 | if count < len(matches) - 1: 215 | str_matches += ', ' 216 | count += 1 217 | message = MSG.found('{0}: {1} matches: {2}'.format(file_path, len(matches), str_matches)) 218 | elif not message: 219 | message = MSG.info('{0}: No matches'.format(file_path)) 220 | return message 221 | 222 | 223 | def split_list(input_list, num_sub_lists): 224 | """ 225 | 226 | :param input_list: List to be split 227 | :param num_sub_lists: Number of sub lists to be split into 228 | :return: list containing sub lists 229 | """ 230 | output_list = [] 231 | # First make empty sub lists, one for each process 232 | for n in range(num_sub_lists): 233 | output_list.append([]) 234 | # Now add file paths evenly to them 235 | count = 0 236 | for item in input_list: 237 | output_list[count % num_sub_lists].append(item) 238 | count += 1 239 | 240 | return output_list 241 | 242 | 243 | def worker(file_list): 244 | """ 245 | This is the function detailing what each worker (process) will do. 
246 | :param file_list: list of full file paths to process 247 | :return: list of results for each file 248 | """ 249 | 250 | import time 251 | 252 | global MSG # Specify global messenger 253 | global config # Specify global config 254 | 255 | # Load rules from global variable 256 | yara_rules = yara.load(config['compiled_path']) 257 | results = [] 258 | for path in file_list: 259 | MSG.info('Parsing {}'.format(path)) 260 | #parse_file(path, yara_rules) 261 | results.append(parse_file(path, yara_rules)) 262 | return results 263 | 264 | to_log = [] 265 | def log_result(result): 266 | #Writing directly to file from here causes broken lines, likely IO limitation 267 | if isinstance(result, list): 268 | for r1 in result: 269 | for r2 in r1: 270 | if r2 is not None: 271 | to_log.append(r2) 272 | elif isinstance(result, str): 273 | to_log.append(result) 274 | 275 | 276 | def main(conf): 277 | # Add and process arguments 278 | parser = argparse.ArgumentParser() 279 | parser.add_argument('rules_path', help="Directory containing .yar files to compile and search for") 280 | parser.add_argument('scan_path', help="Folder or drive letter to parse") 281 | parser.add_argument("-e", "--errors", help="Show all errors", action="store_true") 282 | parser.add_argument("-a", "--alerts", help="Show alerts only", action="store_true") 283 | parser.add_argument("-l", "--log", help="Output to specified log file") 284 | parser.add_argument("-m", "--maxsize", type=int, help="Set maximum file size (MB)") 285 | parser.add_argument("-c", "--cores", help="Number of cores to use (defaults to number on system if unspecified)") 286 | parser.add_argument("-x", "--existing_rules", help="if specified, look for .rules file in same path as script ", action="store_true") 287 | args = parser.parse_args() 288 | if args.errors: 289 | conf["errors"] = True 290 | if args.alerts: 291 | conf["alerts_only"] = True 292 | conf["errors"] = False 293 | if args.log: 294 | try: 295 | conf["log"] = open(args.log, 'w') 
296 | except OSError as err: 297 | MSG.error("Could not create log file '{0}'".format(args.log)) 298 | MSG.error("Python error: {}".format(err)) 299 | exit(1) 300 | if args.maxsize: 301 | conf["maxsize"] = args.maxsize 302 | if args.maxsize > 1024: 303 | MSG.info("Setting the maximum file size above 1GB is strongly discouraged!") 304 | if args.cores: 305 | try: 306 | conf["cores"] = int(args.cores) 307 | except ValueError as err: 308 | MSG.error("Number of cores specified must be integer") 309 | MSG.error(err) 310 | exit(1) 311 | 312 | # Check required arguments provided 313 | if (os.path.exists(args.rules_path)) and (os.path.exists(args.scan_path)): 314 | conf["rules_path"] = args.rules_path 315 | # Check to see if existing rules file should be used 316 | print(args.existing_rules) 317 | if args.existing_rules is True: 318 | 319 | # Look for 'compiled_yara_rules.rules' in working directory 320 | if os.path.isfile(conf["compiled_path"]) is True: 321 | MSG.info("Existing rules file found, using that") 322 | else: 323 | MSG.error("Existing rules file specified, but not found in working directory", sub_code="FILE") 324 | MSG.error( 325 | "Ensure {0} exists in same path as script, or remove '-x' switch".format(conf["compiled_path"])) 326 | exit(1) 327 | else: 328 | MSG.info("Compiling rules from {}".format(conf["rules_path"])) 329 | compile_rules(conf["rules_path"], conf["compiled_path"]) 330 | conf["scan_path"] = args.scan_path 331 | else: 332 | MSG.error("Could not read rules or scan path!") 333 | exit(1) 334 | 335 | # Build list of files to process, conduct pre-processing 336 | MSG.info("BUILDING FILE LIST FOR PARSING") 337 | list_files = [] 338 | for root, directories, file_names in os.walk(conf["scan_path"]): 339 | for name in file_names: 340 | path = os.path.join(root, name) 341 | # Check for file size 342 | try: 343 | mb = round(os.path.getsize(path) / 1024 / 1024) 344 | except OSError: 345 | MSG.error("Unable to read file {0}. Check permissions?".format(path)) 346 | continue 
347 | if mb > conf['maxsize']: 348 | MSG.error("{0} ({1}MB): File larger than maxsize ({2}MB)".format(path, mb, conf['maxsize'])) 349 | else: 350 | # parse_file(path, yara_compiled_path) # Use this for checking one at a time 351 | list_files.append(path) 352 | 353 | # Build process pool with specified number of workers 354 | pool = multiprocessing.Pool(processes=conf["cores"]) 355 | 356 | # Split list_files into sub lists for each sub process 357 | MSG.info("Splitting input file list into {0} sub lists for sub-processes".format(conf["cores"])) 358 | lists_for_cores = split_list(list_files, conf['cores']) 359 | 360 | # Pass the work to separate workers, one for each sub process 361 | MSG.info("BEGINNING MULTI-THREADED PARSING OF FILES") 362 | # Record the start time 363 | start_time = time.time() 364 | 365 | #results = pool.map(worker, lists_for_cores) 366 | r = pool.map_async(worker, lists_for_cores, callback=log_result) 367 | r.wait() 368 | pool.close() 369 | pool.join() 370 | 371 | # Record the end time 372 | end_time = time.time() 373 | MSG.info("{0} parsed in {1} seconds".format(len(list_files), end_time - start_time)) 374 | 375 | 376 | 377 | """ 378 | # Left out for now, trying it with Pools 379 | with concurrent.futures.ProcessPoolExecutor(max_workers=12) as executor: 380 | for file_path in executor.map(parse_file, list_files): 381 | MSG.info('Parsing {0}'.format(file_path)) # file_path only prints 'None' 382 | pass 383 | """ 384 | 385 | MSG.info("Finished") 386 | log_file_path = "yarascan_{0}.txt".format(datetime.now().strftime('%Y-%m-%d-%H-%M-%S')) 387 | with open(log_file_path, 'w') as log_file: 388 | if to_log: 389 | for line in to_log: 390 | log_file.write(str(line) + "\n") 391 | 392 | if __name__ == '__main__': 393 | main(config) 394 | exit(0) --------------------------------------------------------------------------------
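The round-robin distribution that `split_list` performs in pyarascanner.py (item number `count` goes to sub list `count % num_sub_lists`) can also be expressed with stepped slices. The sketch below is only an illustration of that idea: `split_round_robin` is a hypothetical name that does not exist in the repository, and the input validation is an added assumption rather than something the original script does.

```python
def split_round_robin(items, num_sub_lists):
    """Distribute items across num_sub_lists lists in round-robin order.

    Hypothetical helper, not part of pyarascanner.py. It aims to produce the
    same grouping as the script's split_list for the same arguments.
    """
    if num_sub_lists < 1:
        # Assumption: reject a worker count below one instead of failing later
        raise ValueError("num_sub_lists must be at least 1")
    items = list(items)
    # Sub list i receives items i, i + n, i + 2n, ...
    return [items[i::num_sub_lists] for i in range(num_sub_lists)]


if __name__ == "__main__":
    paths = ["a.exe", "b.dll", "c.doc", "d.pdf", "e.bin"]
    print(split_round_robin(paths, 2))
    # prints [['a.exe', 'c.doc', 'e.bin'], ['b.dll', 'd.pdf']]
```

With more workers than files, some sub lists are simply empty, which mirrors how the original loop behaves.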