├── .gitignore
├── LICENSE
├── README.md
├── data
    └── grepbugs.db
├── etc
    └── grepbugs.cfg
└── grepbugs.py


/.gitignore:
--------------------------------------------------------------------------------

# Created when GrepBugs runs
log
remotesrc
out

# This is uploaded from the master database with each run of GrepBugs
data/grepbugs.json

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.

We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and
modification follow.

GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License. (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code. (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    {description}
    Copyright (C) {year} {fullname}

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:

    Yoyodyne, Inc., hereby disclaims all copyright interest in the program
    `Gnomovision' (which makes passes at compilers) written by James Hacker.

    {signature of Ty Coon}, 1 April 1989
    Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
GrepBugs
========

A regex-based source code scanner.

## Usage
```
python grepbugs.py -d
python grepbugs.py -r github -a
python grepbugs.py -r github -a -f
```

The latest regular expressions will be pulled from https://www.grepbugs.com.
You can now sign in at https://grepbugs.com/login to contribute regex rules.

A basic HTML report will be generated in the out/ directory. A
tab-delimited file with a subset of the information is also created.
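
The tab-delimited file can be post-processed with a few lines of Python. The following is a minimal sketch, not part of GrepBugs: it assumes the file is written to `out/` next to the HTML report and keeps the default `.tabs.txt` extension from `etc/grepbugs.cfg`. The column layout is not documented here, so rows are returned as plain lists of strings. The helper names `find_tab_reports` and `read_tab_report` are hypothetical, and the sketch targets Python 3.

```python
import csv
import glob
import os


def find_tab_reports(outdir='out', ext='.tabs.txt'):
    """Return the tab-delimited report files found in outdir, sorted by name.

    Both defaults are assumptions: 'out' is the report directory named in
    this README and '.tabs.txt' is the default tabsext setting.
    """
    return sorted(glob.glob(os.path.join(outdir, '*' + ext)))


def read_tab_report(path):
    """Read one tab-delimited report into a list of rows (lists of strings)."""
    with open(path, newline='') as handle:
        return list(csv.reader(handle, delimiter='\t'))
```

Calling `read_tab_report(path)` on each path returned by `find_tab_reports()` gives the rows of every report from previous scans.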

Example reports: https://www.grepbugs.com/reports

### Offline Usage
If you need to run GrepBugs without an Internet connection, download the rules file from https://grepbugs.com/rules before going offline and save it to `GrepBugs/data/grepbugs.json`.

## Configuration
The `etc/grepbugs.cfg` file can be used to configure:
- Database for storing scan results (SQLite by default, or MySQL)
- Paths to the grep and cloc binaries
- URL the rules are downloaded from
- Temporary directory (must already exist)
- Extension of the tab-delimited output file

## Dependencies
- GNU grep (http://www.gnu.org/software/grep/)
  - On Debian run: `apt-get install grep`
  - On OSX, you will need to install GNU grep (see http://www.heystephenwood.com/2013/09/install-gnu-grep-on-mac-osx.html)
  - On Windows, download the installer package from http://gnuwin32.sourceforge.net/packages/grep.htm

- cloc (http://cloc.sourceforge.net/)
  - On Debian run: `apt-get install cloc`
  - On OSX run: `brew install cloc`
  - On Windows, download the binary from http://sourceforge.net/projects/cloc/files/cloc/v1.64/

- git (http://git-scm.com/)
  - Only required if you want to do repo scanning
  - On Debian run: `apt-get install git`
  - On OSX, configure Xcode command line tools

- svn (https://subversion.apache.org/)
  - Only required if you want to do repo scanning
  - On Debian run: `apt-get install subversion`
  - On OSX, configure Xcode command line tools

- MySQL support
  - On Debian run: `apt-get install python-mysqldb`; if this does not work, try one of these:
    - `apt-get install libmysqlclient-dev`
    - `pip install MySQL-python`

- requests (http://docs.python-requests.org/en/latest/)
  - On Debian run one of these two options:
    - `apt-get install python-requests`
    - `pip install requests`
  - On OSX run: `pip install requests`
  - On Windows,
    1. Download tarball: https://github.com/kennethreitz/requests/zipball/master
    2. Manually install requests via `python setup.py install`

## Using MySQL Database
Create a database and run the following `CREATE TABLE` statements.

```
CREATE TABLE `projects` (
  `project_id` varchar(36) NOT NULL,
  `repo` varchar(50) NOT NULL,
  `account` varchar(50) NOT NULL,
  `project` varchar(100) DEFAULT NULL,
  `default_branch` varchar(50) DEFAULT NULL,
  `last_scan` datetime DEFAULT NULL,
  PRIMARY KEY (`project_id`),
  KEY `idx_account` (`account`)
);

CREATE TABLE `results` (
  `result_id` varchar(36) NOT NULL,
  `scan_id` varchar(36) NOT NULL,
  `language` varchar(50) DEFAULT NULL,
  `regex_id` int(11) DEFAULT NULL,
  `regex_text` text,
  `description` text,
  PRIMARY KEY (`result_id`),
  KEY `idx_scan_id` (`scan_id`)
);

CREATE TABLE `results_detail` (
  `result_detail_id` varchar(36) NOT NULL,
  `result_id` varchar(36) NOT NULL,
  `file` text,
  `line` int(11) DEFAULT NULL,
  `code` text,
  PRIMARY KEY (`result_detail_id`),
  KEY `idx_result_id` (`result_id`)
);

CREATE TABLE `scans` (
  `scan_id` varchar(36) NOT NULL,
  `project_id` varchar(36) DEFAULT NULL,
  `date_time` datetime DEFAULT NULL,
  `cloc_out` text,
  PRIMARY KEY (`scan_id`),
  KEY `idx_project_id` (`project_id`)
);
```

## Using on Windows

The Windows instructions are beta (we've done it once!) and we welcome
suggestions from users. Install Python on Windows and make sure requests is
installed too. Install grep and cloc as needed, then set the full path to each
binary in the configuration file if the binaries are not on the PATH. We are
unsure whether these paths need a single \\ or a double one, or whether drive
letters can be specified. Set `tmpdir` to a directory that already exists.

Then, run grepbugs as normal. It should work correctly.
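
On the backslash question: Python's config parser does not treat a backslash as an escape character, so a value is returned exactly as written in the file. The sketch below illustrates this with Python 3's `configparser` (the script itself imports the Python 2 module `ConfigParser`, which reads values the same way). The install locations in the sample and the helper name `load_paths` are hypothetical. The sketch only shows what the parser returns, not whether the rest of the scan then works on Windows.

```python
import configparser

# Hypothetical Windows settings, written with single backslashes and
# drive letters, in the same layout as etc/grepbugs.cfg.
SAMPLE_CFG = r"""
[grep]
binary = C:\tools\grep\bin\grep.exe

[cloc]
binary = C:\tools\cloc\cloc.exe

[paths]
tmpdir = c:\temp
"""


def load_paths(cfg_text):
    """Parse config text and return the configured paths as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {
        'grep': parser.get('grep', 'binary'),
        'cloc': parser.get('cloc', 'binary'),
        'tmpdir': parser.get('paths', 'tmpdir'),
    }
```

Under these assumptions, `load_paths(SAMPLE_CFG)['tmpdir']` is the seven-character string `c:\temp`. The `\t` is not turned into a tab, so single backslashes and drive letters come through the parser unchanged.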

--------------------------------------------------------------------------------
/data/grepbugs.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/foospidy/GrepBugs/3694cc60f1c28a2138be8d051011e6e4c77e8ed8/data/grepbugs.db
--------------------------------------------------------------------------------
/etc/grepbugs.cfg:
--------------------------------------------------------------------------------
# grepbugs config file

[database]
# by default grepbugs will store results in sqlite.
# to store results in MySQL change database to mysql
# and set the remaining parameters accordingly.
database = sqlite3
host = localhost
dbname =
dbuname =
dbpword =

[grep]
binary = grep

[cloc]
binary = cloc

[rules]
url = https://grepbugs.com/rules

[paths]
# Some people like /var/tmp or even c:\temp
# Directory must already exist
tmpdir = /tmp

[output]
# Extension on the tab delimited file, you must specify the "."
# When using csv, Excel sometimes loads it wrong
tabsext = .tabs.txt

--------------------------------------------------------------------------------
/grepbugs.py:
--------------------------------------------------------------------------------
#!/usr/bin/python
# GrepBugs Copyright (c) 2014-2017 GrepBugs.com
#
# GrepBugs is licensed under GPL v2.0 or later; please see the main
# LICENSE file in the installation folder for more information.
#

import os
import sys
import shutil
import argparse
import uuid
import requests
import json
import datetime
import sqlite3 as lite
from subprocess import call
import subprocess
import cgi
import time
import logging
import ConfigParser

cfgfile = os.path.dirname(os.path.abspath(__file__)) + '/etc/grepbugs.cfg'
dbfile = os.path.dirname(os.path.abspath(__file__)) + '/data/grepbugs.db'
gbfile = os.path.dirname(os.path.abspath(__file__)) + '/data/grepbugs.json'
logfile = os.path.dirname(os.path.abspath(__file__)) + '/log/grepbugs.log'

# get configuration
gbconfig = ConfigParser.ConfigParser()
gbconfig.read(cfgfile)

# determine which binary executables to use
grepbin = gbconfig.get('grep', 'binary')
clocbin = gbconfig.get('cloc', 'binary')

tmpdir = gbconfig.get('paths', 'tmpdir')
tabsext = gbconfig.get('output', 'tabsext')


# BSD and OS X grep do not support -P; change this to path to GNU grep; e.g. /usr/local/bin/grep, ggrep, etc.
# http://www.heystephenwood.com/2013/09/install-gnu-grep-on-mac-osx.html
if 'darwin' == sys.platform:
    # look for a Homebrew-installed GNU grep (ggrep) and prefer it over the configured binary
    for root, dirnames, filenames in os.walk('/usr/local/Cellar/grep'):
        for filename in filenames:
            if 'ggrep' == filename:
                grepbin = os.path.join(root, filename)

# print "Debug: grepbin = " + grepbin # uncomment to debug your grep path

# setup logging; create directory if it doesn't exist, and configure logging
if not os.path.exists(os.path.dirname(logfile)):
    os.makedirs(os.path.dirname(logfile))

logging.basicConfig(filename=logfile, level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s', datefmt='%Y-%m-%d %H:%M:%S')

def local_scan(srcdir, repo='none', account='local_scan', project='none', default_branch='none', no_reports=False):
    """
    Perform a scan of local files
    """
    # new scan so new scan_id
    scan_id = str(uuid.uuid1())
    # write the cloc sql script to the configured tmpdir rather than a hard-coded /tmp
    clocsql = os.path.join(tmpdir, 'gb.cloc.' + scan_id + '.sql')
    basedir = os.path.dirname(os.path.abspath(__file__)) + '/' + srcdir.rstrip('/')
    logging.info('Using grep binary ' + grepbin)
    logging.info('Starting local scan with scan id ' + scan_id)

    # get db connection
    if 'mysql' == gbconfig.get('database', 'database'):
        try:
            import MySQLdb
            mysqldb = MySQLdb.connect(host=gbconfig.get('database', 'host'), user=gbconfig.get('database', 'dbuname'), passwd=gbconfig.get('database', 'dbpword'), db=gbconfig.get('database', 'dbname'))
            mysqlcur = mysqldb.cursor()
        except Exception as e:
            print 'Error connecting to MySQL! See log file for details.'
            logging.debug('Error connecting to MySQL: ' + str(e))
            sys.exit(1)

    # the sqlite db is always opened; it holds the cloc output even when results go to MySQL
    try:
        db = lite.connect(dbfile)
        cur = db.cursor()

    except lite.Error as e:
        print 'Error connecting to db file! See log file for details.'
        logging.debug('Error connecting to db file: ' + str(e))
        sys.exit(1)
    except Exception as e:
        print 'CRITICAL: Unhandled exception occurred! Quitters gonna quit! See log file for details.'
        logging.critical('Unhandled exception: ' + str(e))
        sys.exit(1)

    if args.u == True:
        print 'Scanning with existing rules set'
        logging.info('Scanning with existing rules set')
    else:
        # get latest greps
        download_rules()

    # prep db for capturing scan results
    try:
        # clean database
        cur.execute("DROP TABLE IF EXISTS metadata;")
        cur.execute("DROP TABLE IF EXISTS t;")
        cur.execute("VACUUM")

        # update database with new project info
        if 'none' == project:
            project = srcdir

        # query database
        params = [repo, account, project]
        if 'mysql' == gbconfig.get('database', 'database'):
            mysqlcur.execute("SELECT project_id FROM projects WHERE repo=%s AND account=%s AND project=%s LIMIT 1;", params)
            rows = mysqlcur.fetchall()
        else:
            cur.execute("SELECT project_id FROM projects WHERE repo=? AND account=? AND project=? LIMIT 1;", params)
            rows = cur.fetchall()

        # assume new project by default
        newproject = True

        for row in rows:
            # not so fast, not a new project
            newproject = False
            project_id = row[0]

        if True == newproject:
            project_id = str(uuid.uuid1())
            params = [project_id, repo, account, project, default_branch]
            if 'mysql' == gbconfig.get('database', 'database'):
                mysqlcur.execute("INSERT INTO projects (project_id, repo, account, project, default_branch) VALUES (%s, %s, %s, %s, %s);", params)
            else:
                cur.execute("INSERT INTO projects (project_id, repo, account, project, default_branch) VALUES (?, ?, ?, ?, ?);", params)

        # update database with new scan info
        params = [scan_id, project_id]
        if 'mysql' == gbconfig.get('database', 'database'):
            mysqlcur.execute("INSERT INTO scans (scan_id, project_id) VALUES (%s, %s);", params)
            mysqldb.commit()
        else:
            cur.execute("INSERT INTO scans (scan_id, project_id) VALUES (?, ?);", params)
            db.commit()

    except Exception as e:
        print 'CRITICAL: Unhandled exception occurred! Quitters gonna quit! See log file for details.'
        logging.critical('Unhandled exception: ' + str(e))
        sys.exit(1)

    # execute cloc to get sql output
    try:
        print 'counting source files...'
        logging.info('Running cloc for sql output.')
        # use the configured cloc binary instead of a hard-coded "cloc"
        return_code = call([clocbin, "--skip-uniqueness", "--quiet", "--sql=" + clocsql, "--sql-project=" + srcdir, srcdir])
        if 0 != return_code:
            logging.debug('WARNING: cloc did not run normally. return code: ' + str(return_code))

        # run sql script generated by cloc to save output to database
        f = open(clocsql, 'r')
        cur.executescript(f.read())
        db.commit()
        f.close()
        os.remove(clocsql)

    except Exception as e:
        print 'Error executing cloc sql! Aborting scan! See log file for details.'
        logging.debug('Error executing cloc sql (scan aborted). It is possible there were no results from running cloc.: ' + str(e))
        return scan_id

    # query cloc results
    cur.execute("SELECT Language, count(File), SUM(nBlank), SUM(nComment), SUM(nCode) FROM t GROUP BY Language ORDER BY Language;")

    rows = cur.fetchall()
    # build a cloc-style summary table; the header uses the same column widths as the rows
    cloctxt = '-------------------------------------------------------------------------------' + "\n"
    cloctxt += '{0:20} {1:>12} {2:>13} {3:>14} {4:>14}'.format('Language', 'files', 'blank', 'comment', 'code') + "\n"
    cloctxt += '-------------------------------------------------------------------------------' + "\n"

    sum_files = 0
    sum_blank = 0
    sum_comment = 0
    sum_code = 0

    for row in rows:
        cloctxt += '{0:20} {1:>12} {2:>13} {3:>14} {4:>14}'.format(str(row[0]), str(row[1]), str(row[2]), str(row[3]), str(row[4])) + "\n"
        sum_files += row[1]
        sum_blank += row[2]
        sum_comment += row[3]
        sum_code += row[4]

    cloctxt += '-------------------------------------------------------------------------------' + "\n"
    cloctxt += '{0:20} {1:>12} {2:>13} {3:>14} {4:>14}'.format('Sum', str(sum_files), str(sum_blank), str(sum_comment), str(sum_code)) + "\n"
    cloctxt += '-------------------------------------------------------------------------------' + "\n"

    # save the cloc summary text with the scan record
    try:
        params = [cloctxt, scan_id]
        if 'mysql' == gbconfig.get('database', 'database'):
            mysqlcur.execute("UPDATE scans SET date_time=NOW(), cloc_out=%s WHERE scan_id=%s;", params)
            mysqldb.commit()
        else:
            cur.execute("UPDATE scans SET cloc_out=? WHERE scan_id=?;", params)
            db.commit()

    except Exception as e:
        print 'Error saving cloc txt! Aborting scan! See log file for details.'
        logging.debug('Error saving cloc txt (scan aborted): ' + str(e))
        return scan_id

    # load json data
    try:
        logging.info('Reading grep rules from json file.')
        json_file = open(gbfile, "r")
        greps = json.load(json_file)
        json_file.close()
    except Exception as e:
        print 'CRITICAL: Unhandled exception occurred! Quitters gonna quit! See log file for details.'
        logging.critical('Unhandled exception: ' + str(e))
        sys.exit(1)

    # query database
    cur.execute("SELECT DISTINCT Language FROM t ORDER BY Language;")
    rows = cur.fetchall()

    # grep all the bugs and output to file
    print 'grepping for bugs...'
    logging.info('Start grepping for bugs.')

    # get cloc extensions and create extension array
    # each line of `cloc --show-ext` output has the form "ext -> Language"
    clocext = ''
    proc = subprocess.Popen([clocbin, "--show-ext"], stdout=subprocess.PIPE)
    ext = proc.communicate()
    extarray = str(ext[0]).split("\n")

    # override some extensions
    extarray.append('inc -> PHP')

    # loop through languages identified by cloc
    for row in rows:
        count = 0
        # loop through all grep rules for each language identified by cloc
        for i in range(0, len(greps)):
            # if the language matches a language in the gb rules file then do stuff
            if row[0] == greps[i]['language']:

                # get all applicable extensions based on language
                extensions = []
                for ii in range(0, len(extarray)):
                    lang = str(extarray[ii]).split("->")
                    if len(lang) > 1:
                        if str(lang[1]).strip() == greps[i]['language']:
                            extensions.append(str(lang[0]).strip())

                # search with regex, filter by extensions, and capture result
                result = ''
                filter = []

                # build filter by extension
                for e in extensions:
                    filter.append('--include=*.'
+ e) 259 | 260 | try: 261 | proc = subprocess.Popen([grepbin, "-n", "-r", "-P"] + filter + [greps[i]['regex'], srcdir], stdout=subprocess.PIPE) 262 | result = proc.communicate() 263 | 264 | if len(result[0]): 265 | # update database with new results info 266 | result_id = str(uuid.uuid1()) 267 | params = [result_id, scan_id, greps[i]['language'], greps[i]['id'], greps[i]['regex'], greps[i]['description']] 268 | if 'mysql' == gbconfig.get('database', 'database'): 269 | mysqlcur.execute("INSERT INTO results (result_id, scan_id, language, regex_id, regex_text, description) VALUES (%s, %s, %s, %s, %s, %s);", params) 270 | mysqldb.commit() 271 | else: 272 | cur.execute("INSERT INTO results (result_id, scan_id, language, regex_id, regex_text, description) VALUES (?, ?, ?, ?, ?, ?);", params) 273 | db.commit() 274 | 275 | perline = str(result[0]).split("\n") 276 | for r in range(0, len(perline) - 1): 277 | try: 278 | rr = str(perline[r]).replace(basedir, '').split(':', 1) 279 | # update database with new results_detail info 280 | result_detail_id = str(uuid.uuid1()) 281 | code = str(rr[1]).split(':', 1) 282 | params = [result_detail_id, result_id, rr[0], code[0], str(code[1]).strip()] 283 | 284 | if 'mysql' == gbconfig.get('database', 'database'): 285 | mysqlcur.execute("INSERT INTO results_detail (result_detail_id, result_id, file, line, code) VALUES (%s, %s, %s, %s, %s);", params) 286 | mysqldb.commit() 287 | else: 288 | cur.execute("INSERT INTO results_detail (result_detail_id, result_id, file, line, code) VALUES (?, ?, ?, ?, ?);", params) 289 | db.commit() 290 | 291 | except lite.Error, e: 292 | print 'SQL error! See log file for details.' 293 | logging.debug('SQL error with params ' + str(params) + ' and error ' + str(e)) 294 | except Exception as e: 295 | print 'Error parsing result! See log file for details.' 296 | logging.debug('Error parsing result: ' + str(e)) 297 | 298 | except Exception as e: 299 | print 'Error calling grep! 
See log file for details' 300 | logging.debug('Error calling grep: ' + str(e)) 301 | 302 | params = [project_id] 303 | if 'mysql' == gbconfig.get('database', 'database'): 304 | mysqlcur.execute("UPDATE projects SET last_scan=NOW() WHERE project_id=%s;", params) 305 | mysqldb.commit() 306 | mysqldb.close() 307 | else: 308 | cur.execute("UPDATE projects SET last_scan=datetime('now') WHERE project_id=?;", params) 309 | db.commit() 310 | db.close() 311 | 312 | if not no_reports: 313 | html_report(scan_id) 314 | 315 | return scan_id 316 | 317 | def repo_scan(repo, account, force, no_reports): 318 | """ 319 | Check out code from a remote repo and scan it 320 | """ 321 | try: 322 | db = lite.connect(dbfile) 323 | cur = db.cursor() 324 | 325 | except lite.Error as e: 326 | print 'Error connecting to db file' 327 | logging.debug('Error connecting to db file: ' + str(e)) 328 | sys.exit(1) 329 | 330 | params = [repo] 331 | cur.execute("SELECT command, checkout_url, api_url FROM repo_sites WHERE site=? 
LIMIT 1;", params) 332 | rows = cur.fetchall() 333 | 334 | for row in rows: 335 | api_url = row[2].replace('ACCOUNT', account) 336 | 337 | if 'github' == repo: 338 | page = 1 339 | 340 | # call api_url 341 | # if request fails, try 3 times 342 | count = 0 343 | max_tries = 3 344 | logging.info('Calling github api for ' + api_url) 345 | while count < max_tries: 346 | try: 347 | r = requests.get(api_url + '?page=' + str(page) + '&per_page=100') 348 | 349 | if 200 != r.status_code: 350 | raise ValueError('Request failed!', r.status_code) 351 | 352 | data = r.json() 353 | 354 | # no exceptions so break out of while loop 355 | break 356 | 357 | except ValueError as e: 358 | count = count + 1 359 | logging.debug(str(e.args)) 360 | time.sleep(5) 361 | 362 | except requests.ConnectionError as e: 363 | count = count + 1 364 | if count <= max_tries: 365 | logging.warning('Error retrieving data from github api: ConnectionError (attempt ' + str(count) + ' of ' + str(max_tries) + '): ' + str(e)) 366 | time.sleep(3) # take a break, throttle a bit 367 | 368 | except requests.HTTPError as e: 369 | count = count + 1 370 | if count <= max_tries: 371 | logging.warning('Error retrieving data from github api: HTTPError (attempt ' + str(count) + ' of ' + str(max_tries) + '): ' + str(e)) 372 | time.sleep(3) # take a break, throttle a bit 373 | 374 | except requests.Timeout as e: 375 | count = count + 1 376 | if count <= max_tries: 377 | logging.warning('Error retrieving data from github api: Timeout (attempt ' + str(count) + ' of ' + str(max_tries) + '): ' + str(e)) 378 | time.sleep(3) # take a break, throttle a bit 379 | 380 | except Exception as e: 381 | print 'CRITICAL: Unhandled exception occurred! Quitters gonna quit! See log file for details.' 382 | logging.critical('Unhandled exception: ' + str(e)) 383 | sys.exit(1) 384 | 385 | if count == max_tries: 386 | # all attempts failed, so no project list was retrieved from the github api; give up. 387 | logging.critical('Error retrieving data from github api (no more tries left. 
could be using old grep rules.): ' + str(e)) 388 | sys.exit(1) 389 | 390 | while len(data): 391 | print 'Get page: ' + str(page) 392 | for i in range(0, len(data)): 393 | do_scan = True 394 | project_name = data[i]["name"] 395 | default_branch = data[i]["default_branch"] 396 | last_scanned = last_scan(repo, account, project_name) 397 | last_changed = datetime.datetime.strptime(data[i]['pushed_at'], "%Y-%m-%dT%H:%M:%SZ") 398 | checkout_url = 'https://github.com/' + account + '/' + project_name + '.git' 399 | cmd = 'git' 400 | 401 | print project_name + ' last changed on ' + str(last_changed) + ' and last scanned on ' + str(last_scanned) 402 | 403 | if None != last_scanned: 404 | if last_changed < last_scanned: 405 | do_scan = False 406 | time.sleep(1) # throttle requests; github could be temperamental 407 | 408 | if True == force: 409 | do_scan = True 410 | 411 | if True == do_scan: 412 | checkout_code(cmd, checkout_url, account, project_name) 413 | # scan local files 414 | local_scan(os.path.dirname(os.path.abspath(__file__)) + '/remotesrc/' + account + '/' + project_name, repo, account, project_name, default_branch, no_reports) 415 | # clean up because of big projects and stuff 416 | call(['rm', '-rf', os.path.dirname(os.path.abspath(__file__)) + '/remotesrc/' + account + '/' + project_name]) 417 | 418 | # get next page of projects 419 | page += 1 420 | r = requests.get(api_url + '?page=' + str(page) + '&per_page=100') 421 | data = r.json() 422 | 423 | elif 'bitbucket' == repo: 424 | # call api_url 425 | r = requests.get(api_url) 426 | data = r.json() 427 | 428 | for j in range(0, len(data["values"])): 429 | value = data["values"][j] 430 | 431 | if 'git' == value['scm']: 432 | do_scan = True 433 | project_name = str(value['full_name']).split('/')[1] 434 | last_scanned = last_scan(repo, account, project_name) 435 | date_split = str(value['updated_on']).split('.')[0] 436 | last_changed = datetime.datetime.strptime(date_split, "%Y-%m-%dT%H:%M:%S") 437 | checkout_url 
= 'https://bitbucket.org/' + value['full_name'] 438 | cmd = 'git' 439 | 440 | print project_name + ' last changed on ' + str(last_changed) + ' and last scanned on ' + str(last_scanned) 441 | 442 | if None != last_scanned: 443 | if last_changed < last_scanned: 444 | do_scan = False 445 | 446 | if True == do_scan: 447 | checkout_code(cmd, checkout_url, account, project_name) 448 | # scan local files 449 | local_scan(os.path.dirname(os.path.abspath(__file__)) + '/remotesrc/' + account + '/' + project_name, repo, account, project_name, 'none', no_reports) 450 | 451 | elif 'sourceforge' == repo: 452 | message = 'Support for sourceforge removed because of http://seclists.org/nmap-dev/2015/q2/194. You should move your project to another hosting site, such as GitHub or BitBucket.' 453 | logging.debug(message) 454 | print message 455 | """ 456 | # call api_url 457 | r = requests.get(api_url) 458 | data = r.json() 459 | 460 | for i in data['projects']: 461 | do_scan = True 462 | project_name = i["url"].replace('/p/', '').replace('/', '') 463 | cmd = None 464 | r = requests.get('https://sourceforge.net/rest' + i['url']) 465 | project_json = r.json() 466 | for j in project_json: 467 | for t in project_json['tools']: 468 | if 'code' == t['mount_point']: 469 | if 'git' == t['name']: 470 | cmd = 'git' 471 | checkout_url = 'git://git.code.sf.net/p/' + str(project_name).lower() + '/code' 472 | elif 'svn' == t['name']: 473 | cmd = 'svn' 474 | checkout_url = 'svn://svn.code.sf.net/p/' + str(project_name).lower() + '/code' 475 | 476 | last_scanned = last_scan(repo, account, project_name) 477 | date_split = i['last_updated'].split('.')[0] 478 | last_changed = datetime.datetime.strptime(date_split, "%Y-%m-%d %H:%M:%S") 479 | 480 | print project_name + ' last changed on ' + str(last_changed) + ' and last scanned on ' + str(last_scanned) 481 | 482 | if None != last_scanned: 483 | if last_changed < last_scanned: 484 | do_scan = False 485 | 486 | if True == do_scan: 487 | if None != cmd: 
488 | checkout_code(cmd, checkout_url, account, project_name) 489 | # scan local files 490 | local_scan(os.path.dirname(os.path.abspath(__file__)) + '/remotesrc/' + account + '/' + project_name, repo, account, project_name) 491 | else: 492 | print 'No sourceforge repo for ' + account + ' ' + project_name 493 | """ 494 | 495 | db.close() 496 | # clean up: remotesrc lives next to this script, so build the path from the script's directory 497 | try: 498 | shutil.rmtree(os.path.dirname(os.path.abspath(__file__)) + '/remotesrc/' + account) 499 | except Exception as e: 500 | logging.debug('Error removing directory: ' + str(e)) 501 | 502 | print 'SCAN COMPLETE!' 503 | 504 | def download_rules(): 505 | url = gbconfig.get('rules', 'url') 506 | 507 | logging.info('Retrieving rules from ' + url) 508 | print 'attempting to retrieve rules...' 509 | 510 | try: 511 | # if request fails, try 3 times 512 | count = 0 513 | max_tries = 3 514 | while count < max_tries: 515 | try: 516 | headers = {'User-agent': 'GrepBugs for Python (1.0)'} 517 | r = requests.get(url, headers=headers) 518 | 519 | with open(gbfile, 'wb') as jsonfile: 520 | jsonfile.write(r.content) # raw bytes, since the file is opened in binary mode 521 | 522 | print 'got rules!' 523 | 524 | # no exceptions so break out of while loop 525 | break 526 | except requests.ConnectionError as e: 527 | count = count + 1 528 | if count <= max_tries: 529 | logging.warning('Error retrieving grep rules: ConnectionError (attempt ' + str(count) + ' of ' + str(max_tries) + '): ' + str(e)) 530 | time.sleep(3) 531 | 532 | except requests.HTTPError as e: 533 | count = count + 1 534 | if count <= max_tries: 535 | logging.warning('Error retrieving grep rules: HTTPError (attempt ' + str(count) + ' of ' + str(max_tries) + '): ' + str(e)) 536 | time.sleep(3) 537 | 538 | except requests.Timeout as e: 539 | count = count + 1 540 | if count <= max_tries: 541 | logging.warning('Error retrieving grep rules: Timeout (attempt ' + str(count) + ' of ' + str(max_tries) + '): ' + str(e)) 542 | time.sleep(3) 543 | 544 | except Exception as e: 545 | print 'CRITICAL: Unhandled exception occurred! 
Quitters gonna quit! See log file for details.' 546 | logging.critical('Unhandled exception: ' + str(e)) 547 | sys.exit(1) 548 | 549 | if count == max_tries: 550 | # method of last resort! 551 | print 'attempting download method of last resort...' 552 | proc = subprocess.Popen(["which", "wget"], stdout=subprocess.PIPE) 553 | out = proc.communicate() 554 | wget = str(out[0]).split("\n") 555 | 556 | if '' != wget[0].strip(): 557 | proc = subprocess.Popen([wget[0], "-O", gbfile, url], stdout=subprocess.PIPE) 558 | out = proc.communicate() 559 | else: 560 | # grep rules were not retrieved, could be working with old rules. 561 | logging.debug('Error retrieving grep rules (no more tries left. could be using old grep rules.): ' + str(e)) 562 | 563 | except Exception as e: 564 | print 'CRITICAL: Unhandled exception occurred! Quitters gonna quit! See log file for details.' 565 | logging.critical('Unhandled exception: ' + str(e)) 566 | sys.exit(1) 567 | 568 | def checkout_code(cmd, checkout_url, account, project): 569 | account_folder = os.path.dirname(os.path.abspath(__file__)) + '/remotesrc/' + account 570 | 571 | if not os.path.exists(account_folder): 572 | os.makedirs(account_folder) 573 | 574 | # checkout code 575 | call(['rm', '-rf', account_folder + '/' + project]) 576 | if 'git' == cmd: 577 | # in cases where auth is required, inject credentials into checkout_url. 578 | # cloning a public repo does not require auth, so injecting credentials has no impact. 579 | # however, if an account is locked (e.g. github.com locks an account for copyright violations) 580 | # the clone command will prompt for credentials. The default credentials are intended 581 | # to fail auth in this scenario. 582 | split_checkout_url = checkout_url.split('://') 583 | 584 | # repos with a large history can take a long time to clone due to the "receiving objects/resolving deltas" 585 | # phase. To help reduce the time it takes to complete cloning, the argument --depth=1 will be used. 
This 586 | # should become a command-line argument to grepbugs.py in the future. 587 | 588 | print 'git clone...' 589 | call(['git', 'clone', '--depth=1', split_checkout_url[0] + '://' + args.repo_user + ':' + args.repo_pass + '@' + split_checkout_url[1], account_folder + '/' + project]) 590 | elif 'svn' == cmd: 591 | # need to do a lot of craziness for svn, no wonder people use git now. 592 | print 'svn checkout...' 593 | found_trunk = False 594 | 595 | call(['svn', '-q', 'checkout', '--depth', 'immediates', checkout_url, account_folder + '/tmp/' + project]) 596 | 597 | # look for first level trunks 598 | for path, dirs, files in os.walk(os.path.abspath(account_folder + '/tmp/' + project)): 599 | for i in range(0, len(dirs)): 600 | if 'trunk' == dirs[i]: 601 | if os.path.isdir(path + '/' + dirs[i]): 602 | found_trunk = True 603 | print 'co ' + checkout_url + '/' + dirs[i] 604 | call(['svn', '-q', 'checkout', checkout_url + '/' + dirs[i], account_folder + '/' + project]) 605 | 606 | if False == found_trunk: 607 | # try looking for trunk in second level 608 | path = os.path.abspath(account_folder + '/tmp/' + project) 609 | for n in os.listdir(path): 610 | if os.path.isdir(path + '/' + n): 611 | if '.svn' != n: 612 | print 'co ' + checkout_url + '/' + n + '/trunk' 613 | return_code = call(['svn', '-q', 'checkout', checkout_url + '/' + n + '/trunk', account_folder + '/' + project]) 614 | if 0 == return_code: 615 | found_trunk = True 616 | 617 | if False == found_trunk: 618 | # didn't find a trunk, so checkout of last resort 619 | print 'WARNING: no trunk found so checking out everything. This could take a while and consume disk space if there are many branches.' 
620 | call(['svn', '-q', 'checkout', checkout_url, account_folder + '/' + project]) 621 | 622 | # remove temp checkout 623 | call(['rm', '-rf', os.path.abspath(account_folder + '/tmp/')]) 624 | 625 | def last_scan(repo, account, project): 626 | if 'mysql' == gbconfig.get('database', 'database'): 627 | try: 628 | import MySQLdb 629 | mysqldb = MySQLdb.connect(host=gbconfig.get('database', 'host'), user=gbconfig.get('database', 'dbuname'), passwd=gbconfig.get('database', 'dbpword'), db=gbconfig.get('database', 'dbname')) 630 | mysqlcur = mysqldb.cursor() 631 | except Exception as e: 632 | print 'Error connecting to MySQL! See log file for details.' 633 | logging.debug('Error connecting to MySQL: ' + str(e)) 634 | sys.exit(1) 635 | 636 | else: 637 | try: 638 | db = lite.connect(dbfile) 639 | cur = db.cursor() 640 | 641 | except lite.Error as e: 642 | print 'Error connecting to db file! See log file for details.' 643 | logging.debug('Error connecting to db file: ' + str(e)) 644 | sys.exit(1) 645 | except Exception as e: 646 | print 'CRITICAL: Unhandled exception occured! Quiters gonna quit! See log file for details.' 647 | logging.critical('Unhandled exception: ' + str(e)) 648 | sys.exit(1) 649 | 650 | params = [repo, account, project] 651 | if 'mysql' == gbconfig.get('database', 'database'): 652 | mysqlcur.execute("SELECT last_scan FROM projects WHERE repo=%s AND account=%s and project=%s;", params) 653 | rows = mysqlcur.fetchall() 654 | else: 655 | cur.execute("SELECT last_scan FROM projects WHERE repo=? AND account=? 
and project=?;", params) 656 | rows = cur.fetchall() 657 | 658 | last_scan = None 659 | 660 | for row in rows: 661 | if None != row[0]: 662 | last_scan = datetime.datetime.strptime(str(row[0]), "%Y-%m-%d %H:%M:%S") 663 | 664 | if 'mysql' == gbconfig.get('database', 'database'): 665 | mysqldb.close() 666 | else: 667 | db.close() 668 | 669 | return last_scan 670 | 671 | def html_report(scan_id): 672 | """ 673 | Create html report for a given scan_id 674 | """ 675 | 676 | if 'mysql' == gbconfig.get('database', 'database'): 677 | try: 678 | import MySQLdb 679 | mysqldb = MySQLdb.connect(host=gbconfig.get('database', 'host'), user=gbconfig.get('database', 'dbuname'), passwd=gbconfig.get('database', 'dbpword'), db=gbconfig.get('database', 'dbname')) 680 | mysqlcur = mysqldb.cursor() 681 | except Exception as e: 682 | print 'Error connecting to MySQL! See log file for details.' 683 | logging.debug('Error connecting to MySQL: ' + str(e)) 684 | sys.exit(1) 685 | 686 | else: 687 | try: 688 | import sqlite3 as lite 689 | db = lite.connect(dbfile) 690 | cur = db.cursor() 691 | 692 | except lite.Error as e: 693 | print 'Error connecting to db file! See log file for details.' 694 | logging.debug('Error connecting to db file: ' + str(e)) 695 | sys.exit(1) 696 | except Exception as e: 697 | print 'CRITICAL: Unhandled exception occured! Quiters gonna quit! See log file for details.' 
698 | logging.critical('Unhandled exception: ' + str(e)) 699 | sys.exit(1) 700 | 701 | html = '' 702 | h = 'ICAgX19fX19fICAgICAgICAgICAgICAgIF9fX18KICAvIF9fX18vX19fX19fXyAgX19fXyAgLyBfXyApX18gIF9fX19fXyBfX19fX18KIC8gLyBfXy8gX19fLyBfIFwvIF9fIFwvIF9fICAvIC8gLyAvIF9fIGAvIF9fXy8KLyAvXy8gLyAvICAvICBfXy8gL18vIC8gL18vIC8gL18vIC8gL18vIChfXyAgKQpcX19fXy9fLyAgIFxfX18vIC5fX18vX19fX18vXF9fLF8vXF9fLCAvX19fXy8KICAgICAgICAgICAgICAvXy8gICAgICAgICAgICAgICAgL19fX18v' 703 | params = [scan_id] 704 | 705 | if 'mysql' == gbconfig.get('database', 'database'): 706 | mysqlcur.execute("SELECT a.repo, a.account, a.project, b.scan_id, b.date_time, b.cloc_out FROM projects a, scans b WHERE a.project_id=b.project_id AND b.scan_id=%s LIMIT 1;", params) 707 | rows = mysqlcur.fetchall() 708 | else: 709 | cur.execute("SELECT a.repo, a.account, a.project, b.scan_id, b.date_time, b.cloc_out FROM projects a, scans b WHERE a.project_id=b.project_id AND b.scan_id=? LIMIT 1;", params) 710 | rows = cur.fetchall() 711 | 712 | # for loop on rows, but only one row 713 | for row in rows: 714 | print 'writing report...' 715 | htmlfile = os.path.dirname(os.path.abspath(__file__)) + '/out/' + row[0] + '.' + row[1] + '.' + row[2].replace("/", "_") + '.' + row[3] + '.html' 716 | tabfile = os.path.dirname(os.path.abspath(__file__)) + '/out/' + row[0] + '.' + row[1] + '.' + row[2].replace("/", "_") + '.' + row[3] + tabsext 717 | 718 | if not os.path.exists(os.path.dirname(htmlfile)): 719 | os.makedirs(os.path.dirname(htmlfile)) 720 | 721 | # include repo/account/project link 722 | if 'github' == row[0]: 723 | project_base_url = 'https://github.com/' + row[1] + '/' + row[2] 724 | link = '(' + project_base_url + ')' 725 | else: 726 | project_base_url = '' 727 | link = '' 728 | 729 | o = open(htmlfile, 'w') 730 | o.write(""" 731 | """) 740 | o.write("
\n" + h.decode('base64') + "
") 741 | o.write("\n\n
"
742 | 				+ "\nrepo:     " + row[0]
743 | 				+ "\naccount:  " + row[1]
744 | 				+ "\nproject:  " + row[2] + "   " + link
745 | 				+ "\nscan id:  " + row[3]
746 | 				+ "\ndate:     " + str(row[4]) + "
\n") 747 | #o.write("
\n" + str(row[5]).replace("\n", "
") + "
") 748 | o.write("
\n" + row[5] + "
") 749 | o.close() 750 | 751 | t = open(tabfile, 'w') 752 | t.write("GrepBugs\n") 753 | t.write("repo:\t" + row[0] + "\naccount:\t" + row[1] + "\nproject:\t" + row[2] + " " + link + "\nscan id:\t" + row[3] + "\ndate:\t" + str(row[4]) + "\n") 754 | t.close() 755 | 756 | if 'mysql' == gbconfig.get('database', 'database'): 757 | mysqlcur.execute("SELECT b.language, b.regex_text, b.description, c.result_detail_id, c.file, c.line, c.code FROM scans a, results b, results_detail c WHERE a.scan_id=%s AND a.scan_id=b.scan_id AND b.result_id=c.result_id ORDER BY b.language, b.regex_id, c.file;", params) 758 | rs = mysqlcur.fetchall() 759 | else: 760 | cur.execute("SELECT b.language, b.regex_text, b.description, c.result_detail_id, c.file, c.line, c.code FROM scans a, results b, results_detail c WHERE a.scan_id=? AND a.scan_id=b.scan_id AND b.result_id=c.result_id ORDER BY b.language, b.regex_id, c.file;", params) 761 | rs = cur.fetchall() 762 | 763 | o = open(htmlfile, 'a') 764 | t = open(tabfile, 'a') 765 | html = "\n\n" 766 | tabs = "\n\nlang\tdescription\tfile\tline\tc.code\n" 767 | language = '' 768 | regex = '' 769 | count = 0 770 | 771 | # loop through all results, do some fancy coordination for output 772 | for r in rs: 773 | tab_lang = r[0].replace("\t"," ").replace("\n"," ").replace("\r"," ") 774 | #tab_regex = r[1].replace("\t"," ").replace("\n"," ").replace("\r"," ") 775 | tab_desc = r[2].replace("\t"," ").replace("\n"," ").replace("\r"," ") 776 | #tab_id = r[3].replace("\t"," ").replace("\n"," ").replace("\r"," ") 777 | tab_file = r[4].replace("\t"," ").replace("\n"," ").replace("\r"," ") 778 | tab_line = str(r[5]) 779 | tab_code = r[6].replace("\t"," ").replace("\n"," ").replace("\r"," ") 780 | 781 | tabs += tab_lang +"\t"+ tab_desc +"\t"+ tab_file +"\t"+ tab_line +"\t"+ tab_code +"\n" 782 | 783 | if regex != r[1]: 784 | if 0 != count: 785 | html += ' ' + "\n"; # end result set for regex 786 | 787 | if language != r[0]: 788 | html += '

' + r[0] + '

' + "\n" 789 | 790 | if regex != r[1]: 791 | html += ' \n" 792 | html += ' ' + "\n" 824 | tabs += "\n" 825 | 826 | html += '' 827 | o.write(html) 828 | o.close() 829 | t.write(tabs) 830 | t.close() 831 | 832 | if 'mysql' == gbconfig.get('database', 'database'): 833 | mysqldb.close() 834 | else: 835 | db.close() 836 | 837 | """ 838 | Handle and process command line arguments 839 | """ 840 | parser = argparse.ArgumentParser(description='At minimum, the -d or -r options must be specified.') 841 | parser.add_argument('-d', help='specify a LOCAL directory to scan.') 842 | parser.add_argument('-f', help='force scan even if project has not been modified since last scan.', default=False, action="store_true") 843 | parser.add_argument('-u', help='Use existing rules, do not download updated set.', default=False, action="store_true") 844 | 845 | group = parser.add_argument_group('REMOTE Repository Scanning') 846 | group.add_argument('-r', help='specify a repo to scan (e.g. github, bitbucket, or sourceforge).') 847 | group.add_argument('-a', help='specify an account for the specified repo.') 848 | group.add_argument('-repo_user', help='specify a username to be used in authenticating to the specified repo (default: grepbugs).', default='grepbugs') 849 | group.add_argument('-repo_pass', help='specify a password to be used in authenticating to the specified repo (default: grepbugs).', default='grepbugs') 850 | parser.add_argument('-no_reports', help='Do not generate reports, only store results in the database.', default=False, action="store_true") 851 | 852 | args = parser.parse_args() 853 | 854 | if None == args.d and None == args.r: 855 | parser.print_help() 856 | sys.exit(1) 857 | 858 | if None != args.d: 859 | print 'scan directory: ' + args.d 860 | scan_id = local_scan(args.d) 861 | elif None != args.r: 862 | if None == args.a: 863 | print 'an account must be specified! use -a to specify an account.' 
864 | sys.exit(1) 865 | 866 | print 'scan repo: ' + args.r + ' ' + args.a 867 | scan_id = repo_scan(args.r, args.a, args.f, args.no_reports) 868 | --------------------------------------------------------------------------------
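The two string-handling steps inside local_scan (turning `cloc --show-ext` lines such as 'inc -> PHP' into grep `--include` filters, and splitting each `grep -n -r` output line into file, line number, and code) can be sketched in isolation. This is a minimal illustration written for Python 3, not the code grepbugs.py runs: the helper names `extensions_for`, `include_filters`, and `parse_grep_line` are hypothetical, and the sample paths are made up. Like the original, it assumes file paths contain no ':' characters.

```python
# Illustrative sketch only (Python 3). The helper names below are hypothetical
# and do not exist in grepbugs.py; they mirror the logic of local_scan.

def extensions_for(language, show_ext_lines):
    """Return the extensions mapped to `language` in 'ext -> Language' lines."""
    extensions = []
    for line in show_ext_lines:
        parts = line.split("->")
        # lines without '->' (blank lines, headers) are skipped
        if len(parts) > 1 and parts[1].strip() == language:
            extensions.append(parts[0].strip())
    return extensions


def include_filters(extensions):
    """Build one grep --include argument per extension."""
    return ["--include=*." + ext for ext in extensions]


def parse_grep_line(line, basedir=""):
    """Split one 'path:lineno:code' line of grep -n -r output.

    Returns (file, lineno, code), or None when the line does not have
    that shape (for example a 'Binary file ... matches' notice).
    """
    if basedir and line.startswith(basedir):
        line = line[len(basedir):]
    parts = line.split(":", 2)
    if len(parts) < 3 or not parts[1].isdigit():
        return None
    return parts[0], int(parts[1]), parts[2].strip()


if __name__ == "__main__":
    # made-up sample data in the same shape local_scan works with
    ext_lines = ["php -> PHP", "inc -> PHP", "py -> Python", ""]
    print(include_filters(extensions_for("PHP", ext_lines)))
    print(parse_grep_line("/src/app/index.php:12:  eval($x);", basedir="/src/"))
```

Returning None for lines that do not parse lets a caller skip them quietly, where local_scan instead relies on its per-line try/except to log an 'Error parsing result' message.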