├── .gitignore
├── README.md
├── core
│   ├── __init__.py
│   ├── codechecker.py
│   ├── datastore.py
│   ├── elastic_search_helpers.py
│   ├── evaluators.py
│   ├── git_repo_querier.py
│   ├── git_repo_updater.py
│   ├── notifier.py
│   ├── repository_handler.py
│   └── ruleparser.py
├── etc
│   └── config.yml.template
├── guardserver
│   ├── .gitignore
│   ├── default_config.py
│   ├── guardserver.py
│   ├── repoguard.apache.vhost.sample
│   ├── repoguard.wsgi
│   ├── server
│   │   ├── __init__.py
│   │   ├── authentication.py
│   │   ├── filters.py
│   │   └── issues.py
│   ├── setup_ui_dependencies.sh
│   └── static
│       ├── css
│       │   └── repoguard.css
│       ├── fonts
│       │   ├── glyphicons-halflings-regular.eot
│       │   ├── glyphicons-halflings-regular.svg
│       │   ├── glyphicons-halflings-regular.ttf
│       │   └── glyphicons-halflings-regular.woff
│       ├── img
│       │   └── repo_guard.png
│       ├── index.html
│       └── js
│           ├── daterange.js
│           ├── filters.js
│           ├── issues.js
│           ├── paginate.js
│           └── user.js
├── repoguard.py
├── repominer.py
├── requirements-test.txt
├── requirements.txt
├── rules
│   ├── action_script.yml
│   ├── alert_config_exported.yml
│   ├── android.yml
│   ├── chef.yml
│   ├── comments.yml
│   ├── cpp.yml
│   ├── csrf.yml
│   ├── generic_best_practices.yml
│   ├── java.yml
│   ├── jsonp.yml
│   ├── known_vulnerabilities.yml
│   ├── object_enumeration.yml
│   ├── open_redirect.yml
│   ├── os_code_exec.yml
│   ├── scala.yml
│   ├── secrets.yml
│   ├── smoketest.yml
│   ├── sql.yml
│   ├── whitelisted_files.yml
│   ├── xss.yml
│   └── xxe.yml
├── testrules.py
└── tests
    ├── __init__.py
    ├── base.py
    ├── test_codechecker.py
    ├── test_data
    │   ├── test_repo_list.json
    │   ├── test_repo_status.json
    │   ├── test_response_01.json
    │   ├── test_response_02.json
    │   └── test_response_03.json
    ├── test_evaluators.py
    ├── test_git_repo_updater.py
    ├── test_notifier.py
    ├── test_repoguard.py
    ├── test_repository_handler.py
    └── test_ruleparser.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.py[co]
2 | *~
3 | .#*
4 | *.pid
5 | virtualenv
6 | repos
7 | **/config.yml
8 | **/.idea
9 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Repoguard
2 | 
3 | Repoguard is a simple, generic tool that checks git repositories and alerts on any change which might be interesting for you.
4 | 
5 | We created Repoguard to help us (the security team at Prezi) detect changes which might lead to security issues among the large number of commits and repositories we have.
6 | 
7 | It can track all the repositories in a Github organization and send an email (or store the result in Elasticsearch)
8 | if it finds a dangerous or interesting line. It uses an easily extendable ruleset (regular expressions) with
9 | existing rules for languages like Python, Java, JavaScript, C(++), Chef or Scala.
10 | 
11 | We encourage everyone to add new rules or improve the existing ones! :)
12 | 
13 | ## Repominer
14 | 
15 | Repominer is the little brother of Repoguard. It can be used to check a local directory for dangerous lines. We believe it is useful for security code reviews, where you don't have to care about previous commits, just the current state.
16 | It uses the same ruleset and configuration.
17 | 
18 | ## Installation
19 | 
20 | Installing and running the project is pretty simple:
21 | 
22 | ```
23 | git clone https://github.com/prezi/repoguard.git
24 | cd repoguard
25 | virtualenv virtualenv
26 | . virtualenv/bin/activate
27 | pip install -r requirements.txt
28 | mkdir repos
29 | python repoguard.py --config etc/config.yml --working-dir './repos' --since '2014-08-01 00:00:00' --refresh
30 | ```
31 | 
32 | Then set up a cron job which calls this script periodically, as shown below.
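For example, a crontab along these lines matches the schedule we describe in "How do we use it at Prezi?" below (every 10 minutes for checking, hourly with `--refresh`); the paths are illustrative and depend on where you cloned the project:

```
*/10 * * * * cd /path/to/repoguard && ./virtualenv/bin/python repoguard.py --config etc/config.yml --working-dir ./repos --notify
0 * * * * cd /path/to/repoguard && ./virtualenv/bin/python repoguard.py --config etc/config.yml --working-dir ./repos --notify --refresh
```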
33 | 
34 | ## Usage
35 | 
36 | Syncing with the Github API (--refresh), pulling changes and alerting via email (--notify):
37 | ```
38 | python repoguard.py --refresh --notify --working-dir ../repos/
39 | ```
40 | 
41 | Pulling new changes, checking for alerts and notifying via email (--notify) + sending results to ElasticSearch (--store):
42 | ```
43 | python repoguard.py --notify --store elasticsearch.host:9200 --working-dir ../repos/
44 | ```
45 | 
46 | Don't pull new changes (--nopull), check for alerts since the given time (--since):
47 | ```
48 | python repoguard.py --nopull --since "2014-08-12 03:00" --working-dir ../repos/
49 | ```
50 | 
51 | Pull new changes and also check commits which were already checked (--ignorestatus), with custom rules defined in the directory "custom_rules" (--rule-dir):
52 | ```
53 | python repoguard.py --ignorestatus --rule-dir ../custom_rules --working-dir ../repos/
54 | ```
55 | 
56 | Don't pull new changes (--nopull), check only the repositories "foobaar" and "starfleet" (--limit) against all alerts defined in "xss.yml" and "xxe.yml" (--alerts), since a long time ago (--since):
57 | ```
58 | python repoguard.py --nopull --limit "foobaar,starfleet" --alerts "xss::*,xxe::*" --since "2010-01-01" --working-dir ../repos/
59 | 
60 | ```
61 | 
62 | ## The Repoguard configuration file
63 | 
64 | Repoguard needs a Github API token (it can be generated on Github's settings page) in order to be able to fetch
65 | the repositories of your organization. It has to be defined in the config file:
66 | ```
67 | github:
68 |   token: ""
69 |   organization_name: ""
70 | ```
71 | 
72 | It is possible to send specific alerts to specific email addresses, so it is possible to define
73 | custom rules which are only interesting for a subset of people (e.g. our data team has their own rules
74 | for detecting changes in the log format).
75 | 
76 | ## The RepoGuard UI
77 | 
78 | RepoGuard ships with a UI and a backend API. The UI is written using Bootstrap 3 and jQuery, making
79 | XHR calls to an API backend written in Flask.
80 | 
81 | The RepoGuard UI lives in the `guardserver` directory, and ships with a default configuration `default_config.py`
82 | that needs to be renamed to `config.py` and modified. In order for the application to function effectively,
83 | the following options need to be set at a minimum:
84 | 
85 | ```AUTHENTICATION_REQUIRED = False | True```
86 | 
87 | The UI needs your Github API token as well; you can set it under:
88 | 
89 | ```
90 | GITHUB_TOKEN = ""
91 | ORG_NAME = ""
92 | ```
93 | 
94 | In order to use the RepoGuard UI, you need to run `repoguard` with the `--store` option and store
95 | the results in ElasticSearch. The ElasticSearch configuration needs to be set in your `config.py` file:
96 | 
97 | ```
98 | ELASTIC_HOST = "localhost"
99 | ELASTIC_PORT = "9200"
100 | INDEX = "repoguard"
101 | DOC_TYPE = "repoguard"
102 | ```
103 | 
104 | If `AUTHENTICATION_REQUIRED` is set to `True`, then the following options also need to be set:
105 | 
106 | ```
107 | LDAP_DN = "cn=%s,ou=people,dc=example,dc=com"
108 | LDAP_SERVER = ""
109 | LDAP_OU = ""
110 | ```
111 | 
112 | Currently, you can ignore the `LDAP_USERNAME` and `LDAP_PASSWORD` options.
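Putting the pieces together, a minimal `config.py` might look like the following (the structure mirrors `default_config.py`; all values are placeholders you must replace):

```python
# config.py - minimal RepoGuard UI configuration (placeholder values)
DEBUG = False
SECRET_KEY = "change-me"

AUTHENTICATION_REQUIRED = False

# LDAP configuration - only needed when AUTHENTICATION_REQUIRED is True
LDAP_DN = "cn=%s,ou=people,dc=example,dc=com"
LDAP_SERVER = ""
LDAP_OU = ""

# ElasticSearch configuration - must match what `repoguard.py --store` writes to
ELASTIC_HOST = "localhost"
ELASTIC_PORT = "9200"
INDEX = "repoguard"
DOC_TYPE = "repoguard"

# Github configuration
GITHUB_TOKEN = ""
ORG_NAME = ""
```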
113 | 
114 | Before you start the server, you need to download the dependencies - this should be a one-time task. A bash script,
115 | `setup_ui_dependencies.sh`, is provided to download the dependencies into the correct folders.
116 | 
117 | You can run the RepoGuard server directly via Python - which can be slow over the network, especially for multiple users -
118 | or on Apache and similar web servers (recommended). A sample Apache configuration file (`repoguard.apache.vhost.sample`)
119 | is included for your convenience. To run the server directly via Python, do the following:
120 | 
121 | ```
122 | export PYTHONPATH=/path/to/repoguard
123 | cd /path/to/repoguard/guardserver
124 | python guardserver.py
125 | ```
126 | 
127 | This will start the web server on `0.0.0.0:5000`.
128 | ## Creating rules
129 | 
130 | We've shared most of our rules within the "rules" folder of this repository, but of course you can create your own as well (if you do so, we are happy to receive pull requests ;)). The rule files are pretty self-explanatory YAML files; let's look at an example and clarify what kind of things are possible.
131 | 
132 | ### namespaces
133 | 
134 | Repoguard will read all "yml" files recursively in the directory you define with the "--rule-dir" argument. Each yml rule file has its own namespace (based on the filename). For example, xss.yml rules will be under the xss:: namespace.
135 | 
136 | ### abstract rules
137 | 
138 | You can create abstract rules, which you can later use to set defaults or extend other rules. All sections starting with ```#!~``` are handled as abstract rules; these won't run automatically (but you will be able to refer to them):
139 | 
140 | ```
141 | --- #!~base
142 | description: "Unescaped user input might lead to Cross-site scripting issues, please ensure that input can only come from trusted sources"
143 | extends: whitelisted_files::whitelisted_files,comments::comments
144 | ```
145 | 
146 | ### basic rule - simple line matching
147 | 
148 | The following simple rule will extend the base abstract rule (inheriting its settings, like the description) and detect any change which adds a new line containing the string "|safe" or "{% autoescape off %}".
149 | 
150 | ```
151 | --- #!django
152 | extends: base
153 | diff: add
154 | line:
155 | - match: \|safe
156 | - match: "{% autoescape off %}"
157 | ```
158 | 
159 | Possible options for "diff" are:
160 | 
161 | - all (default): no restrictions on the git diff; since we get the context as well, it can match on anything within the context of the change (like a method name / class name)
162 | - add: the diff line starts with +
163 | - del: the diff line starts with -
164 | - mod: the diff line starts with + or -
165 | 
166 | Possible options for "line":
167 | - match: alert if the line contains the given regex
168 | - except: don't alert if the line contains the given regex (even if it matched any "match" line rules)
169 | 
170 | There is an "or" condition between the different "match" regex patterns, an "or" condition between the different "except" regex patterns, and an "and" condition between the "match" and "except" groups, as the sketch below shows.
171 | 
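In other words, a line triggers a rule when at least one "match" pattern hits it and no "except" pattern does. A minimal sketch of this combination logic (modelled on the evaluators in `core/evaluators.py`; the patterns here are made up for illustration):

```python
import re

# hypothetical compiled patterns from a rule's "line" section
positive_patterns = [re.compile(r"\|safe"), re.compile(r"\{% autoescape off %\}")]
negative_patterns = [re.compile(r"# reviewed")]

def line_matches(line):
    # "or" between the "match" patterns...
    matched = any(p.search(line) for p in positive_patterns)
    # ..."and" no "except" pattern may hit ("or" between the "except" patterns)
    return matched and not any(p.search(line) for p in negative_patterns)

print(line_matches("{{ user_input|safe }}"))              # True
print(line_matches("{{ user_input|safe }}  # reviewed"))  # False
```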
172 | ### advanced rule - line and file name matching
173 | 
174 | The following rule will alert if the newly introduced code ("diff: add") matches the given regex (```(WebSocket|\.listen\(|http\.request|socket\.io).*```), except if it matches ```EventListener\.```. The rule will only check code if the file name matches ```.*\.(hx|js)$``` (hx or js file extension).
175 | 
176 | ```
177 | --- #!js_network_listen
178 | extends: base
179 | diff: add
180 | line:
181 | - match: (WebSocket|\.listen\(|http\.request|socket\.io).*
182 | - except: EventListener\.
183 | file:
184 | - match: .*\.(hx|js)$
185 | ```
186 | 
187 | ### matching on context
188 | 
189 | The following rule will alert on any change within any method (line starts with "def") containing the string "auth" or "login", in any .py file except those whose filename contains the string "test":
190 | 
191 | ```
192 | extends: base
193 | diff: all
194 | line:
195 | - match: \s+def.*(auth|login).*
196 | file:
197 | - match: .*\.py$
198 | - except: .*test.*
199 | ```
200 | 
201 | ### matching within script tags
202 | 
203 | To detect possibly exploitable XSS attacks it is important to know whether the matching line is within a script tag or not. The reason is simple: some frameworks do a decent job of escaping strings in template files, but within a script tag the default escaping might not be enough. The following rule will alert if the added line contains a template variable (```{{ ... }}```) without the ```urlencode``` filter. In Django the double curly brackets refer to a template variable, and if its value comes from user-supplied input, proper escaping is crucial:
204 | 
205 | ```
206 | --- #!django_inscripttags
207 | extends: base
208 | diff: add
209 | line:
210 | - match: "{{((?!urlencode).)+}}"
211 | inscripttag: true
212 | ```
213 | 
214 | ## The status file
215 | 
216 | This file is used to store the status between runs, so Repoguard knows which commits to check and does not alert
217 | on the same change twice.
218 | 
219 | ## Project layout
220 | 
221 | ```
222 | [project dir]
223 | \- core (core files)
224 | \- etc (configuration files)
225 | \- rules (rule files)
226 | \- tests (unit tests)
227 | \- repoguard.py (repoguard executable)
228 | \- repominer.py (repominer executable)
229 | \- guardserver (the repoguard UI and backend API)
230 | ```
231 | 
232 | ## How do we use it at Prezi?
233 | 
234 | We run it every 10 minutes, and also every hour with the ```--refresh``` option (which fetches new repositories
235 | from Github). The alerts are sent to ElasticSearch, and an internal tool then creates Trac tickets from them; but
236 | for a long time we received the alerts via email, which was a feasible workflow as well.
237 | 
238 | ## How can you contribute?
239 | 
240 | Extend or fine-tune the ruleset, improve the code or the documentation, and send a pull request!
241 | Tests are highly appreciated; running the existing suite is shown below.
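If you want to run the existing tests before sending a pull request, the test dependencies live in `requirements-test.txt`, and standard unittest discovery should work (shown here as one possible invocation):

```
pip install -r requirements-test.txt
python -m unittest discover tests
```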
242 | 
--------------------------------------------------------------------------------
/core/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/prezi/repoguard/e0bf5ca9ecdff3472740f07e416ae85133f5a914/core/__init__.py
--------------------------------------------------------------------------------
/core/codechecker.py:
--------------------------------------------------------------------------------
1 | from evaluators import *
2 | 
3 | 
4 | class CodeChecker:
5 |     def __init__(self, context_processors, rules, repo_groups={}, rules_to_groups={}):
6 |         self.context_processors = context_processors
7 |         self.rules = rules
8 |         self.repo_groups = repo_groups
9 |         self.rules_to_groups = rules_to_groups
10 | 
11 |     def check(self, lines, context, repo=None):
12 |         rules_applied_for_this_repo = filter(self._filter_rules(repo.name), self.rules) if repo else self.rules
13 |         # pre-filter rules with line-invariant rules:
14 |         applicable_rules = filter(self._check_line_invariants(context), rules_applied_for_this_repo)
15 |         # check each line
16 |         alerts = []
17 |         for idx, line in enumerate(lines):
18 |             context['line_idx'] = idx
19 |             alerts.extend(self.check_line(applicable_rules, context, line))
20 |         return alerts
21 | 
22 |     def _filter_rules(self, repo_name):
23 |         def rule_filter(rule):
24 |             for group_name, repo_group in self.repo_groups.iteritems():
25 |                 rules_to_group = self.rules_to_groups.get(group_name, []) + self.rules_to_groups.get('*', [])
26 |                 if repo_name in repo_group and rules_to_group:
27 |                     # repo_name is in a group which has rules assigned to it
28 |                     positive_patterns = [re.compile(r["match"]) for r in rules_to_group if "match" in r]
29 |                     negative_patterns = [re.compile(r["except"]) for r in rules_to_group if "except" in r]
30 | 
31 |                     ctx = reduce(lambda acc, p: acc or p.search(rule.name) is not None, positive_patterns, False)
32 |                     return ctx and reduce(lambda ctx, p: ctx and p.search(rule.name) is None, negative_patterns, ctx)
33 |             return True
34 | 
35 |         return rule_filter
36 | 
37 |     def _check_line_invariants(self, context):
38 |         def filename_filter(rule):
39 |             return all(e.matches(context, None) for e in rule.evaluators if e.key in ["file", "author"])
40 | 
41 |         return filename_filter
42 | 
43 |     def check_line(self, rules, line_ctx, line):
44 |         if len(line) > 512:
45 |             # probably not readable source, but it's hard to match regexes at least
46 |             # TODO: logging
47 |             return
48 | 
49 |         for cp in self.context_processors:
50 |             line_ctx = cp.preprocess(line_ctx, line)
51 | 
52 |         for rule in rules:
53 |             matches = [e.matches(line_ctx, line) for e in rule.evaluators]
54 |             if len(matches) > 0 and all(matches):
55 |                 yield (rule, line)
56 | 
57 | 
58 | class Alert:
59 |     def __init__(self, rule, filename, repo, commit, line, diff_line_number=0, line_number=0, author=None,
60 |                  commit_description=None):
61 |         self.rule = rule
62 |         self.filename = filename
63 |         self.repo = repo
64 |         self.commit = commit
65 |         self.line = line
66 |         self.line_number = line_number
67 |         self.diff_line_number = diff_line_number
68 |         self.author = author
69 |         self.commit_description = commit_description
70 | 
71 | 
72 | class Rule:
73 |     def __init__(self, name, evaluators, rule_config):
74 |         self.name = name
75 |         assert "::" in name
76 |         self.namespace, self.localname = name.split("::")
77 |         self.evaluators = evaluators
78 |         self.description = rule_config.get('description', 'no description')
79 |         self.email_template = rule_config.get('preferred_email_template', None)
80 | 
81 |     def __str__(self):
82 |         return self.name
83 | 
84 | 
85 | class CodeCheckerFactory:
86 |     def __init__(self, ruleset, repo_groups={}, rules_to_groups={}):
87 |         self.ruleset = ruleset
88 |         self.repo_groups = repo_groups
89 |         self.rules_to_groups = rules_to_groups
90 | 
91 |     def create(self, mode=LineEvalFactory.MODE_DIFF):
92 |         factories = [LineEvalFactory(mode), InScriptEvalFactory(), InAngularControllerEvalFactory(), FileEvalFactory(),
93 |                      CommitMessageEvalFactory(), AuthorEvalFactory(), PreviousLineEvaluatorFactory()]
94 |         context_processors = [InScriptEvalFactory.ContextProcessor(), InAngularControllerEvalFactory.ContextProcessor()]
95 |         rules = [self.create_single(rn, factories) for rn in self.ruleset]
96 |         return CodeChecker(context_processors, rules, self.repo_groups, self.rules_to_groups)
97 | 
98 |     def create_single(self, rule_name, factories):
99 |         rule = self.ruleset[rule_name]
100 |         evaluators = filter(lambda e: e is not None, [f.create(rule) for f in factories])
101 |         return Rule(rule_name, evaluators, rule)
--------------------------------------------------------------------------------
/core/datastore.py:
--------------------------------------------------------------------------------
1 | from elasticsearch import Elasticsearch, ElasticsearchException
2 | import os
3 | import sys
4 | import hashlib
5 | 
6 | 
7 | class DataStoreException(Exception):
8 |     def __init__(self, error):
9 |         self.error = error
10 | 
11 |     def __str__(self):
12 |         return repr(self.error)
13 | 
14 | 
15 | class DataStore:
16 |     def __init__(self, host, port, username=None, password=None, use_ssl=False, default_index=None,
17 |                  default_doctype=None):
18 |         self.index = default_index
19 |         self.doc_type = default_doctype
20 |         if username and password:
21 |             self.es_connection = Elasticsearch(host=host, port=port, http_auth=username + ":" + password,
22 |                                                use_ssl=use_ssl)
23 |         else:
24 |             self.es_connection = Elasticsearch(host=host, port=port, use_ssl=use_ssl)
25 |         if not self.es_connection.ping():
26 |             raise DataStoreException("Connection to ElasticSearch failed.")
27 | 
28 | 
29 |     def store(self, body):
30 |         try:
31 |             self.es_connection.create(body=body, id=hashlib.sha1(str(body)).hexdigest(), index=self.index,
32 |                                       doc_type=self.doc_type)
33 |         except ElasticsearchException, e:
34 |             raise DataStoreException("Exception while storing data in Elastic Search: " + str(e))
35 | 
36 |     def search(self, query=None, params=None):
37 |         try:
38 |             if params:
39 |                 results = self.es_connection.search(body=query, index=self.index, doc_type=self.doc_type, params=params)
40 |             else:
41 |                 results = self.es_connection.search(body=query, index=self.index, doc_type=self.doc_type)
42 |             return results
43 |         except ElasticsearchException, e:
44 |             raise DataStoreException("Exception while searching data in Elastic Search: " + str(e))
45 | 
46 |     def get(self, issue_id):
47 |         try:
48 |             results = self.es_connection.get(index=self.index, doc_type=self.doc_type, id=issue_id)
49 |             return results
50 |         except ElasticsearchException, e:
51 |             raise DataStoreException("Exception while retrieving data based on index ID: " + str(e))
52 | 
53 |     def update(self, index_id, doc):
54 |         try:
55 |             self.es_connection.update(body=doc, id=index_id, doc_type=self.doc_type, index=self.index)
56 |         except ElasticsearchException, e:
57 |             raise DataStoreException("Exception while updating data in Elastic Search: " + str(e))
--------------------------------------------------------------------------------
/core/elastic_search_helpers.py:
--------------------------------------------------------------------------------
1 | class ElasticSearchHelpers:
2 |     def __init__(self):
3 |         pass
4 | 
5 | 
6 |     # These methods help create a filtered elastic search query:
7 |     # see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html#_multiple_filters
8 | 
9 | 
10 |     @staticmethod
11 |     def create_query_string_filter(lucene_query):
12 |         filtered_query_string = {
13 |             "bool": {
14 |                 "should": [
15 |                     {
16 |                         "query_string": {
17 |                             "query": lucene_query
18 |                         }
19 |                     }
20 |                 ]
21 |             }
22 |         }
23 |         return filtered_query_string
24 | 
25 | 
26 |     @staticmethod
27 |     def create_timestamp_filter(from_date, to_date):
28 |         filtered_timestamp = {
29 |             "bool": {
30 |                 "must": [
31 |                     {
32 |                         "range": {
33 |                             "@timestamp": {
34 |                                 "from": from_date,
35 |                                 "to": to_date
36 |                             }
37 |                         }
38 |                     }
39 |                 ]
40 |             }
41 |         }
42 |         return filtered_timestamp
43 | 
44 | 
45 |     @staticmethod
46 |     def create_sort(order):
47 |         if order:
48 |             sort_order = "desc"
49 |         else:
50 |             sort_order = "asc"
51 |         sort_order_query = [
52 |             {
53 |                 "@timestamp": {
54 |                     "order": sort_order
55 |                 }
56 |             }
57 |         ]
58 |         return sort_order_query
59 | 
60 | 
61 |     @staticmethod
62 |     def create_elasticsearch_filtered_query(timestamp_filter, sort_order, filtered_query=None):
63 |         filtered_query_params = dict(filter=timestamp_filter)
64 |         if filtered_query:
65 |             filtered_query_params["query"] = filtered_query
66 |         filtered_query_dict = dict(filtered=filtered_query_params)
67 |         query = dict(query=filtered_query_dict, sort=sort_order)
68 | 
69 |         return query
70 | 
71 |     @staticmethod
72 |     def create_elasticsearch_simple_query(search_parameter, search_string):
73 |         return dict(q=search_parameter + ":" + search_string)
74 | 
75 |     @staticmethod
76 |     def create_elasticsearch_aggregate_query(field_name):
77 |         field_dict = dict(field=field_name)
78 |         terms_dict = dict(terms=field_dict)
79 |         agg_name_dict = dict(my_aggregation=terms_dict)
80 |         aggs = dict(aggs=agg_name_dict)
81 |         return aggs
82 | 
83 |     @staticmethod
84 |     def create_elasticsearch_doc(changed_values):
85 |         return dict(doc=changed_values)
--------------------------------------------------------------------------------
/core/evaluators.py:
--------------------------------------------------------------------------------
1 | import re
2 | 
3 | 
4 | class EvaluatorException(Exception):
5 |     def __init__(self, error):
6 |         self.error = error
7 | 
8 |     def __str__(self):
9 |         return repr(self.error)
10 | 
11 | 
12 | class InScriptEvalFactory:
13 |     def __init__(self):
14 |         self.evaluator = self.InScriptEvaluator()
15 | 
16 |     def create(self, rule):
17 |         return self.evaluator if "inscripttag" in rule else None
18 | 
19 | 
20 |     class InScriptEvaluator:
21 |         key = "inscripttag"
22 | 
23 |         def matches(self, line_context, line):
24 |             value = line_context["inside_script_tag"]
25 |             return value is not None and value > 0
26 | 
27 | 
28 |     class ContextProcessor:
29 |         def __init__(self):
30 |             self.script_begin_re = re.compile(r'<script(?!.+type="text/(tpl|template|html)".+)[^>]*>',
31 |                                               flags=re.IGNORECASE)
32 |             self.script_end_re = re.compile(r'</script>', flags=re.IGNORECASE)
33 | 
34 |         def preprocess(self, line_context, line):
35 |             if "inside_script_tag" not in line_context:
36 |                 # initialise
37 |                 line_context["inside_script_tag"] = 0
38 |             tag_start_cnt = len(self.script_begin_re.findall(line))
39 |             tag_end_cnt = len(self.script_end_re.findall(line))
40 |             line_context["inside_script_tag"] += (tag_start_cnt - tag_end_cnt)
41 |             return line_context
42 | 
43 | 
44 | class InAngularControllerEvalFactory:
45 |     def __init__(self):
46 |         self.evaluator = self.InAngularControllerEvaluator()
47 | 
48 |     def create(self, rule):
49 |         return self.evaluator if "in_angular_controller" in rule else None
50 | 
51 | 
52 |     class InAngularControllerEvaluator:
53 |         key = "in_angular_controller"
54 | 
55 |         def matches(self, line_context, line):
56 |             value = line_context["inside_ngcontroller_tag"]
57 |             return value is not None and value > 0
58 | 
59 | 
60 |     class ContextProcessor:
61 |         def __init__(self):
62 |             self.begin_re = re.compile(r'<(\w+) ng-controller=[^>]+>', flags=re.IGNORECASE)
63 |             self.end_re = re.compile(r'</(\w+)>', flags=re.IGNORECASE)
64 | 
65 |         def preprocess(self, line_context, line):
66 |             if "inside_ngcontroller_tag" not in line_context:
67 |                 # initialise
68 |                 line_context["inside_ngcontroller_tag"] = 0
69 |             begin_matches = self.begin_re.findall(line)
70 |             tag_start_cnt = len(begin_matches)
71 | 
72 |             tag_end_cnt = len([m1 for m1, m2 in zip(begin_matches, self.end_re.findall(line))
73 |                                if tag_start_cnt and m1 == m2])
74 |             line_context["inside_ngcontroller_tag"] += (tag_start_cnt - tag_end_cnt)
75 |             return line_context
76 | 
77 | 
78 | class LineEvalFactory:
79 |     MODE_DIFF = 1
80 |     MODE_SINGLE = 2
81 | 
82 |     def __init__(self, mode=MODE_DIFF):
83 |         self.mode = mode
84 | 
85 |     def create(self, rule):
86 |         if "line" not in rule:
87 |             return None
88 |         else:
89 |             flags = 0 if rule.get('case_sensitive', False) else re.IGNORECASE
90 |             positive_patterns = [re.compile(r["match"], flags=flags) for r in rule["line"] if "match" in r]
91 |             negative_patterns = [re.compile(r["except"], flags=flags) for r in rule["line"] if "except" in r]
92 |             diff_mode = rule["diff"] if "diff" in rule else "all"
93 |             diff_mode = diff_mode if diff_mode in ("add", "del", "mod") else "all"
94 |             if self.mode == self.MODE_DIFF:
95 |                 diff_mode_prefixes = {"add": ("+",), "del": ("-",), "mod": ("+", "-")}
96 |                 must_begin_with = diff_mode_prefixes.get(diff_mode, None)
97 |                 return self.DiffLineEvaluator(positive_patterns, negative_patterns, must_begin_with)
98 |             else:
99 |                 if diff_mode != "del":
100 |                     return self.SimpleLineEvaluator(positive_patterns, negative_patterns)
101 |                 else:
102 |                     return self.AlwaysFalseLineEvaluator()
103 | 
104 | 
105 |     class SimpleLineEvaluator:
106 |         key = "line"
107 | 
108 |         def __init__(self, positive_patterns, negative_patterns):
109 |             self.positive_patterns = positive_patterns
110 |             self.negative_patterns = negative_patterns
111 | 
112 |         def matches(self, line_context, line):
113 |             if line is None or len(line) == 0:
114 |                 return False
115 |             ctx = reduce(lambda ctx, p: ctx or p.search(line) is not None, self.positive_patterns, False)
116 |             return ctx and reduce(lambda ctx, p: ctx and p.search(line) is None, self.negative_patterns, ctx)
117 | 
118 | 
119 |     class DiffLineEvaluator:
120 |         key = "line"
121 | 
122 |         def __init__(self, positive_patterns, negative_patterns, must_begin_with=None):
123 |             self.must_begin_with = must_begin_with
124 |             self.positive_patterns = positive_patterns
125 |             self.negative_patterns = negative_patterns
126 | 
127 |         def matches(self, line_context, line):
128 |             if line is None or len(line) <= 2:
129 |                 return False
130 | 
131 |             ctx = True
132 |             if self.must_begin_with is not None:
133 |                 ctx = line.startswith(self.must_begin_with)
134 |                 line = line[1:]
135 |             ctx = ctx and reduce(lambda ctx, p: ctx or p.search(line) is not None, self.positive_patterns, False)
136 |             return ctx and reduce(lambda ctx, p: ctx and p.search(line) is None, self.negative_patterns, ctx)
137 | 
138 | 
139 |     class AlwaysFalseLineEvaluator:
140 |         key = "line"
141 | 
142 |         def matches(self, line_context, line):
143 |             return False
144 | 
145 | 
146 | class ContextBasedPatternEvaluator(object):
147 |     def __init__(self, rule, rule_key, context_key):
148 |         self.key = rule_key
149 |         self.context_key = context_key
150 |         self.positive_patterns = []
151 |         self.negative_patterns = []
152 |         specific_rules = rule.get(rule_key, [])
153 |         flags = 0 if rule.get('case_sensitive', False) else re.IGNORECASE
154 | 
155 |         for rule in specific_rules:
156 |             if "match" in rule:
157 |                 self.positive_patterns.append(re.compile(rule["match"], flags=flags))
158 |             elif "except" in rule:
159 |                 self.negative_patterns.append(re.compile(rule["except"], flags=flags))
160 |             else:
161 |                 raise EvaluatorException("Unknown key in %s" % str(rule))
162 | 
163 |     def matches(self, line_context, line):
164 |         if self.context_key in ['commit_message'] and line_context.get('line_idx') > 0:
165 |             # another hacky way to prevent alerting on every line if commit message matches
166 |             return False
167 | 
168 |         ctx_value = line_context.get(self.context_key)
169 |         if ctx_value is None:
170 |             return False
171 | 
172 |         pos = not self.positive_patterns or reduce(lambda ctx, p: ctx or p.search(ctx_value),
173 |                                                    self.positive_patterns, False)
174 |         neg = reduce(lambda ctx, p: ctx and not p.search(ctx_value), self.negative_patterns, True)
175 |         return pos and neg
176 | 
177 | 
178 | class FileEvalFactory:
179 |     def create(self, rule):
180 |         return FileEvaluator(rule) if "file" in rule else None
181 | 
182 | 
183 | class FileEvaluator(ContextBasedPatternEvaluator):
184 |     def __init__(self, rule):
185 |         super(FileEvaluator, self).__init__(rule, "file", "filename")
186 | 
187 | 
188 | class CommitMessageEvalFactory:
189 |     def create(self, rule):
190 |         return CommitMessageEvaluator(rule) if "message" in rule else None
191 | 
192 | 
193 | class CommitMessageEvaluator(ContextBasedPatternEvaluator):
194 |     def __init__(self, rule):
195 |         super(CommitMessageEvaluator, self).__init__(rule=rule, rule_key="message", context_key="commit_message")
196 | 
197 | 
198 | class AuthorEvalFactory:
199 |     def create(self, rule):
200 |         return AuthorEvaluator(rule) if "author" in rule else None
201 | 
202 | 
203 | class AuthorEvaluator(ContextBasedPatternEvaluator):
204 |     def __init__(self, rule):
205 |         super(AuthorEvaluator, self).__init__(rule, "author", "author")
206 | 
207 | 
208 | class PreviousLineEvaluatorFactory:
209 |     def __init__(self):
210 |         pass
211 | 
212 |     def create(self, rule):
213 |         if "previously" not in rule:
214 |             return None
215 | 
216 |         if any("match" in r for r in rule["previously"]):
217 |             raise ValueError("Only negative (except:) matches implemented yet")
218 | 
219 |         negative_patterns = [re.compile(r["except"]) for r in rule["previously"] if "except" in r]
220 | 
221 |         return self.PreviousLineEvaluator(negative_patterns)
222 | 
223 | 
224 |     class PreviousLineEvaluator:
225 |         key = "previously"
226 | 
227 |         def __init__(self, negative_patterns):
228 |             self.negative_patterns = negative_patterns
229 | 
230 |         def matches(self, line_context, line):
231 |             rolled_negative = line_context.get("previously_negative", False)  # Was there a negative pattern match?
232 | if rolled_negative: 233 | return False # As there was already a negative match so far, no need to check further 234 | 235 | is_negative_match = any(p.search(line) is not None for p in self.negative_patterns) 236 | line_context["previously_negative"] = is_negative_match 237 | 238 | return True # Same reasoning: this may be a negative match, but it affects this line, not a previous 239 | -------------------------------------------------------------------------------- /core/git_repo_querier.py: -------------------------------------------------------------------------------- 1 | from github import Github 2 | 3 | class GitRepoQuerier(): 4 | def __init__(self, org_name, github_token): 5 | self.github_connection = Github(github_token) 6 | self.organization = self.github_connection.get_organization(org_name) 7 | 8 | def get_file_contents(self, repo, filename, commit_id): 9 | repo = self.organization.get_repo(repo) 10 | file_contents = repo.get_contents(filename, commit_id) 11 | return file_contents.decoded_content 12 | -------------------------------------------------------------------------------- /core/git_repo_updater.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | import json 4 | import re 5 | import os 6 | 7 | import requests 8 | import requests.exceptions 9 | 10 | 11 | class GitRepoUpdater: 12 | def __init__(self, org_name, github_token, repo_list_path, logger): 13 | self.REPO_LIST_PATH = repo_list_path 14 | self.api_url = 'https://api.github.com/orgs/%s/repos' % (org_name) 15 | self.request_headers = {'Authorization': 'token %s' % github_token} 16 | self.token = github_token 17 | self.repo_attributes_to_store = ('name', 'language', 'full_name', 'private', 'fork') 18 | self.logger = logger 19 | 20 | self.actpage = 0 21 | self.lastpage = 0 22 | self.stop = False 23 | 24 | self.repo_list_cache = {} 25 | 26 | def refresh_repo_list(self): 27 | while self.actpage <= self.lastpage and not self.stop: 28 | self.fetch_repo_list("%s?page=%s" % (self.api_url, self.actpage)) 29 | self.actpage += 1 30 | 31 | def get_repo_attributes_from_repo_json_obj(self, repo_json_obj): 32 | repo_info_to_store = {} 33 | for repo_attribute in self.repo_attributes_to_store: 34 | repo_info_to_store[repo_attribute] = repo_json_obj[repo_attribute] 35 | repo_info_to_store['url_with_token'] = 'https://%%s@github.com/%s.git' % (repo_json_obj['full_name']) 36 | return repo_info_to_store 37 | 38 | def store_repo_attributes_from_response_json(self, response_json): 39 | for repo in response_json: 40 | repo_id = str(repo["id"]) 41 | if repo_id not in self.repo_list_cache: 42 | self.repo_list_cache[repo_id] = self.get_repo_attributes_from_repo_json_obj(repo) 43 | 44 | def fetch_repo_list(self, url): 45 | try: 46 | self.logger.info('Fetching repo list from: %s' % url) 47 | r = requests.get(url, verify=True, headers=self.request_headers) 48 | 49 | if r.status_code == 200: 50 | if 'X-RateLimit-Remaining' in r.headers: 51 | if int(r.headers['X-RateLimit-Remaining']) == 0: 52 | self.logger.warning('OUT OF RATELIMIT') 53 | self.stop = True 54 | return 55 | try: 56 | lasturl_re = re.compile('.*<([\w\:\/\.]+)\?page=([0-9]+)>; rel="last"') 57 | lasturl = lasturl_re.match(r.headers['link']).groups() 58 | self.lastpage = int(lasturl[1]) 59 | self.logger.debug("PAGE %s/%s" % (self.actpage, self.lastpage)) 60 | # TODO (KR): this is too broad. figure out what needs to be caught. 61 | except: 62 | self.logger.debug("... 
finished (PAGE: %s)" % self.actpage) 63 | self.logger.debug( 64 | "(rate limit: %s / %s)" % (r.headers['X-RateLimit-Remaining'], r.headers['X-RateLimit-Limit'])) 65 | self.store_repo_attributes_from_response_json(json.loads(r.text or r.content)) 66 | else: 67 | self.logger.error('github.com returned non-200 status code: %s' % r.text) 68 | self.stop = True 69 | except requests.exceptions.RequestException: 70 | self.logger.exception('Exception during HTTP request.') 71 | 72 | def write_repo_list_to_file(self): 73 | with open(self.REPO_LIST_PATH, 'w') as repo_file: 74 | json.dump(self.repo_list_cache, repo_file, indent=4, sort_keys=True) 75 | 76 | def read_repo_list_from_file(self): 77 | if not os.path.exists(self.REPO_LIST_PATH): 78 | return [] 79 | with open(self.REPO_LIST_PATH, 'r') as repo_file: 80 | return json.load(repo_file) 81 | 82 | def refresh_repos_and_detect_new_public_repos(self): 83 | new_public_repos = [] 84 | self.refresh_repo_list() 85 | original_repo_status = self.read_repo_list_from_file() 86 | for repo_id in self.repo_list_cache: 87 | if self.repo_list_cache[repo_id]["private"] == False: 88 | if repo_id not in original_repo_status: 89 | self.logger.debug("Totally new public repo %s" % self.repo_list_cache[repo_id]["name"]) 90 | new_public_repos.append(self.repo_list_cache[repo_id]) 91 | elif original_repo_status[repo_id]["private"] == True: 92 | self.logger.debug( 93 | "Previously private repo set to public %s" % self.repo_list_cache[repo_id]["name"]) 94 | new_public_repos.append(self.repo_list_cache[repo_id]) 95 | return new_public_repos 96 | -------------------------------------------------------------------------------- /core/notifier.py: -------------------------------------------------------------------------------- 1 | from email.mime.multipart import MIMEMultipart 2 | from email.mime.text import MIMEText 3 | 4 | import datetime 5 | import smtplib 6 | 7 | class EmailNotifierException(Exception): 8 | def __init__(self, error): 9 | self.error = error 10 | 11 | def __str__(self): 12 | return repr(self.error) 13 | 14 | class EmailNotifier: 15 | def __init__(self, email_from, email_to, subject, text, connection_string='localhost', 16 | smtp_username = None, smtp_password = None, use_tls = None): 17 | self.email_from = email_from 18 | self.email_to = email_to 19 | self.connection_string = connection_string 20 | self.mime_message = MIMEMultipart() 21 | self.prepare_message_headers(subject) 22 | self.create_mime_message(text) 23 | self.username = smtp_username 24 | self.password = smtp_password 25 | self.use_tls = use_tls 26 | 27 | @staticmethod 28 | def create_notification(from_addr, to_addr, subject, text, connection_string='localhost', 29 | smtp_username = None, smtp_password = None, use_tls = None): 30 | return EmailNotifier(from_addr, to_addr, subject, text, connection_string, 31 | smtp_username, smtp_password, use_tls) 32 | 33 | def prepare_message_headers(self, subject): 34 | self.subject = subject 35 | self.mime_message["Subject"] = subject 36 | self.mime_message["From"] = self.email_from 37 | self.mime_message["To"] = self.email_to 38 | 39 | def create_mime_message(self, text): 40 | self.text = text 41 | self.mime_message.attach(MIMEText(text.encode("utf-8"), "plain")) 42 | 43 | def send_if_fine(self): 44 | if self.email_from and self.email_to and self.mime_message: 45 | self.smtp_send() 46 | else: 47 | raise EmailNotifierException("Mails should have FROM, TO headers and a message as well!") 48 | 49 | def smtp_send(self): 50 | try: 51 | smtp = 
smtplib.SMTP(self.connection_string) 52 | if self.use_tls: 53 | smtp.starttls() 54 | if self.username is not None and self.password is not None: 55 | smtp.login(self.username, self.password) 56 | # this needs a separate try/catch because in other cases, the connection is automatically 57 | # closed (and attempts to close it throws a SMTPServerDisconnected exception, but in this case, 58 | # the connection stays open if it fails. This seems to be the cleanest way to handle it without 59 | # handling every single exception sendmail can throw. 60 | try: 61 | smtp.sendmail(self.email_from, self.email_to, self.mime_message.as_string()) 62 | except smtplib.SMTPException, e: 63 | smtp.quit() 64 | raise EmailNotifierException(e) 65 | except smtplib.SMTPException, e: 66 | raise EmailNotifierException(e) 67 | -------------------------------------------------------------------------------- /core/repository_handler.py: -------------------------------------------------------------------------------- 1 | import json 2 | import logging 3 | import os 4 | from collections import OrderedDict 5 | import subprocess 6 | import shutil 7 | 8 | 9 | module_logger = logging.getLogger("repoguard.repository_handler") 10 | 11 | 12 | class RepositoryException(Exception): 13 | pass 14 | 15 | 16 | def git_clone_or_pull(existing_repo_dirs, github_token, repo): 17 | if repo.dir_name in existing_repo_dirs: 18 | repo.git_reset_to_oldest_hash() 19 | if not repo.git_pull(github_token): 20 | # if there was any error on pulling, let's reclone the directory 21 | module_logger.debug('Git pull failed, reclone repository.') 22 | repo.remove() 23 | repo.git_clone(github_token) 24 | else: 25 | module_logger.debug('Repository not in existing repo dirs, cloning it.') 26 | repo.git_clone(github_token) 27 | 28 | repo.detect_new_commit_hashes() 29 | 30 | 31 | class Repository(): 32 | def __init__(self, repo_id, github_repo_json_response, working_directory, logger): 33 | self._status_json_attributes = ("name", "last_checked_commit_hashes") 34 | self.repo_id = repo_id 35 | self.name = github_repo_json_response["name"] 36 | self.working_directory = working_directory 37 | self.url_with_token = github_repo_json_response["url_with_token"] 38 | self.language = github_repo_json_response["language"] 39 | self.fork = github_repo_json_response["fork"] 40 | self.private = github_repo_json_response["private"] 41 | self.dir_name = '%s_%s' % (self.name, self.repo_id) 42 | self.full_dir_path = '%s%s' % (working_directory, self.dir_name) 43 | self.last_checked_commit_hashes = [] 44 | self.not_checked_commit_hashes = [] 45 | self.logger = logging.getLogger('repository_handler.Repository') 46 | 47 | def add_status_info_from_json(self, repo_status_info_json): 48 | self.last_checked_commit_hashes = repo_status_info_json["last_checked_commit_hashes"] 49 | 50 | def add_commit_hash_to_checked(self, rev_hash): 51 | if rev_hash not in self.get_last_checked_commit_hashes(): 52 | self.last_checked_commit_hashes.append(rev_hash) 53 | 54 | def get_last_commit_hashes(self): 55 | result = self.call_command("git rev-list --remotes --no-merges --max-count=100 HEAD") 56 | return result.split('\n')[:-1] if result is not None else [] 57 | 58 | def detect_new_commit_hashes(self): 59 | for commit_sha in self.get_last_commit_hashes(): 60 | if commit_sha not in self.get_last_checked_commit_hashes() \ 61 | and commit_sha not in self.get_not_checked_commit_hashes(): 62 | self.not_checked_commit_hashes.append(commit_sha) 63 | 64 | def get_last_checked_commit_hashes(self): 65 | 
return self.last_checked_commit_hashes 66 | 67 | def get_not_checked_commit_hashes(self): 68 | return self.not_checked_commit_hashes 69 | 70 | def get_rev_list_since_date(self, since): 71 | cmd = "git", "rev-list", "--remotes", "--no-merges", "--since=\"%s\"" % since, "HEAD" 72 | try: 73 | return subprocess.check_output(cmd, cwd=self.full_dir_path).split("\n")[:-1] 74 | except subprocess.CalledProcessError, e: 75 | error_msg = "Error when calling %s (cwd: %s): %s" % (repr(cmd), self.full_dir_path, e) 76 | self.logger.error(error_msg) 77 | return [] 78 | 79 | def git_reset_to_oldest_hash(self): 80 | if self.last_checked_commit_hashes: 81 | self.call_command("git reset --hard %s" % self.last_checked_commit_hashes[0]) 82 | 83 | def git_clone(self, token): 84 | # using git pull to avoid storing the token in .git/config 85 | # see: https://github.com/blog/1270-easier-builds-and-deployments-using-git-over-https-and-oauth 86 | os.mkdir(self.full_dir_path) 87 | self.call_command('git init') 88 | self.call_command("git pull %s" % (self.url_with_token % token), cwd=self.full_dir_path) 89 | 90 | def git_pull(self, token): 91 | return self.call_command("git pull %s" % (self.url_with_token % token), cwd=self.full_dir_path) 92 | 93 | def remove(self): 94 | try: 95 | shutil.rmtree(self.full_dir_path) 96 | except: 97 | self.logger.exception('Failed to remove repo_dir: %s', self.full_dir_path) 98 | 99 | def call_command(self, cmd, cwd=None): 100 | cwd = self.full_dir_path if not cwd else cwd 101 | self.logger.debug("Calling %s (cwd: %s)" % (cmd, cwd)) 102 | try: 103 | cmd_output = subprocess.check_output(cmd.split(), cwd=cwd) 104 | return cmd_output 105 | except: 106 | self.logger.exception("Error when calling %s (cwd: %s)" % (cmd, cwd)) 107 | return None 108 | 109 | def to_dict(self): 110 | state = self.__dict__.copy() 111 | for attr in self.__dict__: 112 | if attr not in self._status_json_attributes: 113 | del state[attr] 114 | return state 115 | 116 | def __getstate__(self): 117 | state = self.__dict__.copy() 118 | for attr in self.__dict__: 119 | if attr in ['logger']: 120 | del state[attr] 121 | return state 122 | 123 | def __setstate__(self, state): 124 | # Restore instance attributes (i.e., filename and lineno). 125 | self.__dict__.update(state) 126 | # Restore the previously opened file's state. To do so, we need to 127 | # reopen it and read from it until the line count is restored. 
128 | self.logger = logging.getLogger('repository_handler.Repository') 129 | 130 | 131 | class RepositoryHandler(): 132 | def __init__(self, working_directory, logger): 133 | self.logger = logger 134 | self.working_directory = working_directory 135 | self.repo_list_file = working_directory + 'repo_list.json' 136 | self.repo_status_file = working_directory + 'repo_status.json' 137 | self.repo_list = OrderedDict() 138 | self.create_repo_list_and_status_from_files() 139 | self.logger.debug("Repository handler started") 140 | 141 | def create_repo_list_and_status_from_files(self): 142 | repo_list = self.load_repo_list_from_file() 143 | for repo_id, repo_data in sorted(repo_list.iteritems(), key=lambda r: r[1]['name']): 144 | try: 145 | self.repo_list[repo_id] = Repository(repo_id, repo_list[repo_id], self.working_directory, self.logger) 146 | except KeyError: 147 | self.logger.exception('Got KeyError during Repository instantiation.') 148 | 149 | def load_status_info_from_file(self): 150 | repo_status_info = self.load_repo_status_from_file() 151 | for repo_id, repo_data in self.repo_list.iteritems(): 152 | if repo_status_info and repo_id in repo_status_info: 153 | self.get_repo_by_id(repo_id).add_status_info_from_json(repo_status_info[repo_id]) 154 | 155 | def get_repo_list(self): 156 | return self.repo_list.values() 157 | 158 | def get_repo_by_id(self, repo_id): 159 | return self.repo_list[repo_id] 160 | 161 | def load_repo_status_from_file(self): 162 | try: 163 | with open(self.repo_status_file) as repo_status: 164 | return json.load(repo_status) 165 | except IOError: 166 | self.logger.info("repo status file %s doesn't exist" % self.repo_status_file) 167 | return {} 168 | 169 | def load_repo_list_from_file(self): 170 | try: 171 | with open(self.repo_list_file) as repo_list: 172 | return json.load(repo_list) 173 | except IOError: 174 | self.logger.critical("repo list file %s doesn't exist" % self.repo_list_file) 175 | return {} 176 | 177 | def save_repo_status_to_file(self): 178 | if not self.repo_list: 179 | self.logger.warning('Got empty repository list, not updating status file!') 180 | return 181 | 182 | with open(self.repo_status_file, 'w') as repo_status: 183 | json.dump({k: v.to_dict() for k, v in self.repo_list.iteritems()}, repo_status, indent=4, sort_keys=True) 184 | -------------------------------------------------------------------------------- /core/ruleparser.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import sys 3 | import yaml 4 | import os 5 | 6 | class RuleLoaderException(Exception): 7 | def __init__(self, error, ctx=''): 8 | self.error = error 9 | self.ctx = ctx 10 | 11 | def __str__(self): 12 | return str(self.error) + ":\n" + str(self.ctx) 13 | 14 | class RuleLoader: 15 | file_name = None 16 | namespace = None 17 | autoincr_base = 0 18 | 19 | def __init__(self, file_name): 20 | self.file_name = file_name 21 | self.namespace = self._find_default_namespace() 22 | 23 | def load(self): 24 | with open(self.file_name) as f: 25 | content = f.read() 26 | return {self._get_key(c): self._load_yaml(c) for c in content.split('---') if len(c) > 0} 27 | 28 | @staticmethod 29 | def _load_yaml(text): 30 | try: 31 | return yaml.load(text) 32 | except yaml.YAMLError, exc: 33 | raise RuleLoaderException("Error loading yaml", text) 34 | 35 | def _find_default_namespace(self): 36 | dpos = self.file_name.rfind("/") 37 | if dpos < 0: 38 | dpos = 0 39 | else: 40 | dpos += 1 41 | ppos = self.file_name.rfind(".") 42 | if ppos < 0: 43 | 
ppos = len(self.file_name)
44 | 
45 |         return self.file_name[dpos:ppos]
46 | 
47 |     def _get_key(self, document):
48 |         d = document.lstrip()
49 |         if d.startswith("#!"):
50 |             end = d.find("\n")
51 |             return "%s::%s" % (self.namespace, d[2:end].strip())
52 |         else:
53 |             self.autoincr_base += 1
54 |             return "%s::gen%d" % (self.namespace, self.autoincr_base)
55 | 
56 | 
57 | # Helper method to load configs in a dir
58 | def load_rules(rule_dir):
59 |     rules = {}
60 |     for (dirpath, dirnames, filenames) in os.walk(rule_dir, followlinks=True):
61 |         for filename in filenames:
62 |             if filename.endswith(".yml"):
63 |                 try:
64 |                     rules.update(RuleLoader(os.path.join(dirpath, filename)).load())
65 |                 except Exception as e:
66 |                     raise RuleLoaderException("Error parsing file %s" % filename, str(e)), \
67 |                         None, sys.exc_info()[2]
68 |     return rules
69 | 
70 | 
71 | # Resolves rule hierarchy, and omits abstract rules
72 | def build_resolved_ruleset(rules):
73 |     return {name: resolve_rule(name, rules) for name in rules if not _is_abstract(name)}
74 | 
75 | 
76 | def _is_abstract(rule_name):
77 |     ns, name = rule_name.split("::")
78 |     return name.startswith("~")
79 | 
80 | 
81 | # Resolves a rule
82 | def resolve_rule(rule_name, ruleset, in_progress=()):
83 |     namespace, localname = rule_name.split("::")
84 |     if rule_name not in ruleset:
85 |         abstract_name = "%s::~%s" % (namespace, localname)
86 |         if abstract_name not in ruleset:
87 |             raise RuleLoaderException("Unknown rule: %s, ruleset: %s" % (rule_name, ruleset))
88 |         else:
89 |             rule_name = abstract_name
90 |     if rule_name in in_progress:
91 |         raise RuleLoaderException("Circular dependencies found: %s -> %s" % (" -> ".join(in_progress), rule_name))
92 |     rule_specs = ruleset[rule_name]
93 |     if "extends" in rule_specs:
94 |         base_rule_names = [b.strip() for b in rule_specs["extends"].split(",")]
95 |         base_rule_fqdns = ["%s::%s" % (namespace, rn) if "::" not in rn else rn for rn in base_rule_names]
96 |         base_rules = [resolve_rule(rname, ruleset, in_progress + (rule_name,)) for rname in base_rule_fqdns]
97 |         return merge_many_rules(rule_specs, base_rules)
98 |     else:
99 |         return rule_specs
100 | 
101 | 
102 | def merge_many_rules(target, sources):
103 |     rule = copy.deepcopy(target)
104 |     for base_rule in sources:
105 |         merge_rules(rule, base_rule)
106 |     return rule
107 | 
108 | 
109 | def merge_rules(target, source):
110 |     if isinstance(source, dict):
111 |         for k, v in source.iteritems():
112 |             if isinstance(target, dict) and k not in target:
113 |                 target[k] = v
114 |             elif isinstance(target, list) and {k: v} not in target:
115 |                 target.append({k: v})
116 |             else:
117 |                 merge_rules(target[k], v)
118 |     elif isinstance(source, list):
119 |         for e in source:
120 |             if e not in target:
121 |                 target.append(e)
--------------------------------------------------------------------------------
/etc/config.yml.template:
--------------------------------------------------------------------------------
1 | github:
2 |   token: ""
3 |   organization_name: ""
4 | 
5 | git:
6 |   # whether to run git diffs with rename detection or not
7 |   # if disabled, file renames act like file additions
8 |   detect_rename: false
9 | 
10 | # repository names which should not be checked/fetched from github
11 | skip_repo_list:
12 |   - internal-junk-repo
13 | 
14 | repo_groups:
15 |   local_repos:
16 |     - aws_tools
17 |   tool_repos:
18 |     - gradle-plugin
19 | 
20 | rules_to_groups:
21 |   '*':
22 |     - except: something::not_applicable
23 |   local_repos:
24 |     - except: os_code_exec::.*
25 |     - except: generic_best_practices::unsecure_protocol
26 |   tool_repos:
27 |     - except: os_code_exec::.*
28 | 
29 | # emails will be sent with this from address by default
30 | default_notification_src_address: ""
31 | 
32 | # emails will be sent to this address by default
33 | default_notification_to_address: ""
34 | 
35 | # SMTP defaults
36 | smtp:
37 |   host: "localhost"
38 |   port: 25
39 |   username: null
40 |   password: null
41 |   use_tls: false
42 | 
43 | # send email notifications on errors?
44 | notifications: false
45 | # defines where to send the alerts
46 | # '*' and '?' are wildcards
47 | subscribers:
48 |   "*": ["root@localhost"]
49 |   "secrets::*": ["security@localhost"]
50 | 
51 | # list of rules which should run on full scans triggered by repository becoming public (or false if there are none)
52 | full_scan_triggered_rules: false
--------------------------------------------------------------------------------
/guardserver/.gitignore:
--------------------------------------------------------------------------------
1 | **/bootstrap*
2 | **/jquery.*
3 | config.py
4 | **/prettify.js
5 | **/datepicker*
6 | **/config.js
7 | **/typeahead*
8 | 
--------------------------------------------------------------------------------
/guardserver/default_config.py:
--------------------------------------------------------------------------------
1 | # Application Config
2 | DEBUG = True
3 | SECRET_KEY = "secretkeybro"
4 | 
5 | LDAP_USERNAME = ""
6 | LDAP_PASSWORD = ""
7 | 
8 | AUTHENTICATION_REQUIRED = False
9 | 
10 | # LDAP Configuration
11 | LDAP_DN = "cn=%s,ou=people,dc=example,dc=com"
12 | LDAP_SERVER = ""
13 | LDAP_OU = ""
14 | 
15 | 
16 | # Elastic Search Configuration
17 | ELASTIC_HOST = "localhost"
18 | ELASTIC_PORT = "9200"
19 | INDEX = "repoguard"
20 | DOC_TYPE = "repoguard"
21 | 
22 | # Github Configuration
23 | GITHUB_TOKEN = ""
24 | ORG_NAME = ""
--------------------------------------------------------------------------------
/guardserver/guardserver.py:
--------------------------------------------------------------------------------
1 | from server import app
2 | 
3 | app.run(debug=True, host="0.0.0.0")
4 | 
5 | 
--------------------------------------------------------------------------------
/guardserver/repoguard.apache.vhost.sample:
--------------------------------------------------------------------------------
1 | NameVirtualHost *:443
2 | 
3 | <VirtualHost *:443>
4 |     ServerName repoguard.example.com
5 |     ServerAlias repoguard.example.com
6 |     SSLEngine On
7 |     SSLCertificateFile /etc/ssl/certs/cert.pem
8 |     SSLCertificateKeyFile /etc/ssl/private/private.key
9 |     ErrorLog ${APACHE_LOG_DIR}/repoguard.error.log
10 |     WSGIDaemonProcess repoguard user=www-data group=www-data threads=100
11 |     WSGIScriptAlias / /path/to/repoguard/repoguard.wsgi
12 |     WSGIPassAuthorization On
13 | 
14 |     <Directory /path/to/repoguard>
15 |         WSGIProcessGroup repoguard
16 |         WSGIApplicationGroup %{GLOBAL}
17 |         Order deny,allow
18 |         Allow from all
19 |     </Directory>
20 | 
21 | </VirtualHost>
--------------------------------------------------------------------------------
/guardserver/repoguard.wsgi:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('/path/to/repoguard')
3 | sys.path.append('/path/to/repoguard/server')
4 | 
5 | from server import app as application
--------------------------------------------------------------------------------
/guardserver/server/__init__.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, redirect
2 | app = Flask(__name__, static_folder="../static")
3 | 
app.config.from_object('config') 4 | 5 | @app.route("/") 6 | def index_route(): 7 | return redirect("/static/index.html") 8 | 9 | from issues import get_issues 10 | from filters import filter_by_commit 11 | -------------------------------------------------------------------------------- /guardserver/server/authentication.py: -------------------------------------------------------------------------------- 1 | from . import app 2 | import ldap 3 | from flask import request, Response, make_response 4 | from functools import wraps 5 | import re 6 | import json 7 | 8 | def check_auth(username, password): 9 | """This function is called to check if a username / 10 | password combination is valid. 11 | """ 12 | valid = re.match("^[\w.]+$", username) is not None 13 | if not valid: 14 | return False 15 | user_dn = app.config["LDAP_DN"] % username 16 | connect = ldap.initialize(app.config["LDAP_SERVER"]) 17 | try: 18 | connect.bind_s(user_dn, password) 19 | if "CURRENT_USER" not in app.config: 20 | result = connect.search_st(app.config["LDAP_OU"], ldap.SCOPE_SUBTREE, "cn=" + username) 21 | given_name = result[0][1]["givenName"][0] 22 | last_name = result[0][1]["sn"][0] 23 | name = given_name + " " + last_name 24 | app.config["CURRENT_USER"] = name 25 | return True 26 | except ldap.LDAPError, e: 27 | connect.unbind_s() 28 | return False 29 | 30 | 31 | def authenticate(): 32 | """Sends a 401 response that enables basic auth""" 33 | return Response( 34 | 'Could not verify your access level for that URL.\n' 35 | 'You have to login with proper credentials', 401, 36 | {'WWW-Authenticate': 'Basic realm="Login Required"'}) 37 | 38 | 39 | def requires_auth(f): 40 | @wraps(f) 41 | def decorated(*args, **kwargs): 42 | auth = request.authorization 43 | if app.config["AUTHENTICATION_REQUIRED"]: 44 | if not auth or not check_auth(auth.username, auth.password): 45 | return authenticate() 46 | return f(*args, **kwargs) 47 | 48 | return decorated 49 | 50 | @app.route("/current_user", methods=["GET"]) 51 | @requires_auth 52 | def current_user(): 53 | user = "" 54 | if "CURRENT_USER" in app.config: 55 | user = app.config["CURRENT_USER"] 56 | result = dict(name=user) 57 | response = make_response(json.dumps(result)) 58 | response.headers["Content-Type"] = "application/json" 59 | return response 60 | -------------------------------------------------------------------------------- /guardserver/server/filters.py: -------------------------------------------------------------------------------- 1 | from . 
import app
2 | from authentication import requires_auth
3 | from core.elastic_search_helpers import ElasticSearchHelpers
4 | from core.datastore import DataStore, DataStoreException
5 | from flask import request, make_response
6 | import arrow
7 | import json
8 | 
9 | 
10 | # helper class
11 | class HashableDict(dict):
12 |     def __hash__(self):
13 |         return hash(tuple(sorted(self.items())))
14 | 
15 | def get_data_store():
16 |     datastore = DataStore(host=app.config["ELASTIC_HOST"], port=app.config["ELASTIC_PORT"],
17 |                           default_index=app.config["INDEX"], default_doctype=app.config["DOC_TYPE"])
18 |     return datastore
19 | 
20 | @app.route("/filter/commits", methods=["GET"])
21 | @requires_auth
22 | def filter_by_commit():
23 |     from_time = request.args.get("start_time", arrow.utcnow().replace(days=-7))
24 |     to_time = request.args.get("end_time", arrow.utcnow())
25 |     start = int(request.args.get("from", 0))
26 |     end = int(request.args.get("size", 100))
27 |     start_time = int(arrow.get(from_time).float_timestamp * 1000)
28 |     end_time = int(arrow.get(to_time).float_timestamp * 1000)
29 |     false_positive = request.args.get("false_positive", "false")
30 |     sort_order = ElasticSearchHelpers.create_sort(True)
31 |     time_filter = ElasticSearchHelpers.create_timestamp_filter(start_time, end_time)
32 |     query_filter = ElasticSearchHelpers.create_query_string_filter("false_positive:" + false_positive)
33 |     try:
34 |         query = ElasticSearchHelpers.create_elasticsearch_filtered_query(filtered_query=query_filter,
35 |                                                                          timestamp_filter=time_filter,
36 |                                                                          sort_order=sort_order)
37 |         datastore = get_data_store()
38 |         params = dict(from_=start)
39 |         params["size"] = end
40 |         params["_source"] = "commit_id,commit_description"
41 |         results = datastore.search(query=query, params=params)
42 |         commits = set()
43 |         for result in results["hits"]["hits"]:
44 |             commit_and_description = HashableDict(commit=result["_source"]["commit_id"])
45 |             commit_and_description["description"] = result["_source"]["commit_description"]
46 |             commits.add(commit_and_description)
47 |         # sets are not JSON serializable
48 |         response = make_response(json.dumps(list(commits)))
49 |         response.headers["Content-Type"] = "application/json"
50 |         return response
51 |     except DataStoreException:
52 |         return "Failed to retrieve commits", 500
53 | 
54 | @app.route("/filter/reviewers", methods=["GET"])
55 | @requires_auth
56 | def filter_by_reviewer():
57 |     from_time = request.args.get("start_time", arrow.utcnow().replace(days=-7))
58 |     to_time = request.args.get("end_time", arrow.utcnow())
59 |     start = int(request.args.get("from", 0))
60 |     end = int(request.args.get("size", 100))
61 |     start_time = int(arrow.get(from_time).float_timestamp * 1000)
62 |     end_time = int(arrow.get(to_time).float_timestamp * 1000)
63 |     false_positive = request.args.get("false_positive", "false")
64 |     sort_order = ElasticSearchHelpers.create_sort(True)
65 |     time_filter = ElasticSearchHelpers.create_timestamp_filter(start_time, end_time)
66 |     query_filter = ElasticSearchHelpers.create_query_string_filter("false_positive:" + false_positive)
67 |     try:
68 |         query = ElasticSearchHelpers.create_elasticsearch_filtered_query(filtered_query=query_filter,
69 |                                                                          timestamp_filter=time_filter,
70 |                                                                          sort_order=sort_order)
71 |         datastore = get_data_store()
72 |         params = dict(from_=start)
73 |         params["size"] = end
74 |         params["_source"] = "last_reviewer"
75 |         results = datastore.search(query=query, params=params)
76 |         reviewers = set()
77 |         for result in results["hits"]["hits"]:
78 | 
reviewers.add(result["_source"]["last_reviewer"]) 79 | # sets are not JSON serializable 80 | response = make_response(json.dumps(list(reviewers))) 81 | response.headers["Content-Type"] = "application/json" 82 | return response 83 | except DataStoreException: 84 | return "Failed to retrieve reviewers", 500 85 | 86 | @app.route("/filter/rules", methods=["GET"]) 87 | @requires_auth 88 | def filter_by_rule(): 89 | from_time = request.args.get("start_time", arrow.utcnow().replace(days=-7)) 90 | to_time = request.args.get("end_time", arrow.utcnow()) 91 | start = int(request.args.get("from", 0)) 92 | end = int(request.args.get("size", 100)) 93 | start_time = int(arrow.get(from_time).float_timestamp * 1000) 94 | end_time = int(arrow.get(to_time).float_timestamp * 1000) 95 | false_positive = request.args.get("false_positive", "false") 96 | sort_order = ElasticSearchHelpers.create_sort(True) 97 | time_filter = ElasticSearchHelpers.create_timestamp_filter(start_time, end_time) 98 | query_filter = ElasticSearchHelpers.create_query_string_filter("false_positive:" + false_positive) 99 | try: 100 | query = ElasticSearchHelpers.create_elasticsearch_filtered_query(filtered_query=query_filter, 101 | timestamp_filter=time_filter, 102 | sort_order=sort_order) 103 | datastore = get_data_store() 104 | params = dict(from_=start) 105 | params["size"] = end 106 | params["_source"] = "check_id" 107 | results = datastore.search(query=query, params=params) 108 | rules = set() 109 | for result in results["hits"]["hits"]: 110 | rule = result["_source"]["check_id"] 111 | rules.add(rule) 112 | # sets are not JSON serializable 113 | response = make_response(json.dumps(list(rules))) 114 | response.headers["Content-Type"] = "application/json" 115 | return response 116 | except DataStoreException: 117 | return "Failed to retrieve rules", 500 118 | 119 | @app.route("/filter/repos", methods=["GET"]) 120 | @requires_auth 121 | def filter_by_repo(): 122 | try: 123 | query = ElasticSearchHelpers.create_elasticsearch_aggregate_query("repo_name") 124 | datastore = get_data_store() 125 | results = datastore.search(query=query) 126 | repos = set() 127 | for result in results["aggregations"]["my_aggregation"]["buckets"]: 128 | repo = result["key"] 129 | repos.add(repo) 130 | # sets are not JSON serializable 131 | response = make_response(json.dumps(list(repos))) 132 | response.headers["Content-Type"] = "application/json" 133 | return response 134 | except DataStoreException: 135 | return "Failed to retrieve repos", 500 136 | 137 | -------------------------------------------------------------------------------- /guardserver/server/issues.py: -------------------------------------------------------------------------------- 1 | from . 
import app 2 | from authentication import requires_auth 3 | from core.elastic_search_helpers import ElasticSearchHelpers 4 | from core.datastore import DataStore, DataStoreException 5 | from core.git_repo_querier import GitRepoQuerier 6 | from flask import request, make_response, escape 7 | import arrow 8 | import json 9 | import urllib 10 | 11 | def get_data_store(): 12 | datastore = DataStore(host=app.config["ELASTIC_HOST"], port=app.config["ELASTIC_PORT"], 13 | default_index=app.config["INDEX"], default_doctype=app.config["DOC_TYPE"]) 14 | return datastore 15 | 16 | @app.route('/issues/', methods=['GET']) 17 | @requires_auth 18 | # Takes start_time/end_time timestamps plus from/size paging as query arguments 19 | def get_issues(): 20 | from_time = request.args.get("start_time") 21 | to_time = request.args.get("end_time") 22 | start = int(request.args.get("from", 0)) 23 | end = int(request.args.get("size", 100)) 24 | start_time = int(arrow.get(from_time).float_timestamp * 1000) 25 | end_time = int(arrow.get(to_time).float_timestamp * 1000) 26 | false_positive = request.args.get("false_positive", "false") 27 | sort_order = ElasticSearchHelpers.create_sort(True) 28 | time_filter = ElasticSearchHelpers.create_timestamp_filter(start_time, end_time) 29 | query_filter = ElasticSearchHelpers.create_query_string_filter("false_positive:" + false_positive) 30 | try: 31 | query = ElasticSearchHelpers.create_elasticsearch_filtered_query(filtered_query=query_filter, 32 | timestamp_filter=time_filter, 33 | sort_order=sort_order) 34 | datastore = get_data_store() 35 | params = dict(from_=start) 36 | params["size"] = end 37 | results = datastore.search(query=query, params=params) 38 | issues = make_issues_object(results["hits"]["hits"], results["hits"]["total"]) 39 | response = make_response(json.dumps(issues)) 40 | response.headers["Content-Type"] = "application/json" 41 | return response 42 | except DataStoreException: 43 | return "Failed to retrieve issues", 500 44 | 45 | @app.route('/issue/<issue_id>') 46 | @requires_auth 47 | def get_issue(issue_id): 48 | try: 49 | datastore = get_data_store() 50 | result = datastore.get(issue_id=issue_id) 51 | response = make_response(json.dumps(result)) 52 | response.headers["Content-Type"] = "application/json" 53 | return response 54 | except DataStoreException: 55 | return "Failed to retrieve issues", 500 56 | 57 | @app.route('/issues/commit/<commit_id>') 58 | @requires_auth 59 | def get_issues_by_commit(commit_id): 60 | start = int(request.args.get("from", 0)) 61 | end = int(request.args.get("size", 100)) 62 | query = ElasticSearchHelpers.create_elasticsearch_simple_query(search_parameter="commit_id", 63 | search_string=commit_id) 64 | query["from_"] = start 65 | query["size"] = end 66 | try: 67 | datastore = get_data_store() 68 | results = datastore.search(params=query) 69 | issues = make_issues_object(results["hits"]["hits"], results["hits"]["total"]) 70 | response = make_response(json.dumps(issues)) 71 | response.headers["Content-Type"] = "application/json" 72 | return response 73 | except DataStoreException: 74 | return "Failed to retrieve issues by commit", 500 75 | 76 | @app.route('/issues/rule/<check_id>') 77 | @requires_auth 78 | def get_issues_by_rule(check_id): 79 | start = int(request.args.get("from", 0)) 80 | end = int(request.args.get("size", 100)) 81 | query = ElasticSearchHelpers.create_elasticsearch_simple_query(search_parameter="check_id", 82 | search_string=urllib.quote_plus(check_id)) 83 | query["from_"] = start 84 | query["size"] = end 85 | try: 86 | datastore = get_data_store() 87 | results = 
datastore.search(params=query) 88 | issues = make_issues_object(results["hits"]["hits"], results["hits"]["total"]) 89 | response = make_response(json.dumps(issues)) 90 | response.headers["Content-Type"] = "application/json" 91 | return response 92 | except DataStoreException: 93 | return "Failed to retrieve issues by rule", 500 94 | 95 | @app.route('/issues/reviewer/<reviewer>') 96 | @requires_auth 97 | def get_issues_by_reviewer(reviewer): 98 | start = int(request.args.get("from", 0)) 99 | end = int(request.args.get("size", 100)) 100 | query = ElasticSearchHelpers.create_elasticsearch_simple_query(search_parameter="last_reviewer", 101 | search_string=urllib.quote_plus(reviewer)) 102 | query["from_"] = start 103 | query["size"] = end 104 | try: 105 | datastore = get_data_store() 106 | results = datastore.search(params=query) 107 | issues = make_issues_object(results["hits"]["hits"], results["hits"]["total"]) 108 | response = make_response(json.dumps(issues)) 109 | response.headers["Content-Type"] = "application/json" 110 | return response 111 | except DataStoreException: 112 | return "Failed to retrieve issues by reviewer", 500 113 | 114 | @app.route('/issues/repo/<repo>') 115 | @requires_auth 116 | def get_issues_by_repo(repo): 117 | start = int(request.args.get("from", 0)) 118 | end = int(request.args.get("size", 100)) 119 | query = ElasticSearchHelpers.create_elasticsearch_simple_query(search_parameter="repo_name", 120 | search_string=urllib.quote_plus(repo)) 121 | query["from_"] = start 122 | query["size"] = end 123 | try: 124 | datastore = get_data_store() 125 | results = datastore.search(params=query) 126 | issues = make_issues_object(results["hits"]["hits"], results["hits"]["total"]) 127 | response = make_response(json.dumps(issues)) 128 | response.headers["Content-Type"] = "application/json" 129 | return response 130 | except DataStoreException: 131 | return "Failed to retrieve issues by repo", 500 132 | 133 | @app.route('/issue/get_contents/<commit_id>') 134 | @requires_auth 135 | def get_file_contents_by_commit(commit_id): 136 | file_path = request.args.get("file_path") 137 | repo = request.args.get("repo") 138 | if not file_path or not repo: 139 | return "File Path/Repository is required", 400 140 | github_querier = GitRepoQuerier(app.config["ORG_NAME"], app.config["GITHUB_TOKEN"]) 141 | file_contents = github_querier.get_file_contents(repo=repo, filename=file_path, commit_id=commit_id) 142 | response = make_response(file_contents) 143 | response.headers["Content-Type"] = "text/plain" 144 | return response 145 | 146 | @app.route('/issue/status/<issue_id>', methods=['PUT']) 147 | def update_issue_state(issue_id): 148 | if "status" in request.form and "current_user" in request.form: 149 | changed_status = request.form["status"] 150 | current_user = request.form["current_user"] 151 | else: 152 | return "Changed Status Value Required.", 400 153 | 154 | doc = ElasticSearchHelpers.create_elasticsearch_doc({"false_positive": changed_status, 155 | "last_reviewer": current_user}) 156 | try: 157 | datastore = get_data_store() 158 | datastore.update(index_id=issue_id, doc=doc) 159 | response = make_response("Completed") 160 | return response 161 | except DataStoreException: 162 | return "Failed to update issue status", 500 163 | 164 | 165 | def make_issues_object(results, total): 166 | issues = dict(total=total) 167 | issues["issues"] = [] 168 | for result in results: 169 | issue = dict(id=result["_id"]) 170 | issue["_source"] = result["_source"] 171 | issues["issues"].append(issue) 172 | return issues 173 | 174 | 
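For reference, a minimal sanity-check client for the endpoints above — the host, port and the disabled authentication are assumptions for illustration, not part of this repo:

```python
# Hypothetical client for the /issues/ endpoint above; assumes guardserver
# is reachable on localhost:5000 with AUTHENTICATION_REQUIRED = False.
import arrow
import requests

params = {
    "start_time": arrow.utcnow().replace(days=-7).isoformat(),  # same one-week window the filters default to
    "end_time": arrow.utcnow().isoformat(),
    "from": 0,            # paging offset, mapped to Elasticsearch's from_
    "size": 10,           # page size
    "false_positive": "false",
}
resp = requests.get("http://localhost:5000/issues/", params=params)
data = resp.json()
print "%s issues total" % data["total"]
for issue in data["issues"]:
    print issue["id"], issue["_source"]["check_id"], issue["_source"]["repo_name"]
```

Flipping an issue's review status works the same way: a `requests.put("http://localhost:5000/issue/status/<id>", data={"status": "true", "current_user": "reviewer"})` (with `<id>` being an Elasticsearch document id) sends exactly the form fields the PUT handler above expects.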
-------------------------------------------------------------------------------- /guardserver/setup_ui_dependencies.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Setup the static dependencies for Guard UI 4 | 5 | # Create directories 6 | mkdir -p static/css 7 | mkdir -p static/js 8 | 9 | # Download Bootstrap 10 | curl https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css > static/css/bootstrap.min.css 11 | curl https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js > static/js/bootstrap.min.js 12 | 13 | # Download jQuery 14 | curl https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js > static/js/jquery.min.js 15 | 16 | # Get Prettify 17 | curl https://google-code-prettify.googlecode.com/svn/loader/run_prettify.js > static/js/prettify.js 18 | 19 | # Get datepicker 20 | curl https://raw.githubusercontent.com/eternicode/bootstrap-datepicker/release/js/bootstrap-datepicker.js > static/js/bootstrap-datepicker.js 21 | curl https://raw.githubusercontent.com/eternicode/bootstrap-datepicker/release/css/datepicker3.css > static/css/datepicker3.css 22 | curl https://raw.githubusercontent.com/eternicode/bootstrap-datepicker/release/css/datepicker.css > static/css/datepicker.css 23 | 24 | # Get Typeahead and Bloodhound 25 | curl https://raw.githubusercontent.com/twitter/typeahead.js/master/dist/typeahead.bundle.min.js > static/js/typeahead.bundle.min.js 26 | -------------------------------------------------------------------------------- /guardserver/static/css/repoguard.css: -------------------------------------------------------------------------------- 1 | body{ 2 | color: rgba(0,0,0,.6); 3 | font-weight: 300; 4 | } 5 | 6 | .navbar-inverse{ 7 | background-color: rgba(0,0,0,.9); 8 | letter-spacing: 1.5px; 9 | } 10 | 11 | .navbar-inverse li a{ 12 | padding-left: 2em; 13 | padding-right: 2em; 14 | } 15 | 16 | .nav-emphasis{ 17 | color: #d7d7d7; 18 | } 19 | 20 | .navbar-brand img{ 21 | max-width: 80%; 22 | max-height: 80%; 23 | margin-bottom: 10em; 24 | } 25 | 26 | .navbar-brand:hover .nav-emphasis{ 27 | color: #fff; 28 | } 29 | 30 | .content { 31 | margin-top: 80px; 32 | } 33 | 34 | a { 35 | color: #27AAE1; 36 | } 37 | 38 | .input-daterange { 39 | margin-top: 7px; 40 | } 41 | 42 | .input-date-field, 43 | .dropdown-toggle, 44 | .input-group-sm > .input-group-btn > .btn.dropdown-toggle, 45 | .typeahead { 46 | border: none; 47 | border-bottom: 1px dashed #27AAE1; 48 | background: none; 49 | box-shadow: none; 50 | color: rgba(0,0,0,.7); 51 | text-transform: uppercase; 52 | font-weight: 200; 53 | } 54 | 55 | .typeahead{ 56 | text-transform: none; 57 | } 58 | 59 | .table { 60 | table-layout: fixed; 61 | margin-top: 30px; 62 | } 63 | 64 | .table-striped > tbody > tr:nth-child(odd) > td { 65 | background-color: #f9f9f9; 66 | } 67 | 68 | .table-striped > tbody > tr:hover > td{ 69 | background-color: #f7fdff; 70 | } 71 | 72 | .table th, .table td { word-wrap: break-word } 73 | 74 | th, 75 | .navbar{ 76 | font-weight: 400; 77 | text-transform: uppercase; 78 | font-kerning: normal; 79 | letter-spacing: 1px; 80 | } 81 | 82 | th{ 83 | font-size: .8em; 84 | } 85 | 86 | .btn-status { 87 | float: right; 88 | vertical-align: middle; 89 | visibility: hidden; 90 | border: 1px solid #ccc; 91 | background-color: #f9f9f9; 92 | text-transform: uppercase; 93 | font-weight: 200; 94 | font-size: .5em; 95 | color: rgba(0,0,0,.4); 96 | letter-spacing: .6px; 97 | border-radius: 3px; 98 | } 99 | 100 | .btn-status .glyphicon{ 101 | 
font-size: 2em; 102 | opacity: .6; 103 | } 104 | 105 | .btn-status:hover{ 106 | background-color: #e58481; 107 | border: 1px solid #d9534f; 108 | color: rgba(255,255,255,.6); 109 | } 110 | 111 | tr:hover .btn-status { 112 | visibility: visible; 113 | } 114 | 115 | .btn-refresh { 116 | background: none; 117 | color: rgba(0,0,0,.3); 118 | font-size: .8em; 119 | padding: 0; 120 | line-height: normal; 121 | margin-left: .5em; 122 | border: none; 123 | } 124 | 125 | .pager .disabled > a, 126 | .pager .disabled > a:hover, 127 | .pager .disabled > a:focus, 128 | .pager .previous > a, 129 | .pager .next > a, 130 | .pager .li > a { 131 | border: none; 132 | color: #27AAE1; 133 | } 134 | 135 | .btn-danger:hover{ 136 | background: none; 137 | color: #e58481; 138 | } 139 | 140 | 141 | 142 | .modal-title{ 143 | text-transform: uppercase; 144 | font-weight: 300; 145 | letter-spacing: 1px; 146 | font-size: .9em; 147 | padding-top: .5em; 148 | } 149 | 150 | pre.prettyprint, 151 | pre.prettyprinted{ 152 | border: 1px solid #ccc; 153 | border-radius: 0; 154 | } 155 | 156 | .tt-query, 157 | .tt-hint { 158 | width: 396px; 159 | height: 30px; 160 | padding: 8px 12px; 161 | font-size: 12px; 162 | line-height: 30px; 163 | border: 2px solid #ccc; 164 | border-radius: 8px; 165 | outline: none; 166 | } 167 | 168 | .tt-query { 169 | box-shadow: inset 0 1px 1px rgba(0, 0, 0, 0.075); 170 | } 171 | 172 | .tt-hint { 173 | color: #999 174 | } 175 | 176 | .tt-dropdown-menu { 177 | width: 422px; 178 | margin-top: 12px; 179 | padding: 8px 0; 180 | background-color: #fff; 181 | border: 1px solid #ccc; 182 | border: 1px solid rgba(0, 0, 0, 0.2); 183 | border-radius: 8px; 184 | box-shadow: 0 5px 10px rgba(0,0,0,.2); 185 | } 186 | 187 | .tt-suggestion { 188 | padding: 3px 20px; 189 | font-size: 12px; 190 | line-height: 24px; 191 | } 192 | 193 | .tt-suggestion.tt-is-under-cursor { /* UPDATE: newer versions use .tt-suggestion.tt-cursor */ 194 | color: #fff; 195 | background-color: #0097cf; 196 | 197 | } 198 | .tt-input { 199 | font-size: 12px; 200 | } 201 | 202 | .tt-suggestion p { 203 | margin: 0; 204 | } 205 | -------------------------------------------------------------------------------- /guardserver/static/fonts/glyphicons-halflings-regular.eot: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prezi/repoguard/e0bf5ca9ecdff3472740f07e416ae85133f5a914/guardserver/static/fonts/glyphicons-halflings-regular.eot -------------------------------------------------------------------------------- /guardserver/static/fonts/glyphicons-halflings-regular.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prezi/repoguard/e0bf5ca9ecdff3472740f07e416ae85133f5a914/guardserver/static/fonts/glyphicons-halflings-regular.ttf -------------------------------------------------------------------------------- /guardserver/static/fonts/glyphicons-halflings-regular.woff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prezi/repoguard/e0bf5ca9ecdff3472740f07e416ae85133f5a914/guardserver/static/fonts/glyphicons-halflings-regular.woff -------------------------------------------------------------------------------- /guardserver/static/img/repo_guard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prezi/repoguard/e0bf5ca9ecdff3472740f07e416ae85133f5a914/guardserver/static/img/repo_guard.png 
-------------------------------------------------------------------------------- /guardserver/static/index.html: -------------------------------------------------------------------------------- [The HTML markup of this file was lost in extraction; only its text content survives. The page is titled "Repoguard Issues" and holds a date-range picker ("from"/"to" fields), filter and pagination controls, and an issues table with the column headers Repository, Matching Line, File Name, Commit Description, Triggered By and Reviewed — the elements that the scripts below attach to.] -------------------------------------------------------------------------------- /guardserver/static/js/daterange.js: -------------------------------------------------------------------------------- 1 | jQuery(function(){ 2 | var current_date = new Date((new Date()).toISOString()); 3 | var default_from_date = new Date(); 4 | default_from_date.setDate(default_from_date.getDate() - 7); 5 | $("#to-date").datepicker("setDate", current_date); 6 | $("#from-date").datepicker("setDate", default_from_date); 7 | 8 | localStorage.setItem("end_date", current_date.toISOString()); 9 | localStorage.setItem("start_date", default_from_date.toISOString()); 10 | 11 | $("#refresh-issues").click(function(){ 12 | localStorage.setItem("end_date", $("#to-date").datepicker("getDate").toISOString()); 13 | localStorage.setItem("start_date", $("#from-date").datepicker("getDate").toISOString()); 14 | get_issues(); 15 | }) 16 | }); -------------------------------------------------------------------------------- /guardserver/static/js/filters.js: -------------------------------------------------------------------------------- 1 | jQuery(function(){ 2 | var filter = $("#filter"); 3 | $("#commit").click(function(){ 4 | var engine = new Bloodhound({ 5 | name: 'commits', 6 | prefetch: { 7 | url: '/filter/commits?size=' + localStorage.getItem("issues") 8 | }, 9 | datumTokenizer: function(data) { 10 | return Bloodhound.tokenizers.whitespace(data.description); 11 | }, 12 | queryTokenizer: Bloodhound.tokenizers.whitespace 13 | }); 14 | 15 | initialize_typeahead(engine, filter); 16 | 17 | $("#filter").typeahead(null, { 18 | displayKey: function(data) { 19 | return data.description; 20 | }, 21 | source: engine.ttAdapter() 22 | }).on('typeahead:selected', function(event, datum) { 23 | localStorage.setItem("current_url", "/issues/commit/" + datum.commit); 24 | get_issues(); 25 | }); 26 | }); 27 | 28 | $("#rule").click(function(){ 29 | var engine = new Bloodhound({ 30 | name: 'rules', 31 | prefetch: { 32 | url: '/filter/rules?size=' + localStorage.getItem("issues") 33 | }, 34 | datumTokenizer: function(data) { 35 | return Bloodhound.tokenizers.whitespace(data); 36 | }, 37 | queryTokenizer: Bloodhound.tokenizers.whitespace 38 | }); 39 | 40 | initialize_typeahead(engine, filter); 41 | 42 | filter.typeahead(null, { 43 | displayKey: function(data) { 44 | return data; 45 | }, 46 | source: engine.ttAdapter() 47 | }).on('typeahead:selected', function(event, datum) { 48 | localStorage.setItem("current_url", "/issues/rule/" + datum); 49 | get_issues(); 50 | }); 51 | }); 52 | 53 | $("#reviewer").click(function(){ 54 | var engine = new Bloodhound({ 55 | name: 'reviewers', 56 | prefetch: { 57 | url: '/filter/reviewers?size=' + localStorage.getItem("issues") 58 | }, 59 | datumTokenizer: function(data) { 60 | return Bloodhound.tokenizers.whitespace(data); 61 | }, 62 | queryTokenizer: Bloodhound.tokenizers.whitespace 63 | }); 64 | 65 | initialize_typeahead(engine, filter); 66 | filter.typeahead(null, { 67 | displayKey: function(data) { 68 | return data; 69 | }, 70 | source: engine.ttAdapter() 71 | }).on('typeahead:selected', function(event, datum) { 72 | localStorage.setItem("current_url", "/issues/reviewer/" + datum); 73 | get_issues(); 74 | }); 75 | }); 76 | 77 | $("#repo").click(function(){ 78 | var engine = new Bloodhound({ 79 | name: 'repos', 80 | prefetch: { 81 | url: '/filter/repos' 82 | }, 83 | datumTokenizer: function(data) { 84 | return Bloodhound.tokenizers.whitespace(data); 85 | }, 86 | 
queryTokenizer: Bloodhound.tokenizers.whitespace 87 | }); 88 | 89 | initialize_typeahead(engine, filter); 90 | filter.typeahead(null, { 91 | displayKey: function(data) { 92 | return data; 93 | }, 94 | source: engine.ttAdapter() 95 | }).on('typeahead:selected', function(event, datum) { 96 | localStorage.setItem("current_url", "/issues/repo/" + datum); 97 | get_issues(); 98 | }); 99 | }); 100 | 101 | 102 | $("#reset-filter").click(function(){ 103 | $("#filter").typeahead('destroy').off('typeahead:selected').val(""); 104 | localStorage.setItem("current_url", "/issues/"); 105 | get_issues(); 106 | }); 107 | 108 | }); 109 | 110 | function initialize_typeahead(engine, filter) { 111 | engine.clearPrefetchCache(); 112 | engine.initialize(); 113 | 114 | filter.typeahead('destroy'); 115 | filter.off('typeahead:selected'); 116 | 117 | } -------------------------------------------------------------------------------- /guardserver/static/js/issues.js: -------------------------------------------------------------------------------- 1 | jQuery(function(){ 2 | var issue_data = Object(); 3 | issue_data.current_page = 1; 4 | issue_data.size = 10; 5 | localStorage.setItem("issue_data", JSON.stringify(issue_data)); 6 | localStorage.setItem("false_positive", "false"); 7 | localStorage.setItem("current_url", "/issues/"); 8 | get_issues(); 9 | 10 | $("#true-positive").click(function(){ 11 | localStorage.setItem("false_positive", "false"); 12 | reset_pagination(); 13 | }); 14 | 15 | $("#false-positive").click(function(){ 16 | localStorage.setItem("false_positive", "true"); 17 | reset_pagination(); 18 | }); 19 | }); 20 | 21 | function get_issues() { 22 | var url = localStorage.getItem("current_url"); 23 | var false_positive = localStorage.getItem("false_positive") === "true"; 24 | var page_state = JSON.parse(localStorage.getItem("issue_data")); 25 | var start_time = localStorage.getItem("start_date"); 26 | var end_time = localStorage.getItem("end_date"); 27 | var params = Object(); 28 | params.start_time = start_time; 29 | params.end_time = end_time; 30 | params.from = (page_state.current_page - 1) * page_state.size; 31 | params.size = page_state.size; 32 | params.false_positive = false_positive; 33 | $.getJSON(url, params, function(data){ 34 | add_issues_to_table(data.issues, "#issue-body"); 35 | localStorage.setItem("issues", data.total); 36 | }) 37 | } 38 | 39 | function add_issues_to_table(data, dom_element) { 40 | $(dom_element).empty(); 41 | $.each(data, function(){ 42 | var source = this["_source"]; 43 | var status_change = source["false_positive"] == "true" ? ["
Valid", false]: ["
False", true]; 44 | var table_row = '' + 45 | "" + source["repo_name"] + "" + 46 | "" + $("
").text(source["matching_line"]).html() + "" + 47 | '' + source["filename"] + "" + 48 | "" + source["commit_description"] + "" + 49 | "" + source["check_id"] + "" + 50 | "" + source["last_reviewer"] + 51 | '' + 53 | ""; 54 | $(dom_element).prepend(table_row); 55 | $("#" + source["commit_id"]).click(function(){ 56 | var commit_id = $(this).attr('id'); 57 | var params = Object(); 58 | params.repo = $(this).closest('tr').attr('data-repo'); 59 | params.file_path = $(this).text(); 60 | $.get("/issue/get_contents/" + commit_id, params=params, function(data){ 61 | $('#code-space').removeClass("prettyprinted"); 62 | $("#code-space").append($("
").text(data).html()); 63 | $("#code-holder").modal('show'); 64 | }); 65 | }); 66 | $("#" + this["id"]).click(function(){ 67 | var index_id = $(this).attr('id'); 68 | var data_status = $(this).attr('data-status'); 69 | var table_row = $(this).closest('tr'); 70 | var params = Object(); 71 | params.status = (data_status === "true"); 72 | params.current_user = localStorage.getItem("current_user"); 73 | 74 | $.ajax({ 75 | url: "/issue/status/" + index_id, 76 | type: 'PUT', 77 | data: params, 78 | success: function(data) { 79 | $(table_row).remove(); 80 | } 81 | }) 82 | }) 83 | }); 84 | } 85 | -------------------------------------------------------------------------------- /guardserver/static/js/paginate.js: -------------------------------------------------------------------------------- 1 | jQuery(function(){ 2 | 3 | var next_button = $("#next"); 4 | var previous_button = $("#previous"); 5 | 6 | previous_button.addClass("disabled"); 7 | 8 | var page_state = JSON.parse(localStorage.getItem("issue_data")); 9 | 10 | 11 | if (page_state.size > localStorage.getItem("issues")) { 12 | next_button.addClass("disabled"); 13 | } 14 | 15 | previous_button.click(function(){ 16 | change_page(false, this, "#next"); 17 | }); 18 | 19 | next_button.click(function(){ 20 | change_page(true, this, "#previous"); 21 | }); 22 | 23 | $("#false-positive").click(function(){ 24 | localStorage.setItem("false_positive", "true"); 25 | $(this).parent().parent().find('.active').removeClass('active'); 26 | $(this).parent().addClass("active"); 27 | get_issues(); 28 | }); 29 | 30 | $("#true-positive").click(function(){ 31 | localStorage.setItem("false_positive", "false"); 32 | $(this).parent().parent().find('.active').removeClass('active'); 33 | $(this).parent().addClass("active"); 34 | get_issues(); 35 | }) 36 | }); 37 | 38 | function change_page(next_page, this_element, dom_element) { 39 | var page_state, issues_size; 40 | page_state = JSON.parse(localStorage.getItem("issue_data")); 41 | issues_size = localStorage.getItem("issues"); 42 | if (next_page) { 43 | page_state.current_page += 1; 44 | if (page_state.current_page * page_state.size > issues_size) { 45 | $(this_element).addClass("disabled"); 46 | } 47 | } 48 | else { 49 | if (page_state.current_page != 1) { 50 | page_state.current_page -= 1; 51 | if (page_state.current_page == 1) { 52 | $(this_element).addClass("disabled"); 53 | } 54 | } 55 | } 56 | 57 | if ($(dom_element).hasClass("disabled")) { 58 | $(dom_element).removeClass("disabled"); 59 | } 60 | localStorage.setItem("issue_data", JSON.stringify(page_state)); 61 | get_issues(); 62 | } 63 | 64 | function reset_pagination() { 65 | var page_state = JSON.parse(localStorage.getItem("issue_data")); 66 | page_state.current_page = 1; 67 | localStorage.setItem("issue_data", JSON.stringify(page_state)); 68 | } -------------------------------------------------------------------------------- /guardserver/static/js/user.js: -------------------------------------------------------------------------------- 1 | jQuery(function(){ 2 | $.getJSON("/current_user", function(data){ 3 | current_name = data.name; 4 | $("#current-user").append(current_name); 5 | localStorage.setItem("current_user", current_name); 6 | }); 7 | }); 8 | -------------------------------------------------------------------------------- /repominer.py: -------------------------------------------------------------------------------- 1 | # !/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | import argparse 4 | import os 5 | import os.path 6 | import sys 7 | import 
datetime 8 | 9 | from core.codechecker import CodeCheckerFactory, Alert 10 | from core.evaluators import LineEvalFactory 11 | from core.ruleparser import load_rules, build_resolved_ruleset 12 | from core.datastore import DataStore, DataStoreException 13 | 14 | 15 | def check_alerts_in_file(code_checker, file, filename): 16 | content = file.readlines() 17 | check_context = {"filename": filename} 18 | result = code_checker.check(content, check_context) 19 | 20 | def create_alert(rule, vuln_line): 21 | vuln_line_number = content.index(vuln_line) + 1 22 | return Alert(rule, filename, repo='', commit='', line=vuln_line, line_number=vuln_line_number) 23 | 24 | actual_alerts = [create_alert(rule, line) for rule, line in result] 25 | return actual_alerts 26 | 27 | 28 | parser = argparse.ArgumentParser(description='Check a source code repo') 29 | parser.add_argument('--rule-dir', default="rules/", help='Directory of rules') 30 | parser.add_argument('--alerts', '-a', default=False, 31 | help='Run only the given alert checks (comma-separated list)') 32 | parser.add_argument('--store', '-S', default=False, help='ElasticSearch node (host:port)') 33 | parser.add_argument('files', metavar='file', nargs='*', default=None, help='Files to check') 34 | args = parser.parse_args() 35 | 36 | bare_rules = load_rules(args.rule_dir) 37 | resolved_rules = build_resolved_ruleset(bare_rules) 38 | 39 | # filter for items in --alerts parameter 40 | enabled_alerts = [a.strip() for a in args.alerts.split(',')] if args.alerts else False 41 | applied_alerts = {aid: adata for aid, adata in resolved_rules.iteritems() 42 | if not enabled_alerts or any(aid.startswith(ea) for ea in enabled_alerts)} 43 | 44 | if not applied_alerts: 45 | print "No matching alerts" 46 | sys.exit() 47 | if not args.files: 48 | print "No files given." 
49 | parser.print_help() 50 | sys.exit() 51 | 52 | code_checker = CodeCheckerFactory(applied_alerts).create(LineEvalFactory.MODE_SINGLE) 53 | 54 | textchars = ''.join(map(chr, [7, 8, 9, 10, 12, 13, 27] + range(0x20, 0x100))) 55 | is_binary_string = lambda data: bool(data.translate(None, textchars)) 56 | for path in args.files: 57 | print "Checking " + path 58 | alerts = [] 59 | if os.path.isdir(path): 60 | for root, subFolders, files in os.walk(path): 61 | for fname in files: 62 | fpath = os.path.join(root, fname) 63 | if not os.path.islink(fpath): 64 | with open(fpath) as f: 65 | if is_binary_string(f.read(128)): 66 | continue 67 | else: 68 | f.seek(0) 69 | alerts.extend(check_alerts_in_file(code_checker, f, fpath)) 70 | else: 71 | with open(path) as f: 72 | alerts.extend(check_alerts_in_file(code_checker, f, path)) 73 | 74 | data_store = None 75 | if args.store: 76 | (host, port) = args.store.split(":") 77 | data_store = DataStore(host=host, port=port, default_doctype="repoguard", default_index="repoguard") 78 | 79 | for alert in alerts: 80 | print 'file:\t%s:%s\nrule:\t%s\nline:\t%s\ndescr:\t%s\n' % ( 81 | alert.filename, alert.line_number, alert.rule.name, 82 | alert.line[0:200].strip().replace("\t", " ").decode('utf-8', 'replace'), alert.rule.description, 83 | ) 84 | if args.store: 85 | try: 86 | body = { 87 | "check_id": alert.rule.name, 88 | "description": alert.rule.description, 89 | "filename": alert.filename, 90 | "commit_id": alert.commit, 91 | "matching_line": alert.line[0:200].replace("\t", " ").decode('utf-8', 'replace'), 92 | "line_number": alert.line_number, 93 | "diff_line_number": alert.diff_line_number, 94 | "repo_name": alert.repo, 95 | "@timestamp": datetime.datetime.utcnow().isoformat() + 'Z', 96 | "type": "repoguard", 97 | "false_positive": False, 98 | "last_reviewer": "repoguard", 99 | "author": alert.author, 100 | "commit_description": alert.commit_description 101 | } 102 | 103 | data_store.store(body=body) 104 | except DataStoreException: 105 | print 'Got exception during storing results to ES.' 
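The binary-file sniff above is compact enough to be easy to misread, so here it is isolated (a sketch relying on Python 2 string semantics): `translate(None, deletechars)` strips every "texty" byte from the sample, and anything left over marks the chunk as binary.

```python
# Isolated demo of repominer's binary-detection heuristic (Python 2).
# textchars covers BEL/BS/TAB/LF/FF/CR/ESC plus every byte >= 0x20;
# translate(None, textchars) deletes those, so a non-empty remainder
# means the 128-byte sample contained non-text control bytes.
textchars = ''.join(map(chr, [7, 8, 9, 10, 12, 13, 27] + range(0x20, 0x100)))
is_binary_string = lambda data: bool(data.translate(None, textchars))

print is_binary_string("def foo():\n    return 42\n")  # False: text bytes only
print is_binary_string("\x00\x01\x02ELF")              # True: control bytes survive
```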
106 | -------------------------------------------------------------------------------- /requirements-test.txt: -------------------------------------------------------------------------------- 1 | httpretty==0.5.13 2 | nose==1.3.0 -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | requests==1.1.0 2 | argparse==1.2.1 3 | elasticsearch==1.2.0 4 | pyyaml==3.11 5 | jsonpickle==0.8.0 6 | flask==0.10.1 7 | arrow==0.4.4 8 | pygithub==1.25.1 9 | python-ldap==2.4.16 10 | lockfile==0.10.2 11 | raven==5.26 12 | mock==1.0.1 -------------------------------------------------------------------------------- /rules/action_script.yml: -------------------------------------------------------------------------------- 1 | --- #!~base 2 | description: ActionScript related rules 3 | extends: whitelisted_files::whitelisted_files,comments::comments 4 | file: 5 | - match: '\.as$' 6 | 7 | --- #!eval 8 | diff: add 9 | extends: base 10 | case_sensitive: true 11 | line: 12 | - match: 'loadBytes\s*\(' 13 | description: Loader.loadBytes injects bytes into the security context of your application 14 | tests: 15 | - pass: " _loader.loadBytes(bytes);" 16 | - fail: ' LOADBYTES()' 17 | 18 | --- #!allow_code_import_usage 19 | extends: base 20 | diff: add 21 | case_sensitive: true 22 | line: 23 | - match: allowCodeImport 24 | description: "allowCodeImport allows arbitrary ActionScript to run in the given context which is dangerous, see: http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/system/LoaderContext.html#allowCodeImport" 25 | tests: 26 | - pass: "allowCodeImport somewhere in the line" 27 | - fail: "AllowCodeImport" 28 | 29 | --- #!dangerous_calls 30 | extends: base 31 | diff: add 32 | case_sensitive: true 33 | description: "Dangerous function calls" 34 | line: 35 | - match: (loadVariables|navigateToURL|loadMovie|getURL|FScrollPane\.loadScrollContent|LoadVars\.load|LoadVars\.send|XML\.load|Sound\.loadSound|NetStream\.play|asFunction|clickTAG).* 36 | tests: 37 | - pass: ' navigateToURL(new URLRequest(user_input), windowName);' 38 | 39 | --- #!external_interface_call 40 | extends: base 41 | diff: add 42 | case_sensitive: true 43 | description: "ExternalInterface parameters need to be HTML escaped (all of them), otherwise they can cause XSS: https://soroush.secproject.com/blog/2011/03/flash-externalinterface-call-javascript-injection-%E2%80%93-can-make-the-websites-vulnerable-to-xss/" 44 | line: 45 | - match: 'ExternalInterface(\.call)?\s*\(' 46 | tests: 47 | - pass: ' ExternalInterface.call();' -------------------------------------------------------------------------------- /rules/alert_config_exported.yml: -------------------------------------------------------------------------------- 1 | --- #!~base 2 | description: Alert configs exported from the previous format 3 | extends: whitelisted_files::whitelisted_files,comments::comments 4 | 5 | --- #!redirect_field_usage 6 | extends: base 7 | diff: add 8 | line: 9 | - match: request.*\.get\(('next'|redirect_field).* 10 | 11 | --- #!unsafe_as_htmlText 12 | extends: base 13 | diff: add 14 | line: 15 | - match: \.htmlText.*=.* 16 | - except: .*=.*htmlEscape.* 17 | file: 18 | - match: .*\.(as) 19 | 20 | --- #!csharp_base64 21 | extends: base 22 | diff: add 23 | line: 24 | - match: \.ToBase64String.* 25 | file: 26 | - match: .*\.(cs) 27 | 28 | --- #!new_url_endpoint 29 | extends: base 30 | diff: add 31 | line: 32 | - match: 
\+\s+(url)*\(r'\^.*\$',.* 33 | file: 34 | - match: .*/urls\.py 35 | 36 | --- #!jsonp_middleware 37 | extends: base 38 | diff: add 39 | line: 40 | - match: middleware.*(JSONP|jsonp).* 41 | 42 | --- #!django_xfo_exempt 43 | extends: base 44 | diff: add 45 | line: 46 | - match: "@xframe_options_exempt" 47 | file: 48 | - match: .*\.py 49 | 50 | --- #!hashing_function 51 | extends: base 52 | diff: add 53 | line: 54 | - match: (sha|md)\d+\( 55 | 56 | --- #!jsonp_usage 57 | extends: base 58 | diff: add 59 | line: 60 | - match: \$\.jsonp\( 61 | 62 | --- #!direct_prezi_access_helpers 63 | extends: base 64 | diff: add 65 | line: 66 | - match: zuibackend\.cache(\.get_prezi|\s+import\s+get_prezi).* 67 | file: 68 | - match: .*\.(py) 69 | 70 | --- #!csharp_storage 71 | extends: base 72 | diff: add 73 | line: 74 | - match: Windows\.Storage.* 75 | file: 76 | - match: .*\.(cs) 77 | 78 | --- #!http_variable_used_js 79 | extends: base 80 | diff: add 81 | line: 82 | - match: (Node\.url\.parse|\.query\.) 83 | file: 84 | - match: '\.(hx|js)$' 85 | 86 | --- #!cross-domain-policy-access 87 | extends: base 88 | diff: add 89 | line: 90 | - match: (allow-access-from domain="\*").* 91 | 92 | --- #!s3_set_object_acl 93 | extends: base 94 | diff: add 95 | line: 96 | - match: (setObjectAcl|set_acl|set_canned_acl).* 97 | 98 | --- #!http_response_js 99 | extends: base 100 | diff: add 101 | line: 102 | - match: (\.end\(.*req\.).* 103 | file: 104 | - match: .*\.(hx|js) 105 | 106 | --- #!direct_prezi_access 107 | extends: base 108 | diff: add 109 | line: 110 | - match: Presentation\.objects\.(all|get_by_id_or_oid).* 111 | file: 112 | - match: .*\.(py) 113 | 114 | --- #!chef_new_vhost 115 | extends: base 116 | diff: add 117 | line: 118 | - match: (apache|nginx)\_vhost.* 119 | file: 120 | - match: .*\.(rb) 121 | 122 | --- #!urlopen 123 | extends: base 124 | diff: add 125 | line: 126 | - match: (urlopen\(|pycurl\.Curl|_make_call\() 127 | file: 128 | - match: .*\.(py) 129 | 130 | --- #!chef_new_services 131 | extends: base 132 | diff: add 133 | line: 134 | - match: (prezi\_django\_app\_base\_placement|prezi\_placement\_deploy|supervisor\_service|apache\_vhost|Virtualhost ).* 135 | file: 136 | - match: .*\.(rb|erb) 137 | 138 | --- #!root_what 139 | extends: base 140 | diff: add 141 | line: 142 | - match: ("|')root("|')root.* 143 | 144 | --- #!django_xfo_remove 145 | extends: base 146 | diff: add 147 | line: 148 | - match: \-.*clickjacking.XFrameOptionsMiddleware.* 149 | file: 150 | - match: .*\.(py|cfg) 151 | 152 | --- #!js_file_access 153 | extends: base 154 | diff: add 155 | line: 156 | - match: (Node\.fs\.).* 157 | file: 158 | - match: .*\.(hx|js) 159 | 160 | --- #!interesting_object_get 161 | extends: base 162 | diff: add 163 | line: 164 | - match: objects\.get\(.*request\..* 165 | 166 | --- #!tastypie_usage 167 | extends: base 168 | diff: add 169 | line: 170 | - match: tastypie.* 171 | 172 | --- #!csharp_invokescript 173 | extends: base 174 | diff: add 175 | line: 176 | - match: WebBrowser\.InvokeScript.* 177 | file: 178 | - match: .*\.(cs) 179 | 180 | --- #!cpp_system_popen_calls 181 | extends: base 182 | diff: add 183 | case_sensitive: true 184 | line: 185 | - match: (system|popen|exec(ve)?|execvp?|execlp?)\(.* 186 | file: 187 | - match: .*\.(cpp|cxx|h|hpp|c)$ 188 | 189 | --- #!python_object_enumeration 190 | extends: base 191 | diff: add 192 | line: 193 | - match: (\.get_by_secondary_key\().* 194 | file: 195 | - match: .*\.(py) 196 | 197 | --- #!godauth_whitelist 198 | extends: base 199 | diff: add 200 | line: 201 | - 
match: "\"who\"\\: \"all\".*" 202 | file: 203 | - match: .*\.(json) 204 | 205 | --- #!csharp_request 206 | extends: base 207 | diff: add 208 | line: 209 | - match: client\.(Post|Get).*(Request).* 210 | file: 211 | - match: .*\.(cs) 212 | 213 | --- #!cpp_bof_calls 214 | extends: base 215 | diff: add 216 | line: 217 | - match: (strcpy|strcat|f?gets|memcpy)\(.* 218 | file: 219 | - match: .*\.(cpp|cxx|h|hpp|c)$ 220 | 221 | --- #!python_password_form 222 | extends: base 223 | diff: add 224 | line: 225 | - match: (forms\.PasswordInput).* 226 | file: 227 | - match: .*\.(py) 228 | 229 | --- #!chef_new_user_access 230 | extends: base 231 | diff: add 232 | line: 233 | - match: "(assessment_weekers =|\"users\": |recipe\\[users\\:\\:).*" 234 | file: 235 | - match: .*\.(rb|json) 236 | 237 | --- #!backbonejs_model_get 238 | extends: base 239 | diff: add 240 | line: 241 | - match: this\.model\.get.* 242 | file: 243 | - match: .*\.(js|html) 244 | 245 | --- #!python_remote_objects 246 | extends: base 247 | diff: add 248 | line: 249 | - match: (RemoteUserFactory).* 250 | file: 251 | - match: .*\.(py) 252 | 253 | --- #!chef_new_open_port 254 | extends: base 255 | diff: add 256 | line: 257 | - match: \+\s*bind\s*=.* 258 | file: 259 | - match: .*\.(rb|erb) 260 | 261 | --- #!unsafe_underscore_js_template 262 | extends: base 263 | diff: add 264 | line: 265 | - match: <%=.* 266 | file: 267 | - match: .*\.(js|html|tpl)$ 268 | - except: 'Gruntfile.js$' 269 | -------------------------------------------------------------------------------- /rules/android.yml: -------------------------------------------------------------------------------- 1 | --- #!~base 2 | description: Alerts for Android 3 | extends: whitelisted_files::whitelisted_files,comments::comments 4 | 5 | --- #!android_app_permission 6 | extends: base 7 | diff: add 8 | line: 9 | - match: (uses-permission android:name).* 10 | file: 11 | - match: .*\.xml 12 | 13 | --- #!android_storage 14 | extends: base 15 | diff: add 16 | line: 17 | - match: (getFilesDir|getExternalFilesDir|getExternalStorageDirectory|getSharedPreferences|addPreferencesFromResource).* 18 | file: 19 | - match: .*\.java 20 | 21 | --- #!android_webview 22 | extends: base 23 | diff: add 24 | line: 25 | - match: (addJavascriptInterface|setJavaScriptEnabled|webView\.loadData).* 26 | file: 27 | - match: .*\.java 28 | -------------------------------------------------------------------------------- /rules/chef.yml: -------------------------------------------------------------------------------- 1 | --- #!~base 2 | description: Chef related repoguard rules 3 | extends: whitelisted_files::whitelisted_files,comments::comments 4 | 5 | --- #!avoid_auto_restart 6 | diff: add 7 | extends: base 8 | case_sensitive: true 9 | line: 10 | - match: '(notifies|subscribes)\s*:restart' 11 | file: 12 | - match: '\.rb$' 13 | description: "Restarting services automatically with the notifies / subscribes chef notification is risky, we try to avoid it, please check if this commit doesn't pose significant threat to our availability." 
14 | tests: 15 | - pass: " notifies :restart, 'service[ossec]', :delayed" 16 | - fail: 'notifies :reload, "service[syslog-ng]", :delayed' 17 | 18 | --- #!unencrypted_data_bag 19 | diff: add 20 | extends: base 21 | case_sensitive: true 22 | line: 23 | - match: '(Chef::DataBagItem\.load|=\s*data_bag_item\()' 24 | file: 25 | - match: '\.rb$' 26 | description: "Please check if the data read from the unencrypted data bag doesn't contain any secrets" 27 | tests: 28 | - pass: 'store_db = Chef::DataBagItem.load("graphite", "stores")' 29 | - pass: 'hostdb = data_bag_item("hosts","hosts")' 30 | - fail: "dbag = Chef::DataBagItem.new" 31 | 32 | --- #!apache_directory_listing 33 | extends: base 34 | diff: add 35 | case_sensitive: true 36 | line: 37 | - match: 'Options Indexes' 38 | file: 39 | - match: '\.conf(\.erb)?$' 40 | tests: 41 | - pass: ' Options Indexes FollowSymLinks MultiViews' 42 | - fail: ' Options -Indexes FollowSymLinks MultiViews' 43 | -------------------------------------------------------------------------------- /rules/comments.yml: -------------------------------------------------------------------------------- 1 | --- #!~comments 2 | diff: mod 3 | line: 4 | - except: '^\s*(//|#|/\*|