├── README.md ├── anathema ├── __init__.py ├── anathema.py └── signature.json ├── output.example.html ├── requirements.txt ├── scalp ├── __init__.py ├── scalp.py └── sexr.py └── scalp_xmldtd.dtd /README.md: -------------------------------------------------------------------------------- 1 | Scalp!/Anathema is a fork of the original Scalp! project, originally hosted at GoogleCode (or rather, a way of saving the code before GoogleCode collapses; one of thousands of forks, I feel). My aim is to rewrite outdated parts (Scalp! was written in 2008, and Python has made a big step forward since then), add multiprocessing, and implement the Anathema heuristic module. 2 | 3 | # Scalp! 4 | 5 | Scalp! is a log analyzer for the Apache web server, developed by Romain Gaucher, that aims to look for security problems. The main idea is to look through huge log files and extract the possible attacks that have been sent through HTTP/GET (by default, Apache does not log HTTP/POST variables). 6 | 7 | default_filter.xml is a part of the PHP-IDS project. 8 | 9 | ## How it works 10 | Scalp basically takes the regular expressions from the PHP-IDS project and matches them against the lines of the Apache access log file. These regexps were chosen because of their quality and the high activity of the team maintaining that project. 11 | 12 | You will need the latest version of this file https://dev.itratos.de/projects/php-ids/repository/raw/trunk/lib/IDS/default_filter.xml in order to run Scalp (actually, Scalp! can even download it for you :3 ). 13 | 14 | Scalp started as a simple Python script, which is still maintained, but I plan to focus my effort on the binary version (written in C++) for efficiency when it comes to scalping huge log files. 15 | 16 | ### Usage 17 | Scalp has a couple of options that may be useful in order to save time when scalping a huge log file or in order to perform a full examination; the default options are almost okay for log files of hundreds of MB. 
18 | 19 | Current options: 20 | 21 | - exhaustive: Won't stop at the first pattern matched, but will test all the patterns 22 | - tough: Will decode a part of potential attacks (this is done to make better use of the regexps from PHP-IDS in order to decrease the false-negative rate) 23 | - period: Specify a time-frame to look at; everything else will be ignored 24 | - sample: Does a random sampling of the log lines in order to look at a certain percentage; this is useful when the user doesn't want to do a full scan of the whole log, but just wants to probe it to see if there is some problem... 25 | - attack: Specify what classes of vulnerabilities the tool will look at (e.g., look only for XSS, SQL Injection, etc.) 26 | Example of use: 27 | 28 | ./scalp-0.4.py -l /var/log/httpd_log -f ./default_filter.xml -o ./scalp-output --html 29 | 30 | ### Help 31 | 32 | rgaucher@plop:~/work/scalp/branches$ ./scalp-0.4.py --help 33 | 34 | ### Features 35 | Since the main engine is done, I am currently focusing on speed; for now, it processes around 250000 log lines in 170 seconds (which I consider not good, but okay compared to the Python version I did before starting this one in C++) if I don't select an exhaustive list of the attacks (which means it will not perform all the attack checks but will stop at the first one found, based on the criterion IMPACT > TYPE). To increase the speed, I am looking to use a multi-threaded engine in order to take advantage of multi-core processors. 36 | 37 | Besides the speed of this software, a couple of points are important: 38 | 39 | - output in many formats (TEXT, XML, HTML) 40 | - options in order to let the user do a pre-selection (mainly with a range of dates) 41 | - configuration of the format of the Apache log may come later... 
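### Sketch of the matching idea

To make the "How it works" and "exhaustive" descriptions above more concrete, here is a minimal sketch of the idea. It is not the actual Scalp code: the three filters, the `REQUEST` pattern and the `scan_line` helper are hypothetical stand-ins made up for illustration (the real rules come from PHP-IDS's default_filter.xml and are far more elaborate), and ordering the filters by impact is only my assumed reading of the "IMPACT > TYPE" criterion.

```python
import re

# Hypothetical filters for illustration only; NOT the PHP-IDS rules.
# Each entry: (compiled regex, impact, attack type, description)
FILTERS = [
    (re.compile(r"=(?:https?|ftp):", re.I), 5, "rfe", "URL passed as a parameter value"),
    (re.compile(r"<script\b", re.I), 4, "xss", "inline script tag"),
    (re.compile(r"\.\./"), 3, "dt", "directory traversal sequence"),
]

# Simplified pattern for the request part of an Apache access log line
# (an assumption; the real log regex in scalp.py captures more fields).
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/\d\.\d"')


def scan_line(line, exhaustive=False):
    """Return (attack_type, impact, description) tuples matched by one log line."""
    m = REQUEST.search(line)
    if m is None:
        return []
    hits = []
    # Try the highest impact first, so the non-exhaustive mode
    # stops on the most severe matching filter.
    for regex, impact, attack_type, description in sorted(FILTERS, key=lambda f: -f[1]):
        if regex.search(m.group("url")):
            hits.append((attack_type, impact, description))
            if not exhaustive:
                break
    return hits


line = ('1.2.3.4 - - [16/Sep/2008:10:00:00 +0200] '
        '"GET /x.php?url=http://evil.example/../a HTTP/1.1" 200 512')
print(scan_line(line))                   # stops at the first matching filter
print(scan_line(line, exhaustive=True))  # tests every filter
```

With the default settings only the first (highest-impact) hit is reported per line, which is cheaper on huge logs; the exhaustive mode trades speed for a complete list of matching filters.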
42 | -------------------------------------------------------------------------------- /anathema/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nanopony/apache-scalp/94cf094fb36360a39de9c407e87841363dbd4500/anathema/__init__.py -------------------------------------------------------------------------------- /anathema/anathema.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | Anathema heuristic module 4 | by Nanopony 5 | 6 | Copyright (c) 2015 Nanopony 7 | 8 | Licensed under the Apache License, Version 2.0 (the "License"); 9 | you may not use this file except in compliance with the License. 10 | You may obtain a copy of the License at 11 | 12 | http://www.apache.org/licenses/LICENSE-2.0 13 | 14 | Unless required by applicable law or agreed to in writing, software 15 | distributed under the License is distributed on an "AS IS" BASIS, 16 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 17 | See the License for the specific language governing permissions and 18 | limitations under the License. 
19 | """ 20 | import json 21 | import re 22 | from datetime import datetime 23 | 24 | 25 | class AbstractSignatureBase: 26 | def analyse(self, ip, method, url, agent): 27 | """ 28 | Gets request, returns heuristic level 0 - code blue, 10 - code red 29 | :param url: 30 | :param agent: 31 | :return: 32 | """ 33 | return 0 34 | 35 | 36 | class JsonSignatureBase: 37 | def __init__(self): 38 | with open('./signature.json') as rf: 39 | raw = rf.read() 40 | self._json = json.loads(raw) 41 | self._compile_signatures() 42 | 43 | def _compile_signatures(self): 44 | self._signatures = [] 45 | for s in self._json['signatures']: 46 | self._signatures.append(( 47 | re.compile(s['q'],re.I), 48 | int(s['s']), 49 | )) 50 | def analyse(self, ip, method, url, agent): 51 | """ 52 | Gets request, returns heuristic level 0 - code blue, 10 - code red 53 | :param url: 54 | :param agent: 55 | :return: 56 | """ 57 | for s in self._signatures: 58 | if s[0].search(url): 59 | return s[1] 60 | return 0 61 | 62 | class Violator: 63 | def __init__(self, ip): 64 | self.ip = ip 65 | self.violations = [] 66 | self.should_be_banned = False 67 | def pretty_print(self): 68 | print('Violator: %s %s' % (self.ip, '[BUSTED]' if self.should_be_banned else '')) 69 | print('Crimes:') 70 | 71 | for v in self.violations: 72 | print(' %s %s %s : %s'%v) 73 | print('') 74 | def push_violation(self, date, method, url, agent, severity): 75 | """ 76 | Violator isn't in jail 77 | :param date: 78 | :param method: 79 | :param url: 80 | :param agent: 81 | :param severity: 82 | :return: 83 | """ 84 | self.latest_violation = date 85 | self.violations.append((method, url, agent, severity)) 86 | self.should_be_banned = True 87 | return self.should_be_banned 88 | 89 | def push_evidence(self, date, method, url, agent): 90 | """ 91 | Violator is in jail, we won't analyse his action to save time 92 | :param date: 93 | :param method: 94 | :param url: 95 | :param agent: 96 | :param severity: 97 | :return: 98 | """ 99 | 
self.violations.append((method, url, agent, 10)) 100 | 101 | 102 | class Anathema: 103 | def __init__(self, filename): 104 | self.filename = filename 105 | self.log_line_regex = re.compile( 106 | r'^([0-9\.]+)\s(.*)\[(.*)\]\s"([A-Z]+)\s*(.+)\sHTTP/\d.\d"\s(\d+)\s([\d]+)(\s"(.+)" )?(.*)$') 107 | self.heuristic = JsonSignatureBase() 108 | 109 | self.purgitory = dict() 110 | self.jail = set() 111 | 112 | def parse_log(self): 113 | with open(self.filename) as log_file: 114 | for line_id, line in enumerate(log_file): 115 | m = self.log_line_regex.match(line) 116 | if m is not None: 117 | ip, name, date, method, url, response, byte, _, referrer, agent = m.groups() 118 | if ip in self.jail: 119 | self.purgitory[ip].push_evidence(date, method, url, agent) 120 | continue 121 | 122 | if len(url) > 1 and method in ('GET', 'POST', 'HEAD', 'PUT', 'PUSH', 'OPTIONS'): 123 | date = datetime.strptime(date, '%d/%b/%Y:%H:%M:%S %z') 124 | sev = self.heuristic.analyse(ip, method, url, agent) 125 | if (sev>0): 126 | if (ip not in self.purgitory): 127 | self.purgitory[ip] = Violator(ip) 128 | if self.purgitory[ip].push_violation(date, method, url, agent, sev): 129 | self.jail.add(ip) 130 | for key, violator in self.purgitory.items(): 131 | violator.pretty_print() 132 | if __name__ == '__main__': 133 | a = Anathema('../../test/satellite-access.log') 134 | a.parse_log() -------------------------------------------------------------------------------- /anathema/signature.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": 0.001, 3 | "signatures": [ 4 | { 5 | "q": "tmUnblock.cgi", 6 | "s": 10, 7 | "tags": ["scan"] 8 | }, 9 | { 10 | "q": "w00tw00t.cgi", 11 | "s": 10, 12 | "tags": ["scan"] 13 | }, 14 | { 15 | "q": "myadmin/scripts/setup.php", 16 | "s": 10, 17 | "tags": ["scan"] 18 | }, 19 | { 20 | "q": "pma/scripts/setup.php", 21 | "s": 10, 22 | "tags": ["scan"] 23 | }, 24 | { 25 | "q": "phpmyadmin/scripts/setup.php", 26 | "s": 10, 27 | 
"tags": ["scan"] 28 | }, 29 | { 30 | "q": "cgi-sys/defaultwebpage.cgi", 31 | "s": 10, 32 | "tags": ["scan"] 33 | }, 34 | { 35 | "q": "bigdump/bigdump.php", 36 | "s": 10, 37 | "tags": ["scan"] 38 | } 39 | ] 40 | } -------------------------------------------------------------------------------- /output.example.html: -------------------------------------------------------------------------------- 1 |

Scalp of almost-rgaucher.info-Aug-2008.log [Tue-16-Sep-2008]

17 |

xss (Cross-Site Scripting)

18 |
19 |
Impact 4
20 |
21 | Reason: Detects JavaScript language constructs
22 | Log line: /romain/include-favicon.php?url=http://yaisb.blogspot.com/favicon.ico
23 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 24 |
25 |
26 | Reason: Detects JavaScript language constructs
27 | Log line: /romain/include-favicon.php?url=http://blog.ianbicking.org/favicon.ico
28 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 29 |
30 |
31 | Reason: Detects JavaScript language constructs
32 | Log line: /romain/include-favicon.php?url=http://www.cigital.com/favicon.ico
33 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 34 |
35 |
36 | Reason: Detects JavaScript language constructs
37 | Log line: /romain/include-favicon.php?url=http://www.hackosis.com/favicon.ico
38 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 39 |
40 |
41 | Reason: Detects JavaScript language constructs
42 | Log line: /romain/include-favicon.php?url=http://jeremy.zawodny.com/favicon.ico
43 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 44 |
45 |
46 | Reason: Detects JavaScript language constructs
47 | Log line: /romain/include-favicon.php?url=http://www.modsecurity.org/favicon.ico
48 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 49 |
50 |
51 | Reason: Detects JavaScript language constructs
52 | Log line: /romain/include-favicon.php?url=http://googleonlinesecurity.blogspot.com/favicon.ico
53 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 54 |
55 |
56 | Reason: Detects JavaScript language constructs
57 | Log line: /romain/include-favicon.php?url=http://jeremiahgrossman.blogspot.com/favicon.ico
58 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 59 |
60 |
61 | Reason: Detects JavaScript language constructs
62 | Log line: /romain/include-favicon.php?url=http://kuza55.blogspot.com/favicon.ico
63 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 64 |
65 |
66 | Reason: Detects JavaScript language constructs
67 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
68 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 69 |
70 |
71 | Reason: Detects JavaScript language constructs
72 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
73 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-])) 74 |
75 |
76 |
77 |

rfe (Remote File Execution)

78 |
79 |
Impact 5
80 |
81 | Reason: Detects url injections and RFE attempts
82 | Log line: /romain/include-favicon.php?url=http://yaisb.blogspot.com/favicon.ico
83 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 84 |
85 |
86 | Reason: Detects url injections and RFE attempts
87 | Log line: /romain/include-favicon.php?url=http://blog.ianbicking.org/favicon.ico
88 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 89 |
90 |
91 | Reason: Detects url injections and RFE attempts
92 | Log line: /romain/include-favicon.php?url=http://www.cigital.com/favicon.ico
93 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 94 |
95 |
96 | Reason: Detects url injections and RFE attempts
97 | Log line: /romain/include-favicon.php?url=http://www.hackosis.com/favicon.ico
98 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 99 |
100 |
101 | Reason: Detects url injections and RFE attempts
102 | Log line: /romain/include-favicon.php?url=http://jeremy.zawodny.com/favicon.ico
103 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 104 |
105 |
106 | Reason: Detects url injections and RFE attempts
107 | Log line: /romain/include-favicon.php?url=http://www.modsecurity.org/favicon.ico
108 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 109 |
110 |
111 | Reason: Detects url injections and RFE attempts
112 | Log line: /romain/include-favicon.php?url=http://googleonlinesecurity.blogspot.com/favicon.ico
113 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 114 |
115 |
116 | Reason: Detects url injections and RFE attempts
117 | Log line: /romain/include-favicon.php?url=http://jeremiahgrossman.blogspot.com/favicon.ico
118 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 119 |
120 |
121 | Reason: Detects url injections and RFE attempts
122 | Log line: /romain/include-favicon.php?url=http://kuza55.blogspot.com/favicon.ico
123 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 124 |
125 |
126 | Reason: Detects url injections and RFE attempts
127 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
128 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 129 |
130 |
131 | Reason: Detects url injections and RFE attempts
132 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
133 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{) 134 |
135 |
136 |
137 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | regex -------------------------------------------------------------------------------- /scalp/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nanopony/apache-scalp/94cf094fb36360a39de9c407e87841363dbd4500/scalp/__init__.py -------------------------------------------------------------------------------- /scalp/scalp.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | Scalp! Apache log based attack analyzer 4 | by Romain Gaucher - http://rgaucher.info 5 | http://code.google.com/p/apache-scalp 6 | 7 | 8 | Copyright (c) 2008 Romain Gaucher 9 | 10 | Licensed under the Apache License, Version 2.0 (the "License"); 11 | you may not use this file except in compliance with the License. 12 | You may obtain a copy of the License at 13 | 14 | http://www.apache.org/licenses/LICENSE-2.0 15 | 16 | Unless required by applicable law or agreed to in writing, software 17 | distributed under the License is distributed on an "AS IS" BASIS, 18 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 19 | See the License for the specific language governing permissions and 20 | limitations under the License. 21 | 22 | EXTERNAL DEVELOPER NOTES: 23 | 10052008: Don C. Weber 24 | Fixed XML header by putting comment after XML version line. 25 | - This was necessary so that Firefox recognized the file as XML and 26 | displayed it properly. Proper display allows for sections to be 27 | collapsed for easy viewing. 28 | 10062008: Don C. Weber 29 | Added Regexp to the XML output. Also added this to the DTD 30 | 12312008: Don C. 
Weber 31 | Added IP and Subnet exclusion capability to cmd line input and scalper function 32 | 33 | """ 34 | from __future__ import with_statement 35 | import time, base64 36 | import os,sys,random 37 | 38 | import regex as re 39 | 40 | try: 41 | from lxml import etree 42 | except ImportError: 43 | try: 44 | import xml.etree.cElementTree as etree 45 | except ImportError: 46 | try: 47 | import xml.etree.ElementTree as etree 48 | except ImportError: 49 | print("Cannot find the ElementTree in your python packages") 50 | 51 | __application__ = "scalp" 52 | __version__ = "0.5" 53 | __release__ = __application__ + '/' + __version__ 54 | __author__ = "Romain Gaucher" 55 | __credits__ = ["Romain Gaucher", "Don C. Weber", "nanopony"] 56 | 57 | PHPIDC_DEFAULT_XML_URL = "http://dev.itratos.de/projects/php-ids/repository/raw/trunk/lib/IDS/default_filter.xml" # they have expired https cert atm :c 58 | 59 | names = { 60 | 'xss' : 'Cross-Site Scripting', 61 | 'sqli' : 'SQL Injection', 62 | 'csrf' : 'Cross-Site Request Forgery', 63 | 'dos' : 'Denial Of Service', 64 | 'dt' : 'Directory Traversal', 65 | 'spam' : 'Spam', 66 | 'id' : 'Information Disclosure', 67 | 'rfe' : 'Remote File Execution', 68 | 'lfi' : 'Local File Inclusion' 69 | } 70 | 71 | c_reg = re.compile(r'^(.+)-(.*)\[(.+)[-|+](\d+)\] "([A-Z]+)?(.+) HTTP/\d.\d" (\d+)(\s[\d]+)?(\s"(.+)" )?(.*)$') 72 | table = {} 73 | 74 | class BreakLoop( Exception ): 75 | pass 76 | 77 | txt_header = """ 78 | # 79 | # File created by Scalp! 
by Romain Gaucher - http://code.google.com/p/apache-scalp 80 | # Apache log attack analysis tool based on PHP-IDS filters 81 | # 82 | """ 83 | 84 | xml_header = """ 85 | 89 | """ 90 | 91 | html_header = """""" 107 | 108 | html_footer = "" 109 | 110 | class object_dict(dict): 111 | def __init__(self, initd=None): 112 | if initd is None: 113 | initd = {} 114 | dict.__init__(self, initd) 115 | def __getattr__(self, item): 116 | d = self.__getitem__(item) 117 | # if value is the only key in object, you can omit it 118 | if isinstance(d, dict) and 'value' in d and len(d) == 1: 119 | return d['value'] 120 | else: 121 | return d 122 | def __setattr__(self, item, value): 123 | self.__setitem__(item, value) 124 | 125 | def __parse_node(node): 126 | tmp = object_dict() 127 | # save attrs and text, hope there will not be a child with same name 128 | if node.text: 129 | tmp['value'] = node.text 130 | for (k,v) in node.attrib.items(): 131 | tmp[k] = v 132 | for ch in list(node): # getchildren() was removed from xml.etree in Python 3.9 133 | cht = ch.tag 134 | chp = __parse_node(ch) 135 | if cht not in tmp: # the first time, so store it in dict 136 | tmp[cht] = chp 137 | continue 138 | old = tmp[cht] 139 | if not isinstance(old, list): 140 | tmp.pop(cht) 141 | tmp[cht] = [old] # multi times, so change old dict to a list 142 | tmp[cht].append(chp) # add the new one 143 | return tmp 144 | 145 | def parse(xml_file): 146 | try: 147 | xml_handler = open(xml_file, 'r') 148 | doc = etree.parse(xml_handler).getroot() 149 | xml_handler.close() 150 | return object_dict({doc.tag: __parse_node(doc)}) 151 | except IOError: 152 | print("error: problem with the filter's file") 153 | return {} 154 | 155 | def get_value(array, default): 156 | if 'value' in array: 157 | return array['value'] 158 | return default 159 | 160 | def html_entities(str): 161 | out = "" 162 | for i in str: 163 | if i == '"': out += '&quot;' 164 | elif i == '<': out += '&lt;' 165 | elif i == '>': out += '&gt;' 166 | else: 167 | out += i 168 | return out 169 | 170 | d_replace = { 
171 | "\r":";", 172 | "\n":";", 173 | "\f":";", 174 | "\t":";", 175 | "\v":";", 176 | "'":"\"", 177 | "+ACI-":"\"", 178 | "+ADw-":"<", 179 | "+AD4-" : ">", 180 | "+AFs-" : "[", 181 | "+AF0-" : "]", 182 | "+AHs-" : "{", 183 | "+AH0-" : "}", 184 | "+AFw-" : "\\", 185 | "+ADs-" : ";", 186 | "+ACM-" : "#", 187 | "+ACY-" : "&", 188 | "+ACU-" : "%", 189 | "+ACQ-" : "$", 190 | "+AD0-" : "=", 191 | "+AGA-" : "'", 192 | "+ALQ-" : "\"", 193 | "+IBg-" : "\"", 194 | "+IBk-" : "\"", 195 | "+AHw-" : "|", 196 | "+ACo-" : "*", 197 | "+AF4-" : "^", 198 | "+ACIAPg-" : "\">", 199 | "+ACIAPgA8-" : "\"><", 200 | } 201 | re_replace = None 202 | 203 | 204 | def fill_replace_dict(): 205 | global d_replace, re_replace 206 | # very first control-chars 207 | for i in range(0,20): 208 | d_replace["%%%x" % i] = "%00" 209 | d_replace["%%%X" % i] = "%00" 210 | # javascript charcode 211 | for i in range(33,127): 212 | c = "%c" % i 213 | d_replace["\\%o" % i] = c 214 | d_replace["\\%x" % i] = c 215 | d_replace["\\%X" % i] = c 216 | d_replace["0x%x" % i] = c 217 | d_replace["&#%d;" % i] = c 218 | d_replace["&#%x;" % i] = c 219 | d_replace["&#%X;" % i] = c 220 | # SQL words? 221 | d_replace["is null"]="=0" 222 | d_replace["like null"]="=0" 223 | d_replace["utc_time"]="" 224 | d_replace["null"]="" 225 | d_replace["true"]="" 226 | d_replace["false"]="" 227 | d_replace["localtime"]="" 228 | d_replace["stamp"]="" 229 | d_replace["binary"]="" 230 | d_replace["ascii"]="" 231 | d_replace["soundex"]="" 232 | d_replace["md5"]="" 233 | d_replace["between"]="=" 234 | d_replace["is"]="=" 235 | d_replace["not in"]="=" 236 | d_replace["xor"]="=" 237 | d_replace["rlike"]="=" 238 | d_replace["regexp"]="=" 239 | d_replace["sounds like"]="=" 240 | re_replace = re.compile("(%s)" % "|".join(map(re.escape, d_replace.keys()))) 241 | 242 | 243 | def multiple_replace(text): 244 | return re_replace.sub(lambda mo: d_replace[mo.string[mo.start():mo.end()]], text) 245 | 246 | # the decode engine tries to detect then decode... 
247 | def decode_attempt(string): 248 | return multiple_replace(string) 249 | 250 | def analyzer(data): 251 | exp_line, regs, array, preferences, org_line = data[0],data[1],data[2],data[3],data[4] 252 | done = [] 253 | # look for the detected attacks... 254 | # either stop at the first found or not 255 | for attack_type in preferences['attack_type']: 256 | if attack_type in regs: 257 | if attack_type not in array: 258 | array[attack_type] = {} 259 | for _hash in regs[attack_type]: 260 | if _hash not in done: 261 | done.append(_hash) 262 | attack = table[_hash] 263 | cur_line = exp_line[5] 264 | if preferences['encodings']: 265 | cur_line = decode_attempt(cur_line) 266 | if attack[0].search(cur_line): 267 | if attack[1] not in array[attack_type]: 268 | array[attack_type][attack[1]] = [] 269 | array[attack_type][attack[1]].append((exp_line, attack[3], attack[2], org_line)) 270 | if preferences['exhaustive']: 271 | break 272 | else: 273 | return 274 | 275 | def scalper(access, filters, preferences = [], output = "text"): 276 | global table 277 | if not os.path.isfile(access): 278 | print("error: the log file doesn't exist") 279 | return 280 | if not os.path.isfile(filters): 281 | print("error: the filters file (XML) doesn't exist") 282 | 283 | ans = input("Do you want me to download it? [y]/n: ") 284 | if ans in ["", "y", "Y"]: 285 | import urllib.request 286 | urllib.request.urlretrieve(PHPIDC_DEFAULT_XML_URL, filters) 287 | else: 288 | return 289 | if output not in ('html', 'text', 'xml'): 290 | print("error: the output format '%s' hasn't been recognized" % output) 291 | return 292 | # load the XML file 293 | xml_filters = parse(filters) 294 | len_filters = len(xml_filters) 295 | if len_filters < 1: 296 | return 297 | # prepare to load the compiled regular expression 298 | regs = {} # type => (reg.compiled, impact, description, rule) 299 | 300 | print("Loading XML file '%s'..." 
% filters) 301 | for group in xml_filters: 302 | for f in xml_filters[group]: 303 | if f == 'filter': 304 | if type(xml_filters[group][f]) == type([]): 305 | for elmt in xml_filters[group][f]: 306 | rule, impact, description, tags = "",-1,"",[] 307 | if 'impact' in elmt: 308 | impact = int(get_value(elmt['impact'], -1)) 309 | if 'rule' in elmt: 310 | rule = get_value(elmt['rule'], "") 311 | if 'description' in elmt: 312 | description = get_value(elmt['description'], "") 313 | if 'tags' in elmt and 'tag' in elmt['tags']: 314 | if type(elmt['tags']['tag']) == type([]): 315 | for tag in elmt['tags']['tag']: 316 | tags.append(get_value(tag, "")) 317 | else: 318 | tags.append(get_value(elmt['tags']['tag'], "")) 319 | # register the entry in our array 320 | for t in tags: 321 | compiled = None 322 | if t not in regs: 323 | regs[t] = [] 324 | try: 325 | compiled = re.compile(rule) 326 | except Exception: 327 | print("The rule '%s' cannot be compiled properly" % rule) 328 | return 329 | _hash = hash(rule) 330 | if impact > -1: 331 | table[_hash] = (compiled, impact, description, rule, _hash) 332 | regs[t].append(_hash) 333 | if len(preferences['attack_type']) < 1: 334 | preferences['attack_type'] = regs.keys() 335 | flag = {} # {type => { impact => ({log_line dict}, rule, description, org_line) }} 336 | 337 | print("Processing the file '%s'..." 
% access) 338 | 339 | sample, sampled_lines = False, [] 340 | if preferences['sample'] != float(100): 341 | # get the number of lines 342 | sample = True 343 | total_nb_lines = sum(1 for line in open(access)) 344 | # take a random sample 345 | random.seed() # time.clock() was removed in Python 3.8; seed from the OS instead 346 | sampled_lines = random.sample(range(total_nb_lines), int(float(total_nb_lines) * preferences['sample'] / float(100))) 347 | sampled_lines.sort() 348 | 349 | loc, lines, nb_lines = 0, 0, 0 350 | old_diff = 0 351 | start = time.time() 352 | diff = [] 353 | with open(access) as log_file: 354 | for line in log_file: 355 | lines += 1 356 | if sample and lines not in sampled_lines: 357 | continue 358 | if c_reg.match(line): 359 | out = c_reg.search(line) 360 | ip = out.group(1) 361 | name = out.group(2) 362 | date = out.group(3) 363 | ext = out.group(4) 364 | method = out.group(5) 365 | url = out.group(6) 366 | response = out.group(7) 367 | byte = out.group(8) 368 | referrer = out.group(10) # group 9 is the optional wrapper around the quoted referrer 369 | agent = out.group(11) 370 | 371 | if preferences['ip_exclude'] != [] or preferences['subnet_exclude'] != []: 372 | ip_split = ip.split() 373 | if ip_split[0] in preferences['ip_exclude']: 374 | continue 375 | 376 | try: 377 | for sub in preferences['subnet_exclude']: 378 | if ip_split[0].startswith(sub): 379 | raise BreakLoop() 380 | except BreakLoop: 381 | continue 382 | 383 | if not correct_period(date, preferences['period']): 384 | continue 385 | loc += 1 386 | if len(url) > 1 and method in ('GET','POST','HEAD','PUT','PUSH','OPTIONS'): 387 | analyzer([(ip,name,date,ext,method,url,response,byte,referrer,agent),regs,flag, preferences, line]) 388 | elif preferences['except']: 389 | diff.append(line) 390 | 391 | # mainly testing purposes... 
392 | if nb_lines > 0 and lines > nb_lines: 393 | break 394 | 395 | tt = time.time() - start 396 | n = 0 397 | for t in flag: 398 | for i in flag[t]: 399 | n += len(flag[t][i]) 400 | print("Scalp results:") 401 | print("\tProcessed %d lines over %d" % (loc,lines)) 402 | print("\tFound %d attack patterns in %f s" % (n,tt)) 403 | 404 | short_name = access[access.rfind(os.sep)+1:] 405 | if n > 0: 406 | print("Generating output in %s%s%s_scalp_*" % (preferences['odir'],os.sep,short_name)) 407 | if 'html' in preferences['output']: 408 | generate_html_file(flag, short_name, filters, preferences['odir']) 409 | elif 'text' in preferences['output']: 410 | generate_text_file(flag, short_name, filters, preferences['odir']) 411 | elif 'xml' in preferences['output']: 412 | generate_xml_file(flag, short_name, filters, preferences['odir']) 413 | 414 | # generate exceptions 415 | if len(diff) > 0: 416 | o_except = open(os.path.abspath(preferences['odir'] + os.sep + "scalp_except.txt"), "w") 417 | for l in diff: 418 | o_except.write(l + '\n') 419 | o_except.close() 420 | 421 | 422 | def generate_text_file(flag, access, filters, odir): 423 | curtime = time.strftime("%a-%d-%b-%Y", time.localtime()) 424 | fname = '%s_scalp_%s.txt' % (access, curtime) 425 | fname = os.path.abspath(odir + os.sep + fname) 426 | try: 427 | out = open(fname, 'w') 428 | out.write(txt_header) 429 | out.write("Scalped file: %s\n" % access) 430 | out.write("Creation date: %s\n\n" % curtime) 431 | for attack_type in flag: 432 | if attack_type in names: 433 | out.write("Attack %s (%s)\n" % (names[attack_type], attack_type)) 434 | else: 435 | out.write("Attack type: %s\n" % attack_type) 436 | impacts = list(flag[attack_type].keys()) 437 | impacts.sort(reverse=True) 438 | 439 | for i in impacts: 440 | out.write("\n\t### Impact %d\n" % int(i)) 441 | for e in flag[attack_type][i]: 442 | out.write("\t%s" % e[3]) 443 | out.write("\tReason: \"%s\"\n\n" % e[2]) 444 | out.close() 445 | except IOError: 446 | print("Cannot 
open the file:", fname) 447 | return 448 | 449 | 450 | def generate_xml_file(flag, access, filters, odir): 451 | curtime = time.strftime("%a-%d-%b-%Y", time.localtime()) 452 | fname = '%s_scalp_%s.xml' % (access, curtime) 453 | fname = os.path.abspath(odir + os.sep + fname) 454 | try: 455 | out = open(fname, 'w') 456 | out.write(xml_header) 457 | out.write("\n" % (access, curtime)) 458 | for attack_type in flag: 459 | name = "" 460 | if attack_type in names: 461 | name = " name=\"%s\"" % names[attack_type] 462 | out.write(" \n" % (attack_type, name)) 463 | impacts = list(flag[attack_type].keys())  # dict views have no sort() in Python 3 464 | impacts.sort(reverse=True) 465 | for i in impacts: 466 | out.write(" \n" % int(i)) 467 | for e in flag[attack_type][i]: 468 | out.write(" \n") 469 | out.write(" \n" % e[2]) 470 | out.write(" \n" % e[1]) 471 | out.write(" \n" % e[3]) 472 | out.write(" \n") 473 | out.write(" \n") 474 | out.write(" \n") 475 | out.write("") 476 | out.close() 477 | except IOError: 478 | print("Cannot open the file:", fname) 479 | return 480 | return 481 | 482 | def generate_html_file(flag, access, filters, odir): 483 | curtime = time.strftime("%a-%d-%b-%Y", time.localtime()) 484 | fname = '%s_scalp_%s.html' % (access, curtime) 485 | fname = os.path.abspath(odir + os.sep + fname) 486 | try: 487 | out = open(fname, 'w') 488 | out.write(html_header) 489 | out.write("

Scalp of %s [%s]

\n" % (access, curtime)) 490 | for attack_type in flag: 491 | name = "" 492 | if attack_type in names: 493 | name = "%s" % names[attack_type] 494 | if len(flag[attack_type].values()) < 1: 495 | continue 496 | out.write("

%s (%s)

\n" % (attack_type, name)) 497 | impacts = list(flag[attack_type].keys()) 498 | impacts.sort(reverse=True) 499 | # order by impact 500 | for i in impacts: 501 | out.write("
\n" % int(i)) 502 | out.write("
Impact %d
\n" % int(i)) 503 | # list the one of same impacts 504 | for e in flag[attack_type][i]: 505 | out.write("
\n") 506 | out.write(" Reason: %s
\n" % html_entities(e[2])) 507 | out.write(" Log line:%s
\n" % html_entities(e[0][5])) 508 | out.write(" Matching Regexp:%s\n" % html_entities(e[1])) 509 | out.write("
\n") 510 | out.write("
\n") 511 | out.write("
\n") 512 | out.write(html_footer) 513 | out.close() 514 | except IOError: 515 | print("Cannot open the file:", fname) 516 | return 517 | 518 | months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'] 519 | 520 | def correct_period(date, period): 521 | date = date.replace(':', '/') 522 | l_date = date.split('/') 523 | for i in (2,1,0,3,4,5): 524 | if i != 1: 525 | cur = int(l_date[i]) 526 | if cur < period['start'][i] or cur > period['end'][i]: 527 | return False 528 | else: 529 | cur = months.index(l_date[i]) 530 | if cur == -1: 531 | return False 532 | if cur < period['start'][i] or cur > period['end'][i]: 533 | return False 534 | return True 535 | 536 | 537 | def analyze_date(date): 538 | """04/Apr/2008:15:45;*/May/2008""" 539 | 540 | d_min = [1, 00, 0000, 00, 00, 00] 541 | d_max = [31, 11, 9999, 24, 59, 59] 542 | 543 | date = date.replace(':', '/') 544 | l_date = date.split(';') 545 | l_start= l_date[0].split('/') 546 | l_end = l_date[1].split('/') 547 | 548 | v_start = [1, 00, 0000, 00, 00, 00] 549 | v_end = [31, 11, 9999, 24, 59, 59] 550 | 551 | for i in range(len(l_start)): 552 | if l_start[i] == '*': continue 553 | else: 554 | if i == 1: 555 | v_start[1] = months.index(l_start[1]) 556 | else: 557 | cur = int(l_start[i]) 558 | if cur < d_min[i]: v_start[i] = d_min[i] 559 | elif cur > d_max[i]: v_start[i] = d_max[i] 560 | else: v_start[i] = cur 561 | for i in range(len(l_end)): 562 | if l_end[i] == '*': continue 563 | else: 564 | if i == 1: 565 | v_end[1] = months.index(l_end[1]) 566 | else: 567 | cur = int(l_end[i]) 568 | if cur < d_min[i]: v_end[i] = d_min[i] 569 | elif cur > d_max[i]: v_end[i] = d_max[i] 570 | else: v_end[i] = cur 571 | return {'start' : v_start, 'end' : v_end} 572 | 573 | def help(): 574 | print("Scalp the apache log! 
by Romain Gaucher - http://rgaucher.info") 575 | print("usage: ./scalp.py [--log|-l log_file] [--filters|-f filter_file] [--period time-frame] [OPTIONS] [--attack a1,a2,..,an]") 576 | print(" [--sample|-s 4.2]") 577 | print(" --log |-l: the apache log file './access_log' by default") 578 | print(" --filters |-f: the filter file './default_filter.xml' by default") 579 | print(" --exhaustive|-e: will report all types of attacks detected and not stop") 580 | print(" at the first found") 581 | print(" --tough |-u: try to decode the potential attack vectors (may increase") 582 | print(" the examination time)") 583 | print(" --period |-p: the period must be specified in the same format as in") 584 | print(" the Apache logs using * as wild-card") 585 | print(" ex: 04/Apr/2008:15:45;*/May/2008") 586 | print(" if not specified at the end, the max or min are taken") 587 | print(" --html |-h: generate an HTML output") 588 | print(" --xml |-x: generate an XML output") 589 | print(" --text |-t: generate a simple text output (default)") 590 | print(" --except |-c: generate a file that contains the log lines that were not examined") 591 | print(" because they did not match the main regular expression (ill-formed Apache log etc.)") 592 | print(" --attack |-a: specify the list of attacks to look for") 593 | print(" list: xss, sqli, csrf, dos, dt, spam, id, ref, lfi") 594 | print(" the list of attacks should be comma separated and not contain spaces") 595 | print(" ex: xss,sqli,lfi,ref") 596 | print(" --ignore-ip|-i: specify the list of IP Addresses to exclude") 597 | print(" the list of IP Addresses should be comma separated and not contain spaces") 598 | print(" This option can be used in conjunction with --ignore-subnet") 599 | print(" --ignore-subnet|-n: specify the list of Subnets to exclude") 600 | print(" the list of Subnets should be comma separated and not contain spaces") 601 | print(" This option can be used in conjunction with --ignore-ip") 602 | print(" --output |-o: specifying the output 
directory; by default, scalp will try to write") 603 | print(" in the same directory as the log file") 604 | print(" --sample |-s: use a random sample of the lines, the number (float in [0,100]) is") 605 | print(" the percentage, ex: --sample 0.1 for 1/1000") 606 | 607 | def main(argc, argv): 608 | filters = "default_filter.xml" 609 | access = "access_log" 610 | output = "" 611 | preferences = { 612 | 'attack_type' : [], 613 | 'ip_exclude' : [], 614 | 'subnet_exclude' : [], 615 | 'period' : { 616 | 'start' : [1, 00, 0000, 00, 00, 00],# day, month, year, hour, minute, second 617 | 'end' : [31, 11, 9999, 24, 59, 59] 618 | }, 619 | 'except' : False, 620 | 'exhaustive' : False, 621 | 'encodings' : False, 622 | 'output' : "", 623 | 'odir' : os.path.abspath(os.curdir), 624 | 'sample' : float(100) 625 | } 626 | 627 | if argc < 2 or argv[1] == "--help": 628 | help() 629 | sys.exit(0) 630 | else: 631 | for i in range(argc): 632 | s = argv[i] 633 | if i < argc: 634 | if s in ("--filters","-f"): 635 | filters = argv[i+1] 636 | elif s in ("--log","-l"): 637 | access = argv[i+1] 638 | elif s in ("--output", "-o"): 639 | preferences['odir'] = argv[i+1] 640 | elif s in ("--sample", "-s"): 641 | try: 642 | preferences['sample'] = float(argv[i+1]) 643 | except: 644 | preferences['sample'] = float(4.2) 645 | print("/!\ Error in the sample size, will be 4.2%") 646 | elif s in ("--period", "-p"): 647 | preferences['period'] = analyze_date(argv[i+1]) 648 | elif s in ("--exhaustive", "-e"): 649 | preferences['exhaustive'] = True 650 | elif s in ("--html", "-h"): 651 | preferences['output'] += ",html" 652 | elif s in ("--xml", "-x"): 653 | preferences['output'] += ",xml" 654 | elif s in ("--text", "-t"): 655 | preferences['output'] += ",text" 656 | elif s in ("--except", "-c"): 657 | preferences['except'] = True 658 | elif s in ("--tough","-u"): 659 | fill_replace_dict() 660 | preferences['encodings'] = True 661 | elif s in ("--attack", "-a"): 662 | preferences['attack_type'] = 
argv[i+1].split(',') 663 | elif s in ("--ignore-ip", "-i"): 664 | preferences['ip_exclude'] = argv[i+1].split(',') 665 | elif s in ("--ignore-subnet", "-n"): 666 | preferences['subnet_exclude'] = argv[i+1].split(',') 667 | else: 668 | print("argument error, '%s' has been ignored" % s) 669 | if len(preferences['output']) < 1: 670 | preferences['output'] = "text" 671 | if not os.path.isdir(preferences['odir']): 672 | print("The directory %s doesn't exist, scalp will try to create it" % preferences['odir']) 673 | try: 674 | os.mkdir(preferences['odir']) 675 | except OSError: 676 | print("/!\\ scalp cannot write in", preferences['odir']) 677 | print("/!\\ Using /tmp/scalp/ as new directory...") 678 | preferences['odir'] = '/tmp/scalp' 679 | os.mkdir(preferences['odir']) 680 | scalper(access, filters, preferences) 681 | 682 | if __name__ == "__main__": 683 | main(len(sys.argv), sys.argv) 684 | """ 685 | import hotshot 686 | from hotshot import stats 687 | name = "hotshot_scalp_stats" 688 | if not os.path.isfile(name): 689 | prof = hotshot.Profile(name) 690 | prof.runcall(main) 691 | prof.close() 692 | s = stats.load(name) 693 | s.sort_stats("time").print_stats() 694 | """ 695 | -------------------------------------------------------------------------------- /scalp/sexr.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """ 3 | File: sexr.py 4 | Author: Don C. Weber 5 | Start Date: 10052008 6 | Purpose: Scalp External XML Reporter parses Scalp XML files and outputs 7 | alert information, statistics, and detected IP addresses 8 | 9 | Copyright 2008 Don C. Weber 10 | 11 | License: 12 | This work is licensed under the Creative Commons Attribution-Share Alike 3.0 13 | United States License. To view a copy of this license, visit 14 | http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to 15 | Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 16 | 94105, USA. 
17 | 18 | 19 | Last Mod: 12292008 20 | Mods: 21 | 12292008 - Removed some residual debugging messages. 22 | 23 | 24 | Notes: 25 | No official DTD for Scalp XML scheme provided. 26 | DTD for Scalp XML scheme. NOTE: First time I have done this so it may or may not be correct. 27 | ---------------------- 28 | File: scalp_xmldtd.dtd 29 | 30 | 31 | 33 | 34 | 36 | 37 | 38 | 39 | 40 | 41 | ---------------------- 42 | 43 | To Do: 44 | - Count hits by attack type 45 | - Count hits by attacking IP 46 | - Create CSV output - not sure if this is necessary, if so use "import csv" module 47 | 48 | Resources: 49 | Scalp: http://code.google.com/p/apache-scalp/ 50 | PHP-IDS: http://php-ids.org/ 51 | PyXML: http://pyxml.sourceforge.net 52 | DTD Attributes: http://www.w3schools.com/DTD/dtd_attributes.asp 53 | Declaring Attributes and Entities in DTDs: http://www.criticism.com/dita/dtd2.html 54 | Apache Log Format 1.3: http://httpd.apache.org/docs/1.3/logs.html 55 | Apache Log Format 2.2: http://httpd.apache.org/docs/2.2/logs.html 56 | The lxml.etree Tutorial: http://codespeak.net/lxml/tutorial.html 57 | Validation with lxml: http://codespeak.net/lxml/validation.html#dtd 58 | Dive in Python - Handling command line arguments: 59 | http://www.faqs.org/docs/diveintopython/kgp_commandline.html 60 | """ 61 | import os 62 | import sys 63 | import datetime 64 | import getopt 65 | import glob 66 | 67 | try: 68 | import psyco 69 | psyco.full() 70 | except ImportError: 71 | print "%s: psyco is not installed" % sys.argv[0] 72 | pass 73 | 74 | try: 75 | from lxml import etree 76 | except ImportError: 77 | try: 78 | import xml.etree.cElementTree as etree 79 | except ImportError: 80 | try: 81 | import xml.etree.ElementTree as etree 82 | except ImportError: 83 | print "%s: Cannot find the ElementTree in your python packages" % sys.argv[0] 84 | 85 | __application__ = "sexr" 86 | __version__ = "0.1" 87 | __release__ = __application__ + "/" + __version__ 88 | __author__ = "Don C. 
Weber" 89 | __copyright__ = "Copyright 2008 Don C. Weber" 90 | __license__ = "Creative Commons Attribution-Share Alike 3.0 United States License" 91 | __credits__ = "Don C. Weber" 92 | __maintainer__ = "Don C. Weber" 93 | __email__ = "cutaway@cutawaysecurity.com" 94 | 95 | def xparse(xml_file, val_dtd): 96 | """ 97 | Function: xparse 98 | Variables: 99 | xml_file - the XML file to be reviewed 100 | val_dtd - The DTD object for validation 101 | 102 | Return: 103 | On Success: Parsed XML 'Element' 104 | On Fail: Return an empty string, which makes the caller skip the file 105 | 106 | Purpose: 107 | This function takes the xml_file and parses it into a handler 108 | so that it can be evaluated. 109 | """ 110 | try: 111 | xml_handler = open(xml_file, 'r') 112 | xparse = etree.parse(xml_handler).getroot() 113 | xml_handler.close() # close before validating so the handle is not leaked on failure 114 | if not val_dtd.validate(xparse): 115 | print "%s: XML file does not comply with Scalp DTD: %s" % (sys.argv[0], xml_file) 116 | return "" 117 | return xparse 118 | except IOError: 119 | print "%s: IOError with the XML file: %s" % (sys.argv[0], xml_file) 120 | return "" # main() calls len() on the result, so never return None 121 | except: 122 | print "%s: Unknown error with the XML file: %s" % (sys.argv[0], xml_file) 123 | return "" 124 | 125 | 126 | def iter(node,indent,fOUT): 127 | """ 128 | Function: iter 129 | Variables: 130 | node - the node of the XML tree to be evaluated 131 | indent - spaces for visual purposes only, helps build a visible tree for text 132 | 133 | Purpose: 134 | Iterate through a specific node of the XML tree. First show the attributes 135 | and their values and then show any text for the node. Then check to see if 136 | the node has any children and iterate through them. Continue until complete. 
137 | """ 138 | 139 | len_node = len(node) 140 | indent += ' ' 141 | if len(node.attrib): 142 | # print "%s%s: %s" % (indent, node.tag, node.attrib) 143 | line_out = "%s%s: %s\n" % (indent, node.tag, node.attrib) 144 | fOUT.write(line_out) 145 | else: 146 | # print indent + node.tag 147 | line_out = indent + node.tag + "\n" 148 | fOUT.write(line_out) 149 | if len(node.text.strip()): 150 | # print "%s - %s" % (indent, node.text) 151 | line_out = "%s - %s\n" % (indent, node.text) 152 | fOUT.write(line_out) 153 | if len_node: 154 | for ch in node: 155 | iter(ch,indent,fOUT) 156 | 157 | def item_cnt_iter(node,indent,fOUT): 158 | """ 159 | Function: item_cnt_iter 160 | Variables: 161 | node - the node of the XML tree to be evaluated 162 | indent - spaces for visual purposes only, helps build a visible tree for text 163 | 164 | Purpose: 165 | Iterate through each node of the XML tree until it gets to the Impact node. 166 | When this node is encountered, count the number of source IP addresses that 167 | were associated with the flagged requests. Each impact will have specific 168 | reasons an alert was triggered. Count the number of alerts per reason. 
169 | """ 170 | 171 | len_node = len(node) 172 | indent += ' ' 173 | d_impact = {} 174 | ip_impact = {} 175 | 176 | if node.tag == "impact": # Analyze all impact nodes 177 | # print "%sImpact %s Items: %s" % (indent, node.get('value'), str(len(node))) 178 | line_out = "%sImpact %s Items: %s\n" % (indent, node.get('value'), str(len(node))) 179 | fOUT.write(line_out) 180 | d_impact.clear() 181 | for i_ch in node: # loop through children of impact = items 182 | for ic_ch in i_ch: # loop through children of items = reason,line,regexp 183 | if ic_ch.tag == "line" and _scan == "IP": 184 | source_ip = ic_ch.text.split() # Grab and count the IP from the flagged log entry 185 | if ip_impact.has_key(source_ip[0]): 186 | ip_impact[source_ip[0]] += 1 187 | else: 188 | ip_impact[source_ip[0]] = 1 189 | if ic_ch.tag == "reason" and _scan == "count": # Grab and count the alert 190 | if d_impact.has_key(ic_ch.text): 191 | d_impact[ic_ch.text] += 1 192 | else: 193 | d_impact[ic_ch.text] = 1 194 | if len(d_impact) and _scan == "count": 195 | for key, value in d_impact.items(): 196 | # print "%s - \'%s\': %s" % (indent, key, str(value)) 197 | line_out = "%s - \'%s\': %s\n" % (indent, key, str(value)) 198 | fOUT.write(line_out) 199 | if len(ip_impact) and _scan == "IP": 200 | # print "%s - Total Source IP Addresses: %s" % (indent, len(ip_impact)) 201 | line_out = "%s - Total Source IP Addresses: %s\n" % (indent, len(ip_impact)) 202 | fOUT.write(line_out) 203 | # Sort on keys only since values are not unique 204 | # I might try to figure this out later 205 | ip_addr = ip_impact.keys() 206 | ip_addr.sort() 207 | for addr in ip_addr: 208 | # print "%s - %s: %d" % (indent, addr, ip_impact[addr]) 209 | line_out = "%s - %s: %s\n" % (indent, addr, ip_impact[addr]) 210 | fOUT.write(line_out) 211 | return 212 | 213 | if len_node: 214 | if len(node.attrib): 215 | # print "%s%s: %s" % (indent, node.tag, node.attrib) 216 | line_out = "%s%s: %s\n" % (indent, node.tag, node.attrib) 217 | 
fOUT.write(line_out) 218 | else: 219 | # print indent + node.tag 220 | line_out = indent + node.tag + "\n" 221 | fOUT.write(line_out) 222 | if len(node.text.strip()): 223 | # print "%s - %s" % (indent, node.text) 224 | line_out = "%s - %s\n" % (indent, node.text) 225 | fOUT.write(line_out) 226 | for ch in node: 227 | item_cnt_iter(ch,indent,fOUT) 228 | else: 229 | return 230 | 231 | def set_foutput(fOUT): 232 | """ 233 | Function: set_foutput 234 | Variables: 235 | fOUT - file handle for output 236 | none - all globals 237 | 238 | Purpose: 239 | Set new output file 240 | """ 241 | outFile = "%s%s%s" % (_dout, _fout, _fout_ext) 242 | print "%s: Writing output to %s" % (sys.argv[0], outFile) 243 | try: 244 | fOUT = open(outFile, 'w') 245 | except IOError: 246 | print "%s: Error opening file location. Check permissions: %s" % (sys.argv[0], outFile) 247 | sys.exit(1) 248 | 249 | return fOUT 250 | 251 | def help(): 252 | """ 253 | Function: help 254 | Variables: 255 | None 256 | 257 | Purpose: 258 | Print help output to stdout 259 | """ 260 | 261 | print "Scalp External XML Reporter" 262 | print "Author: Don C. Weber" 263 | print "" 264 | print "usage: ./sexr.py [-h|--help] [-V|--version] [-v xml_dtd] [-d out_directory]" 265 | print " [-t | -f | -a | -s] " 266 | print "" 267 | print " -h | --help: Print this help." 268 | print " -V | --version: Version information." 269 | print " -v: The Scalp DTD file. './scalp_xmldtd.dtd' by default." 270 | print " -d: The directory to write the output files. './' by default. Implies -t" 271 | print " -t: Text output. This will produce a indented text file which" 272 | print " will be written to 'sexr_.<##>.txt'." 273 | print " -f: Full parse to selected output format." 274 | print " -a: Provides a count of specific attacks detected to selected" 275 | print " output format." 276 | print " -s: Provides a count of the Source IP addresses associated with" 277 | print " the specific Attack types to selected output format." 
278 | 279 | def version(): 280 | """ 281 | Function: version 282 | Variables: 283 | None 284 | 285 | Purpose: 286 | Print version informatin out stdout 287 | """ 288 | print "Scalp External XML Reporter release: %s" % __release__ 289 | print "%s" % __copyright__ 290 | print "%s" % __license__ 291 | print "" 292 | print "Credits: %s" % __credits__ 293 | 294 | 295 | #def main(argv): 296 | def main(): 297 | """ 298 | Function: main 299 | Variables: 300 | argv = List of command line arguments NOT including the program name. 301 | 302 | Purpose: 303 | The main function where the user's intent is determined and all of the 304 | functions are called. 305 | """ 306 | ################### 307 | # Init 308 | ################### 309 | 310 | # Setup variables and default locations 311 | global _dout # Directory to write output 312 | global _fhandle 313 | global _fout # File to write output 314 | global _fout_ext # File extention in case Text, if STDOUT then this = "" which is default 315 | global _scan # Scan type: full = full parse, count = count by attack type, IP = List Source IPs of attack 316 | _dout = os.getcwd() + "/" # default write to current working directory 317 | _fhandle = sys.stdout # Default output is to stdout 318 | _fout_ext = "" 319 | _scan = "full" # Default 320 | dnow = datetime.datetime.utcnow() 321 | fnow = "%s.%s" % (dnow.date(), dnow.time()) 322 | fdtd = "scalp_xmldtd.dtd" # Default DTD file for validation 323 | vdtd = "" 324 | xdtd = "" 325 | 326 | # Grab file or directory 327 | if len(sys.argv) < 2: 328 | help() 329 | sys.exit() 330 | if len(sys.argv) > 1: 331 | inXML = sys.argv.pop(len(sys.argv) - 1) 332 | 333 | # Get program options 334 | try: 335 | opts, args = getopt.getopt(sys.argv[1:], "hVv:d:tfas",["help","version"]) 336 | except getopt.GetoptError: 337 | # Program help 338 | print "%s: command line error" % sys.argv[0] 339 | help() 340 | sys.exit(1) 341 | for opt, arg in opts: 342 | if opt in ("-h", "--help"): 343 | help() 344 | sys.exit() 345 
| elif opt in ("-V", "--version"): 346 | version() 347 | sys.exit() 348 | elif opt == ("-v"): # Validation - DTD file 349 | fdtd = os.path.abspath(arg) 350 | if not os.path.isfile(fdtd): 351 | print "%s: Could not find DTD file: %s" % (sys.argv[0], fdtd) 352 | sys.exit(1) 353 | elif opt == ("-d"): # Output directory 354 | # check for ending / and append if none 355 | _dout = os.path.abspath(arg) 356 | if not os.path.exists(_dout): 357 | try: 358 | os.mkdir(_dout) 359 | except OSError: 360 | print "%s: Could not create: %s" % (sys.argv[0], _dout) 361 | if not _dout[len(_dout) - 1] == "/": 362 | _dout = _dout + "/" 363 | # Set these again in case user forgot -t 364 | _fout = "sexr_%s." % fnow 365 | _fout_ext = ".txt" 366 | elif opt == ("-t"): # Text output 367 | _fout = "sexr_%s." % fnow 368 | _fout_ext = ".txt" 369 | elif opt == ("-f"): # Full Parse - default 370 | _scan = "full" 371 | elif opt == ("-a"): # count by attack type 372 | _scan = "count" 373 | elif opt == ("-s"): # List Source IPs of attack 374 | _scan = "IP" 375 | else: 376 | print "%s: Detected unrecognized command line argument." % sys.argv[0] 377 | help() 378 | sys.exit(1) 379 | 380 | # validate Scalp XML files 381 | tempXML = [] 382 | if os.path.isdir(inXML): 383 | tempXML = glob.glob(os.path.abspath(inXML + '/*')) 384 | for fXML in tempXML: 385 | if os.path.isdir(fXML): 386 | tempXML.pop(tempXML.index(fXML)) 387 | elif os.path.isfile(inXML): 388 | tempXML.insert(0,os.path.abspath(inXML)) 389 | else: 390 | print "%s: Could not find Scalp XML file." 
% sys.argv[0] 391 | help() 392 | sys.exit(1) 393 | inXML = [] # convert inXML to a list 394 | inXML = tempXML 395 | 396 | # Prep for XML validation 397 | if not os.path.isfile(fdtd): 398 | print "%s: Could not find DTD file: %s" % (sys.argv[0], fdtd) 399 | sys.exit(1) 400 | try: 401 | xdtd = open(fdtd,'r') 402 | except: 403 | print "%s: Could not open DTD file: %s" % (sys.argv[0], fdtd) 404 | sys.exit(1) 405 | vdtd = etree.DTD(xdtd) 406 | 407 | 408 | ################### 409 | # Main 410 | ################### 411 | 412 | if _scan == "full": 413 | print "%s: Conducting %s scan of %s files" % (sys.argv[0], _scan, len(inXML)) 414 | 415 | for fXML in inXML: 416 | # Parse the XML file and find the root node 417 | p_scalp = xparse(fXML, vdtd) 418 | len_p_scalp = len(p_scalp) 419 | if len_p_scalp < 1: # Nothing found in file 420 | continue # skip to next file 421 | 422 | # Determine where to write 423 | if len(_fout_ext): # User wants output to file 424 | _fhandle = set_foutput(_fhandle) 425 | 426 | # Iterate through the whole XML file and print it to STDOUT 427 | iter(p_scalp,'',_fhandle) 428 | if len(_fout_ext): # User wants output to file 429 | _fhandle.close() 430 | 431 | if _scan == "count" or _scan == "IP": 432 | print "%s: Conducting %s scan of %s files" % (sys.argv[0], _scan, len(inXML)) 433 | 434 | for fXML in inXML: 435 | # Parse the XML file and find the root node 436 | p_scalp = xparse(fXML, vdtd) 437 | len_p_scalp = len(p_scalp) 438 | if len_p_scalp < 1: # Nothing found in file 439 | continue # skip to next file 440 | 441 | # Determine where to write 442 | if len(_fout_ext): # User wants output to file 443 | _fhandle = set_foutput(_fhandle) 444 | 445 | # Iterate through the whole XML file but only show attack numbers 446 | # and source IP addresses 447 | item_cnt_iter(p_scalp,'',_fhandle) 448 | if len(_fout_ext): # User wants output to file 449 | _fhandle.close() 450 | 451 | 452 | ################### 453 | # Clean up 454 | ################### 455 | 456 | print 
"%s: Done" % sys.argv[0] 457 | sys.exit() # return 458 | 459 | 460 | if __name__ == '__main__': 461 | 462 | # Function where all the work is done 463 | main() 464 | -------------------------------------------------------------------------------- /scalp_xmldtd.dtd: -------------------------------------------------------------------------------- 1 | 2 | 4 | 5 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | --------------------------------------------------------------------------------
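The counting pass that `item_cnt_iter` performs in sexr.py can be sketched in Python 3 with only the standard library. This is a minimal sketch, not the author's implementation: the `impact` element with its `value` attribute and the `reason` and `line` children are taken from the sexr.py code above, while the `scalp`, `attack` and `item` element names, the sample data and the `count_per_impact` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample report. Only "impact" (with a "value" attribute),
# "reason" and "line" are names that sexr.py actually looks at; the
# "scalp", "attack" and "item" names are assumed for this sketch.
SAMPLE = """<scalp>
  <attack type="xss">
    <impact value="5">
      <item>
        <reason>xss attempt</reason>
        <line>10.0.0.1 - - [04/Apr/2008:15:45:00 +0000] "GET /?q=a HTTP/1.1" 200 12</line>
      </item>
      <item>
        <reason>xss attempt</reason>
        <line>10.0.0.2 - - [04/Apr/2008:15:46:00 +0000] "GET /?q=b HTTP/1.1" 200 12</line>
      </item>
    </impact>
    <impact value="3">
      <item>
        <reason>sqli</reason>
        <line>10.0.0.1 - - [04/Apr/2008:15:47:00 +0000] "GET /?id=1 HTTP/1.1" 200 12</line>
      </item>
    </impact>
  </attack>
</scalp>"""


def count_per_impact(root):
    """Map each impact value to (Counter of reasons, Counter of source IPs)."""
    result = {}
    for impact in root.iter("impact"):
        reasons, ips = result.setdefault(impact.get("value"), (Counter(), Counter()))
        for item in impact:          # children of an impact node
            for child in item:       # reason, line, regexp
                text = (child.text or "").strip()  # text may be None
                if not text:
                    continue
                if child.tag == "reason":
                    reasons[text] += 1
                elif child.tag == "line":
                    # the source IP is the first field of the log line
                    ips[text.split()[0]] += 1
    return result


if __name__ == "__main__":
    for value, (reasons, ips) in count_per_impact(ET.fromstring(SAMPLE)).items():
        print("Impact %s: %d reasons, %d source IPs" % (value, len(reasons), len(ips)))
```

Using `Counter` replaces the `has_key` bookkeeping, and guarding `child.text` with `or ""` avoids the failure that `node.text.strip()` would hit on an element without text.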