├── README.md
├── anathema
│   ├── __init__.py
│   ├── anathema.py
│   └── signature.json
├── output.example.html
├── requirements.txt
├── scalp
│   ├── __init__.py
│   ├── scalp.py
│   └── sexr.py
└── scalp_xmldtd.dtd
/README.md:
--------------------------------------------------------------------------------
1 | Scalp!/Anathema is a fork (or rather a rescue of the code before Google Code shuts down) of the project originally hosted at Google Code (one of thousands of forks, I suspect). My aim is to rewrite outdated parts (Scalp! was written in 2008, and Python has taken a big step forward since then), add multiprocessing, and implement the Anathema heuristic module.
2 |
3 | # Scalp!
4 |
5 | Scalp!, developed by Romain Gaucher, is a log analyzer for the Apache web server that looks for security problems. The main idea is to go through huge log files and extract the possible attacks that have been sent through HTTP GET (by default, Apache does not log HTTP POST data).
6 |
7 | default_filter.xml is part of the PHP-IDS project.
8 |
9 | ## How it works
10 | Scalp uses the regular expressions from the PHP-IDS project and matches them against the lines of the Apache access log file. These regexps were chosen because of their quality and the high activity of the team maintaining that project.
11 |
12 | You will therefore need the latest version of this file, https://dev.itratos.de/projects/php-ids/repository/raw/trunk/lib/IDS/default_filter.xml, in order to run Scalp (actually, Scalp! can even download it for you :3).
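A minimal sketch of that matching step, for illustration only: it is not Scalp's own code. `SAMPLE_FILTERS` is a made-up two-rule excerpt shaped like the fields `scalp.py` reads from `default_filter.xml` (`rule`, `description`, `tags/tag`, `impact`), `LOG_LINE` is a simplified pattern for the request part of an Apache combined log line (an assumption, not Scalp's `c_reg`), and `load_filters`/`scan_line` are hypothetical names. It uses the standard `re` module, whereas Scalp itself depends on the third-party `regex` package to compile the real PHP-IDS rules.

```python
import re
import xml.etree.ElementTree as ET

# Made-up two-rule excerpt, shaped like the fields scalp.py reads from
# default_filter.xml (rule, description, tags/tag, impact).
SAMPLE_FILTERS = r"""<filters>
  <filter>
    <rule><![CDATA[<script\b]]></rule>
    <description>Detects script tags (illustrative rule)</description>
    <tags><tag>xss</tag></tags>
    <impact>4</impact>
  </filter>
  <filter>
    <rule><![CDATA[\.\./]]></rule>
    <description>Detects directory traversal (illustrative rule)</description>
    <tags><tag>dt</tag><tag>lfi</tag></tags>
    <impact>5</impact>
  </filter>
</filters>"""

# Simplified Apache "combined" request prefix (an assumption, not Scalp's c_reg)
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/\d\.\d" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def load_filters(xml_text):
    """Return a list of (compiled_rule, impact, description, tags)."""
    filters = []
    for node in ET.fromstring(xml_text).iter("filter"):
        rule = re.compile(node.findtext("rule", ""))
        impact = int(node.findtext("impact", "-1"))
        tags = [tag.text for tag in node.iter("tag")]
        filters.append((rule, impact, node.findtext("description", ""), tags))
    return filters


def scan_line(line, filters):
    """Return (tags, impact, description) for every rule matching the URL of a log line."""
    m = LOG_LINE.match(line)
    if m is None:
        return []  # not a request line this pattern understands
    url = m.group("url")
    return [(tags, impact, desc) for rule, impact, desc, tags in filters if rule.search(url)]


if __name__ == "__main__":
    rules = load_filters(SAMPLE_FILTERS)
    line = ('203.0.113.7 - - [16/Sep/2008:10:00:00 +0200] '
            '"GET /index.php?page=../../etc/passwd HTTP/1.1" 200 512 "-" "curl/7.18"')
    print(scan_line(line, rules))
    # [(['dt', 'lfi'], 5, 'Detects directory traversal (illustrative rule)')]
```

Each hit carries the rule's tags (the attack classes) and its impact, which is what the reports are grouped by.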
13 |
14 | Scalp started as a simple Python script, which is still maintained, but I plan to focus my effort on the binary version (written in C++) for efficiency when scalping huge log files.
15 |
16 | ### Usage
17 | Scalp has a couple of options that can save time when scalping a huge log file, or help perform a full examination; the default options are mostly fine for log files of hundreds of MB.
18 |
19 | Current options:
20 |
21 | - exhaustive: Won't stop at the first pattern matched, but will test all the patterns
22 | - tough: Will decode parts of potential attacks (this makes better use of the PHP-IDS regexps and decreases the false-negative rate)
23 | - period: Specify a time frame to look at; everything else will be ignored
24 | - sample: Randomly samples a given percentage of the log lines; this is useful when the user doesn't want a full scan of the whole log, but just wants to ping it to see if there is some problem...
25 | - attack: Specify which classes of vulnerabilities the tool will look at (e.g., look only for XSS, SQL injection, etc.)
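Three of these options are simple enough to sketch. The helpers below (`sample_line_numbers`, `in_period`, `decode_utf7_subset`) are hypothetical illustrations, not Scalp's functions: the sampling keeps the chosen line indices in a set so the per-line check stays cheap, the period check parses the Apache timestamp (`%d/%b/%Y:%H:%M:%S %z`) into a timezone-aware datetime, and the decoder applies a three-entry excerpt of the UTF-7 replacement table that `scalp.py` uses for the tough mode (the real table is much larger and also covers control characters, JavaScript char codes and SQL keywords).

```python
import random
import re
from datetime import datetime, timezone

# Hypothetical helpers illustrating three options; names are not Scalp's API.


def sample_line_numbers(total_lines, percent, seed=None):
    """sample: pick `percent`% of the 0-based line indices at random.

    Returning a set keeps the per-line "is this line sampled?" test O(1).
    """
    rng = random.Random(seed)
    k = int(total_lines * percent / 100)
    return set(rng.sample(range(total_lines), k))


APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 16/Sep/2008:10:00:00 +0200


def in_period(apache_date, start, end):
    """period: True if the Apache timestamp falls within [start, end].

    %z yields an aware datetime, so start and end must be timezone-aware too.
    """
    when = datetime.strptime(apache_date, APACHE_TIME)
    return start <= when <= end


# tough: a three-entry excerpt of UTF-7 forms like those in scalp.py's d_replace
UTF7_SUBSET = {"+ADw-": "<", "+AD4-": ">", "+ACI-": '"'}
UTF7_RE = re.compile("|".join(map(re.escape, UTF7_SUBSET)))


def decode_utf7_subset(text):
    """tough: rewrite every known encoded sequence in a single regex pass."""
    return UTF7_RE.sub(lambda m: UTF7_SUBSET[m.group(0)], text)


if __name__ == "__main__":
    picked = sample_line_numbers(1000, 10, seed=42)
    print(len(picked))  # 100
    september = (datetime(2008, 9, 1, tzinfo=timezone.utc),
                 datetime(2008, 10, 1, tzinfo=timezone.utc))
    print(in_period("16/Sep/2008:10:00:00 +0200", *september))  # True
    print(decode_utf7_subset("+ADw-script+AD4-"))  # <script>
```

Decoding before matching is what lets a rule written for `<script` also catch its UTF-7 spelling.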
26 | Example usage:
27 |
28 |     ./scalp-0.4.py -l /var/log/httpd_log -f ./default_filter.xml -o ./scalp-output --html
29 |
30 | ### Help
31 |
32 |     rgaucher@plop:~/work/scalp/branches$ ./scalp-0.4.py --help
33 |
34 | ### Features
35 | Since the main engine is done, I am currently focusing on speed. For now, it processes around 250,000 log lines in 170 seconds (not good in my opinion, but okay compared to the Python version I wrote before starting this C++ one) when the exhaustive mode is off, which means it does not run every attack check but stops at the first match, based on the criterion IMPACT > TYPE. To increase the speed, I am looking into a multi-threaded engine to take advantage of multi-core processors.
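One way to picture the non-exhaustive mode, assuming "IMPACT > TYPE" means that rules are tried by descending impact and then by attack type (my reading, not a documented rule): sort the rules, then stop at the first hit. `RULES` and `match_rules` are hypothetical, and the three regexps are toy patterns, not PHP-IDS rules.

```python
import re

# Hypothetical rules: (compiled regex, impact, attack type)
RULES = [
    (re.compile(r"\.\./"), 5, "dt"),
    (re.compile(r"(?i)union\s+select"), 6, "sqli"),
    (re.compile(r"=(?:https?|ftp):"), 5, "rfe"),
]


def match_rules(url, rules, exhaustive=False):
    """Return the (impact, type) of matching rules, highest impact first.

    With exhaustive=False the scan stops at the first hit, so a URL that
    trips a high-impact rule never pays for the remaining regexps.
    """
    ordered = sorted(rules, key=lambda r: (-r[1], r[2]))  # impact desc, then type
    hits = []
    for regex, impact, attack_type in ordered:
        if regex.search(url):
            hits.append((impact, attack_type))
            if not exhaustive:
                break
    return hits


if __name__ == "__main__":
    url = "/a.php?page=../x&u=http://evil.example/x"
    print(match_rules(url, RULES))                   # [(5, 'dt')]
    print(match_rules(url, RULES, exhaustive=True))  # [(5, 'dt'), (5, 'rfe')]
```

In this sketch the saving only applies to lines that match something; a clean line still has to be tested against every rule in both modes.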
36 |
37 | Besides speed, a couple of points are important:
38 |
39 | - output in many formats (TEXT, XML, HTML)
40 | - options in order to let the user do a pre-selection (mainly with a range of dates)
41 | - configuration of the Apache log format may come later...
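Whatever the format, the report groups matches by attack type and then by impact, highest first. A minimal sketch, assuming a results dictionary loosely shaped like the `flag` structure in `scalp.py` but simplified to `(log line, reason)` pairs; `render_text` and `render_html` are hypothetical names. The part worth copying is the escaping: log lines are attacker-controlled, so they must be HTML-escaped (here with the standard library's `html.escape`) before they go into an HTML report.

```python
import html


def render_text(results):
    """Plain-text report, highest impact first."""
    out = []
    for attack_type, by_impact in results.items():
        out.append("Attack type: %s" % attack_type)
        for impact in sorted(by_impact, reverse=True):
            out.append("\t### Impact %d" % impact)
            for log_line, reason in by_impact[impact]:
                out.append('\t%s\n\tReason: "%s"' % (log_line, reason))
    return "\n".join(out)


def render_html(results):
    """HTML report; log lines are attacker-controlled, so escape them."""
    out = []
    for attack_type, by_impact in results.items():
        out.append("<h2>%s</h2>" % html.escape(attack_type))
        for impact in sorted(by_impact, reverse=True):
            out.append("<h3>Impact %d</h3>" % impact)
            for log_line, reason in by_impact[impact]:
                out.append("<p>Reason: %s<br/>Log line: %s</p>"
                           % (html.escape(reason), html.escape(log_line)))
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical results: {attack type: {impact: [(log line, reason), ...]}}
    results = {"xss": {4: [("/search?q=<script>alert(1)</script>", "Detects script tags")]}}
    print(render_text(results))
    print(render_html(results))
```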
42 |
--------------------------------------------------------------------------------
/anathema/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nanopony/apache-scalp/94cf094fb36360a39de9c407e87841363dbd4500/anathema/__init__.py
--------------------------------------------------------------------------------
/anathema/anathema.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | Anathema heuristic module
4 | by Nanopony
5 |
6 | Copyright (c) 2015 Nanopony
7 |
8 | Licensed under the Apache License, Version 2.0 (the "License");
9 | you may not use this file except in compliance with the License.
10 | You may obtain a copy of the License at
11 |
12 | http://www.apache.org/licenses/LICENSE-2.0
13 |
14 | Unless required by applicable law or agreed to in writing, software
15 | distributed under the License is distributed on an "AS IS" BASIS,
16 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
17 | See the License for the specific language governing permissions and
18 | limitations under the License.
19 | """
20 | import json
21 | import os
22 | import re
23 | from datetime import datetime
24 |
25 | # signature.json lives next to this module, not in the current working directory
26 | SIGNATURE_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'signature.json')
27 |
28 |
29 | class AbstractSignatureBase:
30 |     def analyse(self, ip, method, url, agent):
31 |         """
32 |         Gets a request, returns a heuristic level: 0 - code blue, 10 - code red
33 |         :param ip:
34 |         :param method:
35 |         :param url:
36 |         :param agent:
37 |         :return: int severity
38 |         """
39 |         return 0
40 |
41 |
42 | class JsonSignatureBase(AbstractSignatureBase):
43 |     def __init__(self, path=SIGNATURE_FILE):
44 |         with open(path) as rf:
45 |             self._json = json.load(rf)
46 |         self._compile_signatures()
47 |
48 |     def _compile_signatures(self):
49 |         self._signatures = []
50 |         for s in self._json['signatures']:
51 |             # 'q' is a case-insensitive regex, 's' its severity (0-10)
52 |             self._signatures.append((
53 |                 re.compile(s['q'], re.I),
54 |                 int(s['s']),
55 |             ))
56 |
57 |     def analyse(self, ip, method, url, agent):
58 |         """
59 |         Gets a request, returns a heuristic level: 0 - code blue, 10 - code red
60 |         :param ip:
61 |         :param method:
62 |         :param url:
63 |         :param agent:
64 |         :return: severity of the first matching signature, or 0
65 |         """
66 |         for pattern, severity in self._signatures:
67 |             if pattern.search(url):
68 |                 return severity
69 |         return 0
70 |
71 |
72 | class Violator:
73 |     def __init__(self, ip):
74 |         self.ip = ip
75 |         self.violations = []
76 |         self.latest_violation = None
77 |         self.should_be_banned = False
78 |
79 |     def pretty_print(self):
80 |         print('Violator: %s %s' % (self.ip, '[BUSTED]' if self.should_be_banned else ''))
81 |         print('Crimes:')
82 |
83 |         for v in self.violations:
84 |             print('    %s %s %s : %s' % v)
85 |         print('')
86 |
87 |     def push_violation(self, date, method, url, agent, severity):
88 |         """
89 |         Violator isn't in jail yet: record the violation and flag him
90 |         :param date:
91 |         :param method:
92 |         :param url:
93 |         :param agent:
94 |         :param severity:
95 |         :return: True if the violator should be banned
96 |         """
97 |         self.latest_violation = date
98 |         self.violations.append((method, url, agent, severity))
99 |         self.should_be_banned = True
100 |         return self.should_be_banned
101 |
102 |     def push_evidence(self, date, method, url, agent):
103 |         """
104 |         Violator is in jail, we won't analyse his actions to save time
105 |         :param date:
106 |         :param method:
107 |         :param url:
108 |         :param agent:
109 |         :return:
110 |         """
111 |         self.violations.append((method, url, agent, 10))
112 |
113 |
114 | class Anathema:
115 |     def __init__(self, filename):
116 |         self.filename = filename
117 |         self.log_line_regex = re.compile(
118 |             r'^([0-9\.]+)\s(.*)\[(.*)\]\s"([A-Z]+)\s*(.+)\sHTTP/\d\.\d"\s(\d+)\s(\d+|-)(\s"(.+)" )?(.*)$')
119 |         self.heuristic = JsonSignatureBase()
120 |
121 |         self.purgitory = dict()
122 |         self.jail = set()
123 |
124 |     def parse_log(self):
125 |         with open(self.filename) as log_file:
126 |             for line_id, line in enumerate(log_file):
127 |                 m = self.log_line_regex.match(line)
128 |                 if m is not None:
129 |                     ip, name, date, method, url, response, byte, _, referrer, agent = m.groups()
130 |                     if ip in self.jail:
131 |                         self.purgitory[ip].push_evidence(date, method, url, agent)
132 |                         continue
133 |
134 |                     if len(url) > 1 and method in ('GET', 'POST', 'HEAD', 'PUT', 'PUSH', 'OPTIONS'):
135 |                         date = datetime.strptime(date, '%d/%b/%Y:%H:%M:%S %z')
136 |                         sev = self.heuristic.analyse(ip, method, url, agent)
137 |                         if sev > 0:
138 |                             if ip not in self.purgitory:
139 |                                 self.purgitory[ip] = Violator(ip)
140 |                             if self.purgitory[ip].push_violation(date, method, url, agent, sev):
141 |                                 self.jail.add(ip)
142 |         for key, violator in self.purgitory.items():
143 |             violator.pretty_print()
144 |
145 |
146 | if __name__ == '__main__':
147 |     a = Anathema('../../test/satellite-access.log')
148 |     a.parse_log()
--------------------------------------------------------------------------------
/anathema/signature.json:
--------------------------------------------------------------------------------
1 | {
2 | "version": 0.001,
3 | "signatures": [
4 | {
5 | "q": "tmUnblock.cgi",
6 | "s": 10,
7 | "tags": ["scan"]
8 | },
9 | {
10 | "q": "w00tw00t.cgi",
11 | "s": 10,
12 | "tags": ["scan"]
13 | },
14 | {
15 | "q": "myadmin/scripts/setup.php",
16 | "s": 10,
17 | "tags": ["scan"]
18 | },
19 | {
20 | "q": "pma/scripts/setup.php",
21 | "s": 10,
22 | "tags": ["scan"]
23 | },
24 | {
25 | "q": "phpmyadmin/scripts/setup.php",
26 | "s": 10,
27 | "tags": ["scan"]
28 | },
29 | {
30 | "q": "cgi-sys/defaultwebpage.cgi",
31 | "s": 10,
32 | "tags": ["scan"]
33 | },
34 | {
35 | "q": "bigdump/bigdump.php",
36 | "s": 10,
37 | "tags": ["scan"]
38 | }
39 | ]
40 | }
--------------------------------------------------------------------------------
/output.example.html:
--------------------------------------------------------------------------------
1 |
Scalp of almost-rgaucher.info-Aug-2008.log [Tue-16-Sep-2008]
17 | xss (Cross-Site Scripting)
18 |
19 |
Impact 4
20 |
21 | Reason: Detects JavaScript language constructs
22 | Log line: /romain/include-favicon.php?url=http://yaisb.blogspot.com/favicon.ico
23 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
24 |
25 |
26 | Reason: Detects JavaScript language constructs
27 | Log line: /romain/include-favicon.php?url=http://blog.ianbicking.org/favicon.ico
28 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
29 |
30 |
31 | Reason: Detects JavaScript language constructs
32 | Log line: /romain/include-favicon.php?url=http://www.cigital.com/favicon.ico
33 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
34 |
35 |
36 | Reason: Detects JavaScript language constructs
37 | Log line: /romain/include-favicon.php?url=http://www.hackosis.com/favicon.ico
38 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
39 |
40 |
41 | Reason: Detects JavaScript language constructs
42 | Log line: /romain/include-favicon.php?url=http://jeremy.zawodny.com/favicon.ico
43 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
44 |
45 |
46 | Reason: Detects JavaScript language constructs
47 | Log line: /romain/include-favicon.php?url=http://www.modsecurity.org/favicon.ico
48 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
49 |
50 |
51 | Reason: Detects JavaScript language constructs
52 | Log line: /romain/include-favicon.php?url=http://googleonlinesecurity.blogspot.com/favicon.ico
53 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
54 |
55 |
56 | Reason: Detects JavaScript language constructs
57 | Log line: /romain/include-favicon.php?url=http://jeremiahgrossman.blogspot.com/favicon.ico
58 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
59 |
60 |
61 | Reason: Detects JavaScript language constructs
62 | Log line: /romain/include-favicon.php?url=http://kuza55.blogspot.com/favicon.ico
63 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
64 |
65 |
66 | Reason: Detects JavaScript language constructs
67 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
68 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
69 |
70 |
71 | Reason: Detects JavaScript language constructs
72 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
73 | Matching Regexp:([^*:\s\w,.\/?+-]\s*)?(?<![a-z]\s)(?<![a-z\/_@>-])(\s*return\s*)?(?:globalstorage|sessionstorage|postmessage|callee|constructor|content|domain|prototype|try|catch|top|call|apply|url|function|object|array|string|math|if|elseif|case|switch|regex|boolean|location|settimeout|setinterval|void|setexpression|namespace)(?(1)[^\w%"]|(?:\s*[^@\s\w%",.+-]))
74 |
75 |
76 |
77 | rfe (Remote File Execution)
78 |
79 |
Impact 5
80 |
81 | Reason: Detects url injections and RFE attempts
82 | Log line: /romain/include-favicon.php?url=http://yaisb.blogspot.com/favicon.ico
83 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
84 |
85 |
86 | Reason: Detects url injections and RFE attempts
87 | Log line: /romain/include-favicon.php?url=http://blog.ianbicking.org/favicon.ico
88 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
89 |
90 |
91 | Reason: Detects url injections and RFE attempts
92 | Log line: /romain/include-favicon.php?url=http://www.cigital.com/favicon.ico
93 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
94 |
95 |
96 | Reason: Detects url injections and RFE attempts
97 | Log line: /romain/include-favicon.php?url=http://www.hackosis.com/favicon.ico
98 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
99 |
100 |
101 | Reason: Detects url injections and RFE attempts
102 | Log line: /romain/include-favicon.php?url=http://jeremy.zawodny.com/favicon.ico
103 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
104 |
105 |
106 | Reason: Detects url injections and RFE attempts
107 | Log line: /romain/include-favicon.php?url=http://www.modsecurity.org/favicon.ico
108 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
109 |
110 |
111 | Reason: Detects url injections and RFE attempts
112 | Log line: /romain/include-favicon.php?url=http://googleonlinesecurity.blogspot.com/favicon.ico
113 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
114 |
115 |
116 | Reason: Detects url injections and RFE attempts
117 | Log line: /romain/include-favicon.php?url=http://jeremiahgrossman.blogspot.com/favicon.ico
118 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
119 |
120 |
121 | Reason: Detects url injections and RFE attempts
122 | Log line: /romain/include-favicon.php?url=http://kuza55.blogspot.com/favicon.ico
123 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
124 |
125 |
126 | Reason: Detects url injections and RFE attempts
127 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
128 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
129 |
130 |
131 | Reason: Detects url injections and RFE attempts
132 | Log line: /romain/include-favicon.php?url=http://myappsecurity.blogspot.com/favicon.ico
133 | Matching Regexp:(?:\w+]?(?<!href)(?<!src)(?<!longdesc)(?<!returnurl)=(?:https?|ftp):)|(?:\{\s*\$\s*\{)
134 |
135 |
136 |
137 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | regex
--------------------------------------------------------------------------------
/scalp/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nanopony/apache-scalp/94cf094fb36360a39de9c407e87841363dbd4500/scalp/__init__.py
--------------------------------------------------------------------------------
/scalp/scalp.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | Scalp! Apache log based attack analyzer
4 | by Romain Gaucher - http://rgaucher.info
5 | http://code.google.com/p/apache-scalp
6 |
7 |
8 | Copyright (c) 2008 Romain Gaucher
9 |
10 | Licensed under the Apache License, Version 2.0 (the "License");
11 | you may not use this file except in compliance with the License.
12 | You may obtain a copy of the License at
13 |
14 | http://www.apache.org/licenses/LICENSE-2.0
15 |
16 | Unless required by applicable law or agreed to in writing, software
17 | distributed under the License is distributed on an "AS IS" BASIS,
18 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19 | See the License for the specific language governing permissions and
20 | limitations under the License.
21 |
22 | EXTERNAL DEVELOPER NOTES:
23 | 10052008: Don C. Weber
24 | Fixed XML header by putting comment after XML version line.
25 | - This was necessary so that Firefox recognized the file as XML and
26 | displayed it properly. Proper display allows for sections to be
27 | collapsed for easy viewing.
28 | 10062008: Don C. Weber
29 | Added Regexp to the XML output. Also added this to the DTD
30 | 12312008: Don C. Weber
31 | Added IP and Subnet exclusion capability to cmd line input and scalper function
32 |
33 | """
34 | from __future__ import with_statement
35 | import time, base64
36 | import os,sys,random
37 |
38 | import regex as re
39 |
40 | try:
41 |     from lxml import etree
42 | except ImportError:
43 |     try:
44 |         import xml.etree.cElementTree as etree
45 |     except ImportError:
46 |         try:
47 |             import xml.etree.ElementTree as etree
48 |         except ImportError:
49 |             print("Cannot find the ElementTree in your python packages")
50 |
51 | __application__ = "scalp"
52 | __version__ = "0.5"
53 | __release__ = __application__ + '/' + __version__
54 | __author__ = "Romain Gaucher"
55 | __credits__ = ["Romain Gaucher", "Don C. Weber", "nanopony"]
56 |
57 | PHPIDC_DEFAULT_XML_URL = "http://dev.itratos.de/projects/php-ids/repository/raw/trunk/lib/IDS/default_filter.xml" # they have expired https cert atm :c
58 |
59 | names = {
60 |     'xss' : 'Cross-Site Scripting',
61 |     'sqli' : 'SQL Injection',
62 |     'csrf' : 'Cross-Site Request Forgery',
63 |     'dos' : 'Denial Of Service',
64 |     'dt' : 'Directory Traversal',
65 |     'spam' : 'Spam',
66 |     'id' : 'Information Disclosure',
67 |     'rfe' : 'Remote File Execution',
68 |     'lfi' : 'Local File Inclusion'
69 | }
70 |
71 | c_reg = re.compile(r'^(.+)-(.*)\[(.+)[-|+](\d+)\] "([A-Z]+)?(.+) HTTP/\d.\d" (\d+)(\s[\d]+)?(\s"(.+)" )?(.*)$')
72 | table = {}
73 |
74 | class BreakLoop(Exception):
75 |     pass
76 |
77 | txt_header = """
78 | #
79 | # File created by Scalp! by Romain Gaucher - http://code.google.com/p/apache-scalp
80 | # Apache log attack analysis tool based on PHP-IDS filters
81 | #
82 | """
83 |
84 | xml_header = """
85 |
89 | """
90 |
91 | html_header = """"""
107 |
108 | html_footer = ""
109 |
110 | class object_dict(dict):
111 |     def __init__(self, initd=None):
112 |         if initd is None:
113 |             initd = {}
114 |         dict.__init__(self, initd)
115 |     def __getattr__(self, item):
116 |         d = self.__getitem__(item)
117 |         # if value is the only key in object, you can omit it
118 |         if isinstance(d, dict) and 'value' in d and len(d) == 1:
119 |             return d['value']
120 |         else:
121 |             return d
122 |     def __setattr__(self, item, value):
123 |         self.__setitem__(item, value)
124 |
125 | def __parse_node(node):
126 |     tmp = object_dict()
127 |     # save attrs and text, hope there will not be a child with same name
128 |     if node.text:
129 |         tmp['value'] = node.text
130 |     for (k,v) in node.attrib.items():
131 |         tmp[k] = v
132 |     for ch in node:  # Element.getchildren() was removed in Python 3.9
133 |         cht = ch.tag
134 |         chp = __parse_node(ch)
135 |         if cht not in tmp: # the first time, so store it in dict
136 |             tmp[cht] = chp
137 |             continue
138 |         old = tmp[cht]
139 |         if not isinstance(old, list):
140 |             tmp.pop(cht)
141 |             tmp[cht] = [old] # multi times, so change old dict to a list
142 |         tmp[cht].append(chp) # add the new one
143 |     return tmp
144 |
145 | def parse(xml_file):
146 |     try:
147 |         xml_handler = open(xml_file, 'r')
148 |         doc = etree.parse(xml_handler).getroot()
149 |         xml_handler.close()
150 |         return object_dict({doc.tag: __parse_node(doc)})
151 |     except IOError:
152 |         print("error: problem with the filter's file")
153 |         return {}
154 |
155 | def get_value(array, default):
156 |     if 'value' in array:
157 |         return array['value']
158 |     return default
159 |
160 | def html_entities(text):
161 |     out = ""
162 |     for i in text:
163 |         if i == '&': out += '&amp;'
164 |         elif i == '"': out += '&quot;'
165 |         elif i == '<': out += '&lt;'
166 |         elif i == '>': out += '&gt;'
167 |         else: out += i
168 |     return out
169 |
170 | d_replace = {
171 |     "\r":";",
172 |     "\n":";",
173 |     "\f":";",
174 |     "\t":";",
175 |     "\v":";",
176 |     "'":"\"",
177 |     "+ACI-":"\"",
178 |     "+ADw-":"<",
179 |     "+AD4-" : ">",
180 |     "+AFs-" : "[",
181 |     "+AF0-" : "]",
182 |     "+AHs-" : "{",
183 |     "+AH0-" : "}",
184 |     "+AFw-" : "\\",
185 |     "+ADs-" : ";",
186 |     "+ACM-" : "#",
187 |     "+ACY-" : "&",
188 |     "+ACU-" : "%",
189 |     "+ACQ-" : "$",
190 |     "+AD0-" : "=",
191 |     "+AGA-" : "'",
192 |     "+ALQ-" : "\"",
193 |     "+IBg-" : "\"",
194 |     "+IBk-" : "\"",
195 |     "+AHw-" : "|",
196 |     "+ACo-" : "*",
197 |     "+AF4-" : "^",
198 |     "+ACIAPg-" : "\">",
199 |     "+ACIAPgA8-" : "\">",
200 | }
201 | re_replace = None
202 |
203 |
204 | def fill_replace_dict():
205 |     global d_replace, re_replace
206 |     # very first control-chars
207 |     for i in range(0,20):
208 |         d_replace["%%%x" % i] = "%00"
209 |         d_replace["%%%X" % i] = "%00"
210 |     # javascript charcode
211 |     for i in range(33,127):
212 |         c = "%c" % i
213 |         d_replace["\\%o" % i] = c
214 |         d_replace["\\%x" % i] = c
215 |         d_replace["\\%X" % i] = c
216 |         d_replace["0x%x" % i] = c
217 |         d_replace["%d;" % i] = c
218 |         d_replace["%x;" % i] = c
219 |         d_replace["%X;" % i] = c
220 |     # SQL words?
221 |     d_replace["is null"]="=0"
222 |     d_replace["like null"]="=0"
223 |     d_replace["utc_time"]=""
224 |     d_replace["null"]=""
225 |     d_replace["true"]=""
226 |     d_replace["false"]=""
227 |     d_replace["localtime"]=""
228 |     d_replace["stamp"]=""
229 |     d_replace["binary"]=""
230 |     d_replace["ascii"]=""
231 |     d_replace["soundex"]=""
232 |     d_replace["md5"]=""
233 |     d_replace["between"]="="
234 |     d_replace["is"]="="
235 |     d_replace["not in"]="="
236 |     d_replace["xor"]="="
237 |     d_replace["rlike"]="="
238 |     d_replace["regexp"]="="
239 |     d_replace["sounds like"]="="
240 |     re_replace = re.compile("(%s)" % "|".join(map(re.escape, d_replace.keys())))
241 |
242 |
243 | def multiple_replace(text):
244 |     return re_replace.sub(lambda mo: d_replace[mo.group(0)], text)  # needs fill_replace_dict() first
245 |
246 | # the decode engine tries to detect then decode...
247 | def decode_attempt(string):
248 |     return multiple_replace(string)
249 |
250 | def analyzer(data):
251 |     exp_line, regs, array, preferences, org_line = data
252 |     done = []
253 |     # look for the detected attacks...
254 |     # either stop at the first found or not
255 |     for attack_type in preferences['attack_type']:
256 |         if attack_type in regs:
257 |             if attack_type not in array:
258 |                 array[attack_type] = {}
259 |             for _hash in regs[attack_type]:
260 |                 if _hash not in done:
261 |                     done.append(_hash)
262 |                     attack = table[_hash]
263 |                     cur_line = exp_line[5]
264 |                     if preferences['encodings']:
265 |                         cur_line = decode_attempt(cur_line)
266 |                     if attack[0].search(cur_line):
267 |                         if attack[1] not in array[attack_type]:
268 |                             array[attack_type][attack[1]] = []
269 |                         array[attack_type][attack[1]].append((exp_line, attack[3], attack[2], org_line))
270 |                         if preferences['exhaustive']:
271 |                             break
272 |                         else:
273 |                             return
274 |
275 | def scalper(access, filters, preferences = [], output = "text"):
276 |     global table
277 |     if not os.path.isfile(access):
278 |         print("error: the log file doesn't exist")
279 |         return
280 |     if not os.path.isfile(filters):
281 |         print("error: the filters file (XML) doesn't exist")
282 |
283 |         ans = input("Do you want me to download it? [y]/n: ")
284 |         if ans in ["", "y", "Y"]:
285 |             import urllib.request
286 |             urllib.request.urlretrieve(PHPIDC_DEFAULT_XML_URL, filters)
287 |         else:
288 |             return
289 |     if output not in ('html', 'text', 'xml'):
290 |         print("error: the output format '%s' hasn't been recognized" % output)
291 |         return
292 |     # load the XML file
293 |     xml_filters = parse(filters)
294 |     len_filters = len(xml_filters)
295 |     if len_filters < 1:
296 |         return
297 |     # prepare to load the compiled regular expression
298 |     regs = {} # type => (reg.compiled, impact, description, rule)
299 |
300 |     print("Loading XML file '%s'..." % filters)
301 |     for group in xml_filters:
302 |         for f in xml_filters[group]:
303 |             if f == 'filter':
304 |                 if isinstance(xml_filters[group][f], list):
305 |                     for elmt in xml_filters[group][f]:
306 |                         rule, impact, description, tags = "",-1,"",[]
307 |                         if 'impact' in elmt:
308 |                             impact = int(get_value(elmt['impact'], -1))
309 |                         if 'rule' in elmt:
310 |                             rule = get_value(elmt['rule'], "")
311 |                         if 'description' in elmt:
312 |                             description = get_value(elmt['description'], "")
313 |                         if 'tags' in elmt and 'tag' in elmt['tags']:
314 |                             if isinstance(elmt['tags']['tag'], list):
315 |                                 for tag in elmt['tags']['tag']:
316 |                                     tags.append(get_value(tag, ""))
317 |                             else:
318 |                                 tags.append(get_value(elmt['tags']['tag'], ""))
319 |                         # register the entry in our array
320 |                         for t in tags:
321 |                             compiled = None
322 |                             if t not in regs:
323 |                                 regs[t] = []
324 |                             try:
325 |                                 compiled = re.compile(rule)
326 |                             except Exception:
327 |                                 print("The rule '%s' cannot be compiled properly" % rule)
328 |                                 return
329 |                             _hash = hash(rule)
330 |                             if impact > -1:
331 |                                 table[_hash] = (compiled, impact, description, rule, _hash)
332 |                                 regs[t].append(_hash)
333 |     if len(preferences['attack_type']) < 1:
334 |         preferences['attack_type'] = regs.keys()
335 |     flag = {} # {type => { impact => ({log_line dict}, rule, description, org_line) }}
336 |
337 |     print("Processing the file '%s'..." % access)
338 |
339 |     sample, sampled_lines = False, set()
340 |     if preferences['sample'] != float(100):
341 |         # get the number of lines
342 |         sample = True
343 |         total_nb_lines = sum(1 for line in open(access))
344 |         # take a random sample (time.clock() was removed in Python 3.8; seed from the OS)
345 |         random.seed()
346 |         sampled_lines = set(random.sample(range(total_nb_lines), int(float(total_nb_lines) * preferences['sample'] / float(100))))
347 |         # stored as a set so the per-line lookup in the loop is O(1)
348 |
349 |     loc, lines, nb_lines = 0, 0, 0
350 |     old_diff = 0
351 |     start = time.time()
352 |     diff = []
353 |     with open(access) as log_file:
354 |         for line in log_file:
355 |             lines += 1
356 |             if sample and (lines - 1) not in sampled_lines:  # sampled indices are 0-based
357 |                 continue
358 |             out = c_reg.match(line)
359 |             if out:
360 |                 ip = out.group(1)
361 |                 name = out.group(2)
362 |                 date = out.group(3)
363 |                 ext = out.group(4)
364 |                 method = out.group(5)
365 |                 url = out.group(6)
366 |                 response = out.group(7)
367 |                 byte = out.group(8)
368 |                 referrer = out.group(10)  # group 9 is the optional ' "referrer" ' wrapper
369 |                 agent = out.group(11)
370 |
371 |                 if preferences['ip_exclude'] != [] or preferences['subnet_exclude'] != []:
372 |                     ip_split = ip.split()
373 |                     if ip_split[0] in preferences['ip_exclude']:
374 |                         continue
375 |
376 |                     try:
377 |                         for sub in preferences['subnet_exclude']:
378 |                             if ip_split[0].startswith(sub):
379 |                                 raise BreakLoop()
380 |                     except BreakLoop:
381 |                         continue
382 |
383 |                 if not correct_period(date, preferences['period']):
384 |                     continue
385 |                 loc += 1
386 |                 if len(url) > 1 and method in ('GET','POST','HEAD','PUT','PUSH','OPTIONS'):
387 |                     analyzer([(ip,name,date,ext,method,url,response,byte,referrer,agent),regs,flag, preferences, line])
388 |             elif preferences['except']:
389 |                 diff.append(line)
390 |
391 |             # mainly testing purposes...
392 |             if nb_lines > 0 and lines > nb_lines:
393 |                 break
394 |
395 |     tt = time.time() - start
396 |     n = 0
397 |     for t in flag:
398 |         for i in flag[t]:
399 |             n += len(flag[t][i])
400 |     print("Scalp results:")
401 |     print("\tProcessed %d lines over %d" % (loc,lines))
402 |     print("\tFound %d attack patterns in %f s" % (n,tt))
403 |
404 |     short_name = access[access.rfind(os.sep)+1:]
405 |     if n > 0:
406 |         print("Generating output in %s%s%s_scalp_*" % (preferences['odir'],os.sep,short_name))
407 |         if 'html' in preferences['output']:
408 |             generate_html_file(flag, short_name, filters, preferences['odir'])
409 |         elif 'text' in preferences['output']:
410 |             generate_text_file(flag, short_name, filters, preferences['odir'])
411 |         elif 'xml' in preferences['output']:
412 |             generate_xml_file(flag, short_name, filters, preferences['odir'])
413 |
414 |     # generate exceptions
415 |     if len(diff) > 0:
416 |         o_except = open(os.path.abspath(preferences['odir'] + os.sep + "scalp_except.txt"), "w")
417 |         for l in diff:
418 |             o_except.write(l.rstrip('\n') + '\n')  # lines already end with '\n'
419 |         o_except.close()
420 |
421 |
422 | def generate_text_file(flag, access, filters, odir):
423 |     curtime = time.strftime("%a-%d-%b-%Y", time.localtime())
424 |     fname = '%s_scalp_%s.txt' % (access, curtime)
425 |     fname = os.path.abspath(odir + os.sep + fname)
426 |     try:
427 |         out = open(fname, 'w')
428 |         out.write(txt_header)
429 |         out.write("Scalped file: %s\n" % access)
430 |         out.write("Creation date: %s\n\n" % curtime)
431 |         for attack_type in flag:
432 |             if attack_type in names:
433 |                 out.write("Attack %s (%s)\n" % (names[attack_type], attack_type))
434 |             else:
435 |                 out.write("Attack type: %s\n" % attack_type)
436 |             impacts = list(flag[attack_type].keys())
437 |             impacts.sort(reverse=True)
438 |
439 |             for i in impacts:
440 |                 out.write("\n\t### Impact %d\n" % int(i))
441 |                 for e in flag[attack_type][i]:
442 |                     out.write("\t%s" % e[3])
443 |                     out.write("\tReason: \"%s\"\n\n" % e[2])
444 |         out.close()
445 |     except IOError:
446 |         print("Cannot open the file:", fname)
447 |         return
448 |
449 |
450 | def generate_xml_file(flag, access, filters, odir):
451 | curtime = time.strftime("%a-%d-%b-%Y", time.localtime())
452 | fname = '%s_scalp_%s.xml' % (access, curtime)
453 | fname = os.path.abspath(odir + os.sep + fname)
454 | try:
455 | out = open(fname, 'w')
456 | out.write(xml_header)
457 | out.write("<scalp file=\"%s\" time=\"%s\">\n" % (access, curtime))
458 | for attack_type in flag:
459 | name = ""
460 | if attack_type in names:
461 | name = " name=\"%s\"" % names[attack_type]
462 | out.write("  <attack type=\"%s\"%s>\n" % (attack_type, name))
463 | impacts = list(flag[attack_type].keys())
464 | impacts.sort(reverse=True)
465 | for i in impacts:
466 | out.write("    <impact value=\"%d\">\n" % int(i))
467 | for e in flag[attack_type][i]:
468 | out.write("      <item>\n")
469 | out.write("        <reason><![CDATA[%s]]></reason>\n" % e[2])
470 | out.write("        <regexp><![CDATA[%s]]></regexp>\n" % e[1])
471 | out.write("        <line><![CDATA[%s]]></line>\n" % e[3])
472 | out.write("      </item>\n")
473 | out.write("    </impact>\n")
474 | out.write("  </attack>\n")
475 | out.write("</scalp>")
476 | out.close()
477 | except IOError:
478 | print("Cannot open the file:", fname)
479 | return
480 | return
481 |
482 | def generate_html_file(flag, access, filters, odir):
483 | curtime = time.strftime("%a-%d-%b-%Y", time.localtime())
484 | fname = '%s_scalp_%s.html' % (access, curtime)
485 | fname = os.path.abspath(odir + os.sep + fname)
486 | try:
487 | out = open(fname, 'w')
488 | out.write(html_header)
489 | out.write("Scalp of %s [%s]
\n" % (access, curtime))
490 | for attack_type in flag:
491 | name = ""
492 | if attack_type in names:
493 | name = "%s" % names[attack_type]
494 | if len(flag[attack_type].values()) < 1:
495 | continue
496 | out.write(" %s (%s)
\n" % (attack_type, name))
497 | impacts = list(flag[attack_type].keys())
498 | impacts.sort(reverse=True)
499 | # order by impact
500 | for i in impacts:
501 | out.write("\n" % int(i))
502 | out.write("
Impact %d
\n" % int(i))
503 | # list the one of same impacts
504 | for e in flag[attack_type][i]:
505 | out.write("
\n")
506 | out.write(" Reason: %s
\n" % html_entities(e[2]))
507 | out.write(" Log line:%s
\n" % html_entities(e[0][5]))
508 | out.write(" Matching Regexp:%s\n" % html_entities(e[1]))
509 | out.write("
\n")
510 | out.write("
\n")
511 | out.write("
\n")
512 | out.write(html_footer)
513 | out.close()
514 | except IOError:
515 | print("Cannot open the file:", fname)
516 | return
517 |
518 | months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
519 |
520 | def correct_period(date, period):
521 | date = date.replace(':', '/')
522 | l_date = date.split('/')
523 | for i in (2,1,0,3,4,5):
524 | if i != 1:
525 | cur = int(l_date[i])
526 | if cur < period['start'][i] or cur > period['end'][i]:
527 | return False
528 | else:
529 | if l_date[i] not in months:
530 | return False
531 | cur = months.index(l_date[i])
532 | if cur < period['start'][i] or cur > period['end'][i]:
533 | return False
534 | return True
535 |
536 |
537 | def analyze_date(date):
538 | """Parse a --period value such as 04/Apr/2008:15:45;*/May/2008."""
539 |
540 | d_min = [1, 00, 0000, 00, 00, 00]
541 | d_max = [31, 11, 9999, 24, 59, 59]
542 |
543 | date = date.replace(':', '/')
544 | l_date = date.split(';')
545 | l_start= l_date[0].split('/')
546 | l_end = l_date[1].split('/')
547 |
548 | v_start = [1, 00, 0000, 00, 00, 00]
549 | v_end = [31, 11, 9999, 24, 59, 59]
550 |
551 | for i in range(len(l_start)):
552 | if l_start[i] == '*': continue
553 | else:
554 | if i == 1:
555 | v_start[1] = months.index(l_start[1])
556 | else:
557 | cur = int(l_start[i])
558 | if cur < d_min[i]: v_start[i] = d_min[i]
559 | elif cur > d_max[i]: v_start[i] = d_max[i]
560 | else: v_start[i] = cur
561 | for i in range(len(l_end)):
562 | if l_end[i] == '*': continue
563 | else:
564 | if i == 1:
565 | v_end[1] = months.index(l_end[1])
566 | else:
567 | cur = int(l_end[i])
568 | if cur < d_min[i]: v_end[i] = d_min[i]
569 | elif cur > d_max[i]: v_end[i] = d_max[i]
570 | else: v_end[i] = cur
571 | return {'start' : v_start, 'end' : v_end}
572 |
573 | def help():
574 | print("Scalp the apache log! by Romain Gaucher - http://rgaucher.info")
575 | print("usage: ./scalp.py [--log|-l log_file] [--filters|-f filter_file] [--period time-frame] [OPTIONS] [--attack a1,a2,..,an]")
576 | print(" [--sample|-s 4.2]")
577 | print(" --log |-l: the apache log file './access_log' by default")
578 | print(" --filters |-f: the filter file './default_filter.xml' by default")
579 | print(" --exhaustive|-e: report every type of attack detected instead of stopping")
580 | print(" at the first match")
581 | print(" --tough |-u: try to decode the potential attack vectors (may increase")
582 | print(" the examination time)")
583 | print(" --period |-p: the period must be specified in the same format as in")
584 | print(" the Apache logs using * as wild-card")
585 | print(" ex: 04/Apr/2008:15:45;*/May/2008")
586 | print(" fields left unspecified default to the minimum (start) or maximum (end)")
587 | print(" --html |-h: generate an HTML output")
588 | print(" --xml |-x: generate an XML output")
589 | print(" --text |-t: generate a simple text output (default)")
590 | print(" --except |-c: write the log lines that were not examined (e.g. not matched by the")
591 | print(" main regular expression, ill-formed Apache log lines) to scalp_except.txt")
592 | print(" --attack |-a: specify the list of attacks to look for")
593 | print(" list: xss, sqli, csrf, dos, dt, spam, id, ref, lfi")
594 | print(" the list must be comma-separated and must not contain spaces")
595 | print(" ex: xss,sqli,lfi,ref")
596 | print(" --ignore-ip|-i: specify the list of IP addresses to exclude")
597 | print(" the list of IP addresses should be comma-separated and not contain spaces")
598 | print(" This option can be used in conjunction with --ignore-subnet")
599 | print(" --ignore-subnet|-n: specify the list of subnets to exclude")
600 | print(" the list of subnets should be comma-separated and not contain spaces")
601 | print(" This option can be used in conjunction with --ignore-ip")
602 | print(" --output |-o: the output directory; by default, scalp writes to the")
603 | print(" current working directory")
604 | print(" --sample |-s: use a random sample of the lines, the number (float in [0,100]) is")
605 | print(" the percentage, ex: --sample 0.1 for 1/1000")
606 |
607 | def main(argc, argv):
608 | filters = "default_filter.xml"
609 | access = "access_log"
610 | output = ""
611 | preferences = {
612 | 'attack_type' : [],
613 | 'ip_exclude' : [],
614 | 'subnet_exclude' : [],
615 | 'period' : {
616 | 'start' : [1, 00, 0000, 00, 00, 00],# day, month, year, hour, minute, second
617 | 'end' : [31, 11, 9999, 24, 59, 59]
618 | },
619 | 'except' : False,
620 | 'exhaustive' : False,
621 | 'encodings' : False,
622 | 'output' : "",
623 | 'odir' : os.path.abspath(os.curdir),
624 | 'sample' : float(100)
625 | }
626 |
627 | if argc < 2 or argv[1] == "--help":
628 | help()
629 | sys.exit(0)
630 | else:
631 | for i in range(argc):
632 | s = argv[i]
633 | if i < argc:
634 | if s in ("--filters","-f"):
635 | filters = argv[i+1]
636 | elif s in ("--log","-l"):
637 | access = argv[i+1]
638 | elif s in ("--output", "-o"):
639 | preferences['odir'] = argv[i+1]
640 | elif s in ("--sample", "-s"):
641 | try:
642 | preferences['sample'] = float(argv[i+1])
643 | except (IndexError, ValueError):
644 | preferences['sample'] = 4.2
645 | print("/!\\ Error in the sample size, using 4.2%")
646 | elif s in ("--period", "-p"):
647 | preferences['period'] = analyze_date(argv[i+1])
648 | elif s in ("--exhaustive", "-e"):
649 | preferences['exhaustive'] = True
650 | elif s in ("--html", "-h"):
651 | preferences['output'] += ",html"
652 | elif s in ("--xml", "-x"):
653 | preferences['output'] += ",xml"
654 | elif s in ("--text", "-t"):
655 | preferences['output'] += ",text"
656 | elif s in ("--except", "-c"):
657 | preferences['except'] = True
658 | elif s in ("--tough","-u"):
659 | fill_replace_dict()
660 | preferences['encodings'] = True
661 | elif s in ("--attack", "-a"):
662 | preferences['attack_type'] = argv[i+1].split(',')
663 | elif s in ("--ignore-ip", "-i"):
664 | preferences['ip_exclude'] = argv[i+1].split(',')
665 | elif s in ("--ignore-subnet", "-n"):
666 | preferences['subnet_exclude'] = argv[i+1].split(',')
667 | else:
668 | print("argument error, '%s' has been ignored" % s)
669 | if len(preferences['output']) < 1:
670 | preferences['output'] = "text"
671 | if not os.path.isdir(preferences['odir']):
672 | print("The directory %s doesn't exist, scalp will try to create it" % preferences['odir'])
673 | try:
674 | os.mkdir(preferences['odir'])
675 | except OSError:
676 | print("/!\\ scalp cannot write in", preferences['odir'])
677 | print("/!\\ Using /tmp/scalp/ as new directory...")
678 | preferences['odir'] = '/tmp/scalp'
679 | os.makedirs(preferences['odir'], exist_ok=True)
680 | scalper(access, filters, preferences)
681 |
682 | if __name__ == "__main__":
683 | main(len(sys.argv), sys.argv)
684 | """
685 | import hotshot
686 | from hotshot import stats
687 | name = "hotshot_scalp_stats"
688 | if not os.path.isfile(name):
689 | prof = hotshot.Profile(name)
690 | prof.runcall(main)
691 | prof.close()
692 | s = stats.load(name)
693 | s.sort_stats("time").print_stats()
694 | """
695 |
--------------------------------------------------------------------------------
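`analyze_date` turns a `--period` value such as `04/Apr/2008:15:45;*/May/2008` into six lower and six upper bounds (day, month, year, hour, minute, second), and `correct_period` accepts a log timestamp only if every field lies within its own bounds. Because the fields are compared independently rather than as one point in time, the period behaves like a per-field mask: with a start of `15:45`, only timestamps whose hour is 15 or later and whose minute is 45 to 59 pass, on every day of the range. The following is a self-contained re-implementation of that logic for experimenting with period strings. The function names are hypothetical, and it assumes the timestamp carries no timezone suffix, since `correct_period` calls `int()` on the seconds field.

```python
# Hypothetical stand-alone re-implementation of Scalp's --period handling
# (analyze_date + correct_period); the names below are illustrative only.
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Field order used by Scalp: day, month index, year, hour, minute, second
D_MIN = [1, 0, 0, 0, 0, 0]
D_MAX = [31, 11, 9999, 24, 59, 59]


def parse_bound(spec, default):
    """Turn 'dd/Mon/yyyy:HH:MM[:SS]' with '*' wildcards into six integers."""
    values = list(default)
    for i, field in enumerate(spec.replace(':', '/').split('/')):
        if field == '*':
            continue  # wildcard: keep the default min/max for this field
        if i == 1:
            values[1] = MONTHS.index(field)  # raises ValueError on e.g. 'Mai'
        else:
            # clamp to the allowed range, as analyze_date does
            values[i] = min(max(int(field), D_MIN[i]), D_MAX[i])
    return values


def parse_period(period):
    """Split 'start;end' and fill unspecified fields with the min/max values."""
    start, end = period.split(';')
    return {'start': parse_bound(start, D_MIN), 'end': parse_bound(end, D_MAX)}


def in_period(timestamp, period):
    """Compare each field of 'dd/Mon/yyyy:HH:MM:SS' with its own bounds."""
    fields = timestamp.replace(':', '/').split('/')
    for i in (2, 1, 0, 3, 4, 5):  # year, month, day, hour, minute, second
        if i == 1:
            if fields[1] not in MONTHS:
                return False
            cur = MONTHS.index(fields[1])
        else:
            cur = int(fields[i])
        if cur < period['start'][i] or cur > period['end'][i]:
            return False
    return True


period = parse_period('04/Apr/2008:15:45;*/May/2008')
print(period['start'])  # [4, 3, 2008, 15, 45, 0]
print(period['end'])    # [31, 4, 2008, 24, 59, 59]
print(in_period('20/Apr/2008:15:50:00', period))  # True
print(in_period('20/Apr/2008:16:10:00', period))  # False: minute 10 < 45
```

If a contiguous time range is what is wanted, comparing `(year, month, day, hour, minute, second)` tuples lexicographically would be the alternative; that would be a change in behaviour, not what Scalp currently does.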
/scalp/sexr.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | """
3 | File: sexr.py
4 | Author: Don C. Weber
5 | Start Date: 10052008
6 | Purpose: Scalp External XML Reporter parses Scalp XML files and outputs
7 | alert information, statistics, and detected IP addresses
8 |
9 | Copyright 2008 Don C. Weber
10 |
11 | License:
12 | This work is licensed under the Creative Commons Attribution-Share Alike 3.0
13 | United States License. To view a copy of this license, visit
14 | http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to
15 | Creative Commons, 171 Second Street, Suite 300, San Francisco, California,
16 | 94105, USA.
17 |
18 |
19 | Last Mod: 12292008
20 | Mods:
21 | 12292008 - Removed some residual debugging messages.
22 |
23 |
24 | Notes:
25 | No official DTD for the Scalp XML schema is provided.
26 | DTD for the Scalp XML schema. NOTE: this is my first DTD, so it may or may not be correct.
27 | ----------------------
28 | File: scalp_xmldtd.dtd
29 |
30 |
31 |
33 |
34 |
36 |
37 |
38 |
39 |
40 |
41 | ----------------------
42 |
43 | To Do:
44 | - Count hits by attack type
45 | - Count hits by attacking IP
46 | - Create CSV output - not sure if this is necessary, if so use "import csv" module
47 |
48 | Resources:
49 | Scalp: http://code.google.com/p/apache-scalp/
50 | PHP-IDS: http://php-ids.org/
51 | PyXML: http://pyxml.sourceforge.net
52 | DTD Attributes: http://www.w3schools.com/DTD/dtd_attributes.asp
53 | Declaring Attributes and Entities in DTDs: http://www.criticism.com/dita/dtd2.html
54 | Apache Log Format 1.3: http://httpd.apache.org/docs/1.3/logs.html
55 | Apache Log Format 2.2: http://httpd.apache.org/docs/2.2/logs.html
56 | The lxml.etree Tutorial: http://codespeak.net/lxml/tutorial.html
57 | Validation with lxml: http://codespeak.net/lxml/validation.html#dtd
58 | Dive in Python - Handling command line arguments:
59 | http://www.faqs.org/docs/diveintopython/kgp_commandline.html
60 | """
61 | import os
62 | import sys
63 | import datetime
64 | import getopt
65 | import glob
66 |
67 | try:
68 | import psyco
69 | psyco.full()
70 | except ImportError:
71 | print "%s: psyco is not installed" % sys.argv[0]
72 | pass
73 |
74 | try:
75 | from lxml import etree
76 | except ImportError:
77 | try:
78 | import xml.etree.cElementTree as etree
79 | except ImportError:
80 | try:
81 | import xml.etree.ElementTree as etree
82 | except ImportError:
83 | print "%s: Cannot find the ElementTree in your python packages" % sys.argv[0]
84 | sys.exit(1)
85 | __application__ = "sexr"
86 | __version__ = "0.1"
87 | __release__ = __application__ + "/" + __version__
88 | __author__ = "Don C. Weber"
89 | __copyright__ = "Copyright 2008 Don C. Weber"
90 | __license__ = "Creative Commons Attribution-Share Alike 3.0 United States License"
91 | __credits__ = "Don C. Weber"
92 | __maintainer__ = "Don C. Weber"
93 | __email__ = "cutaway@cutawaysecurity.com"
94 |
95 | def xparse(xml_file, val_dtd):
96 | """
97 | Function: xparse
98 | Variables:
99 | xml_file - the XML file to be reviewed
100 | val_dtd - The DTD object for validation
101 |
102 | Return:
103 | On Success: Parsed XML 'Element'
104 | On Fail: Return nothing which basically skips the file
105 |
106 | Purpose:
107 | This function takes the xml_file and parses it into a handler
108 | so that it can be evaluated.
109 | """
110 | try:
111 | xml_handler = open(xml_file, 'r')
112 | root = etree.parse(xml_handler).getroot()
113 | xml_handler.close()
114 | if not val_dtd.validate(root):
115 | print "%s: XML file does not comply with Scalp DTD: %s" % (sys.argv[0], xml_file)
116 | return ""
117 | return root
118 | except IOError:
119 | print "%s: IOError with the XML file: %s" % (sys.argv[0], xml_file)
120 | return ""
121 | except:
122 | print "%s: Unknown error with the XML file: %s" % (sys.argv[0], xml_file)
123 | return ""
124 |
125 |
126 | def iter(node,indent,fOUT):
127 | """
128 | Function: iter
129 | Variables:
130 | node - the node of the XML tree to be evaluated
131 | indent - spaces for visual purposes only, helps build a visible tree for text
132 |
133 | Purpose:
134 | Iterate through a specific node of the XML tree. First show the attributes
135 | and their values and then show any text for the node. Then check whether
136 | the node has any children and iterate through them. Continue until complete.
137 | """
138 |
139 | len_node = len(node)
140 | indent += ' '
141 | if len(node.attrib):
142 | # print "%s%s: %s" % (indent, node.tag, node.attrib)
143 | line_out = "%s%s: %s\n" % (indent, node.tag, node.attrib)
144 | fOUT.write(line_out)
145 | else:
146 | # print indent + node.tag
147 | line_out = indent + node.tag + "\n"
148 | fOUT.write(line_out)
149 | if node.text and node.text.strip():
150 | # print "%s - %s" % (indent, node.text)
151 | line_out = "%s - %s\n" % (indent, node.text)
152 | fOUT.write(line_out)
153 | if len_node:
154 | for ch in node:
155 | iter(ch,indent,fOUT)
156 |
157 | def item_cnt_iter(node,indent,fOUT):
158 | """
159 | Function: item_cnt_iter
160 | Variables:
161 | node - the node of the XML tree to be evaluated
162 | indent - spaces for visual purposes only, helps build a visible tree for text
163 |
164 | Purpose:
165 | Iterate through each node of the XML tree until it gets to the Impact node.
166 | When this node is encountered, count the number of source IP addresses that
167 | were associated with the flagged requests. Each impact will have specific
168 | reasons an alert was triggered. Count the number of alerts per reason.
169 | """
170 |
171 | len_node = len(node)
172 | indent += ' '
173 | d_impact = {}
174 | ip_impact = {}
175 |
176 | if node.tag == "impact": # Analyze all impact nodes
177 | # print "%sImpact %s Items: %s" % (indent, node.get('value'), str(len(node)))
178 | line_out = "%sImpact %s Items: %s\n" % (indent, node.get('value'), str(len(node)))
179 | fOUT.write(line_out)
180 | d_impact.clear()
181 | for i_ch in node: # loop through children of impact = items
182 | for ic_ch in i_ch: # loop through children of items = reason,line,regexp
183 | if ic_ch.tag == "line" and _scan == "IP":
184 | source_ip = ic_ch.text.split() # Grab and count the IP from the flagged log entry
185 | if source_ip[0] in ip_impact:
186 | ip_impact[source_ip[0]] += 1
187 | else:
188 | ip_impact[source_ip[0]] = 1
189 | if ic_ch.tag == "reason" and _scan == "count": # Grab and count the alert
190 | if ic_ch.text in d_impact:
191 | d_impact[ic_ch.text] += 1
192 | else:
193 | d_impact[ic_ch.text] = 1
194 | if len(d_impact) and _scan == "count":
195 | for key, value in d_impact.items():
196 | # print "%s - \'%s\': %s" % (indent, key, str(value))
197 | line_out = "%s - \'%s\': %s\n" % (indent, key, str(value))
198 | fOUT.write(line_out)
199 | if len(ip_impact) and _scan == "IP":
200 | # print "%s - Total Source IP Addresses: %s" % (indent, len(ip_impact))
201 | line_out = "%s - Total Source IP Addresses: %s\n" % (indent, len(ip_impact))
202 | fOUT.write(line_out)
203 | # Sort on keys only since values are not unique
204 | # I might try to figure this out later
205 | ip_addr = ip_impact.keys()
206 | ip_addr.sort()
207 | for addr in ip_addr:
208 | # print "%s - %s: %d" % (indent, addr, ip_impact[addr])
209 | line_out = "%s - %s: %s\n" % (indent, addr, ip_impact[addr])
210 | fOUT.write(line_out)
211 | return
212 |
213 | if len_node:
214 | if len(node.attrib):
215 | # print "%s%s: %s" % (indent, node.tag, node.attrib)
216 | line_out = "%s%s: %s\n" % (indent, node.tag, node.attrib)
217 | fOUT.write(line_out)
218 | else:
219 | # print indent + node.tag
220 | line_out = indent + node.tag + "\n"
221 | fOUT.write(line_out)
222 | if node.text and node.text.strip():
223 | # print "%s - %s" % (indent, node.text)
224 | line_out = "%s - %s\n" % (indent, node.text)
225 | fOUT.write(line_out)
226 | for ch in node:
227 | item_cnt_iter(ch,indent,fOUT)
228 | else:
229 | return
230 |
231 | def set_foutput(fOUT):
232 | """
233 | Function: set_foutput
234 | Variables:
235 | fOUT - file handle for output
236 | none - all globals
237 |
238 | Purpose:
239 | Set new output file
240 | """
241 | outFile = "%s%s%s" % (_dout, _fout, _fout_ext)
242 | print "%s: Writing output to %s" % (sys.argv[0], outFile)
243 | try:
244 | fOUT = open(outFile, 'w')
245 | except IOError:
246 | print "%s: Error opening file location. Check permissions: %s" % (sys.argv[0], outFile)
247 | sys.exit(1)
248 |
249 | return fOUT
250 |
251 | def help():
252 | """
253 | Function: help
254 | Variables:
255 | None
256 |
257 | Purpose:
258 | Print help output to stdout
259 | """
260 |
261 | print "Scalp External XML Reporter"
262 | print "Author: Don C. Weber"
263 | print ""
264 | print "usage: ./sexr.py [-h|--help] [-V|--version] [-v xml_dtd] [-d out_directory]"
265 | print " [-t | -f | -a | -s] "
266 | print ""
267 | print " -h | --help: Print this help."
268 | print " -V | --version: Version information."
269 | print " -v: The Scalp DTD file. './scalp_xmldtd.dtd' by default."
270 | print " -d: The directory to write the output files. './' by default. Implies -t"
271 | print " -t: Text output. This will produce an indented text file which"
272 | print " will be written to 'sexr_.<##>.txt'."
273 | print " -f: Full parse to selected output format."
274 | print " -a: Provides a count of specific attacks detected to selected"
275 | print " output format."
276 | print " -s: Provides a count of the Source IP addresses associated with"
277 | print " the specific Attack types to selected output format."
278 |
279 | def version():
280 | """
281 | Function: version
282 | Variables:
283 | None
284 |
285 | Purpose:
286 | Print version information to stdout
287 | """
288 | print "Scalp External XML Reporter release: %s" % __release__
289 | print "%s" % __copyright__
290 | print "%s" % __license__
291 | print ""
292 | print "Credits: %s" % __credits__
293 |
294 |
295 | #def main(argv):
296 | def main():
297 | """
298 | Function: main
299 | Variables:
300 | None - the command line arguments are read from sys.argv.
301 |
302 | Purpose:
303 | The main function where the user's intent is determined and all of the
304 | functions are called.
305 | """
306 | ###################
307 | # Init
308 | ###################
309 |
310 | # Setup variables and default locations
311 | global _dout # Directory to write output
312 | global _fhandle
313 | global _fout # File to write output
314 | global _fout_ext # File extension for text output; "" (the default) means STDOUT
315 | global _scan # Scan type: full = full parse, count = count by attack type, IP = List Source IPs of attack
316 | _dout = os.getcwd() + "/" # default write to current working directory
317 | _fhandle = sys.stdout # Default output is to stdout
318 | _fout_ext = ""
319 | _scan = "full" # Default
320 | dnow = datetime.datetime.utcnow()
321 | fnow = "%s.%s" % (dnow.date(), dnow.time())
322 | fdtd = "scalp_xmldtd.dtd" # Default DTD file for validation
323 | vdtd = ""
324 | xdtd = ""
325 |
326 | # Grab file or directory
327 | if len(sys.argv) < 2:
328 | help()
329 | sys.exit()
330 | if len(sys.argv) > 1:
331 | inXML = sys.argv.pop(len(sys.argv) - 1)
332 |
333 | # Get program options
334 | try:
335 | opts, args = getopt.getopt(sys.argv[1:], "hVv:d:tfas",["help","version"])
336 | except getopt.GetoptError:
337 | # Program help
338 | print "%s: command line error" % sys.argv[0]
339 | help()
340 | sys.exit(1)
341 | for opt, arg in opts:
342 | if opt in ("-h", "--help"):
343 | help()
344 | sys.exit()
345 | elif opt in ("-V", "--version"):
346 | version()
347 | sys.exit()
348 | elif opt == ("-v"): # Validation - DTD file
349 | fdtd = os.path.abspath(arg)
350 | if not os.path.isfile(fdtd):
351 | print "%s: Could not find DTD file: %s" % (sys.argv[0], fdtd)
352 | sys.exit(1)
353 | elif opt == ("-d"): # Output directory
354 | # check for ending / and append if none
355 | _dout = os.path.abspath(arg)
356 | if not os.path.exists(_dout):
357 | try:
358 | os.mkdir(_dout)
359 | except OSError:
360 | print "%s: Could not create: %s" % (sys.argv[0], _dout)
361 | if not _dout.endswith("/"):
362 | _dout = _dout + "/"
363 | # Set these again in case user forgot -t
364 | _fout = "sexr_%s." % fnow
365 | _fout_ext = ".txt"
366 | elif opt == ("-t"): # Text output
367 | _fout = "sexr_%s." % fnow
368 | _fout_ext = ".txt"
369 | elif opt == ("-f"): # Full Parse - default
370 | _scan = "full"
371 | elif opt == ("-a"): # count by attack type
372 | _scan = "count"
373 | elif opt == ("-s"): # List Source IPs of attack
374 | _scan = "IP"
375 | else:
376 | print "%s: Detected unrecognized command line argument." % sys.argv[0]
377 | help()
378 | sys.exit(1)
379 |
380 | # validate Scalp XML files
381 | tempXML = []
382 | if os.path.isdir(inXML):
383 | tempXML = glob.glob(os.path.abspath(inXML + '/*'))
384 | # skip sub-directories; build a new list rather than
385 | # removing items from the list being iterated over
386 | tempXML = [fXML for fXML in tempXML if not os.path.isdir(fXML)]
387 | elif os.path.isfile(inXML):
388 | tempXML.insert(0,os.path.abspath(inXML))
389 | else:
390 | print "%s: Could not find Scalp XML file." % sys.argv[0]
391 | help()
392 | sys.exit(1)
393 | inXML = [] # convert inXML to a list
394 | inXML = tempXML
395 |
396 | # Prep for XML validation
397 | if not os.path.isfile(fdtd):
398 | print "%s: Could not find DTD file: %s" % (sys.argv[0], fdtd)
399 | sys.exit(1)
400 | try:
401 | xdtd = open(fdtd,'r')
402 | except:
403 | print "%s: Could not open DTD file: %s" % (sys.argv[0], fdtd)
404 | sys.exit(1)
405 | vdtd = etree.DTD(xdtd)
406 |
407 |
408 | ###################
409 | # Main
410 | ###################
411 |
412 | if _scan == "full":
413 | print "%s: Conducting %s scan of %s files" % (sys.argv[0], _scan, len(inXML))
414 |
415 | for fXML in inXML:
416 | # Parse the XML file and find the root node
417 | p_scalp = xparse(fXML, vdtd)
418 | len_p_scalp = len(p_scalp)
419 | if len_p_scalp < 1: # Nothing found in file
420 | continue # skip to next file
421 |
422 | # Determine where to write
423 | if len(_fout_ext): # User wants output to file
424 | _fhandle = set_foutput(_fhandle)
425 |
426 | # Iterate through the whole XML file and print it to STDOUT
427 | iter(p_scalp,'',_fhandle)
428 | if len(_fout_ext): # User wants output to file
429 | _fhandle.close()
430 |
431 | if _scan == "count" or _scan == "IP":
432 | print "%s: Conducting %s scan of %s files" % (sys.argv[0], _scan, len(inXML))
433 |
434 | for fXML in inXML:
435 | # Parse the XML file and find the root node
436 | p_scalp = xparse(fXML, vdtd)
437 | len_p_scalp = len(p_scalp)
438 | if len_p_scalp < 1: # Nothing found in file
439 | continue # skip to next file
440 |
441 | # Determine where to write
442 | if len(_fout_ext): # User wants output to file
443 | _fhandle = set_foutput(_fhandle)
444 |
445 | # Iterate through the whole XML file but only show attack numbers
446 | # and source IP addresses
447 | item_cnt_iter(p_scalp,'',_fhandle)
448 | if len(_fout_ext): # User wants output to file
449 | _fhandle.close()
450 |
451 |
452 | ###################
453 | # Clean up
454 | ###################
455 |
456 | print "%s: Done" % sys.argv[0]
457 | sys.exit() # return
458 |
459 |
460 | if __name__ == '__main__':
461 |
462 | # Function where all the work is done
463 | main()
464 |
--------------------------------------------------------------------------------
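With `-a`, `item_cnt_iter` counts alerts per `reason` under each `impact` node, and with `-s` it counts source IP addresses, taking the first whitespace-separated token of each `line`. Below is a hedged Python 3 sketch of both tallies using the standard-library `xml.etree.ElementTree` and `collections.Counter` (sexr.py itself is Python 2 and uses `dict.has_key`). The `impact` element with its `value` attribute and the `reason`, `regexp` and `line` children follow sexr.py; the `<scalp>` root, the `<attack type="...">` element and the sample report are assumptions for illustration. Unlike sexr.py, it keys results by attack type as well as impact, so equal impact values under different attack types stay separate, and it performs no DTD validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical report: <scalp> and <attack type="..."> are assumed names;
# impact/value and the reason/regexp/line children follow sexr.py.
SAMPLE = """<scalp file="access_log" time="Mon-01-Jan-2024">
  <attack type="xss">
    <impact value="4">
      <item><reason>XSS vector</reason><regexp>script</regexp>
        <line>10.0.0.1 - - "GET /?q=a HTTP/1.1" 200 12</line></item>
      <item><reason>XSS vector</reason><regexp>script</regexp>
        <line>10.0.0.2 - - "GET /?q=b HTTP/1.1" 200 12</line></item>
      <item><reason>JS event handler</reason><regexp>onload</regexp>
        <line>10.0.0.1 - - "GET /?q=c HTTP/1.1" 200 12</line></item>
    </impact>
  </attack>
</scalp>"""


def tally(root):
    """Map (attack type, impact value) to (alerts per reason, alerts per source IP)."""
    result = {}
    for attack in root:  # each child of the root is one attack type
        for impact in attack.findall('impact'):
            reasons, ips = Counter(), Counter()
            for item in impact:  # every child of an impact is one flagged request
                reason = (item.findtext('reason') or '').strip()
                line = (item.findtext('line') or '').strip()
                if reason:
                    reasons[reason] += 1
                if line:
                    # first token of an Apache access-log line is the client address
                    ips[line.split()[0]] += 1
            result[(attack.get('type'), impact.get('value'))] = (reasons, ips)
    return result


for (attack, impact), (reasons, ips) in tally(ET.fromstring(SAMPLE)).items():
    print("%s, impact %s" % (attack, impact))
    for reason, n in reasons.most_common():
        print("   - '%s': %d" % (reason, n))
    for ip in sorted(ips):
        print("   - %s: %d" % (ip, ips[ip]))
```

With lxml, `etree.parse(path).getroot()` exposes the same `findall`, `findtext` and `get` methods, so a root that has already passed DTD validation could be passed to `tally` in the same way.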
/scalp_xmldtd.dtd:
--------------------------------------------------------------------------------
1 |
2 |
4 |
5 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
--------------------------------------------------------------------------------
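sexr.py validates Scalp reports against `scalp_xmldtd.dtd`, and what both tools rely on is `impact` nodes carrying a `value` attribute, each holding items with `reason`, `regexp` and `line` children. Writing that structure with hand-formatted strings, as `generate_xml_file` does, leaves the escaping of `<`, `&` and quotes in log lines to the writer. A hedged alternative is to build the tree with `xml.etree.ElementTree`, which escapes text and attribute values itself. The sketch assumes `flag[attack_type][impact]` holds tuples whose index 1 is the matching regexp, 2 the reason and 3 the raw log line (the indices scalp.py's writers read); the `scalp`, `attack` and `item` element names, the `file`, `time`, `type` and `name` attributes, and the sample data are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring how scalp.py's writers index each entry:
# (parsed_fields, regexp, reason, raw_log_line)
FLAG = {
    'xss': {
        '4': [((), r'<script', 'XSS vector',
               '10.0.0.1 - - "GET /?q=<script>&x=1 HTTP/1.1" 200 12\n')],
    },
}
NAMES = {'xss': 'Cross-Site Scripting'}  # hypothetical display names


def build_report(flag, access, curtime, names):
    """Build a Scalp-style report tree; element/attribute names are assumed."""
    root = ET.Element('scalp', file=access, time=curtime)
    for attack_type, impacts in flag.items():
        attack = ET.SubElement(root, 'attack', type=attack_type)
        if attack_type in names:
            attack.set('name', names[attack_type])
        # highest impact first, compared as numbers rather than strings
        for value in sorted(impacts, key=int, reverse=True):
            impact = ET.SubElement(attack, 'impact', value=str(int(value)))
            for entry in impacts[value]:
                item = ET.SubElement(impact, 'item')
                ET.SubElement(item, 'reason').text = entry[2]
                ET.SubElement(item, 'regexp').text = entry[1]
                # ElementTree escapes <, & and > in text nodes on output
                ET.SubElement(item, 'line').text = entry[3].rstrip('\n')
    return ET.ElementTree(root)


report = build_report(FLAG, 'access_log', 'Mon-01-Jan-2024', NAMES)
print(ET.tostring(report.getroot(), encoding='unicode'))
```

To write the report to disk, `report.write(path, encoding='utf-8', xml_declaration=True)` would replace the manual `open`/`write` calls.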