├── .gitignore ├── LICENSE.txt ├── README.md ├── basespec ├── __init__.py ├── analyze_spec.py ├── parse_spec.py ├── preprocess.py ├── scatterload.py ├── slicer.py ├── spec │ ├── 24008-f80.doc │ ├── 24008-f80.txt │ ├── 24011-f30.doc │ ├── 24011-f30.txt │ ├── 24080-f10.doc │ ├── 24080-f10.txt │ ├── 24301-f80.doc │ ├── 24301-f80.txt │ ├── 44018-f50.doc │ ├── 44018-f50.txt │ ├── ts_124008v150800p.pdf │ ├── ts_124008v150800p.txt │ ├── ts_124301v150800p.pdf │ └── ts_124301v150800p.txt ├── structs │ └── l3msg.py └── utils.py ├── examples ├── ex_check_spec.py ├── ex_get_spec_msgs.py ├── ex_init_functions.py ├── ex_init_strings.py └── ex_run_scatterload.py ├── load_ida.py └── overview.png /.gitignore: -------------------------------------------------------------------------------- 1 | .*.swp 2 | *~ 3 | cache 4 | __pycache__ 5 | *.pickle 6 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Dongkwan Kim and Eunsoo Kim 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so, 10 | subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 17 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR 18 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 19 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 20 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 21 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Description 2 | BaseSpec is a system that performs a comparative analysis of baseband 3 | implementation and the specifications of cellular networks. The key intuition of 4 | BaseSpec is that a message decoder in baseband software embeds the protocol 5 | specification in a machine-friendly structure to parse incoming messages; hence, 6 | the embedded protocol structures can be easily extracted and compared with the 7 | specification. This enables BaseSpec to automate the comparison process and 8 | explicitly discover mismatches in the protocol implementation, which are 9 | non-compliant with the specification. These mismatches can directly pinpoint the 10 | mistakes of developers when embedding the protocol structures or hint at 11 | potential vulnerabilities. 12 | 13 | ![BaseSpec Overview](./overview.png) 14 | 15 | With BaseSpec, we analyzed the implementation of cellular standard L3 messages 16 | in 18 baseband firmware images of 9 device models from one of the top three 17 | vendors. BaseSpec identified hundreds of mismatches that indicate both 18 | functional errors and potentially vulnerable points. We investigated their 19 | functional and security implications and discovered 9 erroneous cases affecting 20 | 33 distinct messages: 5 of these cases are functional errors and 4 of them are 21 | memory-related vulnerabilities. Notably, 2 of the vulnerabilities are critical 22 | remote code execution (RCE) 0-days. We also applied BaseSpec to 3 models from a 23 | different vendor in the top three. 
Through this analysis, BaseSpec identified 24 | multiple mismatches, 2 of which led us to discover a buffer overflow bug. 25 | 26 | For more details, please see [our 27 | paper](https://syssec.kaist.ac.kr/pub/2021/kim-ndss2021.pdf). 28 | 29 | - BaseSpec will be presented at [NDSS 2021](https://www.ndss-symposium.org/ndss-paper/basespec-comparative-analysis-of-baseband-software-and-cellular-specifications-for-l3-protocols/). 30 | 31 | 32 | ## Disclaimer 33 | The current release of BaseSpec **only includes the parts that are not specific to 34 | the vendors**: preprocessing (i.e., memory layout analysis and function 35 | identification), complementary specification parsing, and comparison. 36 | 37 | We reported all findings to the two vendors; one strongly refuses to publish the 38 | details, and the other has not responded to us yet. The one that refused 39 | was particularly concerned that complete patch deployment would take a long time 40 | (over six months) because they must collaborate with each mobile carrier. 41 | According to the vendor, they must send patch requests to ~280 carriers to 42 | update ~130 models globally. Due to this complexity, the vendor thinks that 43 | numerous devices might remain unpatched and vulnerable to our bugs. We agree 44 | with this and anonymize the vendor in the 45 | [paper](https://syssec.kaist.ac.kr/pub/2021/kim-ndss2021.pdf). 46 | 47 | 48 | # How to use 49 | 50 | ### 0. Using BaseSpec in IDA Pro 51 | BaseSpec contains Python scripts based on IDA Pro APIs (IDAPython). To use 52 | BaseSpec, first load the baseband firmware of interest into IDA Pro at the 53 | correct locations, which may require parsing of vendor-specific firmware 54 | file formats. 55 | Then, import `load_ida.py` as a script file in IDA Pro (using Alt+F7). 56 | 57 | 58 | ### 1. Preprocessing 59 | For scatter-loading, use `basespec.scatterload` as below. 
60 | 61 | ```python 62 | from basespec import scatterload 63 | scatterload.run_scatterload() 64 | ``` 65 | 66 | For function identification, use `basespec.preprocess` as below. 67 | 68 | ```python 69 | from basespec import preprocess 70 | preprocess.init_functions() 71 | preprocess.FUNC_BY_LS # identified functions by linear sweep prologue detection 72 | preprocess.FUNC_BY_LS_TIME # time spent for linear sweep prologue detection 73 | preprocess.FUNC_BY_PTR # identified functions by pointer analysis 74 | preprocess.FUNC_BY_PTR_TIME # time spent for pointer analysis 75 | ``` 76 | 77 | For string initialization, use `basespec.preprocess` as below. 78 | 79 | ```python 80 | from basespec import preprocess 81 | preprocess.init_strings() 82 | ``` 83 | 84 | 85 | ### 2. Specification parsing 86 | 87 | You can fetch the dictionary containing all specification msgs by running as 88 | below. 89 | 90 | ```python 91 | from basespec import parse_spec 92 | spec_msgs = parse_spec.get_spec_msgs() # spec_msgs[nas_type][msg_type] = ie_list 93 | ``` 94 | 95 | This `spec_msgs` dictionary contains a list of IEs for each message. Below is an 96 | example to fetch the IE list of the EMM SECURITY MODE COMMAND message. 97 | 98 | ```python 99 | emm_msgs = spec_msgs[7] # 7 : the type of EPS Mobility Management 100 | smc_ie_list = emm_msgs[0x5d] # 0x5d : the type of SECURITY MODE COMMAND 101 | ``` 102 | 103 | 104 | ### 3. Specification comparing 105 | 106 | To compare the message structures in the specification and binary, you should 107 | first create the corresponding class instances. Below is an example to compare 108 | the IE list of the EMM ATTACH ACCEPT message 109 | ([`examples/ex_check_spec.py`](./examples/ex_check_spec.py)). 
110 | 111 | ```python 112 | from basespec.analyze_spec import check_spec 113 | from basespec.structs.l3msg import IeInfo, L3MsgInfo, L3ProtInfo 114 | 115 | # EMM protocol 116 | pd = 7 117 | 118 | # EMM attach accept message 119 | msg_type = 0x42 120 | 121 | # Build a message 122 | # The information should be extracted from embedded message structures in the binary. 123 | IE_list = [] 124 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True)) 125 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True)) 126 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True)) 127 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=6, max=96, imperative=True)) 128 | #IE_list.append(IeInfo(msg_type, name="", iei=0, min=0, max=32767, imperative=True)) #missing 129 | IE_list.append(IeInfo(msg_type, name="", iei=0x50, min=11, max=11, imperative=False)) 130 | IE_list.append(IeInfo(msg_type, name="", iei=0x13, min=5, max=5, imperative=False)) 131 | IE_list.append(IeInfo(msg_type, name="", iei=0x23, min=5, max=8, imperative=False)) 132 | IE_list.append(IeInfo(msg_type, name="", iei=0x53, min=1, max=1, imperative=False)) 133 | IE_list.append(IeInfo(msg_type, name="", iei=0x4A, min=1, max=99, imperative=False)) #invalid 134 | IE_list.append(IeInfo(msg_type, name="", iei=0xFF, min=5, max=5, imperative=False)) #unknown 135 | attach_accept_msg = L3MsgInfo(pd, msg_type, name="Attach accept", direction="DL", ie_list=IE_list) 136 | 137 | # Build protocol 138 | EMM_prot = L3ProtInfo(pd, [attach_accept_msg]) 139 | 140 | l3_list = [EMM_prot] 141 | 142 | # Compare with specification 143 | check_spec(l3_list, pd) 144 | ``` 145 | 146 | This prints the mismatch results in CSV format. Below is a part of the output, 147 | rendered as a table. 
148 | 149 | |IE Name|Reference|Spec IEI|Spec Presence|Spec Format|Spec Length|Bin IEI|Bin Imperative|Bin Length|Bin Idx|Error 1|Error 2| 150 | |---|---|---|---|---|---|---|---|---|---|---|---| 151 | |EPS attach result|EPS attach result||M|V|1/2|00|True|1|0x42| 152 | |Spare half octet|Spare half octet||M|V|1/2|00|True|1|0x42| 153 | |T3412 value|GPRS timer||M|V|1|00|True|1|0x42| 154 | |TAI list|Tracking area identity list||M|LV|7-97|00|True|7-97|0x42| 155 | |GUTI|EPS mobile identity|50|O|TLV|13|50|False|13|0x42| 156 | |Location area identification|Location area identification|13|O|TV|6|13|False|6|0x42| 157 | |MS identity|Mobile identity|23|O|TLV|7-10|23|False|7-10|0x42| 158 | |EMM cause|EMM cause|53|O|TV|2|53|False|2|0x42| 159 | |Equivalent PLMNs|PLMN list|4A|O|TLV|5-47|4A|False|3-101|0x42| non-imperative invalid mismatch (min length)| non-imperative invalid mismatch (max length)| 160 | |-|-|-|-|-|-|FF|False|5|0x42|non-imperative unknown mismatch| 161 | |ESM message container|ESM message container||M|LV-E|5-n|-|-|-|-|imperative missing mismatch| 162 | |T3402 value|GPRS timer|17|O|TV|2|-|-|-|-|non-imperative missing mismatch| 163 | |T3423 value|GPRS timer|59|O|TV|2|-|-|-|-|non-imperative missing mismatch| 164 | | ... | 165 | 166 | 167 | # Issues 168 | 169 | ### Tested environment 170 | We ran all our experiments on a machine equipped with an Intel Core i7-6700K CPU 171 | at 4.00 GHz and 64 GB DDR4 RAM. We set up Windows 10 Pro, IDA Pro v7.4, and 172 | Python 3.7.6 on the machine. 173 | 174 | We converted the doc and pdf files to text on a Linux machine. 175 | Please check [this function](./basespec/parse_spec.py#L15). 176 | 177 | 178 | # Authors 179 | This project was conducted by the following authors at KAIST. 180 | * [Eunsoo Kim](https://hahah.kim) (These two authors contributed equally.) 181 | * [Dongkwan Kim](https://0xdkay.me/) (These two authors contributed equally.) 
182 | * [CheolJun Park](https://unrloay2.github.io/) 183 | * [Insu Yun](https://insuyun.github.io/) 184 | * [Yongdae Kim](https://syssec.kaist.ac.kr/~yongdaek/) 185 | 186 | 187 | # Citation 188 | We would appreciate if you consider citing [our 189 | paper](https://syssec.kaist.ac.kr/pub/2021/kim-ndss2021.pdf). 190 | ```bibtex 191 | @article{kim:2021:basespec, 192 | author = {Eunsoo Kim and Dongkwan Kim and CheolJun Park and Insu Yun and Yongdae Kim}, 193 | title = {{BaseSpec}: Comparative Analysis of Baseband Software and Cellular Specifications for L3 Protocols}, 194 | booktitle = {Proceedings of the 2021 Annual Network and Distributed System Security Symposium (NDSS)}, 195 | year = 2021, 196 | month = feb, 197 | address = {Online} 198 | } 199 | ``` 200 | -------------------------------------------------------------------------------- /basespec/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/__init__.py -------------------------------------------------------------------------------- /basespec/analyze_spec.py: -------------------------------------------------------------------------------- 1 | import time 2 | import itertools 3 | 4 | from .parse_spec import get_spec_msgs 5 | 6 | 7 | def flatten(l): 8 | return list(itertools.chain.from_iterable(l)) 9 | 10 | 11 | def find_ref(spec_map, target_ref, target_pd=None): 12 | for pd, vals in spec_map.items(): 13 | if target_pd and pd != target_pd: 14 | continue 15 | 16 | for msg_type, args in vals.items(): 17 | class_name, sub_class_name, msg_name, ie_list = args 18 | for idx, ie in enumerate(ie_list): 19 | ie_name, iei, ref, presence, ie_format, length = ie 20 | if target_ref in ref.lower(): 21 | print( 22 | "NasProt[{}] {} -> {}, {}".format( 23 | pd, class_name, sub_class_name, msg_name 24 | ) 25 | ) 26 | print( 27 | " [{}] {}: iei: {}, presence: {}, format: {}, length: 
{}".format( 28 | idx, ie_name, iei, presence, ie_format, length 29 | ) 30 | ) 31 | 32 | 33 | def print_ie_list(spec_ie_list): 34 | for idx, ie in enumerate(spec_ie_list): 35 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = ie 36 | print("[{}] ({}) {} {} {}".format(idx, iei1, ie_name, spec_format, spec_length)) 37 | 38 | 39 | def sanitize_iei(iei): 40 | if iei: 41 | iei = iei.strip("-") 42 | iei = iei.strip("x") 43 | if iei == "TBC": 44 | iei = 0 45 | elif iei: 46 | iei = int(iei, 16) 47 | else: 48 | iei = 0 49 | else: 50 | iei = 0 51 | 52 | return iei 53 | 54 | 55 | def sanitize_len(len1, len2): 56 | if len1 == len2: 57 | msg_length = "{}".format(len1) 58 | else: 59 | msg_length = "{}-{}".format(len1, len2) 60 | 61 | return msg_length 62 | 63 | 64 | def apply_li(spec_ie, bin_ie): 65 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie 66 | len2_min = bin_ie.min 67 | len2_max = bin_ie.max 68 | 69 | # ================================================ 70 | # Convert Length 71 | # ================================================ 72 | # For the length of the specification, the length of IEI, LI is already 73 | # included in the length. However, for those having length 'n', it is not 74 | # included. Thus, we calculate them first. 
75 | if "-" in spec_length: 76 | len1_min, len1_max = spec_length.split("-") 77 | if len1_max == "?": 78 | len1_max = "n" 79 | len1_min = int(len1_min) 80 | if len1_max == "n": 81 | if "-E" in spec_format: 82 | len1_max = 2 ** 16 - 1 83 | else: 84 | len1_max = 2 ** 8 - 1 85 | 86 | if "T" in spec_format: 87 | len1_max += 1 88 | if "L" in spec_format: 89 | len1_max += 1 90 | if "-E" in spec_format: 91 | len1_max += 1 92 | else: 93 | len1_max = int(len1_max) 94 | 95 | elif spec_length == "1/2": 96 | len1_min = 1 97 | len1_max = 1 98 | 99 | else: 100 | len1_min = int(spec_length) 101 | len1_max = int(spec_length) 102 | 103 | if not (spec_format == "TV" and spec_length == "1"): 104 | if "T" in spec_format: 105 | len2_min += 1 106 | len2_max += 1 107 | if "L" in spec_format: 108 | len2_min += 1 109 | len2_max += 1 110 | if "-E" in spec_format: 111 | len2_min += 1 112 | len2_max += 1 113 | 114 | return len1_min, len1_max, len2_min, len2_max 115 | 116 | 117 | def compare_ie(spec_ie, bin_ie): 118 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie 119 | spec_presence = spec_presence.split()[0] 120 | iei1 = sanitize_iei(iei1) 121 | iei2 = bin_ie.iei 122 | len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie) 123 | 124 | bug_str = "" 125 | if bin_ie.imperative: 126 | ie_type = 'imperative' 127 | else: 128 | ie_type = 'non-imperative' 129 | 130 | # ================================================ 131 | # Check format (LI, Length Indicator) and length 132 | # ================================================ 133 | # If a length is specified, 134 | if len1_min != len2_min: 135 | bug_str += ", {} invalid mismatch (min length)".format(ie_type) 136 | if len1_max != len2_max: 137 | bug_str += ", {} invalid mismatch (max length)".format(ie_type) 138 | 139 | return bug_str 140 | 141 | 142 | def compare_ie_list(pd, msg, spec_msg): 143 | s = "" 144 | class_name, sub_class_name, msg_name, spec_direction, spec_ie_list = spec_msg 145 | # check direction 146 
| if (msg.direction != spec_direction) and (spec_direction != 'both'): 147 | return s 148 | 149 | # skip common IEs that do not exist in the baseband firmware 150 | l = 0 151 | for idx, ie in enumerate(spec_ie_list): 152 | ie_name, iei, ref, presence, ie_format, length = ie 153 | if "message type" in ref.lower() or "message type" in ie_name.lower(): 154 | break 155 | 156 | # sms (9) -> nested element exists 157 | if pd == 9 and msg.type not in [4, 16]: 158 | spec_ie_list = spec_ie_list 159 | else: 160 | spec_ie_list = spec_ie_list[idx + 1 :] 161 | 162 | bin_ie_list = msg.ie_list 163 | # Filter if bin_ie_list is not implemented yet. 164 | if len(spec_ie_list) > 0 and len(bin_ie_list) == 0: 165 | s += "0x{0:x} ({0}) {1} not implemented in pd {2}".format( 166 | msg.type, msg_name, pd 167 | ) 168 | return s 169 | 170 | # ================================================ 171 | # First, we divide IE list of the specification to imperatives and 172 | # non-imperatives. 173 | # ================================================ 174 | bug_flag = False 175 | imperatives = [] 176 | nonimperatives = {} 177 | for idx, ie in enumerate(spec_ie_list): 178 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = ie 179 | bug_str = "" 180 | 181 | # Check spec length 182 | if spec_length == "1/2" and spec_format not in ["V", "TV"]: 183 | bug_str += ",spec length error" 184 | 185 | # Check presence of the specification 186 | if ( 187 | "M" not in spec_presence 188 | and "O" not in spec_presence 189 | and "C" not in spec_presence 190 | ): 191 | bug_str += ",spec presence error" 192 | 193 | # In TS 24.007, 11.2.5 Presence requirements of information elements, 194 | # only IEs belonging to non-imperative part of a message may have 195 | # presence requirement C. However, we find special case for conditional 196 | # IE implementation. That is, there is a message that has an imperative 197 | # part having IEs of the "C" presence. 
198 | # 199 | # Protocol: Radio Resource management (PD: 6) 200 | # Message: IMMEDIATE ASSIGNMENT (DL) (Message Type: 0x3f) 201 | if pd == 6 and msg.type == 0x3F: 202 | if "C" in spec_presence and sanitize_iei(iei1) == 0: 203 | if 'packet channel description' in ie_name.lower(): 204 | continue 205 | ie = ie_name, iei1, ref, spec_presence, spec_format, spec_length 206 | 207 | ie_name = ie_name.replace(",", "") 208 | # imperative / non-imperative is defined by iei, not presence. 209 | if sanitize_iei(iei1) == 0 and "O" not in spec_presence: 210 | if "T" in spec_format or "O" in spec_presence: 211 | bug_str += ",spec format error" 212 | 213 | imperatives.append(ie) 214 | 215 | else: 216 | if "T" not in spec_format: 217 | bug_str += ",spec non-imperative format error" 218 | 219 | if sanitize_iei(iei1) in nonimperatives: 220 | # ts 24.007, 11.2.4 221 | # A message may contain two or more IEs with equal IEI. Two IEs 222 | # with the same IEI in a same message must have 223 | # 1) the same format, 224 | # 2) when of type 3, the same length. 225 | # More generally, care should be taken not to introduce 226 | # ambiguities by using an IEI for two purposes. Ambiguities 227 | # appear in particular when two IEs potentially immediately 228 | # successive have the same IEI but different meanings and when 229 | # both are non-mandatory. As a recommended design rule, 230 | # messages should contain a single IE of a given IEI. 
231 | ie2 = nonimperatives[sanitize_iei(iei1)] 232 | ie_name2, iei2, ref2, spec_presence2, spec_format2, spec_length2 = ie2 233 | if spec_format != spec_format2: # same format 234 | bug_str += ",spec non-imperative same iei format error" 235 | 236 | elif spec_length != spec_length2: # Type 3 237 | assert "-" not in spec_length 238 | bug_str += ",spec non-imperative same iei length error" 239 | 240 | else: 241 | nonimperatives[sanitize_iei(iei1)] = ie 242 | 243 | if bug_str: 244 | s += "{},{},".format(ie_name, ref) 245 | s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length) 246 | s += "-,-,-,-,-" 247 | s += "," + bug_str.lstrip(",") + "\n" 248 | bug_flag = True 249 | 250 | # To separate errors from the spec and the baseband binary. 251 | if s: 252 | s += "-" * 20 + "\n" 253 | 254 | # ================================================ 255 | # Now we check the IE list in the baseband binary and compare them with the 256 | # IE list from the spec. 257 | # ================================================ 258 | iei_done = set() 259 | 260 | for idx, bin_ie in enumerate(bin_ie_list): 261 | bug_str = "" 262 | 263 | # Fetch corresponding IE from specification 264 | spec_ie = None 265 | 266 | # The implementation has two rules for representing imperatives. 267 | # Developers may have misunderstood the specification? 
268 | if bin_ie.imperative: 269 | if imperatives: 270 | spec_ie = imperatives.pop(0) 271 | 272 | if spec_ie: 273 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie 274 | ie_name = ie_name.replace(",", "") 275 | len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie) 276 | 277 | # skipped 1/2 length in binary implementation 278 | if spec_length == "1/2" and (len2_min != len2_max or len2_max != 1): 279 | s += "{},{},".format(ie_name, ref) 280 | s += "{},{},{},{},".format( 281 | iei1, spec_presence, spec_format, spec_length 282 | ) 283 | s += "-,-,-,-" 284 | if "spare half" in ref.lower(): 285 | s += ",(skipped spare half)" 286 | else: 287 | s += ",imperative missing mismatch (skipped 1/2)" 288 | bug_flag = True 289 | s += "\n" 290 | 291 | if imperatives: 292 | spec_ie = imperatives.pop(0) 293 | else: 294 | spec_ie = None 295 | 296 | else: 297 | if bin_ie.iei in nonimperatives: 298 | spec_ie = nonimperatives[bin_ie.iei] 299 | iei1 = spec_ie[1] 300 | len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie) 301 | 302 | # This is check for IE type1 having a 4-bit IEI and 4-bit value 303 | if iei1.endswith("-") and (len2_min != len2_max or len2_max != 1): 304 | spec_ie = None 305 | else: 306 | iei_done.add(bin_ie.iei) 307 | 308 | if spec_ie: 309 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie 310 | ie_name = ie_name.replace(",", "") 311 | len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie) 312 | if "/" not in spec_length: 313 | spec_length = sanitize_len(len1_min, len1_max) 314 | msg_length = sanitize_len(len2_min, len2_max) 315 | s += "{},{},".format(ie_name, ref) 316 | s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length) 317 | s += "{:02X},{},{},0x{:X}".format( 318 | bin_ie.iei, bin_ie.imperative, msg_length, bin_ie.type 319 | ) 320 | bug_str = compare_ie(spec_ie, bin_ie) 321 | 322 | else: 323 | msg_length = sanitize_len(bin_ie.min, bin_ie.max) 324 | s += "-,-," 325 | s 
+= "-,-,-,-," 326 | s += "{:02X},{},{},0x{:X}".format( 327 | bin_ie.iei, bin_ie.imperative, msg_length, bin_ie.type 328 | ) 329 | if bin_ie.imperative: 330 | bug_str = "imperative unknown mismatch" 331 | else: 332 | bug_str = "non-imperative unknown mismatch" 333 | 334 | if bug_str: 335 | s += "," + bug_str.lstrip(",") 336 | bug_flag = True 337 | s += "\n" 338 | 339 | # ================================================ 340 | # Check leftovers 341 | # ================================================ 342 | for spec_ie in imperatives: 343 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie 344 | ie_name = ie_name.replace(",", "") 345 | s += "{},{},".format(ie_name, ref) 346 | s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length) 347 | s += "-,-,-,-" 348 | if "spare half" in ref.lower(): 349 | s += ",(skipped spare half)" 350 | s += "\n" 351 | else: 352 | s += ",imperative missing mismatch" 353 | s += "\n" 354 | bug_flag = True 355 | 356 | for iei in nonimperatives: 357 | if iei not in iei_done: 358 | ie2 = nonimperatives[iei] 359 | ie_name, iei1, ref, spec_presence, spec_format, spec_length = ie2 360 | ie_name = ie_name.replace(",", "") 361 | s += "{},{},".format(ie_name, ref) 362 | s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length) 363 | s += "-,-,-,-" 364 | s += ",non-imperative missing mismatch" 365 | s += "\n" 366 | bug_flag = True 367 | 368 | if not bug_flag: 369 | s = "" 370 | 371 | return s 372 | 373 | 374 | def check_numbers(l3_msgs): 375 | total_msgs = 0 376 | total_ies = 0 377 | total_iies = 0 378 | for pd, prot in enumerate(l3_msgs): 379 | if len(prot.msg_list) == 0: 380 | continue 381 | 382 | if pd > 12: 383 | break 384 | 385 | msgs = l3_msgs[pd].msg_list 386 | print("# of {} msgs: {}".format(pd, len(msgs))) 387 | 388 | ies = flatten(map(lambda x: x.ie_list, msgs)) 389 | iies = list(filter(lambda x: x.imperative, map(lambda x: x[0], ies))) 390 | print("# of {} msg IEs: {}".format(pd, len(ies))) 
391 | print("# of {} msg imperative IEs: {}".format(pd, len(iies))) 392 | 393 | total_msgs += len(msgs) 394 | total_ies += len(ies) 395 | total_iies += len(iies) 396 | 397 | print("# of total msgs: {}".format(total_msgs)) 398 | print("# of total IEs: {}".format(total_ies)) 399 | print("# of total imperative IEs: {}".format(total_iies)) 400 | 401 | 402 | def check_spec(l3_msgs, target_pd=3): 403 | ''' 404 | check_spec compares l3_msgs from binary with specification. 405 | It prints comparison results (e.g., mismatches) in CSV format. 406 | 407 | :param l3_msgs: list of basespec.structs.l3msg.L3ProtInfo 408 | which is generated based on embedded message structure 409 | of the binary. 410 | :param target_pd: the PD value to analyze. 411 | ''' 412 | global spec_map 413 | if "spec_map" not in globals(): 414 | spec_map = get_spec_msgs() 415 | 416 | for prot in l3_msgs: 417 | pd = prot.pd 418 | if len(prot.msg_list) == 0: 419 | continue 420 | 421 | # For DEBUG 422 | if pd != target_pd: 423 | continue 424 | 425 | if pd not in spec_map: 426 | continue 427 | 428 | prot_done = set() 429 | for msg in prot.msg_list: 430 | msg_type = msg.type 431 | 432 | if msg_type not in spec_map[pd]: 433 | print("=" * 20) 434 | print( 435 | "NasProt[{0}] msg_type 0x{1:x} ({1}) not in pd {0}".format( 436 | pd, msg_type 437 | ) 438 | ) 439 | continue 440 | 441 | # There may exist multiple messages from spec, so we analyze each 442 | # message and pick the least buggy message. 
443 | spec_msgs = spec_map[pd][msg_type] 444 | msg_results = {} 445 | 446 | for spec_msg in spec_msgs: 447 | class_name, sub_class_name, msg_name, msgs = spec_msg 448 | for direction, ie_list in msgs: 449 | spec_msg = ( 450 | class_name, 451 | sub_class_name, 452 | msg_name, 453 | direction, 454 | ie_list, 455 | ) 456 | prot_done.add(msg_type) 457 | 458 | try: 459 | s = compare_ie_list(pd, msg, spec_msg).strip() 460 | s_cnt = s.count("error") 461 | 462 | if direction in msg_results: 463 | if s_cnt < msg_results[direction][-1][-1]: 464 | msg_results[direction] = ( 465 | class_name, 466 | sub_class_name, 467 | msg_name, 468 | (s, s_cnt), 469 | ) 470 | else: 471 | msg_results[direction] = ( 472 | class_name, 473 | sub_class_name, 474 | msg_name, 475 | (s, s_cnt), 476 | ) 477 | except: 478 | import traceback 479 | 480 | print(class_name, sub_class_name, msg_name, direction) 481 | traceback.print_exc() 482 | 483 | # Filter the least buggy message 484 | for direction, ( 485 | class_name, 486 | sub_class_name, 487 | msg_name, 488 | (s, s_cnt), 489 | ) in msg_results.items(): 490 | if s: 491 | print("=" * 20) 492 | print( 493 | "NasProt[{}] {} -> {}".format( 494 | pd, class_name, sub_class_name 495 | ) 496 | ) 497 | print( 498 | "0x{0:x} ({0}) {1} ({2})".format(msg_type, msg_name, direction) 499 | ) 500 | title = "IE Name,Reference,Spec IEI,Spec Presence,Spec Format,Spec Length," 501 | title += "Bin IEI,Bin Imperative,Bin Length,Bin Idx" 502 | print(title) 503 | print(s) 504 | 505 | # Check messages not implemented in the binary. 
506 | for msg_type in spec_map[pd]: 507 | if msg_type not in prot_done: 508 | spec_msg = spec_map[pd][msg_type][0] 509 | class_name, sub_class_name, msg_name, msgs = spec_msg 510 | print("=" * 20) 511 | print( 512 | "NasProt[{}] {} -> {}".format(pd, class_name, sub_class_name) 513 | ) 514 | print( 515 | "0x{0:x} ({0}) {1} not implemented in pd {2}".format( 516 | msg_type, msg_name, pd 517 | ) 518 | ) 519 | -------------------------------------------------------------------------------- /basespec/parse_spec.py: -------------------------------------------------------------------------------- 1 | import re 2 | import sys 3 | import os 4 | 5 | from collections import defaultdict 6 | 7 | # TODO: handle special cases 8 | SHORT_PROT = [ 9 | "VBS/VGCS", 10 | "Synchronization channel information", 11 | ] 12 | 13 | 14 | # This may need to run on Linux 15 | def convert_txt(fname_in, fname_out=""): 16 | if not fname_out: 17 | fname_out = os.path.splitext(fname_in)[0] + ".txt" 18 | 19 | if not os.path.exists(fname_out): 20 | ext = os.path.splitext(fname_in)[-1] 21 | if ext == ".doc": 22 | os.system('antiword -w 0 "{}" > "{}"'.format(fname_in, fname_out)) 23 | elif ext == ".pdf": 24 | os.system('pdftotext -layout "{}" "{}"'.format(fname_in, fname_out)) 25 | 26 | return fname_out 27 | 28 | 29 | def read_data(fname): 30 | with open(fname, "r", encoding="ascii", errors="ignore") as f: 31 | data = f.read() 32 | 33 | # incorrectly converted character 34 | data = data.replace("\xa0", " ") 35 | 36 | # TS 44.018 37 | data = data.replace("ACK.", "ACKNOWLEDGE") 38 | data = data.replace("NOTIFICATION RESPONSE", "NOTIFICATION/RESPONSE") 39 | data = data.replace( 40 | "SYSTEM INFORMATION TYPE 2 quater", "SYSTEM INFORMATION TYPE 2quater" 41 | ) 42 | data = data.replace("SYSTEM INFORMATION 15", "SYSTEM INFORMATION TYPE 15") 43 | data = data.replace("EXTENDED MEASUREMENT ORDER", " EXTENDED MEASUREMENT ORDER") 44 | data = data.replace("CDMA 2000", "CDMA2000") 45 | # data = data.replace('00010110 
MBMS ANNOUNCEMENT', '00110101 MBMS ANNOUNCEMENT') 46 | 47 | # TS 24.008 48 | # data = data.replace('DETACH REQUEST', ' DETACH REQUEST') 49 | # data = data.replace('DETACH ACCEPT', ' DETACH ACCEPT') 50 | # data = data.replace('Detach ACCEPT', ' Detach ACCEPT') 51 | data = data.replace(" Contents of Service Request", " Service Request") 52 | data = data.replace(" Contents of Service Accept", " Service Accept") 53 | data = data.replace(" Contents of Service Reject", " Service Reject") 54 | data = data.replace( 55 | " Authentication and ciphering req", " Authentication and ciphering request" 56 | ) 57 | data = data.replace( 58 | " Authentication and ciphering resp", " Authentication and ciphering response" 59 | ) 60 | data = data.replace( 61 | " Authentication and ciphering rej", " Authentication and ciphering reject" 62 | ) 63 | data = data.replace("activation rej.", "activation reject") 64 | data = data.replace("request(Network", "request (Network") 65 | data = data.replace("request(MS", "request (MS") 66 | data = data.replace("TABLE", "Table") 67 | data = data.replace("AUTHENTICATION FAILURE..", "AUTHENTICATION FAILURE") 68 | # Check "Facility(simple recall alignment)" length "2-" 69 | lines = data.splitlines() 70 | 71 | return lines 72 | 73 | 74 | def get_direction(lines, idx): 75 | # find direction 76 | tmp_idx = idx 77 | while "direction" not in lines[tmp_idx].lower() and tmp_idx >= 0: 78 | tmp_idx -= 1 79 | 80 | assert "direction" in lines[tmp_idx].lower() 81 | 82 | s = lines[tmp_idx].lower() 83 | if ( 84 | "network to mobile" in s 85 | or "to ms" in s 86 | or "to ue" in s 87 | or "-> ms " in s 88 | or "dl" in s 89 | ): 90 | direction = "DL" 91 | elif ( 92 | "mobile station to network" in s 93 | or "mobile to network" in s 94 | or "ms to" in s 95 | or "ue to" in s 96 | or "ms ->" in s 97 | or "ul" in s 98 | ): 99 | direction = "UL" 100 | elif "both" in s: 101 | direction = "both" 102 | else: 103 | direction = None 104 | 105 | return direction 106 | 107 | 108 | 
def parse_msg_content(lines): 109 | in_table = False 110 | direction = None 111 | ie_list = [] 112 | idx = 0 113 | msgs = defaultdict(list) 114 | msg_name = "" 115 | 116 | while idx < len(lines): 117 | line = lines[idx] 118 | 119 | if not line: 120 | idx += 1 121 | continue 122 | 123 | if "Table" in line: 124 | # Handle messages whose name is too long. 125 | # ts 44.018: INTER SYSTEM TO CDMA2000 HANDOVER 126 | # ts 24.008: MODIFY PDP CONTEXT REQUEST, MODIFY PDP CONTEXT ACCEPT 127 | if "message content" not in line and re.search( 128 | "(message )?content[s]?$", lines[idx + 1].strip() 129 | ): 130 | line = line.strip() + " " + lines[idx + 1].strip() 131 | idx += 1 132 | 133 | # Handle messages whose name is too long. 134 | # ts 44.018: EC IMMEDIATE ASSIGNMENT TYPE 1 135 | if "information element" not in line and re.search( 136 | "(information )?element[s]?$", lines[idx + 1].strip() 137 | ): 138 | line = line.strip() + " " + lines[idx + 1].strip() 139 | 140 | # Handle messages whose name is an exception. 141 | # ts 24.008: SETUP 142 | line = re.sub("(.* message content) ?\(.*to.*direction\)", "\g<1>", line) 143 | # if 'SETUP message content' in line and 'direction' in line: 144 | # line = re.sub('(SETUP message content).*', '\g<1>', line) 145 | 146 | if re.match("^[ \|]*Table.*message content", line): 147 | # If there exist another table, we skip it 148 | if in_table: 149 | if ie_list: 150 | msgs[msg_name.lower()].append((direction, ie_list)) 151 | 152 | # These are not standard L3 messages. 153 | # only one exception is special case for IMMEDIATE ASSIGNMENT as below: 154 | # Table 9.1.18.1a: IMMEDIATE ASSIGNMENT message content (MTA 155 | # Access Burst or Extended Access Burst Method only) 156 | # 157 | # We skip this case as it is a special case. 
158 | idx += 1 159 | in_table = False 160 | continue 161 | 162 | in_table = True 163 | # g = re.match('^\s*Table [0-9\.\:]+', line) 164 | msg_name = re.search("[: ]([A-Za-z\-\/0-9 \(\)]+) message content", line) 165 | msg_name = msg_name.group().strip().replace(":", "") 166 | # msg_name = msg_name.replace(':', '').strip() 167 | # table_name = table_name.split()[-1] 168 | msg_name = msg_name.replace("message content", "").strip() 169 | # msg_name = (table_name, msg_name) 170 | 171 | direction = get_direction(lines, idx) 172 | ie_list = [] 173 | 174 | idx += 1 175 | continue 176 | 177 | if re.match("^\s*Table.*information elements[ ]*$", line): 178 | # If there exist another table, we skip it 179 | if in_table: 180 | if ie_list: 181 | msgs[msg_name.lower()].append((direction, ie_list)) 182 | 183 | # These are not standard L3 messages. 184 | idx += 1 185 | in_table = False 186 | continue 187 | 188 | in_table = True 189 | # g = re.match('^\s*Table [0-9\.\:]+', line) 190 | msg_name = re.search(" ([A-Za-z\-\/0-9 ]+) information elements", line) 191 | msg_name = msg_name.group().strip() 192 | # table_name = table_name.split()[-1] 193 | msg_name = msg_name.replace("information elements", "").strip() 194 | # msg_name = (table_name, msg_name) 195 | 196 | direction = get_direction(lines, idx) 197 | ie_list = [] 198 | 199 | idx += 1 200 | continue 201 | 202 | if not in_table: 203 | idx += 1 204 | continue 205 | 206 | if re.match("^\d+\.\d+", line): 207 | if ie_list: 208 | msgs[msg_name.lower()].append((direction, ie_list)) 209 | else: 210 | # this is not a proper standard L3 message 211 | # print('{} may not be parsed properly.'.format(msg_name)) 212 | pass 213 | 214 | in_table = False 215 | direction = None 216 | ie_list = [] 217 | idx += 1 218 | continue 219 | 220 | if "ETSI" in line: 221 | idx += 1 222 | continue 223 | 224 | # dummy line 225 | # IEI Information Element Type/Reference Presence Format Length 226 | # ts 44.018 has 'length', not 'Length' 227 | if ( 228 | 
"presence" in lines[idx].lower() 229 | and "format" in lines[idx].lower() 230 | and "length" in lines[idx].lower() 231 | ): 232 | idx += 1 233 | continue 234 | 235 | fields = lines[idx].split("|") 236 | fields = list(filter(lambda x: x, fields)) 237 | 238 | if len(fields) != 6: 239 | idx += 1 240 | continue 241 | 242 | fields = list(map(lambda x: x.strip(), fields)) 243 | iei, ie_name, ref, presence, ie_format, length = fields 244 | if not presence or not ie_format: 245 | idx += 1 246 | continue 247 | 248 | # incompatible spec 249 | length = length.replace("octets", "").strip() 250 | length = length.replace("octet", "").strip() 251 | length = length.replace("(", "-").strip() 252 | length = length.replace(" ", "") 253 | # length = length.replace('?', 'n') 254 | 255 | # convert spec error 256 | if length.endswith("-"): 257 | length = length + "n" 258 | 259 | if not length: 260 | length = "1/2" 261 | 262 | # spec error 263 | if "n" in length and "-" not in length: 264 | length = length.replace("n", "-n") 265 | 266 | if length == "1/23/2": 267 | length = "1/2-3/2" 268 | 269 | # For SMS (type 9) - CM - RP messages. 
rest 5 bits are spare 270 | if length == "3bits": 271 | length = "1" 272 | 273 | # For SMS (type 9) - some messages use '<=' operator 274 | if length.startswith("-"): 275 | length = "1" + length 276 | 277 | # convert spec parsing error 278 | try: 279 | if 3000 < int(length) < 4000: 280 | length = length[0] + "-" + length[1:] 281 | except: 282 | pass 283 | 284 | ie_list.append([ie_name, iei, ref, presence, ie_format, length]) 285 | 286 | idx += 1 287 | 288 | return msgs 289 | 290 | 291 | def parse_msg_type(lines, pdf=False): 292 | in_table = False 293 | idx = 0 294 | msgs = defaultdict(list) 295 | prefix = "" 296 | 297 | while idx < len(lines): 298 | line = lines[idx] 299 | 300 | if not line: 301 | idx += 1 302 | continue 303 | 304 | if re.match("^\s*Table.*Message types", line): 305 | in_table = True 306 | table_name, class_name = line.split(":") 307 | table_name = table_name.split()[-1] 308 | if "for" in class_name: 309 | class_name = class_name.replace("Message types for", "").strip() 310 | else: 311 | class_name = "" 312 | prefix = "" 313 | 314 | # dummy line 315 | # IEI Information Element Type/Reference Presence Format Length 316 | idx += 1 317 | continue 318 | 319 | if not in_table: 320 | idx += 1 321 | continue 322 | 323 | if re.match("^\d+\.\d+", line): 324 | in_table = False 325 | idx += 1 326 | continue 327 | 328 | if "ETSI" in line: 329 | idx += 1 330 | continue 331 | 332 | msg = lines[idx] 333 | 334 | if pdf: 335 | msg_type = re.match("^[01x\- ]+", msg) 336 | else: 337 | msg_type = re.match("^\|[\|\.01x\- ]+", msg) 338 | 339 | if not msg_type: 340 | idx += 1 341 | continue 342 | 343 | msg_type = msg_type.group() 344 | msg_name = msg.replace(msg_type, "") 345 | msg_name = msg_name.replace("|", "").strip() 346 | msg_name = msg_name.replace(":", "").strip() 347 | 348 | msg_type = msg_type.replace("|", "") 349 | msg_type = msg_type.replace(" ", "").strip() 350 | msg_type = msg_type.replace(".", "-") 351 | 352 | if "reserved" in msg_name.lower(): 353 | idx 
+= 1 354 | continue 355 | 356 | if len(msg_type) < 4: 357 | idx += 1 358 | continue 359 | 360 | msg_cnt = sum(map(lambda x: x == "-", msg_type)) 361 | 362 | if msg_cnt > 2: 363 | prefix = msg_type 364 | if class_name: 365 | sub_class_name = class_name + "-" + msg_name 366 | else: 367 | sub_class_name = msg_name 368 | idx += 1 369 | continue 370 | 371 | if "x" in msg_type: 372 | idx += 1 373 | continue 374 | 375 | msg_type = msg_type.replace("-", "") 376 | if len(msg_type) < 8 and prefix: 377 | prefix_cnt = sum(map(lambda x: x == "-", prefix)) 378 | if len(msg_type) != prefix_cnt: 379 | idx += 1 380 | continue 381 | 382 | msg_type = prefix[: (8 - len(msg_type))] + msg_type 383 | msg_type = msg_type.replace("x", "0") 384 | 385 | # print(msg_type, prefix, msg_name) 386 | # if len(msg_type) != 8: 387 | # msg_type = msg_type.rjust(8, '0') 388 | # import pdb; pdb.set_trace() 389 | # idx += 1 390 | # continue 391 | 392 | assert msg_name not in msgs 393 | 394 | if prefix: 395 | msgs[msg_name] = [msg_type, sub_class_name] 396 | else: 397 | msgs[msg_name] = [msg_type, class_name] 398 | idx += 1 399 | 400 | return msgs 401 | 402 | 403 | def handle_exception_24011(msgs): 404 | """ 405 | # ts 4.011, RP messages use 406 | RP messages are included in CP messages 407 | 0 0 0 ms -> n RP-DATA 408 | 0 0 1 n -> ms RP-DATA 409 | 0 1 0 ms -> n RP-ACK 410 | 0 1 1 n -> ms RP-ACK 411 | 1 0 0 ms -> n RP-ERROR 412 | 1 0 1 n -> ms RP-ERROR 413 | 1 1 0 ms -> n RP-SMMA 414 | """ 415 | 416 | cp_class_name = "short message and notification transfer on CM" 417 | rp_class_name = "short message and notification transfer on CM-RP messages" 418 | types = { 419 | # 'cp-data' [1, cp_Class_name] is embedding below messages 420 | "rp-data": [0, rp_class_name], 421 | "rp-data": [1, rp_class_name], 422 | "rp-ack": [2, rp_class_name], 423 | "rp-ack": [3, rp_class_name], 424 | "rp-error": [4, rp_class_name], 425 | "rp-error": [5, rp_class_name], 426 | "rp-smma": [6, rp_class_name], 427 | "cp-ack": [4, 
cp_class_name], 428 | "cp-error": [16, cp_class_name], 429 | } 430 | 431 | return msgs, types 432 | 433 | 434 | # complementary parser 435 | def parse(input_fname, input_fname2=""): 436 | txt_name = convert_txt(input_fname) 437 | lines = read_data(txt_name) 438 | msgs = parse_msg_content(lines) 439 | types = parse_msg_type(lines, ".pdf" in input_fname) 440 | 441 | # complementary step 442 | if input_fname2: 443 | # analyze pdf to extract types 444 | txt_name = convert_txt(input_fname2) 445 | lines = read_data(txt_name) 446 | # msgs2 = parse_msg_content(lines) 447 | types2 = parse_msg_type(lines, ".pdf" in input_fname2) 448 | 449 | for key, val in types2.items(): 450 | if key not in types: 451 | types[key] = val 452 | 453 | if "24011" in input_fname or "24.011" in input_fname: 454 | msgs, types = handle_exception_24011(msgs) 455 | 456 | total = defaultdict(list) 457 | for msg_name, (msg_type, class_name) in types.items(): 458 | orig_name = msg_name 459 | msg_name = msg_name.lower() 460 | if "reserved" in msg_name: 461 | continue 462 | 463 | if msg_name not in msgs: 464 | continue 465 | 466 | if isinstance(msg_type, str): 467 | msg_type = int(msg_type, 2) 468 | total[class_name].append([msg_type, orig_name, msgs[msg_name]]) 469 | 470 | return total 471 | 472 | 473 | def parse_all(): 474 | path = os.path.abspath(os.path.dirname(__file__)) 475 | file_list = [ 476 | ["24008-f80.doc", "ts_124008v150800p.pdf"], 477 | ["24011-f30.doc", ""], 478 | ["24080-f10.doc", ""], 479 | ["24301-f80.doc", "ts_124301v150800p.pdf"], 480 | ["44018-f50.doc", ""], 481 | ] 482 | 483 | total = {} 484 | for f1, f2 in file_list: 485 | f1 = os.path.join(path, "spec", f1) 486 | if f2: 487 | f2 = os.path.join(path, "spec", f2) 488 | msgs = parse(f1, f2) 489 | for key, val in msgs.items(): 490 | total[key] = val 491 | 492 | return total 493 | 494 | 495 | def get_spec_msgs(): 496 | msgs = parse_all() 497 | # ============================= 498 | # Message types 499 | # ============================= 
500 | nas_prots = { 501 | "EPS session management": 2, 502 | "Call Control and call related SS messages": 3, 503 | "GTTP messages": 4, 504 | "Mobility Management": 5, 505 | "Radio Resource management": 6, 506 | "EPS mobility management": 7, 507 | "GPRS mobility management": 8, 508 | "short message and notification transfer": 9, 509 | "GPRS session management": 10, 510 | "Miscellaneous message group": 11, 511 | "Clearing messages": 11, 512 | } 513 | 514 | spec_map = {} 515 | for nas_name, nas_type in nas_prots.items(): 516 | spec_map[nas_type] = {} 517 | 518 | for class_name, vals in sorted(msgs.items()): 519 | if "-" in class_name: 520 | class_name, sub_class_name = class_name.split("-") 521 | else: 522 | sub_class_name = "" 523 | 524 | target_nas_type = None 525 | for nas_name, nas_type in nas_prots.items(): 526 | if nas_name in class_name: 527 | target_nas_type = nas_type 528 | break 529 | 530 | assert target_nas_type is not None 531 | 532 | for msg_type, msg_name, ie_list in vals: 533 | if msg_type not in spec_map[target_nas_type]: 534 | spec_map[target_nas_type][msg_type] = [] 535 | spec_map[target_nas_type][msg_type].append( 536 | [class_name, sub_class_name, msg_name, ie_list] 537 | ) 538 | 539 | return spec_map 540 | 541 | 542 | def main(): 543 | if len(sys.argv) < 2: 544 | msgs = parse_all() 545 | 546 | elif len(sys.argv) == 2: 547 | input_fname = sys.argv[1] 548 | msgs = parse(input_fname) 549 | 550 | elif len(sys.argv) > 2: 551 | input_fname = sys.argv[1] 552 | input_fname2 = sys.argv[2] 553 | msgs = parse(input_fname, input_fname2) 554 | 555 | for class_name, vals in sorted(msgs.items()): 556 | for msg_type, msg_name, msgs2 in sorted(vals): 557 | for direction, ie_list in msgs2: 558 | for ie in ie_list: 559 | ie_name, iei, ref, presence, ie_format, length = ie 560 | if direction == "UL": 561 | print( 562 | class_name, 563 | "->", 564 | msg_name, 565 | "(UL) ->", 566 | ie_name, 567 | ie_format, 568 | length, 569 | ) 570 | elif direction == "DL": 571 | 
print( 572 | class_name, 573 | "->", 574 | msg_name, 575 | "(DL) ->", 576 | ie_name, 577 | ie_format, 578 | length, 579 | ) 580 | elif direction == "both": 581 | print( 582 | class_name, 583 | "->", 584 | msg_name, 585 | "(Both) ->", 586 | ie_name, 587 | ie_format, 588 | length, 589 | ) 590 | else: 591 | print( 592 | class_name, 593 | "->", 594 | msg_name, 595 | "() ->", 596 | ie_name, 597 | ie_format, 598 | length, 599 | ) 600 | assert False 601 | 602 | 603 | if __name__ == "__main__": 604 | main() 605 | -------------------------------------------------------------------------------- /basespec/preprocess.py: -------------------------------------------------------------------------------- 1 | import time 2 | import pickle 3 | 4 | import idc 5 | import idaapi 6 | import idautils 7 | 8 | import ida_funcs 9 | import ida_auto 10 | import ida_bytes 11 | import ida_offset 12 | import ida_segment 13 | 14 | from idautils import XrefsTo 15 | 16 | from .utils import get_string, check_string, create_string 17 | from .utils import check_funcname, set_funcname 18 | 19 | 20 | def next_addr_aligned(ea, align=2, end_ea=idc.BADADDR): 21 | ea = ida_bytes.next_inited(ea, end_ea) 22 | while ea % align != 0: 23 | ea = ida_bytes.next_inited(ea, end_ea) 24 | 25 | return ea 26 | 27 | 28 | def prev_addr_aligned(ea, align=2, end_ea=0): 29 | ea = ida_bytes.prev_inited(ea, 0) 30 | while ea % align != 0: 31 | ea = ida_bytes.prev_inited(ea, 0) 32 | 33 | return ea 34 | 35 | 36 | def is_assigned(ea): 37 | func_end = idc.get_func_attr(ea, idc.FUNCATTR_END) 38 | # check if function is already assigned 39 | if func_end != idc.BADADDR: 40 | # sometimes, ida returns wrong addr 41 | if func_end > ea: 42 | ea = func_end 43 | else: 44 | ea = next_addr_aligned(ea, 4) 45 | 46 | return True, ea 47 | 48 | # check if string is already assigned 49 | if idc.get_str_type(ea) is not None: 50 | item_end = idc.get_item_end(ea) 51 | # sometimes, ida returns wrong addr 52 | if item_end > ea: 53 | ea = item_end 54 | 
else: 55 | ea = next_addr_aligned(ea, 4) 56 | 57 | return True, ea 58 | 59 | return False, ea 60 | 61 | 62 | string_initialized = False 63 | 64 | 65 | def init_strings(force=False, target_seg_name=None): 66 | global string_initialized 67 | if not force and string_initialized: 68 | return 69 | 70 | for ea in idautils.Segments(): 71 | seg = ida_segment.getseg(ea) 72 | seg_name = ida_segment.get_segm_name(seg) 73 | 74 | # We only check the target segment since a full scan may take too much time. 75 | if target_seg_name and seg_name != target_seg_name: 76 | continue 77 | 78 | print("Initializing %x -> %x (%s)" % (seg.start_ea, seg.end_ea, seg_name)) 79 | 80 | # TODO: we may use other strategies to find string pointers 81 | analyze_str_ptr(seg.start_ea, seg.end_ea) 82 | 83 | analyze_ida_str() 84 | string_initialized = True 85 | 86 | 87 | def analyze_str_ptr(start_ea, end_ea): 88 | str_cnt = 0 89 | start_time = time.time() 90 | 91 | # First, find string references. Data referenced by a pointer is highly 92 | # likely to be a string. 93 | ea = start_ea 94 | while ea != idc.BADADDR and ea < end_ea: 95 | status, ea = is_assigned(ea) 96 | if status: 97 | continue 98 | 99 | str_ptr = ida_bytes.get_dword(ea) 100 | if idc.get_str_type(str_ptr) is not None or check_string(str_ptr, 8): 101 | # even already assigned strings may not have a reference. 102 | ida_offset.op_offset(ea, 0, idc.REF_OFF32) 103 | 104 | if idc.get_str_type(str_ptr) is None: 105 | create_string(str_ptr) 106 | 107 | str_cnt += 1 108 | if str_cnt % 10000 == 0: 109 | print( 110 | "%x: %d strings have been found. (%0.3f secs)" 111 | % (ea, str_cnt, time.time() - start_time) 112 | ) 113 | 114 | ea = next_addr_aligned(ea, 4) 115 | 116 | print("Created %d strings.
(%0.3f secs)" % (str_cnt, time.time() - start_time)) 117 | 118 | 119 | def analyze_ida_str(): 120 | global all_str 121 | if "all_str" not in globals(): 122 | all_str = idautils.Strings() 123 | 124 | str_cnt = 0 125 | start_time = time.time() 126 | 127 | for s in all_str: 128 | # check if an already assigned function or string exists 129 | if any( 130 | ida_funcs.get_fchunk(ea) or (idc.get_str_type(ea) is not None) 131 | for ea in (s.ea, s.ea + s.length) 132 | ): 133 | continue 134 | 135 | if check_string(s.ea, 30) and create_string(s.ea): 136 | str_cnt += 1 137 | if str_cnt % 1000 == 0: 138 | print( 139 | "%x: %d strings have been found. (%0.3f secs)" 140 | % (s.ea, str_cnt, time.time() - start_time) 141 | ) 142 | 143 | print("Created %d strings. (%0.3f secs)" % (str_cnt, time.time() - start_time)) 144 | 145 | 146 | NONE = 0 147 | ARM = 1 148 | THUMB = 2 149 | 150 | 151 | def is_valid_reglist(word, is_16bit=False): 152 | if is_16bit: 153 | word = word & 0x1FF 154 | # should contain LR 155 | lr = word >> 8 & 1 156 | if lr != 1: 157 | return False 158 | 159 | regs = word 160 | 161 | else: 162 | # should contain LR 163 | # should not contain SP, PC 164 | sp = word >> 13 & 1 165 | lr = word >> 14 & 1 166 | pc = word >> 15 & 1 167 | if sp != 0 or lr != 1 or pc != 0: 168 | return False 169 | 170 | regs = word & 0x1FFF 171 | 172 | # At least one reg should exist 173 | if regs == 0: 174 | return False 175 | 176 | # We compute the maximum number of consecutive 1s to find a contiguous 177 | # reg-list like '0001111100000'. 178 | if is_16bit: 179 | threshold = 0 180 | else: 181 | threshold = 1 182 | 183 | cnt = 0 184 | while regs != 0: 185 | regs = regs & (regs << 1) 186 | cnt += 1 187 | 188 | if cnt > threshold: 189 | return True 190 | else: 191 | return False 192 | 193 | 194 | # This function only checks the PUSH instruction. 195 | # TODO: properly find function prolog (e.g., using ML?)
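The register-list check in `is_valid_reglist` relies on a bit trick: AND-ing a value with itself shifted left by one shortens every run of 1 bits by one, so the number of iterations until the value reaches zero equals the length of the longest run. The sketch below isolates that trick and decodes a 16-bit Thumb `PUSH` register list. `longest_run_of_ones` and `decode_thumb_push` are hypothetical helper names used only for illustration.

```python
# Hypothetical helpers for illustration only; not part of BaseSpec.
def longest_run_of_ones(value):
    """Return the length of the longest run of consecutive 1 bits."""
    run = 0
    while value:
        value &= value << 1  # every run of 1s loses one bit per iteration
        run += 1
    return run


def decode_thumb_push(halfword):
    """Decode a 16-bit Thumb PUSH (0xB4xx / 0xB5xx) into register names.

    Bits 0-7 select r0-r7 and bit 8 selects LR. Returns None when the
    halfword is not a 16-bit PUSH encoding.
    """
    if halfword >> 9 != 0b1011010:
        return None
    regs = ["r%d" % i for i in range(8) if halfword >> i & 1]
    if halfword >> 8 & 1:
        regs.append("lr")
    return regs


if __name__ == "__main__":
    print(longest_run_of_ones(0b0001111100000))
    print(decode_thumb_push(0xB5F0))
```

With a threshold of 1, as in the 32-bit branch above, a register list is accepted only when at least two adjacent registers among r0-r12 are pushed together with LR.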
196 | def is_func_prolog(ea, reg_check=True): 197 | if ea % 2 != 0: 198 | return NONE 199 | 200 | # function prolog requires at least four bytes 201 | if any(not ida_bytes.is_mapped(ea + i) for i in range(4)): 202 | return NONE 203 | 204 | word = ida_bytes.get_word(ea) 205 | next_word = ida_bytes.get_word(ea + 2) 206 | 207 | # check thumb PUSH.W 208 | if word == 0xE92D: 209 | # PUSH LR 210 | if not reg_check or is_valid_reglist(next_word, False): 211 | return THUMB 212 | 213 | # check thumb PUSH 214 | elif (word >> 8) == 0xB5: 215 | if not reg_check or is_valid_reglist(word, True): 216 | return THUMB 217 | 218 | # check arm PUSH 219 | elif next_word == 0xE92D: 220 | if ea % 4 == 0: 221 | # PUSH LR 222 | if not reg_check or is_valid_reglist(word, False): 223 | return ARM 224 | 225 | return NONE 226 | 227 | 228 | # This function finds function candidates. Currently, we only find the 229 | # candidates by prolog. 230 | def find_prev_func_cand(ea, end_ea=idc.BADADDR): 231 | while ea < end_ea and ea != idc.BADADDR: 232 | mode = is_func_prolog(ea, reg_check=False) 233 | if mode != NONE: 234 | # check if the function is already assigned. If the current function 235 | # has already been found, we cannot just set ea to the function end 236 | # since IDA may fail to find the function end correctly. 237 | if idc.get_func_attr(ea, idc.FUNCATTR_START) == idc.BADADDR: 238 | return ea, mode 239 | 240 | ea = prev_addr_aligned(ea) 241 | 242 | return idc.BADADDR, NONE 243 | 244 | 245 | # This function finds function candidates. Currently, we only find the 246 | # candidates by prolog. 247 | def find_next_func_cand(ea, end_ea=idc.BADADDR): 248 | while ea < end_ea and ea != idc.BADADDR: 249 | mode = is_func_prolog(ea) 250 | if mode != NONE: 251 | # check if the function is already assigned. If the current function 252 | # has already been found, we cannot just set ea to the function end 253 | # since IDA may fail to find the function end correctly.
254 | if idc.get_func_attr(ea, idc.FUNCATTR_START) == idc.BADADDR: 255 | return ea, mode 256 | 257 | ea = next_addr_aligned(ea) 258 | 259 | return idc.BADADDR, NONE 260 | 261 | 262 | def fix_func_prolog(ea, end_ea=idc.BADADDR): 263 | global FUNC_BY_LS 264 | 265 | func_cnt = 0 266 | func = ida_funcs.get_fchunk(ea) 267 | if func is None: 268 | func = ida_funcs.get_next_func(ea) 269 | ea = func.start_ea 270 | 271 | while func is not None and ea < end_ea: 272 | # if current function is small enough and there exists a function right 273 | # next to current function 274 | if ( 275 | func.size() <= 8 276 | and idc.get_func_attr(func.end_ea, idc.FUNCATTR_START) != idc.BADADDR 277 | ): 278 | # If the next function can be connected, there must be a basic block reference. 279 | # xref.type == 21 means 'fl_F', which is an ordinary flow. 280 | if all( 281 | (func.start_ea <= xref.frm < func.end_ea) and xref.type == 21 282 | for xref in XrefsTo(func.end_ea) 283 | ): 284 | if func_cnt > 0 and func_cnt % 1000 == 0: 285 | print( 286 | "%x <- %x: prolog merging (%d)." 
287 | % (func.start_ea, func.end_ea, func_cnt) 288 | ) 289 | ida_bytes.del_items(func.end_ea, ida_bytes.DELIT_EXPAND) 290 | ida_bytes.del_items(func.start_ea, ida_bytes.DELIT_EXPAND) 291 | ida_auto.auto_wait() 292 | 293 | status = idc.add_func(func.start_ea) 294 | if not status: 295 | print("Error merging 0x%x <- 0x%x" % (func.start_ea, func.end_ea)) 296 | else: 297 | func_cnt += 1 298 | FUNC_BY_LS.discard(func.end_ea) 299 | ida_auto.auto_wait() 300 | 301 | func = ida_funcs.get_next_func(ea) 302 | if func: 303 | ea = func.start_ea 304 | 305 | print("Fixed %d functions" % func_cnt) 306 | 307 | 308 | FUNC_BY_LS = set() 309 | FUNC_BY_LS_TIME = None 310 | 311 | 312 | def analyze_linear_sweep(start_ea, end_ea=idc.BADADDR): 313 | global FUNC_BY_LS, FUNC_BY_LS_TIME 314 | if "FUNC_BY_LS" not in globals() or len(FUNC_BY_LS) == 0: 315 | FUNC_BY_LS = set() 316 | 317 | cand_cnt = 0 318 | func_cnt = 0 319 | ea = start_ea 320 | start_time = time.time() 321 | while ea < end_ea and ea != idc.BADADDR: 322 | ea, mode = find_next_func_cand(ea, end_ea) 323 | if ea == idc.BADADDR: 324 | break 325 | 326 | cand_cnt += 1 327 | if cand_cnt % 10000 == 0: 328 | print( 329 | "%x: %d/%d functions have been found (%d secs)" 330 | % (ea, func_cnt, cand_cnt, time.time() - start_time) 331 | ) 332 | 333 | # set the IDA segment register "T" to select ARM or Thumb mode 334 | old_flag = idc.get_sreg(ea, "T") 335 | if mode == THUMB: 336 | idc.split_sreg_range(ea, "T", 1, idc.SR_user) 337 | elif mode == ARM: 338 | idc.split_sreg_range(ea, "T", 0, idc.SR_user) 339 | else: 340 | print("Unknown mode") 341 | raise NotImplementedError 342 | 343 | # add_func ignores the existing function, but existing functions are 344 | # already filtered out when finding the candidate 345 | status = idc.add_func(ea) 346 | if status: 347 | func_cnt += 1 348 | FUNC_BY_LS.add(ea) 349 | 350 | # Wait for IDA's auto analysis 351 | ida_auto.auto_wait() 352 | 353 | # even though add_func succeeds, the result may not be correct. 354 | # TODO: how to check the correctness?
we may check the function end? 355 | func_end = idc.get_func_attr(ea, idc.FUNCATTR_END) 356 | if func_end > ea: 357 | ea = func_end 358 | else: 359 | # sometimes, ida returns wrong addr 360 | ea = next_addr_aligned(ea) 361 | 362 | else: 363 | if idc.get_func_attr(ea, idc.FUNCATTR_START) == idc.BADADDR: 364 | # IDA automatically make code, and this remains even though 365 | # add_func fails. 366 | ida_bytes.del_items(ea, ida_bytes.DELIT_EXPAND) 367 | 368 | # reset IDA segment register to previous ARM mode 369 | idc.split_sreg_range(ea, "T", old_flag, idc.SR_user) 370 | 371 | # Wait IDA's auto analysis 372 | ida_auto.auto_wait() 373 | 374 | ea = next_addr_aligned(ea) 375 | 376 | # linear sweep may choose wrong prologs. We merge the prologs of two 377 | # adjacent functions. 378 | if func_cnt > 0: 379 | fix_func_prolog(start_ea, end_ea) 380 | 381 | FUNC_BY_LS_TIME = time.time() - start_time 382 | print( 383 | "Found %d/%d functions. (%d sec)" % (len(FUNC_BY_LS), cand_cnt, FUNC_BY_LS_TIME) 384 | ) 385 | 386 | 387 | # TODO: find other functions by other pointer analysis 388 | # Please check analyze_func_ptrs function at the below 389 | FUNC_BY_PTR = set() 390 | FUNC_BY_PTR_TIME = None 391 | 392 | 393 | def analyze_func_ptr(start_ea, end_ea): 394 | global FUNC_BY_PTR, FUNC_BY_PTR_TIME 395 | if "FUNC_BY_PTR" not in globals() or len(FUNC_BY_PTR) == 0: 396 | FUNC_BY_PTR = set() 397 | 398 | ea = start_ea 399 | func_cnt = 0 400 | name_cnt = 0 401 | start_time = time.time() 402 | 403 | while ea != idc.BADADDR and ea <= end_ea: 404 | status, ea = is_assigned(ea) 405 | if status: 406 | continue 407 | 408 | # now check function pointer 409 | func_ptr = ida_bytes.get_dword(ea) 410 | 411 | # TODO: skip other segments that are not code. 412 | 413 | # for those already assigned functions, we need to check the segment range. 
414 | if not (start_ea <= func_ptr < end_ea): 415 | ea = next_addr_aligned(ea, 4) 416 | continue 417 | 418 | # we only target thumb function to reduce false positives 419 | if func_ptr & 1 == 0: 420 | ea = next_addr_aligned(ea, 4) 421 | continue 422 | 423 | func_ptr = func_ptr - 1 424 | func_start = idc.get_func_attr(func_ptr, idc.FUNCATTR_START) 425 | if func_start != idc.BADADDR and func_start != func_ptr: 426 | # this is not a proper function pointer 427 | ea = next_addr_aligned(ea, 4) 428 | continue 429 | 430 | # new thumb function has been found! 431 | if func_start == idc.BADADDR: 432 | old_flag = idc.get_sreg(func_ptr, "T") 433 | idc.split_sreg_range(func_ptr, "T", 1, idc.SR_user) 434 | status = idc.add_func(func_ptr) 435 | if not status: 436 | # IDA automatically make code, and this remains even 437 | # though add_func fails. 438 | ida_bytes.del_items(func_ptr, ida_bytes.DELIT_EXPAND) 439 | idc.split_sreg_range(func_ptr, "T", old_flag, idc.SR_user) 440 | 441 | ea = next_addr_aligned(ea, 4) 442 | continue 443 | 444 | func_cnt += 1 445 | FUNC_BY_PTR.add(ea) 446 | if func_cnt % 10000 == 0: 447 | print( 448 | "%x: %d functions has been found. (%0.3f secs)" 449 | % (ea, func_cnt, time.time() - start_time) 450 | ) 451 | 452 | # If we find a function, we try to assign a name. The name may be 453 | # derived by C++ structure. 454 | if analyze_funcname(ea, func_ptr): 455 | name_cnt += 1 456 | func_name = idc.get_func_name(func_ptr) 457 | if name_cnt % 10000 == 0: 458 | print( 459 | "%x: %d names has been found. (%0.3f secs)" 460 | % (ea, name_cnt, time.time() - start_time) 461 | ) 462 | # print("%x: %x => %s" % (ea, func_ptr, func_name)) 463 | 464 | ea = next_addr_aligned(ea, 4) 465 | 466 | FUNC_BY_PTR_TIME = time.time() - start_time 467 | print( 468 | "Found %d functions, renamed %d functions (%0.3f secs)" 469 | % (len(FUNC_BY_PTR), name_cnt, FUNC_BY_PTR_TIME) 470 | ) 471 | 472 | 473 | # TODO: find the cause of the remaining function names. 
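The pointer scan in `analyze_func_ptr` keeps only 32-bit values that fall inside the scanned range and have the low bit set, which is how a Thumb entry point is encoded in an interworking pointer. A rough, IDA-free sketch of that filter follows. `thumb_pointer_candidates` is a hypothetical helper, and little-endian byte order is an assumption about the firmware image.

```python
# Hypothetical helper for illustration only; not part of BaseSpec.
import struct


def thumb_pointer_candidates(blob, base, code_start, code_end):
    """Return (pointer_address, function_address) pairs found in blob.

    blob is scanned as 4-byte aligned little-endian words (an assumption).
    A word is kept when it lies in [code_start, code_end) and is odd; the
    function address is the word with the Thumb bit cleared.
    """
    found = []
    for off in range(0, len(blob) - 3, 4):
        (value,) = struct.unpack_from("<I", blob, off)
        if not (code_start <= value < code_end):
            continue
        if value & 1 == 0:
            continue  # even targets are skipped to reduce false positives
        found.append((base + off, value - 1))
    return found


if __name__ == "__main__":
    data = struct.pack("<III", 0x1001, 0x1000, 0x5001)
    for ptr_ea, func_ea in thumb_pointer_candidates(data, 0x2000, 0x1000, 0x2000):
        print("0x%x -> 0x%x" % (ptr_ea, func_ea))
```

Unlike the function above, this sketch does no validation against existing functions; it only shows the range and low-bit filter.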
474 | def analyze_funcname(ea, func_ptr): 475 | # check at least 10 items. 476 | name_pptr = find_funcname_ptr(ea, 10) 477 | if not name_pptr: 478 | return False 479 | 480 | ida_offset.op_offset(ea, 0, idc.REF_OFF32) 481 | ida_offset.op_offset(name_pptr, 0, idc.REF_OFF32) 482 | 483 | func_name = idc.get_func_name(func_ptr) 484 | if not func_name.startswith("sub_"): 485 | # a function name has already been assigned 486 | return False 487 | 488 | name_ptr = ida_bytes.get_dword(name_pptr) 489 | func_name = get_string(name_ptr) 490 | if isinstance(func_name, bytes): 491 | func_name = func_name.decode() 492 | 493 | set_funcname(func_ptr, func_name) 494 | 495 | return True 496 | 497 | 498 | # this function returns the first name pointer 499 | def find_funcname_ptr(ea, n): 500 | for i in range(4, n * 4, 4): 501 | name_ptr = ida_bytes.get_dword(ea + i) 502 | # this might be a function, so stop and let the caller proceed to the next check 503 | if name_ptr & 1 == 1: 504 | return 505 | elif check_funcname(name_ptr): 506 | return ea + i 507 | return 508 | 509 | 510 | func_initialized = False 511 | 512 | 513 | def init_functions(force=False, target_seg_name=None): 514 | global func_initialized 515 | if not force and func_initialized: 516 | return 517 | 518 | # Linear sweep to find functions 519 | for ea in idautils.Segments(): 520 | # TODO: skip other segments that are not code. 521 | seg = ida_segment.getseg(ea) 522 | seg_name = ida_segment.get_segm_name(seg) 523 | 524 | # We only check the target segment since a full scan may take too much time. 525 | if target_seg_name and seg_name != target_seg_name: 526 | continue 527 | 528 | # TODO: we may use other strategies, not just a linear sweep. 529 | print( 530 | "Linear sweep analysis: %x -> %x (%s)" 531 | % (seg.start_ea, seg.end_ea, seg_name) 532 | ) 533 | analyze_linear_sweep(seg.start_ea, seg.end_ea) 534 | 535 | # Find function pointer candidates 536 | for ea in idautils.Segments(): 537 | # TODO: skip other segments that are not code.
538 | seg = ida_segment.getseg(ea) 539 | seg_name = ida_segment.get_segm_name(seg) 540 | 541 | # We only check the target segment since a full scan may take too much time. 542 | if target_seg_name and seg_name != target_seg_name: 543 | continue 544 | 545 | # Analyze functions by pointers 546 | print( 547 | "Function pointer analysis: %x -> %x (%s)" 548 | % (seg.start_ea, seg.end_ea, seg_name) 549 | ) 550 | analyze_func_ptr(seg.start_ea, seg.end_ea) 551 | 552 | func_initialized = True 553 | -------------------------------------------------------------------------------- /basespec/scatterload.py: -------------------------------------------------------------------------------- 1 | import idc 2 | import ida_bytes 3 | import ida_segment 4 | import ida_search 5 | import ida_offset 6 | import ida_ua 7 | import ida_auto 8 | 9 | from idautils import XrefsTo 10 | 11 | from .utils import set_entry_name 12 | from .slicer import find_args 13 | 14 | # This is the main function 15 | # It finds scatterload-related information and performs scatterloading 16 | def run_scatterload(debug=False): 17 | # A newly identified region may contain an additional scatterload procedure. Thus, 18 | # we repeat the process until no changes are left.
19 | is_changed = True 20 | while is_changed: 21 | is_changed = False 22 | tables = find_scatter_table() 23 | scatter_funcs = find_scatter_funcs() 24 | 25 | for start, end in tables.items(): 26 | print("Processing table: 0x%x to 0x%x" % (start, end)) 27 | while start < end: 28 | ida_bytes.create_dword(start, 16) 29 | ida_offset.op_offset(start, 0, idc.REF_OFF32) 30 | src = ida_bytes.get_dword(start) 31 | dst = ida_bytes.get_dword(start + 4) 32 | size = ida_bytes.get_dword(start + 8) 33 | how = ida_bytes.get_dword(start + 12) 34 | 35 | if how not in scatter_funcs: 36 | print("%x: no addr 0x%x in scatter_funcs" % (start, how)) 37 | start += 16 38 | continue 39 | 40 | func_name = scatter_funcs[how] 41 | start += 16 42 | print("%s: 0x%x -> 0x%x (0x%x bytes)" % (func_name, src, dst, size)) 43 | 44 | if func_name != "__scatterload_zeroinit": 45 | if not idc.is_loaded(src) or size == 0: 46 | print("0x%x is not loaded." % (src)) 47 | continue 48 | 49 | if debug: 50 | # only show information above 51 | continue 52 | 53 | if func_name == "__scatterload_copy": 54 | if add_segment(dst, size, "CODE"): 55 | memcpy(src, dst, size) 56 | is_changed = True 57 | elif func_name == "__scatterload_decompress": 58 | if add_segment(dst, size, "DATA"): 59 | decomp(src, dst, size) 60 | is_changed = True 61 | # some old firmware images have this. 62 | elif func_name == "__scatterload_decompress2": 63 | if add_segment(dst, size, "DATA"): 64 | decomp2(src, dst, size) 65 | is_changed = True 66 | elif func_name == "__scatterload_zeroinit": 67 | # No need to further proceed for zero init. 
68 | if add_segment(dst, size, "DATA"): 69 | memclr(dst, size) 70 | 71 | ida_auto.auto_wait() 72 | 73 | 74 | def add_segment(ea, size, seg_class, debug=False): 75 | # align page size 76 | ea = ea & 0xFFFFF000 77 | end_ea = ea + size 78 | is_changed = False 79 | if ea == 0: 80 | return False 81 | while ea < end_ea: 82 | cur_seg = ida_segment.getseg(ea) 83 | next_seg = ida_segment.get_next_seg(ea) 84 | 85 | if debug: 86 | print("=" * 30) 87 | if cur_seg: 88 | print("cur_seg: %x - %x" % (cur_seg.start_ea, cur_seg.end_ea)) 89 | if next_seg: 90 | print("next_seg: %x - %x" % (next_seg.start_ea, next_seg.end_ea)) 91 | print("new_seg: %x - %x" % (ea, end_ea)) 92 | 93 | # if there is no segment, so create new segment 94 | if not cur_seg: 95 | if not next_seg: 96 | ida_segment.add_segm(0, ea, end_ea, "", seg_class) 97 | is_changed = True 98 | break 99 | 100 | # if next_seg exists 101 | if end_ea <= next_seg.start_ea: 102 | ida_segment.add_segm(0, ea, end_ea, "", seg_class) 103 | is_changed = True 104 | break 105 | 106 | # end_ea > next_seg.start_ea, need to create more segments 107 | ida_segment.add_segm(0, ea, next_seg.start_ea, "", seg_class) 108 | 109 | # if segment already exists, we extend current segment 110 | else: 111 | if end_ea <= cur_seg.end_ea: 112 | break 113 | 114 | if not next_seg: 115 | ida_segment.set_segm_end(ea, end_ea, 0) 116 | ida_segment.set_segm_class(cur_seg, seg_class) 117 | is_changed = True 118 | break 119 | 120 | # if next_seg exists 121 | if end_ea <= next_seg.start_ea: 122 | ida_segment.set_segm_end(ea, end_ea, 0) 123 | ida_segment.set_segm_class(cur_seg, seg_class) 124 | is_changed = True 125 | break 126 | 127 | # end_ea > next_seg.start_ea, need to create more segments 128 | if cur_seg.end_ea < next_seg.start_ea: 129 | ida_segment.set_segm_end(ea, next_seg.start_ea, 0) 130 | ida_segment.set_segm_class(cur_seg, seg_class) 131 | is_changed = True 132 | 133 | ea = next_seg.start_ea 134 | 135 | return is_changed 136 | 137 | 138 | # TODO: search 
only newly created segments. 139 | def create_func_by_prefix(func_name, prefix, force=False): 140 | addrs = [] 141 | start_addr = 0 142 | func_addr = 0 143 | while func_addr != idc.BADADDR: 144 | func_addr = ida_search.find_binary( 145 | start_addr, idc.BADADDR, prefix, 16, idc.SEARCH_DOWN 146 | ) 147 | if func_addr == idc.BADADDR: 148 | break 149 | 150 | # already existing function but it is not the right prefix 151 | addr = idc.get_func_attr(func_addr, idc.FUNCATTR_START) 152 | if addr != idc.BADADDR and func_addr != addr: 153 | if not force: 154 | start_addr = func_addr + 4 155 | continue 156 | 157 | idc.del_func(addr) 158 | idc.del_items(func_addr) 159 | 160 | # add_func is not applied to the existing function 161 | idc.add_func(func_addr) 162 | 163 | func_name = set_entry_name(func_addr, func_name) 164 | print("%s: 0x%x" % (func_name, func_addr)) 165 | 166 | addrs.append(func_addr) 167 | start_addr = func_addr + 4 168 | 169 | return addrs 170 | 171 | 172 | def find_scatter_funcs(): 173 | scatter_func_bytes = { 174 | "__scatterload": [ 175 | "2C 00 8F E2 00 0C 90 E8 00 A0 8A E0 00 B0 8B E0", # For 5G 176 | "0A A0 90 E8 00 0C 82 44", 177 | ], 178 | "__scatterload_copy": [ 179 | "10 20 52 E2 78 00 B0 28", # For 5G 180 | "10 3A 24 BF 78 C8 78 C1 FA D8 52 07", 181 | ], 182 | "__scatterload_decompress": [ 183 | "02 20 81 E0 00 C0 A0 E3 01 30 D0 E4", # For 5G 184 | "0A 44 10 F8 01 4B 14 F0 0F 05 08 BF 10 F8 01 5B", 185 | "0A 44 4F F0 00 0C 10 F8 01 3B 13 F0 07 04 08 BF", 186 | ], 187 | "__scatterload_decompress2": [ 188 | "10 F8 01 3B 0A 44 13 F0 03 04 08 BF 10 F8 01 4B", 189 | ], 190 | "__scatterload_zeroinit": [ 191 | "00 30 B0 E3 00 40 B0 E3 00 50 B0 E3 00 60 B0 E3", # For 5G 192 | "00 23 00 24 00 25 00 26 10 3A 28 BF 78 C1 FB D8", 193 | ], 194 | } 195 | 196 | funcs = {} 197 | for name, prefixes in scatter_func_bytes.items(): 198 | for prefix in prefixes: 199 | addrs = create_func_by_prefix(name, prefix, force=True) 200 | for addr in addrs: 201 | if addr != 
idc.BADADDR: 202 | funcs[addr] = name 203 | 204 | return funcs 205 | 206 | 207 | def find_scatter_table(): 208 | scatter_load_bytes = { 209 | "__scatterload": [ 210 | "0A A0 90 E8 00 0C 82 44", 211 | "2C 00 8F E2 00 0C 90 E8 00 A0 8A E0 00 B0 8B E0", # For 5G 212 | ], 213 | } 214 | 215 | tables = {} 216 | for name, prefixes in scatter_load_bytes.items(): 217 | for prefix in prefixes: 218 | addrs = create_func_by_prefix(name, prefix, force=True) 219 | for addr in addrs: 220 | if addr == idc.BADADDR: 221 | continue 222 | 223 | offset_addr = idc.get_operand_value(addr, 1) 224 | if offset_addr == -1: 225 | old_flag = idc.get_sreg(addr, "T") 226 | idc.split_sreg_range(addr, "T", not old_flag, idc.SR_user) 227 | offset_addr = idc.get_operand_value(addr, 1) 228 | 229 | offset = ida_bytes.get_dword(offset_addr) 230 | offset2 = ida_bytes.get_dword(offset_addr + 4) 231 | start = (offset + offset_addr) & 0xFFFFFFFF 232 | end = (offset2 + offset_addr) & 0xFFFFFFFF 233 | if not idc.is_loaded(start): 234 | continue 235 | 236 | tables[start] = end 237 | print("__scatter_table: 0x%x -> 0x%x" % (start, end)) 238 | func_name = set_entry_name(start, "__scatter_table") 239 | 240 | return tables 241 | 242 | 243 | def memcpy(src, dst, length): 244 | if length == 0: 245 | return 246 | data = ida_bytes.get_bytes(src, length) 247 | ida_bytes.put_bytes(dst, data) 248 | 249 | 250 | def memclr(dst, length): 251 | if length == 0: 252 | return 253 | data = b"\x00" * length 254 | ida_bytes.put_bytes(dst, data) 255 | 256 | 257 | def decomp(src, dst, length): 258 | # print("decomp 0x%X 0x%X (0x%x)"%(src, dst, length)) 259 | end = dst + length 260 | while True: 261 | meta = ida_bytes.get_byte(src) 262 | src += 1 263 | l = meta & 7 264 | if l == 0: 265 | l = ida_bytes.get_byte(src) 266 | src += 1 267 | l2 = meta >> 4 268 | if l2 == 0: 269 | l2 = ida_bytes.get_byte(src) 270 | src += 1 271 | # print("meta: 0x%x l: 0x%X l2: 0x%x"%(meta,l,l2)) 272 | # copy l byte 273 | memcpy(src, dst, l - 1) 274 | src 
+= l - 1 275 | dst += l - 1 276 | if meta & 8: 277 | off = ida_bytes.get_byte(src) 278 | src += 1 279 | for i in range(l2 + 2): 280 | memcpy(dst - off, dst, 1) 281 | dst += 1 282 | else: 283 | memclr(dst, l2) 284 | dst += l2 285 | if dst >= end: 286 | assert dst == end, "Decompress failed" 287 | # print('decomp end %0x %0x'%(dst, end)) 288 | break 289 | 290 | 291 | def decomp2(src, dst, length): 292 | # print("decomp 0x%X 0x%X (0x%x)"%(src, dst, length)) 293 | meta = ida_bytes.get_byte(src) 294 | src += 1 295 | end = dst + length 296 | while True: 297 | l = meta & 3 298 | if l == 0: 299 | l = ida_bytes.get_byte(src) 300 | src += 1 301 | 302 | l2 = meta >> 4 303 | if l2 == 0: 304 | l2 = ida_bytes.get_byte(src) 305 | src += 1 306 | # print("meta: 0x%x l: 0x%X l2: 0x%x"%(meta,l,l2)) 307 | # copy l byte 308 | memcpy(src, dst, l - 1) 309 | src += l - 1 310 | dst += l - 1 311 | 312 | if l2: 313 | off = ida_bytes.get_byte(src) 314 | src += 1 315 | meta_val = meta & 0xC 316 | src_ptr = dst - off 317 | if meta_val == 12: 318 | meta_val = ida_bytes.get_byte(src) 319 | src += 1 320 | src_ptr -= 256 * meta_val 321 | 322 | else: 323 | src_ptr -= 64 * meta_val 324 | 325 | l2 += 2 326 | memcpy(src_ptr, dst, l2) 327 | dst += l2 328 | 329 | meta = ida_bytes.get_byte(src) 330 | src += 1 331 | 332 | if dst >= end: 333 | assert dst == end, "Decompress failed" 334 | # print('decomp end %0x %0x'%(dst, end)) 335 | break 336 | -------------------------------------------------------------------------------- /basespec/slicer.py: -------------------------------------------------------------------------------- 1 | import idc 2 | import idautils 3 | import idaapi 4 | 5 | import ida_bytes 6 | import ida_ua 7 | import ida_funcs 8 | import ida_idp 9 | import ida_xref 10 | import ida_segment 11 | 12 | import re 13 | 14 | from .utils import is_thumb 15 | 16 | 17 | def get_reg(op): 18 | return ida_idp.get_reg_name(op.reg, 0) 19 | 20 | 21 | def get_regs(ea): 22 | if is_thumb(ea): 23 | # 1 --- 1 (LSB) 
24 | # R7 --- R0 25 | reg_bits = ida_bytes.get_word(ea) & 0x1FF 26 | reg_list = ["R{0}".format(idx) for idx in range(8)] 27 | reg_list.append("LR") 28 | # reg_list.extend(['SP', 'LR']) 29 | # TODO: add 32 bit Thumb handling 30 | else: 31 | # 1 --- 1 (LSB) 32 | # R12 --- R0 33 | reg_bits = ida_bytes.get_word(ea) & 0xFFFF 34 | reg_list = ["R{0}".format(idx) for idx in range(13)] 35 | reg_list.extend(["SP", "LR", "PC"]) 36 | 37 | regs = [] 38 | idx = 0 39 | while reg_bits: 40 | if reg_bits & 0x1: 41 | regs.append(reg_list[idx]) 42 | reg_bits = reg_bits >> 1 43 | idx += 1 44 | 45 | return regs 46 | 47 | 48 | def merge_op_vals(val1, val2, operator): 49 | values = set() 50 | for x in val1: 51 | for y in val2: 52 | values.add(operator(x, y)) 53 | 54 | return values 55 | 56 | 57 | class SimpleForwardSlicer(object): 58 | def __init__(self): 59 | self.visited = set() 60 | self.values = dict() 61 | self.inter = False 62 | self.init() 63 | 64 | def init(self): 65 | self.memory = dict() 66 | self.regs = dict() 67 | self.func_start = None 68 | 69 | # initialize stack 70 | self.regs["SP"] = 0x100000000 71 | 72 | def run(self, start_ea, end_ea=idc.BADADDR, end_cnt=100): 73 | self.init() 74 | 75 | # not in code segment 76 | ea = start_ea 77 | 78 | func_start = idc.get_func_attr(ea, idc.FUNCATTR_START) 79 | if func_start == idc.BADADDR: 80 | return 81 | 82 | self.func_start = func_start 83 | 84 | cnt = 0 85 | while True: 86 | if ea in self.visited: 87 | break 88 | 89 | self.visited.add(ea) 90 | # break if ea is out of the original function 91 | # TODO: Add inter-procedural 92 | if idc.get_func_attr(ea, idc.FUNCATTR_START) != func_start: 93 | break 94 | 95 | if ea == end_ea: 96 | break 97 | 98 | if end_ea == idc.BADADDR and cnt >= end_cnt: 99 | break 100 | 101 | # there may exist data section 102 | mnem = ida_ua.ua_mnem(ea) 103 | if not ida_ua.can_decode(ea) or not mnem: 104 | break 105 | 106 | if mnem.startswith("B"): 107 | ea = idc.get_operand_value(ea, 0) 108 | 109 | elif 
mnem.startswith("POP"): 110 | break 111 | 112 | else: 113 | if not self.run_helper(ea): 114 | # print("%x: something wrong: %s" % (ea, idc.GetDisasm(ea))) 115 | break 116 | 117 | ea = ida_xref.get_first_cref_from(ea) 118 | 119 | cnt += 1 120 | 121 | def fetch_value(self, op1, op2, op3, op4, operator=None): 122 | value = None 123 | value2 = None 124 | value3 = None 125 | value4 = None 126 | 127 | # fetch register value 128 | if op2.type == ida_ua.o_reg: 129 | if get_reg(op2) not in self.regs: 130 | return 131 | 132 | value = self.regs[get_reg(op2)] 133 | 134 | # More than two arguments 135 | if op3 and op3.type != ida_ua.o_void: 136 | # ADD R0, R1, R2 137 | if op3.type == ida_ua.o_reg: 138 | if get_reg(op3) not in self.regs: 139 | return 140 | 141 | value3 = self.regs[get_reg(op3)] 142 | 143 | # immediate value, we get the value right away 144 | # ADD R0, R1, #123 145 | elif op3.type == ida_ua.o_imm: 146 | value3 = op3.value 147 | 148 | # o_idpspec0-5 149 | # ADD R0, R1, R2,LSL#2 150 | # processor specific type 'LSL' 151 | elif op3.type == ida_ua.o_idpspec0: 152 | if get_reg(op3) not in self.regs: 153 | return 154 | 155 | value3 = self.regs[get_reg(op3)] << op3.value 156 | 157 | else: 158 | # TODO: currently not implemented 159 | # print("unknown operand type: %d" % (op3.type)) 160 | # raise NotImplementedError 161 | return 162 | 163 | # Handle arithmetic operator 164 | assert operator is not None 165 | assert value3 is not None 166 | value = operator(value, value3) & 0xFFFFFFFF 167 | 168 | # MLA R0, R1, R2, R3 169 | if op4 and op4.type != ida_ua.o_void: 170 | if op4.type == ida_ua.o_reg: 171 | if get_reg(op4) not in self.regs: 172 | return 173 | 174 | value4 = self.regs[get_reg(op4)] 175 | 176 | # Handle arithmetic operator 177 | assert operator is not None 178 | assert value4 is not None 179 | value = operator(value, value4) & 0xFFFFFFFF 180 | 181 | # in the stack.
182 | # o_displ = [Base Reg + Displacement] 183 | elif op2.type == ida_ua.o_displ: 184 | if get_reg(op2) not in self.regs: 185 | return 186 | 187 | value = self.regs[get_reg(op2)] + op2.addr 188 | 189 | # reference the memory and get the value written in the memory 190 | elif op2.type == ida_ua.o_mem: 191 | value = op2.addr 192 | 193 | # immediate value, we get the value right away 194 | elif op2.type == ida_ua.o_imm: 195 | value = op2.value 196 | 197 | # o_phrase = [Base Reg + Index Reg + Displacement] 198 | elif op2.type == ida_ua.o_phrase: 199 | assert op3 is not None 200 | 201 | if get_reg(op3) not in self.regs: 202 | return 203 | 204 | if get_reg(op2) not in self.regs: 205 | return 206 | 207 | value2 = self.regs[get_reg(op2)] 208 | value3 = self.regs[get_reg(op3)] 209 | value = value2 + value3 + op2.phrase 210 | 211 | return value 212 | 213 | def run_helper(self, ea): 214 | # there may exist data section 215 | mnem = ida_ua.ua_mnem(ea) 216 | if not ida_ua.can_decode(ea) or not mnem: 217 | return 218 | 219 | # we need to check at most 4 operands 220 | insn = ida_ua.insn_t() 221 | inslen = ida_ua.decode_insn(insn, ea) 222 | op1 = insn.ops[0] 223 | op2 = insn.ops[1] 224 | op3 = insn.ops[2] 225 | op4 = insn.ops[3] 226 | 227 | if any(mnem.startswith(word) for word in ["PUSH", "POP"]): 228 | # TODO: implement this properly 229 | return True 230 | 231 | elif any( 232 | mnem.startswith(word) 233 | for word in ["MOV", "LDR", "ADR", "STR", "ADD", "SUB", "MUL"] 234 | ): 235 | assert op2 is not None 236 | 237 | if mnem.startswith("ADD"): 238 | operator = lambda x1, x2: x1 + x2 239 | elif mnem.startswith("SUB"): 240 | operator = lambda x1, x2: x1 - x2 241 | elif mnem.startswith("MUL"): 242 | operator = lambda x1, x2: x1 * x2 243 | else: 244 | operator = None 245 | 246 | value = self.fetch_value(op1, op2, op3, op4, operator) 247 | if value is None: 248 | return 249 | 250 | value = value & 0xFFFFFFFF 251 | 252 | if mnem.startswith("MOV"): 253 | self.regs[get_reg(op1)] = 
value 254 | 255 | elif mnem.startswith("LDR") or mnem.startswith("ADR"): 256 | if mnem.startswith("LDR"): 257 | assert op2.type in [ida_ua.o_displ, ida_ua.o_mem, ida_ua.o_phrase] 258 | 259 | if value in self.memory: 260 | value = self.memory[value] 261 | 262 | else: 263 | seg = ida_segment.getseg(value) 264 | if seg is None: 265 | return 266 | 267 | value = ida_bytes.get_dword(value) 268 | 269 | elif mnem.startswith("ADR"): 270 | assert op2.type == ida_ua.o_imm 271 | 272 | self.regs[get_reg(op1)] = value 273 | 274 | elif mnem.startswith("STR"): 275 | assert op2.type in [ida_ua.o_displ, ida_ua.o_mem, ida_ua.o_phrase] 276 | if get_reg(op1) not in self.regs: 277 | return 278 | 279 | self.memory[value] = self.regs[get_reg(op1)] 280 | 281 | elif any(mnem.startswith(word) for word in ["ADD", "SUB", "MUL"]): 282 | if op2.type == ida_ua.o_imm: 283 | if get_reg(op1) not in self.regs: 284 | return 285 | 286 | value = operator(self.regs[get_reg(op1)], value) & 0xFFFFFFFF 287 | 288 | self.regs[get_reg(op1)] = value 289 | 290 | return True 291 | 292 | else: 293 | # Skip unknown instructions 294 | return True 295 | 296 | # This should not be reached.
297 | print(hex(ea), idc.GetDisasm(ea)) 298 | assert False 299 | 300 | 301 | class SimpleBackwardSlicer(object): 302 | def __init__(self): 303 | self.visited = set() 304 | self.values = dict() 305 | self.memory = dict() 306 | self.stack = dict() 307 | self.func_start = None 308 | self.inter = False 309 | 310 | def find_reg_value( 311 | self, ea, reg_name, end_ea=idc.BADADDR, inter=False, end_cnt=100 312 | ): 313 | # not in code segment 314 | func_start = idc.get_func_attr(ea, idc.FUNCATTR_START) 315 | if func_start == idc.BADADDR: 316 | return 317 | 318 | self.func_start = func_start 319 | self.inter = inter 320 | 321 | return self.find_reg_value_helper(ea, reg_name, end_ea, end_cnt) 322 | 323 | def find_reg_value_helper(self, ea, reg_name, end_ea, end_cnt, offset=None): 324 | if (ea, reg_name) in self.values: 325 | return self.values[(ea, reg_name)] 326 | 327 | if end_ea != idc.BADADDR and ea < end_ea: 328 | return 329 | 330 | if end_cnt == 0: 331 | return 332 | 333 | # not in code segment 334 | func_addr = idc.get_func_attr(ea, idc.FUNCATTR_START) 335 | if func_addr == idc.BADADDR: 336 | return 337 | 338 | # out of current function 339 | if not self.inter and func_addr != self.func_start: 340 | return 341 | 342 | # there may exist data section 343 | mnem = ida_ua.ua_mnem(ea) 344 | if not ida_ua.can_decode(ea) or not mnem: 345 | return 346 | 347 | # we need to check at most 4 operands 348 | insn = ida_ua.insn_t() 349 | inslen = ida_ua.decode_insn(insn, ea) 350 | op1 = insn.ops[0] 351 | op2 = insn.ops[1] 352 | op3 = insn.ops[2] 353 | op4 = insn.ops[3] 354 | 355 | if any(mnem.startswith(word) for word in ["MOV", "LDR"]): 356 | assert op2 is not None 357 | 358 | # first argument should be reg_name 359 | if get_reg(op1) != reg_name: 360 | return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset) 361 | 362 | # follow new register 363 | if op2.type == ida_ua.o_reg: 364 | if get_reg(op2) == "SP": 365 | offset = 0 366 | return self.proceed_backward( 367 | 
ea, get_reg(op2), end_ea, end_cnt - 1, offset 368 | ) 369 | 370 | # in the stack. need to check when the value is stored 371 | # o_displ = [Base Reg + Index Reg + Displacement] 372 | elif op2.type == ida_ua.o_displ: 373 | values = self.proceed_backward( 374 | ea, get_reg(op2), end_ea, end_cnt - 1, op2.addr 375 | ) 376 | values = set(filter(lambda x: x, values)) 377 | if mnem.startswith("LDR"): 378 | return set(map(lambda x: ida_bytes.get_dword(x), values)) 379 | else: 380 | return values 381 | 382 | # reference the memory and get the value written in the memory 383 | elif op2.type == ida_ua.o_mem: 384 | # TODO: implement memory access 385 | 386 | # we assume that this memory is not initialized. 387 | if mnem.startswith("LDR"): 388 | return set([ida_bytes.get_dword(op2.addr)]) 389 | else: 390 | return set([op2.addr]) 391 | 392 | # immediate value, we get the value right away 393 | elif op2.type == ida_ua.o_imm: 394 | return set([op2.value]) 395 | 396 | elif op2.type == ida_ua.o_phrase: 397 | assert mnem.startswith("LDR") 398 | 399 | phrase_val = self.proceed_backward( 400 | ea, get_reg(op3), end_ea, end_cnt - 1, offset 401 | ) 402 | if not phrase_val: 403 | return 404 | 405 | op2_val = self.proceed_backward( 406 | ea, get_reg(op2), end_ea, end_cnt - 1, offset 407 | ) 408 | if not op2_val: 409 | return 410 | 411 | operator = lambda x1, x2: x1 + x2 412 | values = merge_op_vals(op2_val, phrase_val, operator) 413 | 414 | return set(map(lambda x: ida_bytes.get_dword(x + op2.phrase), values)) 415 | 416 | return 417 | 418 | # only checks stored stacks 419 | elif any(mnem.startswith(word) for word in ["STR"]): 420 | assert op2 is not None 421 | 422 | if op3 and op3.type != ida_ua.o_void: 423 | target_op = op3 424 | else: 425 | target_op = op2 426 | 427 | # arguments should include reg_name 428 | if get_reg(target_op) != reg_name: 429 | return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset) 430 | 431 | # in the stack. 
need to check when the value is stored 432 | # o_displ = [Base Reg + Index Reg + Displacement] 433 | if target_op.type == ida_ua.o_displ: 434 | target_memory = self.stack 435 | 436 | # we assume that memory is not initialized. 437 | # reference the memory and get the value written in the memory 438 | elif target_op.type == ida_ua.o_mem: 439 | assert get_reg(target_op) != "SP" 440 | target_memory = self.memory 441 | 442 | else: 443 | return 444 | 445 | if target_op == op2: 446 | if target_op.addr == offset: 447 | self.stack[target_op.addr] = self.proceed_backward( 448 | ea, get_reg(op1), end_ea, end_cnt - 1 449 | ) 450 | return self.stack[target_op.addr] 451 | else: 452 | if target_op.addr == offset: 453 | self.stack[target_op.addr] = self.proceed_backward( 454 | ea, get_reg(op1), end_ea, end_cnt - 1 455 | ) 456 | return self.stack[target_op.addr] 457 | elif target_op.addr + 4 == offset: 458 | self.stack[target_op.addr + 4] = self.proceed_backward( 459 | ea, get_reg(op2), end_ea, end_cnt - 1 460 | ) 461 | return self.stack[target_op.addr + 4] 462 | 463 | return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset) 464 | 465 | elif any(mnem.startswith(word) for word in ["ADD", "SUB", "MUL"]): 466 | assert op2 is not None 467 | 468 | if mnem.startswith("ADD"): 469 | operator = lambda x1, x2: x1 + x2 470 | elif mnem.startswith("SUB"): 471 | operator = lambda x1, x2: x1 - x2 472 | elif mnem.startswith("MUL"): 473 | operator = lambda x1, x2: x1 * x2 474 | 475 | # TODO: Handle stack variable 476 | # Check how to follow below. 
477 | # STR R5, [SP #8] 478 | # STR R4, [SP #4] 479 | # ADD R3, SP, #4 480 | # ADD R2, R3, #4 481 | if get_reg(op1) != reg_name: 482 | return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset) 483 | 484 | # Two arguments 485 | if not op3 or op3.type == ida_ua.o_void: 486 | if op2.type == ida_ua.o_reg: 487 | op1_val = self.proceed_backward( 488 | ea, reg_name, end_ea, end_cnt - 1, offset 489 | ) 490 | if not op1_val: 491 | return 492 | 493 | op2_val = self.proceed_backward( 494 | ea, get_reg(op2), end_ea, end_cnt - 1 495 | ) 496 | if not op2_val: 497 | return 498 | 499 | return merge_op_vals(op1_val, op2_val, operator) 500 | 501 | elif op2.type == ida_ua.o_imm: 502 | op1_val = self.proceed_backward( 503 | ea, reg_name, end_ea, end_cnt - 1, offset 504 | ) 505 | return set(map(lambda x: operator(x, op2.value), op1_val)) 506 | 507 | else: 508 | return 509 | 510 | if op2.type != ida_ua.o_reg: 511 | # This should not be reached. 512 | print(hex(ea), idc.GetDisasm(ea), reg_name, op2.type) 513 | assert False 514 | 515 | # More than three arguments 516 | # follow new register 517 | # ADD R0, R1, R2 518 | if op3.type == ida_ua.o_reg: 519 | op2_val = self.proceed_backward( 520 | ea, get_reg(op2), end_ea, end_cnt - 1, offset 521 | ) 522 | # if we cannot fetch the value, stop the analysis 523 | if not op2_val: 524 | return 525 | 526 | op3_val = self.proceed_backward( 527 | ea, get_reg(op3), end_ea, end_cnt - 1, offset 528 | ) 529 | # if we cannot fetch the value, stop the analysis 530 | if not op3_val: 531 | return 532 | 533 | # MLA R0, R1, R2, R3 534 | if op4 and op4.type == ida_ua.o_reg: 535 | op4_val = self.proceed_backward( 536 | ea, get_reg(op4), end_ea, end_cnt - 1, offset 537 | ) 538 | if not op4_val: 539 | return 540 | 541 | return merge_op_vals( 542 | merge_op_vals(op2_val, op3_val, operator), op4_val, operator 543 | ) 544 | 545 | return merge_op_vals(op2_val, op3_val, operator) 546 | 547 | # immediate value, we get the value right away 548 | # ADD R0, 
R1, #123 549 | elif op3.type == ida_ua.o_imm: 550 | return self.proceed_backward( 551 | ea, get_reg(op2), end_ea, end_cnt - 1, operator(0, op3.value) 552 | ) 553 | 554 | # ADD R0, R1, R2,LSL#2 555 | # o_idpspec0~5 556 | elif op3.type == ida_ua.o_idpspec0: 557 | # processor specific type 'LSL' 558 | op3_val = self.proceed_backward( 559 | ea, get_reg(op3), end_ea, end_cnt - 1, offset 560 | ) 561 | # if we cannot fetch the value, stop the analysis 562 | if not op3_val: 563 | return 564 | op3_val = set(map(lambda x: x << op3.value, op3_val)) 565 | op2_val = self.proceed_backward( 566 | ea, get_reg(op2), end_ea, end_cnt - 1, offset 567 | ) 568 | 569 | return merge_op_vals(op2_val, op3_val, operator) 570 | 571 | else: 572 | return 573 | 574 | else: 575 | return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset) 576 | 577 | def proceed_backward(self, ea, reg_name, end_ea, end_cnt, offset=None): 578 | # initialize prev code points 579 | values = set() 580 | xref = ida_xref.get_first_cref_to(ea) 581 | while xref and xref != idc.BADADDR: 582 | tmp_values = self.find_reg_value_helper( 583 | xref, reg_name, end_ea, end_cnt, offset 584 | ) 585 | if tmp_values: 586 | tmp_values = list(map(lambda x: x & 0xFFFFFFFF, tmp_values)) 587 | values.update(tmp_values) 588 | 589 | xref = ida_xref.get_next_cref_to(ea, xref) 590 | 591 | self.values[(ea, reg_name)] = values 592 | 593 | return values 594 | 595 | 596 | def find_mnem(target_ea, target_mnem, backward=False, threshold=0x100): 597 | assert target_ea is not None 598 | ea = target_ea 599 | addr = None 600 | visited = set() 601 | while True: 602 | mnem = ida_ua.ua_mnem(ea) 603 | if not ida_ua.can_decode(ea) or not mnem: 604 | break 605 | if mnem == target_mnem: 606 | addr = ea 607 | break 608 | 609 | visited.add(ea) 610 | 611 | if backward: 612 | next_ea = ida_xref.get_first_cref_to(ea) 613 | if next_ea < target_ea - threshold: 614 | break 615 | 616 | while next_ea != idc.BADADDR and next_ea in visited: 617 | next_ea =
ida_xref.get_next_cref_to(ea, next_ea) 618 | 619 | else: 620 | next_ea = ida_xref.get_first_cref_from(ea) 621 | if next_ea > target_ea + threshold: 622 | break 623 | 624 | while next_ea != idc.BADADDR and next_ea in visited: 625 | next_ea = ida_xref.get_next_cref_from(ea, next_ea) 626 | 627 | ea = next_ea 628 | 629 | return addr 630 | 631 | 632 | def fetch_arg_one(ea, reg_name, end_ea=idc.BADADDR, end_cnt=100): 633 | slicer = SimpleBackwardSlicer() 634 | values = slicer.find_reg_value(ea, reg_name, end_ea=end_ea, end_cnt=end_cnt) 635 | if not values: 636 | return idc.BADADDR 637 | 638 | return values.pop() 639 | 640 | 641 | def find_args(ea, num_regs, limit=10): 642 | registers = ["R%d" % (i) for i in range(num_regs)] 643 | return [fetch_arg_one(ea, reg, end_cnt=limit) for reg in registers] 644 | -------------------------------------------------------------------------------- /basespec/spec/24008-f80.doc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24008-f80.doc -------------------------------------------------------------------------------- /basespec/spec/24011-f30.doc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24011-f30.doc -------------------------------------------------------------------------------- /basespec/spec/24080-f10.doc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24080-f10.doc -------------------------------------------------------------------------------- /basespec/spec/24301-f80.doc: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24301-f80.doc -------------------------------------------------------------------------------- /basespec/spec/44018-f50.doc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/44018-f50.doc -------------------------------------------------------------------------------- /basespec/spec/ts_124008v150800p.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/ts_124008v150800p.pdf -------------------------------------------------------------------------------- /basespec/spec/ts_124301v150800p.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/ts_124301v150800p.pdf -------------------------------------------------------------------------------- /basespec/structs/l3msg.py: -------------------------------------------------------------------------------- 1 | class IeInfo: 2 | def __init__(self, msg_type, name, iei, min, max, imperative): 3 | self.type = msg_type 4 | self.name = name 5 | self.iei = iei 6 | self.min = min 7 | self.max = max 8 | self.imperative = imperative 9 | 10 | def __repr__(self): 11 | if self.imperative: 12 | imper = 'imperative' 13 | else: 14 | imper = 'non-imperative' 15 | res = " 30: 94 | return False 95 | 96 | if s.upper() == s: 97 | return False 98 | 99 | if any(ch not in FUNCNAME_CHARS for ch in s): 100 | return False 101 | 102 | # TODO: add other func name checks 103 | return True 104 | 105 | 106 | # deprecated. 
107 | def set_funcname(ea, name): 108 | func_addr = idc.get_func_attr(ea, idc.FUNCATTR_START) 109 | if func_addr == idc.BADADDR: 110 | return 111 | return set_entry_name(func_addr, name) 112 | 113 | 114 | def set_entry_name(ea, name): 115 | cur_name = idc.get_name(ea) 116 | if cur_name.startswith(name): 117 | return cur_name 118 | 119 | name = check_name(name) 120 | status = idc.set_name(ea, name) 121 | if status: 122 | return name 123 | else: 124 | return 125 | 126 | 127 | def is_name_exist(name): 128 | addr = idc.get_name_ea_simple(name) 129 | # if name already exists, we need to assign new name with suffix 130 | if addr != idc.BADADDR: 131 | return True 132 | else: 133 | return False 134 | 135 | 136 | def check_name(orig_name): 137 | name = orig_name 138 | idx = 1 139 | while is_name_exist(name): 140 | name = "%s_%d" % (orig_name, idx) 141 | idx += 1 142 | 143 | return name 144 | 145 | def is_func(ea): 146 | if ea == idc.BADADDR: 147 | return False 148 | 149 | start_ea = idc.get_func_attr(ea, idc.FUNCATTR_START) 150 | end_ea = idc.get_func_attr(ea, idc.FUNCATTR_END) 151 | 152 | return start_ea <= ea < end_ea 153 | 154 | 155 | def is_thumb(ea): 156 | return idc.get_sreg(ea, "T") == 1 157 | 158 | -------------------------------------------------------------------------------- /examples/ex_check_spec.py: -------------------------------------------------------------------------------- 1 | from basespec.analyze_spec import check_spec 2 | from basespec.structs.l3msg import IeInfo, L3MsgInfo, L3ProtInfo 3 | 4 | # EMM protocol 5 | pd = 7 6 | 7 | # EMM attach accept message 8 | msg_type = 0x42 9 | 10 | # Build a message 11 | # The information should be extracted from embedded message structures in the binary. 
12 | IE_list = [] 13 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True)) 14 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True)) 15 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True)) 16 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=6, max=96, imperative=True)) 17 | #IE_list.append(IeInfo(msg_type, name="", iei=0, min=0, max=32767, imperative=True)) #missing 18 | IE_list.append(IeInfo(msg_type, name="", iei=0x50, min=11, max=11, imperative=False)) 19 | IE_list.append(IeInfo(msg_type, name="", iei=0x13, min=5, max=5, imperative=False)) 20 | IE_list.append(IeInfo(msg_type, name="", iei=0x23, min=5, max=8, imperative=False)) 21 | IE_list.append(IeInfo(msg_type, name="", iei=0x53, min=1, max=1, imperative=False)) 22 | IE_list.append(IeInfo(msg_type, name="", iei=0x4A, min=1, max=99, imperative=False)) #invalid 23 | IE_list.append(IeInfo(msg_type, name="", iei=0xFF, min=5, max=5, imperative=False)) #unknown 24 | attach_accept_msg = L3MsgInfo(pd, msg_type, name="Attach accept", direction="DL", ie_list=IE_list) 25 | 26 | # Build protocol 27 | EMM_prot = L3ProtInfo(pd, [attach_accept_msg]) 28 | 29 | l3_list = [EMM_prot] 30 | 31 | # Compare with specification 32 | check_spec(l3_list, pd) 33 | -------------------------------------------------------------------------------- /examples/ex_get_spec_msgs.py: -------------------------------------------------------------------------------- 1 | from basespec import parse_spec 2 | spec_msgs = parse_spec.get_spec_msgs() # Format: msgs[pd][msg_type] = ie_list 3 | emm_msgs = spec_msgs[7] # 7 : the type of EPS Mobility Management 4 | smc_ie_list = emm_msgs[0x5d] # 0x5d : the type of SECURITY MODE COMMAND 5 | -------------------------------------------------------------------------------- /examples/ex_init_functions.py: -------------------------------------------------------------------------------- 1 | from basespec import preprocess 2 | 
preprocess.init_functions() 3 | preprocess.FUNC_BY_LS # identified functions by linear sweep prologue detection 4 | preprocess.FUNC_BY_LS_TIME # time spent for linear sweep prologue detection 5 | preprocess.FUNC_BY_PTR # identified functions by pointer analysis 6 | preprocess.FUNC_BY_PTR_TIME # time spent for pointer analysis 7 | -------------------------------------------------------------------------------- /examples/ex_init_strings.py: -------------------------------------------------------------------------------- 1 | from basespec import preprocess 2 | preprocess.init_strings() 3 | -------------------------------------------------------------------------------- /examples/ex_run_scatterload.py: -------------------------------------------------------------------------------- 1 | from basespec import scatterload 2 | scatterload.run_scatterload() 3 | -------------------------------------------------------------------------------- /load_ida.py: -------------------------------------------------------------------------------- 1 | #need to import here for IDA 2 | import basespec 3 | -------------------------------------------------------------------------------- /overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/overview.png --------------------------------------------------------------------------------
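Note on decomp() in /basespec/scatterload.py: the routine walks a token stream in which each meta byte carries a literal count in its low three bits (stored as count + 1), a run length in its high four bits, and a flag in bit 3 that selects between a back-reference copy and a zero fill. A field value of 0 means the real value follows in the next input byte. The sketch below restates that loop over plain bytes so it can be tried outside IDA. It is only an illustration: `decomp_bytes` and its parameters are hypothetical names that do not exist in BaseSpec, and the offset check is an added assumption for malformed input, not behavior of the original.

```python
def decomp_bytes(data, length):
    """Decode `data` into exactly `length` bytes, mirroring the loop in decomp().

    Hypothetical helper for illustration; not part of BaseSpec.
    """
    out = bytearray()
    pos = 0
    while True:
        meta = data[pos]
        pos += 1

        # Low three bits: literal count plus one (0 = read it from the next byte).
        lit = meta & 7
        if lit == 0:
            lit = data[pos]
            pos += 1

        # High four bits: run length (0 = read it from the next byte).
        run = meta >> 4
        if run == 0:
            run = data[pos]
            pos += 1

        # Copy lit - 1 literal bytes straight from the input.
        out += data[pos:pos + lit - 1]
        pos += lit - 1

        if meta & 8:
            # Back-reference: emit run + 2 bytes, each taken `off` bytes behind
            # the current end of the output.
            off = data[pos]
            pos += 1
            if off == 0 or off > len(out):
                # Assumption: treat offsets outside the decoded output as errors.
                raise ValueError("back-reference outside the decoded output")
            for _ in range(run + 2):
                out.append(out[-off])
        else:
            # Zero fill of `run` bytes.
            out += bytes(run)

        if len(out) >= length:
            if len(out) != length:
                raise ValueError("Decompress failed")
            return bytes(out)


# Token 1 (0x23): two literals "AB", then two zero bytes.
# Token 2 (0x1A): one literal "C", then three copies from offset 1.
stream = bytes([0x23, 0x41, 0x42, 0x1A, 0x43, 0x01])
print(decomp_bytes(stream, 8))  # b'AB\x00\x00CCCC'
```

Working on an in-memory buffer rather than on IDA addresses keeps the sketch testable on its own; the original instead reads and writes the database through ida_bytes.get_byte and ida_bytes.put_bytes.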