├── .gitignore
├── LICENSE.txt
├── README.md
├── basespec
    ├── __init__.py
    ├── analyze_spec.py
    ├── parse_spec.py
    ├── preprocess.py
    ├── scatterload.py
    ├── slicer.py
    ├── spec
    │   ├── 24008-f80.doc
    │   ├── 24008-f80.txt
    │   ├── 24011-f30.doc
    │   ├── 24011-f30.txt
    │   ├── 24080-f10.doc
    │   ├── 24080-f10.txt
    │   ├── 24301-f80.doc
    │   ├── 24301-f80.txt
    │   ├── 44018-f50.doc
    │   ├── 44018-f50.txt
    │   ├── ts_124008v150800p.pdf
    │   ├── ts_124008v150800p.txt
    │   ├── ts_124301v150800p.pdf
    │   └── ts_124301v150800p.txt
    ├── structs
    │   └── l3msg.py
    └── utils.py
├── examples
    ├── ex_check_spec.py
    ├── ex_get_spec_msgs.py
    ├── ex_init_functions.py
    ├── ex_init_strings.py
    └── ex_run_scatterload.py
├── load_ida.py
└── overview.png


/.gitignore:
--------------------------------------------------------------------------------
1 | .*.swp
2 | *~
3 | cache
4 | __pycache__
5 | *.pickle
6 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2021 Dongkwan Kim and Eunsoo Kim
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of
 6 | this software and associated documentation files (the "Software"), to deal in
 7 | the Software without restriction, including without limitation the rights to
 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
 9 | the Software, and to permit persons to whom the Software is furnished to do so,
10 | subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
17 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
18 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
19 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
20 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
21 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Description
  2 | BaseSpec is a system that performs a comparative analysis of baseband
  3 | implementation and the specifications of cellular networks. The key intuition of
  4 | BaseSpec is that a message decoder in baseband software embeds the protocol
  5 | specification in a machine-friendly structure to parse incoming messages; hence,
  6 | the embedded protocol structures can be easily extracted and compared with the
  7 | specification. This enables BaseSpec to automate the comparison process and
  8 | explicitly discover mismatches in the protocol implementation, which are
  9 | non-compliant with the specification. These mismatches can directly pinpoint the
 10 | mistakes of developers when embedding the protocol structures or hint at
 11 | potential vulnerabilities.
 12 | 
 13 | ![BaseSpec Overview](./overview.png)
 14 | 
 15 | With BaseSpec, we analyzed the implementation of cellular standard L3 messages
 16 | in 18 baseband firmware images of 9 device models from one of the top three
 17 | vendors. BaseSpec identified hundreds of mismatches that indicate both
 18 | functional errors and potentially vulnerable points. We investigated their
 19 | functional and security implications and discovered 9 erroneous cases affecting
 20 | 33 distinct messages: 5 of these cases are functional errors and 4 of them are
 21 | memory-related vulnerabilities. Notably, 2 of the vulnerabilities are critical
 22 | remote code execution (RCE) 0-days. We also applied BaseSpec to 3 models from a
 23 | different vendor in the top three. Through this analysis, BaseSpec identified
 24 | multiple mismatches, 2 of which led us to discover a buffer overflow bug.
 25 | 
 26 | For more details, please see [our
 27 | paper](https://syssec.kaist.ac.kr/pub/2021/kim-ndss2021.pdf).
 28 | 
 29 | - BaseSpec will be presented at [NDSS 2021](https://www.ndss-symposium.org/ndss-paper/basespec-comparative-analysis-of-baseband-software-and-cellular-specifications-for-l3-protocols/).
 30 | 
 31 | 
 32 | ## Disclaimer
 33 | The current release of BaseSpec **only includes the parts that are irrelevant to
 34 | the vendors**: preprocessing (i.e., memory layout analysis and function
 35 | identification), complementary specification parsing, and comparison.
 36 | 
 37 | We reported all findings to the two vendors; one strongly refuses to publish the
 38 | details, and the other has not responded to us yet. The one that refused was
 39 | particularly concerned that complete patch deployment would take a long time
 40 | (over six months) because it must collaborate with each mobile carrier.
 41 | According to the vendor, it must ask ~280 carriers to
 42 | update ~130 models globally. Due to this complexity, the vendor thinks that
 43 | numerous devices might remain unpatched and vulnerable to our bugs. We agree
 44 | with this and anonymize the vendor in the
 45 | [paper](https://syssec.kaist.ac.kr/pub/2021/kim-ndss2021.pdf).
 46 | 
 47 | 
 48 | # How to use
 49 | 
 50 | ### 0. Using BaseSpec in IDA Pro
 51 | BaseSpec contains Python scripts based on IDA Pro APIs (IDAPython). To use
 52 | BaseSpec, first load the baseband firmware of interest into IDA Pro at the
 53 | correct locations, which may require parsing of vendor-specific firmware
 54 | file formats.
 55 | Then, import `load_ida.py` as a script file in IDA Pro (using Alt+F7).
 56 | 
 57 | 
 58 | ### 1. Preprocessing
 59 | For scatter-loading, use `basespec.scatterload` as below.
 60 | 
 61 | ```python
 62 | from basespec import scatterload
 63 | scatterload.run_scatterload()
 64 | ```
 65 | 
 66 | For function identification, use `basespec.preprocess` as below.
 67 | 
 68 | ```python
 69 | from basespec import preprocess
 70 | preprocess.init_functions()
 71 | preprocess.FUNC_BY_LS # identified functions by linear sweep prologue detection
 72 | preprocess.FUNC_BY_LS_TIME # time spent for linear sweep prologue detection
 73 | preprocess.FUNC_BY_PTR # identified functions by pointer analysis
 74 | preprocess.FUNC_BY_PTR_TIME # time spent for pointer analysis
 75 | ```
 76 | 
 77 | For string initialization, use `basespec.preprocess` as below.
 78 | 
 79 | ```python
 80 | from basespec import preprocess
 81 | preprocess.init_strings()
 82 | ```
 83 | 
 84 | 
 85 | ### 2. Specification parsing
 86 | 
 87 | You can fetch the dictionary containing all specification messages as shown
 88 | below.
 89 | 
 90 | ```python
 91 | from basespec import parse_spec
 92 | spec_msgs = parse_spec.get_spec_msgs() # spec_msgs[nas_type][msg_type] = ie_list
 93 | ```
 94 | 
 95 | This `spec_msgs` dictionary contains a list of IEs for each message. Below is an
 96 | example to fetch the IE list of the EMM SECURITY MODE COMMAND message.
 97 | 
 98 | ```python
 99 | emm_msgs = spec_msgs[7] # 7 : the type of EPS Mobility Management
100 | smc_ie_list = emm_msgs[0x5d] # 0x5d : the type of SECURITY MODE COMMAND
101 | ```
102 | 
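To make the shape of an IE list more concrete, here is a minimal sketch that splits a list into imperative and non-imperative IEs by IEI, similar in spirit to the first pass of `compare_ie_list` in `basespec/analyze_spec.py`. It assumes each IE is a 6-field tuple `(name, iei, reference, presence, format, length)`, which is how `analyze_spec.py` unpacks them. The sample data is hand-written from the ATTACH ACCEPT table in section 3 and is not the output of `parse_spec`; the helper names `iei_to_int` and `split_by_iei` are hypothetical.

```python
# Hypothetical sample data: (name, iei, reference, presence, format, length).
# Hand-written for illustration; not produced by parse_spec.get_spec_msgs().
sample_ie_list = [
    ("EPS attach result", "", "EPS attach result", "M", "V", "1/2"),
    ("T3412 value", "", "GPRS timer", "M", "V", "1"),
    ("GUTI", "50", "EPS mobile identity", "O", "TLV", "13"),
    ("EMM cause", "53", "EMM cause", "O", "TV", "2"),
]


def iei_to_int(iei):
    """Normalize a spec IEI string such as "50", "D-" or "" to an int (0 if absent)."""
    cleaned = (iei or "").strip("-").strip("x")
    if not cleaned or cleaned == "TBC":
        return 0
    return int(cleaned, 16)


def split_by_iei(ie_list):
    """Return (imperatives, nonimperatives); non-imperatives are keyed by IEI."""
    imperatives = []
    nonimperatives = {}
    for ie in ie_list:
        name, iei, ref, presence, fmt, length = ie
        # An IE without an IEI that is not optional belongs to the imperative part.
        if iei_to_int(iei) == 0 and "O" not in presence:
            imperatives.append(ie)
        else:
            # Keep the first IE seen for a given IEI.
            nonimperatives.setdefault(iei_to_int(iei), ie)
    return imperatives, nonimperatives


imperatives, nonimperatives = split_by_iei(sample_ie_list)
print([ie[0] for ie in imperatives])
print(sorted(hex(k) for k in nonimperatives))
```

Keying the non-imperative IEs by their integer IEI makes it straightforward to look up the matching specification entry for an IE found in the binary.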
103 | 
104 | ### 3. Specification comparing
105 | 
106 | To compare the message structures in the specification and binary, you should
107 | first create the corresponding class instances. Below is an example to compare
108 | the IE list of the EMM ATTACH ACCEPT message
109 | ([`examples/ex_check_spec.py`](./examples/ex_check_spec.py)).
110 | 
111 | ```python
112 | from basespec.analyze_spec import check_spec
113 | from basespec.structs.l3msg import IeInfo, L3MsgInfo, L3ProtInfo
114 | 
115 | # EMM protocol
116 | pd = 7
117 | 
118 | # EMM attach accept message
119 | msg_type = 0x42
120 | 
121 | # Build a message
122 | # The information should be extracted from embedded message structures in the binary.
123 | IE_list = []
124 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True))
125 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True))
126 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True))
127 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=6, max=96, imperative=True))
128 | #IE_list.append(IeInfo(msg_type, name="", iei=0, min=0, max=32767, imperative=True)) #missing
129 | IE_list.append(IeInfo(msg_type, name="", iei=0x50, min=11, max=11, imperative=False))
130 | IE_list.append(IeInfo(msg_type, name="", iei=0x13, min=5, max=5, imperative=False))
131 | IE_list.append(IeInfo(msg_type, name="", iei=0x23, min=5, max=8, imperative=False))
132 | IE_list.append(IeInfo(msg_type, name="", iei=0x53, min=1, max=1, imperative=False))
133 | IE_list.append(IeInfo(msg_type, name="", iei=0x4A, min=1, max=99, imperative=False)) #invalid
134 | IE_list.append(IeInfo(msg_type, name="", iei=0xFF, min=5, max=5, imperative=False)) #unknown
135 | attach_accept_msg = L3MsgInfo(pd, msg_type, name="Attach accept", direction="DL", ie_list=IE_list)
136 | 
137 | # Build protocol
138 | EMM_prot = L3ProtInfo(pd, [attach_accept_msg])
139 | 
140 | l3_list = [EMM_prot]
141 | 
142 | # Compare with specification
143 | check_spec(l3_list, pd)
144 | ```
145 | 
146 | This prints the mismatch results in CSV format. Below is part of the output,
147 | rendered as a table.
148 | 
149 | |IE Name|Reference|Spec IEI|Spec Presence|Spec Format|Spec Length|Bin IEI|Bin Imperative|Bin Length|Bin Idx|Error 1|Error 2|
150 | |---|---|---|---|---|---|---|---|---|---|---|---|
151 | |EPS attach result|EPS attach result||M|V|1/2|00|True|1|0x42|
152 | |Spare half octet|Spare half octet||M|V|1/2|00|True|1|0x42|
153 | |T3412 value|GPRS timer||M|V|1|00|True|1|0x42|
154 | |TAI list|Tracking area identity list||M|LV|7-97|00|True|7-97|0x42|
155 | |GUTI|EPS mobile identity|50|O|TLV|13|50|False|13|0x42|
156 | |Location area identification|Location area identification|13|O|TV|6|13|False|6|0x42|
157 | |MS identity|Mobile identity|23|O|TLV|7-10|23|False|7-10|0x42|
158 | |EMM cause|EMM cause|53|O|TV|2|53|False|2|0x42|
159 | |Equivalent PLMNs|PLMN list|4A|O|TLV|5-47|4A|False|3-101|0x42| non-imperative invalid mismatch (min length)| non-imperative invalid mismatch (max length)|
160 | |-|-|-|-|-|-|FF|False|5|0x42|non-imperative unknown mismatch|
161 | |ESM message container|ESM message container||M|LV-E|5-n|-|-|-|-|imperative missing mismatch|
162 | |T3402 value|GPRS timer|17|O|TV|2|-|-|-|-|non-imperative missing mismatch|
163 | |T3423 value|GPRS timer|59|O|TV|2|-|-|-|-|non-imperative missing mismatch|
164 | | ... |
165 | 
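The `Bin Length` column does not show the raw `min`/`max` values passed to `IeInfo`: before comparing, the header octets implied by the specification format are added. The sketch below reproduces that adjustment as a standalone function, following the rules visible in `apply_li` in `basespec/analyze_spec.py`. The names `add_header_octets` and `format_length` are hypothetical and exist only for this illustration.

```python
def add_header_octets(spec_format, spec_length, bin_min, bin_max):
    """Add IEI/LI octets to a binary-side value length, following apply_li's rules."""
    # A TV element whose total spec length is 1 is left unchanged.
    if spec_format == "TV" and spec_length == "1":
        return bin_min, bin_max
    extra = 0
    if "T" in spec_format:   # one octet for the IEI
        extra += 1
    if "L" in spec_format:   # one octet for the length indicator
        extra += 1
    if "-E" in spec_format:  # one more octet for the extended length indicator
        extra += 1
    return bin_min + extra, bin_max + extra


def format_length(length_min, length_max):
    """Render a length range the way the CSV output does."""
    if length_min == length_max:
        return str(length_min)
    return "{}-{}".format(length_min, length_max)


# (IE name, spec format, spec length, binary min, binary max) from the example above.
rows = [
    ("TAI list", "LV", "7-97", 6, 96),
    ("GUTI", "TLV", "13", 11, 11),
    ("Equivalent PLMNs", "TLV", "5-47", 1, 99),
]
for name, fmt, spec_len, bmin, bmax in rows:
    print(name, format_length(*add_header_octets(fmt, spec_len, bmin, bmax)))
```

For the `Equivalent PLMNs` row, this yields `3-101`, which differs from the specification's `5-47` on both ends and is why two length mismatches are reported.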
166 | 
167 | # Issues
168 | 
169 | ### Tested environment
170 | We ran all our experiments on a machine equipped with an Intel Core i7-6700K CPU
171 | at 4.00 GHz and 64 GB DDR4 RAM. We set up Windows 10 Pro, IDA Pro v7.4, and
172 | Python 3.7.6 on the machine.
173 | 
174 | We converted the doc and pdf files to text on a Linux machine.
175 | Please check [this function](./basespec/parse_spec.py#L15).
176 | 
177 | 
178 | # Authors
179 | This project was conducted by the following authors at KAIST.
180 | * [Eunsoo Kim](https://hahah.kim) (These two authors contributed equally.)
181 | * [Dongkwan Kim](https://0xdkay.me/) (These two authors contributed equally.)
182 | * [CheolJun Park](https://unrloay2.github.io/)
183 | * [Insu Yun](https://insuyun.github.io/)
184 | * [Yongdae Kim](https://syssec.kaist.ac.kr/~yongdaek/)
185 | 
186 | 
187 | # Citation
188 | We would appreciate it if you would consider citing [our
189 | paper](https://syssec.kaist.ac.kr/pub/2021/kim-ndss2021.pdf).
190 | ```bibtex
191 | @inproceedings{kim:2021:basespec,
192 |   author = {Eunsoo Kim and Dongkwan Kim and CheolJun Park and Insu Yun and Yongdae Kim},
193 |   title = {{BaseSpec}: Comparative Analysis of Baseband Software and Cellular Specifications for L3 Protocols},
194 |   booktitle = {Proceedings of the 2021 Annual Network and Distributed System Security Symposium (NDSS)},
195 |   year = 2021,
196 |   month = feb,
197 |   address = {Online}
198 | }
199 | ```
200 | 


--------------------------------------------------------------------------------
/basespec/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/__init__.py


--------------------------------------------------------------------------------
/basespec/analyze_spec.py:
--------------------------------------------------------------------------------
  1 | import time
  2 | import itertools
  3 | 
  4 | from .parse_spec import get_spec_msgs
  5 | 
  6 | 
  7 | def flatten(l):
  8 |     return list(itertools.chain.from_iterable(l))
  9 | 
 10 | 
 11 | def find_ref(spec_map, target_ref, target_pd=None):
 12 |     for pd, vals in spec_map.items():
 13 |         if target_pd and pd != target_pd:
 14 |             continue
 15 | 
 16 |         for msg_type, args in vals.items():
 17 |             class_name, sub_class_name, msg_name, ie_list = args
 18 |             for idx, ie in enumerate(ie_list):
 19 |                 ie_name, iei, ref, presence, ie_format, length = ie
 20 |                 if target_ref in ref.lower():
 21 |                     print(
 22 |                         "NasProt[{}] {} -> {}, {}".format(
 23 |                             pd, class_name, sub_class_name, msg_name
 24 |                         )
 25 |                     )
 26 |                     print(
 27 |                         "    [{}] {}: iei: {}, presence: {}, format: {}, length: {}".format(
 28 |                             idx, ie_name, iei, presence, ie_format, length
 29 |                         )
 30 |                     )
 31 | 
 32 | 
 33 | def print_ie_list(spec_ie_list):
 34 |     for idx, ie in enumerate(spec_ie_list):
 35 |         ie_name, iei1, ref, spec_presence, spec_format, spec_length = ie
 36 |         print("[{}] ({}) {} {} {}".format(idx, iei1, ie_name, spec_format, spec_length))
 37 | 
 38 | 
 39 | def sanitize_iei(iei):
 40 |     if iei:
 41 |         iei = iei.strip("-")
 42 |         iei = iei.strip("x")
 43 |         if iei == "TBC":
 44 |             iei = 0
 45 |         elif iei:
 46 |             iei = int(iei, 16)
 47 |         else:
 48 |             iei = 0
 49 |     else:
 50 |         iei = 0
 51 | 
 52 |     return iei
 53 | 
 54 | 
 55 | def sanitize_len(len1, len2):
 56 |     if len1 == len2:
 57 |         msg_length = "{}".format(len1)
 58 |     else:
 59 |         msg_length = "{}-{}".format(len1, len2)
 60 | 
 61 |     return msg_length
 62 | 
 63 | 
 64 | def apply_li(spec_ie, bin_ie):
 65 |     ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie
 66 |     len2_min = bin_ie.min
 67 |     len2_max = bin_ie.max
 68 | 
 69 |     # ================================================
 70 |     # Convert Length
 71 |     # ================================================
 72 |     # Lengths in the specification already include the IEI and LI octets.
 73 |     # However, when the maximum length is given as 'n', they are not included,
 74 |     # so we add them here first.
 75 |     if "-" in spec_length:
 76 |         len1_min, len1_max = spec_length.split("-")
 77 |         if len1_max == "?":
 78 |             len1_max = "n"
 79 |         len1_min = int(len1_min)
 80 |         if len1_max == "n":
 81 |             if "-E" in spec_format:
 82 |                 len1_max = 2 ** 16 - 1
 83 |             else:
 84 |                 len1_max = 2 ** 8 - 1
 85 | 
 86 |             if "T" in spec_format:
 87 |                 len1_max += 1
 88 |             if "L" in spec_format:
 89 |                 len1_max += 1
 90 |             if "-E" in spec_format:
 91 |                 len1_max += 1
 92 |         else:
 93 |             len1_max = int(len1_max)
 94 | 
 95 |     elif spec_length == "1/2":
 96 |         len1_min = 1
 97 |         len1_max = 1
 98 | 
 99 |     else:
100 |         len1_min = int(spec_length)
101 |         len1_max = int(spec_length)
102 | 
103 |     if not (spec_format == "TV" and spec_length == "1"):
104 |         if "T" in spec_format:
105 |             len2_min += 1
106 |             len2_max += 1
107 |         if "L" in spec_format:
108 |             len2_min += 1
109 |             len2_max += 1
110 |         if "-E" in spec_format:
111 |             len2_min += 1
112 |             len2_max += 1
113 | 
114 |     return len1_min, len1_max, len2_min, len2_max
115 | 
116 | 
117 | def compare_ie(spec_ie, bin_ie):
118 |     ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie
119 |     spec_presence = spec_presence.split()[0]
120 |     iei1 = sanitize_iei(iei1)
121 |     iei2 = bin_ie.iei
122 |     len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie)
123 | 
124 |     bug_str = ""
125 |     if bin_ie.imperative:
126 |         ie_type = 'imperative'
127 |     else:
128 |         ie_type = 'non-imperative'
129 | 
130 |     # ================================================
131 |     # Check format (LI, Length Indicator) and length
132 |     # ================================================
133 |     # If a length is specified,
134 |     if len1_min != len2_min:
135 |         bug_str += ", {} invalid mismatch (min length)".format(ie_type)
136 |     if len1_max != len2_max:
137 |         bug_str += ", {} invalid mismatch (max length)".format(ie_type)
138 | 
139 |     return bug_str
140 | 
141 | 
142 | def compare_ie_list(pd, msg, spec_msg):
143 |     s = ""
144 |     class_name, sub_class_name, msg_name, spec_direction, spec_ie_list = spec_msg
145 |     # check direction
146 |     if (msg.direction != spec_direction) and (spec_direction != 'both'):
147 |         return s
148 | 
149 |     # skip common IEs that do not exist in the baseband firmware
150 |     l = 0
151 |     for idx, ie in enumerate(spec_ie_list):
152 |         ie_name, iei, ref, presence, ie_format, length = ie
153 |         if "message type" in ref.lower() or "message type" in ie_name.lower():
154 |             break
155 | 
156 |     # sms (9) -> nested element exists
157 |     if pd == 9 and msg.type not in [4, 16]:
158 |         spec_ie_list = spec_ie_list
159 |     else:
160 |         spec_ie_list = spec_ie_list[idx + 1 :]
161 | 
162 |     bin_ie_list = msg.ie_list
163 |     # Filter if bin_ie_list is not implemented yet.
164 |     if len(spec_ie_list) > 0 and len(bin_ie_list) == 0:
165 |         s += "0x{0:x} ({0}) {1} not implemented in pd {2}".format(
166 |             msg.type, msg_name, pd
167 |         )
168 |         return s
169 | 
170 |     # ================================================
171 |     # First, we divide the IE list of the specification into imperatives and
172 |     # non-imperatives.
173 |     # ================================================
174 |     bug_flag = False
175 |     imperatives = []
176 |     nonimperatives = {}
177 |     for idx, ie in enumerate(spec_ie_list):
178 |         ie_name, iei1, ref, spec_presence, spec_format, spec_length = ie
179 |         bug_str = ""
180 | 
181 |         # Check spec length
182 |         if spec_length == "1/2" and spec_format not in ["V", "TV"]:
183 |             bug_str += ",spec length error"
184 | 
185 |         # Check presence of the specification
186 |         if (
187 |             "M" not in spec_presence
188 |             and "O" not in spec_presence
189 |             and "C" not in spec_presence
190 |         ):
191 |             bug_str += ",spec presence error"
192 | 
193 |         # In TS 24.007, 11.2.5 Presence requirements of information elements,
194 |         # only IEs belonging to non-imperative part of a message may have
195 |         # presence requirement C. However, we found a special case for
196 |         # conditional IEs: there is a message whose imperative part contains
197 |         # IEs with presence "C".
198 |         #
199 |         # Protocol: Radio Resource management (PD: 6)
200 |         # Message: IMMEDIATE ASSIGNMENT (DL) (Message Type: 0x3f)
201 |         if pd == 6 and msg.type == 0x3F:
202 |             if "C" in spec_presence and sanitize_iei(iei1) == 0:
203 |                 if 'packet channel description' in ie_name.lower():
204 |                     continue
205 |                 ie = ie_name, iei1, ref, spec_presence, spec_format, spec_length
206 | 
207 |         ie_name = ie_name.replace(",", "")
208 |         # imperative / non-imperative is defined by iei, not presence.
209 |         if sanitize_iei(iei1) == 0 and "O" not in spec_presence:
210 |             if "T" in spec_format or "O" in spec_presence:
211 |                 bug_str += ",spec format error"
212 | 
213 |             imperatives.append(ie)
214 | 
215 |         else:
216 |             if "T" not in spec_format:
217 |                 bug_str += ",spec non-imperative format error"
218 | 
219 |             if sanitize_iei(iei1) in nonimperatives:
220 |                 # ts 24.007, 11.2.4
221 |                 # A message may contain two or more IEs with equal IEI. Two IEs
222 |                 # with the same IEI in a same message must have
223 |                 # 1) the same format,
224 |                 # 2) when of type 3, the same length.
225 |                 # More generally, care should be taken not to introduce
226 |                 # ambiguities by using an IEI for two purposes. Ambiguities
227 |                 # appear in particular when two IEs potentially immediately
228 |                 # successive have the same IEI but different meanings and when
229 |                 # both are non-mandatory. As a recommended design rule,
230 |                 # messages should contain a single IE of a given IEI.
231 |                 ie2 = nonimperatives[sanitize_iei(iei1)]
232 |                 ie_name2, iei2, ref2, spec_presence2, spec_format2, spec_length2 = ie2
233 |                 if spec_format != spec_format2:  # same format
234 |                     bug_str += ",spec non-imperative same iei format error"
235 | 
236 |                 elif spec_length != spec_length2:  # Type 3
237 |                     assert "-" not in spec_length
238 |                     bug_str += ",spec non-imperative same iei length error"
239 | 
240 |             else:
241 |                 nonimperatives[sanitize_iei(iei1)] = ie
242 | 
243 |         if bug_str:
244 |             s += "{},{},".format(ie_name, ref)
245 |             s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length)
246 |             s += "-,-,-,-,-"
247 |             s += "," + bug_str.lstrip(",") + "\n"
248 |             bug_flag = True
249 | 
250 |     # To separate errors from the spec and the baseband binary.
251 |     if s:
252 |         s += "-" * 20 + "\n"
253 | 
254 |     # ================================================
255 |     # Now we check the IE list in the baseband binary and compare them with the
256 |     # IE list from the spec.
257 |     # ================================================
258 |     iei_done = set()
259 | 
260 |     for idx, bin_ie in enumerate(bin_ie_list):
261 |         bug_str = ""
262 | 
263 |         # Fetch corresponding IE from specification
264 |         spec_ie = None
265 | 
266 |         # The implementation has two rules for representing imperatives.
267 |         # Developers may have misunderstood the specification?
268 |         if bin_ie.imperative:
269 |             if imperatives:
270 |                 spec_ie = imperatives.pop(0)
271 | 
272 |             if spec_ie:
273 |                 ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie
274 |                 ie_name = ie_name.replace(",", "")
275 |                 len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie)
276 | 
277 |                 # skipped 1/2 length in binary implementation
278 |                 if spec_length == "1/2" and (len2_min != len2_max or len2_max != 1):
279 |                     s += "{},{},".format(ie_name, ref)
280 |                     s += "{},{},{},{},".format(
281 |                         iei1, spec_presence, spec_format, spec_length
282 |                     )
283 |                     s += "-,-,-,-"
284 |                     if "spare half" in ref.lower():
285 |                         s += ",(skipped spare half)"
286 |                     else:
287 |                         s += ",imperative missing mismatch (skipped 1/2)"
288 |                         bug_flag = True
289 |                     s += "\n"
290 | 
291 |                     if imperatives:
292 |                         spec_ie = imperatives.pop(0)
293 |                     else:
294 |                         spec_ie = None
295 | 
296 |         else:
297 |             if bin_ie.iei in nonimperatives:
298 |                 spec_ie = nonimperatives[bin_ie.iei]
299 |                 iei1 = spec_ie[1]
300 |                 len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie)
301 | 
302 |                 # This checks for a type 1 IE, which has a 4-bit IEI and a 4-bit value
303 |                 if iei1.endswith("-") and (len2_min != len2_max or len2_max != 1):
304 |                     spec_ie = None
305 |                 else:
306 |                     iei_done.add(bin_ie.iei)
307 | 
308 |         if spec_ie:
309 |             ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie
310 |             ie_name = ie_name.replace(",", "")
311 |             len1_min, len1_max, len2_min, len2_max = apply_li(spec_ie, bin_ie)
312 |             if "/" not in spec_length:
313 |                 spec_length = sanitize_len(len1_min, len1_max)
314 |             msg_length = sanitize_len(len2_min, len2_max)
315 |             s += "{},{},".format(ie_name, ref)
316 |             s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length)
317 |             s += "{:02X},{},{},0x{:X}".format(
318 |                 bin_ie.iei, bin_ie.imperative, msg_length, bin_ie.type
319 |             )
320 |             bug_str = compare_ie(spec_ie, bin_ie)
321 | 
322 |         else:
323 |             msg_length = sanitize_len(bin_ie.min, bin_ie.max)
324 |             s += "-,-,"
325 |             s += "-,-,-,-,"
326 |             s += "{:02X},{},{},0x{:X}".format(
327 |                 bin_ie.iei, bin_ie.imperative, msg_length, bin_ie.type
328 |             )
329 |             if bin_ie.imperative:
330 |                 bug_str = "imperative unknown mismatch"
331 |             else:
332 |                 bug_str = "non-imperative unknown mismatch"
333 | 
334 |         if bug_str:
335 |             s += "," + bug_str.lstrip(",")
336 |             bug_flag = True
337 |         s += "\n"
338 | 
339 |     # ================================================
340 |     # Check leftovers
341 |     # ================================================
342 |     for spec_ie in imperatives:
343 |         ie_name, iei1, ref, spec_presence, spec_format, spec_length = spec_ie
344 |         ie_name = ie_name.replace(",", "")
345 |         s += "{},{},".format(ie_name, ref)
346 |         s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length)
347 |         s += "-,-,-,-"
348 |         if "spare half" in ref.lower():
349 |             s += ",(skipped spare half)"
350 |             s += "\n"
351 |         else:
352 |             s += ",imperative missing mismatch"
353 |             s += "\n"
354 |             bug_flag = True
355 | 
356 |     for iei in nonimperatives:
357 |         if iei not in iei_done:
358 |             ie2 = nonimperatives[iei]
359 |             ie_name, iei1, ref, spec_presence, spec_format, spec_length = ie2
360 |             ie_name = ie_name.replace(",", "")
361 |             s += "{},{},".format(ie_name, ref)
362 |             s += "{},{},{},{},".format(iei1, spec_presence, spec_format, spec_length)
363 |             s += "-,-,-,-"
364 |             s += ",non-imperative missing mismatch"
365 |             s += "\n"
366 |             bug_flag = True
367 | 
368 |     if not bug_flag:
369 |         s = ""
370 | 
371 |     return s
372 | 
373 | 
374 | def check_numbers(l3_msgs):
375 |     total_msgs = 0
376 |     total_ies = 0
377 |     total_iies = 0
378 |     for pd, prot in enumerate(l3_msgs):
379 |         if len(prot.msg_list) == 0:
380 |             continue
381 | 
382 |         if pd > 12:
383 |             break
384 | 
385 |         msgs = l3_msgs[pd].msg_list
386 |         print("# of {} msgs: {}".format(pd, len(msgs)))
387 | 
388 |         ies = flatten(map(lambda x: x.ie_list, msgs))
389 |         iies = [ie for ie in ies if ie.imperative]
390 |         print("# of {} msg IEs: {}".format(pd, len(ies)))
391 |         print("# of {} msg imperative IEs: {}".format(pd, len(iies)))
392 | 
393 |         total_msgs += len(msgs)
394 |         total_ies += len(ies)
395 |         total_iies += len(iies)
396 | 
397 |     print("# of total msgs: {}".format(total_msgs))
398 |     print("# of total IEs: {}".format(total_ies))
399 |     print("# of total imperative IEs: {}".format(total_iies))
400 | 
401 | 
402 | def check_spec(l3_msgs, target_pd=3):
403 |     '''
404 |     check_spec compares l3_msgs from binary with specification.
405 |     It prints comparison results (e.g., mismatches) in CSV format.
406 | 
407 |     :param l3_msgs: list of basespec.structs.l3msg.L3ProtInfo
408 |                     which is generated based on embedded message structure
409 |                     of the binary.
410 |     :param target_pd: the PD value to analyze.
411 |     '''
412 |     global spec_map
413 |     if "spec_map" not in globals():
414 |         spec_map = get_spec_msgs()
415 | 
416 |     for prot in l3_msgs:
417 |         pd = prot.pd
418 |         if len(prot.msg_list) == 0:
419 |             continue
420 | 
421 |         # For DEBUG
422 |         if pd != target_pd:
423 |             continue
424 | 
425 |         if pd not in spec_map:
426 |             continue
427 | 
428 |         prot_done = set()
429 |         for msg in prot.msg_list:
430 |             msg_type = msg.type
431 | 
432 |             if msg_type not in spec_map[pd]:
433 |                 print("=" * 20)
434 |                 print(
435 |                     "NasProt[{0}] msg_type 0x{1:x} ({1}) not in pd {0}".format(
436 |                         pd, msg_type
437 |                     )
438 |                 )
439 |                 continue
440 | 
441 |             # There may exist multiple messages from spec, so we analyze each
442 |             # message and pick the least buggy message.
443 |             spec_msgs = spec_map[pd][msg_type]
444 |             msg_results = {}
445 | 
446 |             for spec_msg in spec_msgs:
447 |                 class_name, sub_class_name, msg_name, msgs = spec_msg
448 |                 for direction, ie_list in msgs:
449 |                     spec_msg = (
450 |                         class_name,
451 |                         sub_class_name,
452 |                         msg_name,
453 |                         direction,
454 |                         ie_list,
455 |                     )
456 |                     prot_done.add(msg_type)
457 | 
458 |                     try:
459 |                         s = compare_ie_list(pd, msg, spec_msg).strip()
460 |                         s_cnt = s.count("error")
461 | 
462 |                         if direction in msg_results:
463 |                             if s_cnt < msg_results[direction][-1][-1]:
464 |                                 msg_results[direction] = (
465 |                                     class_name,
466 |                                     sub_class_name,
467 |                                     msg_name,
468 |                                     (s, s_cnt),
469 |                                 )
470 |                         else:
471 |                             msg_results[direction] = (
472 |                                 class_name,
473 |                                 sub_class_name,
474 |                                 msg_name,
475 |                                 (s, s_cnt),
476 |                             )
477 |                     except Exception:
478 |                         import traceback
479 | 
480 |                         print(class_name, sub_class_name, msg_name, direction)
481 |                         traceback.print_exc()
482 | 
483 |             # Filter the least buggy message
484 |             for direction, (
485 |                 class_name,
486 |                 sub_class_name,
487 |                 msg_name,
488 |                 (s, s_cnt),
489 |             ) in msg_results.items():
490 |                 if s:
491 |                     print("=" * 20)
492 |                     print(
493 |                         "NasProt[{}] {} -> {}".format(
494 |                             pd, class_name, sub_class_name
495 |                         )
496 |                     )
497 |                     print(
498 |                         "0x{0:x} ({0}) {1} ({2})".format(msg_type, msg_name, direction)
499 |                     )
500 |                     title = "IE Name,Reference,Spec IEI,Spec Presence,Spec Format,Spec Length,"
501 |                     title += "Bin IEI,Bin Imperative,Bin Length,Bin Idx"
502 |                     print(title)
503 |                     print(s)
504 | 
505 |         # Check messages not implemented in the binary.
506 |         for msg_type in spec_map[pd]:
507 |             if msg_type not in prot_done:
508 |                 spec_msg = spec_map[pd][msg_type][0]
509 |                 class_name, sub_class_name, msg_name, msgs = spec_msg
510 |                 print("=" * 20)
511 |                 print(
512 |                     "NasProt[{}] {} -> {}".format(pd, class_name, sub_class_name)
513 |                 )
514 |                 print(
515 |                     "0x{0:x} ({0}) {1} not implemented in pd {2}".format(
516 |                         msg_type, msg_name, pd
517 |                     )
518 |                 )
519 | 
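
The "least buggy" selection in the loop above can be sketched in isolation. This is a minimal illustration, not the project's API: the helper name `pick_least_buggy` and the `(direction, msg_name, report)` tuple layout are hypothetical, and the sample reports are made up. As in the original loop, a candidate replaces the stored one only when its report contains strictly fewer occurrences of "error".

```python
def pick_least_buggy(candidates):
    """Keep, per direction, the candidate whose report mentions "error" least.

    `candidates` is an iterable of (direction, msg_name, report) tuples.
    This layout is an assumption made for the sketch.
    """
    best = {}
    for direction, msg_name, report in candidates:
        report = report.strip()
        err_cnt = report.count("error")
        # Strict "<" keeps the first candidate on ties, like the original loop.
        if direction not in best or err_cnt < best[direction][2]:
            best[direction] = (msg_name, report, err_cnt)
    return best


# Made-up reports: variant B has fewer "error" hits for the DL direction.
candidates = [
    ("DL", "SETUP (variant A)", "ie1,error\nie2,error"),
    ("DL", "SETUP (variant B)", "ie1,error"),
    ("UL", "SETUP (variant A)", ""),
]
best = pick_least_buggy(candidates)
```

An empty report (no mismatch at all) is stored too, which is why the printing step in the original code checks `if s:` before emitting anything.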


--------------------------------------------------------------------------------
/basespec/parse_spec.py:
--------------------------------------------------------------------------------
  1 | import re
  2 | import sys
  3 | import os
  4 | 
  5 | from collections import defaultdict
  6 | 
  7 | # TODO: handle special cases
  8 | SHORT_PROT = [
  9 |     "VBS/VGCS",
 10 |     "Synchronization channel information",
 11 | ]
 12 | 
 13 | 
 14 | # This may need to run on Linux
 15 | def convert_txt(fname_in, fname_out=""):
 16 |     if not fname_out:
 17 |         fname_out = os.path.splitext(fname_in)[0] + ".txt"
 18 | 
 19 |     if not os.path.exists(fname_out):
 20 |         ext = os.path.splitext(fname_in)[-1]
 21 |         if ext == ".doc":
 22 |             os.system('antiword -w 0 "{}" > "{}"'.format(fname_in, fname_out))
 23 |         elif ext == ".pdf":
 24 |             os.system('pdftotext -layout "{}" "{}"'.format(fname_in, fname_out))
 25 | 
 26 |     return fname_out
 27 | 
 28 | 
 29 | def read_data(fname):
 30 |     with open(fname, "r", encoding="ascii", errors="ignore") as f:
 31 |         data = f.read()
 32 | 
 33 |     # incorrectly converted character
 34 |     data = data.replace("\xa0", " ")
 35 | 
 36 |     # TS 44.018
 37 |     data = data.replace("ACK.", "ACKNOWLEDGE")
 38 |     data = data.replace("NOTIFICATION RESPONSE", "NOTIFICATION/RESPONSE")
 39 |     data = data.replace(
 40 |         "SYSTEM INFORMATION TYPE 2 quater", "SYSTEM INFORMATION TYPE 2quater"
 41 |     )
 42 |     data = data.replace("SYSTEM INFORMATION 15", "SYSTEM INFORMATION TYPE 15")
 43 |     data = data.replace("EXTENDED MEASUREMENT ORDER", " EXTENDED MEASUREMENT ORDER")
 44 |     data = data.replace("CDMA 2000", "CDMA2000")
 45 |     # data = data.replace('00010110  MBMS ANNOUNCEMENT', '00110101  MBMS ANNOUNCEMENT')
 46 | 
 47 |     # TS 24.008
 48 |     #    data = data.replace('DETACH REQUEST', ' DETACH REQUEST')
 49 |     #    data = data.replace('DETACH ACCEPT', ' DETACH ACCEPT')
 50 |     #    data = data.replace('Detach ACCEPT', ' Detach ACCEPT')
 51 |     data = data.replace(" Contents of Service Request", " Service Request")
 52 |     data = data.replace(" Contents of Service Accept", " Service Accept")
 53 |     data = data.replace(" Contents of Service Reject", " Service Reject")
 54 |     data = data.replace(
 55 |         "  Authentication and ciphering req", "  Authentication and ciphering request"
 56 |     )
 57 |     data = data.replace(
 58 |         "  Authentication and ciphering resp", "  Authentication and ciphering response"
 59 |     )
 60 |     data = data.replace(
 61 |         "  Authentication and ciphering rej", "  Authentication and ciphering reject"
 62 |     )
 63 |     data = data.replace("activation rej.", "activation reject")
 64 |     data = data.replace("request(Network", "request (Network")
 65 |     data = data.replace("request(MS", "request (MS")
 66 |     data = data.replace("TABLE", "Table")
 67 |     data = data.replace("AUTHENTICATION FAILURE..", "AUTHENTICATION FAILURE")
 68 |     # Check "Facility(simple recall alignment)" length "2-"
 69 |     lines = data.splitlines()
 70 | 
 71 |     return lines
 72 | 
 73 | 
 74 | def get_direction(lines, idx):
 75 |     # find direction
 76 |     tmp_idx = idx
 77 |     while "direction" not in lines[tmp_idx].lower() and tmp_idx >= 0:
 78 |         tmp_idx -= 1
 79 | 
 80 |     assert "direction" in lines[tmp_idx].lower()
 81 | 
 82 |     s = lines[tmp_idx].lower()
 83 |     if (
 84 |         "network to mobile" in s
 85 |         or "to ms" in s
 86 |         or "to ue" in s
 87 |         or "-> ms " in s
 88 |         or "dl" in s
 89 |     ):
 90 |         direction = "DL"
 91 |     elif (
 92 |         "mobile station to network" in s
 93 |         or "mobile to network" in s
 94 |         or "ms to" in s
 95 |         or "ue to" in s
 96 |         or "ms ->" in s
 97 |         or "ul" in s
 98 |     ):
 99 |         direction = "UL"
100 |     elif "both" in s:
101 |         direction = "both"
102 |     else:
103 |         direction = None
104 | 
105 |     return direction
106 | 
107 | 
108 | def parse_msg_content(lines):
109 |     in_table = False
110 |     direction = None
111 |     ie_list = []
112 |     idx = 0
113 |     msgs = defaultdict(list)
114 |     msg_name = ""
115 | 
116 |     while idx < len(lines):
117 |         line = lines[idx]
118 | 
119 |         if not line:
120 |             idx += 1
121 |             continue
122 | 
123 |         if "Table" in line:
124 |             # Handle messages whose name is too long.
125 |             # ts 44.018: INTER SYSTEM TO CDMA2000 HANDOVER
126 |             # ts 24.008: MODIFY PDP CONTEXT REQUEST, MODIFY PDP CONTEXT ACCEPT
127 |             if "message content" not in line and re.search(
128 |                 "(message )?content[s]?$", lines[idx + 1].strip()
129 |             ):
130 |                 line = line.strip() + " " + lines[idx + 1].strip()
131 |                 idx += 1
132 | 
133 |             # Handle messages whose name is too long.
134 |             # ts 44.018: EC IMMEDIATE ASSIGNMENT TYPE 1
135 |             if "information element" not in line and re.search(
136 |                 "(information )?element[s]?$", lines[idx + 1].strip()
137 |             ):
138 |                 line = line.strip() + " " + lines[idx + 1].strip()
139 | 
140 |             # Handle messages whose name is an exception.
141 |             # ts 24.008: SETUP
142 |             line = re.sub(r"(.* message content) ?\(.*to.*direction\)", r"\g<1>", line)
143 |         #            if 'SETUP message content' in line and 'direction' in line:
144 |         #                line = re.sub('(SETUP message content).*', '\g<1>', line)
145 | 
146 |         if re.match(r"^[ \|]*Table.*message content", line):
147 |             # If another table exists, we skip it
148 |             if in_table:
149 |                 if ie_list:
150 |                     msgs[msg_name.lower()].append((direction, ie_list))
151 | 
152 |                 # These are not standard L3 messages.
153 |                 # The only exception is the special case for IMMEDIATE ASSIGNMENT below:
154 |                 # Table 9.1.18.1a: IMMEDIATE ASSIGNMENT message content (MTA
155 |                 # Access Burst or Extended Access Burst Method only)
156 |                 #
157 |                 # We skip this case as it is a special case.
158 |                 idx += 1
159 |                 in_table = False
160 |                 continue
161 | 
162 |             in_table = True
163 |             # g = re.match('^\s*Table [0-9\.\:]+', line)
164 |             msg_name = re.search(r"[: ]([A-Za-z\-\/0-9 \(\)]+) message content", line)
165 |             msg_name = msg_name.group().strip().replace(":", "")
166 |             # msg_name = msg_name.replace(':', '').strip()
167 |             # table_name = table_name.split()[-1]
168 |             msg_name = msg_name.replace("message content", "").strip()
169 |             # msg_name = (table_name, msg_name)
170 | 
171 |             direction = get_direction(lines, idx)
172 |             ie_list = []
173 | 
174 |             idx += 1
175 |             continue
176 | 
177 |         if re.match(r"^\s*Table.*information elements[ ]*$", line):
178 |             # If another table exists, we skip it
179 |             if in_table:
180 |                 if ie_list:
181 |                     msgs[msg_name.lower()].append((direction, ie_list))
182 | 
183 |                 # These are not standard L3 messages.
184 |                 idx += 1
185 |                 in_table = False
186 |                 continue
187 | 
188 |             in_table = True
189 |             # g = re.match('^\s*Table [0-9\.\:]+', line)
190 |             msg_name = re.search(r" ([A-Za-z\-\/0-9 ]+) information elements", line)
191 |             msg_name = msg_name.group().strip()
192 |             # table_name = table_name.split()[-1]
193 |             msg_name = msg_name.replace("information elements", "").strip()
194 |             # msg_name = (table_name, msg_name)
195 | 
196 |             direction = get_direction(lines, idx)
197 |             ie_list = []
198 | 
199 |             idx += 1
200 |             continue
201 | 
202 |         if not in_table:
203 |             idx += 1
204 |             continue
205 | 
206 |         if re.match(r"^\d+\.\d+", line):
207 |             if ie_list:
208 |                 msgs[msg_name.lower()].append((direction, ie_list))
209 |             else:
210 |                 # this is not a proper standard L3 message
211 |                 # print('{} may not be parsed properly.'.format(msg_name))
212 |                 pass
213 | 
214 |             in_table = False
215 |             direction = None
216 |             ie_list = []
217 |             idx += 1
218 |             continue
219 | 
220 |         if "ETSI" in line:
221 |             idx += 1
222 |             continue
223 | 
224 |         # dummy line
225 |         # IEI Information Element Type/Reference Presence Format Length
226 |         # ts 44.018 has 'length', not 'Length'
227 |         if (
228 |             "presence" in lines[idx].lower()
229 |             and "format" in lines[idx].lower()
230 |             and "length" in lines[idx].lower()
231 |         ):
232 |             idx += 1
233 |             continue
234 | 
235 |         fields = lines[idx].split("|")
236 |         fields = list(filter(lambda x: x, fields))
237 | 
238 |         if len(fields) != 6:
239 |             idx += 1
240 |             continue
241 | 
242 |         fields = list(map(lambda x: x.strip(), fields))
243 |         iei, ie_name, ref, presence, ie_format, length = fields
244 |         if not presence or not ie_format:
245 |             idx += 1
246 |             continue
247 | 
248 |         # incompatible spec
249 |         length = length.replace("octets", "").strip()
250 |         length = length.replace("octet", "").strip()
251 |         length = length.replace("(", "-").strip()
252 |         length = length.replace(" ", "")
253 |         # length = length.replace('?', 'n')
254 | 
255 |         # convert spec error
256 |         if length.endswith("-"):
257 |             length = length + "n"
258 | 
259 |         if not length:
260 |             length = "1/2"
261 | 
262 |         # spec error
263 |         if "n" in length and "-" not in length:
264 |             length = length.replace("n", "-n")
265 | 
266 |         if length == "1/23/2":
267 |             length = "1/2-3/2"
268 | 
269 |         # For SMS (type 9) - CM - RP messages. rest 5 bits are spare
270 |         if length == "3bits":
271 |             length = "1"
272 | 
273 |         # For SMS (type 9) - some messages use '<=' operator
274 |         if length.startswith("-"):
275 |             length = "1" + length
276 | 
277 |         # convert spec parsing error
278 |         try:
279 |             if 3000 < int(length) < 4000:
280 |                 length = length[0] + "-" + length[1:]
281 |         except ValueError:
282 |             pass
283 | 
284 |         ie_list.append([ie_name, iei, ref, presence, ie_format, length])
285 | 
286 |         idx += 1
287 | 
288 |     return msgs
289 | 
290 | 
291 | def parse_msg_type(lines, pdf=False):
292 |     in_table = False
293 |     idx = 0
294 |     msgs = defaultdict(list)
295 |     prefix = ""
296 | 
297 |     while idx < len(lines):
298 |         line = lines[idx]
299 | 
300 |         if not line:
301 |             idx += 1
302 |             continue
303 | 
304 |         if re.match(r"^\s*Table.*Message types", line):
305 |             in_table = True
306 |             table_name, class_name = line.split(":")
307 |             table_name = table_name.split()[-1]
308 |             if "for" in class_name:
309 |                 class_name = class_name.replace("Message types for", "").strip()
310 |             else:
311 |                 class_name = ""
312 |             prefix = ""
313 | 
314 |             # dummy line
315 |             # IEI Information Element Type/Reference Presence Format Length
316 |             idx += 1
317 |             continue
318 | 
319 |         if not in_table:
320 |             idx += 1
321 |             continue
322 | 
323 |         if re.match(r"^\d+\.\d+", line):
324 |             in_table = False
325 |             idx += 1
326 |             continue
327 | 
328 |         if "ETSI" in line:
329 |             idx += 1
330 |             continue
331 | 
332 |         msg = lines[idx]
333 | 
334 |         if pdf:
335 |             msg_type = re.match(r"^[01x\- ]+", msg)
336 |         else:
337 |             msg_type = re.match(r"^\|[\|\.01x\- ]+", msg)
338 | 
339 |         if not msg_type:
340 |             idx += 1
341 |             continue
342 | 
343 |         msg_type = msg_type.group()
344 |         msg_name = msg.replace(msg_type, "")
345 |         msg_name = msg_name.replace("|", "").strip()
346 |         msg_name = msg_name.replace(":", "").strip()
347 | 
348 |         msg_type = msg_type.replace("|", "")
349 |         msg_type = msg_type.replace(" ", "").strip()
350 |         msg_type = msg_type.replace(".", "-")
351 | 
352 |         if "reserved" in msg_name.lower():
353 |             idx += 1
354 |             continue
355 | 
356 |         if len(msg_type) < 4:
357 |             idx += 1
358 |             continue
359 | 
360 |         msg_cnt = msg_type.count("-")
361 | 
362 |         if msg_cnt > 2:
363 |             prefix = msg_type
364 |             if class_name:
365 |                 sub_class_name = class_name + "-" + msg_name
366 |             else:
367 |                 sub_class_name = msg_name
368 |             idx += 1
369 |             continue
370 | 
371 |         if "x" in msg_type:
372 |             idx += 1
373 |             continue
374 | 
375 |         msg_type = msg_type.replace("-", "")
376 |         if len(msg_type) < 8 and prefix:
377 |             prefix_cnt = prefix.count("-")
378 |             if len(msg_type) != prefix_cnt:
379 |                 idx += 1
380 |                 continue
381 | 
382 |             msg_type = prefix[: (8 - len(msg_type))] + msg_type
383 |             msg_type = msg_type.replace("x", "0")
384 | 
385 |         # print(msg_type, prefix, msg_name)
386 |         #        if len(msg_type) != 8:
387 |         #            msg_type = msg_type.rjust(8, '0')
388 |         #            import pdb; pdb.set_trace()
389 |         #            idx += 1
390 |         #            continue
391 | 
392 |         assert msg_name not in msgs
393 | 
394 |         if prefix:
395 |             msgs[msg_name] = [msg_type, sub_class_name]
396 |         else:
397 |             msgs[msg_name] = [msg_type, class_name]
398 |         idx += 1
399 | 
400 |     return msgs
401 | 
402 | 
403 | def handle_exception_24011(msgs):
404 |     """
405 |     TS 24.011: RP messages are embedded in CP messages. The 3-bit message
406 |     type indicator (MTI) encodes both the direction and the RP message:
407 |     0 0 0 ms -> n RP-DATA
408 |     0 0 1 n -> ms RP-DATA
409 |     0 1 0 ms -> n RP-ACK
410 |     0 1 1 n -> ms RP-ACK
411 |     1 0 0 ms -> n RP-ERROR
412 |     1 0 1 n -> ms RP-ERROR
413 |     1 1 0 ms -> n RP-SMMA
414 |     """
415 | 
416 |     cp_class_name = "short message and notification transfer on CM"
417 |     rp_class_name = "short message and notification transfer on CM-RP messages"
418 |     types = {
419 |         # 'cp-data' [1, cp_class_name] embeds the RP messages below. NOTE: duplicate keys; the later (n -> ms) entry wins.
420 |         "rp-data": [0, rp_class_name],
421 |         "rp-data": [1, rp_class_name],
422 |         "rp-ack": [2, rp_class_name],
423 |         "rp-ack": [3, rp_class_name],
424 |         "rp-error": [4, rp_class_name],
425 |         "rp-error": [5, rp_class_name],
426 |         "rp-smma": [6, rp_class_name],
427 |         "cp-ack": [4, cp_class_name],
428 |         "cp-error": [16, cp_class_name],
429 |     }
430 | 
431 |     return msgs, types
432 | 
433 | 
434 | # complementary parser
435 | def parse(input_fname, input_fname2=""):
436 |     txt_name = convert_txt(input_fname)
437 |     lines = read_data(txt_name)
438 |     msgs = parse_msg_content(lines)
439 |     types = parse_msg_type(lines, ".pdf" in input_fname)
440 | 
441 |     # complementary step
442 |     if input_fname2:
443 |         # analyze pdf to extract types
444 |         txt_name = convert_txt(input_fname2)
445 |         lines = read_data(txt_name)
446 |         # msgs2 = parse_msg_content(lines)
447 |         types2 = parse_msg_type(lines, ".pdf" in input_fname2)
448 | 
449 |         for key, val in types2.items():
450 |             if key not in types:
451 |                 types[key] = val
452 | 
453 |     if "24011" in input_fname or "24.011" in input_fname:
454 |         msgs, types = handle_exception_24011(msgs)
455 | 
456 |     total = defaultdict(list)
457 |     for msg_name, (msg_type, class_name) in types.items():
458 |         orig_name = msg_name
459 |         msg_name = msg_name.lower()
460 |         if "reserved" in msg_name:
461 |             continue
462 | 
463 |         if msg_name not in msgs:
464 |             continue
465 | 
466 |         if isinstance(msg_type, str):
467 |             msg_type = int(msg_type, 2)
468 |         total[class_name].append([msg_type, orig_name, msgs[msg_name]])
469 | 
470 |     return total
471 | 
472 | 
473 | def parse_all():
474 |     path = os.path.abspath(os.path.dirname(__file__))
475 |     file_list = [
476 |         ["24008-f80.doc", "ts_124008v150800p.pdf"],
477 |         ["24011-f30.doc", ""],
478 |         ["24080-f10.doc", ""],
479 |         ["24301-f80.doc", "ts_124301v150800p.pdf"],
480 |         ["44018-f50.doc", ""],
481 |     ]
482 | 
483 |     total = {}
484 |     for f1, f2 in file_list:
485 |         f1 = os.path.join(path, "spec", f1)
486 |         if f2:
487 |             f2 = os.path.join(path, "spec", f2)
488 |         msgs = parse(f1, f2)
489 |         for key, val in msgs.items():
490 |             total[key] = val
491 | 
492 |     return total
493 | 
494 | 
495 | def get_spec_msgs():
496 |     msgs = parse_all()
497 |     # =============================
498 |     # Message types
499 |     # =============================
500 |     nas_prots = {
501 |         "EPS session management": 2,
502 |         "Call Control and call related SS messages": 3,
503 |         "GTTP messages": 4,
504 |         "Mobility Management": 5,
505 |         "Radio Resource management": 6,
506 |         "EPS mobility management": 7,
507 |         "GPRS mobility management": 8,
508 |         "short message and notification transfer": 9,
509 |         "GPRS session management": 10,
510 |         "Miscellaneous message group": 11,
511 |         "Clearing messages": 11,
512 |     }
513 | 
514 |     spec_map = {}
515 |     for nas_name, nas_type in nas_prots.items():
516 |         spec_map[nas_type] = {}
517 | 
518 |     for class_name, vals in sorted(msgs.items()):
519 |         if "-" in class_name:
520 |             class_name, sub_class_name = class_name.split("-")
521 |         else:
522 |             sub_class_name = ""
523 | 
524 |         target_nas_type = None
525 |         for nas_name, nas_type in nas_prots.items():
526 |             if nas_name in class_name:
527 |                 target_nas_type = nas_type
528 |                 break
529 | 
530 |         assert target_nas_type is not None
531 | 
532 |         for msg_type, msg_name, ie_list in vals:
533 |             if msg_type not in spec_map[target_nas_type]:
534 |                 spec_map[target_nas_type][msg_type] = []
535 |             spec_map[target_nas_type][msg_type].append(
536 |                 [class_name, sub_class_name, msg_name, ie_list]
537 |             )
538 | 
539 |     return spec_map
540 | 
541 | 
542 | def main():
543 |     if len(sys.argv) < 2:
544 |         msgs = parse_all()
545 | 
546 |     elif len(sys.argv) == 2:
547 |         input_fname = sys.argv[1]
548 |         msgs = parse(input_fname)
549 | 
550 |     elif len(sys.argv) > 2:
551 |         input_fname = sys.argv[1]
552 |         input_fname2 = sys.argv[2]
553 |         msgs = parse(input_fname, input_fname2)
554 | 
555 |     for class_name, vals in sorted(msgs.items()):
556 |         for msg_type, msg_name, msgs2 in sorted(vals):
557 |             for direction, ie_list in msgs2:
558 |                 for ie in ie_list:
559 |                     ie_name, iei, ref, presence, ie_format, length = ie
560 |                     if direction == "UL":
561 |                         print(
562 |                             class_name,
563 |                             "->",
564 |                             msg_name,
565 |                             "(UL) ->",
566 |                             ie_name,
567 |                             ie_format,
568 |                             length,
569 |                         )
570 |                     elif direction == "DL":
571 |                         print(
572 |                             class_name,
573 |                             "->",
574 |                             msg_name,
575 |                             "(DL) ->",
576 |                             ie_name,
577 |                             ie_format,
578 |                             length,
579 |                         )
580 |                     elif direction == "both":
581 |                         print(
582 |                             class_name,
583 |                             "->",
584 |                             msg_name,
585 |                             "(Both) ->",
586 |                             ie_name,
587 |                             ie_format,
588 |                             length,
589 |                         )
590 |                     else:
591 |                         print(
592 |                             class_name,
593 |                             "->",
594 |                             msg_name,
595 |                             "() ->",
596 |                             ie_name,
597 |                             ie_format,
598 |                             length,
599 |                         )
600 |                         assert False
601 | 
602 | 
603 | if __name__ == "__main__":
604 |     main()
605 | 
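
The length clean-up rules inside `parse_msg_content` are easier to follow when pulled out and run on sample cells. The following is a minimal sketch: the function name `normalize_length` is hypothetical (the original code performs these steps inline), and the sample inputs are illustrative rather than taken from a specific table.

```python
def normalize_length(length):
    """Normalize a 'Length' table cell the way parse_msg_content does inline."""
    # Drop unit words and whitespace; "(" is treated as a range separator.
    length = length.replace("octets", "").strip()
    length = length.replace("octet", "").strip()
    length = length.replace("(", "-").strip()
    length = length.replace(" ", "")

    if length.endswith("-"):      # open-ended range such as "2-"
        length += "n"
    if not length:                # empty cell: half-octet IE
        length = "1/2"
    if "n" in length and "-" not in length:
        length = length.replace("n", "-n")
    if length == "1/23/2":        # separator lost during conversion
        length = "1/2-3/2"
    if length == "3bits":         # remaining 5 bits are spare
        length = "1"
    if length.startswith("-"):    # upper bound only
        length = "1" + length

    # A value such as "3255" is a range "3-255" whose dash was lost.
    try:
        if 3000 < int(length) < 4000:
            length = length[0] + "-" + length[1:]
    except ValueError:
        pass
    return length


samples = ["2 octets", "", "2-", "n", "3255", "3bits", "3 - 10"]
normalized = [normalize_length(s) for s in samples]
```

Keeping the rules in one function like this would also make it possible to unit-test them without parsing a whole specification file.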

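
`parse_msg_type` rebuilds an 8-bit message type from a group row (a bit pattern with open positions written as "-") and the shorter bit strings of the rows below it. The sketch below isolates that step. The helper name `combine_type` is hypothetical, and the patterns `"0x00----"` and `"0x01----"` are assumed examples of what a group row looks like after the parser replaces "." with "-".

```python
def combine_type(prefix, suffix):
    """Merge a group prefix such as "0x00----" with a row suffix such as "0001".

    Returns the message type as an int, or None when the suffix does not fill
    exactly the open ("-") positions. The parser skips such rows.
    """
    if len(suffix) != prefix.count("-"):
        return None
    bits = prefix[: 8 - len(suffix)] + suffix
    # "x" marks a don't-care bit; the parser treats it as 0.
    return int(bits.replace("x", "0"), 2)


first = combine_type("0x00----", "0001")
second = combine_type("0x01----", "0001")
mismatch = combine_type("0x00----", "001")
```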

--------------------------------------------------------------------------------
/basespec/preprocess.py:
--------------------------------------------------------------------------------
  1 | import time
  2 | import pickle
  3 | 
  4 | import idc
  5 | import idaapi
  6 | import idautils
  7 | 
  8 | import ida_funcs
  9 | import ida_auto
 10 | import ida_bytes
 11 | import ida_offset
 12 | import ida_segment
 13 | 
 14 | from idautils import XrefsTo
 15 | 
 16 | from .utils import get_string, check_string, create_string
 17 | from .utils import check_funcname, set_funcname
 18 | 
 19 | 
 20 | def next_addr_aligned(ea, align=2, end_ea=idc.BADADDR):
 21 |     ea = ida_bytes.next_inited(ea, end_ea)
 22 |     while ea % align != 0:
 23 |         ea = ida_bytes.next_inited(ea, end_ea)
 24 | 
 25 |     return ea
 26 | 
 27 | 
 28 | def prev_addr_aligned(ea, align=2, end_ea=0):
 29 |     ea = ida_bytes.prev_inited(ea, end_ea)
 30 |     while ea % align != 0:
 31 |         ea = ida_bytes.prev_inited(ea, end_ea)
 32 | 
 33 |     return ea
 34 | 
 35 | 
 36 | def is_assigned(ea):
 37 |     func_end = idc.get_func_attr(ea, idc.FUNCATTR_END)
 38 |     # check if function is already assigned
 39 |     if func_end != idc.BADADDR:
 40 |         # sometimes, ida returns wrong addr
 41 |         if func_end > ea:
 42 |             ea = func_end
 43 |         else:
 44 |             ea = next_addr_aligned(ea, 4)
 45 | 
 46 |         return True, ea
 47 | 
 48 |     # check if string is already assigned
 49 |     if idc.get_str_type(ea) is not None:
 50 |         item_end = idc.get_item_end(ea)
 51 |         # sometimes, ida returns wrong addr
 52 |         if item_end > ea:
 53 |             ea = item_end
 54 |         else:
 55 |             ea = next_addr_aligned(ea, 4)
 56 | 
 57 |         return True, ea
 58 | 
 59 |     return False, ea
 60 | 
 61 | 
 62 | string_initialized = False
 63 | 
 64 | 
 65 | def init_strings(force=False, target_seg_name=None):
 66 |     global string_initialized
 67 |     if not force and string_initialized:
 68 |         return
 69 | 
 70 |     for ea in idautils.Segments():
 71 |         seg = ida_segment.getseg(ea)
 72 |         seg_name = ida_segment.get_segm_name(seg)
 73 | 
 74 |         # We only check target segment since it may take too much time.
 75 |         if target_seg_name and seg_name != target_seg_name:
 76 |             continue
 77 | 
 78 |         print("Initializing %x -> %x (%s)" % (seg.start_ea, seg.end_ea, seg_name))
 79 | 
 80 |         # TODO: we may use other strategy to find string pointers
 81 |         analyze_str_ptr(seg.start_ea, seg.end_ea)
 82 | 
 83 |     analyze_ida_str()
 84 |     string_initialized = True
 85 | 
 86 | 
 87 | def analyze_str_ptr(start_ea, end_ea):
 88 |     str_cnt = 0
 89 |     start_time = time.time()
 90 | 
 91 |     # First, find string references. Data referenced by a pointer is highly
 92 |     # likely to be a string.
 93 |     ea = start_ea
 94 |     while ea != idc.BADADDR and ea < end_ea:
 95 |         status, ea = is_assigned(ea)
 96 |         if status:
 97 |             continue
 98 | 
 99 |         str_ptr = ida_bytes.get_dword(ea)
100 |         if idc.get_str_type(str_ptr) is not None or check_string(str_ptr, 8):
101 |             # even already assigned strings may not have reference.
102 |             ida_offset.op_offset(ea, 0, idc.REF_OFF32)
103 | 
104 |             if idc.get_str_type(str_ptr) is None:
105 |                 create_string(str_ptr)
106 | 
107 |             str_cnt += 1
108 |             if str_cnt % 10000 == 0:
109 |                 print(
110 |                     "%x: %d strings have been found. (%0.3f secs)"
111 |                     % (ea, str_cnt, time.time() - start_time)
112 |                 )
113 | 
114 |         ea = next_addr_aligned(ea, 4)
115 | 
116 |     print("Created %d strings. (%0.3f secs)" % (str_cnt, time.time() - start_time))
117 | 
118 | 
119 | def analyze_ida_str():
120 |     global all_str
121 |     if "all_str" not in globals():
122 |         all_str = idautils.Strings()
123 | 
124 |     str_cnt = 0
125 |     start_time = time.time()
126 | 
127 |     for s in all_str:
128 |         # check if there exists already assigned function or string
129 |         if any(
130 |             ida_funcs.get_fchunk(ea) or (idc.get_str_type(ea) is not None)
131 |             for ea in (s.ea, s.ea + s.length)
132 |         ):
133 |             continue
134 | 
135 |         if check_string(s.ea, 30) and create_string(s.ea):
136 |             str_cnt += 1
137 |             if str_cnt % 1000 == 0:
138 |                 print(
139 |                     "%x: %d strings have been found. (%0.3f secs)"
140 |                     % (s.ea, str_cnt, time.time() - start_time)
141 |                 )
142 | 
143 |     print("Created %d strings. (%0.3f secs)" % (str_cnt, time.time() - start_time))
144 | 
145 | 
146 | NONE = 0
147 | ARM = 1
148 | THUMB = 2
149 | 
150 | 
151 | def is_valid_reglist(word, is_16bit=False):
152 |     if is_16bit:
153 |         word = word & 0x1FF
154 |         # should contain LR
155 |         lr = word >> 8 & 1
156 |         if lr != 1:
157 |             return False
158 | 
159 |         regs = word
160 | 
161 |     else:
162 |         # should contain LR
163 |         # should not contain SP, PC
164 |         sp = word >> 13 & 1
165 |         lr = word >> 14 & 1
166 |         pc = word >> 15 & 1
167 |         if sp != 0 or lr != 1 or pc != 0:
168 |             return False
169 | 
170 |         regs = word & 0x1FFF
171 | 
172 |     # At least one reg should exist
173 |     if regs == 0:
174 |         return False
175 | 
176 |     # We compute maximum consecutive 1s to find continuous reg-list like
177 |     # '0001111100000'.
178 |     if is_16bit:
179 |         threshold = 0
180 |     else:
181 |         threshold = 1
182 | 
183 |     cnt = 0
184 |     while regs != 0:
185 |         regs = regs & (regs << 1)
186 |         cnt += 1
187 | 
188 |     if cnt > threshold:
189 |         return True
190 |     else:
191 |         return False
192 | 
193 | 
194 | # This function only checks PUSH instruction.
195 | # TODO: properly find function prolog (e.g, using ML?)
196 | def is_func_prolog(ea, reg_check=True):
197 |     if ea % 2 != 0:
198 |         return NONE
199 | 
200 |     # function prolog requires at least four bytes
201 |     if any(not ida_bytes.is_mapped(ea + i) for i in range(4)):
202 |         return NONE
203 | 
204 |     word = ida_bytes.get_word(ea)
205 |     next_word = ida_bytes.get_word(ea + 2)
206 | 
207 |     # check thumb PUSH.W
208 |     if word == 0xE92D:
209 |         # PUSH LR
210 |         if not reg_check or is_valid_reglist(next_word, False):
211 |             return THUMB
212 | 
213 |     # check thumb PUSH
214 |     elif (word >> 8) == 0xB5:
215 |         if not reg_check or is_valid_reglist(word, True):
216 |             return THUMB
217 | 
218 |     # check arm PUSH
219 |     elif next_word == 0xE92D:
220 |         if ea % 4 == 0:
221 |             # PUSH LR
222 |             if not reg_check or is_valid_reglist(word, False):
223 |                 return ARM
224 | 
225 |     return NONE
226 | 
227 | 
228 | # This function finds function candidates.  Currently, we only find the
229 | # candidates by prolog.
230 | def find_prev_func_cand(ea, end_ea=idc.BADADDR):
231 |     while ea < end_ea and ea != idc.BADADDR:
232 |         mode = is_func_prolog(ea, reg_check=False)
233 |         if mode != NONE:
234 |             # Check whether the function is already assigned. Even if a
235 |             # function has been found, we cannot just set ea to the function
236 |             # end, since IDA may fail to find the function end correctly.
237 |             if idc.get_func_attr(ea, idc.FUNCATTR_START) == idc.BADADDR:
238 |                 return ea, mode
239 | 
240 |         ea = prev_addr_aligned(ea)
241 | 
242 |     return idc.BADADDR, NONE
243 | 
244 | 
245 | # This function finds function candidates.  Currently, we only find the
246 | # candidates by prolog.
247 | def find_next_func_cand(ea, end_ea=idc.BADADDR):
248 |     while ea < end_ea and ea != idc.BADADDR:
249 |         mode = is_func_prolog(ea)
250 |         if mode != NONE:
251 |             # Check whether the function is already assigned. Even if a
252 |             # function has been found, we cannot just set ea to the function
253 |             # end, since IDA may fail to find the function end correctly.
254 |             if idc.get_func_attr(ea, idc.FUNCATTR_START) == idc.BADADDR:
255 |                 return ea, mode
256 | 
257 |         ea = next_addr_aligned(ea)
258 | 
259 |     return idc.BADADDR, NONE
260 | 
261 | 
262 | def fix_func_prolog(ea, end_ea=idc.BADADDR):
263 |     global FUNC_BY_LS
264 | 
265 |     func_cnt = 0
266 |     func = ida_funcs.get_fchunk(ea)
267 |     if func is None:
268 |         func = ida_funcs.get_next_func(ea)
269 |     ea = func.start_ea if func is not None else end_ea
270 | 
271 |     while func is not None and ea < end_ea:
272 |         # if current function is small enough and there exists a function right
273 |         # next to current function
274 |         if (
275 |             func.size() <= 8
276 |             and idc.get_func_attr(func.end_ea, idc.FUNCATTR_START) != idc.BADADDR
277 |         ):
278 |             # If the next function can be connected, there must be a basic block reference.
279 |             # xref.type == 21 means 'fl_F', which is an ordinary flow.
280 |             if all(
281 |                 (func.start_ea <= xref.frm < func.end_ea) and xref.type == 21
282 |                 for xref in XrefsTo(func.end_ea)
283 |             ):
284 |                 if func_cnt > 0 and func_cnt % 1000 == 0:
285 |                     print(
286 |                         "%x <- %x: prolog merging (%d)."
287 |                         % (func.start_ea, func.end_ea, func_cnt)
288 |                     )
289 |                 ida_bytes.del_items(func.end_ea, ida_bytes.DELIT_EXPAND)
290 |                 ida_bytes.del_items(func.start_ea, ida_bytes.DELIT_EXPAND)
291 |                 ida_auto.auto_wait()
292 | 
293 |                 status = idc.add_func(func.start_ea)
294 |                 if not status:
295 |                     print("Error merging 0x%x <- 0x%x" % (func.start_ea, func.end_ea))
296 |                 else:
297 |                     func_cnt += 1
298 |                     FUNC_BY_LS.discard(func.end_ea)
299 |                 ida_auto.auto_wait()
300 | 
301 |         func = ida_funcs.get_next_func(ea)
302 |         if func:
303 |             ea = func.start_ea
304 | 
305 |     print("Fixed %d functions" % func_cnt)
306 | 
307 | 
308 | FUNC_BY_LS = set()
309 | FUNC_BY_LS_TIME = None
310 | 
311 | 
312 | def analyze_linear_sweep(start_ea, end_ea=idc.BADADDR):
313 |     global FUNC_BY_LS, FUNC_BY_LS_TIME
314 |     if "FUNC_BY_LS" not in globals() or len(FUNC_BY_LS) == 0:
315 |         FUNC_BY_LS = set()
316 | 
317 |     cand_cnt = 0
318 |     func_cnt = 0
319 |     ea = start_ea
320 |     start_time = time.time()
321 |     while ea < end_ea and ea != idc.BADADDR:
322 |         ea, mode = find_next_func_cand(ea, end_ea)
323 |         if ea == idc.BADADDR:
324 |             break
325 | 
326 |         cand_cnt += 1
327 |         if cand_cnt % 10000 == 0:
328 |             print(
329 |                 "%x: %d/%d functions have been found (%d secs)"
330 |                 % (ea, func_cnt, cand_cnt, time.time() - start_time)
331 |             )
332 | 
333 |         # set IDA segment register to specify ARM mode
334 |         old_flag = idc.get_sreg(ea, "T")
335 |         if mode == THUMB:
336 |             idc.split_sreg_range(ea, "T", 1, idc.SR_user)
337 |         elif mode == ARM:
338 |             idc.split_sreg_range(ea, "T", 0, idc.SR_user)
339 |         else:
340 |             print("Unknown mode")
341 |             raise NotImplementedError
342 | 
343 |         # add_func ignores the existing function, but existing function is
344 |         # already filtered when finding the candidate
345 |         status = idc.add_func(ea)
346 |         if status:
347 |             func_cnt += 1
348 |             FUNC_BY_LS.add(ea)
349 | 
350 |             # Wait for IDA's auto analysis
351 |             ida_auto.auto_wait()
352 | 
353 |             # Even though add_func succeeded, the result may not be correct.
354 |             # TODO: how to check the correctness? we may check the function end?
355 |             func_end = idc.get_func_attr(ea, idc.FUNCATTR_END)
356 |             if func_end > ea:
357 |                 ea = func_end
358 |             else:
359 |                 # sometimes, ida returns wrong addr
360 |                 ea = next_addr_aligned(ea)
361 | 
362 |         else:
363 |             if idc.get_func_attr(ea, idc.FUNCATTR_START) == idc.BADADDR:
364 |                 # IDA automatically makes code, and this remains even though
365 |                 # add_func fails.
366 |                 ida_bytes.del_items(ea, ida_bytes.DELIT_EXPAND)
367 | 
368 |                 # reset IDA segment register to the previous ARM mode
369 |                 idc.split_sreg_range(ea, "T", old_flag, idc.SR_user)
370 | 
371 |                 # Wait for IDA's auto analysis
372 |                 ida_auto.auto_wait()
373 | 
374 |             ea = next_addr_aligned(ea)
375 | 
376 |     # linear sweep may choose wrong prologs. We merge the prologs of two
377 |     # adjacent functions.
378 |     if func_cnt > 0:
379 |         fix_func_prolog(start_ea, end_ea)
380 | 
381 |     FUNC_BY_LS_TIME = time.time() - start_time
382 |     print(
383 |         "Found %d/%d functions. (%d sec)" % (len(FUNC_BY_LS), cand_cnt, FUNC_BY_LS_TIME)
384 |     )
385 | 
386 | 
387 | # TODO: find other functions with additional pointer analysis.
388 | # See the analyze_func_ptr function below.
389 | FUNC_BY_PTR = set()
390 | FUNC_BY_PTR_TIME = None
391 | 
392 | 
393 | def analyze_func_ptr(start_ea, end_ea):
394 |     global FUNC_BY_PTR, FUNC_BY_PTR_TIME
395 |     if "FUNC_BY_PTR" not in globals() or len(FUNC_BY_PTR) == 0:
396 |         FUNC_BY_PTR = set()
397 | 
398 |     ea = start_ea
399 |     func_cnt = 0
400 |     name_cnt = 0
401 |     start_time = time.time()
402 | 
403 |     while ea != idc.BADADDR and ea <= end_ea:
404 |         status, ea = is_assigned(ea)
405 |         if status:
406 |             continue
407 | 
408 |         # now check function pointer
409 |         func_ptr = ida_bytes.get_dword(ea)
410 | 
411 |         # TODO: skip other segments that are not code.
412 | 
413 |         # for those already assigned functions, we need to check the segment range.
414 |         if not (start_ea <= func_ptr < end_ea):
415 |             ea = next_addr_aligned(ea, 4)
416 |             continue
417 | 
418 |         # we only target thumb function to reduce false positives
419 |         if func_ptr & 1 == 0:
420 |             ea = next_addr_aligned(ea, 4)
421 |             continue
422 | 
423 |         func_ptr = func_ptr - 1
424 |         func_start = idc.get_func_attr(func_ptr, idc.FUNCATTR_START)
425 |         if func_start != idc.BADADDR and func_start != func_ptr:
426 |             # this is not a proper function pointer
427 |             ea = next_addr_aligned(ea, 4)
428 |             continue
429 | 
430 |         # new thumb function has been found!
431 |         if func_start == idc.BADADDR:
432 |             old_flag = idc.get_sreg(func_ptr, "T")
433 |             idc.split_sreg_range(func_ptr, "T", 1, idc.SR_user)
434 |             status = idc.add_func(func_ptr)
435 |             if not status:
436 |                 # IDA automatically makes code, and this remains even
437 |                 # though add_func fails.
438 |                 ida_bytes.del_items(func_ptr, ida_bytes.DELIT_EXPAND)
439 |                 idc.split_sreg_range(func_ptr, "T", old_flag, idc.SR_user)
440 | 
441 |                 ea = next_addr_aligned(ea, 4)
442 |                 continue
443 | 
444 |             func_cnt += 1
445 |             FUNC_BY_PTR.add(ea)
446 |             if func_cnt % 10000 == 0:
447 |                 print(
448 |                     "%x: %d functions have been found. (%0.3f secs)"
449 |                     % (ea, func_cnt, time.time() - start_time)
450 |                 )
451 | 
452 |         # If we find a function, we try to assign a name. The name may be
453 |         # derived from a C++ structure.
454 |         if analyze_funcname(ea, func_ptr):
455 |             name_cnt += 1
456 |             func_name = idc.get_func_name(func_ptr)
457 |             if name_cnt % 10000 == 0:
458 |                 print(
459 |                     "%x: %d names have been found. (%0.3f secs)"
460 |                     % (ea, name_cnt, time.time() - start_time)
461 |                 )
462 |             # print("%x: %x => %s" % (ea, func_ptr, func_name))
463 | 
464 |         ea = next_addr_aligned(ea, 4)
465 | 
466 |     FUNC_BY_PTR_TIME = time.time() - start_time
467 |     print(
468 |         "Found %d functions, renamed %d functions (%0.3f secs)"
469 |         % (len(FUNC_BY_PTR), name_cnt, FUNC_BY_PTR_TIME)
470 |     )
471 | 
472 | 
473 | # TODO: find the cause of the remaining function names.
474 | def analyze_funcname(ea, func_ptr):
475 |     # check at least 10 items.
476 |     name_pptr = find_funcname_ptr(ea, 10)
477 |     if not name_pptr:
478 |         return False
479 | 
480 |     ida_offset.op_offset(ea, 0, idc.REF_OFF32)
481 |     ida_offset.op_offset(name_pptr, 0, idc.REF_OFF32)
482 | 
483 |     func_name = idc.get_func_name(func_ptr)
484 |     if not func_name.startswith("sub_"):
485 |         # a function name has already been assigned
486 |         return False
487 | 
488 |     name_ptr = ida_bytes.get_dword(name_pptr)
489 |     func_name = get_string(name_ptr)
490 |     if isinstance(func_name, bytes):
491 |         func_name = func_name.decode()
492 | 
493 |     set_funcname(func_ptr, func_name)
494 | 
495 |     return True
496 | 
497 | 
498 | # this function returns first name pointer
499 | def find_funcname_ptr(ea, n):
500 |     for i in range(4, n * 4, 4):
501 |         name_ptr = ida_bytes.get_dword(ea + i)
502 |         # this might be a function, so break and proceed next check
503 |         if name_ptr & 1 == 1:
504 |             return
505 |         elif check_funcname(name_ptr):
506 |             return ea + i
507 |     return
508 | 
509 | 
510 | func_initialized = False
511 | 
512 | 
513 | def init_functions(force=False, target_seg_name=None):
514 |     global func_initialized
515 |     if not force and func_initialized:
516 |         return
517 | 
518 |     # Linear sweep to find functions
519 |     for ea in idautils.Segments():
520 |         # TODO: skip other segments that are not code.
521 |         seg = ida_segment.getseg(ea)
522 |         seg_name = ida_segment.get_segm_name(seg)
523 | 
524 |         # We only check the target segment since it may take too much time.
525 |         if target_seg_name and seg_name != target_seg_name:
526 |             continue
527 | 
528 |         # TODO: we may use other strategy not just sweep linearly.
529 |         print(
530 |             "Linear sweep analysis: %x -> %x (%s)"
531 |             % (seg.start_ea, seg.end_ea, seg_name)
532 |         )
533 |         analyze_linear_sweep(seg.start_ea, seg.end_ea)
534 | 
535 |     # Find function pointer candidates
536 |     for ea in idautils.Segments():
537 |         # TODO: skip other segments that are not code.
538 |         seg = ida_segment.getseg(ea)
539 |         seg_name = ida_segment.get_segm_name(seg)
540 | 
541 |         # We only check the target segment since it may take too much time.
542 |         if target_seg_name and seg_name != target_seg_name:
543 |             continue
544 | 
545 |         # Analyze functions by pointers
546 |         print(
547 |             "Function pointer analysis: %x -> %x (%s)"
548 |             % (seg.start_ea, seg.end_ea, seg_name)
549 |         )
550 |         analyze_func_ptr(seg.start_ea, seg.end_ea)
551 | 
552 |     func_initialized = True
553 | 
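
The register-list check and the 16-bit Thumb PUSH test in `basespec/utils.py` can be tried outside IDA. The sketch below is a standalone illustration, not part of the module: `longest_run_of_ones` and `looks_like_thumb_push_lr` are hypothetical names, and the assumption that the 16-bit register list is the low 8 bits of the halfword is not visible in the code above.

```python
def longest_run_of_ones(bits):
    """Return the length of the longest run of consecutive 1 bits."""
    run = 0
    while bits:
        # ANDing with the left-shifted copy shortens every run of 1s by one
        # bit, so the number of iterations equals the longest run length.
        bits &= bits << 1
        run += 1
    return run


def looks_like_thumb_push_lr(halfword):
    """Hypothetical helper: does a 16-bit value look like PUSH {Rx..., LR}?"""
    # 16-bit Thumb PUSH is encoded as 0xB4xx / 0xB5xx; 0xB5 means LR is pushed.
    # Assumption: the low 8 bits are the R0-R7 register list.
    regs = halfword & 0xFF
    return (halfword >> 8) == 0xB5 and longest_run_of_ones(regs) > 0


if __name__ == "__main__":
    print(longest_run_of_ones(0b0001111100000))
    print(looks_like_thumb_push_lr(0xB510))  # PUSH {R4, LR}
```

The loop counts iterations rather than set bits, which is why a scattered list such as `0b101` yields a run of 1 while a contiguous list yields its full length.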


--------------------------------------------------------------------------------
/basespec/scatterload.py:
--------------------------------------------------------------------------------
  1 | import idc
  2 | import ida_bytes
  3 | import ida_segment
  4 | import ida_search
  5 | import ida_offset
  6 | import ida_ua
  7 | import ida_auto
  8 | 
  9 | from idautils import XrefsTo
 10 | 
 11 | from .utils import set_entry_name
 12 | from .slicer import find_args
 13 | 
 14 | # This is the main function
 15 | # It finds scatterload related information, and performs scatterloading
 16 | def run_scatterload(debug=False):
 17 |     # A newly identified region may contain additional scatter-load
 18 |     # procedures, so we repeat until nothing changes.
 19 |     is_changed = True
 20 |     while is_changed:
 21 |         is_changed = False
 22 |         tables = find_scatter_table()
 23 |         scatter_funcs = find_scatter_funcs()
 24 | 
 25 |         for start, end in tables.items():
 26 |             print("Processing table: 0x%x to 0x%x" % (start, end))
 27 |             while start < end:
 28 |                 ida_bytes.create_dword(start, 16)
 29 |                 ida_offset.op_offset(start, 0, idc.REF_OFF32)
 30 |                 src = ida_bytes.get_dword(start)
 31 |                 dst = ida_bytes.get_dword(start + 4)
 32 |                 size = ida_bytes.get_dword(start + 8)
 33 |                 how = ida_bytes.get_dword(start + 12)
 34 | 
 35 |                 if how not in scatter_funcs:
 36 |                     print("%x: no addr 0x%x in scatter_funcs" % (start, how))
 37 |                     start += 16
 38 |                     continue
 39 | 
 40 |                 func_name = scatter_funcs[how]
 41 |                 start += 16
 42 |                 print("%s: 0x%x -> 0x%x (0x%x bytes)" % (func_name, src, dst, size))
 43 | 
 44 |                 if func_name != "__scatterload_zeroinit":
 45 |                     if not idc.is_loaded(src) or size == 0:
 46 |                         print("0x%x is not loaded." % (src))
 47 |                         continue
 48 | 
 49 |                 if debug:
 50 |                     # only show information above
 51 |                     continue
 52 | 
 53 |                 if func_name == "__scatterload_copy":
 54 |                     if add_segment(dst, size, "CODE"):
 55 |                         memcpy(src, dst, size)
 56 |                         is_changed = True
 57 |                 elif func_name == "__scatterload_decompress":
 58 |                     if add_segment(dst, size, "DATA"):
 59 |                         decomp(src, dst, size)
 60 |                         is_changed = True
 61 |                 # some old firmware images have this.
 62 |                 elif func_name == "__scatterload_decompress2":
 63 |                     if add_segment(dst, size, "DATA"):
 64 |                         decomp2(src, dst, size)
 65 |                         is_changed = True
 66 |                 elif func_name == "__scatterload_zeroinit":
 67 |                     # No need to further proceed for zero init.
 68 |                     if add_segment(dst, size, "DATA"):
 69 |                         memclr(dst, size)
 70 | 
 71 |                 ida_auto.auto_wait()
 72 | 
 73 | 
 74 | def add_segment(ea, size, seg_class, debug=False):
 75 |     # align page size
 76 |     ea = ea & 0xFFFFF000
 77 |     end_ea = ea + size
 78 |     is_changed = False
 79 |     if ea == 0:
 80 |         return False
 81 |     while ea < end_ea:
 82 |         cur_seg = ida_segment.getseg(ea)
 83 |         next_seg = ida_segment.get_next_seg(ea)
 84 | 
 85 |         if debug:
 86 |             print("=" * 30)
 87 |             if cur_seg:
 88 |                 print("cur_seg: %x - %x" % (cur_seg.start_ea, cur_seg.end_ea))
 89 |             if next_seg:
 90 |                 print("next_seg: %x - %x" % (next_seg.start_ea, next_seg.end_ea))
 91 |             print("new_seg: %x - %x" % (ea, end_ea))
 92 | 
 93 |         # if there is no segment, create a new segment
 94 |         if not cur_seg:
 95 |             if not next_seg:
 96 |                 ida_segment.add_segm(0, ea, end_ea, "", seg_class)
 97 |                 is_changed = True
 98 |                 break
 99 | 
100 |             # if next_seg exists
101 |             if end_ea <= next_seg.start_ea:
102 |                 ida_segment.add_segm(0, ea, end_ea, "", seg_class)
103 |                 is_changed = True
104 |                 break
105 | 
106 |             # end_ea > next_seg.start_ea, need to create more segments
107 |             ida_segment.add_segm(0, ea, next_seg.start_ea, "", seg_class)
108 | 
109 |         # if segment already exists, we extend current segment
110 |         else:
111 |             if end_ea <= cur_seg.end_ea:
112 |                 break
113 | 
114 |             if not next_seg:
115 |                 ida_segment.set_segm_end(ea, end_ea, 0)
116 |                 ida_segment.set_segm_class(cur_seg, seg_class)
117 |                 is_changed = True
118 |                 break
119 | 
120 |             # if next_seg exists
121 |             if end_ea <= next_seg.start_ea:
122 |                 ida_segment.set_segm_end(ea, end_ea, 0)
123 |                 ida_segment.set_segm_class(cur_seg, seg_class)
124 |                 is_changed = True
125 |                 break
126 | 
127 |             # end_ea > next_seg.start_ea, need to create more segments
128 |             if cur_seg.end_ea < next_seg.start_ea:
129 |                 ida_segment.set_segm_end(ea, next_seg.start_ea, 0)
130 |                 ida_segment.set_segm_class(cur_seg, seg_class)
131 |                 is_changed = True
132 | 
133 |         ea = next_seg.start_ea
134 | 
135 |     return is_changed
136 | 
137 | 
138 | # TODO: search only newly created segments.
139 | def create_func_by_prefix(func_name, prefix, force=False):
140 |     addrs = []
141 |     start_addr = 0
142 |     func_addr = 0
143 |     while func_addr != idc.BADADDR:
144 |         func_addr = ida_search.find_binary(
145 |             start_addr, idc.BADADDR, prefix, 16, idc.SEARCH_DOWN
146 |         )
147 |         if func_addr == idc.BADADDR:
148 |             break
149 | 
150 |         # already existing function but it is not the right prefix
151 |         addr = idc.get_func_attr(func_addr, idc.FUNCATTR_START)
152 |         if addr != idc.BADADDR and func_addr != addr:
153 |             if not force:
154 |                 start_addr = func_addr + 4
155 |                 continue
156 | 
157 |             idc.del_func(addr)
158 |             idc.del_items(func_addr)
159 | 
160 |         # add_func is not applied to the existing function
161 |         idc.add_func(func_addr)
162 | 
163 |         func_name = set_entry_name(func_addr, func_name)
164 |         print("%s: 0x%x" % (func_name, func_addr))
165 | 
166 |         addrs.append(func_addr)
167 |         start_addr = func_addr + 4
168 | 
169 |     return addrs
170 | 
171 | 
172 | def find_scatter_funcs():
173 |     scatter_func_bytes = {
174 |         "__scatterload": [
175 |             "2C 00 8F E2 00 0C 90 E8 00 A0 8A E0 00 B0 8B E0",  # For 5G
176 |             "0A A0 90 E8 00 0C 82 44",
177 |         ],
178 |         "__scatterload_copy": [
179 |             "10 20 52 E2 78 00 B0 28",  # For 5G
180 |             "10 3A 24 BF 78 C8 78 C1 FA D8 52 07",
181 |         ],
182 |         "__scatterload_decompress": [
183 |             "02 20 81 E0 00 C0 A0 E3 01 30 D0 E4",  # For 5G
184 |             "0A 44 10 F8 01 4B 14 F0 0F 05 08 BF 10 F8 01 5B",
185 |             "0A 44 4F F0 00 0C 10 F8 01 3B 13 F0 07 04 08 BF",
186 |         ],
187 |         "__scatterload_decompress2": [
188 |             "10 F8 01 3B 0A 44 13 F0 03 04 08 BF 10 F8 01 4B",
189 |         ],
190 |         "__scatterload_zeroinit": [
191 |             "00 30 B0 E3 00 40 B0 E3 00 50 B0 E3 00 60 B0 E3",  # For 5G
192 |             "00 23 00 24 00 25 00 26 10 3A 28 BF 78 C1 FB D8",
193 |         ],
194 |     }
195 | 
196 |     funcs = {}
197 |     for name, prefixes in scatter_func_bytes.items():
198 |         for prefix in prefixes:
199 |             addrs = create_func_by_prefix(name, prefix, force=True)
200 |             for addr in addrs:
201 |                 if addr != idc.BADADDR:
202 |                     funcs[addr] = name
203 | 
204 |     return funcs
205 | 
206 | 
207 | def find_scatter_table():
208 |     scatter_load_bytes = {
209 |         "__scatterload": [
210 |             "0A A0 90 E8 00 0C 82 44",
211 |             "2C 00 8F E2 00 0C 90 E8 00 A0 8A E0 00 B0 8B E0",  # For 5G
212 |         ],
213 |     }
214 | 
215 |     tables = {}
216 |     for name, prefixes in scatter_load_bytes.items():
217 |         for prefix in prefixes:
218 |             addrs = create_func_by_prefix(name, prefix, force=True)
219 |             for addr in addrs:
220 |                 if addr == idc.BADADDR:
221 |                     continue
222 | 
223 |                 offset_addr = idc.get_operand_value(addr, 1)
224 |                 if offset_addr == -1:
225 |                     old_flag = idc.get_sreg(addr, "T")
226 |                     idc.split_sreg_range(addr, "T", not old_flag, idc.SR_user)
227 |                     offset_addr = idc.get_operand_value(addr, 1)
228 | 
229 |                 offset = ida_bytes.get_dword(offset_addr)
230 |                 offset2 = ida_bytes.get_dword(offset_addr + 4)
231 |                 start = (offset + offset_addr) & 0xFFFFFFFF
232 |                 end = (offset2 + offset_addr) & 0xFFFFFFFF
233 |                 if not idc.is_loaded(start):
234 |                     continue
235 | 
236 |                 tables[start] = end
237 |                 print("__scatter_table: 0x%x -> 0x%x" % (start, end))
238 |                 func_name = set_entry_name(start, "__scatter_table")
239 | 
240 |     return tables
241 | 
242 | 
243 | def memcpy(src, dst, length):
244 |     if length == 0:
245 |         return
246 |     data = ida_bytes.get_bytes(src, length)
247 |     ida_bytes.put_bytes(dst, data)
248 | 
249 | 
250 | def memclr(dst, length):
251 |     if length == 0:
252 |         return
253 |     data = b"\x00" * length
254 |     ida_bytes.put_bytes(dst, data)
255 | 
256 | 
257 | def decomp(src, dst, length):
258 |     # print("decomp 0x%X 0x%X (0x%x)"%(src, dst, length))
259 |     end = dst + length
260 |     while True:
261 |         meta = ida_bytes.get_byte(src)
262 |         src += 1
263 |         l = meta & 7
264 |         if l == 0:
265 |             l = ida_bytes.get_byte(src)
266 |             src += 1
267 |         l2 = meta >> 4
268 |         if l2 == 0:
269 |             l2 = ida_bytes.get_byte(src)
270 |             src += 1
271 |         # print("meta: 0x%x l: 0x%X l2: 0x%x"%(meta,l,l2))
272 |         # copy l byte
273 |         memcpy(src, dst, l - 1)
274 |         src += l - 1
275 |         dst += l - 1
276 |         if meta & 8:
277 |             off = ida_bytes.get_byte(src)
278 |             src += 1
279 |             for i in range(l2 + 2):
280 |                 memcpy(dst - off, dst, 1)
281 |                 dst += 1
282 |         else:
283 |             memclr(dst, l2)
284 |             dst += l2
285 |         if dst >= end:
286 |             assert dst == end, "Decompress failed"
287 |             # print('decomp end %0x %0x'%(dst, end))
288 |             break
289 | 
290 | 
291 | def decomp2(src, dst, length):
292 |     # print("decomp 0x%X 0x%X (0x%x)"%(src, dst, length))
293 |     meta = ida_bytes.get_byte(src)
294 |     src += 1
295 |     end = dst + length
296 |     while True:
297 |         l = meta & 3
298 |         if l == 0:
299 |             l = ida_bytes.get_byte(src)
300 |             src += 1
301 | 
302 |         l2 = meta >> 4
303 |         if l2 == 0:
304 |             l2 = ida_bytes.get_byte(src)
305 |             src += 1
306 |         # print("meta: 0x%x l: 0x%X l2: 0x%x"%(meta,l,l2))
307 |         # copy l byte
308 |         memcpy(src, dst, l - 1)
309 |         src += l - 1
310 |         dst += l - 1
311 | 
312 |         if l2:
313 |             off = ida_bytes.get_byte(src)
314 |             src += 1
315 |             meta_val = meta & 0xC
316 |             src_ptr = dst - off
317 |             if meta_val == 12:
318 |                 meta_val = ida_bytes.get_byte(src)
319 |                 src += 1
320 |                 src_ptr -= 256 * meta_val
321 | 
322 |             else:
323 |                 src_ptr -= 64 * meta_val
324 | 
325 |             l2 += 2
326 |             memcpy(src_ptr, dst, l2)
327 |             dst += l2
328 | 
329 |         meta = ida_bytes.get_byte(src)
330 |         src += 1
331 | 
332 |         if dst >= end:
333 |             assert dst == end, "Decompress failed"
334 |             # print('decomp end %0x %0x'%(dst, end))
335 |             break
336 | 
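
To experiment with the stream format that `decomp()` decodes without an IDA database, the same loop can be run over a plain `bytes` object. This is a minimal sketch under assumptions: `decomp_bytes` is a hypothetical name, the back-reference offset is assumed to be at least 1, and the sample stream in the usage line is hand-made for illustration rather than taken from firmware.

```python
def decomp_bytes(src, out_len):
    """Decode a stream in the format handled by decomp() into bytes."""
    out = bytearray()
    pos = 0
    while True:
        meta = src[pos]
        pos += 1
        lit = meta & 7              # literal count + 1; 0 means "read a byte"
        if lit == 0:
            lit = src[pos]
            pos += 1
        run = meta >> 4             # run length; 0 means "read a byte"
        if run == 0:
            run = src[pos]
            pos += 1
        # Copy lit - 1 literal bytes straight from the input.
        out += src[pos:pos + lit - 1]
        pos += lit - 1
        if meta & 8:
            # Back-reference: copy run + 2 bytes from `off` bytes behind the
            # write cursor, one byte at a time so overlapping copies work.
            off = src[pos]
            pos += 1
            for _ in range(run + 2):
                out.append(out[len(out) - off])
        else:
            # No back-reference: emit `run` zero bytes.
            out += bytes(run)
        if len(out) >= out_len:
            assert len(out) == out_len, "Decompress failed"
            return bytes(out)


if __name__ == "__main__":
    stream = bytes([0x23, 0x41, 0x42, 0x19, 0x04])
    print(decomp_bytes(stream, 7))
```

The byte-at-a-time copy mirrors the `memcpy(dst - off, dst, 1)` loop in `decomp()`: a slice copy would read stale data whenever the offset is smaller than the run length.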

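`run_scatterload()` walks the scatter table in 16-byte rows of four dwords: source, destination, size, and the address of the handler function. The sketch below parses such rows from a raw byte string. It assumes little-endian data (typical for ARM firmware, but not stated in the source), and `ScatterEntry` and `parse_scatter_table` are hypothetical names used only for illustration.

```python
import struct
from collections import namedtuple

# One 16-byte table row: src, dst, size, and the handler address ("how").
ScatterEntry = namedtuple("ScatterEntry", "src dst size how")


def parse_scatter_table(blob):
    """Split a raw table into ScatterEntry rows, ignoring a trailing partial row."""
    usable = len(blob) - len(blob) % 16
    # Assumption: dwords are stored little-endian.
    return [ScatterEntry(*row) for row in struct.iter_unpack("<4I", blob[:usable])]


if __name__ == "__main__":
    table = struct.pack("<4I", 0x1000, 0x2000, 0x40, 0x8001)
    table += struct.pack("<4I", 0x1040, 0x3000, 0x10, 0x8021)
    for entry in parse_scatter_table(table):
        print("0x%x -> 0x%x (0x%x bytes) via 0x%x" % entry)
```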

--------------------------------------------------------------------------------
/basespec/slicer.py:
--------------------------------------------------------------------------------
  1 | import idc
  2 | import idautils
  3 | import idaapi
  4 | 
  5 | import ida_bytes
  6 | import ida_ua
  7 | import ida_funcs
  8 | import ida_idp
  9 | import ida_xref
 10 | import ida_segment
 11 | 
 12 | import re
 13 | 
 14 | from .utils import is_thumb
 15 | 
 16 | 
 17 | def get_reg(op):
 18 |     return ida_idp.get_reg_name(op.reg, 0)
 19 | 
 20 | 
 21 | def get_regs(ea):
 22 |     if is_thumb(ea):
 23 |         # 1  --- 1 (LSB)
 24 |         # R7 --- R0
 25 |         reg_bits = ida_bytes.get_word(ea) & 0x1FF
 26 |         reg_list = ["R{0}".format(idx) for idx in range(8)]
 27 |         reg_list.append("LR")
 28 |         # reg_list.extend(['SP', 'LR'])
 29 |         # TODO: add 32 bit Thumb handling
 30 |     else:
 31 |         # 1  --- 1 (LSB)
 32 |         # PC --- R0
 33 |         reg_bits = ida_bytes.get_word(ea) & 0xFFFF
 34 |         reg_list = ["R{0}".format(idx) for idx in range(13)]
 35 |         reg_list.extend(["SP", "LR", "PC"])
 36 | 
 37 |     regs = []
 38 |     idx = 0
 39 |     while reg_bits:
 40 |         if reg_bits & 0x1:
 41 |             regs.append(reg_list[idx])
 42 |         reg_bits = reg_bits >> 1
 43 |         idx += 1
 44 | 
 45 |     return regs
 46 | 
 47 | 
 48 | def merge_op_vals(val1, val2, operator):
 49 |     values = set()
 50 |     for x in val1:
 51 |         for y in val2:
 52 |             values.add(operator(x, y))
 53 | 
 54 |     return values
 55 | 
 56 | 
 57 | class SimpleForwardSlicer(object):
 58 |     def __init__(self):
 59 |         self.visited = set()
 60 |         self.values = dict()
 61 |         self.inter = False
 62 |         self.init()
 63 | 
 64 |     def init(self):
 65 |         self.memory = dict()
 66 |         self.regs = dict()
 67 |         self.func_start = None
 68 | 
 69 |         # initialize stack
 70 |         self.regs["SP"] = 0x100000000
 71 | 
 72 |     def run(self, start_ea, end_ea=idc.BADADDR, end_cnt=100):
 73 |         self.init()
 74 | 
 75 |         # not in code segment
 76 |         ea = start_ea
 77 | 
 78 |         func_start = idc.get_func_attr(ea, idc.FUNCATTR_START)
 79 |         if func_start == idc.BADADDR:
 80 |             return
 81 | 
 82 |         self.func_start = func_start
 83 | 
 84 |         cnt = 0
 85 |         while True:
 86 |             if ea in self.visited:
 87 |                 break
 88 | 
 89 |             self.visited.add(ea)
 90 |             # break if ea is out of the original function
 91 |             # TODO: Add inter-procedural
 92 |             if idc.get_func_attr(ea, idc.FUNCATTR_START) != func_start:
 93 |                 break
 94 | 
 95 |             if ea == end_ea:
 96 |                 break
 97 | 
 98 |             if end_ea == idc.BADADDR and cnt >= end_cnt:
 99 |                 break
100 | 
101 |             # there may exist data section
102 |             mnem = ida_ua.ua_mnem(ea)
103 |             if not ida_ua.can_decode(ea) or not mnem:
104 |                 break
105 | 
106 |             if mnem.startswith("B"):
107 |                 ea = idc.get_operand_value(ea, 0)
108 | 
109 |             elif mnem.startswith("POP"):
110 |                 break
111 | 
112 |             else:
113 |                 if not self.run_helper(ea):
114 |                     # print("%x: something wrong: %s" % (ea, idc.GetDisasm(ea)))
115 |                     break
116 | 
117 |                 ea = ida_xref.get_first_cref_from(ea)
118 | 
119 |             cnt += 1
120 | 
121 |     def fetch_value(self, op1, op2, op3, op4, operator=None):
122 |         value = None
123 |         value2 = None
124 |         value3 = None
125 |         value4 = None
126 | 
127 |         # fetch register value
128 |         if op2.type == ida_ua.o_reg:
129 |             if get_reg(op2) not in self.regs:
130 |                 return
131 | 
132 |             value = self.regs[get_reg(op2)]
133 | 
134 |             # More than two arguments
135 |             if op3 and op3.type != ida_ua.o_void:
136 |                 # ADD R0, R1, R2
137 |                 if op3.type == ida_ua.o_reg:
138 |                     if get_reg(op3) not in self.regs:
139 |                         return
140 | 
141 |                     value3 = self.regs[get_reg(op3)]
142 | 
143 |                 # immediate value, we get the value right away
144 |                 # ADD R0, R1, #123
145 |                 elif op3.type == ida_ua.o_imm:
146 |                     value3 = op3.value
147 | 
148 |                 # o_idpspec0-5
149 |                 # ADD R0, R1, R2,LSL#2
150 |                 # processor specific type 'LSL'
151 |                 elif op3.type == ida_ua.o_idpspec0:
152 |                     if get_reg(op3) not in self.regs:
153 |                         return
154 | 
155 |                     value3 = self.regs[get_reg(op3)] << op3.value
156 | 
157 |                 else:
158 |                     # TODO: currently not implemented
159 |                     # print("unknown operand type: %d" % (op3.type))
160 |                     # raise NotImplemented
161 |                     return
162 | 
163 |                 # Handle arithmetic operator
164 |                 assert operator is not None
165 |                 assert value3 is not None
166 |                 value = operator(value, value3) & 0xFFFFFFFF
167 | 
168 |             # MLA R0, R1, R2, R3
169 |             if op4 and op4.type != ida_ua.o_void:
170 |                 if op4.type == ida_ua.o_reg:
171 |                     if get_reg(op4) not in self.regs:
172 |                         return
173 | 
174 |                     value4 = self.regs[get_reg(op4)]
175 | 
176 |                 # Handle arithmetic operator
177 |                 assert operator is not None
178 |                 assert value4 is not None
179 |                 value = operator(value, value4) & 0xFFFFFFFF
180 | 
181 |         # register-relative address, typically a slot in the stack.
182 |         # o_displ = [Base Reg + Displacement]
183 |         elif op2.type == ida_ua.o_displ:
184 |             if get_reg(op2) not in self.regs:
185 |                 return
186 | 
187 |             value = self.regs[get_reg(op2)] + op2.addr
188 | 
189 |         # direct memory reference; return the address (the caller reads the memory for LDR)
190 |         elif op2.type == ida_ua.o_mem:
191 |             value = op2.addr
192 | 
193 |         # immediate value, we get the value right away
194 |         elif op2.type == ida_ua.o_imm:
195 |             value = op2.value
196 | 
197 |         # o_phrase = [Base Reg + Index Reg]
198 |         elif op2.type == ida_ua.o_phrase:
199 |             assert op3 is not None
200 | 
201 |             if get_reg(op3) not in self.regs:
202 |                 return
203 | 
204 |             if get_reg(op2) not in self.regs:
205 |                 return
206 | 
207 |             value2 = self.regs[get_reg(op2)]
208 |             value3 = self.regs[get_reg(op3)]
209 |             value = value2 + value3 + op2.phrase
210 | 
211 |         return value
212 | 
213 |     def run_helper(self, ea):
214 |         # the address may hold data rather than code
215 |         mnem = ida_ua.ua_mnem(ea)
216 |         if not ida_ua.can_decode(ea) or not mnem:
217 |             return
218 | 
219 |         # we need to check at most 4 operands
220 |         insn = ida_ua.insn_t()
221 |         inslen = ida_ua.decode_insn(insn, ea)
222 |         op1 = insn.ops[0]
223 |         op2 = insn.ops[1]
224 |         op3 = insn.ops[2]
225 |         op4 = insn.ops[3]
226 | 
227 |         if any(mnem.startswith(word) for word in ["PUSH", "POP"]):
228 |             # TODO: implement this properly
229 |             return True
230 | 
231 |         elif any(
232 |             mnem.startswith(word)
233 |             for word in ["MOV", "LDR", "ADR", "STR", "ADD", "SUB", "MUL"]
234 |         ):
235 |             assert op2 is not None
236 | 
237 |             if mnem.startswith("ADD"):
238 |                 operator = lambda x1, x2: x1 + x2
239 |             elif mnem.startswith("SUB"):
240 |                 operator = lambda x1, x2: x1 - x2
241 |             elif mnem.startswith("MUL"):
242 |                 operator = lambda x1, x2: x1 * x2
243 |             else:
244 |                 operator = None
245 | 
246 |             value = self.fetch_value(op1, op2, op3, op4, operator)
247 |             if value is None:
248 |                 return
249 | 
250 |             value = value & 0xFFFFFFFF
251 | 
252 |             if mnem.startswith("MOV"):
253 |                 self.regs[get_reg(op1)] = value
254 | 
255 |             elif mnem.startswith("LDR") or mnem.startswith("ADR"):
256 |                 if mnem.startswith("LDR"):
257 |                     assert op2.type in [ida_ua.o_displ, ida_ua.o_mem, ida_ua.o_phrase]
258 | 
259 |                     if value in self.memory:
260 |                         value = self.memory[value]
261 | 
262 |                     else:
263 |                         seg = ida_segment.getseg(value)
264 |                         if seg is None:
265 |                             return
266 | 
267 |                         value = ida_bytes.get_dword(value)
268 | 
269 |                 elif mnem.startswith("ADR"):
270 |                     assert op2.type == ida_ua.o_imm
271 | 
272 |                 self.regs[get_reg(op1)] = value
273 | 
274 |             elif mnem.startswith("STR"):
275 |                 assert op2.type in [ida_ua.o_displ, ida_ua.o_mem, ida_ua.o_phrase]
276 |                 if get_reg(op1) not in self.regs:
277 |                     return
278 | 
279 |                 self.memory[value] = self.regs[get_reg(op1)]
280 | 
281 |             elif any(mnem.startswith(word) for word in ["ADD", "SUB", "MUL"]):
282 |                 if op2.type == ida_ua.o_imm:
283 |                     if get_reg(op1) not in self.regs:
284 |                         return
285 | 
286 |                     value = operator(self.regs[get_reg(op1)], value)
287 | 
288 |                 self.regs[get_reg(op1)] = value
289 | 
290 |             return True
291 | 
292 |         else:
293 |             # Skip unknown instructions
294 |             return True
295 | 
296 |         # This should not be reached.
297 |         print(hex(ea), idc.GetDisasm(ea))
298 |         assert False
299 | 
300 | 
301 | class SimpleBackwardSlicer(object):
302 |     def __init__(self):
303 |         self.visited = set()
304 |         self.values = dict()
305 |         self.memory = dict()
306 |         self.stack = dict()
307 |         self.func_start = None
308 |         self.inter = False
309 | 
310 |     def find_reg_value(
311 |         self, ea, reg_name, end_ea=idc.BADADDR, inter=False, end_cnt=100
312 |     ):
313 |         # not inside a function
314 |         func_start = idc.get_func_attr(ea, idc.FUNCATTR_START)
315 |         if func_start == idc.BADADDR:
316 |             return
317 | 
318 |         self.func_start = func_start
319 |         self.inter = inter
320 | 
321 |         return self.find_reg_value_helper(ea, reg_name, end_ea, end_cnt)
322 | 
323 |     def find_reg_value_helper(self, ea, reg_name, end_ea, end_cnt, offset=None):
324 |         if (ea, reg_name) in self.values:
325 |             return self.values[(ea, reg_name)]
326 | 
327 |         if end_ea != idc.BADADDR and ea < end_ea:
328 |             return
329 | 
330 |         if end_cnt == 0:
331 |             return
332 | 
333 |         # not inside a function
334 |         func_addr = idc.get_func_attr(ea, idc.FUNCATTR_START)
335 |         if func_addr == idc.BADADDR:
336 |             return
337 | 
338 |         # out of current function
339 |         if not self.inter and func_addr != self.func_start:
340 |             return
341 | 
342 |         # the address may hold data rather than code
343 |         mnem = ida_ua.ua_mnem(ea)
344 |         if not ida_ua.can_decode(ea) or not mnem:
345 |             return
346 | 
347 |         # we need to check at most 4 operands
348 |         insn = ida_ua.insn_t()
349 |         inslen = ida_ua.decode_insn(insn, ea)
350 |         op1 = insn.ops[0]
351 |         op2 = insn.ops[1]
352 |         op3 = insn.ops[2]
353 |         op4 = insn.ops[3]
354 | 
355 |         if any(mnem.startswith(word) for word in ["MOV", "LDR"]):
356 |             assert op2 is not None
357 | 
358 |             # first argument should be reg_name
359 |             if get_reg(op1) != reg_name:
360 |                 return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset)
361 | 
362 |             # follow new register
363 |             if op2.type == ida_ua.o_reg:
364 |                 if get_reg(op2) == "SP":
365 |                     offset = 0
366 |                 return self.proceed_backward(
367 |                     ea, get_reg(op2), end_ea, end_cnt - 1, offset
368 |                 )
369 | 
370 |             # in the stack. need to check when the value is stored
371 |             # o_displ = [Base Reg + Index Reg + Displacement]
372 |             elif op2.type == ida_ua.o_displ:
373 |                 values = self.proceed_backward(
374 |                     ea, get_reg(op2), end_ea, end_cnt - 1, op2.addr
375 |                 )
376 |                 values = set(filter(lambda x: x, values))
377 |                 if mnem.startswith("LDR"):
378 |                     return set(map(lambda x: ida_bytes.get_dword(x), values))
379 |                 else:
380 |                     return values
381 | 
382 |             # reference the memory and get the value written in the memory
383 |             elif op2.type == ida_ua.o_mem:
384 |                 # TODO: implement memory access
385 | 
386 |                 # we assume that this memory is not initialized.
387 |                 if mnem.startswith("LDR"):
388 |                     return set([ida_bytes.get_dword(op2.addr)])
389 |                 else:
390 |                     return set([op2.addr])
391 | 
392 |             # immediate value, we get the value right away
393 |             elif op2.type == ida_ua.o_imm:
394 |                 return set([op2.value])
395 | 
396 |             elif op2.type == ida_ua.o_phrase:
397 |                 assert mnem.startswith("LDR")
398 | 
399 |                 phrase_val = self.proceed_backward(
400 |                     ea, get_reg(op3), end_ea, end_cnt - 1, offset
401 |                 )
402 |                 if not phrase_val:
403 |                     return
404 | 
405 |                 op2_val = self.proceed_backward(
406 |                     ea, get_reg(op2), end_ea, end_cnt - 1, offset
407 |                 )
408 |                 if not op2_val:
409 |                     return
410 | 
411 |                 operator = lambda x1, x2: x1 + x2
412 |                 values = merge_op_vals(op2_val, phrase_val, operator)
413 | 
414 |                 return set(map(lambda x: ida_bytes.get_dword(x + op2.phrase), values))
415 | 
416 |             return
417 | 
418 |         # only tracks values stored to the stack
419 |         elif any(mnem.startswith(word) for word in ["STR"]):
420 |             assert op2 is not None
421 | 
422 |             if op3 and op3.type != ida_ua.o_void:
423 |                 target_op = op3
424 |             else:
425 |                 target_op = op2
426 | 
427 |             # arguments should include reg_name
428 |             if get_reg(target_op) != reg_name:
429 |                 return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset)
430 | 
431 |             # in the stack. need to check when the value is stored
432 |             # o_displ = [Base Reg + Index Reg + Displacement]
433 |             if target_op.type == ida_ua.o_displ:
434 |                 target_memory = self.stack
435 | 
436 |             # we assume that memory is not initialized.
437 |             # reference the memory and get the value written in the memory
438 |             elif target_op.type == ida_ua.o_mem:
439 |                 assert get_reg(target_op) != "SP"
440 |                 target_memory = self.memory
441 | 
442 |             else:
443 |                 return
444 | 
445 |             if target_op == op2:
446 |                 if target_op.addr == offset:
447 |                     self.stack[target_op.addr] = self.proceed_backward(
448 |                         ea, get_reg(op1), end_ea, end_cnt - 1
449 |                     )
450 |                     return self.stack[target_op.addr]
451 |             else:
452 |                 if target_op.addr == offset:
453 |                     self.stack[target_op.addr] = self.proceed_backward(
454 |                         ea, get_reg(op1), end_ea, end_cnt - 1
455 |                     )
456 |                     return self.stack[target_op.addr]
457 |                 elif target_op.addr + 4 == offset:
458 |                     self.stack[target_op.addr + 4] = self.proceed_backward(
459 |                         ea, get_reg(op2), end_ea, end_cnt - 1
460 |                     )
461 |                     return self.stack[target_op.addr + 4]
462 | 
463 |             return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset)
464 | 
465 |         elif any(mnem.startswith(word) for word in ["ADD", "SUB", "MUL"]):
466 |             assert op2 is not None
467 | 
468 |             if mnem.startswith("ADD"):
469 |                 operator = lambda x1, x2: x1 + x2
470 |             elif mnem.startswith("SUB"):
471 |                 operator = lambda x1, x2: x1 - x2
472 |             elif mnem.startswith("MUL"):
473 |                 operator = lambda x1, x2: x1 * x2
474 | 
475 |             # TODO: Handle stack variable
476 |             # Check how to follow below.
477 |             # STR R5, [SP #8]
478 |             # STR R4, [SP #4]
479 |             # ADD R3, SP, #4
480 |             # ADD R2, R3, #4
481 |             if get_reg(op1) != reg_name:
482 |                 return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset)
483 | 
484 |             # Two arguments
485 |             if not op3 or op3.type == ida_ua.o_void:
486 |                 if op2.type == ida_ua.o_reg:
487 |                     op1_val = self.proceed_backward(
488 |                         ea, reg_name, end_ea, end_cnt - 1, offset
489 |                     )
490 |                     if not op1_val:
491 |                         return
492 | 
493 |                     op2_val = self.proceed_backward(
494 |                         ea, get_reg(op2), end_ea, end_cnt - 1
495 |                     )
496 |                     if not op2_val:
497 |                         return
498 | 
499 |                     return merge_op_vals(op1_val, op2_val, operator)
500 | 
501 |                 elif op2.type == ida_ua.o_imm:
502 |                     op1_val = self.proceed_backward(
503 |                         ea, reg_name, end_ea, end_cnt - 1, offset
504 |                     )
505 |                     return set(map(lambda x: operator(x, op2.value), op1_val))
506 | 
507 |                 else:
508 |                     return
509 | 
510 |             if op2.type != ida_ua.o_reg:
511 |                 # This should not be reached.
512 |                 print(hex(ea), idc.GetDisasm(ea), reg_name, op2.type)
513 |                 assert False
514 | 
515 |             # Three or more operands
516 |             # follow new register
517 |             # ADD R0, R1, R2
518 |             if op3.type == ida_ua.o_reg:
519 |                 op2_val = self.proceed_backward(
520 |                     ea, get_reg(op2), end_ea, end_cnt - 1, offset
521 |                 )
522 |                 # if we cannot fetch the value, stop the analysis
523 |                 if not op2_val:
524 |                     return
525 | 
526 |                 op3_val = self.proceed_backward(
527 |                     ea, get_reg(op3), end_ea, end_cnt - 1, offset
528 |                 )
529 |                 # if we cannot fetch the value, stop the analysis
530 |                 if not op3_val:
531 |                     return
532 | 
533 |                 # MLA R0, R1, R2, R3
534 |                 if op4 and op4.type == ida_ua.o_reg:
535 |                     op4_val = self.proceed_backward(
536 |                         ea, get_reg(op4), end_ea, end_cnt - 1, offset
537 |                     )
538 |                     if not op4_val:
539 |                         return
540 | 
541 |                     return merge_op_vals(
542 |                         merge_op_vals(op2_val, op3_val, operator), op4_val, operator
543 |                     )
544 | 
545 |                 return merge_op_vals(op2_val, op3_val, operator)
546 | 
547 |             # immediate value, we get the value right away
548 |             # ADD R0, R1, #123
549 |             elif op3.type == ida_ua.o_imm:
550 |                 return self.proceed_backward(
551 |                     ea, get_reg(op2), end_ea, end_cnt - 1, operator(0, op3.value)
552 |                 )
553 | 
554 |             # ADD R0, R1, R2,LSL#2
555 |             # o_idpspec0~5
556 |             elif op3.type == ida_ua.o_idpspec0:
557 |                 # processor specific type 'LSL'
558 |                 op3_val = self.proceed_backward(
559 |                     ea, get_reg(op3), end_ea, end_cnt - 1, offset
560 |                 )
561 |                 # if we cannot fetch the value, stop the analysis
562 |                 if not op3_val:
563 |                     return
564 |                 op3_val = set(map(lambda x: x << op3.value, op3_val))
565 |                 op2_val = self.proceed_backward(
566 |                     ea, get_reg(op2), end_ea, end_cnt - 1, offset
567 |                 )
568 | 
569 |                 return merge_op_vals(op2_val, op3_val, operator)
570 | 
571 |             else:
572 |                 return
573 | 
574 |         else:
575 |             return self.proceed_backward(ea, reg_name, end_ea, end_cnt - 1, offset)
576 | 
577 |     def proceed_backward(self, ea, reg_name, end_ea, end_cnt, offset=None):
578 |         # initialize prev code points
579 |         values = set()
580 |         xref = ida_xref.get_first_cref_to(ea)
581 |         while xref and xref != idc.BADADDR:
582 |             tmp_values = self.find_reg_value_helper(
583 |                 xref, reg_name, end_ea, end_cnt, offset
584 |             )
585 |             if tmp_values:
586 |                 tmp_values = list(map(lambda x: x & 0xFFFFFFFF, tmp_values))
587 |                 values.update(tmp_values)
588 | 
589 |             xref = ida_xref.get_next_cref_to(ea, xref)
590 | 
591 |         self.values[(ea, reg_name)] = values
592 | 
593 |         return values
594 | 
595 | 
596 | def find_mnem(target_ea, target_mnem, backward=False, threshold=0x100):
597 |     assert target_ea is not None
598 |     ea = target_ea
599 |     addr = None
600 |     visited = set()
601 |     while True:
602 |         mnem = ida_ua.ua_mnem(ea)
603 |         if not ida_ua.can_decode(ea) or not mnem:
604 |             break
605 |         if mnem == target_mnem:
606 |             addr = ea
607 |             break
608 | 
609 |         visited.add(ea)
610 | 
611 |         if backward:
612 |             next_ea = ida_xref.get_first_cref_to(ea)
613 |             if next_ea < target_ea - threshold:
614 |                 break
615 | 
616 |             while next_ea != idc.BADADDR and next_ea in visited:
617 |                 next_ea = ida_xref.get_next_cref_to(ea, next_ea)
618 | 
619 |         else:
620 |             next_ea = ida_xref.get_first_cref_from(ea)
621 |             if next_ea > target_ea + threshold:
622 |                 break
623 | 
624 |             while next_ea != idc.BADADDR and next_ea in visited:
625 |                 next_ea = ida_xref.get_next_cref_from(ea, next_ea)
626 | 
627 |         ea = next_ea
628 | 
629 |     return addr
630 | 
631 | 
632 | def fetch_arg_one(ea, reg_name, end_ea=idc.BADADDR, end_cnt=100):
633 |     slicer = SimpleBackwardSlicer()
634 |     values = slicer.find_reg_value(ea, reg_name, end_ea=end_ea, end_cnt=end_cnt)
635 |     if not values:
636 |         return idc.BADADDR
637 | 
638 |     return values.pop()
639 | 
640 | 
641 | def find_args(ea, num_regs, limit=10):
642 |     registers = ["R%d" % (i) for i in range(num_regs)]
643 |     return [fetch_arg_one(ea, reg, end_cnt=limit) for reg in registers]
644 | 


--------------------------------------------------------------------------------
/basespec/spec/24008-f80.doc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24008-f80.doc


--------------------------------------------------------------------------------
/basespec/spec/24011-f30.doc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24011-f30.doc


--------------------------------------------------------------------------------
/basespec/spec/24080-f10.doc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24080-f10.doc


--------------------------------------------------------------------------------
/basespec/spec/24301-f80.doc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/24301-f80.doc


--------------------------------------------------------------------------------
/basespec/spec/44018-f50.doc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/44018-f50.doc


--------------------------------------------------------------------------------
/basespec/spec/ts_124008v150800p.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/ts_124008v150800p.pdf


--------------------------------------------------------------------------------
/basespec/spec/ts_124301v150800p.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/basespec/spec/ts_124301v150800p.pdf


--------------------------------------------------------------------------------
/basespec/structs/l3msg.py:
--------------------------------------------------------------------------------
 1 | class IeInfo:
 2 |     def __init__(self, msg_type, name, iei, min, max, imperative):
 3 |         self.type = msg_type
 4 |         self.name = name
 5 |         self.iei = iei
 6 |         self.min = min
 7 |         self.max = max
 8 |         self.imperative = imperative
 9 | 
10 |     def __repr__(self):
11 |         if self.imperative:
12 |             imper = 'imperative'
13 |         else:
14 |             imper = 'non-imperative'
15 |         res = "<IE {} (0x{:02X}, 0x{:02X}) {}".format(imper, self.type, self.iei, self.name)
16 |         length = (
17 |             str(self.min)
18 |             if self.min == self.max
19 |             else "{}-{}".format(self.min, self.max)
20 |         )
21 |         res += " len: {}".format(length)
22 |         res += ">"
23 |         return res
24 | 
25 | 
26 | class L3MsgInfo:
27 |     def __init__(self, pd, msg_type, name, direction, ie_list):
28 |         self.pd = pd
29 |         self.type = msg_type
30 |         self.name = name
31 |         self.direction = direction
32 |         self.ie_list = ie_list  # A list of IeInfo instances.
33 |         self.ie_num = len(ie_list)
34 | 
35 |     def __repr__(self):
36 |         res = "L3Msg (0x{:02X})".format(self.type)
37 |         res += " {} {}".format(self.direction, self.ie_num)
38 |         for idx, ie in enumerate(self.ie_list):
39 |             res += "\n\t0x{:02x}: {}".format(idx, ie)
40 |         res += "\n"
41 |         return res
42 | 
43 | 
44 | class L3ProtInfo:
45 |     def __init__(self, pd, msg_list):
46 |         self.pd = pd
47 |         self.msg_list = msg_list  # A list of L3MsgInfo instances.
48 |         self.msg_num = len(msg_list)
49 | 
50 |     def __repr__(self):
51 |         res = "L3Prot ({}) {} msg".format(self.pd, len(self.msg_list))
52 |         return res
53 | 


--------------------------------------------------------------------------------
/basespec/utils.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import string
  3 | import re
  4 | import pickle
  5 | 
  6 | import idc
  7 | import ida_ua
  8 | import ida_segment
  9 | import ida_bytes
 10 | 
 11 | STR_CHARS = string.ascii_letters + string.digits + "_ "
 12 | STR_CHARS = STR_CHARS.encode()
 13 | STR_COMMENT_CHARS = string.ascii_letters + string.punctuation + string.digits + " \t\n"
 14 | STR_COMMENT_CHARS = STR_COMMENT_CHARS.encode()
 15 | FUNCNAME_CHARS = string.ascii_letters + "_" + string.digits
 16 | FUNCNAME_CHARS = FUNCNAME_CHARS.encode()
 17 | 
 18 | def create_string(ea, length=idc.BADADDR):
 19 |     s = get_string(ea, length)
 20 |     if s:
 21 |         idc.create_strlit(ea, ea + len(s))
 22 | 
 23 |     return s
 24 | 
 25 | # ida's get_strlit_contents filters 0x1d, 0x9, 0x6, etc.
 26 | def get_string(ea, length=idc.BADADDR):
 27 |     end_ea = ea + length
 28 |     ret = []
 29 | 
 30 |     while ea < end_ea:
  31 |         # stop if no byte value is loaded at the current ea
 32 |         if not idc.is_loaded(ea):
 33 |             break
 34 | 
 35 |         byte = ida_bytes.get_byte(ea)
 36 |         if byte == 0:  # NULL terminate
 37 |             break
 38 | 
 39 |         ret.append(byte)
 40 |         ea += 1
 41 | 
 42 |     return bytes(ret)
 43 | 
 44 | def check_string(ea, length=idc.BADADDR):
 45 |     global STR_CHARS, STR_COMMENT_CHARS
 46 | 
 47 |     if isinstance(ea, int):
 48 |         s = get_string(ea, length)
 49 |     elif isinstance(ea, str):
 50 |         s = ea.encode()
 51 |     elif isinstance(ea, bytes):
 52 |         s = ea
 53 | 
 54 |     if not s:
 55 |         return False
 56 | 
 57 |     # strict limit
 58 |     if len(s) < 4:
 59 |         return False
 60 | 
 61 |     #    # possible strings
 62 |     #    if len(s) < 10:
 63 |     #        if any(ch not in STR_CHARS for ch in s):
 64 |     #            return False
 65 | 
 66 |     # highly likely comments
 67 | 
 68 |     if any(ch not in STR_COMMENT_CHARS for ch in s):
 69 |         return False
 70 | 
 71 |     return True
 72 | 
 73 | 
 74 | def check_funcname(data_ptr, length=idc.BADADDR):
 75 |     global FUNCNAME_CHARS
 76 | 
 77 |     if isinstance(data_ptr, int):
 78 |         s = get_string(data_ptr, length)
 79 |     elif isinstance(data_ptr, str):
 80 |         s = data_ptr.encode()
 81 |     elif isinstance(data_ptr, bytes):
 82 |         s = data_ptr
 83 |     else:
 84 |         raise Exception
 85 | 
 86 |     if not s:
 87 |         return False
 88 | 
 89 |     if len(s) < 8:
 90 |         return False
 91 | 
  92 |     # function names are assumed to be at most 30 characters long.
 93 |     if len(s) > 30:
 94 |         return False
 95 | 
 96 |     if s.upper() == s:
 97 |         return False
 98 | 
 99 |     if any(ch not in FUNCNAME_CHARS for ch in s):
100 |         return False
101 | 
102 |     # TODO: add other func name checks
103 |     return True
104 | 
105 | 
106 | # deprecated.
107 | def set_funcname(ea, name):
108 |     func_addr = idc.get_func_attr(ea, idc.FUNCATTR_START)
109 |     if func_addr == idc.BADADDR:
110 |         return
111 |     return set_entry_name(func_addr, name)
112 | 
113 | 
114 | def set_entry_name(ea, name):
115 |     cur_name = idc.get_name(ea)
116 |     if cur_name.startswith(name):
117 |         return cur_name
118 | 
119 |     name = check_name(name)
120 |     status = idc.set_name(ea, name)
121 |     if status:
122 |         return name
123 |     else:
124 |         return
125 | 
126 | 
127 | def is_name_exist(name):
128 |     addr = idc.get_name_ea_simple(name)
129 |     # if name already exists, we need to assign new name with suffix
130 |     if addr != idc.BADADDR:
131 |         return True
132 |     else:
133 |         return False
134 | 
135 | 
136 | def check_name(orig_name):
137 |     name = orig_name
138 |     idx = 1
139 |     while is_name_exist(name):
140 |         name = "%s_%d" % (orig_name, idx)
141 |         idx += 1
142 | 
143 |     return name
144 | 
145 | def is_func(ea):
146 |     if ea == idc.BADADDR:
147 |         return False
148 | 
149 |     start_ea = idc.get_func_attr(ea, idc.FUNCATTR_START)
150 |     end_ea = idc.get_func_attr(ea, idc.FUNCATTR_END)
151 | 
152 |     return start_ea <= ea < end_ea
153 | 
154 | 
155 | def is_thumb(ea):
156 |     return idc.get_sreg(ea, "T") == 1
157 | 
158 | 


--------------------------------------------------------------------------------
/examples/ex_check_spec.py:
--------------------------------------------------------------------------------
 1 | from basespec.analyze_spec import check_spec
 2 | from basespec.structs.l3msg import IeInfo, L3MsgInfo, L3ProtInfo
 3 | 
 4 | # EMM protocol
 5 | pd = 7
 6 | 
 7 | # EMM attach accept message
 8 | msg_type = 0x42
 9 | 
10 | # Build a message
11 | # The information should be extracted from embedded message structures in the binary.
12 | IE_list = []
13 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True))
14 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True))
15 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=1, max=1, imperative=True))
16 | IE_list.append(IeInfo(msg_type, name="", iei=0, min=6, max=96, imperative=True))
17 | #IE_list.append(IeInfo(msg_type, name="", iei=0, min=0, max=32767, imperative=True)) #missing
18 | IE_list.append(IeInfo(msg_type, name="", iei=0x50, min=11, max=11, imperative=False))
19 | IE_list.append(IeInfo(msg_type, name="", iei=0x13, min=5, max=5, imperative=False))
20 | IE_list.append(IeInfo(msg_type, name="", iei=0x23, min=5, max=8, imperative=False))
21 | IE_list.append(IeInfo(msg_type, name="", iei=0x53, min=1, max=1, imperative=False))
22 | IE_list.append(IeInfo(msg_type, name="", iei=0x4A, min=1, max=99, imperative=False)) #invalid
23 | IE_list.append(IeInfo(msg_type, name="", iei=0xFF, min=5, max=5, imperative=False)) #unknown
24 | attach_accept_msg = L3MsgInfo(pd, msg_type, name="Attach accept", direction="DL", ie_list=IE_list)
25 | 
26 | # Build protocol
27 | EMM_prot = L3ProtInfo(pd, [attach_accept_msg])
28 | 
29 | l3_list = [EMM_prot]
30 | 
31 | # Compare with specification
32 | check_spec(l3_list, pd)
33 | 


--------------------------------------------------------------------------------
/examples/ex_get_spec_msgs.py:
--------------------------------------------------------------------------------
1 | from basespec import parse_spec
2 | spec_msgs = parse_spec.get_spec_msgs() # Format: msgs[pd][msg_type] = ie_list
3 | emm_msgs = spec_msgs[7] # 7 : the type of EPS Mobility Management
4 | smc_ie_list = emm_msgs[0x5d] # 0x5d : the type of SECURITY MODE COMMAND
5 | 


--------------------------------------------------------------------------------
/examples/ex_init_functions.py:
--------------------------------------------------------------------------------
1 | from basespec import preprocess
2 | preprocess.init_functions()
3 | preprocess.FUNC_BY_LS # identified functions by linear sweep prologue detection
4 | preprocess.FUNC_BY_LS_TIME # time spent for linear sweep prologue detection
5 | preprocess.FUNC_BY_PTR # identified functions by pointer analysis
6 | preprocess.FUNC_BY_PTR_TIME # time spent for pointer analysis
7 | 


--------------------------------------------------------------------------------
/examples/ex_init_strings.py:
--------------------------------------------------------------------------------
1 | from basespec import preprocess
2 | preprocess.init_strings()
3 | 


--------------------------------------------------------------------------------
/examples/ex_run_scatterload.py:
--------------------------------------------------------------------------------
1 | from basespec import scatterload
2 | scatterload.run_scatterload()
3 | 


--------------------------------------------------------------------------------
/load_ida.py:
--------------------------------------------------------------------------------
1 | #need to import here for IDA
2 | import basespec
3 | 


--------------------------------------------------------------------------------
/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SysSec-KAIST/BaseSpec/e027413148ce79f53bfdabb3bd5e6c2ffb291dcc/overview.png


--------------------------------------------------------------------------------