├── README ├── gpl.txt ├── ida2sql.cfg ├── ida2sql.py └── ida_to_sql ├── __init__.py ├── arch ├── __init__.py ├── arch.py ├── arm.py ├── metapc.py └── ppc.py ├── common.py ├── db_statements.py ├── db_statements_v2.py ├── functional_unit.py ├── ida_to_sql.py ├── instrumentation.py ├── memory_info.py └── sql_exporter.py /README: -------------------------------------------------------------------------------- 1 | zynamics IDA2SQL IDA Pro Plugin has moved to Google Code 2 | ======================================================== 3 | 4 | This repository has moved to Google Code: 5 | http://code.google.com/p/zynamics/source/checkout?repo=ida2sql 6 | 7 | 8 | 9 | Introduction: 10 | ---------------------- 11 | 12 | The module is distributed as two files: 13 | 14 | -ida2sql.py (the file run from IDA; it imports 15 | the main part of the code from the ZIP file) 16 | -ida2sql.zip (a zipped module implementing all the 17 | functionality, created by zipping the 18 | ida_to_sql directory) 19 | 20 | 21 | License: 22 | ---------------------- 23 | ida2sql is licensed under the GPL 2 license. See gpl.txt for more information. 24 | 25 | 26 | Installation: 27 | ------------- 28 | 29 | Drop the ZIP file into the IDA plugins directory. That's all. Just run the 30 | ida2sql.py script from within IDA afterwards. 31 | 32 | 33 | The configuration file: 34 | ----------------------- 35 | 36 | If a file named "ida2sql.cfg" is placed in the IDA top-level folder, the 37 | database information will be loaded from it, allowing you to export quickly 38 | by just running "ida2sql.py" from within IDA. 39 | See the example ida2sql.cfg file included. 40 | 41 | 42 | Special features/Notes: 43 | ----------------- 44 | 45 | If a file named "function_set.txt" exists in IDA's root directory (where 46 | IDA's binaries reside) it will be loaded and only the functions listed in it 47 | will be exported. 
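As a sketch, such a "function_set.txt" could be generated like this. The addresses used here are hypothetical placeholders, and writing them as bare hex digits without a "0x" prefix is an assumption about the expected format:

```python
# Sketch: generate a "function_set.txt" limiting the export to a few
# functions. The addresses below are hypothetical placeholders; the file
# holds one hexadecimal function start address per line (whether a "0x"
# prefix is also accepted is an assumption not confirmed by the README).
function_starts = [0x401000, 0x401a30, 0x40205c]

with open('function_set.txt', 'w') as f:
    for ea in function_starts:
        f.write('%x\n' % ea)
```

The file is then placed next to IDA's binaries, as described above, before running the exporter.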
48 | 49 | The file should contain the hexadecimal start address of the functions 50 | that should be exported, one per line. 51 | 52 | Note 1: if there are any imported symbols referred to from those functions, 53 | those will also be exported. 54 | 55 | Note 2: The exporter might display some errors towards the end of the export. 56 | It will complain because not all the constraints of the database are fulfilled 57 | as there might be references to functions that are not exported. Those errors 58 | can be ignored. 59 | 60 | -------------------------------------------------------------------------------- /gpl.txt: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 2, June 1991 3 | 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc. 5 | 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your 12 | freedom to share and change it. By contrast, the GNU General Public 13 | License is intended to guarantee your freedom to share and change free 14 | software--to make sure the software is free for all its users. This 15 | General Public License applies to most of the Free Software 16 | Foundation's software and to any other program whose authors commit to 17 | using it. (Some other Free Software Foundation software is covered by 18 | the GNU Library General Public License instead.) You can apply it to 19 | your programs, too. 20 | 21 | When we speak of free software, we are referring to freedom, not 22 | price. 
Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 
55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. 
You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. 
But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. 
(This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. 
These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 
214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. 
If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | 294 | Copyright (C) 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 
305 | 306 | You should have received a copy of the GNU General Public License 307 | along with this program; if not, write to the Free Software 308 | Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 309 | 310 | 311 | Also add information on how to contact you by electronic and paper mail. 312 | 313 | If the program is interactive, make it output a short notice like this 314 | when it starts in an interactive mode: 315 | 316 | Gnomovision version 69, Copyright (C) year name of author 317 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 318 | This is free software, and you are welcome to redistribute it 319 | under certain conditions; type `show c' for details. 320 | 321 | The hypothetical commands `show w' and `show c' should show the appropriate 322 | parts of the General Public License. Of course, the commands you use may 323 | be called something other than `show w' and `show c'; they could even be 324 | mouse-clicks or menu items--whatever suits your program. 325 | 326 | You should also get your employer (if you work as a programmer) or your 327 | school, if any, to sign a "copyright disclaimer" for the program, if 328 | necessary. Here is a sample; alter the names: 329 | 330 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 331 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 332 | 333 | , 1 April 1989 334 | Ty Coon, President of Vice 335 | 336 | This General Public License does not permit incorporating your program into 337 | proprietary programs. If your program is a subroutine library, you may 338 | consider it more useful to permit linking proprietary applications with the 339 | library. If this is what you want to do, use the GNU Library General 340 | Public License instead of this License. 
341 | 342 | -------------------------------------------------------------------------------- /ida2sql.cfg: -------------------------------------------------------------------------------- 1 | [database] 2 | engine: MYSQL 3 | host: hostname 4 | schema: database_name 5 | user: username 6 | password: password 7 | 8 | 9 | [importing] 10 | 11 | # Operation mode: "batch" or "auto" means that no questions 12 | # or any other interaction will be requested from the user. 13 | # It's useful when running ida2sql from the command line: 14 | # 15 | # [ ida_executable.exe -A -OIDAPython:ida2sql.py database.idb|filename.exe ] 16 | # 17 | mode: batch 18 | 19 | # Default comment to set on module import 20 | # 21 | comment: "Imported by ida2sql" 22 | 23 | # Whether to process the raw section data and insert it into the 24 | # corresponding table 25 | # 26 | process_sections: no -------------------------------------------------------------------------------- /ida2sql.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 
17 | """ 18 | 19 | __author__ = 'Ero Carrera' 20 | __license__ = 'GPL' 21 | 22 | import os 23 | try: 24 | import idaapi 25 | except ImportError: 26 | # This module can sometimes be invoked outside IDA, so 27 | # don't blow up if that happens 28 | # 29 | pass 30 | 31 | 32 | ida2sql_path = os.environ.get('IDA2SQLPATH', None) 33 | 34 | if ida2sql_path: 35 | print 'Environment variable IDA2SQLPATH found: [%s]' % ida2sql_path 36 | os.sys.path.append(ida2sql_path) 37 | else: 38 | print 'Environment variable IDA2SQLPATH not found' 39 | os.sys.path.append(idaapi.idadir(os.path.join('plugins', 'ida2sql.zip'))) 40 | 41 | # Import the main module located in the IDA plugins directory 42 | # 43 | 44 | import ida_to_sql 45 | 46 | import ida_to_sql.common 47 | 48 | __version__ = ida_to_sql.common.__version__ 49 | 50 | # Start the exporter 51 | # 52 | ida_to_sql.ida_to_sql.main() 53 | 54 | #import cProfile 55 | #cProfile.run('ida_to_sql.ida_to_sql.main()', 'ida2sql_profiling_stats.txt') 56 | -------------------------------------------------------------------------------- /ida_to_sql/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """Sabre Security IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into Sabre-Security's SQL 6 | format. 7 | 8 | References: 9 | 10 | Sabre-Security GmbH: http://sabre-security.com/ 11 | MySQL: http://www.mysql.com 12 | IDA: http://www.datarescue.com/idabase/ 13 | 14 | Programmed and tested with IDA 5.0, Python 2.4.4 and IDAPython 0.8.0 on Windows 15 | by Ero Carrera (c) Sabre-Security 2006 [ero.carrera@sabre-security.com] 16 | 17 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 
18 | """ 19 | 20 | __author__ = 'Ero Carrera' 21 | __license__ = 'GPL' 22 | 23 | import ida_to_sql 24 | import common 25 | __version__ = common.__version__ 26 | -------------------------------------------------------------------------------- /ida_to_sql/arch/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | __all__ = ('metapc', 'ppc', 'arm') 3 | 4 | -------------------------------------------------------------------------------- /ida_to_sql/arch/arch.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 17 | """ 18 | 19 | 20 | """Generic architecture support for instruction parsing. 21 | 22 | All the architecture modules should extend this one and provide the 23 | defined methods. 
24 | """ 25 | 26 | import idautils 27 | import idaapi 28 | import idc 29 | import re 30 | 31 | 32 | def ida_hexify(val): 33 | """Utility function to render hex values as shown in IDA.""" 34 | if val < 10: 35 | return '%d' % val 36 | return '%xh' % val 37 | 38 | 39 | 40 | class Instruction: 41 | def __init__(self, itype, size, ip): 42 | self.itype = itype 43 | self.size = size 44 | self.ip = ip 45 | 46 | 47 | class ExpressionNamedValue: 48 | def __init__(self, value, name): 49 | self.value = value 50 | self.name = name 51 | 52 | 53 | class Arch: 54 | 55 | NODE_TYPE_OPERATOR_PLUS = '+' 56 | NODE_TYPE_OPERATOR_MINUS = '-' 57 | NODE_TYPE_OPERATOR_TIMES = '*' 58 | #NODE_TYPE_OPERATOR_DEREFERENCE = '[' 59 | NODE_TYPE_OPERATOR_LIST = '{' 60 | NODE_TYPE_OPERATOR_EXCMARK = '!' 61 | NODE_TYPE_OPERATOR_COMMA = ',' 62 | NODE_TYPE_OPERATOR_LSL = 'LSL' 63 | NODE_TYPE_OPERATOR_LSR = 'LSR' 64 | NODE_TYPE_OPERATOR_ASR = 'ASR' 65 | NODE_TYPE_OPERATOR_ROR = 'ROR' 66 | NODE_TYPE_OPERATOR_RRX = 'RRX' 67 | 68 | NODE_TYPE_OPERATOR_WIDTH_BYTE_1 = 'b1' # Byte 69 | NODE_TYPE_OPERATOR_WIDTH_BYTE_2 = 'b2' # Word 70 | NODE_TYPE_OPERATOR_WIDTH_BYTE_3 = 'b3' # 71 | NODE_TYPE_OPERATOR_WIDTH_BYTE_4 = 'b4' # Double-Word 72 | NODE_TYPE_OPERATOR_WIDTH_BYTE_5 = 'b5' # 73 | NODE_TYPE_OPERATOR_WIDTH_BYTE_6 = 'b6' # 74 | NODE_TYPE_OPERATOR_WIDTH_BYTE_7 = 'b7' # 75 | NODE_TYPE_OPERATOR_WIDTH_BYTE_8 = 'b8' # Quad-Word 76 | NODE_TYPE_OPERATOR_WIDTH_BYTE_9 = 'b9' # 77 | NODE_TYPE_OPERATOR_WIDTH_BYTE_10 = 'b10' # 78 | NODE_TYPE_OPERATOR_WIDTH_BYTE_12 = 'b12' # Packed Real Format mc68040 79 | NODE_TYPE_OPERATOR_WIDTH_BYTE_14 = 'b14' # 80 | NODE_TYPE_OPERATOR_WIDTH_BYTE_16 = 'b16' # 81 | NODE_TYPE_OPERATOR_WIDTH_BYTE_VARIABLE = 'b_var' # Variable size 82 | 83 | NODE_TYPE_VALUE = '#' 84 | NODE_TYPE_SYMBOL = '$' 85 | NODE_TYPE_REGISTER = 'r' 86 | NODE_TYPE_SIZE_PREFIX = 'S' 87 | NODE_TYPE_DEREFERENCE = '[' 88 | 89 | 90 | OPERATORS = (NODE_TYPE_OPERATOR_PLUS, NODE_TYPE_OPERATOR_MINUS, 91 | 
NODE_TYPE_OPERATOR_TIMES, #NODE_TYPE_OPERATOR_DEREFERENCE, 92 | NODE_TYPE_OPERATOR_LIST, NODE_TYPE_OPERATOR_EXCMARK, 93 | NODE_TYPE_OPERATOR_COMMA, NODE_TYPE_OPERATOR_LSL, NODE_TYPE_OPERATOR_LSR, 94 | NODE_TYPE_OPERATOR_ASR, NODE_TYPE_OPERATOR_ROR, NODE_TYPE_OPERATOR_RRX) 95 | 96 | WIDTH_OPERATORS = ( 97 | NODE_TYPE_OPERATOR_WIDTH_BYTE_1, NODE_TYPE_OPERATOR_WIDTH_BYTE_2, 98 | NODE_TYPE_OPERATOR_WIDTH_BYTE_3, NODE_TYPE_OPERATOR_WIDTH_BYTE_4, 99 | NODE_TYPE_OPERATOR_WIDTH_BYTE_5, NODE_TYPE_OPERATOR_WIDTH_BYTE_6, 100 | NODE_TYPE_OPERATOR_WIDTH_BYTE_7, NODE_TYPE_OPERATOR_WIDTH_BYTE_8, 101 | NODE_TYPE_OPERATOR_WIDTH_BYTE_9, NODE_TYPE_OPERATOR_WIDTH_BYTE_10, 102 | NODE_TYPE_OPERATOR_WIDTH_BYTE_12, 103 | NODE_TYPE_OPERATOR_WIDTH_BYTE_14, NODE_TYPE_OPERATOR_WIDTH_BYTE_16, 104 | NODE_TYPE_OPERATOR_WIDTH_BYTE_VARIABLE) 105 | 106 | LEAFS = (NODE_TYPE_SYMBOL, NODE_TYPE_VALUE) 107 | 108 | 109 | def __init__(self): 110 | 111 | # To be set by the architecture specific module if specific instructions 112 | # exist for the purpose 113 | # 114 | self.INSTRUCTIONS_CALL = [] 115 | self.INSTRUCTIONS_CONDITIONAL_BRANCH = [] 116 | self.INSTRUCTIONS_UNCONDITIONAL_BRANCH = [] 117 | self.INSTRUCTIONS_RET = [] 118 | self.INSTRUCTIONS_BRANCH = [] 119 | 120 | if hasattr( idaapi, 'get_inf_structure' ): 121 | inf = idaapi.get_inf_structure() 122 | else: 123 | inf = idaapi.cvar.inf 124 | 125 | # Find the null character of the string (if any) 126 | # 127 | null_idx = inf.procName.find(chr(0)) 128 | if null_idx > 0: 129 | self.processor_name = inf.procName[:null_idx] 130 | else: 131 | self.processor_name = inf.procName 132 | 133 | self.os_type = inf.ostype 134 | self.asmtype = inf.asmtype 135 | 136 | # RegExp to parse stack variable names, as IDA 137 | # returns a string containing some sort of reference 138 | # to their frame. 
139 | # 140 | self.stack_name_parse = re.compile(r'.*fr[0-9a-f]+\.([^ ].*)') 141 | 142 | 143 | self.current_instruction_type = None 144 | 145 | # To be filled by the architecture module 146 | # 147 | self.arch_name = None 148 | 149 | def as_byte_value(self, c): 150 | """Helper function to deal with the changing type of some byte-size fields. 151 | 152 | In older versions of IDAPython those were returned as characters while in newer 153 | ones they are returned as ints. This will always return an int. 154 | """ 155 | 156 | if isinstance(c, str): 157 | return ord(c) 158 | 159 | return c 160 | 161 | def get_architecture_name(self): 162 | """Fetch the name to be used to identify the architecture.""" 163 | 164 | # Get the addressing mode of the first segment in the IDB and 165 | # set it to describe the module in the database. 166 | # This would need to be rethought for the cases where addressing 167 | # might change within a module. 168 | # 169 | bitness = idc.GetSegmentAttr( list( idautils.Segments() )[0], idc.SEGATTR_BITNESS) 170 | 171 | if bitness == 0: 172 | bitness = 16 173 | elif bitness == 1: 174 | bitness = 32 175 | elif bitness == 2: 176 | bitness = 64 177 | 178 | return '%s-%d' % (self.arch_name, bitness) 179 | 180 | 181 | def get_stack_var_name(self, var): 182 | """Get the name of a stack variable and return it parsed.""" 183 | 184 | var_name = idaapi.get_struc_name(var.id) 185 | if not isinstance(var_name, str): 186 | return None 187 | 188 | res = self.stack_name_parse.match(var_name) 189 | if res: 190 | return res.group(1) 191 | else: 192 | #raise Exception('Cannot get operand name.') 193 | #print '*** Cannot get operand name!!! 
***' 194 | return None 195 | 196 | 197 | def get_address_name(self, value): 198 | """Return the name associated with the address.""" 199 | 200 | name = idc.Name(value) 201 | 202 | if name: 203 | return name 204 | 205 | return None 206 | 207 | 208 | def get_operand_stack_variable_name(self, address, op, idx): 209 | """Return the name of any variable referenced from this operand.""" 210 | 211 | if op.addr>2**31: 212 | addr = -(2**32-op.addr) 213 | else: 214 | addr = op.addr 215 | 216 | try: 217 | # In IDA 5.7 get_stkvar takes 2 arguments 218 | var = idaapi.get_stkvar(op, addr) 219 | except TypeError: 220 | # In earlier versions it takes 3... 221 | var = idaapi.get_stkvar(op, addr, None) 222 | 223 | if var: 224 | if isinstance(var, (tuple, list)): 225 | # get the member_t 226 | # In IDA 5.7 this returns a tuple: (member_t, actval) 227 | # so we need to get the actual object from the first 228 | # item. In previous versions that was what was returned 229 | var = var[0] 230 | 231 | func = idaapi.get_func(address) 232 | 233 | stackvar_offset = idaapi.calc_stkvar_struc_offset( 234 | func, address, idx) 235 | stackvar_start_offset = var.soff 236 | stackvar_offset_delta = stackvar_offset-stackvar_start_offset 237 | 238 | delta_str = '' 239 | 240 | if stackvar_offset_delta != 0: 241 | delta_str = '+0x%x' % stackvar_offset_delta 242 | 243 | 244 | disp_str = '' 245 | 246 | # 4 is the value of the stack pointer register SP/ESP in x86. This 247 | # should not break other archs but needs to be here, otherwise we would 248 | # need to override the whole method in metapc... 
249 | # 250 | if op.reg == 4: 251 | difference_orig_sp_and_current = idaapi.get_spd(func, address) 252 | disp_str = ida_hexify( -difference_orig_sp_and_current-idc.GetFrameRegsSize(address) ) + '+' 253 | 254 | name = self.get_stack_var_name(var) 255 | 256 | if name: 257 | return disp_str + name + delta_str 258 | 259 | return None 260 | 261 | 262 | def is_call(self, instruction=None): 263 | """Return whether the last instruction processed is a call or not.""" 264 | 265 | # If there are instructions defined as being specifically used for "calls" 266 | # we take those as a unique identifier for whether the instruction is 267 | # in fact a call or not 268 | # 269 | if self.INSTRUCTIONS_CALL: 270 | if instruction.itype in self.INSTRUCTIONS_CALL: 271 | return True 272 | else: 273 | return False 274 | 275 | if not instruction.itype in self.INSTRUCTIONS_BRANCH: 276 | return False 277 | 278 | trgt = list( idautils.CodeRefsFrom(instruction.ip, 0) ) 279 | if not trgt: 280 | trgt = list( idautils.DataRefsFrom(instruction.ip) ) 281 | 282 | if len(trgt) > 0: 283 | 284 | # When getting the name there's a fall back from 285 | # using GetFunctionName() to Name() as sometimes 286 | # imported functions are not defined as functions 287 | # and the former will return an empty string while 288 | # the latter will return the import name. 289 | # 290 | trgt_name = idc.GetFunctionName(trgt[0]) 291 | if trgt_name=='': 292 | trgt_name = idc.Name(trgt[0]) 293 | 294 | trgt_name_prev = idc.GetFunctionName(trgt[0]-1) 295 | if trgt_name_prev=='': 296 | trgt_name_prev = idc.Name(trgt[0]-1) 297 | 298 | # In order for the reference to be a call the following 299 | # must hold. 300 | # -There must be a valid function name 301 | # -The function name should be different at the target 302 | # address than the name in the immediately preceding 303 | # address (i.e. 
target must point to the beginning of a function) 304 | # -The function name should be different than the function 305 | # name of the branch source 306 | # 307 | if( trgt_name is not None and 308 | trgt_name != '' and 309 | trgt_name != trgt_name_prev and 310 | idc.GetFunctionName(instruction.ip) != trgt_name ): 311 | 312 | return True 313 | 314 | return False 315 | 316 | 317 | def is_end_of_flow(self, instruction): 318 | """Return whether the last instruction processed ends the flow.""" 319 | 320 | next_addr = instruction.ip+idc.ItemSize(instruction.ip) 321 | next_addr_flags = idc.GetFlags(next_addr) 322 | if idc.isCode(next_addr_flags) and idc.isFlow(next_addr_flags): 323 | return False 324 | 325 | return True 326 | 327 | 328 | def is_conditional_branch(self, instruction): 329 | """Return whether the instruction is a conditional branch""" 330 | 331 | next_addr = instruction.ip+idc.ItemSize(instruction.ip) 332 | next_addr_flags = idc.GetFlags(next_addr) 333 | if ( 334 | idc.isCode(next_addr_flags) and 335 | idc.isFlow(next_addr_flags) and 336 | (instruction.itype in self.INSTRUCTIONS_BRANCH) ): 337 | 338 | return True 339 | 340 | return False 341 | 342 | 343 | def is_unconditional_branch(self, instruction): 344 | """Return whether the instruction is an unconditional branch""" 345 | 346 | next_addr = instruction.ip+idc.ItemSize(instruction.ip) 347 | next_addr_flags = idc.GetFlags(next_addr) 348 | 349 | if ( (instruction.itype in self.INSTRUCTIONS_BRANCH) and 350 | ( (not idc.isCode(next_addr_flags)) or 351 | (not idc.isFlow(next_addr_flags)) ) ): 352 | 353 | return True 354 | 355 | return False 356 | 357 | # 358 | # Methods to override by implementing classes 359 | # 360 | 361 | def check_arch(self): 362 | """Test whether this module can process the current architecture.""" 363 | pass 364 | 365 | def process_instruction(self, packet, addr): 366 | """Architecture specific instruction processing. 
367 | 368 | Implementations can call 'process_instruction_generic', which 369 | will do some processing generic to all architectures. 370 | """ 371 | pass 372 | #return instruction 373 | 374 | 375 | def get_mnemonic(self, addr): 376 | """Return the mnemonic for the current instruction. 377 | 378 | Architecture specific modules can define a new method 379 | to process mnemonics in different ways. 380 | """ 381 | 382 | return idaapi.ua_mnem(addr) 383 | 384 | 385 | def operands_parser(self, address, operands): 386 | """Parse operands. 387 | 388 | Can be defined in architecture specific modules to 389 | process the whole list of operands before or after 390 | parsing, if necessary. In Intel, for instance, it is 391 | used to post-process operands where the target is 392 | also used as a source but included only once; that 393 | happens, for instance, with the IMUL instruction. 394 | """ 395 | 396 | op_list = [] 397 | 398 | for op, idx in operands: 399 | current_operand = self.single_operand_parser(address, op, idx) 400 | 401 | if not current_operand: 402 | continue 403 | 404 | if isinstance(current_operand[0], (list, tuple)): 405 | op_list.extend( current_operand ) 406 | else: 407 | op_list.append( current_operand ) 408 | 409 | operands = op_list 410 | 411 | return op_list 412 | 413 | 414 | #def process_instruction_generic(self, addr, operand_parser): 415 | def process_instruction_generic(self, addr): 416 | """Architecture agnostic instruction parsing.""" 417 | 418 | # Retrieve the instruction mnemonic 419 | # 420 | i_mnemonic = self.get_mnemonic(addr) 421 | if not i_mnemonic: 422 | return None, None, None, None, None 423 | 424 | # Set the current location to the instruction to disassemble 425 | # 426 | #idaapi.jumpto(addr) 427 | #idaapi.ua_ana0(addr) 428 | 429 | # Up to IDA 5.7 it was called ua_code... 430 | if hasattr(idaapi, 'ua_code'): 431 | # Gergely told me about using ua_code() and idaapi.cvar.cmd 432 | # instead of jumpto() and get_current_instruction(). 
The latter 433 | # were always making IDA reposition the cursor and refresh 434 | # the GUI, which was quite painful 435 | # 436 | idaapi.ua_code(addr) 437 | # Retrieve the current instruction's structure and 438 | # set its type 439 | ida_instruction = idaapi.cvar.cmd 440 | else: 441 | # now it's called decode_insn() 442 | idaapi.decode_insn(addr) 443 | # Retrieve the current instruction's structure and 444 | # set its type 445 | ida_instruction = idaapi.cmd 446 | 447 | 448 | instruction = Instruction( 449 | ida_instruction.itype, ida_instruction.size, ida_instruction.ip) 450 | self.current_instruction_type = instruction.itype 451 | 452 | 453 | # Try to process as many operands as IDA supports 454 | # 455 | # Up to IDA 5.7 it was called ua_code... so we use it to check for 5.7 456 | if hasattr(idaapi, 'ua_code'): 457 | operands = self.operands_parser( addr, [( 458 | idaapi.get_instruction_operand(ida_instruction, idx), 459 | idx ) for idx in range(6)] ) 460 | else: 461 | operands = self.operands_parser( addr, [( 462 | ida_instruction.Operands[idx], 463 | idx ) for idx in range(6)] ) 464 | 465 | # Retrieve the operand strings 466 | # 467 | operand_strings = [ 468 | idc.GetOpnd(addr, idx) for idx in range(len(operands))] 469 | 470 | # Get the instruction data 471 | # 472 | data = ''.join( 473 | [chr(idaapi.get_byte(addr+i)) for i in range(idc.ItemSize(addr))]) 474 | 475 | # Return the mnemonic and the operand AST 476 | # 477 | return instruction, i_mnemonic, operands, operand_strings, data 478 | 479 | -------------------------------------------------------------------------------- /ida_to_sql/arch/arm.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 
6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 17 | """ 18 | 19 | __author__ = 'Ero Carrera' 20 | __license__ = 'GPL' 21 | 22 | 23 | import arch 24 | import idc 25 | import idaapi 26 | 27 | 28 | # IDA's operand types 29 | # 30 | OPERAND_TYPE_NO_OPERAND = 0 31 | OPERAND_TYPE_REGISTER = 1 32 | OPERAND_TYPE_MEMORY = 2 33 | OPERAND_TYPE_PHRASE = 3 34 | OPERAND_TYPE_DISPLACEMENT = 4 35 | OPERAND_TYPE_IMMEDIATE = 5 36 | OPERAND_TYPE_FAR = 6 37 | OPERAND_TYPE_NEAR = 7 38 | OPERAND_TYPE_IDPSPEC0 = 8 39 | OPERAND_TYPE_IDPSPEC1 = 9 40 | OPERAND_TYPE_IDPSPEC2 = 10 41 | OPERAND_TYPE_IDPSPEC3 = 11 42 | OPERAND_TYPE_IDPSPEC4 = 12 # MMX register 43 | OPERAND_TYPE_IDPSPEC5 = 13 # XMM register 44 | 45 | 46 | 47 | class Arch(arch.Arch): 48 | """Architecture specific processing for 'ARM'""" 49 | 50 | 51 | INSTRUCTIONS = [ 'ARM_null', 'ARM_ret', 'ARM_nop', 'ARM_b', 'ARM_bl', 'ARM_asr', 'ARM_lsl', 'ARM_lsr', 'ARM_ror', 'ARM_neg', 'ARM_and', 'ARM_eor', 'ARM_sub', 'ARM_rsb', 'ARM_add', 'ARM_adc', 'ARM_sbc', 'ARM_rsc', 'ARM_tst', 'ARM_teq', 'ARM_cmp', 'ARM_cmn', 'ARM_orr', 'ARM_mov', 'ARM_bic', 'ARM_mvn', 'ARM_mrs', 'ARM_msr', 'ARM_mul', 'ARM_mla', 'ARM_ldr', 'ARM_ldrpc', 'ARM_str', 'ARM_ldm', 'ARM_stm', 'ARM_swp', 'ARM_swi', 'ARM_smull', 'ARM_smlal', 'ARM_umull', 'ARM_umlal', 'ARM_bx', 'ARM_pop', 'ARM_push', 'ARM_adr', 'ARM_bkpt', 'ARM_blx1', 'ARM_blx2', 'ARM_clz', 'ARM_ldrd', 'ARM_pld', 'ARM_qadd', 'ARM_qdadd', 'ARM_qdsub', 'ARM_qsub', 'ARM_smlabb', 'ARM_smlatb', 'ARM_smlabt', 'ARM_smlatt', 'ARM_smlalbb', 'ARM_smlaltb', 'ARM_smlalbt', 'ARM_smlaltt', 'ARM_smlawb', 'ARM_smulwb', 'ARM_smlawt', 'ARM_smulwt', 
'ARM_smulbb', 'ARM_smultb', 'ARM_smulbt', 'ARM_smultt', 'ARM_strd', 'xScale_mia', 'xScale_miaph', 'xScale_miabb', 'xScale_miabt', 'xScale_miatb', 'xScale_miatt', 'xScale_mar', 'xScale_mra', 'ARM_movl', 'ARM_swbkpt', 'ARM_cdp', 'ARM_cdp2', 'ARM_ldc', 'ARM_ldc2', 'ARM_stc', 'ARM_stc2', 'ARM_mrc', 'ARM_mrc2', 'ARM_mcr', 'ARM_mcr2', 'ARM_mcrr', 'ARM_mrrc', 'ARM_last'] 52 | 53 | 54 | # With IDA 5.5 D0-DX registers appeared with op.ref ranging in the 61-7X range. Don't know if there 55 | # are other registers defined earlier 56 | REGISTERS = ['R%d' % i for i in range(32)] + [None for i in range(32, 61)] + ['D%d' % (i-61) for i in range(61, 93)] + ['S%d' % (i-93) for i in range(93, 125)] 57 | REGISTERS[13] = 'SP' 58 | REGISTERS[14] = 'LR' 59 | REGISTERS[15] = 'PC' 60 | 61 | OPERATORS = arch.Arch.OPERATORS 62 | 63 | 64 | OPERAND_WIDTH = [ 65 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_1, 66 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_2, 67 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_4, 68 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_4, 69 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_8, 70 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_VARIABLE, 71 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_12, 72 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_8, 73 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_16, 74 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_2, None, 75 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_6, 76 | None, None, 77 | None, None] 78 | 79 | def __get_instruction_index(self, insn_list): 80 | """Retrieve the indices of the given instructions into the instruction table. 81 | 82 | Those indices are used to indicate the type of an instruction. 
83 | """ 84 | 85 | return [self.INSTRUCTIONS.index(i) for i in insn_list] 86 | 87 | 88 | def __init__(self): 89 | arch.Arch.__init__(self) 90 | 91 | #self.INSTRUCTIONS_CALL = self.__get_instruction_index(('ARM_bl',)) 92 | self.INSTRUCTIONS_CONDITIONAL_BRANCH = self.__get_instruction_index( 93 | ( )) 94 | self.INSTRUCTIONS_UNCONDITIONAL_BRANCH = self.__get_instruction_index( 95 | ( )) 96 | #self.INSTRUCTIONS_RET = self.__get_instruction_index((,)) 97 | 98 | self.INSTRUCTIONS_BRANCH = self.__get_instruction_index( 99 | ( 'ARM_b', 'ARM_blx1', 'ARM_blx2', 'ARM_bl', 'ARM_bx' )) 100 | 101 | self.arch_name = 'ARM' 102 | 103 | def generate_shift_tree(self, shift_value, first_value, second_value): 104 | shifts = ['LSL', 'LSR', 'ASR', 'ROR', 'RRX'] 105 | shift_types = [self.NODE_TYPE_OPERATOR_LSL, self.NODE_TYPE_OPERATOR_LSR, self.NODE_TYPE_OPERATOR_ASR, self.NODE_TYPE_OPERATOR_ROR, self.NODE_TYPE_OPERATOR_RRX] 106 | 107 | shift_type = shift_types[shift_value] 108 | 109 | if shift_type == self.NODE_TYPE_OPERATOR_RRX: 110 | return [shift_type, [self.NODE_TYPE_REGISTER, first_value, 0]] 111 | elif isinstance(second_value, int): 112 | return [shift_type, [self.NODE_TYPE_REGISTER, first_value, 0],[self.NODE_TYPE_VALUE, second_value, 1]] 113 | else: 114 | return [shift_type, [self.NODE_TYPE_REGISTER, first_value, 0],[self.NODE_TYPE_REGISTER, second_value, 1]] 115 | 116 | def check_arch(self): 117 | 118 | if self.processor_name == 'ARM': 119 | return True 120 | 121 | return False 122 | 123 | def is_s_instruction(self, mnemonic): 124 | return mnemonic.endswith("S") and mnemonic[0:3] in ["MOV", "AND", "BIC", "EOR", "MVN", "ORR", "TEQ", "TST"] 125 | 126 | def get_mnemonic(self, addr): 127 | """ 128 | Return the mnemonic for the current instruction. 
129 | """ 130 | disasm_line = idc.GetDisasm(addr) 131 | if disasm_line is None: 132 | # This behavior has been exhibited by IDA5.4 with an IDB of "libSystem.B.dylib" 133 | # at address 0x3293210e ( "08 BB CBNZ R0, loc_32932154" ) 134 | # Newer IDA versions show the instruction above while IDA 5.4 135 | # returns None. We will skip the instruction in such a case, returning 'invalid' 136 | # as the mnemonic 137 | # 138 | print '%08x: idc.GetDisasm() returned None for address: %08x' % (addr, addr) 139 | return 'invalid' 140 | disasm_line_tokenized = disasm_line.split() 141 | mnem = disasm_line_tokenized[0] 142 | return mnem 143 | 144 | def single_operand_parser(self, address, op, idx): 145 | """Parse an ARM operand.""" 146 | 147 | def constraint_value(value): 148 | if value>2**16: 149 | return -(2**32-value) 150 | return value 151 | 152 | 153 | def parse_register_list(bitfield, bit_field_width=32): 154 | """Parse operand representing a list of registers.""" 155 | operand = [self.NODE_TYPE_OPERATOR_LIST] 156 | i = 0 157 | for idx in range(bit_field_width): 158 | if bitfield&(2**idx): 159 | operand.extend([[self.NODE_TYPE_REGISTER, self.REGISTERS[idx], i]]) 160 | i=i+1 161 | 162 | return operand 163 | 164 | 165 | def parse_register_list_floating_point(register, count): 166 | """Parse operand representing a list of registers.""" 167 | 168 | operand = [self.NODE_TYPE_OPERATOR_LIST] 169 | for idx in range(register, register+count): 170 | operand.extend([[self.NODE_TYPE_REGISTER, 'D%d' % idx , 0]]) 171 | 172 | return operand 173 | 174 | ### Operand parsing ### 175 | 176 | if op.type == OPERAND_TYPE_NO_OPERAND: 177 | return None 178 | 179 | segment = idaapi.getseg(address) 180 | addressing_mode = segment.bitness 181 | 182 | # Start creating the AST, the root entry is always the width of the operand 183 | 184 | operand = [self.OPERAND_WIDTH[self.as_byte_value(op.dtyp)]] 185 | 186 | 187 | # Compose the rest of the AST 188 | 189 | if op.type == OPERAND_TYPE_DISPLACEMENT: 190 | 191 | # At this 
point we have to parse specific bits of the instruction 192 | # ourselves because IDA does not provide all the required data. 193 | 194 | val = idc.Dword(address) 195 | p = (val >> 24) & 1 196 | w = (val >> 21) & 1 197 | value = constraint_value(op.addr) 198 | 199 | phrase = [self.NODE_TYPE_OPERATOR_COMMA, [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0]] 200 | 201 | if idc.GetMnem(address) in ["LDR", "STR"] and idc.ItemSize(address) > 2: 202 | if p == 0 and w == 0: # Situation: [ ... ], VALUE 203 | operand.extend( [ [ 204 | self.NODE_TYPE_OPERATOR_COMMA, 205 | [ self.NODE_TYPE_DEREFERENCE, [ 206 | self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0] , ], 207 | [ self.NODE_TYPE_VALUE, value, 1 ] ] ]) 208 | 209 | else: # Situation: [ ... ] or [ ... ]! 210 | 211 | # We want to avoid [R1 + 0]! situations, so we explicitly 212 | # remove the +0 phrase if it exists. 213 | 214 | if value != 0: 215 | inner = [self.NODE_TYPE_DEREFERENCE,phrase+[ [self.NODE_TYPE_VALUE, value, 1]] ] 216 | else: 217 | inner = [self.NODE_TYPE_DEREFERENCE,[self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0]] 218 | if p == 1 and w == 1: 219 | operand.extend([[self.NODE_TYPE_OPERATOR_EXCMARK, inner]]) 220 | else: 221 | operand.extend([inner]) 222 | else: 223 | 224 | if value != 0: 225 | operand.extend([[self.NODE_TYPE_DEREFERENCE,phrase+[ [self.NODE_TYPE_VALUE, value, 1]]]]) 226 | else: 227 | operand.extend([[self.NODE_TYPE_DEREFERENCE,[self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0]]]) 228 | 229 | elif op.type == OPERAND_TYPE_REGISTER: 230 | 231 | if idc.GetMnem(address) in ["STM", "LDM"]: 232 | 233 | val = idc.Dword(address) 234 | w = (val >> 21) & 1 235 | 236 | if w == 1: 237 | operand.extend([[self.NODE_TYPE_OPERATOR_EXCMARK,[self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0]]]) 238 | else: 239 | operand.extend([[self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0]]) 240 | 241 | else: 242 | try: 243 | operand.extend([[self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0]]) 244 | except 
Exception, excp: 245 | print '%08x: UNSUPPORTED OPERAND REGISTER at %08x: idx: %d' % (address, address, op.reg) 246 | 247 | elif op.type == OPERAND_TYPE_MEMORY: 248 | 249 | addr_name = self.get_address_name(op.addr) 250 | 251 | if addr_name: 252 | value = arch.ExpressionNamedValue(long(op.addr), addr_name) 253 | else: 254 | value = op.addr 255 | 256 | operand.extend([ 257 | [self.NODE_TYPE_DEREFERENCE, 258 | [self.NODE_TYPE_VALUE, value, 0]] ]) 259 | 260 | elif op.type == OPERAND_TYPE_IMMEDIATE: 261 | 262 | mnemonic = self.get_mnemonic(address) 263 | 264 | if ( self.is_s_instruction(mnemonic) and idc.ItemSize(address) >= 4): 265 | val = idc.Dword(address) 266 | rotate_imm = 2 * (((val >> 8) & 1) | ((val >> 8) & 2) | ((val >> 8) & 4) | ((val >> 8) & 8)) 267 | immed_8 = val & 0xFF 268 | 269 | if rotate_imm == 0: 270 | operand.extend([[self.NODE_TYPE_VALUE, op.value, 0]]) 271 | else: 272 | operand.extend([[self.NODE_TYPE_OPERATOR_ROR,[self.NODE_TYPE_VALUE, immed_8, 0],[self.NODE_TYPE_VALUE, rotate_imm, 1]]]) 273 | else: 274 | operand.extend([[self.NODE_TYPE_VALUE, op.value, 0]]) 275 | 276 | elif op.type in (OPERAND_TYPE_NEAR, OPERAND_TYPE_FAR): 277 | 278 | addr_name = self.get_address_name(op.addr) 279 | 280 | if addr_name: 281 | value = arch.ExpressionNamedValue(long(op.addr), addr_name) 282 | else: 283 | value = op.addr 284 | operand.extend([[self.NODE_TYPE_VALUE, value, 0]]) 285 | 286 | elif op.type == OPERAND_TYPE_PHRASE: 287 | if ( idc.ItemSize(address) <= 2 ): 288 | operand.extend( [ [ 289 | self.NODE_TYPE_DEREFERENCE, 290 | [ self.NODE_TYPE_OPERATOR_COMMA, 291 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], 292 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[ self.as_byte_value(op.specflag1) ], 1 ] ], ] ]) 293 | else: 294 | val = idc.Dword(address) 295 | p = (val >> 24) & 1 296 | w = (val >> 21) & 1 297 | needs_shift = ((val >> 25) & 1) & (((val >> 11) & 1) | ((val >> 10) & 1) | ((val >> 9) & 1) | ((val >> 8) & 1) | ((val >> 7) & 1)) 298 | 299 | if 
needs_shift: 300 | tree = self.generate_shift_tree(self.as_byte_value(op.specflag2), self.REGISTERS[self.as_byte_value(op.specflag1)], op.value) 301 | if p == 0 and w == 0: 302 | operand.extend( [ [ 303 | self.NODE_TYPE_OPERATOR_COMMA, 304 | [ self.NODE_TYPE_DEREFERENCE, 305 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0],],tree + [1] ] ]) 306 | elif p == 1 and w == 1: 307 | operand.extend( [ [ 308 | self.NODE_TYPE_OPERATOR_EXCMARK, 309 | [ self.NODE_TYPE_DEREFERENCE, 310 | [ self.NODE_TYPE_OPERATOR_COMMA, 311 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], tree + [1] ], ] ] ] ) 312 | else: 313 | operand.extend( [ [ 314 | self.NODE_TYPE_DEREFERENCE, 315 | [ self.NODE_TYPE_OPERATOR_COMMA, 316 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], tree + [1] ], ] ] ) 317 | else: 318 | if op.value: # Optional Integer value 319 | if p == 0 and w == 0: 320 | operand.extend( [ [ 321 | self.NODE_TYPE_DEREFERENCE, 322 | [ self.NODE_TYPE_OPERATOR_COMMA, 323 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], 324 | [ self.NODE_TYPE_OPERATOR_LSL, 325 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[ self.as_byte_value(op.specflag1) ], 0 ], 326 | [ self.NODE_TYPE_VALUE, op.value, 1], 1 ] ] ] ]) 327 | 328 | elif p == 1 and w == 1: 329 | operand.extend( [ [ 330 | self.NODE_TYPE_OPERATOR_EXCMARK, 331 | [ self.NODE_TYPE_DEREFERENCE, 332 | [ self.NODE_TYPE_OPERATOR_COMMA, 333 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], 334 | [self.NODE_TYPE_VALUE, op.value, 1 ] ], ] ] ] ) 335 | else: 336 | operand.extend( [ [ 337 | self.NODE_TYPE_DEREFERENCE, 338 | [ self.NODE_TYPE_OPERATOR_COMMA, 339 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], 340 | [ self.NODE_TYPE_VALUE, op.value, 1 ] ], ] ]) 341 | 342 | else: # Optional Register value 343 | if p == 0 and w == 0: 344 | operand.extend( [ [ 345 | self.NODE_TYPE_DEREFERENCE, 346 | [ self.NODE_TYPE_OPERATOR_COMMA, 347 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], 348 | [ 
self.NODE_TYPE_REGISTER, self.REGISTERS[ self.as_byte_value(op.specflag1) ], 1 ] ] ] ] ) 349 | 350 | elif p == 1 and w == 1: # set exclamation mark if write back is indicated 351 | operand.extend( [ [ 352 | self.NODE_TYPE_OPERATOR_EXCMARK, 353 | [ self.NODE_TYPE_DEREFERENCE, 354 | [ self.NODE_TYPE_OPERATOR_COMMA, 355 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], 356 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[self.as_byte_value(op.specflag1)], 1] ] ] ] ] ) 357 | else: 358 | operand.extend( [ [ 359 | self.NODE_TYPE_DEREFERENCE, 360 | [ self.NODE_TYPE_OPERATOR_COMMA, 361 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[op.reg], 0 ], 362 | [ self.NODE_TYPE_REGISTER, self.REGISTERS[self.as_byte_value(op.specflag1)], 1 ] ] ] ] ) 363 | 364 | elif op.type == OPERAND_TYPE_IDPSPEC0: 365 | if op.value: # Optional Integer value 366 | operand.extend( [ self.generate_shift_tree(self.as_byte_value(op.specflag2), self.REGISTERS[op.reg], op.value) ] ) 367 | else: # Optional Register value 368 | operand.extend( [ 369 | self.generate_shift_tree( 370 | self.as_byte_value(op.specflag2), 371 | self.REGISTERS[op.reg], 372 | self.REGISTERS[ self.as_byte_value(op.specflag1) ] ) ] ) 373 | 374 | elif op.type == OPERAND_TYPE_IDPSPEC1: 375 | operand.extend([parse_register_list(op.specval, bit_field_width=16)]) 376 | 377 | elif op.type == OPERAND_TYPE_IDPSPEC2: 378 | operand.extend([parse_register_list(op.specval, bit_field_width=32)]) 379 | 380 | elif op.type == OPERAND_TYPE_IDPSPEC3: 381 | print '***Don\'t know how to parse OPERAND_TYPE_IDPSPEC3' 382 | operand.extend([[self.NODE_TYPE_SYMBOL, 'UNK_IDPSPEC3(val:%d, reg:%d, type:%d)' % ( op.value, op.reg, op.type), 0]]) 383 | 384 | elif op.type == OPERAND_TYPE_IDPSPEC4: 385 | operand.extend([ 386 | [self.NODE_TYPE_REGISTER, 'D%d' % op.reg, 0]]) 387 | 388 | elif op.type == OPERAND_TYPE_IDPSPEC5: 389 | operand.extend([parse_register_list_floating_point(op.reg, op.value)]) 390 | 391 | return operand 392 | 393 | def process_instruction(self, 
packet, addr): 394 | """Architecture specific instruction processing""" 395 | 396 | # Call the generic part with the architecture specific operand 397 | # handling 398 | 399 | (instruction, 400 | i_mnemonic, 401 | operands, 402 | operand_strings, 403 | data) = self.process_instruction_generic(addr) 404 | 405 | packet.add_instruction(instruction, addr, i_mnemonic, 406 | operand_strings, operands, data) 407 | 408 | return instruction 409 | -------------------------------------------------------------------------------- /ida_to_sql/arch/metapc.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 
17 | """ 18 | 19 | __author__ = 'Ero Carrera' 20 | __license__ = 'GPL' 21 | 22 | 23 | import arch 24 | import idc 25 | import idaapi 26 | 27 | 28 | # IDA's operand types 29 | # 30 | OPERAND_TYPE_NO_OPERAND = 0 31 | OPERAND_TYPE_REGISTER = 1 32 | OPERAND_TYPE_MEMORY = 2 33 | OPERAND_TYPE_PHRASE = 3 34 | OPERAND_TYPE_DISPLACEMENT = 4 35 | OPERAND_TYPE_IMMEDIATE = 5 36 | OPERAND_TYPE_FAR = 6 37 | OPERAND_TYPE_NEAR = 7 38 | OPERAND_TYPE_IDPSPEC0 = 8 39 | OPERAND_TYPE_IDPSPEC1 = 9 40 | OPERAND_TYPE_IDPSPEC2 = 10 41 | OPERAND_TYPE_IDPSPEC3 = 11 42 | OPERAND_TYPE_IDPSPEC4 = 12 # MMX register 43 | OPERAND_TYPE_IDPSPEC5 = 13 # XMM register 44 | 45 | 46 | 47 | class Arch(arch.Arch): 48 | """Architecture specific processing for 'metapc'""" 49 | 50 | 51 | INSTRUCTIONS = ['NN_null', 'NN_aaa', 'NN_aad', 'NN_aam', 'NN_aas', 'NN_adc', 'NN_add', 'NN_and', 'NN_arpl', 'NN_bound', 'NN_bsf', 'NN_bsr', 'NN_bt', 'NN_btc', 'NN_btr', 'NN_bts', 'NN_call', 'NN_callfi', 'NN_callni', 'NN_cbw', 'NN_cwde', 'NN_cdqe', 'NN_clc', 'NN_cld', 'NN_cli', 'NN_clts', 'NN_cmc', 'NN_cmp', 'NN_cmps', 'NN_cwd', 'NN_cdq', 'NN_cqo', 'NN_daa', 'NN_das', 'NN_dec', 'NN_div', 'NN_enterw', 'NN_enter', 'NN_enterd', 'NN_enterq', 'NN_hlt', 'NN_idiv', 'NN_imul', 'NN_in', 'NN_inc', 'NN_ins', 'NN_int', 'NN_into', 'NN_int3', 'NN_iretw', 'NN_iret', 'NN_iretd', 'NN_iretq', 'NN_ja', 'NN_jae', 'NN_jb', 'NN_jbe', 'NN_jc', 'NN_jcxz', 'NN_jecxz', 'NN_jrcxz', 'NN_je', 'NN_jg', 'NN_jge', 'NN_jl', 'NN_jle', 'NN_jna', 'NN_jnae', 'NN_jnb', 'NN_jnbe', 'NN_jnc', 'NN_jne', 'NN_jng', 'NN_jnge', 'NN_jnl', 'NN_jnle', 'NN_jno', 'NN_jnp', 'NN_jns', 'NN_jnz', 'NN_jo', 'NN_jp', 'NN_jpe', 'NN_jpo', 'NN_js', 'NN_jz', 'NN_jmp', 'NN_jmpfi', 'NN_jmpni', 'NN_jmpshort', 'NN_lahf', 'NN_lar', 'NN_lea', 'NN_leavew', 'NN_leave', 'NN_leaved', 'NN_leaveq', 'NN_lgdt', 'NN_lidt', 'NN_lgs', 'NN_lss', 'NN_lds', 'NN_les', 'NN_lfs', 'NN_lldt', 'NN_lmsw', 'NN_lock', 'NN_lods', 'NN_loopw', 'NN_loop', 'NN_loopd', 'NN_loopq', 'NN_loopwe', 'NN_loope', 'NN_loopde', 
'NN_loopqe', 'NN_loopwne', 'NN_loopne', 'NN_loopdne', 'NN_loopqne', 'NN_lsl', 'NN_ltr', 'NN_mov', 'NN_movsp', 'NN_movs', 'NN_movsx', 'NN_movzx', 'NN_mul', 'NN_neg', 'NN_nop', 'NN_not', 'NN_or', 'NN_out', 'NN_outs', 'NN_pop', 'NN_popaw', 'NN_popa', 'NN_popad', 'NN_popaq', 'NN_popfw', 'NN_popf', 'NN_popfd', 'NN_popfq', 'NN_push', 'NN_pushaw', 'NN_pusha', 'NN_pushad', 'NN_pushaq', 'NN_pushfw', 'NN_pushf', 'NN_pushfd', 'NN_pushfq', 'NN_rcl', 'NN_rcr', 'NN_rol', 'NN_ror', 'NN_rep', 'NN_repe', 'NN_repne', 'NN_retn', 'NN_retf', 'NN_sahf', 'NN_sal', 'NN_sar', 'NN_shl', 'NN_shr', 'NN_sbb', 'NN_scas', 'NN_seta', 'NN_setae', 'NN_setb', 'NN_setbe', 'NN_setc', 'NN_sete', 'NN_setg', 'NN_setge', 'NN_setl', 'NN_setle', 'NN_setna', 'NN_setnae', 'NN_setnb', 'NN_setnbe', 'NN_setnc', 'NN_setne', 'NN_setng', 'NN_setnge', 'NN_setnl', 'NN_setnle', 'NN_setno', 'NN_setnp', 'NN_setns', 'NN_setnz', 'NN_seto', 'NN_setp', 'NN_setpe', 'NN_setpo', 'NN_sets', 'NN_setz', 'NN_sgdt', 'NN_sidt', 'NN_shld', 'NN_shrd', 'NN_sldt', 'NN_smsw', 'NN_stc', 'NN_std', 'NN_sti', 'NN_stos', 'NN_str', 'NN_sub', 'NN_test', 'NN_verr', 'NN_verw', 'NN_wait', 'NN_xchg', 'NN_xlat', 'NN_xor', 'NN_cmpxchg', 'NN_bswap', 'NN_xadd', 'NN_invd', 'NN_wbinvd', 'NN_invlpg', 'NN_rdmsr', 'NN_wrmsr', 'NN_cpuid', 'NN_cmpxchg8b', 'NN_rdtsc', 'NN_rsm', 'NN_cmova', 'NN_cmovb', 'NN_cmovbe', 'NN_cmovg', 'NN_cmovge', 'NN_cmovl', 'NN_cmovle', 'NN_cmovnb', 'NN_cmovno', 'NN_cmovnp', 'NN_cmovns', 'NN_cmovnz', 'NN_cmovo', 'NN_cmovp', 'NN_cmovs', 'NN_cmovz', 'NN_fcmovb', 'NN_fcmove', 'NN_fcmovbe', 'NN_fcmovu', 'NN_fcmovnb', 'NN_fcmovne', 'NN_fcmovnbe', 'NN_fcmovnu', 'NN_fcomi', 'NN_fucomi', 'NN_fcomip', 'NN_fucomip', 'NN_rdpmc', 'NN_fld', 'NN_fst', 'NN_fstp', 'NN_fxch', 'NN_fild', 'NN_fist', 'NN_fistp', 'NN_fbld', 'NN_fbstp', 'NN_fadd', 'NN_faddp', 'NN_fiadd', 'NN_fsub', 'NN_fsubp', 'NN_fisub', 'NN_fsubr', 'NN_fsubrp', 'NN_fisubr', 'NN_fmul', 'NN_fmulp', 'NN_fimul', 'NN_fdiv', 'NN_fdivp', 'NN_fidiv', 'NN_fdivr', 'NN_fdivrp', 'NN_fidivr', 
'NN_fsqrt', 'NN_fscale', 'NN_fprem', 'NN_frndint', 'NN_fxtract', 'NN_fabs', 'NN_fchs', 'NN_fcom', 'NN_fcomp', 'NN_fcompp', 'NN_ficom', 'NN_ficomp', 'NN_ftst', 'NN_fxam', 'NN_fptan', 'NN_fpatan', 'NN_f2xm1', 'NN_fyl2x', 'NN_fyl2xp1', 'NN_fldz', 'NN_fld1', 'NN_fldpi', 'NN_fldl2t', 'NN_fldl2e', 'NN_fldlg2', 'NN_fldln2', 'NN_finit', 'NN_fninit', 'NN_fsetpm', 'NN_fldcw', 'NN_fstcw', 'NN_fnstcw', 'NN_fstsw', 'NN_fnstsw', 'NN_fclex', 'NN_fnclex', 'NN_fstenv', 'NN_fnstenv', 'NN_fldenv', 'NN_fsave', 'NN_fnsave', 'NN_frstor', 'NN_fincstp', 'NN_fdecstp', 'NN_ffree', 'NN_fnop', 'NN_feni', 'NN_fneni', 'NN_fdisi', 'NN_fndisi', 'NN_fprem1', 'NN_fsincos', 'NN_fsin', 'NN_fcos', 'NN_fucom', 'NN_fucomp', 'NN_fucompp', 'NN_setalc', 'NN_svdc', 'NN_rsdc', 'NN_svldt', 'NN_rsldt', 'NN_svts', 'NN_rsts', 'NN_icebp', 'NN_loadall', 'NN_emms', 'NN_movd', 'NN_movq', 'NN_packsswb', 'NN_packssdw', 'NN_packuswb', 'NN_paddb', 'NN_paddw', 'NN_paddd', 'NN_paddsb', 'NN_paddsw', 'NN_paddusb', 'NN_paddusw', 'NN_pand', 'NN_pandn', 'NN_pcmpeqb', 'NN_pcmpeqw', 'NN_pcmpeqd', 'NN_pcmpgtb', 'NN_pcmpgtw', 'NN_pcmpgtd', 'NN_pmaddwd', 'NN_pmulhw', 'NN_pmullw', 'NN_por', 'NN_psllw', 'NN_pslld', 'NN_psllq', 'NN_psraw', 'NN_psrad', 'NN_psrlw', 'NN_psrld', 'NN_psrlq', 'NN_psubb', 'NN_psubw', 'NN_psubd', 'NN_psubsb', 'NN_psubsw', 'NN_psubusb', 'NN_psubusw', 'NN_punpckhbw', 'NN_punpckhwd', 'NN_punpckhdq', 'NN_punpcklbw', 'NN_punpcklwd', 'NN_punpckldq', 'NN_pxor', 'NN_fxsave', 'NN_fxrstor', 'NN_sysenter', 'NN_sysexit', 'NN_pavgusb', 'NN_pfadd', 'NN_pfsub', 'NN_pfsubr', 'NN_pfacc', 'NN_pfcmpge', 'NN_pfcmpgt', 'NN_pfcmpeq', 'NN_pfmin', 'NN_pfmax', 'NN_pi2fd', 'NN_pf2id', 'NN_pfrcp', 'NN_pfrsqrt', 'NN_pfmul', 'NN_pfrcpit1', 'NN_pfrsqit1', 'NN_pfrcpit2', 'NN_pmulhrw', 'NN_femms', 'NN_prefetch', 'NN_prefetchw', 'NN_addps', 'NN_addss', 'NN_andnps', 'NN_andps', 'NN_cmpps', 'NN_cmpss', 'NN_comiss', 'NN_cvtpi2ps', 'NN_cvtps2pi', 'NN_cvtsi2ss', 'NN_cvtss2si', 'NN_cvttps2pi', 'NN_cvttss2si', 'NN_divps', 'NN_divss', 'NN_ldmxcsr', 
'NN_maxps', 'NN_maxss', 'NN_minps', 'NN_minss', 'NN_movaps', 'NN_movhlps', 'NN_movhps', 'NN_movlhps', 'NN_movlps', 'NN_movmskps', 'NN_movss', 'NN_movups', 'NN_mulps', 'NN_mulss', 'NN_orps', 'NN_rcpps', 'NN_rcpss', 'NN_rsqrtps', 'NN_rsqrtss', 'NN_shufps', 'NN_sqrtps', 'NN_sqrtss', 'NN_stmxcsr', 'NN_subps', 'NN_subss', 'NN_ucomiss', 'NN_unpckhps', 'NN_unpcklps', 'NN_xorps', 'NN_pavgb', 'NN_pavgw', 'NN_pextrw', 'NN_pinsrw', 'NN_pmaxsw', 'NN_pmaxub', 'NN_pminsw', 'NN_pminub', 'NN_pmovmskb', 'NN_pmulhuw', 'NN_psadbw', 'NN_pshufw', 'NN_maskmovq', 'NN_movntps', 'NN_movntq', 'NN_prefetcht0', 'NN_prefetcht1', 'NN_prefetcht2', 'NN_prefetchnta', 'NN_sfence', 'NN_cmpeqps', 'NN_cmpltps', 'NN_cmpleps', 'NN_cmpunordps', 'NN_cmpneqps', 'NN_cmpnltps', 'NN_cmpnleps', 'NN_cmpordps', 'NN_cmpeqss', 'NN_cmpltss', 'NN_cmpless', 'NN_cmpunordss', 'NN_cmpneqss', 'NN_cmpnltss', 'NN_cmpnless', 'NN_cmpordss', 'NN_pf2iw', 'NN_pfnacc', 'NN_pfpnacc', 'NN_pi2fw', 'NN_pswapd', 'NN_fstp1', 'NN_fcom2', 'NN_fcomp3', 'NN_fxch4', 'NN_fcomp5', 'NN_ffreep', 'NN_fxch7', 'NN_fstp8', 'NN_fstp9', 'NN_addpd', 'NN_addsd', 'NN_andnpd', 'NN_andpd', 'NN_clflush', 'NN_cmppd', 'NN_cmpsd', 'NN_comisd', 'NN_cvtdq2pd', 'NN_cvtdq2ps', 'NN_cvtpd2dq', 'NN_cvtpd2pi', 'NN_cvtpd2ps', 'NN_cvtpi2pd', 'NN_cvtps2dq', 'NN_cvtps2pd', 'NN_cvtsd2si', 'NN_cvtsd2ss', 'NN_cvtsi2sd', 'NN_cvtss2sd', 'NN_cvttpd2dq', 'NN_cvttpd2pi', 'NN_cvttps2dq', 'NN_cvttsd2si', 'NN_divpd', 'NN_divsd', 'NN_lfence', 'NN_maskmovdqu', 'NN_maxpd', 'NN_maxsd', 'NN_mfence', 'NN_minpd', 'NN_minsd', 'NN_movapd', 'NN_movdq2q', 'NN_movdqa', 'NN_movdqu', 'NN_movhpd', 'NN_movlpd', 'NN_movmskpd', 'NN_movntdq', 'NN_movnti', 'NN_movntpd', 'NN_movq2dq', 'NN_movsd', 'NN_movupd', 'NN_mulpd', 'NN_mulsd', 'NN_orpd', 'NN_paddq', 'NN_pause', 'NN_pmuludq', 'NN_pshufd', 'NN_pshufhw', 'NN_pshuflw', 'NN_pslldq', 'NN_psrldq', 'NN_psubq', 'NN_punpckhqdq', 'NN_punpcklqdq', 'NN_shufpd', 'NN_sqrtpd', 'NN_sqrtsd', 'NN_subpd', 'NN_subsd', 'NN_ucomisd', 'NN_unpckhpd', 'NN_unpcklpd', 
'NN_xorpd', 'NN_syscall', 'NN_sysret', 'NN_swapgs', 'NN_movddup', 'NN_movshdup', 'NN_movsldup', 'NN_movsxd', 'NN_cmpxchg16b', 'NN_last'] 52 | 53 | 54 | # The following table is indexed with the segment's bitness or op.dtyp 55 | # depending on whether the register is used as an operand value or 56 | # for addressing. 57 | # 58 | # The first two lists are identical, the 3rd and 4th are for 32 and 64 bits 59 | # respectively 60 | # 61 | 62 | REGISTERS = [ 63 | [ 'ax', 'cx', 'dx', 'bx', 'sp', 'bp', 'si', 'di', 'r8', 64 | 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15', 'al', 65 | 'cl', 'dl', 'bl', 'ah', 'ch', 'dh', 'bh', 'spl', 'bpl', 66 | 'sil', 'dil', 'ip', 'es', 'cs', 'ss', 'ds', 'fs', 'gs'], 67 | [ 'ax', 'cx', 'dx', 'bx', 'sp', 'bp', 'si', 'di', 'r8', 68 | 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15', 'al', 69 | 'cl', 'dl', 'bl', 'ah', 'ch', 'dh', 'bh', 'spl', 'bpl', 70 | 'sil', 'dil', 'ip', 'es', 'cs', 'ss', 'ds', 'fs', 'gs'], 71 | [ 'eax', 'ecx', 'edx', 'ebx', 'esp', 'ebp', 'esi', 'edi', 'r8', 72 | 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15', 'al', 73 | 'cl', 'dl', 'bl', 'ah', 'ch', 'dh', 'bh', 'spl', 'bpl', 74 | 'sil', 'dil', 'eip', 'es', 'cs', 'ss', 'ds', 'fs', 'gs'], 75 | [ 'rax', 'rcx', 'rdx', 'rbx', 'rsp', 'rbp', 'rsi', 'rdi', 'r8', 76 | 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15', 'al', 77 | 'cl', 'dl', 'bl', 'ah', 'ch', 'dh', 'bh', 'spl', 'bpl', 78 | 'sil', 'dil', 'rip', 'es', 'cs', 'ss', 'ds', 'fs', 'gs'], 79 | [], 80 | [], 81 | [], 82 | ['unk_reg_%02d' % i for i in range(56)] + [ 'mm%d' % i for i in range(8) ], 83 | ['unk_reg_%02d' % i for i in range(64)] + [ 'xmm%d' % i for i in range(8) ] ] 84 | 85 | SIB_BASE_REGISTERS = ['eax', 'ecx', 'edx', 'ebx', 'esp', '', 'esi', 'edi'] 86 | SIB_INDEX_REGISTERS = ['eax', 'ecx', 'edx', 'ebx', '', 'ebp', 'esi', 'edi'] 87 | 88 | # Add the segment registers as operators 89 | # 90 | NODE_TYPE_OPERATOR_SEGMENT_ES = 'es:' 91 | NODE_TYPE_OPERATOR_SEGMENT_CS = 'cs:' 92 | NODE_TYPE_OPERATOR_SEGMENT_SS = 'ss:' 93 | 
NODE_TYPE_OPERATOR_SEGMENT_DS = 'ds:' 94 | NODE_TYPE_OPERATOR_SEGMENT_FS = 'fs:' 95 | NODE_TYPE_OPERATOR_SEGMENT_GS = 'gs:' 96 | NODE_TYPE_OPERATOR_SEGMENT_GEN = ':' 97 | 98 | OPERATORS = arch.Arch.OPERATORS+( 99 | NODE_TYPE_OPERATOR_SEGMENT_ES, NODE_TYPE_OPERATOR_SEGMENT_CS, 100 | NODE_TYPE_OPERATOR_SEGMENT_SS, NODE_TYPE_OPERATOR_SEGMENT_DS, 101 | NODE_TYPE_OPERATOR_SEGMENT_FS, NODE_TYPE_OPERATOR_SEGMENT_GS, 102 | NODE_TYPE_OPERATOR_SEGMENT_GEN) 103 | 104 | 105 | OPERAND_WIDTH = [ 106 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_1, 107 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_2, 108 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_4, 109 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_4, 110 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_8, 111 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_VARIABLE, 112 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_12, 113 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_8, 114 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_16, 115 | None, None, 116 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_6, 117 | None, None, 118 | None, None] 119 | 120 | def __get_instruction_index(self, insn_list): 121 | """Retrieve the indices of the given instructions into the instruction table. 122 | 123 | Those indices are used to indicate the type of an instruction. 
124 | """ 125 | 126 | return [self.INSTRUCTIONS.index(i) for i in insn_list] 127 | 128 | 129 | def __init__(self): 130 | arch.Arch.__init__(self) 131 | 132 | self.INSTRUCTIONS_CALL = self.__get_instruction_index( 133 | ('NN_call', 'NN_callfi', 'NN_callni')) 134 | self.INSTRUCTIONS_CONDITIONAL_BRANCH = self.__get_instruction_index( 135 | ('NN_ja', 'NN_jae', 'NN_jb', 'NN_jbe', 'NN_jc', 'NN_jcxz', 'NN_jecxz', 'NN_jrcxz', 'NN_je', 'NN_jg', 'NN_jge', 'NN_jl', 'NN_jle', 'NN_jna', 'NN_jnae', 'NN_jnb', 'NN_jnbe', 'NN_jnc', 'NN_jne', 'NN_jng', 'NN_jnge', 'NN_jnl', 'NN_jnle', 'NN_jno', 'NN_jnp', 'NN_jns', 'NN_jnz', 'NN_jo', 'NN_jp', 'NN_jpe', 'NN_jpo', 'NN_js', 'NN_jz')) 136 | self.INSTRUCTIONS_UNCONDITIONAL_BRANCH = self.__get_instruction_index( 137 | ('NN_jmp', 'NN_jmpfi', 'NN_jmpni', 'NN_jmpshort')) 138 | self.INSTRUCTIONS_RET = self.__get_instruction_index( 139 | ('NN_iretw', 'NN_iret', 'NN_iretd', 'NN_iretq')) 140 | 141 | self.INSTRUCTIONS_BRANCH = self.__get_instruction_index( 142 | ('NN_call', 'NN_callfi', 'NN_callni', 'NN_ja', 'NN_jae', 'NN_jb', 'NN_jbe', 'NN_jc', 'NN_jcxz', 'NN_jecxz', 'NN_jrcxz', 'NN_je', 'NN_jg', 'NN_jge', 'NN_jl', 'NN_jle', 'NN_jna', 'NN_jnae', 'NN_jnb', 'NN_jnbe', 'NN_jnc', 'NN_jne', 'NN_jng', 'NN_jnge', 'NN_jnl', 'NN_jnle', 'NN_jno', 'NN_jnp', 'NN_jns', 'NN_jnz', 'NN_jo', 'NN_jp', 'NN_jpe', 'NN_jpo', 'NN_js', 'NN_jz', 'NN_jmp', 'NN_jmpfi', 'NN_jmpni', 'NN_jmpshort')) 143 | 144 | self.no_op_instr = [ "lods", "stos", "scas", "cmps", "movs" ] 145 | 146 | self.arch_name = 'x86' 147 | 148 | def check_arch(self): 149 | 150 | if self.processor_name == 'metapc': 151 | return True 152 | 153 | return False 154 | 155 | def get_mnemonic(self, addr): 156 | """Return the mnemonic for the current instruction. 157 | 158 | """ 159 | 160 | if idaapi.ua_mnem(addr) in self.no_op_instr: 161 | return idc.GetDisasm(addr) 162 | else: 163 | return idaapi.ua_mnem(addr) 164 | 165 | 166 | def operands_parser(self, address, operands): 167 | """Parse operands. 
168 | 169 | Can be defined in architecture specific modules to 170 | process the whole list of operands before or after 171 | parsing, if necessary. For Intel, for instance, it is 172 | used to post-process operands where the target is 173 | also used as source but included only once; that 174 | happens, for instance, with the IMUL instruction. 175 | """ 176 | 177 | op_list = [] 178 | 179 | if idaapi.ua_mnem(address) in self.no_op_instr: 180 | return op_list 181 | 182 | for op, idx in operands: 183 | # The following will make sure it's an operand that IDA displays. 184 | # IDA sometimes encodes implicit operand's information into the 185 | # structures representing instructions but chooses not to display 186 | # those operands. We try to reproduce IDA's output 187 | # 188 | if idc.GetOpnd(address, idx) != '': 189 | current_operand = self.single_operand_parser(address, op, idx) 190 | 191 | if not current_operand: 192 | continue 193 | 194 | if isinstance(current_operand[0], (list, tuple)): 195 | op_list.extend( current_operand ) 196 | else: 197 | op_list.append( current_operand ) 198 | 199 | operands = op_list 200 | 201 | return op_list 202 | 203 | 204 | def single_operand_parser(self, address, op, idx): 205 | """Parse a metapc operand.""" 206 | 207 | # Convenience functions 208 | # 209 | def has_sib_byte(op): 210 | # Does the instruction use the SIB byte? 211 | return self.as_byte_value(op.specflag1)==1 212 | 213 | def get_sib_scale(op): 214 | return (None, 2, 4, 8)[self.as_byte_value(op.specflag2)>>6] 215 | 216 | def get_sib_scaled_index_reg(op): 217 | return self.SIB_INDEX_REGISTERS[(self.as_byte_value(op.specflag2)>>3)&0x7] 218 | 219 | def get_sib_base_reg(op): 220 | # 221 | # [ [7-6] [5-3] [2-0] ] 222 | # MOD/RM = ( (mod_2 << 6) | (reg_opcode_3 << 3) | rm_3 ) 223 | # There's no MOD/RM made available by IDA!? 
224 | # 225 | # [ [7-6] [5-3] [2-0] ] 226 | # SIB = ( (scale_2 << 6) | (index_3 << 3) | base ) 227 | # op.specflag2 228 | # 229 | # instruction = op + modrm + sib + disp + imm 230 | # 231 | 232 | # If MOD is zero there's no base register, otherwise it's EBP 233 | # But IDA exposes no MOD/RM. 234 | # Following a discussion in IDA's forums: 235 | # http://www.hex-rays.com/forum/viewtopic.php?f=8&t=1424&p=8479&hilit=mod+rm#p8479 236 | # checking for it can be done in the following manner: 237 | # 238 | 239 | SIB_byte = self.as_byte_value(op.specflag2) 240 | 241 | return self.SIB_BASE_REGISTERS[ SIB_byte & 0x7] 242 | 243 | def get_segment_prefix(op): 244 | 245 | seg_idx = (op.specval>>16) 246 | if seg_idx == 0: 247 | return None 248 | 249 | if (op.specval>>16) < len(self.REGISTERS[0]) : 250 | seg_prefix = self.REGISTERS[0][op.specval>>16] + ':' 251 | else: 252 | seg_prefix = op.specval&0xffff 253 | 254 | # This must return a string in case a segment register selector is used 255 | # or an int/long of the descriptor itself. 256 | # 257 | return seg_prefix 258 | 259 | 260 | def parse_phrase(op, has_displacement=False): 261 | """Parse the expression used for indexed memory access. 262 | 263 | Returns its AST as a nested list of lists. 264 | """ 265 | 266 | # Check the addressing mode used in this segment 267 | segment = idaapi.getseg(address) 268 | if segment.bitness != 1: 269 | raise Exception( 270 | 'Not yet handling addressing modes other than 32bit!') 271 | 272 | 273 | base_reg = get_sib_base_reg(op) 274 | scaled_index_reg = get_sib_scaled_index_reg(op) 275 | scale = get_sib_scale(op) 276 | 277 | if scale: 278 | 279 | # return nested list for reg+reg*scale 280 | if base_reg != '': 281 | # The last values in each tuple indicate the 282 | # preferred display position of each element. 
283 | # base_reg + (scale_reg * scale) 284 | # 285 | 286 | if scaled_index_reg == '': 287 | return [ 288 | self.NODE_TYPE_OPERATOR_PLUS, 289 | [self.NODE_TYPE_REGISTER, base_reg, 0] ] 290 | 291 | return [ 292 | self.NODE_TYPE_OPERATOR_PLUS, 293 | [self.NODE_TYPE_REGISTER, base_reg, 0], 294 | [self.NODE_TYPE_OPERATOR_TIMES, 295 | [self.NODE_TYPE_REGISTER, scaled_index_reg, 0], 296 | [self.NODE_TYPE_VALUE, scale, 1], 1 ] ] 297 | else: 298 | # If there's no base register and 299 | # mod == 01 or mod == 10 (=> operand has displacement) 300 | # then we need to add EBP 301 | if has_displacement: 302 | return [ 303 | self.NODE_TYPE_OPERATOR_PLUS, 304 | [ self.NODE_TYPE_REGISTER, 'ebp', 0], 305 | [ self.NODE_TYPE_OPERATOR_TIMES, 306 | [self.NODE_TYPE_REGISTER, scaled_index_reg, 0], 307 | [self.NODE_TYPE_VALUE, scale, 1], 1 ] ] 308 | return [ 309 | self.NODE_TYPE_OPERATOR_PLUS, 310 | [ self.NODE_TYPE_OPERATOR_TIMES, 311 | [self.NODE_TYPE_REGISTER, scaled_index_reg, 0], 312 | [self.NODE_TYPE_VALUE, scale, 1], 0 ] ] 313 | 314 | else: 315 | # return nested list for reg+reg 316 | if base_reg == '': 317 | if scaled_index_reg != '': 318 | if has_displacement: 319 | return [ 320 | self.NODE_TYPE_OPERATOR_PLUS, 321 | [ self.NODE_TYPE_REGISTER, 'ebp', 0], 322 | [ self.NODE_TYPE_REGISTER, scaled_index_reg, 1 ] ] 323 | return [ 324 | self.NODE_TYPE_OPERATOR_PLUS, 325 | [self.NODE_TYPE_REGISTER, scaled_index_reg, 0 ] ] 326 | else: 327 | if has_displacement: 328 | return [self.NODE_TYPE_OPERATOR_PLUS, [self.NODE_TYPE_REGISTER, 'ebp', 0] ] 329 | return [ ] 330 | 331 | else: 332 | if scaled_index_reg != '': 333 | return [ 334 | self.NODE_TYPE_OPERATOR_PLUS, 335 | [self.NODE_TYPE_REGISTER, base_reg, 0], 336 | [self.NODE_TYPE_REGISTER, scaled_index_reg, 1 ] ] 337 | else: 338 | return [ 339 | self.NODE_TYPE_OPERATOR_PLUS, 340 | [self.NODE_TYPE_REGISTER, base_reg, 0] ] 341 | 342 | 343 | # Operand parsing 344 | # 345 | 346 | if op.type == OPERAND_TYPE_NO_OPERAND: 347 | return None 348 | 349 | 
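The three SIB helpers above all slice the same byte, op.specflag2, into the fields sketched in the comment: scale in bits 7-6, index in bits 5-3, base in bits 2-0. A minimal standalone sketch of that decoding (plain Python, no IDAPython needed; the register tables are copied from the class above):

```python
# Standalone sketch of the SIB-byte decoding done by get_sib_scale,
# get_sib_scaled_index_reg and get_sib_base_reg above.
# The empty strings mark the "no index" (100b) and "no base" (101b) encodings.
SIB_BASE_REGISTERS = ['eax', 'ecx', 'edx', 'ebx', 'esp', '', 'esi', 'edi']
SIB_INDEX_REGISTERS = ['eax', 'ecx', 'edx', 'ebx', '', 'ebp', 'esi', 'edi']

def decode_sib(sib_byte):
    """Split a SIB byte into (scale, index register, base register)."""
    scale = (None, 2, 4, 8)[sib_byte >> 6]               # bits 7-6
    index = SIB_INDEX_REGISTERS[(sib_byte >> 3) & 0x7]   # bits 5-3
    base = SIB_BASE_REGISTERS[sib_byte & 0x7]            # bits 2-0
    return scale, index, base

# 0x4C = 01 001 100b: scale 2, index ecx, base esp, i.e. [esp+ecx*2]
print(decode_sib(0x4C))  # -> (2, 'ecx', 'esp')
```

Note that, as in get_sib_scale above, a scale field of 00b maps to None rather than 1; that is what drives the `if scale:` branch in parse_phrase.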
segment = idaapi.getseg(address) 350 | addressing_mode = segment.bitness 351 | 352 | # Start creating the AST, the root entry is always the width of the 353 | # operand 354 | operand = [self.OPERAND_WIDTH[ self.as_byte_value( op.dtyp ) ]] 355 | 356 | 357 | # If the operand indicates a displacement and it does 358 | # the indexing through the SIB, it might be referring to 359 | # a variable on the stack, and an attempt to retrieve it 360 | # is made. 361 | # 362 | 363 | 364 | # Compose the rest of the AST 365 | # 366 | 367 | if op.type == OPERAND_TYPE_DISPLACEMENT: 368 | 369 | # A displacement operator might refer to a variable... 370 | # 371 | var_name = None 372 | 373 | # Try to get any stack name that might have been assigned 374 | # to the variable. 375 | # 376 | flags = idc.GetFlags(address) 377 | if (idx==0 and idc.isStkvar0(flags)) or ( 378 | idx==1 and idc.isStkvar1(flags)): 379 | 380 | var_name = self.get_operand_stack_variable_name(address, op, idx) 381 | 382 | if has_sib_byte(op) is True: 383 | # when SIB byte set, process the SIB indexing 384 | phrase = parse_phrase(op, has_displacement=True) 385 | else: 386 | phrase = [ 387 | self.NODE_TYPE_OPERATOR_PLUS, 388 | [self.NODE_TYPE_REGISTER, 389 | self.REGISTERS[addressing_mode+1][op.reg], 0] ] 390 | 391 | if var_name: 392 | value = arch.ExpressionNamedValue(long(op.addr), var_name) 393 | else: 394 | value = op.addr 395 | 396 | # Calculate the index of the value depending on how many components 397 | # we have in the phrase 398 | # 399 | idx_of_value = len( phrase ) - 1 400 | operand.extend([ 401 | [ get_segment_prefix(op), 402 | [self.NODE_TYPE_DEREFERENCE, 403 | phrase+[ [self.NODE_TYPE_VALUE, value, idx_of_value] ] ] ] ]) 404 | 405 | 406 | elif op.type == OPERAND_TYPE_REGISTER: 407 | 408 | operand.extend([ 409 | [self.NODE_TYPE_REGISTER, self.REGISTERS[self.as_byte_value(op.dtyp)][op.reg], 0]]) 410 | 411 | elif op.type == OPERAND_TYPE_MEMORY: 412 | 413 | addr_name = self.get_address_name(op.addr) 414 | 
415 | if addr_name: 416 | value = arch.ExpressionNamedValue(long(op.addr), addr_name) 417 | else: 418 | value = op.addr 419 | 420 | if has_sib_byte(op) is True: 421 | # when SIB byte set, process the SIB indexing 422 | phrase = parse_phrase(op) 423 | 424 | idx_of_value = len( phrase ) - 1 425 | operand.extend([ 426 | [ get_segment_prefix(op), 427 | [self.NODE_TYPE_DEREFERENCE, 428 | phrase+[[self.NODE_TYPE_VALUE, value, idx_of_value]] ] ] ]) 429 | else: 430 | operand.extend([ 431 | [ get_segment_prefix(op), 432 | [self.NODE_TYPE_DEREFERENCE, 433 | [self.NODE_TYPE_VALUE, value, 0] ] ] ]) 434 | 435 | 436 | 437 | elif op.type == OPERAND_TYPE_IMMEDIATE: 438 | 439 | width = self.OPERAND_WIDTH[self.as_byte_value(op.dtyp)] 440 | 441 | if width == arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_1: 442 | value = op.value&0xff 443 | elif width == arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_2: 444 | value = op.value&0xffff 445 | elif width == arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_4: 446 | value = op.value&0xffffffff 447 | else: 448 | value = op.value 449 | 450 | operand.extend([[self.NODE_TYPE_VALUE, value, 0]]) 451 | 452 | 453 | elif op.type in (OPERAND_TYPE_NEAR, OPERAND_TYPE_FAR): 454 | 455 | addr_name = self.get_address_name(op.addr) 456 | 457 | if addr_name: 458 | value = arch.ExpressionNamedValue(long(op.addr), addr_name) 459 | else: 460 | value = op.addr 461 | 462 | seg_prefix = get_segment_prefix(op) 463 | if isinstance(seg_prefix, str): 464 | operand.extend([ 465 | [ seg_prefix, [self.NODE_TYPE_VALUE, value, 0] ]]) 466 | elif isinstance(seg_prefix, (int, long)): 467 | operand.extend([ 468 | [ self.NODE_TYPE_OPERATOR_SEGMENT_GEN, 469 | [self.NODE_TYPE_VALUE, seg_prefix, 0], 470 | [self.NODE_TYPE_VALUE, value, 1] ]] ) 471 | 472 | 473 | elif op.type == OPERAND_TYPE_PHRASE: 474 | if has_sib_byte(op) is True: 475 | phrase = parse_phrase(op) 476 | 477 | # Detect observed cases (in GCC compiled sshd) where GCC's instruction 478 | # encoding would be parsed into a phrase with an 
addition of a single 479 | # register, without any other summands. 480 | # In those cases, if there's a name associated with the zero, such as 481 | # a stack variable, we will add a zero to the sum. We do that to have 482 | # an expression to which to alias an expression substitution (in the past 483 | # we were removing the addition altogether). 484 | # If there's no name we will remove the redundant 0. 485 | # 486 | # 487 | # This case has been observed for the encoding of [esp] where the tree 488 | # would be "[" -> "+" -> "esp". 489 | # 490 | # 491 | if phrase[0] == self.NODE_TYPE_OPERATOR_PLUS and len(phrase) == 2: 492 | 493 | var_name = self.get_operand_stack_variable_name(address, op, idx) 494 | if var_name: 495 | value = arch.ExpressionNamedValue(0, var_name) 496 | phrase.append( [self.NODE_TYPE_VALUE, value, 1] ) 497 | else: 498 | phrase = phrase[1] 499 | 500 | 501 | operand.extend([ 502 | [get_segment_prefix(op), 503 | [self.NODE_TYPE_DEREFERENCE, phrase] ]] ) 504 | 505 | else: 506 | operand.extend([ 507 | [get_segment_prefix(op), 508 | [self.NODE_TYPE_DEREFERENCE, 509 | [self.NODE_TYPE_REGISTER, 510 | self.REGISTERS[addressing_mode+1][op.phrase], 0] ] ]]) 511 | 512 | elif op.type == OPERAND_TYPE_IDPSPEC0: 513 | # The operand refers to the TR* registers 514 | operand.extend([ 515 | [self.NODE_TYPE_REGISTER, 'tr%d' % op.reg, 0]]) 516 | 517 | elif op.type == OPERAND_TYPE_IDPSPEC1: 518 | # The operand refers to the DR* registers 519 | operand.extend([ 520 | [self.NODE_TYPE_REGISTER, 'dr%d' % op.reg, 0]]) 521 | 522 | elif op.type == OPERAND_TYPE_IDPSPEC2: 523 | # The operand refers to the CR* registers 524 | operand.extend([ 525 | [self.NODE_TYPE_REGISTER, 'cr%d' % op.reg, 0]]) 526 | 527 | elif op.type == OPERAND_TYPE_IDPSPEC3: 528 | # The operand refers to the FPU register stack 529 | operand.extend([ 530 | [self.NODE_TYPE_REGISTER, 'st(%d)' % op.reg, 0]]) 531 | 532 | elif op.type == OPERAND_TYPE_IDPSPEC4: 533 | # The operand is an MMX register 534 | 
operand.extend([ 535 | [self.NODE_TYPE_REGISTER, 'mm%d' % op.reg, 0]]) 536 | 537 | elif op.type == OPERAND_TYPE_IDPSPEC5: 538 | # The operand is an XMM register 539 | operand.extend([ 540 | [self.NODE_TYPE_REGISTER, 'xmm%d' % op.reg, 0]]) 541 | 542 | # If nothing other than a width, i.e. ['b2'], is retrieved 543 | # we assume there was no operand... this is a hack but I've seen 544 | # IDA pretend there's a first operand like this: 545 | # 546 | # fld ['b2'], ['b4', ['ds', ['[', ['+', ['$', 'edx'], [...]]]]] 547 | # 548 | # So, in these cases I want no first operand... 549 | #if len(operand)==1: 550 | # return None 551 | 552 | return operand 553 | 554 | 555 | def process_instruction(self, packet, addr): 556 | """Architecture specific instruction processing""" 557 | 558 | # Call the generic part with the architecture specific operand 559 | # handling 560 | # 561 | 562 | (instruction, 563 | i_mnemonic, 564 | operands, 565 | operand_strings, 566 | data) = self.process_instruction_generic(addr) 567 | 568 | if i_mnemonic is None: 569 | return None 570 | 571 | if idaapi.get_byte(addr) == 0xf0: 572 | prefix = 'lock ' 573 | else: 574 | prefix = '' 575 | 576 | packet.add_instruction(instruction, addr, prefix+i_mnemonic, 577 | operand_strings, operands, data) 578 | 579 | return instruction 580 | 581 | 582 | 583 | -------------------------------------------------------------------------------- /ida_to_sql/arch/ppc.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 
6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 17 | """ 18 | 19 | __author__ = 'Ero Carrera' 20 | __license__ = 'GPL' 21 | 22 | 23 | import arch 24 | import idc 25 | import idaapi 26 | 27 | 28 | # IDA's operand types 29 | # 30 | OPERAND_TYPE_NO_OPERAND = 0 31 | OPERAND_TYPE_REGISTER = 1 32 | OPERAND_TYPE_MEMORY = 2 33 | OPERAND_TYPE_PHRASE = 3 34 | OPERAND_TYPE_DISPLACEMENT = 4 35 | OPERAND_TYPE_IMMEDIATE = 5 36 | OPERAND_TYPE_FAR = 6 37 | OPERAND_TYPE_NEAR = 7 38 | OPERAND_TYPE_IDPSPEC0 = 8 39 | OPERAND_TYPE_IDPSPEC1 = 9 40 | OPERAND_TYPE_IDPSPEC2 = 10 41 | OPERAND_TYPE_IDPSPEC3 = 11 42 | OPERAND_TYPE_IDPSPEC4 = 12 43 | OPERAND_TYPE_IDPSPEC5 = 13 44 | 45 | 46 | 47 | class Arch(arch.Arch): 48 | """Architecture specific processing for 'PPC'""" 49 | 50 | 51 | INSTRUCTIONS = [ 'PPC_null', 'PPC_add', 'PPC_addc', 'PPC_adde', 'PPC_addi', 'PPC_addic', 'PPC_addis', 'PPC_addme', 'PPC_addze', 'PPC_and', 'PPC_andc', 'PPC_andi', 'PPC_andis', 'PPC_b', 'PPC_bc', 'PPC_bcctr', 'PPC_bclr', 'PPC_cmp', 'PPC_cmpi', 'PPC_cmpl', 'PPC_cmpli', 'PPC_cntlzd', 'PPC_cntlzw', 'PPC_crand', 'PPC_crandc', 'PPC_creqv', 'PPC_crnand', 'PPC_crnor', 'PPC_cror', 'PPC_crorc', 'PPC_crxor', 'PPC_dcba', 'PPC_dcbf', 'PPC_dcbi', 'PPC_dcbst', 'PPC_dcbt', 'PPC_dcbtst', 'PPC_dcbz', 'PPC_divd', 'PPC_divdu', 'PPC_divw', 'PPC_divwu', 'PPC_eciwx', 'PPC_ecowx', 'PPC_eieio', 'PPC_eqv', 'PPC_extsb', 'PPC_extsh', 'PPC_extsw', 'PPC_fabs', 'PPC_fadd', 'PPC_fadds', 'PPC_fcfid', 'PPC_fcmpo', 'PPC_fcmpu', 'PPC_fctid', 'PPC_fctidz', 'PPC_fctiw', 'PPC_fctiwz', 'PPC_fdiv', 'PPC_fdivs', 'PPC_fmadd', 'PPC_fmadds', 'PPC_fmr', 'PPC_fmsub', 
'PPC_fmsubs', 'PPC_fmul', 'PPC_fmuls', 'PPC_fnabs', 'PPC_fneg', 'PPC_fnmadd', 'PPC_fnmadds', 'PPC_fnmsub', 'PPC_fnmsubs', 'PPC_fres', 'PPC_frsp', 'PPC_frsqrte', 'PPC_fsel', 'PPC_fsqrt', 'PPC_fsqrts', 'PPC_fsub', 'PPC_fsubs', 'PPC_icbi', 'PPC_isync', 'PPC_lbz', 'PPC_lbzu', 'PPC_lbzux', 'PPC_lbzx', 'PPC_ld', 'PPC_ldarx', 'PPC_ldu', 'PPC_ldux', 'PPC_ldx', 'PPC_lfd', 'PPC_lfdu', 'PPC_lfdux', 'PPC_lfdx', 'PPC_lfs', 'PPC_lfsu', 'PPC_lfsux', 'PPC_lfsx', 'PPC_lha', 'PPC_lhau', 'PPC_lhaux', 'PPC_lhax', 'PPC_lhbrx', 'PPC_lhz', 'PPC_lhzu', 'PPC_lhzux', 'PPC_lhzx', 'PPC_lmw', 'PPC_lswi', 'PPC_lswx', 'PPC_lwa', 'PPC_lwarx', 'PPC_lwaux', 'PPC_lwax', 'PPC_lwbrx', 'PPC_lwz', 'PPC_lwzu', 'PPC_lwzux', 'PPC_lwzx', 'PPC_mcrf', 'PPC_mcrfs', 'PPC_mcrxr', 'PPC_mfcr', 'PPC_mffs', 'PPC_mfmsr', 'PPC_mfspr', 'PPC_mfsr', 'PPC_mfsrin', 'PPC_mftb', 'PPC_mtcrf', 'PPC_mtfsb0', 'PPC_mtfsb1', 'PPC_mtfsf', 'PPC_mtfsfi', 'PPC_mtmsr', 'PPC_mtmsrd', 'PPC_mtspr', 'PPC_mtsr', 'PPC_mtsrd', 'PPC_mtsrdin', 'PPC_mtsrin', 'PPC_mulhd', 'PPC_mulhdu', 'PPC_mulhw', 'PPC_mulhwu', 'PPC_mulld', 'PPC_mulli', 'PPC_mullw', 'PPC_nand', 'PPC_neg', 'PPC_nor', 'PPC_or', 'PPC_orc', 'PPC_ori', 'PPC_oris', 'PPC_rfi', 'PPC_rfid', 'PPC_rldcl', 'PPC_rldcr', 'PPC_rldic', 'PPC_rldicl', 'PPC_rldicr', 'PPC_rldimi', 'PPC_rlwimi', 'PPC_rlwinm', 'PPC_rlwnm', 'PPC_sc', 'PPC_slbia', 'PPC_slbie', 'PPC_sld', 'PPC_slw', 'PPC_srad', 'PPC_sradi', 'PPC_sraw', 'PPC_srawi', 'PPC_srd', 'PPC_srw', 'PPC_stb', 'PPC_stbu', 'PPC_stbux', 'PPC_stbx', 'PPC_std', 'PPC_stdcx', 'PPC_stdu', 'PPC_stdux', 'PPC_stdx', 'PPC_stfd', 'PPC_stfdu', 'PPC_stfdux', 'PPC_stfdx', 'PPC_stfiwx', 'PPC_stfs', 'PPC_stfsu', 'PPC_stfsux', 'PPC_stfsx', 'PPC_sth', 'PPC_sthbrx', 'PPC_sthu', 'PPC_sthux', 'PPC_sthx', 'PPC_stmw', 'PPC_stswi', 'PPC_stswx', 'PPC_stw', 'PPC_stwbrx', 'PPC_stwcx', 'PPC_stwu', 'PPC_stwux', 'PPC_stwx', 'PPC_subf', 'PPC_subfc', 'PPC_subfe', 'PPC_subfic', 'PPC_subfme', 'PPC_subfze', 'PPC_sync', 'PPC_td', 'PPC_tdi', 'PPC_tlbia', 'PPC_tlbie', 'PPC_tlbsync', 
'PPC_tw', 'PPC_twi', 'PPC_xor', 'PPC_xori', 'PPC_xoris', 'PPC_cmpwi', 'PPC_cmpw', 'PPC_cmplwi', 'PPC_cmplw', 'PPC_cmpdi', 'PPC_cmpd', 'PPC_cmpldi', 'PPC_cmpld', 'PPC_trap', 'PPC_trapd', 'PPC_twlgt', 'PPC_twllt', 'PPC_tweq', 'PPC_twlge', 'PPC_twlle', 'PPC_twgt', 'PPC_twge', 'PPC_twlt', 'PPC_twle', 'PPC_twne', 'PPC_twlgti', 'PPC_twllti', 'PPC_tweqi', 'PPC_twlgei', 'PPC_twllei', 'PPC_twgti', 'PPC_twgei', 'PPC_twlti', 'PPC_twlei', 'PPC_twnei', 'PPC_tdlgt', 'PPC_tdllt', 'PPC_tdeq', 'PPC_tdlge', 'PPC_tdlle', 'PPC_tdgt', 'PPC_tdge', 'PPC_tdlt', 'PPC_tdle', 'PPC_tdne', 'PPC_tdlgti', 'PPC_tdllti', 'PPC_tdeqi', 'PPC_tdlgei', 'PPC_tdllei', 'PPC_tdgti', 'PPC_tdgei', 'PPC_tdlti', 'PPC_tdlei', 'PPC_tdnei', 'PPC_nop', 'PPC_not', 'PPC_mr', 'PPC_subi', 'PPC_subic', 'PPC_subis', 'PPC_li', 'PPC_lis', 'PPC_crset', 'PPC_crnot', 'PPC_crmove', 'PPC_crclr', 'PPC_mtxer', 'PPC_mtlr', 'PPC_mtctr', 'PPC_mtdsisr', 'PPC_mtdar', 'PPC_mtdec', 'PPC_mtsrr0', 'PPC_mtsrr1', 'PPC_mtsprg0', 'PPC_mtsprg1', 'PPC_mtsprg2', 'PPC_mtsprg3', 'PPC_mttbl', 'PPC_mttbu', 'PPC_mfxer', 'PPC_mflr', 'PPC_mfctr', 'PPC_mfdsisr', 'PPC_mfdar', 'PPC_mfdec', 'PPC_mfsrr0', 'PPC_mfsrr1', 'PPC_mfsprg0', 'PPC_mfsprg1', 'PPC_mfsprg2', 'PPC_mfsprg3', 'PPC_mftbl', 'PPC_mftbu', 'PPC_mfpvr', 'PPC_balways', 'PPC_bt', 'PPC_bf', 'PPC_bdnz', 'PPC_bdnzt', 'PPC_bdnzf', 'PPC_bdz', 'PPC_bdzt', 'PPC_bdzf', 'PPC_blt', 'PPC_ble', 'PPC_beq', 'PPC_bge', 'PPC_bgt', 'PPC_bne', 'PPC_bso', 'PPC_bns', 'PPC_extlwi', 'PPC_extrwi', 'PPC_inslwi', 'PPC_insrwi', 'PPC_rotlwi', 'PPC_rotrwi', 'PPC_rotlw', 'PPC_slwi', 'PPC_srwi', 'PPC_clrlwi', 'PPC_clrrwi', 'PPC_clrlslwi', 'PPC_dccci', 'PPC_dcread', 'PPC_icbt', 'PPC_iccci', 'PPC_icread', 'PPC_mfdcr', 'PPC_mtdcr', 'PPC_rfci', 'PPC_tlbre', 'PPC_tlbsx', 'PPC_tlbwe', 'PPC_wrtee', 'PPC_wrteei', 'PPC_last'] 52 | 53 | 54 | # Special Purpose Registers. 
Looked up in GDB's source code, 55 | # IDA and the Freescale's PowerPC MPC823e manual 56 | # 57 | SPR_REGISTERS = { 58 | 0: 'mq', 59 | 1: 'xer', 60 | 4: 'rtcu', 61 | 5: 'rtcl', 62 | 8: 'lr', 63 | 9: 'ctr', 64 | #9: 'cnt', # IDA defines 9 to be CTR, I looked up 65 | # this from GDB's source so I ignore if 66 | # CNT being 9 too is an error 67 | 18: 'dsisr', 68 | 19: 'dar', 69 | 22: 'dec', 70 | 25: 'sdr1', 71 | 26: 'srr0', 72 | 27: 'srr1', 73 | 80: 'eie', 74 | 81: 'eid', 75 | 82: 'nri', 76 | 102: 'sp', 77 | 144: 'cmpa', 78 | 145: 'cmpb', 79 | 146: 'cmpc', 80 | 147: 'cmpd', 81 | 148: 'icr', 82 | 149: 'der', 83 | 150: 'counta', 84 | 151: 'countb', 85 | 152: 'cmpe', 86 | 153: 'cmpf', 87 | 154: 'cmpg', 88 | 155: 'cmph', 89 | 156: 'lctrl1', 90 | 157: 'lctrl2', 91 | 158: 'ictrl', 92 | 159: 'bar', 93 | 256: 'vrsave', 94 | 272: 'sprg0', 95 | 273: 'sprg1', 96 | 274: 'sprg2', 97 | 275: 'sprg3', 98 | 280: 'asr', 99 | 282: 'ear', 100 | 268: 'tbl_read', 101 | 269: 'tbu_read', 102 | 284: 'tbl_write', 103 | 285: 'tbu_write', 104 | 287: 'pvr', 105 | 512: 'spefscr', 106 | 528: 'ibat0u', 107 | 529: 'ibat0l', 108 | 530: 'ibat1u', 109 | 531: 'ibat1l', 110 | 532: 'ibat2u', 111 | 533: 'ibat2l', 112 | 534: 'ibat3u', 113 | 535: 'ibat3l', 114 | 536: 'dbat0u', 115 | 537: 'dbat0l', 116 | 538: 'dbat1u', 117 | 539: 'dbat1l', 118 | 540: 'dbat2u', 119 | 541: 'dbat2l', 120 | 542: 'dbat3u', 121 | 543: 'dbat3l', 122 | 560: 'ic_cst', 123 | 561: 'ic_adr', 124 | 562: 'ic_dat', 125 | 568: 'dc_cst', 126 | 569: 'dc_adr', 127 | 570: 'dc_dat', 128 | 630: 'dpdr', 129 | 631: 'dpir', 130 | 638: 'immr', 131 | 784: 'mi_ctr', 132 | 786: 'mi_ap', 133 | 787: 'mi_epn', 134 | 789: 'mi_twc', 135 | 790: 'mi_rpn', 136 | 816: 'mi_cam', 137 | 817: 'mi_ram0', 138 | 818: 'mi_ram1', 139 | 792: 'md_ctr', 140 | 793: 'm_casid', 141 | 794: 'md_ap', 142 | 795: 'md_epn', 143 | 796: 'm_twb', 144 | 797: 'md_twc', 145 | 798: 'md_rpn', 146 | 799: 'm_tw', 147 | 816: 'mi_dbcam', 148 | 817: 'mi_dbram0', 149 | 818: 'mi_dbram1', 150 | #824: 
'md_dbcam', 151 | 824: 'md_cam', 152 | #825: 'md_dbram0', 153 | 825: 'md_ram0', 154 | #826: 'md_dbram1', 155 | 826: 'md_ram1', 156 | 936: 'ummcr0', 157 | 937: 'upmc1', 158 | 938: 'upmc2', 159 | 939: 'usia', 160 | 940: 'ummcr1', 161 | 941: 'upmc3', 162 | 942: 'upmc4', 163 | 944: 'zpr', 164 | 945: 'pid', 165 | 952: 'mmcr0', 166 | 953: 'pmc1', 167 | 953: 'sgr', 168 | 954: 'pmc2', 169 | #954: 'dcwr', 170 | 955: 'sia', 171 | 956: 'mmcr1', 172 | 957: 'pmc3', 173 | 958: 'pmc4', 174 | 959: 'sda', 175 | 972: 'tbhu', 176 | 973: 'tblu', 177 | 976: 'dmiss', 178 | 977: 'dcmp', 179 | 978: 'hash1', 180 | 979: 'hash2', 181 | #979: 'icdbdr', 182 | #980: 'imiss', 183 | 980: 'esr', 184 | 981: 'icmp', 185 | #981: 'dear', 186 | 982: 'rpa', 187 | 982: 'evpr', 188 | 983: 'cdbcr', 189 | 984: 'tsr', 190 | #984: '602_tcr', 191 | #986: '403_tcr', 192 | #986: 'ibr', 193 | 986: 'tcr', 194 | 987: 'pit', 195 | 988: 'esasrr', 196 | #988: 'tbhi', 197 | 989: 'tblo', 198 | 990: 'srr2', 199 | #990: 'sebr', 200 | 991: 'srr3', 201 | #991: 'ser', 202 | 1008: 'hid0', 203 | #1008: 'dbsr', 204 | 1009: 'hid1', 205 | 1010: 'iabr', 206 | #1010: 'dbcr', 207 | 1012: 'iac1', 208 | 1013: 'dabr', 209 | #1013: 'iac2', 210 | 1014: 'dac1', 211 | 1015: 'dac2', 212 | 1017: 'l2cr', 213 | 1018: 'dccr', 214 | #1019: 'ictc', 215 | 1019: 'iccr', 216 | #1020: 'thrm1', 217 | 1020: 'pbl1', 218 | #1021: 'thrm2', 219 | 1021: 'pbu1', 220 | #1022: 'thrm3', 221 | 1022: 'pbl2', 222 | #1022: 'fpecr', 223 | #1022: 'lt', 224 | 1023: 'pir', 225 | #1023: 'pbu2' 226 | } 227 | 228 | CR_REGISTERS = ['cr%d' % i for i in range(8)] 229 | REGISTERS = ['%%r%d' % i for i in range(32)] 230 | REGISTERS.extend(['UNK32', 'UNK33']) 231 | REGISTERS.extend(['%%fp%d' % i for i in range(32)]) 232 | REGISTERS.extend(['%%sr%d' % i for i in range(16)]) 233 | REGISTERS[1] = '%sp' 234 | REGISTERS[2] = '%rtoc' 235 | 236 | OPERATORS = arch.Arch.OPERATORS 237 | 238 | 239 | OPERAND_WIDTH = [ 240 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_1, 241 | 
arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_2, 242 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_4, 243 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_4, 244 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_8, 245 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_VARIABLE, 246 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_12, 247 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_8, 248 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_16, 249 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_2, None, 250 | arch.Arch.NODE_TYPE_OPERATOR_WIDTH_BYTE_6, 251 | None, None, 252 | None, None] 253 | 254 | def __get_instruction_index(self, insn_list): 255 | """Retrieve the indices of the given instructions into the instruction table. 256 | 257 | Those indices are used to indicate the type of an instruction. 258 | """ 259 | 260 | return [self.INSTRUCTIONS.index(i) for i in insn_list] 261 | 262 | def __init__(self): 263 | arch.Arch.__init__(self) 264 | 265 | #self.INSTRUCTIONS_CALL = self.__get_instruction_index((,)) 266 | self.INSTRUCTIONS_CONDITIONAL_BRANCH = self.__get_instruction_index( 267 | ( 'PPC_bc', 'PPC_bcctr', 'PPC_bclr', 'PPC_bt', 'PPC_bf', 'PPC_bdnz', 'PPC_bdnzt', 'PPC_bdnzf', 'PPC_bdz', 'PPC_bdzt', 'PPC_bdzf', 'PPC_blt', 'PPC_ble', 'PPC_beq', 'PPC_bge', 'PPC_bgt', 'PPC_bne', 'PPC_bso', 'PPC_bns')) 268 | self.INSTRUCTIONS_UNCONDITIONAL_BRANCH = self.__get_instruction_index( 269 | ( 'PPC_b', 'PPC_balways')) 270 | #self.INSTRUCTIONS_RET = self.__get_instruction_index((,)) 271 | 272 | self.INSTRUCTIONS_BRANCH = self.__get_instruction_index( 273 | ( 'PPC_bc', 'PPC_bcctr', 'PPC_bclr', 'PPC_bt', 'PPC_bf', 'PPC_bdnz', 'PPC_bdnzt', 'PPC_bdnzf', 'PPC_bdz', 'PPC_bdzt', 'PPC_bdzf', 'PPC_blt', 'PPC_ble', 'PPC_beq', 'PPC_bge', 'PPC_bgt', 'PPC_bne', 'PPC_bso', 'PPC_bns', 'PPC_b', 'PPC_balways')) 274 | 275 | self.arch_name = 'PowerPC' 276 | 277 | def check_arch(self): 278 | 279 | if self.processor_name == 'PPC': 280 | return True 281 | 282 | return False 283 | 284 | 285 | def get_mnemonic(self, addr): 286 | """Return the mnemonic for 
the current instruction.""" 287 | 288 | disasm_line = idc.GetDisasm(addr) 289 | if disasm_line is None: 290 | # This behavior has been exhibited by IDA5.4 with an IDB of "libSystem.B.dylib" 291 | # at address 0x3293210e ( "08 BB CBNZ R0, loc_32932154" ) 292 | # Newer IDA versions show the instruction above while IDA 5.4 293 | # returns None. We will skip the instruction in such a case returning 'invalid' 294 | # as the mnemonic 295 | # 296 | print '%08x: idc.GetDisasm() returned None for address: %08x' % (addr, addr) 297 | return 'invalid' 298 | disasm_line_tokenized = disasm_line.split() 299 | mnem = disasm_line_tokenized[0] 300 | return mnem 301 | 302 | 303 | def single_operand_parser(self, address, op, idx): 304 | """Parse a PPC operand.""" 305 | 306 | def constraint_value(value): 307 | if value>2**16: 308 | return -(2**32-value) 309 | return value 310 | 311 | 312 | # Operand parsing 313 | # 314 | 315 | if op.type == OPERAND_TYPE_NO_OPERAND: 316 | return None 317 | 318 | #print '>>>', hex(address), idx, op.type 319 | 320 | segment = idaapi.getseg(address) 321 | addressing_mode = segment.bitness 322 | 323 | # Start creating the AST, the root entry is always the width of the 324 | # operand 325 | operand = [self.OPERAND_WIDTH[self.as_byte_value(op.dtyp)]] 326 | 327 | 328 | # Compose the rest of the AST 329 | # 330 | 331 | if op.type == OPERAND_TYPE_DISPLACEMENT: 332 | 333 | # A displacement operator might refer to a variable... 334 | # 335 | var_name = None 336 | 337 | # Try to get any name that might have been assigned to the 338 | # variable. 
It's only done if the register is: 339 | # sp/esp (4) or bp/ebp (5) 340 | # 341 | flags = idc.GetFlags(address) 342 | if (idx==0 and idc.isStkvar0(flags)) or ( 343 | idx==1 and idc.isStkvar1(flags)): 344 | 345 | var_name = self.get_operand_stack_variable_name(address, op, idx) 346 | 347 | #if has_sib_byte(op) is True: 348 | # when SIB byte set, process the SIB indexing 349 | # phrase = parse_phrase(op) 350 | #else: 351 | phrase = [ 352 | self.NODE_TYPE_OPERATOR_PLUS, 353 | [self.NODE_TYPE_REGISTER, 354 | self.REGISTERS[self.as_byte_value(op.reg)], 0]] 355 | 356 | if var_name: 357 | value = arch.ExpressionNamedValue(long(op.addr), var_name) 358 | else: 359 | value = constraint_value(op.addr) 360 | 361 | operand.extend([ 362 | [self.NODE_TYPE_DEREFERENCE, 363 | phrase+[ [self.NODE_TYPE_VALUE, value, 1]] ] ]) 364 | 365 | elif op.type == OPERAND_TYPE_REGISTER: 366 | operand.extend([ 367 | [self.NODE_TYPE_REGISTER, self.REGISTERS[self.as_byte_value(op.reg)], 1]]) 368 | 369 | 370 | elif op.type == OPERAND_TYPE_MEMORY: 371 | 372 | addr_name = self.get_address_name(op.addr) 373 | 374 | if addr_name: 375 | value = arch.ExpressionNamedValue(long(op.addr), addr_name) 376 | else: 377 | value = op.addr 378 | 379 | operand.extend([ 380 | [self.NODE_TYPE_DEREFERENCE, 381 | [self.NODE_TYPE_VALUE, value, 0]] ]) 382 | 383 | 384 | elif op.type == OPERAND_TYPE_IMMEDIATE: 385 | 386 | # Keep the value's size 387 | # 388 | if self.as_byte_value(op.dtyp) == 0: 389 | mask = 0xff 390 | elif self.as_byte_value(op.dtyp) == 1: 391 | mask = 0xffff 392 | else: 393 | mask = 0xffffffff 394 | 395 | operand.extend([[self.NODE_TYPE_VALUE, op.value&mask, 0]]) 396 | 397 | 398 | elif op.type in (OPERAND_TYPE_NEAR, OPERAND_TYPE_FAR): 399 | 400 | addr_name = self.get_address_name(op.addr) 401 | 402 | if addr_name: 403 | value = arch.ExpressionNamedValue(long(op.addr), addr_name) 404 | else: 405 | value = op.addr 406 | 407 | operand.extend([[self.NODE_TYPE_VALUE, value, 0]]) 408 | 409 | 410 | elif op.type 
== OPERAND_TYPE_PHRASE: 411 | print '***Dunno how to parse PHRASE' 412 | operand.extend([[self.NODE_TYPE_SYMBOL, 413 | 'UNK_PHRASE(val:%d, reg:%d, type:%d)' % ( 414 | op.value, self.as_byte_value(op.reg), op.type), 0]]) 415 | 416 | elif op.type == OPERAND_TYPE_IDPSPEC0: 417 | 418 | # Handle Special Purpose Registers 419 | # 420 | register = self.SPR_REGISTERS.get( 421 | op.value, 'UNKNOWN_REGISTER(val:%x)' % op.value) 422 | 423 | operand.extend([ 424 | [self.NODE_TYPE_REGISTER, register, 0]]) 425 | 426 | elif op.type == OPERAND_TYPE_IDPSPEC1: 427 | #print '***Dunno how to parse OPERAND_TYPE_IDPSPEC1' 428 | #operand.extend([[self.NODE_TYPE_SYMBOL, 429 | # 'UNK_IDPSPEC1(val:%d, reg:%d, type:%d)' % ( 430 | # op.value, op.reg, op.type), 0]]) 431 | operand.extend([ 432 | [self.NODE_TYPE_REGISTER, self.REGISTERS[self.as_byte_value(op.reg)], 1]]) 433 | operand.extend([ 434 | [self.NODE_TYPE_REGISTER, self.REGISTERS[self.as_byte_value(op.specflag1)], 2]]) 435 | 436 | elif op.type == OPERAND_TYPE_IDPSPEC2: 437 | # IDPSPEC2 is the operand type for all rlwinm and rlwnm 438 | # instructions, which are in general op reg, reg, byte, byte, byte 439 | # or equivalent. Simplified mnemonics sometimes take fewer than 440 | # five arguments. 441 | # 442 | # Keep the value's size 443 | # 444 | if self.as_byte_value(op.dtyp) == 0: 445 | mask = 0xff 446 | elif self.as_byte_value(op.dtyp) == 1: 447 | mask = 0xffff 448 | else: 449 | mask = 0xffffffff 450 | 451 | operand_1 = [] 452 | operand_2 = [] 453 | operand_3 = [] 454 | 455 | # Get the object representing the instruction's data.
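The IDPSPEC2 branch above, like the immediate-operand handler earlier, truncates a value to the operand width encoded in `op.dtyp`. A minimal standalone sketch of that masking (the 0=byte, 1=word, otherwise-dword convention is taken from the handlers above; the function names are illustrative, not part of the exporter):

```python
def mask_for_dtyp(dtyp):
    # Width codes mirror the if/elif chain in the handlers above:
    # 0 -> byte, 1 -> word, anything else -> double word.
    if dtyp == 0:
        return 0xff
    elif dtyp == 1:
        return 0xffff
    return 0xffffffff

def truncate_to_width(value, dtyp):
    """Keep only the bits that fit the operand's width."""
    return value & mask_for_dtyp(dtyp)
```

For example, truncating 0x12345678 to a word-sized operand gives 0x5678.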
456 | # It varies between IDA pre-5.7 and 5.7 onwards, the following check 457 | # will take care of it (for more detail look into the similar 458 | # construct in arch.py) 459 | # 460 | if hasattr(idaapi, 'cmd' ): 461 | idaapi.decode_insn(address) 462 | ida_instruction = idaapi.cmd 463 | else: 464 | idaapi.ua_code(address) 465 | ida_instruction = idaapi.cvar.cmd 466 | 467 | if (ida_instruction.auxpref & 0x0020): 468 | #print "SH" 469 | operand_1 = [self.OPERAND_WIDTH[self.as_byte_value(op.dtyp)]] 470 | operand_1.extend([[self.NODE_TYPE_VALUE, self.as_byte_value(op.reg)&mask, 0]]) 471 | else: 472 | operand_1 = [self.OPERAND_WIDTH[self.as_byte_value(op.dtyp)]] 473 | operand_1.extend([[self.NODE_TYPE_REGISTER, self.REGISTERS[self.as_byte_value(op.reg)], 0]]) 474 | #print operand_1 475 | 476 | if (ida_instruction.auxpref & 0x0040): 477 | #print "MB" 478 | operand_2 = [self.OPERAND_WIDTH[self.as_byte_value(op.dtyp)]] 479 | operand_2.extend([[self.NODE_TYPE_VALUE, self.as_byte_value(op.specflag1)&mask, 0]]) 480 | #print operand_2 481 | 482 | if (ida_instruction.auxpref & 0x0080): 483 | #print "ME" 484 | operand_3 = [self.OPERAND_WIDTH[self.as_byte_value(op.dtyp)]] 485 | operand_3.extend([[self.NODE_TYPE_VALUE, self.as_byte_value(op.specflag2)&mask, 0]]) 486 | #print operand_3 487 | 488 | operand = [operand_1] 489 | #operand = operand_1 490 | 491 | if (ida_instruction.auxpref & 0x0040): 492 | #print "MB2" 493 | operand.append(operand_2) 494 | if (ida_instruction.auxpref & 0x0080): 495 | #print "ME2" 496 | operand.append(operand_3) 497 | 498 | #print operand 499 | # operand = operand_1 500 | #print operand 501 | #print '>>>', hex(address), idx, op.type, op.reg 502 | #operand.extend([[self.NODE_TYPE_OPERATOR_COMMA, [self.NODE_TYPE_VALUE, op.reg&mask, 0], [self.NODE_TYPE_VALUE, self.as_byte_value(op.specflag1)&mask, 1], [self.NODE_TYPE_VALUE, self.as_byte_value(op.specflag2)&mask, 2]]]) 503 | 504 | elif op.type == OPERAND_TYPE_IDPSPEC3: 505 | # CR registers 506 | # 507 | 
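The `hasattr(idaapi, 'cmd')` probe above bridges the IDAPython API change between IDA pre-5.7 (`ua_code`/`cvar.cmd`) and 5.7 onwards (`decode_insn`/`cmd`). That check can be factored into a single helper; a sketch that takes the `idaapi` module as a parameter so the version difference lives in one place (the API names are those used in the code above, the helper itself is hypothetical):

```python
def decode_instruction(idaapi_mod, address):
    # IDA 5.7 and later expose the decoded instruction as idaapi.cmd;
    # earlier versions used ua_code() and idaapi.cvar.cmd instead.
    if hasattr(idaapi_mod, 'cmd'):
        idaapi_mod.decode_insn(address)
        return idaapi_mod.cmd
    idaapi_mod.ua_code(address)
    return idaapi_mod.cvar.cmd
```

The IDPSPEC2 handler could then read `ida_instruction = decode_instruction(idaapi, address)` instead of repeating the probe at each call site.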
operand.extend([ 508 | [self.NODE_TYPE_REGISTER, self.CR_REGISTERS[self.as_byte_value(op.reg)], 0]]) 509 | 510 | elif op.type == OPERAND_TYPE_IDPSPEC4: 511 | # The bit in the CR to check for 512 | # 513 | operand.extend([[self.NODE_TYPE_REGISTER, self.as_byte_value(op.reg), 0]]) 514 | 515 | 516 | elif op.type == OPERAND_TYPE_IDPSPEC5: 517 | # Device Control Register, implementation specific 518 | operand.extend([[self.NODE_TYPE_REGISTER, 'DCR(%x)' % op.value, 0]]) 519 | 520 | 521 | return operand 522 | 523 | 524 | def process_instruction(self, packet, addr): 525 | """Architecture specific instruction processing""" 526 | 527 | # Call the generic part with the architecture specific operand 528 | # handling 529 | # 530 | 531 | (instruction, 532 | i_mnemonic, 533 | operands, 534 | operand_strings, 535 | data) = self.process_instruction_generic(addr) 536 | 537 | 538 | packet.add_instruction(instruction, addr, i_mnemonic, 539 | operand_strings, operands, data) 540 | 541 | 542 | return instruction 543 | 544 | 545 | -------------------------------------------------------------------------------- /ida_to_sql/common.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 
17 | """ 18 | 19 | __revision__ = "$LastChangedRevision: 5095 $" 20 | __author__ = 'Ero Carrera' 21 | __version__ = '%d' % int( __revision__[21:-2] ) 22 | __license__ = 'GPL' 23 | 24 | 25 | FATAL_CANNOT_CONNECT_TO_DATABASE = 0x10 26 | FATAL_MODULE_ALREADY_IN_DATABASE = 0x11 27 | FATAL_INVALID_SCHEMA_VERSION = 0x12 28 | 29 | 30 | BRANCH_TYPE_TRUE = 0 31 | BRANCH_TYPE_FALSE = 1 32 | BRANCH_TYPE_UNCONDITIONAL = 2 33 | BRANCH_TYPE_SWITCH = 3 34 | 35 | 36 | class DB_ENGINE: 37 | MYSQL = 'MySQL' 38 | POSTGRESQL = 'PostgreSQL' 39 | MYSQLDUMP = 'MySQL File Dump' 40 | SQLITE = 'SQLite' 41 | 42 | class FUNC_TYPE: 43 | FUNCTION_STANDARD = 0 44 | FUNCTION_LIBRARY = 1 45 | FUNCTION_IMPORTED = 2 46 | FUNCTION_THUNK = 3 47 | 48 | class REF_TYPE: 49 | CONDITIONAL_BRANCH_TRUE = BRANCH_TYPE_TRUE # 0 50 | CONDITIONAL_BRANCH_FALSE = BRANCH_TYPE_FALSE # 1 51 | UNCONDITIONAL_BRANCH = BRANCH_TYPE_UNCONDITIONAL # 2 52 | BRANCH_SWITCH = BRANCH_TYPE_SWITCH # 3 53 | 54 | CALL_DIRECT = 4 55 | CALL_INDIRECT = 5 56 | CALL_INDIRECT_VIRTUAL = 6 57 | 58 | DATA = 7 59 | DATA_STRING = 8 60 | 61 | 62 | def log_message(s): 63 | print 'IDA2SQL> %s' % s 64 | 65 | def dbg_message(s): 66 | print 'IDA2SQL DBG> %s' % s 67 | 68 | 69 | 70 | class Section: 71 | """Data container to encapsulate segment information.""" 72 | 73 | def __init__(self, name, base, start, end, data=None): 74 | 75 | self.name = name 76 | self.base = base 77 | self.start = start 78 | self.end = end 79 | self.data = data 80 | 81 | 82 | 83 | class DismantlerDataPacket: 84 | """Data container to encapsulate information sent from workers.""" 85 | 86 | def __init__(self): 87 | self.instructions = dict() 88 | self.address_references = set() 89 | # self.code_references = set() 90 | self.branches = set() 91 | self.calls = dict() 92 | self.todo_data_refs = list() 93 | self.todo_code_refs = list() 94 | self.disassembly = dict() 95 | self.comments = list() # list of pairs (address, comment) 96 | 97 | # Private methods 98 | # 99 | def 
_add_branch(self, src, dst): 100 | self.branches.add((src, dst)) 101 | 102 | def add_todo_data_ref(self, src, dst): 103 | self.todo_data_refs.append((src, dst)) 104 | 105 | def _add_todo_code_ref(self, src, dst): 106 | """Append address to analysis queue.""" 107 | 108 | self.todo_code_refs.append((src, dst)) 109 | 110 | def _add_call(self, src, dst): 111 | self.calls[src] = dst 112 | 113 | 114 | # Public methods 115 | # 116 | def add_comment(self, address, comment): 117 | self.comments.append((address, comment)) 118 | 119 | 120 | def add_instruction(self, 121 | instruction, address, mnemonic, 122 | # instruction_string, instruction_tree, data): 123 | operands, operand_trees, data): 124 | 125 | self.disassembly[address] = (instruction, data) 126 | self.instructions[address] = (instruction, mnemonic, 127 | operands, operand_trees, data) 128 | # instruction_string, instruction_tree, data) 129 | 130 | def add_data_reference(self, src, dst): 131 | # self.data_references.add((src, dst)) 132 | self.address_references.add((src, dst, REF_TYPE.DATA)) 133 | 134 | # def add_code_reference(self, src, dst): 135 | # self.code_references.add((src, dst)) 136 | 137 | def add_conditional_branch_true(self, src, dst): 138 | self._add_branch(src, dst) 139 | self.address_references.add((src, dst, REF_TYPE.CONDITIONAL_BRANCH_TRUE)) 140 | 141 | def add_conditional_branch_false(self, src, dst): 142 | self._add_branch(src, dst) 143 | self.address_references.add((src, dst, REF_TYPE.CONDITIONAL_BRANCH_FALSE)) 144 | 145 | def add_unconditional_branch(self, src, dst): 146 | self._add_branch(src, dst) 147 | self.address_references.add((src, dst, REF_TYPE.UNCONDITIONAL_BRANCH)) 148 | 149 | def add_direct_call(self, src, dst): 150 | self._add_call(src, dst) 151 | self.address_references.add((src, dst, REF_TYPE.CALL_DIRECT)) 152 | self._add_todo_code_ref(src, dst) 153 | 154 | def add_indirect_call(self, src, dst): 155 | self._add_call(src, dst) 156 | self.address_references.add((src, dst, 
REF_TYPE.CALL_INDIRECT)) 157 | self._add_todo_code_ref(src, dst) 158 | 159 | def add_indirect_virtual_call(self, src, dst): 160 | """Add a call reference to a function which does not exist. 161 | 162 | Any function which does not actually exist in the binary at 163 | disassembly time but is known to exist at a later 164 | stage can be referenced with a virtual reference. 165 | A case of such references is a call to an imported function. 166 | 167 | A specific method is provided in order to 168 | differentiate between references to actual code (which can be 169 | appended to a processing queue) and those to "virtual" code. 170 | """ 171 | 172 | self._add_call(src, dst) 173 | self.address_references.add((src, dst, REF_TYPE.CALL_INDIRECT_VIRTUAL)) 174 | 175 | 176 | -------------------------------------------------------------------------------- /ida_to_sql/db_statements.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php].
17 | """ 18 | 19 | import common 20 | 21 | __author__ = 'Ero Carrera' 22 | __version__ = common.__version__ 23 | __license__ = 'GPL' 24 | 25 | MYSQL_SCHEMA_VERSION = 2 26 | 27 | mysql_new_db_statements = """ 28 | CREATE table modules( 29 | id INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 30 | name TEXT NOT NULL, 31 | architecture VARCHAR( 32 ) NOT NULL, 32 | base_address BIGINT UNSIGNED NOT NULL, 33 | exporter varchar( 256 ) NOT NULL, 34 | version INT NOT NULL, 35 | md5 CHAR(32) NOT NULL, 36 | sha1 CHAR(40) NOT NULL, 37 | comment TEXT, 38 | import_time TIMESTAMP NOT NULL ) 39 | ENGINE=InnoDB; 40 | """ 41 | 42 | 43 | mysql_new_module_statements = """ 44 | CREATE TABLE IF NOT EXISTS 45 | ex_{MODULE_ID}_functions ( 46 | address BIGINT UNSIGNED UNIQUE NOT NULL, 47 | name TEXT NOT NULL, 48 | real_name boolean NOT NULL DEFAULT TRUE, 49 | function_type INTEGER UNSIGNED NOT NULL DEFAULT 0 CHECK( function_type <= 3 ), 50 | module_name VARCHAR( 64 ) NULL DEFAULT NULL, 51 | PRIMARY KEY ( address )) 52 | ENGINE=InnoDB; 53 | 54 | 55 | CREATE TABLE IF NOT EXISTS 56 | ex_{MODULE_ID}_basic_blocks ( 57 | id INTEGER UNSIGNED NOT NULL, 58 | parent_function BIGINT UNSIGNED NOT NULL, 59 | address BIGINT UNSIGNED NOT NULL, 60 | PRIMARY KEY(id, parent_function), 61 | KEY(address), 62 | FOREIGN KEY (parent_function) REFERENCES ex_{MODULE_ID}_functions(address) ON DELETE CASCADE ) 63 | ENGINE=InnoDB; 64 | 65 | 66 | CREATE TABLE IF NOT EXISTS 67 | ex_{MODULE_ID}_instructions ( 68 | address BIGINT UNSIGNED NOT NULL, 69 | basic_block_id INTEGER UNSIGNED NOT NULL, 70 | mnemonic VARCHAR(32), 71 | sequence INT UNSIGNED NOT NULL, 72 | data BLOB NOT NULL, 73 | PRIMARY KEY(address, basic_block_id), 74 | FOREIGN KEY (basic_block_id) REFERENCES ex_{MODULE_ID}_basic_blocks(id) ON DELETE CASCADE ) 75 | ENGINE=InnoDB; 76 | 77 | 78 | CREATE TABLE IF NOT EXISTS 79 | ex_{MODULE_ID}_callgraph ( 80 | id INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 81 | src BIGINT UNSIGNED NOT 
NULL, 82 | src_basic_block_id INTEGER UNSIGNED NOT NULL, 83 | src_address BIGINT UNSIGNED NOT NULL, 84 | dst BIGINT UNSIGNED NOT NULL, 85 | FOREIGN KEY (src) REFERENCES ex_{MODULE_ID}_functions(address) ON DELETE CASCADE, 86 | FOREIGN KEY (dst) REFERENCES ex_{MODULE_ID}_functions(address) ON DELETE CASCADE, 87 | FOREIGN KEY (src_basic_block_id) REFERENCES ex_{MODULE_ID}_basic_blocks(id) ON DELETE CASCADE, 88 | FOREIGN KEY (src_address) REFERENCES ex_{MODULE_ID}_instructions(address) ON DELETE CASCADE ) 89 | ENGINE=InnoDB; 90 | 91 | 92 | CREATE TABLE IF NOT EXISTS 93 | ex_{MODULE_ID}_control_flow_graph ( 94 | id INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 95 | parent_function BIGINT UNSIGNED NOT NULL, 96 | src INTEGER UNSIGNED NOT NULL, 97 | dst INTEGER UNSIGNED NOT NULL, 98 | kind INTEGER UNSIGNED NOT NULL DEFAULT 0 CHECK( kind <= 3 ), 99 | FOREIGN KEY (src) REFERENCES ex_{MODULE_ID}_basic_blocks(id) ON DELETE CASCADE, 100 | FOREIGN KEY (dst) REFERENCES ex_{MODULE_ID}_basic_blocks(id) ON DELETE CASCADE, 101 | FOREIGN KEY (parent_function) REFERENCES ex_{MODULE_ID}_functions(address) ON DELETE CASCADE, 102 | INDEX (parent_function, src) ) 103 | ENGINE=InnoDB; 104 | 105 | 106 | CREATE TABLE IF NOT EXISTS 107 | ex_{MODULE_ID}_operand_strings ( 108 | id INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 109 | str TEXT NOT NULL ) 110 | ENGINE=InnoDB; 111 | 112 | 113 | CREATE TABLE IF NOT EXISTS 114 | ex_{MODULE_ID}_expression_tree ( 115 | id INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 116 | expr_type INTEGER UNSIGNED NOT NULL DEFAULT 0 CHECK( expr_type <= 7 ), 117 | symbol VARCHAR(256), 118 | immediate BIGINT SIGNED, 119 | position INTEGER, 120 | parent_id INTEGER UNSIGNED CHECK(id > parent_id), 121 | FOREIGN KEY (parent_id) REFERENCES ex_{MODULE_ID}_expression_tree(id) ON DELETE CASCADE ) 122 | ENGINE=InnoDB; 123 | 124 | 125 | CREATE TABLE IF NOT EXISTS 126 | ex_{MODULE_ID}_operand_tuples ( 127 | address BIGINT UNSIGNED NOT 
NULL, 128 | operand_id INTEGER UNSIGNED NOT NULL, 129 | position INTEGER UNSIGNED NOT NULL, 130 | FOREIGN KEY (operand_id) REFERENCES ex_{MODULE_ID}_operand_strings(id) ON DELETE CASCADE, 131 | FOREIGN KEY (address) REFERENCES ex_{MODULE_ID}_instructions(address) ON DELETE CASCADE ) 132 | ENGINE=InnoDB; 133 | 134 | 135 | CREATE TABLE IF NOT EXISTS 136 | ex_{MODULE_ID}_expression_substitutions ( 137 | id INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 138 | address BIGINT UNSIGNED NOT NULL, 139 | operand_id INTEGER UNSIGNED NOT NULL, 140 | expr_id INTEGER UNSIGNED NOT NULL, 141 | replacement TEXT NOT NULL, 142 | FOREIGN KEY (address) REFERENCES ex_{MODULE_ID}_instructions(address) ON DELETE CASCADE, 143 | FOREIGN KEY (operand_id) REFERENCES ex_{MODULE_ID}_operand_strings(id) ON DELETE CASCADE, 144 | FOREIGN KEY (expr_id) REFERENCES ex_{MODULE_ID}_expression_tree(id) ON DELETE CASCADE ) 145 | ENGINE=InnoDB; 146 | 147 | 148 | CREATE TABLE IF NOT EXISTS 149 | ex_{MODULE_ID}_operand_expressions ( 150 | operand_id INTEGER UNSIGNED NOT NULL, 151 | expr_id INTEGER UNSIGNED NOT NULL, 152 | FOREIGN KEY (operand_id) REFERENCES ex_{MODULE_ID}_operand_strings(id) ON DELETE CASCADE, 153 | FOREIGN KEY (expr_id) REFERENCES ex_{MODULE_ID}_expression_tree(id) ON DELETE CASCADE ) 154 | ENGINE=InnoDB; 155 | 156 | 157 | CREATE TABLE IF NOT EXISTS 158 | ex_{MODULE_ID}_address_references ( 159 | address BIGINT UNSIGNED NOT NULL, 160 | operand_id INT UNSIGNED NULL, 161 | expression_id INT UNSIGNED NULL, 162 | target BIGINT UNSIGNED NOT NULL, 163 | kind INT UNSIGNED NOT NULL DEFAULT 0 CHECK( kind <= 8 ), 164 | FOREIGN KEY (address) REFERENCES ex_{MODULE_ID}_instructions(address) ON DELETE CASCADE, 165 | FOREIGN KEY (operand_id) REFERENCES ex_{MODULE_ID}_operand_strings( id ) ON DELETE CASCADE, 166 | FOREIGN KEY (expression_id) REFERENCES ex_{MODULE_ID}_expression_tree( id ) ON DELETE CASCADE, 167 | KEY(target), 168 | KEY(kind)) 169 | ENGINE=InnoDB; 170 | 171 | 172 | CREATE 
TABLE IF NOT EXISTS 173 | ex_{MODULE_ID}_address_comments ( 174 | address BIGINT UNSIGNED UNIQUE NOT NULL, 175 | comment TEXT NOT NULL, 176 | PRIMARY KEY(address)) 177 | ENGINE=InnoDB; 178 | 179 | 180 | CREATE TABLE IF NOT EXISTS 181 | ex_{MODULE_ID}_sections ( 182 | name VARCHAR(256) NOT NULL, 183 | base BIGINT UNSIGNED NOT NULL, 184 | start_address BIGINT UNSIGNED NOT NULL, 185 | end_address BIGINT UNSIGNED NOT NULL, 186 | length BIGINT UNSIGNED NOT NULL, 187 | data LONGBLOB ) 188 | ENGINE=InnoDB; 189 | 190 | """ 191 | 192 | 193 | #################################### 194 | ########### PostgreSQL ############# 195 | #################################### 196 | 197 | #-removed INNODB 198 | #-removed backticks 199 | #-removed UNSIGNED, SIGNED 200 | #-changed AUTO_INCREMENT into SERIAL 201 | #-removed "IF NOT EXISTS" 202 | #-changed BLOB to BYTEA 203 | 204 | postgresql_new_db_statements = """ 205 | """ 206 | 207 | postgresql_new_module_statements = """""" 208 | -------------------------------------------------------------------------------- /ida_to_sql/db_statements_v2.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 
17 | """ 18 | 19 | import common 20 | 21 | __author__ = 'Ero Carrera' 22 | __version__ = common.__version__ 23 | __license__ = 'GPL' 24 | 25 | MYSQL_SCHEMA_VERSION = 3 26 | 27 | mysql_new_db_statements = """ 28 | CREATE table modules( 29 | id INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 30 | name TEXT NOT NULL, 31 | architecture VARCHAR( 32 ) NOT NULL, 32 | base_address BIGINT UNSIGNED NOT NULL, 33 | exporter VARCHAR( 256 ) NOT NULL, 34 | version INT NOT NULL, 35 | md5 CHAR(32) NOT NULL, 36 | sha1 CHAR(40) NOT NULL, 37 | comment TEXT, 38 | import_time TIMESTAMP NOT NULL ) 39 | ENGINE=InnoDB; 40 | """ 41 | 42 | 43 | mysql_new_module_statements = """ 44 | CREATE TABLE IF NOT EXISTS 45 | ex_{MODULE_ID}_functions ( 46 | `address` BIGINT UNSIGNED UNIQUE NOT NULL, 47 | `name` TEXT NOT NULL, 48 | `has_real_name` BOOLEAN NOT NULL DEFAULT TRUE, 49 | `type` INTEGER UNSIGNED NOT NULL DEFAULT 0 CHECK( `type` <= 3 ), 50 | `module_name` TEXT NULL DEFAULT NULL, 51 | PRIMARY KEY ( `address` )) 52 | ENGINE=InnoDB; 53 | 54 | 55 | CREATE TABLE IF NOT EXISTS 56 | ex_{MODULE_ID}_basic_blocks ( 57 | `id` INTEGER UNSIGNED NOT NULL, 58 | `parent_function` BIGINT UNSIGNED NOT NULL, 59 | `address` BIGINT UNSIGNED NOT NULL, 60 | PRIMARY KEY( `id`, `parent_function`), 61 | KEY(`address`), 62 | FOREIGN KEY (`parent_function`) REFERENCES ex_{MODULE_ID}_functions(`address`) ON DELETE CASCADE ON UPDATE CASCADE ) 63 | ENGINE=InnoDB; 64 | 65 | 66 | CREATE TABLE IF NOT EXISTS 67 | ex_{MODULE_ID}_instructions ( 68 | `address` BIGINT UNSIGNED NOT NULL, 69 | `basic_block_id` INTEGER UNSIGNED NOT NULL, 70 | `mnemonic` VARCHAR(32), 71 | `sequence` INT UNSIGNED NOT NULL, 72 | `data` BLOB NOT NULL, 73 | PRIMARY KEY(`address`, `basic_block_id`), 74 | FOREIGN KEY (`basic_block_id`) REFERENCES ex_{MODULE_ID}_basic_blocks(`id`) ON DELETE CASCADE ON UPDATE CASCADE ) 75 | ENGINE=InnoDB; 76 | 77 | 78 | CREATE TABLE IF NOT EXISTS 79 | ex_{MODULE_ID}_callgraph ( 80 | `id` INTEGER UNSIGNED NOT
NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 81 | `source` BIGINT UNSIGNED NOT NULL, 82 | `source_basic_block_id` INTEGER UNSIGNED NOT NULL, 83 | `source_address` BIGINT UNSIGNED NOT NULL, 84 | `destination` BIGINT UNSIGNED NOT NULL, 85 | FOREIGN KEY (`source`) REFERENCES ex_{MODULE_ID}_functions(`address`) ON DELETE CASCADE ON UPDATE CASCADE, 86 | FOREIGN KEY (`destination`) REFERENCES ex_{MODULE_ID}_functions(`address`) ON DELETE CASCADE ON UPDATE CASCADE, 87 | FOREIGN KEY (`source_basic_block_id`) REFERENCES ex_{MODULE_ID}_basic_blocks(`id`) ON DELETE CASCADE ON UPDATE CASCADE, 88 | FOREIGN KEY (`source_address`) REFERENCES ex_{MODULE_ID}_instructions(`address`) ON DELETE CASCADE ON UPDATE CASCADE ) 89 | ENGINE=InnoDB; 90 | 91 | 92 | CREATE TABLE IF NOT EXISTS 93 | ex_{MODULE_ID}_control_flow_graphs ( 94 | `id` INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 95 | `parent_function` BIGINT UNSIGNED NOT NULL, 96 | `source` INTEGER UNSIGNED NOT NULL, 97 | `destination` INTEGER UNSIGNED NOT NULL, 98 | `type` INTEGER UNSIGNED NOT NULL DEFAULT 0 CHECK( `type` <= 3 ), 99 | FOREIGN KEY (`source`) REFERENCES ex_{MODULE_ID}_basic_blocks(`id`) ON DELETE CASCADE ON UPDATE CASCADE, 100 | FOREIGN KEY (`destination`) REFERENCES ex_{MODULE_ID}_basic_blocks(`id`) ON DELETE CASCADE ON UPDATE CASCADE, 101 | FOREIGN KEY (`parent_function`) REFERENCES ex_{MODULE_ID}_functions(`address`) ON DELETE CASCADE ON UPDATE CASCADE, 102 | INDEX (parent_function, source) ) 103 | ENGINE=InnoDB; 104 | 105 | 106 | CREATE TABLE IF NOT EXISTS 107 | ex_{MODULE_ID}_expression_trees ( 108 | `id` INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT 109 | ) 110 | ENGINE=InnoDB; 111 | 112 | 113 | CREATE TABLE IF NOT EXISTS 114 | ex_{MODULE_ID}_expression_nodes ( 115 | `id` INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 116 | `type` INTEGER UNSIGNED NOT NULL DEFAULT 0 CHECK( `type` <= 7 ), 117 | `symbol` VARCHAR(256), 118 | `immediate` BIGINT SIGNED, 119 | `position` INTEGER, 
120 | `parent_id` INTEGER UNSIGNED CHECK(`id` > `parent_id`), 121 | FOREIGN KEY (`parent_id`) REFERENCES ex_{MODULE_ID}_expression_nodes(`id`) ON DELETE CASCADE ON UPDATE CASCADE ) 122 | ENGINE=InnoDB; 123 | 124 | 125 | CREATE TABLE IF NOT EXISTS 126 | ex_{MODULE_ID}_operands ( 127 | `address` BIGINT UNSIGNED NOT NULL, 128 | `expression_tree_id` INTEGER UNSIGNED NOT NULL, 129 | `position` INTEGER UNSIGNED NOT NULL, 130 | FOREIGN KEY (`expression_tree_id`) REFERENCES ex_{MODULE_ID}_expression_trees(`id`) ON DELETE CASCADE ON UPDATE CASCADE, 131 | FOREIGN KEY (`address`) REFERENCES ex_{MODULE_ID}_instructions(`address`) ON DELETE CASCADE ON UPDATE CASCADE, 132 | PRIMARY KEY( `address`, `position` ) ) 133 | ENGINE=InnoDB; 134 | 135 | 136 | CREATE TABLE IF NOT EXISTS 137 | ex_{MODULE_ID}_expression_substitutions ( 138 | `id` INTEGER UNSIGNED NOT NULL UNIQUE PRIMARY KEY AUTO_INCREMENT, 139 | `address` BIGINT UNSIGNED NOT NULL, 140 | `position` INTEGER UNSIGNED NOT NULL, 141 | `expression_node_id` INTEGER UNSIGNED NOT NULL, 142 | `replacement` TEXT NOT NULL, 143 | FOREIGN KEY (`address`, `position`) REFERENCES ex_{MODULE_ID}_operands(`address`, `position`) ON DELETE CASCADE ON UPDATE CASCADE, 144 | FOREIGN KEY (`expression_node_id`) REFERENCES ex_{MODULE_ID}_expression_nodes(`id`) ON DELETE CASCADE ON UPDATE CASCADE ) 145 | ENGINE=InnoDB; 146 | 147 | 148 | CREATE TABLE IF NOT EXISTS 149 | ex_{MODULE_ID}_expression_tree_nodes ( 150 | `expression_tree_id` INTEGER UNSIGNED NOT NULL, 151 | `expression_node_id` INTEGER UNSIGNED NOT NULL, 152 | FOREIGN KEY (`expression_tree_id`) REFERENCES ex_{MODULE_ID}_expression_trees(`id`) ON DELETE CASCADE ON UPDATE CASCADE, 153 | FOREIGN KEY (`expression_node_id`) REFERENCES ex_{MODULE_ID}_expression_nodes(`id`) ON DELETE CASCADE ON UPDATE CASCADE ) 154 | ENGINE=InnoDB; 155 | 156 | 157 | CREATE TABLE IF NOT EXISTS 158 | ex_{MODULE_ID}_address_references ( 159 | `address` BIGINT UNSIGNED NOT NULL, 160 | `position` INTEGER UNSIGNED NULL, 
161 | `expression_node_id` INTEGER UNSIGNED NULL, 162 | `destination` BIGINT UNSIGNED NOT NULL, 163 | `type` INT UNSIGNED NOT NULL DEFAULT 0 CHECK( `type` <= 8 ), 164 | FOREIGN KEY (`address`, `position`) REFERENCES ex_{MODULE_ID}_operands(`address`, `position`) ON DELETE CASCADE ON UPDATE CASCADE, 165 | FOREIGN KEY (`expression_node_id`) REFERENCES ex_{MODULE_ID}_expression_nodes( `id` ) ON DELETE CASCADE ON UPDATE CASCADE, 166 | KEY(`destination`), 167 | KEY(`type`) ) 168 | ENGINE=InnoDB; 169 | 170 | 171 | CREATE TABLE IF NOT EXISTS 172 | ex_{MODULE_ID}_address_comments ( 173 | `address` BIGINT UNSIGNED UNIQUE NOT NULL, 174 | `comment` TEXT NOT NULL, 175 | PRIMARY KEY(`address`)) 176 | ENGINE=InnoDB; 177 | 178 | 179 | CREATE TABLE IF NOT EXISTS 180 | ex_{MODULE_ID}_sections ( 181 | `name` VARCHAR(256) NOT NULL, 182 | `base` BIGINT UNSIGNED NOT NULL, 183 | `start_address` BIGINT UNSIGNED NOT NULL, 184 | `end_address` BIGINT UNSIGNED NOT NULL, 185 | `length` BIGINT UNSIGNED NOT NULL, 186 | `data` LONGBLOB ) 187 | ENGINE=InnoDB; 188 | 189 | """ 190 | 191 | 192 | #################################### 193 | ########### PostgreSQL ############# 194 | #################################### 195 | 196 | #-removed INNODB 197 | #-removed backticks 198 | #-removed UNSIGNED, SIGNED 199 | #-changed AUTO_INCREMENT into SERIAL 200 | #-removed "IF NOT EXISTS" 201 | #-changed BLOB to BYTEA 202 | 203 | postgresql_new_db_statements = """ 204 | """ 205 | 206 | postgresql_new_module_statements = """""" 207 | -------------------------------------------------------------------------------- /ida_to_sql/functional_unit.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format. 
6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 17 | """ 18 | 19 | __author__ = 'Ero Carrera' 20 | __license__ = 'GPL' 21 | 22 | import common 23 | 24 | from sets import Set 25 | 26 | 27 | class FunctionalUnit: 28 | 29 | # def __init__(self, start, end): 30 | def __init__(self, start): 31 | 32 | # Start and end addresses of the function. 33 | # 34 | # If the function consists of multiple 35 | # non-sequential chunks, 'end' should be 36 | # the last address of the first chunk 37 | # (where the function's entry point lies) 38 | 39 | self.start = start 40 | # self.end = end 41 | 42 | # Function's name 43 | self.name = None 44 | 45 | # If the function belongs to a DLL, the DLL's name will be stored 46 | # in this variable 47 | # 48 | self.module = None 49 | 50 | # Source and target addresses of the branches 51 | self.branch_sources = set() 52 | self.branch_targets = set() 53 | 54 | # Data references made. 55 | # {src: [dst,]} 56 | self.data_references = dict() 57 | 58 | # Basic blocks composing the function. 59 | # It's a list of (start, end) address pairs. 60 | self.blocks = list() 61 | 62 | # Set of pairs (source, target) address for 63 | # all branches. 64 | self.branches = set() 65 | 66 | # Kinds of all branches. The keys are the elements 67 | # of self.branches and the values the kind of branch 68 | # 69 | # common.BRANCH_TYPE_TRUE = 0 70 | # common.BRANCH_TYPE_FALSE = 1 71 | # common.BRANCH_TYPE_UNCONDITIONAL = 2 72 | # common.BRANCH_TYPE_SWITCH = 3 73 | 74 | self.branch_kinds = dict() 75 | 76 | # List of pairs (source, target) of indexes 77 | # into the basic blocks 'blocks' list.
78 | self.cfg_block_paths = list() 79 | 80 | # List of instruction addresses for all addresses 81 | # belonging to the function 82 | self.instructions = list() 83 | 84 | # Dictionary containing all the instruction sizes. 85 | # Keys are the addresses of the instructions. 86 | self.instruction_sizes = dict() 87 | 88 | # Information about the normal flow from instruction 89 | # to instruction. Refer to 'is_flow' for more info. 90 | self.instruction_flow = dict() 91 | 92 | # Pairs of (source, target) for calls made from within the 93 | # function. 'source' is always in the function's body. 94 | self.calls = list() 95 | 96 | # Function chunks are discovered as we process the 97 | # function's flow. For a conventional function 98 | # existing sequentially in memory this will be a 99 | # single block. 100 | # 101 | self.function_chunks = list() 102 | 103 | 104 | # Kind of function 105 | self.kind = None 106 | 107 | 108 | def add_instructions(self, instructions): 109 | """Add instructions information. 110 | 111 | The argument is a list of pairs (address, instruction_length). 112 | """ 113 | 114 | self.instruction_sizes.update(dict(instructions)) 115 | self.instructions.extend([i[0] for i in instructions]) 116 | self.instructions.sort() 117 | 118 | 119 | def insn_size(self, addr): 120 | """Return the size of the instruction at the given address. 121 | 122 | Returns None if no instruction is known to exist at that 123 | location. 124 | """ 125 | 126 | return self.instruction_sizes.get(addr, None) 127 | 128 | 129 | def has_instruction_at(self, address): 130 | """Query whether an instruction at the given address exists.""" 131 | 132 | return address in self.instructions 133 | 134 | 135 | def set_instruction_flow(self, flows): 136 | """Set flow information for the instructions given.
137 | 138 | 'flows' is a list of pairs (address, boolean) indicating 139 | whether the instruction at 'address' is reachable by 140 | normal flow, that is, if flow can go from the previous 141 | instruction to the one at 'address'. 142 | Cases where it will be False are, for instance: the start of 143 | the function, the start of function chunks separated from the 144 | main body of the function, and the start of basic blocks 145 | following unconditional jumps or rets. 146 | """ 147 | 148 | self.instruction_flow.update(dict(flows)) 149 | 150 | 151 | def is_flow(self, addr): 152 | """Returns whether normal execution can flow from the previous instruction to this one. 153 | 154 | See 'set_instruction_flow' for more info.""" 155 | 156 | if not self.has_instruction_at(addr): 157 | return None 158 | 159 | return self.instruction_flow.get(addr, False) 160 | 161 | 162 | def instructions_in_range(self, start, end): 163 | """Return the instructions in the given range.""" 164 | 165 | return [i for i in self.instructions if i>=start and i<=end] 166 | 167 | 168 | def add_data_reference(self, ref): 169 | """Add data reference 170 | 171 | 'ref' is of the form (source, target) 172 | """ 173 | 174 | if ref[0] in self.data_references: 175 | self.data_references[ ref[0] ].append( ref[1] ) 176 | else: 177 | self.data_references[ ref[0] ] = [ ref[1] ] 178 | 179 | 180 | def add_branch(self, branch): 181 | """Add branch information. 182 | 183 | 'branch' is of the form (source, target) 184 | """ 185 | 186 | self.branches.add(branch) 187 | self.branch_sources.add(branch[0]) 188 | self.branch_targets.add(branch[1]) 189 | 190 | 191 | def get_block_by_address(self, address): 192 | """Returns basic block containing the given address.""" 193 | 194 | for b in self.blocks: 195 | if address>=b[0] and address<=b[1]: 196 | return b 197 | 198 | return None 199 | 200 | 201 | def get_prev_address(self, ea): 202 | """Get the previous address to the given one.
203 | 204 | This takes into account the instruction flow. If an instruction 205 | exists at the immediately preceding address but 206 | execution flow can't reach the given one, None is returned. 207 | """ 208 | 209 | if self.is_flow(ea)==False or ea not in self.instructions: 210 | return None 211 | 212 | idx = self.instructions.index(ea) 213 | if idx==0: 214 | return None 215 | else: 216 | return self.instructions[idx-1] 217 | 218 | 219 | def get_next_address(self, ea): 220 | """Get the following address to the given one. 221 | 222 | This takes into account the instruction flow. If an instruction 223 | exists at the immediately following address but 224 | execution flow from the given one can't reach it, None 225 | is returned. 226 | """ 227 | 228 | idx = self.instructions.index(ea) 229 | 230 | if idx+1 == len(self.instructions) or \ 231 | self.is_flow(self.instructions[idx+1])==False: 232 | 233 | return None 234 | 235 | else: 236 | return self.instructions[idx+1] 237 | 238 | 239 | def build_main_blocks(self): 240 | """Compose main function blocks. 241 | 242 | Scan all the instructions in the function sequentially and 243 | group them into blocks where all instructions sequentially 244 | follow each other. 245 | """ 246 | 247 | # Function chunks can be located before the start of the 248 | # function, so we just get the lowest address belonging to 249 | # the function.
250 | b_start = min(self.instructions) # == self.instructions[0] 251 | for idx, ea in enumerate(self.instructions): 252 | 253 | if idx+1 == len(self.instructions): 254 | next_ea = None 255 | else: 256 | next_ea = self.instructions[idx+1] 257 | 258 | if ea+self.insn_size(ea) != next_ea or not self.is_flow(next_ea): 259 | self.blocks.append([b_start, ea]) 260 | if not next_ea: 261 | break 262 | b_start = next_ea 263 | 264 | 265 | def split_blocks_by_target_branches(self): 266 | """Split the function blocks at all target locations of a branch.""" 267 | 268 | for ref in self.branch_targets: 269 | b = self.get_block_by_address(ref) 270 | if not b: 271 | continue 272 | 273 | # If the reference already points to a block's start 274 | # there's nothing to do. 275 | if b[0] == ref: 276 | continue 277 | 278 | end = b[1] 279 | if end>=ref: 280 | prev_ea = self.get_prev_address(ref) 281 | if prev_ea: 282 | b[1] = prev_ea 283 | self.blocks.append([ref, end]) 284 | self.branches.add((prev_ea, ref)) 285 | 286 | 287 | def split_blocks_by_source_branches(self): 288 | """Split the function blocks at all source locations of a branch.""" 289 | 290 | for ref in self.branch_sources: 291 | b = self.get_block_by_address(ref) 292 | if not b: 293 | continue 294 | end = b[1] 295 | next_ea = self.get_next_address(ref) 296 | 297 | if next_ea and end >= next_ea: 298 | b[1] = ref 299 | self.blocks.append([next_ea, end]) 300 | 301 | 302 | def find_cfg_paths(self): 303 | """Generate control flow graph paths. 304 | 305 | Goes through all branches, gets the source and 306 | target blocks, and generates a list of paths whose contents 307 | are the blocks' indices into the basic blocks list of the 308 | function.
309 | """ 310 | 311 | self.blocks.sort() 312 | 313 | for ref in self.branches: 314 | b_from = self.get_block_by_address(ref[0]) 315 | b_to = self.get_block_by_address(ref[1]) 316 | if self.blocks and b_from in self.blocks and b_to in self.blocks: 317 | self.cfg_block_paths.append( 318 | (self.blocks.index(b_from), self.blocks.index(b_to))) 319 | self.cfg_block_paths.sort() 320 | 321 | 322 | def find_branch_types(self): 323 | 324 | branch_dict = dict() 325 | for b in self.branches: 326 | trgt_set = branch_dict.get(b[0], set()) 327 | trgt_set.add(b[1]) 328 | branch_dict[b[0]] = trgt_set 329 | 330 | 331 | for src, trgt_set in branch_dict.items(): 332 | 333 | # Check whether a data reference is also 334 | # made from this address; that might help identify 335 | # switches 336 | # 337 | # NOTE: according to Intel, all conditional jumps 338 | # have relative addresses as operands. Only 339 | # unconditional jumps can make memory dereferences 340 | # 341 | src_has_data_ref = False 342 | if src in self.data_references: 343 | src_has_data_ref = True 344 | 345 | # if 'src' only has one outgoing edge it's unconditional 346 | if len(trgt_set) == 1: 347 | self.branch_kinds[ 348 | (src, list(trgt_set)[0])] = common.BRANCH_TYPE_UNCONDITIONAL 349 | continue 350 | 351 | if len(trgt_set) == 2 and src_has_data_ref is False: 352 | 353 | # if one of the edges follows immediately 354 | # then it'll be a conditional jump 355 | 356 | next_insn = src+self.instruction_sizes[src] 357 | trgt_set = list(trgt_set) 358 | 359 | if next_insn in trgt_set: 360 | if next_insn == trgt_set[0]: 361 | self.branch_kinds[ 362 | (src, trgt_set[0])] = common.BRANCH_TYPE_FALSE 363 | self.branch_kinds[ 364 | (src, trgt_set[1])] = common.BRANCH_TYPE_TRUE 365 | else: 366 | self.branch_kinds[ 367 | (src, trgt_set[0])] = common.BRANCH_TYPE_TRUE 368 | self.branch_kinds[ 369 | (src, trgt_set[1])] = common.BRANCH_TYPE_FALSE 370 | continue 371 | 372 | # If it was not an unconditional jump and all the tests for
a 373 | # conditional one failed, it's got to be a switch... 374 | # 375 | # We could do a test like "if src_has_data_ref is True:" 376 | # but it would fail when the address is not available, like in 377 | # the case of a switch like [eax+ebx*4], hence we don't 378 | # explicitly check for it and assume it's going to be a switch 379 | # 380 | for trgt in trgt_set: 381 | self.branch_kinds[ 382 | (src, trgt)] = common.BRANCH_TYPE_SWITCH 383 | 384 | 385 | 386 | def analyze(self): 387 | """Main analysis function. 388 | 389 | It will: 390 | -Compose the main blocks of the function. 391 | -Split them according to all the branching information. 392 | -Generate the function's internal connectivity information. 393 | """ 394 | 395 | self.build_main_blocks() 396 | 397 | self.split_blocks_by_target_branches() 398 | 399 | self.split_blocks_by_source_branches() 400 | 401 | self.find_branch_types() 402 | 403 | self.find_cfg_paths() 404 | 405 | 406 | 407 | class ImportedFunction(FunctionalUnit): 408 | """Class representing an imported function.""" 409 | 410 | def __init__(self, start, name, module): 411 | FunctionalUnit.__init__(self, start) 412 | self.name = name 413 | self.module = module 414 | self.kind = common.FUNC_TYPE.FUNCTION_IMPORTED 415 | 416 | # A nonexistent instruction is added in order to be able 417 | # to find the function by address. 418 | self.instructions.append(start) 419 | self.instruction_sizes[start] = 0 420 | 421 | self.analyze() 422 | 423 | -------------------------------------------------------------------------------- /ida_to_sql/ida_to_sql.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | 3 | """zynamics GmbH IDA to SQL exporter. 4 | 5 | This module exports IDA's IDB database information into zynamics's SQL format.
6 | 7 | References: 8 | 9 | zynamics GmbH: http://www.zynamics.com/ 10 | MySQL: http://www.mysql.com 11 | IDA: http://www.datarescue.com/idabase/ 12 | 13 | Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0 on Windows & OSX 14 | by Ero Carrera & the zynamics team (c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com] 15 | 16 | Distributed under GPL license [http://opensource.org/licenses/gpl-license.php]. 17 | """ 18 | 19 | __author__ = 'Ero Carrera' 20 | __license__ = 'GPL' 21 | 22 | 23 | import os 24 | import re 25 | import time 26 | import ConfigParser 27 | import functional_unit 28 | import idc 29 | import idaapi 30 | import idautils 31 | import sys 32 | from sql_exporter import SQLExporter 33 | from instrumentation import Instrumentation 34 | from common import * 35 | 36 | if 'MEMORY_PROFILE' in os.environ: 37 | print 'MEMORY_PROFILE environment variable set. Will show memory usage statistics' 38 | 39 | import memory_info 40 | def mem_info(): 41 | memory = memory_info.memory() 42 | resident = memory_info.resident() 43 | try: 44 | print 'Memory usage[ %d bytes = %.2f KiB = %.2f MiB ] Resident[ %d bytes = %.2f KiB = %.2f MiB ]' % ( 45 | memory, memory/2**10, memory/2**20, resident, resident/2**10, resident/2**20 ) 46 | except TypeError: 47 | print 'TypeError while attempting to print memory info: ', memory, type(memory), resident, type(resident) 48 | 49 | 50 | CONFIG_FILE_NAME = os.environ.get('IDA2SQLCFG', None) 51 | 52 | USE_NEW_SCHEMA = True 53 | use_old_schema = os.environ.get('IDA2SQL_USE_OLD_SCHEMA', None) 54 | if use_old_schema: 55 | USE_NEW_SCHEMA = False 56 | 57 | # Global variable to keep track of time 58 | # 59 | tm_start = 0 60 | 61 | class FunctionAnalyzer(functional_unit.FunctionalUnit): 62 | "Class representing an analyzed function."
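[Editor's note: the basic-block recovery implemented by FunctionalUnit above (build_main_blocks, then splitting at branch targets) can be exercised without IDA. The following is a minimal, self-contained sketch of that idea; the addresses, sizes, and the build_blocks helper are made up for illustration and are not part of the exporter's API.]

```python
# Sketch of FunctionalUnit-style basic-block recovery: instructions are
# grouped into maximal sequential runs, and a run is also cut wherever a
# branch target lands. All names/addresses here are hypothetical.

def build_blocks(instructions, sizes, branch_targets):
    """instructions: addresses of instructions; sizes: {addr: size};
    branch_targets: addresses that must start a new block.
    Returns a list of (block_start, block_last_insn) pairs."""
    insns = sorted(instructions)
    blocks = []
    b_start = insns[0]
    for idx, ea in enumerate(insns):
        next_ea = insns[idx + 1] if idx + 1 < len(insns) else None
        # A block ends when the next instruction doesn't follow
        # sequentially, or when it is the target of a branch.
        if (next_ea is None or ea + sizes[ea] != next_ea
                or next_ea in branch_targets):
            blocks.append((b_start, ea))
            b_start = next_ea
    return blocks

# Toy function: five 2-byte instructions at 0x1000..0x1008, with a
# branch landing on 0x1004, which therefore starts a second block.
insns = [0x1000, 0x1002, 0x1004, 0x1006, 0x1008]
sizes = {ea: 2 for ea in insns}
print(build_blocks(insns, sizes, {0x1004}))
# -> [(4096, 4098), (4100, 4104)]  i.e. [0x1000..0x1002], [0x1004..0x1008]
```

The real exporter additionally splits blocks after branch *sources* and records inter-block edges (find_cfg_paths); this sketch only shows the grouping step.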
63 | 64 | 65 | def __init__(self, arch, start_ava, packet): 66 | 67 | functional_unit.FunctionalUnit.__init__(self, start_ava) 68 | 69 | # Architecture handling class 70 | # 71 | self.arch = arch 72 | 73 | # Add all the calls referring to this function 74 | # 75 | self.calls.extend( 76 | [call for call in packet.calls.items() if call[1]]) 77 | 78 | # Add all the branches with their target within the function 79 | # 80 | [self.add_branch(branch) for branch in packet.branches if 81 | packet.disassembly.has_key(branch[1])] 82 | 83 | [self.add_data_reference(ref[:2]) for ref in packet.address_references ] 84 | 85 | self.reconstruct_flow(start_ava, packet) 86 | if not self.function_chunks: 87 | raise FunctionException(None) 88 | 89 | self.name = idc.GetFunctionName(start_ava) 90 | 91 | # Now the function's end can be set. 92 | # End is set to the last instruction of the first 93 | # continuous sequence of code from the beginning of 94 | # the function. 95 | # 96 | self.end = self.function_chunks[0][-1] 97 | 98 | flags = idc.GetFunctionFlags(start_ava) 99 | 100 | if flags&idaapi.FUNC_LIB == idaapi.FUNC_LIB: 101 | self.kind = FUNC_TYPE.FUNCTION_LIBRARY 102 | elif flags&idaapi.FUNC_THUNK == idaapi.FUNC_THUNK: 103 | self.kind = FUNC_TYPE.FUNCTION_THUNK 104 | else: 105 | self.kind = FUNC_TYPE.FUNCTION_STANDARD 106 | 107 | 108 | # Each chunk is a list of addresses of instructions 109 | # within the chunk. 110 | instruction_addresses = list() 111 | chunk_starting_addresses = list() 112 | chunk_remaining_addresses = list() 113 | for chunk in self.function_chunks: 114 | # The first instruction in a chunk can't have normal 115 | # flow into it. Only unconditional branches or code 116 | # after end-of-flow instructions lead to new 117 | # chunks.
118 | instruction_addresses.extend(chunk) 119 | chunk_starting_addresses.append(chunk[0]) 120 | chunk_remaining_addresses.extend(chunk[1:]) 121 | 122 | self.set_instruction_flow( 123 | zip( chunk_remaining_addresses, 124 | (True,)*len(chunk_remaining_addresses))) 125 | 126 | self.set_instruction_flow( 127 | zip( chunk_starting_addresses, 128 | (False,)*len(chunk_starting_addresses))) 129 | 130 | instruction_sizes = [i[0].size for i in 131 | (packet.disassembly[addr] for addr in instruction_addresses)] 132 | 133 | self.add_instructions( 134 | zip(instruction_addresses, instruction_sizes) ) 135 | 136 | # Call the FunctionalUnit's analyze function 137 | self.analyze() 138 | 139 | def reconstruct_flow(self, function_start, packet): 140 | 141 | disassembly = packet.disassembly 142 | instructions_queue = set(disassembly.keys()) 143 | 144 | visited = set() 145 | branches_to_do = list([function_start]+ 146 | list(b[1] for b in packet.branches)) 147 | 148 | # Walk all the function code. 149 | # 150 | while branches_to_do: 151 | addr = branches_to_do.pop() 152 | if addr in visited: 153 | continue 154 | chunk = list() 155 | 156 | while True: 157 | if addr in visited: 158 | # If the address has been visited the blocks need 159 | # to be merged, as there's normal flow from one 160 | # to another. 161 | old_chunk = [c for c in self.function_chunks if 162 | addr in c][0] 163 | chunk.extend(old_chunk) 164 | self.function_chunks.remove(old_chunk) 165 | break 166 | 167 | i, i_data = disassembly.get(addr, (None, None)) 168 | if not i: 169 | break 170 | 171 | visited.add(addr) 172 | instructions_queue.remove(addr) 173 | chunk.append(addr) 174 | 175 | if self.arch.is_end_of_flow(i): 176 | break 177 | 178 | addr += i.size 179 | 180 | if chunk: 181 | self.function_chunks.append(chunk) 182 | 183 | # If there are instructions left after the previous analysis 184 | # those must be disconnected from the rest, either dead code 185 | # or referenced at runtime.
186 | if instructions_queue: 187 | instructions_queue = list(instructions_queue) 188 | instructions_queue.sort() 189 | chunk = [instructions_queue.pop(0)] 190 | while True: 191 | if not instructions_queue: 192 | break 193 | curr_i = disassembly[chunk[-1]][0] 194 | next = instructions_queue.pop(0) 195 | 196 | if (not (self.arch.is_end_of_flow(curr_i) ) and 197 | chunk[-1]+curr_i.size == next): 198 | 199 | chunk.append(next) 200 | else: 201 | self.function_chunks.append(chunk) 202 | chunk = [next] 203 | 204 | if chunk: 205 | self.function_chunks.append(chunk) 206 | 207 | 208 | 209 | def get_chunks(ea): 210 | 211 | function_chunks = [] 212 | 213 | #Get the tail iterator 214 | func_iter = idaapi.func_tail_iterator_t(idaapi.get_func(ea)) 215 | 216 | # While the iterator's status is valid 217 | status = func_iter.main() 218 | while status: 219 | # Get the chunk 220 | chunk = func_iter.chunk() 221 | # Store its start and ending address as a tuple 222 | function_chunks.append((chunk.startEA, chunk.endEA)) 223 | 224 | # Get the next status 225 | status = func_iter.next() 226 | 227 | return function_chunks 228 | 229 | def address_in_chunks(address, chunk_list): 230 | 231 | for chunk_start, chunk_end in chunk_list: 232 | if chunk_start <= address < chunk_end: 233 | return True 234 | 235 | return False 236 | 237 | 238 | def get_flow_code_from_address(address): 239 | """Get a sequence of instructions starting at a given address. 240 | 241 | This function is used to collect basic blocks marked as chunks in IDA 242 | but not as belonging to the function being examined. IDA can only 243 | assign a chunk to one function, not to multiple functions. 244 | This helps get around that limitation.
245 | """ 246 | 247 | if idc.isCode(idc.GetFlags(address)): 248 | code = [address] 249 | else: 250 | return None 251 | 252 | while True: 253 | 254 | # Get the address of the following element 255 | address = address+idc.ItemSize(address) 256 | 257 | flags = idc.GetFlags(address) 258 | 259 | # If the element is an instruction and "flow" goes into it 260 | if idc.isCode(flags) and idc.isFlow(flags): 261 | code.append(address) 262 | else: 263 | break 264 | 265 | # Return the code chunk just obtained 266 | # Note: if we get down here there'll be at least one instruction so we are cool 267 | # Note: the +1 is so the last instruction can be retrieved through a call to 268 | # "Heads(start, end)". As end is a non-inclusive limit we need to move the 269 | # pointer ahead so the instruction at that address is retrieved. 270 | return (min(code), max(code)+1) 271 | 272 | 273 | def process_function(arch, func_ea): 274 | 275 | func_end = idc.FindFuncEnd(func_ea) 276 | 277 | packet = DismantlerDataPacket() 278 | 279 | ida_chunks = get_chunks(func_ea) 280 | chunks = set() 281 | 282 | # Add to the chunks only the main block, containing the 283 | # function entry point 284 | # 285 | chunk = get_flow_code_from_address(func_ea) 286 | if chunk: 287 | chunks.add( chunk ) 288 | 289 | # Make "ida_chunks" a set for faster searches within 290 | ida_chunks = set(ida_chunks) 291 | ida_chunks_idx = dict(zip([c[0] for c in ida_chunks], ida_chunks)) 292 | 293 | func = idaapi.get_func(func_ea) 294 | comments = [idaapi.get_func_cmt(func, 0), idaapi.get_func_cmt(func, 1)] 295 | 296 | # Copy the list of chunks into a queue to process 297 | # 298 | chunks_todo = [c for c in chunks] 299 | 300 | while True: 301 | 302 | # If no chunks left in the queue, exit 303 | if not chunks_todo: 304 | 305 | if ida_chunks: 306 | chunks_todo.extend(ida_chunks) 307 | else: 308 | break 309 | 310 | chunk_start, chunk_end = chunks_todo.pop() 311 | if ida_chunks_idx.has_key(chunk_start): 312 |
ida_chunks.remove(ida_chunks_idx[chunk_start]) 313 | del ida_chunks_idx[chunk_start] 314 | 315 | for head in idautils.Heads(chunk_start, chunk_end): 316 | 317 | comments.extend( (idaapi.get_cmt(head, 0), idaapi.get_cmt(head, 1)) ) 318 | comment = '\n'.join([c for c in comments if c is not None]) 319 | comment = comment.strip() 320 | if comment: 321 | packet.add_comment(head, comment) 322 | comments = list() 323 | 324 | if idc.isCode(idc.GetFlags(head)): 325 | 326 | instruction = arch.process_instruction(packet, head) 327 | 328 | # if there are references other than 329 | # flow, add them all. 330 | if list( idautils.CodeRefsFrom(head, 0) ): 331 | 332 | # for each reference, including flow ones 333 | for ref_idx, ref in enumerate(idautils.CodeRefsFrom(head, 1)): 334 | 335 | if arch.is_call(instruction): 336 | 337 | # These two conditions must remain separated; it's 338 | # necessary to enter the enclosing "if" whenever 339 | # the instruction is a call, otherwise it will be 340 | # added as an unconditional jump in the last else 341 | # 342 | if ref in list( idautils.CodeRefsFrom(head, 0) ): 343 | packet.add_direct_call(head, ref) 344 | 345 | elif ref_idx>0 and arch.is_conditional_branch(instruction): 346 | # The ref_idx is > 0 in order to avoid processing the 347 | # normal flow reference which would effectively imply 348 | # that the conditional branch is processed twice. 349 | # It's done this way instead of changing the loop's head 350 | # from CodeRefsFrom(head, 1) to CodeRefsFrom(head, 0) in 351 | # order to avoid altering the behavior of other conditions 352 | # which rely on it being so. 353 | 354 | # FIXME 355 | # I don't seem to check for the reference here 356 | # to point to valid, defined code. I suspect 357 | # this could lead to a failure when exporting 358 | # if such a situation appears. I should test if 359 | # it's a likely scenario and probably just add 360 | # an isHead() or isCode() to address it.
361 | 362 | packet.add_conditional_branch_true(head, ref) 363 | packet.add_conditional_branch_false( 364 | head, idaapi.next_head(head, chunk_end)) 365 | 366 | # If the target is not in our chunk list 367 | if not address_in_chunks(ref, chunks): 368 | new_chunk = get_flow_code_from_address(ref) 369 | # Add the chunk to the chunks to process 370 | # and to the set containing all visited 371 | # chunks 372 | if new_chunk is not None: 373 | chunks_todo.append(new_chunk) 374 | chunks.add(new_chunk) 375 | 376 | elif arch.is_unconditional_branch(instruction): 377 | packet.add_unconditional_branch(head, ref) 378 | 379 | # If the target is not in our chunk list 380 | if not address_in_chunks(ref, chunks): 381 | new_chunk = get_flow_code_from_address(ref) 382 | # Add the chunk to the chunks to process 383 | # and to the set containing all visited 384 | # chunks 385 | if new_chunk is not None: 386 | chunks_todo.append(new_chunk) 387 | chunks.add(new_chunk) 388 | 389 | #skip = False 390 | 391 | for ref in idautils.DataRefsFrom(head): 392 | packet.add_data_reference(head, ref) 393 | 394 | # Get a data reference from the current reference's 395 | # location. For instance, if 'ref' points to a valid 396 | # address and such address contains a data reference 397 | # to code. 
target = list( idautils.DataRefsFrom(ref) ) 399 | if target: 400 | target = target[0] 401 | else: 402 | target = None 403 | 404 | if target is None and arch.is_call(instruction): 405 | imp_name = idc.Name(ref) 406 | 407 | imp_module = get_import_module_name(ref) 408 | 409 | imported_functions.add((ref, imp_name, imp_module)) 410 | packet.add_indirect_virtual_call(head, ref) 411 | 412 | elif target is not None and idc.isHead(target): 413 | # for calls "routed" through this reference 414 | if arch.is_call(instruction): 415 | packet.add_indirect_call(head, target) 416 | 417 | # for unconditional jumps "routed" through this reference 418 | elif arch.is_unconditional_branch(instruction): 419 | packet.add_unconditional_branch(head, target) 420 | 421 | # for conditional branches "routed" through this reference 422 | elif arch.is_conditional_branch(instruction): 423 | packet.add_conditional_branch_true(head, target) 424 | packet.add_conditional_branch_false( 425 | head, idaapi.next_head(head, chunk_end)) 426 | 427 | 428 | f = FunctionAnalyzer(arch, func_ea, packet) 429 | 430 | instrumentation.new_packet(packet) 431 | instrumentation.new_function(f) 432 | 433 | 434 | idata_seg_start = 0 435 | idata_seg_end = 0 436 | # Will contain a list of tuples of the form: 437 | # ((range_start, range_end), name_for_the_range_of_addresses) 438 | # The name is valid for the range of addresses it covers in the 439 | # .idata segment, once a new name is defined, a new range starts 440 | # 441 | module_names = None 442 | 443 | 444 | def get_import_module_name(address): 445 | 446 | global module_names 447 | global idata_seg_start 448 | global idata_seg_end 449 | 450 | segment_eas = list( idautils.Segments() ) 451 | 452 | # This hasn't been initialized yet...
453 | # 454 | if module_names is None: 455 | 456 | module_names = list() 457 | for idata_seg_start in segment_eas: 458 | print "Going through segment %08X" % idata_seg_start 459 | segment = idaapi.getseg(idata_seg_start) 460 | if segment.type != idaapi.SEG_XTRN: 461 | continue 462 | print "Found idata segment" 463 | 464 | idata_seg_end = idc.SegEnd(idata_seg_start) 465 | 466 | parse = re.compile('.*Imports\s+from\s+([\w\d]+\.[\w\d]+).*', re.IGNORECASE) 467 | 468 | # save the address/module name combinations we discover 469 | # 470 | modules = list() 471 | 472 | # Scan the .idata segment looking for the imports from 473 | # string and get the address ranges where it applies 474 | # 475 | for head in idautils.Heads(idata_seg_start, idata_seg_end): 476 | for line_id in range(100): 477 | line = idc.LineA(head, line_id) 478 | if line and 'imports from' in line.lower(): 479 | res = parse.match(line) 480 | if res: 481 | print 'Found import line [%s][%s]' % (line, res.group(1)) 482 | modules.append( (head, res.group(1).lower()) ) 483 | 484 | modules.append( (idata_seg_end, None) ) 485 | for idx in range(len(modules)-1): 486 | mod = modules[idx] 487 | module_names.append( ( (mod[0], modules[idx+1][0]), mod[1] ) ) 488 | 489 | 490 | for addr_range, module_name in module_names: 491 | if addr_range[0] <= address < addr_range[1]: 492 | return module_name 493 | 494 | return None 495 | 496 | def load_function_set(): 497 | 498 | function_addresses = set() 499 | 500 | dataf_path = idaapi.idadir('function_set.txt') 501 | if os.path.exists(dataf_path) and os.path.isfile(dataf_path): 502 | 503 | dataf = file(dataf_path, 'rt') 504 | while True: 505 | 506 | line = dataf.readline() 507 | if not line: 508 | break 509 | try: 510 | function_addresses.add(int(line, 16)) 511 | except ValueError: 512 | pass 513 | 514 | dataf.close() 515 | 516 | return function_addresses 517 | 518 | def process_section_data(arch, section, section_end): 519 | 520 | log_message('Fetching data for section...') 521 
| 522 | section_data = list() 523 | for addr in range(section, section_end): 524 | if idaapi.isLoaded(addr): 525 | section_data.append( chr(idc.Byte(addr)) ) 526 | else: 527 | # If there's undefined data in the middle 528 | # of a section, nothing after that point 529 | # is exported 530 | break 531 | 532 | section_data = ''.join(section_data) 533 | sect = Section(idc.SegName(section), 0, section, section_end, section_data) 534 | 535 | log_message('Inserting section data (%d bytes)...' % (len(section_data))) 536 | instrumentation.new_section(sect) 537 | 538 | def workaround_Functions(start=idaapi.cvar.inf.minEA, end=idaapi.cvar.inf.maxEA): 539 | """ 540 | Get a list of functions 541 | 542 | @param start: start address (default: inf.minEA) 543 | @param end: end address (default: inf.maxEA) 544 | 545 | @return: list of heads between start and end 546 | 547 | @note: The last function that starts before 'end' is included even 548 | if it extends beyond 'end'. 549 | """ 550 | func = idaapi.get_func(start) 551 | if not func: 552 | func = idaapi.get_next_func(start) 553 | while func and func.startEA < end: 554 | startea = func.startEA 555 | yield startea 556 | func = idaapi.get_next_func(startea) 557 | addr = startea 558 | while func and startea == func.startEA: 559 | addr = idaapi.next_head(addr, end) 560 | func = idaapi.get_next_func(addr) 561 | 562 | 563 | def process_binary(arch, process_sections, iteration, already_imported): 564 | 565 | global imported_functions 566 | 567 | total_function_count = 0 568 | 569 | imported_functions = set() 570 | 571 | functions_to_export = load_function_set() 572 | 573 | FUNCTIONS_PER_RUN = 5000 574 | 575 | if iteration == -1: 576 | firstFunction = 0 577 | lastFunction = 0x7FFFFFFF 578 | else: 579 | firstFunction = iteration * FUNCTIONS_PER_RUN 580 | lastFunction = firstFunction + FUNCTIONS_PER_RUN - 1 581 | 582 | segment_list = list( idautils.Segments() ) 583 | 584 | if 'MEMORY_PROFILE' in os.environ: 585 | #h = hp.heap() 586 | 
mem_info() 587 | 588 | segment_count = 1 589 | incomplete = False 590 | for seg_ea in segment_list: 591 | 592 | seg_end = idc.SegEnd(seg_ea) 593 | 594 | if process_sections: 595 | process_section_data(arch, seg_ea, seg_end) 596 | 597 | function_list = set(f for f in workaround_Functions(seg_ea, seg_end) if idc.SegStart(f)==seg_ea) 598 | function_count = 1 599 | 600 | if functions_to_export: 601 | log_message('Only exporting %d functions: %s' % ( 602 | len(functions_to_export), 603 | str([hex(ea) for ea in functions_to_export])) ) 604 | 605 | function_list = filter(lambda x:x in function_list, functions_to_export) 606 | 607 | if not function_list: 608 | log_message('Processing: Segment[%d/%d]. No Functions' % ( 609 | segment_count, len(segment_list)) ) 610 | 611 | for func_ea in function_list: 612 | 613 | if total_function_count < firstFunction: 614 | total_function_count += 1 615 | continue 616 | 617 | if total_function_count > lastFunction: 618 | incomplete = True 619 | break 620 | 621 | total_function_count += 1 622 | 623 | log_message( 624 | 'Processing: Segment[%d/%d]. Function[%d/%d] at [%x]. Time elapsed: %s. 
Avg time per function: %s' % ( 625 | segment_count, len(segment_list), 626 | function_count, len(function_list), func_ea, 627 | get_time_delta_string(), get_avg_time_string(total_function_count) ) ) 628 | 629 | process_function(arch, func_ea) 630 | 631 | function_count +=1 632 | 633 | segment_count += 1 634 | 635 | if incomplete: 636 | break 637 | 638 | for imp_addr, imp_name, imp_module in imported_functions: 639 | 640 | if iteration != -1 and imp_addr in already_imported: 641 | continue 642 | 643 | packet = DismantlerDataPacket() 644 | packet.add_instruction(None, imp_addr, None, [], [], '') 645 | instrumentation.new_packet(packet) 646 | 647 | f = functional_unit.ImportedFunction(imp_addr, imp_name, imp_module) 648 | instrumentation.new_function(f) 649 | 650 | if 'MEMORY_PROFILE' in os.environ: 651 | mem_info() 652 | 653 | return incomplete 654 | 655 | def query_configuration(): 656 | 657 | # Set the default values to None 658 | db_engine, db_host, db_name, db_user, db_password = (None,)*5 659 | 660 | class ExportChoose(idaapi.Choose): 661 | def __init__(self, engines = []): 662 | idaapi.Choose.__init__(self, engines, 'Select Database Type', 1) 663 | self.width = 30 664 | 665 | def sizer(self): 666 | return len(self.list)-1 667 | 668 | engines = [ 669 | DB_ENGINE.MYSQL, DB_ENGINE.POSTGRESQL, 670 | DB_ENGINE.MYSQLDUMP, 'Export Method'] 671 | dlg = ExportChoose(engines) 672 | 673 | chosen_one = dlg.choose() 674 | if chosen_one>0: 675 | db_engine = engines[chosen_one-1] 676 | 677 | if db_engine == DB_ENGINE.MYSQLDUMP: 678 | # If a SQL dump is going to be generated, no DB 679 | # parameters are needed 680 | # 681 | return db_engine, '', '', '' ,'' 682 | 683 | db_host = idc.AskStr('localhost', '[1/4] Enter database host:') 684 | if not db_host is None: 685 | db_name = idc.AskStr('db_name', '[2/4] Enter database(schema) name:') 686 | if not db_name is None: 687 | db_user = idc.AskStr('root', '[3/4] Enter database user:') 688 | if not db_user is None: 689 | db_password 
= idc.AskStr('', '[4/4] Enter password for user:') 690 | 691 | return db_engine, db_host, db_name, db_user, db_password 692 | 693 | def get_time_delta_string(): 694 | global tm_start 695 | 696 | tm_delta = time.time() - tm_start 697 | 698 | tm_delta_tup = [t-z for (t,z) in zip( time.localtime(tm_delta), time.localtime(0) )] 699 | 700 | tm_delta_str = '%02d:%02d:%02d.%03d' % ( 701 | tm_delta_tup[3], tm_delta_tup[4], 702 | tm_delta_tup[5], 1000*(tm_delta-long(tm_delta))) 703 | 704 | return tm_delta_str 705 | 706 | def get_avg_time_string(items): 707 | global tm_start 708 | 709 | tm_delta = time.time() - tm_start 710 | tm_delta = float(tm_delta) / items 711 | 712 | tm_delta_tup = [t-z for (t,z) in zip( time.localtime(tm_delta), time.localtime(0) )] 713 | 714 | tm_delta_str = '%02d:%02d:%02d.%03d' % ( 715 | tm_delta_tup[3], tm_delta_tup[4], 716 | tm_delta_tup[5], 1000*(tm_delta-long(tm_delta))) 717 | 718 | return tm_delta_str 719 | 720 | def main(): 721 | 722 | global tm_start 723 | 724 | for mod in ('metapc', 'ppc', 'arm'): 725 | arch_mod = __import__('arch.%s' % mod, globals(), locals(), ['*']) 726 | arch = arch_mod.Arch() 727 | if arch: 728 | if arch.check_arch(): 729 | # This is a valid module for the current architecture 730 | # so the search has finished 731 | log_message('Using architecture module [%s]' % mod) 732 | break 733 | else: 734 | log_message('No module found to process the current architecture [%s]. Exiting.' % (arch.processor_name)) 735 | return 736 | 737 | global instrumentation 738 | 739 | log_message('Initialization successful.') 740 | 741 | db_engine, db_host, db_name, db_user, db_password = (None,)*5 742 | batch_mode = False 743 | module_comment = '' 744 | process_sections = False 745 | 746 | 747 | # If the configuration filename has been fetched from the 748 | # environment variables, then use that.
    #
    if CONFIG_FILE_NAME:
        config_file_path = CONFIG_FILE_NAME

    # Otherwise fall back to the one expected in the IDA directory
    #
    else:
        config_file_path = os.path.join(idaapi.idadir(''), 'ida2sql.cfg')


    if os.path.exists(config_file_path):
        cfg = ConfigParser.ConfigParser()
        cfg.read(config_file_path)

        if cfg.has_section('database'):
            if cfg.has_option('database', 'engine'):
                db_engine = getattr(DB_ENGINE, cfg.get('database', 'engine'))

            if cfg.has_option('database', 'host'):
                db_host = cfg.get('database', 'host')

            if cfg.has_option('database', 'schema'):
                db_name = cfg.get('database', 'schema')

            if cfg.has_option('database', 'user'):
                db_user = cfg.get('database', 'user')

            if cfg.has_option('database', 'password'):
                db_password = cfg.get('database', 'password')

        if cfg.has_option('importing', 'mode'):
            batch_mode = cfg.get('importing', 'mode')

            if batch_mode.lower() in ('batch', 'auto'):
                batch_mode = True

        if cfg.has_option('importing', 'comment'):
            module_comment = cfg.get('importing', 'comment')

        if cfg.has_option('importing', 'process_sections'):
            process_sections = cfg.get('importing', 'process_sections')

            if process_sections.lower() in ('no', 'false'):
                process_sections = False
            else:
                process_sections = True


    if None in (db_engine, db_host, db_name, db_user, db_password):

        (db_engine, db_host,
         db_name, db_user,
         db_password) = query_configuration()

        if None in (db_engine, db_host, db_name, db_user, db_password):
            log_message('User cancelled the exporting.')
            return

    failed = False
    try:
        sqlexporter = SQLExporter(arch, db_engine, db=db_name,
            user=db_user, passwd=db_password, host=db_host,
            use_new_schema=USE_NEW_SCHEMA)
    except ImportError:
        print "Error connecting to the database, error importing required module: %s" % sys.exc_info()[0]
        failed = True
    except Exception:
        print "Error connecting to the database. Reason: %s" % sys.exc_info()[0]
        failed = True

    if failed:
        # Can't connect to the database, indicate that to BinNavi
        if batch_mode is True:
            idc.Exit(FATAL_CANNOT_CONNECT_TO_DATABASE)
        else:
            return

    if not sqlexporter.is_database_ready():

        if batch_mode is False:
            result = idc.AskYN(1, 'Database has not been initialized yet. Do you want to create the basic tables now? (This step is performed only once)')
        else:
            result = 1

        if result == 1:
            sqlexporter.init_database()
        else:
            log_message('User requested abort.')
            return

    iteration = os.environ.get('EXPORT_ITERATION', None)
    module_id = os.environ.get('MODULE_ID', None)

    if iteration is None and module_id is None:
        # Export manually
        print "Exporting manually ..."
        iteration = -1
        sqlexporter.set_callgraph_only(False)
        sqlexporter.set_exporting_manually(True)
        status = sqlexporter.new_module(
            idc.GetInputFilePath(), arch.get_architecture_name(),
            idaapi.get_imagebase(), module_comment, batch_mode)

    elif iteration is not None and module_id is not None:

        # Export the next k functions or the call graph
        sqlexporter.set_exporting_manually(False)
        sqlexporter.set_callgraph_only(int(iteration) == -1)
        sqlexporter.set_module_id(int(module_id))
        status = True

    else:

        sqlexporter.set_exporting_manually(False)
        status = sqlexporter.new_module(
            idc.GetInputFilePath(), arch.get_architecture_name(),
            idaapi.get_imagebase(), module_comment, batch_mode)
        sqlexporter.set_callgraph_only(False)

    if status is False:
        log_message('Export aborted')
        return
    elif status is None:
        log_message('The database appears to contain data exported with different schemas, exporting is not allowed.')
        if batch_mode:
            idc.Exit(FATAL_INVALID_SCHEMA_VERSION)

    instrumentation = Instrumentation()

    instrumentation.new_function_callable(sqlexporter.process_function)
    instrumentation.new_packet_callable(sqlexporter.process_packet)
    instrumentation.new_section_callable(sqlexporter.process_section)


    tm_start = time.time()

    already_imported = sqlexporter.db.get_already_imported()

    incomplete = process_binary(
        arch, process_sections, int(iteration), already_imported)

    sqlexporter.finish()

    log_message('Results: %d functions, %d instructions, %d basic blocks, %d address references' % (
        len(sqlexporter.exported_functions), len(sqlexporter.exported_instructions),
        sqlexporter.basic_blocks_next_id-1, sqlexporter.address_references_values_count))

    log_message('Results: %d expression substitutions, %d operand expressions, %d operand tuples' % (
        sqlexporter.expression_substitutions_values_count,
        sqlexporter.operand_expressions_values_count,
        sqlexporter.operand_tuples___operands_values_count))


    log_message('Exporting completed in %s' % get_time_delta_string())

    # If running in batch mode, exit when done. The exit code packs the
    # module id into the upper 16 bits and a status marker into the low byte.
    if batch_mode:
        if incomplete:
            shifted_module = (sqlexporter.db.module_id << 0x10) | 0xFF

            idc.Exit(shifted_module)
        elif not sqlexporter.callgraph_only:
            shifted_module = (sqlexporter.db.module_id << 0x10) | 0xFE

            idc.Exit(shifted_module)
        else:
            idc.Exit(0)
--------------------------------------------------------------------------------
/ida_to_sql/instrumentation.py:
--------------------------------------------------------------------------------
# -*- coding: Latin-1 -*-

"""zynamics GmbH IDA to SQL exporter.

This module exports IDA's IDB database information into zynamics's SQL format.

References:

    zynamics GmbH: http://www.zynamics.com/
    MySQL: http://www.mysql.com
    IDA: http://www.datarescue.com/idabase/

Programmed and tested with IDA 5.4-5.7, Python 2.5/2.6 and IDAPython >1.0
on Windows & OSX by Ero Carrera & the zynamics team
(c) zynamics GmbH 2006 - 2010 [ero.carrera@zynamics.com]

Distributed under GPL license [http://opensource.org/licenses/gpl-license.php].
"""

__author__ = 'Ero Carrera'
__license__ = 'GPL'

import common
__version__ = common.__version__



class Instrumentation:
    """This class provides an instrumentation interface
    to Dismantler; through it, it is possible to receive
    events as the disassembly progresses.
    Useful applications are the generation of statistics
    and the export of data as the disassembly progresses."""

    instrument_hooks = (
        'new_instruction', 'new_section',
        'new_function', 'new_operand',
        'new_packet', 'new_entrypoint')


    def __init__(self, enabled=True):

        self.enabled = enabled

        for hook in self.instrument_hooks:

            callable_attr = '__'+hook+'_callable__'

            # Create a method to set a provided instrumentation
            # function to be called for the specific hooked event.
            # Binding callable_attr as a default argument freezes
            # its value for this iteration of the loop.
            def set_hook(function, attr=callable_attr):
                setattr(self, attr, function)

            # Add the setter to the attributes of the instance
            setattr(self, hook+'_callable', set_hook)

            # Create a method that invokes the registered
            # instrumentation function, if one has been set
            def call_hook(arg, attr=callable_attr):
                if self.enabled and hasattr(self, attr):
                    function = getattr(self, attr)
                    if function:
                        function(arg)

            setattr(self, hook, call_hook)


    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

--------------------------------------------------------------------------------
/ida_to_sql/memory_info.py:
--------------------------------------------------------------------------------
# ActiveState Recipe
# [Linux Only]
# http://code.activestate.com/recipes/286222-memory-usage/

import os

_proc_status = '/proc/%d/status' % os.getpid()

_scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
          'KB': 1024.0, 'MB': 1024.0*1024.0}

def _VmB(VmKey):
    '''Private.
    '''
    global _proc_status, _scale
    # read the pseudo-file /proc/<pid>/status
    try:
        t = open(_proc_status)
        v = t.read()
        t.close()
    except IOError:
        return 0.0  # non-Linux?
    # get the VmKey line, e.g. 'VmRSS: 9999 kB\n ...'
    i = v.index(VmKey)
    v = v[i:].split(None, 3)  # whitespace
    if len(v) < 3:
        return 0.0  # invalid format?
    # convert the Vm value to bytes
    return float(v[1]) * _scale[v[2]]


def memory(since=0.0):
    '''Return memory usage in bytes.
    '''
    return _VmB('VmSize:') - since


def resident(since=0.0):
    '''Return resident memory usage in bytes.
    '''
    return _VmB('VmRSS:') - since


def stacksize(since=0.0):
    '''Return stack size in bytes.
    '''
    return _VmB('VmStk:') - since
--------------------------------------------------------------------------------
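In batch mode, ida2sql.py reports progress to BinNavi through IDA's exit code, shifting the module id into the upper 16 bits and placing a marker byte in the low byte (0xFF when the export is incomplete, 0xFE otherwise, as seen in the batch-mode exit path above). A minimal sketch of that packing and how a controlling process might decode it; the helper names and the reading of the markers are illustrative, not part of the exporter:

```python
# Marker bytes as used in ida2sql.py's batch-mode exit path.
INCOMPLETE = 0xFF  # functions remain to be exported
COMPLETE = 0xFE    # flow-graph export finished

def pack_exit_status(module_id, marker):
    """Pack a module id and a marker byte into a single exit status."""
    return (module_id << 0x10) | marker

def unpack_exit_status(status):
    """Recover (module_id, marker) from a packed exit status."""
    return status >> 0x10, status & 0xFFFF
```

For example, module id 3 with an incomplete export packs to 0x0300FF, from which the caller can recover both fields to decide whether to schedule another export iteration.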
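The `Instrumentation` class above builds its setter and dispatcher methods dynamically in a loop, binding the attribute name as a default argument so each generated closure keeps its own hook name instead of the last loop value. A standalone sketch of that pattern, reduced to two hooks; the class name and addresses here are illustrative only:

```python
class MiniInstrumentation(object):
    """Illustrative reduction of the dynamic hook-wiring pattern."""

    instrument_hooks = ('new_function', 'new_section')

    def __init__(self, enabled=True):
        self.enabled = enabled
        for hook in self.instrument_hooks:
            callable_attr = '__' + hook + '_callable__'

            # The attr default argument freezes callable_attr per iteration;
            # a plain closure would see only the final loop value.
            def set_hook(function, attr=callable_attr):
                setattr(self, attr, function)

            setattr(self, hook + '_callable', set_hook)

            def call_hook(arg, attr=callable_attr):
                # Dispatch to the registered callable, if any
                if self.enabled and getattr(self, attr, None):
                    getattr(self, attr)(arg)

            setattr(self, hook, call_hook)


events = []
inst = MiniInstrumentation()
inst.new_function_callable(events.append)  # register a hook
inst.new_function(0x401000)                # dispatched to events.append
inst.enabled = False
inst.new_function(0x402000)                # ignored while disabled
```

Because the generated functions are assigned as instance attributes, they are plain closures over `self` rather than bound methods, which is why the exporter can wire `sqlexporter.process_function` and friends to them at runtime.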