├── cyber-ml-logo.png
├── CONTRIBUTING.md
├── README_ch.md
├── LICENSE.txt
└── README.md

/cyber-ml-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Biprodeep/awesome-ml-for-cybersecurity/HEAD/cyber-ml-logo.png
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contribution Guidelines

Please ensure your pull request adheres to the following guidelines:

- Read [the awesome manifesto](https://github.com/sindresorhus/awesome/blob/master/awesome.md) and ensure your list complies.
- Search previous suggestions before making a new one, as yours may be a duplicate.
- Make sure your list is useful before submitting. That means it has enough content and every item has a good, succinct description.
- A link back to this list from yours, so users can discover more lists, would be appreciated.
- Make an individual pull request for each suggestion.
- Titles should be [capitalized](http://grammar.yourdictionary.com/capitalization/rules-for-capitalization-in-titles.html).
- Use the following format: `[List Name](link)`
- Link additions should be added to the bottom of the relevant category.
- New categories or improvements to the existing categorization are welcome.
- Check your spelling and grammar.
- Make sure your text editor is set to remove trailing whitespace.
- The pull request and commit should have a useful title.

Thank you for your suggestions!
--------------------------------------------------------------------------------
/README_ch.md:
--------------------------------------------------------------------------------
# Awesome Machine Learning for Cyber Security [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

[](https://github.com/jivoi/awesome-ml-for-cybersecurity)

The best tools and resources related to machine learning for cyber security, collected over the years.

## Table of Contents

- [Datasets](#-datasets)
- [Papers](#-papers)
- [Books](#-books)
- [Talks](#-talks)
- [Tutorials](#-tutorials)
- [Courses](#-courses)
- [Miscellaneous](#-miscellaneous)

## [↑](#table-of-contents) Contributing

Please read [CONTRIBUTING](./CONTRIBUTING.md) if you wish to add tools or resources.

## [↑](#table-of-contents) Datasets

* [Samples of Security Related Data](http://www.secrepo.com/)
* [DARPA Intrusion Detection Data Sets](https://www.ll.mit.edu/ideval/data/)
* [Stratosphere IPS Data Sets](https://stratosphereips.org/category/dataset.html)
* [Open Data Sets](http://csr.lanl.gov/data/)
* [Data Capture from National Security Agency](http://www.westpoint.edu/crc/SitePages/DataSets.aspx)
* [The ADFA Intrusion Detection Data Sets](https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-IDS-Datasets/)
* [NSL-KDD Data Sets](https://github.com/defcom17/NSL_KDD)
* [Malicious URLs Data Sets](http://sysnet.ucsd.edu/projects/url/)
* [Multi-Source Cyber-Security Events](http://csr.lanl.gov/data/cyber1/)
* [Malware Training Sets](http://marcoramilli.blogspot.cz/2016/12/malware-training-sets-machine-learning.html)
* [KDD Cup 1999 Data](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html)
* [Web Attack Payloads](https://github.com/foospidy/payloads)
* [WAF Malicious Queries Data Sets](https://github.com/faizann24/Fwaf-Machine-Learning-driven-Web-Application-Firewall)
* [Malware Training Data Sets](https://github.com/marcoramilli/MalwareTrainingSets)
* [Aktaion Data Sets](https://github.com/jzadeh/Aktaion/tree/master/data)
* [CRIME Database from DeepEnd Research](https://www.dropbox.com/sh/7fo4efxhpenexqp/AADHnRKtL6qdzCdRlPmJpS8Aa/CRIME?dl=0)
* [Publicly Available PCAP Files](http://www.netresec.com/?page=PcapFiles)

## [↑](#table-of-contents) Papers

* [Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks](https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/melicher)
* [Outside the Closed World: On Using Machine Learning for Network Intrusion Detection](http://ieeexplore.ieee.org/document/5504793/?reload=true)
* [Anomalous Payload-Based Network Intrusion Detection](https://link.springer.com/chapter/10.1007/978-3-540-30143-1_11)
* [Malicious PDF Detection Using Metadata and Structural Features](http://dl.acm.org/citation.cfm?id=2420987)
* [Adversarial Support Vector Machine Learning](https://dl.acm.org/citation.cfm?id=2339697)
* [Exploiting Machine Learning to Subvert Your Spam Filter](https://dl.acm.org/citation.cfm?id=1387709.1387716)
* [CAMP – Content Agnostic Malware Protection](http://www.covert.io/research-papers/security/CAMP%20-%20Content%20Agnostic%20Malware%20Protection.pdf)
* [Notos – Building a Dynamic Reputation System for DNS](http://www.covert.io/research-papers/security/Notos%20-%20Building%20a%20dynamic%20reputation%20system%20for%20dns.pdf)
* [Kopis – Detecting Malware Domains at the Upper DNS Hierarchy](http://www.covert.io/research-papers/security/Kopis%20-%20Detecting%20malware%20domains%20at%20the%20upper%20dns%20hierarchy.pdf)
* [Pleiades – Detecting the Rise of DGA-based Malware](http://www.covert.io/research-papers/security/From%20throw-away%20traffic%20to%20bots%20-%20detecting%20the%20rise%20of%20dga-based%20malware.pdf)
* [EXPOSURE – Finding Malicious Domains Using Passive DNS Analysis](http://www.covert.io/research-papers/security/Exposure%20-%20Finding%20malicious%20domains%20using%20passive%20dns%20analysis.pdf)
* [Polonium – Tera-Scale Graph Mining for Malware Detection](http://www.covert.io/research-papers/security/Polonium%20-%20Tera-Scale%20Graph%20Mining%20for%20Malware%20Detection.pdf)
* [Nazca – Detecting Malware Distribution in Large-Scale Networks](http://www.covert.io/research-papers/security/Nazca%20-%20%20Detecting%20Malware%20Distribution%20in%20Large-Scale%20Networks.pdf)
* [PAYL – Anomalous Payload-based Network Intrusion Detection](http://www.covert.io/research-papers/security/PAYL%20-%20Anomalous%20Payload-based%20Network%20Intrusion%20Detection.pdf)
* [Anagram – A Content Anomaly Detector Resistant to Mimicry Attacks](http://www.covert.io/research-papers/security/Anagram%20-%20A%20Content%20Anomaly%20Detector%20Resistant%20to%20Mimicry%20Attack.pdf)
* [Applications of Machine Learning in Cyber Security](https://www.researchgate.net/publication/283083699_Applications_of_Machine_Learning_in_Cyber_Security)
* [Data Mining for Building Network Attack Detection Systems (RUS)](http://vak.ed.gov.ru/az/server/php/filer.php?table=att_case&fld=autoref&key%5B%5D=100003407)
* [Selecting Data Mining Technologies for Corporate Network Intrusion Detection Systems (RUS)](http://engjournal.ru/articles/987/987.pdf)
* [A Neural Network Approach to the Hierarchical Representation of a Computer Network in Information Security Tasks (RUS)](http://engjournal.ru/articles/534/534.pdf)
* [Data Mining Methods and Intrusion Detection (RUS)](http://vestnik.sibsutis.ru/uploads/1459329553_3576.pdf)
* [Dimension Reduction in Network Attacks Detection Systems](http://elib.bsu.by/bitstream/123456789/120105/1/v17no3p284.pdf)

## [↑](#table-of-contents) Books

* [Data Mining and Machine Learning in Cybersecurity](https://www.amazon.com/Data-Mining-Machine-Learning-Cybersecurity/dp/1439839425)
* [Machine Learning and Data Mining for Computer Security](https://www.amazon.com/Machine-Learning-Mining-Computer-Security/dp/184628029X)
* [Network Anomaly Detection: A Machine Learning Perspective](https://www.amazon.com/Network-Anomaly-Detection-Learning-Perspective/dp/1466582081)

## [↑](#table-of-contents) Talks

* [Using Machine Learning to Support Information Security](https://www.youtube.com/watch?v=tukidI5vuBs)
* [Defending Networks with Incomplete Information](https://www.youtube.com/watch?v=36IT9VgGr0g)
* [Applying Machine Learning to Network Security Monitoring](https://www.youtube.com/watch?v=vy-jpFpm1AU)
* [Measuring the IQ of Your Threat Intelligence Feeds](https://www.youtube.com/watch?v=yG6QlHOAWiE)
* [Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing](https://www.youtube.com/watch?v=6JMEKnes-w0)
* [Applied Machine Learning for Data Exfil and Other Fun Topics](https://www.youtube.com/watch?v=dGwH7m4N8DE)
* [Secure Because Math: A Deep-Dive on ML-Based Monitoring](https://www.youtube.com/watch?v=TYVCVzEJhhQ)
* [Pwning Deep Learning Systems](https://www.youtube.com/watch?v=JAGDpJFFM2A)
* [Weaponizing Data Science for Social Engineering](https://www.youtube.com/watch?v=l7U0pDcsKLg)
* [Defeating Machine Learning: What Your Security Vendor Is Not Telling You](https://www.youtube.com/watch?v=oiuS1DyFNd8)
* [CrowdSource: Crowd Trained Machine Learning Model for Malware Capability Detection](https://www.youtube.com/watch?v=u6a7afsD39A)
* [Defeating Machine Learning: Systemic Deficiencies for Detecting Malware](https://www.youtube.com/watch?v=sPtbDUJjhbk)
* [Packet Capture Village – How Machine Learning Finds Malware](https://www.youtube.com/watch?v=2cQRSPFSY-s)
* [Build an Antivirus in 5 Minutes with Machine Learning](https://www.youtube.com/watch?v=iLNHVwSu9EA&t=245s)
* [Hunting for Malware with Machine Learning](https://www.youtube.com/watch?v=zT-4zdtvR30)
* [Machine Learning for Threat Detection](https://www.youtube.com/watch?v=qVwktOa-F34)
* [Machine Learning and the Cloud: Disrupting Threat Detection and Prevention](https://www.youtube.com/watch?v=fRklX97iGIw)
* [Fraud Detection Using Machine Learning and Deep Learning](https://www.youtube.com/watch?v=gHtN4jU69W0)
* [The Applications of Deep Learning on Traffic Identification](https://www.youtube.com/watch?v=B7OKgC3AJVM)
* [Defending Networks with Incomplete Information: A Machine Learning Approach](https://www.youtube.com/watch?v=_0CRSF6yPB4)
* [Machine Learning & Data Science](https://vimeo.com/112702666)
* [Advances in Cloud-Scale Machine Learning for Cyber-Defense](https://www.youtube.com/watch?v=skSIIvvZFIk)
* [Applied Machine Learning: Defeating Modern Malicious Documents](https://www.youtube.com/watch?v=ZAuCEgA3itI)
* [Automated Prevention of Ransomware with Machine Learning and GPOs](https://www.rsaconference.com/writable/presentations/file_upload/spo2-t11_automated-prevention-of-ransomware-with-machine-learning-and-gpos.pdf)
* [Learning to Detect Malware by Mining the Security Literature](https://www.usenix.org/conference/enigma2017/conference-program/presentation/dumitras)

## [↑](#table-of-contents) Tutorials

* [Click Security Data Hacking Project](http://clicksecurity.github.io/data_hacking/)
* [Using Neural Networks to Generate Human-Readable Passwords](http://fsecurify.com/using-neural-networks-to-generate-human-readable-passwords/)
* [Machine Learning Based Password Strength Classification](http://fsecurify.com/machine-learning-based-password-strength-checking/)
* [Using Machine Learning to Detect Malicious URLs](http://fsecurify.com/using-machine-learning-detect-malicious-urls/)
* [Big Data and Data Science for Security and Fraud Detection](http://www.kdnuggets.com/2015/12/big-data-science-security-fraud-detection.html)
* [Using Deep Learning to Break a CAPTCHA System](https://deepmlblog.wordpress.com/2016/01/03/how-to-break-a-captcha-system/)
* [Data Mining for Network Security and Intrusion Detection](https://www.r-bloggers.com/data-mining-for-network-security-and-intrusion-detection/)
* [An Introduction to Machine Learning for Cybersecurity and Threat Hunting](http://blog.sqrrl.com/an-introduction-to-machine-learning-for-cybersecurity-and-threat-hunting)
* [Applying Machine Learning to Improve Your Intrusion Detection System](https://securityintelligence.com/applying-machine-learning-to-improve-your-intrusion-detection-system/)
* [Analyzing Botnets with Suricata and Machine Learning](http://blogs.splunk.com/2017/01/30/analyzing-botnets-with-suricata-machine-learning/)
* [fWaf – Machine Learning Driven Web Application Firewall](http://fsecurify.com/fwaf-machine-learning-driven-web-application-firewall/)
* [Deep Session Learning for Cyber Security](https://blog.cyberreboot.org/deep-session-learning-for-cyber-security-e7c0f6804b81#.eo2m4alid)

## [↑](#table-of-contents) Courses

* [Stanford's Data Mining for Cyber Security Course](http://web.stanford.edu/class/cs259d/)
* [Data Science and Machine Learning for Infosec](http://www.pentesteracademy.com/course?id=30)

## [↑](#table-of-contents) Miscellaneous

* [System Predicts 85 Percent of Cyber-Attacks Using Input from Human Experts](http://news.mit.edu/2016/ai-system-predicts-85-percent-cyber-attacks-using-input-human-experts-0418)
* [A List of Open Source Cyber Security Projects Using Machine Learning](http://www.mlsecproject.org/#open-source-projects)

## License

![cc license](http://i.creativecommons.org/l/by-sa/4.0/88x31.png)

Licensed under [Creative Commons Attribution-ShareAlike 4.0 International](http://creativecommons.org/licenses/by-sa/4.0/).

--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
## Creative Commons Attribution-ShareAlike 4.0 International Public License

By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
Section 1 – Definitions.

Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.
BY-SA Compatible License means a license listed at creativecommons.org/compatiblelicenses, approved by Creative Commons as essentially the equivalent of this Public License.
Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
License Elements means the license attributes listed in the name of a Creative Commons Public License. The License Elements of this Public License are Attribution and ShareAlike.
Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
Licensor means the individual(s) or entity(ies) granting rights under this Public License.
Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.

Section 2 – Scope.

License grant.
Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
reproduce and Share the Licensed Material, in whole or in part; and
produce, reproduce, and Share Adapted Material.
Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
Term. The term of this Public License is specified in Section 6(a).
Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.
Downstream recipients.
Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply.
No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).

Other rights.
Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
Patent and trademark rights are not licensed under this Public License.
To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties.

Section 3 – License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the following conditions.

Attribution.
If You Share the Licensed Material (including in modified form), You must:
retain the following if it is supplied by the Licensor with the Licensed Material:
identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
a copyright notice;
a notice that refers to this Public License;
a notice that refers to the disclaimer of warranties;
a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
indicate if You modified the Licensed Material and retain an indication of any previous modifications; and
indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.
You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
ShareAlike.

In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply.
The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License.
You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material.
You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply.

Section 4 – Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:

for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database;
if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section 3(b); and
You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.

Section 5 – Disclaimer of Warranties and Limitation of Liability.

Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.
To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.

The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.

Section 6 – Term and Termination.

This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.

Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
upon express reinstatement by the Licensor.
For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
Sections 1, 5, 6, 7, and 8 survive termination of this Public License.

Section 7 – Other Terms and Conditions.

The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.

Section 8 – Interpretation.

For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Awesome Machine Learning for Cyber Security [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

[](https://github.com/jivoi/awesome-ml-for-cybersecurity)

A curated list of amazingly awesome tools and resources related to the use of machine learning for cyber security.
## Table of Contents

- [Datasets](#-datasets)
- [Papers](#-papers)
- [Books](#-books)
- [Talks](#-talks)
- [Tutorials](#-tutorials)
- [Courses](#-courses)
- [Miscellaneous](#-miscellaneous)

## [↑](#table-of-contents) Contributing

Please read [CONTRIBUTING](./CONTRIBUTING.md) if you wish to add tools or resources.

## [↑](#table-of-contents) Datasets

* [Samples of Security Related Data](http://www.secrepo.com/)
* [DARPA Intrusion Detection Data Sets](https://www.ll.mit.edu/ideval/data/)
* [Stratosphere IPS Data Sets](https://stratosphereips.org/category/dataset.html)
* [Open Data Sets](http://csr.lanl.gov/data/)
* [Data Capture from National Security Agency](http://www.westpoint.edu/crc/SitePages/DataSets.aspx)
* [The ADFA Intrusion Detection Data Sets](https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-IDS-Datasets/)
* [NSL-KDD Data Sets](https://github.com/defcom17/NSL_KDD)
* [Malicious URLs Data Sets](http://sysnet.ucsd.edu/projects/url/)
* [Multi-Source Cyber-Security Events](http://csr.lanl.gov/data/cyber1/)
* [Malware Training Sets: A machine learning dataset for everyone](http://marcoramilli.blogspot.cz/2016/12/malware-training-sets-machine-learning.html)
* [KDD Cup 1999 Data](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html)
* [Web Attack Payloads](https://github.com/foospidy/payloads)
* [WAF Malicious Queries Data Sets](https://github.com/faizann24/Fwaf-Machine-Learning-driven-Web-Application-Firewall)
* [Malware Training Data Sets](https://github.com/marcoramilli/MalwareTrainingSets)
* [Aktaion Data Sets](https://github.com/jzadeh/Aktaion/tree/master/data)
* [CRIME Database from DeepEnd Research](https://www.dropbox.com/sh/7fo4efxhpenexqp/AADHnRKtL6qdzCdRlPmJpS8Aa/CRIME?dl=0)
* [Publicly available PCAP files](http://www.netresec.com/?page=PcapFiles)
* [2007 TREC Public Spam Corpus](https://plg.uwaterloo.ca/~gvcormac/treccorpus07/)

## [↑](#table-of-contents) Papers

* [Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks](https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/melicher)
* [Outside the Closed World: On Using Machine Learning for Network Intrusion Detection](http://ieeexplore.ieee.org/document/5504793/?reload=true)
* [Anomalous Payload-Based Network Intrusion Detection](https://link.springer.com/chapter/10.1007/978-3-540-30143-1_11)
* [Malicious PDF detection using metadata and structural features](http://dl.acm.org/citation.cfm?id=2420987)
* [Adversarial support vector machine learning](https://dl.acm.org/citation.cfm?id=2339697)
* [Exploiting machine learning to subvert your spam filter](https://dl.acm.org/citation.cfm?id=1387709.1387716)
* [CAMP – Content Agnostic Malware Protection](http://www.covert.io/research-papers/security/CAMP%20-%20Content%20Agnostic%20Malware%20Protection.pdf)
* [Notos – Building a Dynamic Reputation System for DNS](http://www.covert.io/research-papers/security/Notos%20-%20Building%20a%20dynamic%20reputation%20system%20for%20dns.pdf)
* [Kopis – Detecting Malware Domains at the Upper DNS Hierarchy](http://www.covert.io/research-papers/security/Kopis%20-%20Detecting%20malware%20domains%20at%20the%20upper%20dns%20hierarchy.pdf)
* [Pleiades – From Throw-away Traffic To Bots – Detecting The Rise Of DGA-based Malware](http://www.covert.io/research-papers/security/From%20throw-away%20traffic%20to%20bots%20-%20detecting%20the%20rise%20of%20dga-based%20malware.pdf)
* [EXPOSURE – Finding Malicious Domains Using Passive DNS Analysis](http://www.covert.io/research-papers/security/Exposure%20-%20Finding%20malicious%20domains%20using%20passive%20dns%20analysis.pdf)
* [Polonium – Tera-Scale Graph Mining for Malware Detection](http://www.covert.io/research-papers/security/Polonium%20-%20Tera-Scale%20Graph%20Mining%20for%20Malware%20Detection.pdf)
* [Nazca – Detecting Malware Distribution in Large-Scale Networks](http://www.covert.io/research-papers/security/Nazca%20-%20%20Detecting%20Malware%20Distribution%20in%20Large-Scale%20Networks.pdf)
* [PAYL – Anomalous Payload-based Network Intrusion Detection](http://www.covert.io/research-papers/security/PAYL%20-%20Anomalous%20Payload-based%20Network%20Intrusion%20Detection.pdf)
* [Anagram – A Content Anomaly Detector Resistant to Mimicry Attacks](http://www.covert.io/research-papers/security/Anagram%20-%20A%20Content%20Anomaly%20Detector%20Resistant%20to%20Mimicry%20Attack.pdf)
* [Applications of Machine Learning in Cyber Security](https://www.researchgate.net/publication/283083699_Applications_of_Machine_Learning_in_Cyber_Security)
* [Data Mining for Building Network Attack Detection Systems (RUS)](http://vak.ed.gov.ru/az/server/php/filer.php?table=att_case&fld=autoref&key%5B%5D=100003407)
* [Selecting Data Mining Technologies for Corporate Network Intrusion Detection Systems (RUS)](http://engjournal.ru/articles/987/987.pdf)
* [A Neural Network Approach to the Hierarchical Representation of a Computer Network in Information Security Tasks (RUS)](http://engjournal.ru/articles/534/534.pdf)
* [Data Mining Methods and Intrusion Detection (RUS)](http://vestnik.sibsutis.ru/uploads/1459329553_3576.pdf)
* [Dimension Reduction in Network Attacks Detection Systems](http://elib.bsu.by/bitstream/123456789/120105/1/v17no3p284.pdf)
* [Rise of the machines: Machine Learning & its cyber security applications](https://www.nccgroup.trust/globalassets/our-research/uk/whitepapers/2017/rise-of-the-machines-preliminaries-wp-new-template-final_web.pdf)
* [Machine Learning in Cyber Security: Age of the Centaurs](https://go.recordedfuture.com/hubfs/white-papers/machine-learning.pdf)
* [Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers](https://www.cs.virginia.edu/~evans/pubs/ndss2016/)
* [Weaponizing Data Science for Social Engineering – Automated E2E Spear Phishing on Twitter](https://www.blackhat.com/docs/us-16/materials/us-16-Seymour-Tully-Weaponizing-Data-Science-For-Social-Engineering-Automated-E2E-Spear-Phishing-On-Twitter.pdf)
* [Machine Learning: A Threat-Hunting Reality Check](https://www.countercept.com/assets/Uploads/whitepapers/MWRI-Countercept-Machine-Learning-Whitepaper-2017-04-01.pdf)

## [↑](#table-of-contents) Books

* [Data Mining and Machine Learning in Cybersecurity](https://www.amazon.com/Data-Mining-Machine-Learning-Cybersecurity/dp/1439839425)
* [Machine Learning and Data Mining for Computer Security](https://www.amazon.com/Machine-Learning-Mining-Computer-Security/dp/184628029X)
* [Network Anomaly Detection: A Machine Learning Perspective](https://www.amazon.com/Network-Anomaly-Detection-Learning-Perspective/dp/1466582081)
* [Machine Learning and Security: Protecting Systems with Data and Algorithms](https://www.amazon.com/Machine-Learning-Security-Protecting-Algorithms/dp/1491979909)
* [Introduction To Artificial Intelligence For Security Professionals](http://defense.ballastsecurity.net/static/IntroductionToArtificialIntelligenceForSecurityProfessionals_Cylance.pdf)

## [↑](#table-of-contents) Talks

* [Using Machine Learning to Support Information Security](https://www.youtube.com/watch?v=tukidI5vuBs)
* [Defending Networks with Incomplete Information](https://www.youtube.com/watch?v=36IT9VgGr0g)
* [Applying Machine Learning to Network Security Monitoring](https://www.youtube.com/watch?v=vy-jpFpm1AU)
* [Measuring the IQ of your Threat Intelligence Feeds](https://www.youtube.com/watch?v=yG6QlHOAWiE)
* [Data-Driven Threat Intelligence: Metrics On Indicator Dissemination And Sharing](https://www.youtube.com/watch?v=6JMEKnes-w0)
* [Applied Machine Learning for Data Exfil and Other Fun Topics](https://www.youtube.com/watch?v=dGwH7m4N8DE)
* [Secure Because Math: A Deep-Dive on ML-Based Monitoring](https://www.youtube.com/watch?v=TYVCVzEJhhQ)
* [Machine Duping 101: Pwning Deep Learning Systems](https://www.youtube.com/watch?v=JAGDpJFFM2A)
* [Delta Zero, KingPhish3r – Weaponizing Data Science for Social Engineering](https://www.youtube.com/watch?v=l7U0pDcsKLg)
* [Defeating Machine Learning: What Your Security Vendor Is Not Telling You](https://www.youtube.com/watch?v=oiuS1DyFNd8)
* [CrowdSource: Crowd Trained Machine Learning Model for Malware Capability Detection](https://www.youtube.com/watch?v=u6a7afsD39A)
* [Defeating Machine Learning: Systemic Deficiencies for Detecting Malware](https://www.youtube.com/watch?v=sPtbDUJjhbk)
* [Packet Capture Village – Theodora Titonis – How Machine Learning Finds Malware](https://www.youtube.com/watch?v=2cQRSPFSY-s)
* [Build an Antivirus in 5 Min – Fresh Machine Learning #7. A fun video to watch](https://www.youtube.com/watch?v=iLNHVwSu9EA&t=245s)
* [Hunting for Malware with Machine Learning](https://www.youtube.com/watch?v=zT-4zdtvR30)
* [Machine Learning for Threat Detection](https://www.youtube.com/watch?v=qVwktOa-F34)
* [Machine Learning and the Cloud: Disrupting Threat Detection and Prevention](https://www.youtube.com/watch?v=fRklX97iGIw)
* [Fraud detection using machine learning & deep learning](https://www.youtube.com/watch?v=gHtN4jU69W0)
* [The Applications Of Deep Learning On Traffic Identification](https://www.youtube.com/watch?v=B7OKgC3AJVM)
* [Defending Networks With Incomplete Information: A Machine Learning Approach](https://www.youtube.com/watch?v=_0CRSF6yPB4)
* [Machine Learning & Data Science](https://vimeo.com/112702666)
* [Advances in Cloud-Scale Machine Learning for Cyber-Defense](https://www.youtube.com/watch?v=skSIIvvZFIk)
* [Applied Machine Learning: Defeating Modern Malicious Documents](https://www.youtube.com/watch?v=ZAuCEgA3itI)
* [Automated Prevention of Ransomware with Machine Learning and GPOs](https://www.rsaconference.com/writable/presentations/file_upload/spo2-t11_automated-prevention-of-ransomware-with-machine-learning-and-gpos.pdf)
* [Learning to Detect Malware by Mining the Security Literature](https://www.usenix.org/conference/enigma2017/conference-program/presentation/dumitras)
* [Clarence Chio and Anto Joseph – Practical Machine Learning in Infosecurity](https://conference.hitb.org/hitbsecconf2017ams/materials/D1T3%20-%20Clarence%20Chio%20and%20Anto%20Joseph%20-%20Practical%20Machine%20Learning%20in%20Infosecurity.pdf)
* [Advances in Cloud-Scale Machine Learning for Cyberdefense](https://www.youtube.com/watch?v=6Slj2FV9CLA)
* [Machine Learning-Based Techniques For Network Intrusion Detection](https://www.youtube.com/watch?v=-EUJgpiJ8Jo)
* [Practical Machine Learning in
Infosec](https://www.youtube.com/watch?v=YF2dm6GZf2U) 110 | * [AI and Security](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/07/AI_and_Security_Dawn_Song.pdf) 111 | * [AI in InfoSec](https://vimeo.com/230502013) 112 | * [Beyond the Blacklists: Detecting Malicious URL Through Machine Learning](https://www.youtube.com/watch?v=Kd3svc9HZ0Y) 113 | 114 | ## [↑](#table-of-contents) Tutorials 115 | 116 | * [Click Security Data Hacking Project](http://clicksecurity.github.io/data_hacking/) 117 | * [Using Neural Networks to generate human readable passwords](http://fsecurify.com/using-neural-networks-to-generate-human-readable-passwords/) 118 | * [Machine Learning based Password Strength Classification](http://fsecurify.com/machine-learning-based-password-strength-checking/) 119 | * [Using Machine Learning to Detect Malicious URLs](http://fsecurify.com/using-machine-learning-detect-malicious-urls/) 120 | * [Big Data and Data Science for Security and Fraud Detection](http://www.kdnuggets.com/2015/12/big-data-science-security-fraud-detection.html) 121 | * [Using deep learning to break a Captcha system](https://deepmlblog.wordpress.com/2016/01/03/how-to-break-a-captcha-system/) 122 | * [Data mining for network security and intrusion detection](https://www.r-bloggers.com/data-mining-for-network-security-and-intrusion-detection/) 123 | * [An Introduction to Machine Learning for Cybersecurity and Threat Hunting](http://blog.sqrrl.com/an-introduction-to-machine-learning-for-cybersecurity-and-threat-hunting) 124 | * [Applying Machine Learning to Improve Your Intrusion Detection System](https://securityintelligence.com/applying-machine-learning-to-improve-your-intrusion-detection-system/) 125 | * [Analyzing BotNets with Suricata & Machine Learning](http://blogs.splunk.com/2017/01/30/analyzing-botnets-with-suricata-machine-learning/) 126 | * [fWaf – Machine learning driven Web Application 
Firewall](http://fsecurify.com/fwaf-machine-learning-driven-web-application-firewall/) 127 | * [Deep Session Learning for Cyber Security](https://blog.cyberreboot.org/deep-session-learning-for-cyber-security-e7c0f6804b81#.eo2m4alid) 128 | * [DMachine Learning for Malware Detection](http://resources.infosecinstitute.com/machine-learning-malware-detection/) 129 | * [ShadowBrokers Leak: A Machine Learning Approach](https://marcoramilli.blogspot.ru/2017/04/shadowbrokers-leak-machine-learning.html) 130 | * [Practical Machine Learning in Infosec - Virtualbox Image and Stuff](https://docs.google.com/document/d/1v4plS1EhLBfjaz-9GHBqspTH7vnrJfqLrLjeP9k9i9A/edit) 131 | * [A Machine-Learning Toolkit for Large-scale eCrime Forensics](http://blog.trendmicro.com/trendlabs-security-intelligence/defplorex-machine-learning-toolkit-large-scale-ecrime-forensics/) 132 | 133 | 134 | ## [↑](#table-of-contents) Courses 135 | 136 | * [Data Mining for Cyber Security by Stanford](http://web.stanford.edu/class/cs259d/) 137 | * [Data Science and Machine Learning for Infosec](http://www.pentesteracademy.com/course?id=30) 138 | 139 | ## [↑](#table-of-contents) Miscellaneous 140 | 141 | * [System predicts 85 percent of cyber-attacks using input from human experts](http://news.mit.edu/2016/ai-system-predicts-85-percent-cyber-attacks-using-input-human-experts-0418) 142 | * [A list of open source projects in cyber security using machine learning](http://www.mlsecproject.org/#open-source-projects) 143 | 144 | ## License 145 | 146 | ![cc license](http://i.creativecommons.org/l/by-sa/4.0/88x31.png) 147 | 148 | This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International](http://creativecommons.org/licenses/by-sa/4.0/) license. 149 | --------------------------------------------------------------------------------