[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
![GitHub stars](https://img.shields.io/github/stars/patternex/awesome-ml-for-threat-detection)

# Awesome ML for Threat Detection

A curated list of resources for a deep dive into the intersection of applied machine learning and threat detection.


## Table of Contents

- [Threat detection papers](#threat-detection-papers)
- [Threat characterization papers](#threat-characterization-papers)
- [Machine learning systems and operationalization papers](#machine-learning-systems-and-operationalization-papers)
- [PatternEx papers](#patternex-papers)
- [Other machine learning for cybersecurity repos](#other-machine-learning-for-cybersecurity-repos)


### Threat detection papers
* **Malicious URL Detection using Machine Learning: A Survey**. Doyen Sahoo, Chenghao Liu and Steven C.H. Hoi. *arXiv, 2017*. [[PDF]](https://arxiv.org/pdf/1701.07179)
* **SoK: Applying Machine Learning in Security - A Survey**. Heju Jiang, Jasvir Nagra, Parvez Ahammad. *arXiv, 2016*. [[PDF]](https://arxiv.org/pdf/1611.03186)
* **Predicting Domain Generation Algorithms with Long Short-Term Memory Networks**. Jonathan Woodbridge, Hyrum S. Anderson, Anjum Ahuja and Daniel Grant. *arXiv, 2016*. [[PDF]](https://arxiv.org/pdf/1611.00791)
* **Network connectivity graph for malicious traffic dissection**. Enrico Bocchi, Luigi Grimaudo, Marco Mellia, Elena Baralis, Sabyasachi Saha, Stanislav Miskovic, Gaspar Modelo-Howard, Sung-Ju Lee. *24th International Conference on Computer Communication and Networks (ICCCN), 2015*. [[PDF]](https://iris.polito.it/retrieve/handle/11583/2625360/76615/connectivity_graph.pdf)
* **Detecting malicious domains via graph inference**. Pratyusa K. Manadhata, Sandeep Yadav, Prasad Rao, William Horne. *ACM Conference on Computer and Communications Security, 2014*. [[PDF]](https://link.springer.com/content/pdf/10.1007/978-3-319-11203-9_1.pdf)
* **Nazca: Detecting Malware Distribution in Large-Scale Networks**. Luca Invernizzi, Stanislav Miskovic, Ruben Torres, Sabyasachi Saha, Sung-ju Lee, Marco Mellia, Christopher Kruegel and Giovanni Vigna. *NDSS Symposium, 2014*. [[PDF]](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.438.2760&rep=rep1&type=pdf)
* **Machine learning for identifying botnet network traffic**. Matija Stevanovic and Jens Myrup Pedersen. *Aalborg University (Technical report), 2013*. [[PDF]](https://vbn.aau.dk/ws/portalfiles/portal/75720938/paper.pdf)
* **Survey on network‐based botnet detection methods**. Sebastián García, Alejandro Zunino and Marcelo Campo. *Security and Communication Networks, 2013*. [[PDF]](https://onlinelibrary.wiley.com/doi/pdf/10.1002/sec.800)
* **Detecting insider threats in a real corporate database of computer usage activity**. Ted E. Senator et al. *19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2013*. [[PDF]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.480.1037&rep=rep1&type=pdf)
* **Botnet detection based on traffic behavior analysis and flow intervals**. David Zhao, Issa Traore, Bassam Sayed, Wei Lu, Sherif Saad, Ali Ghorbani, Dan Garant. *Computers & Security, 2013*. [[PDF]](https://www.researchgate.net/profile/Sherif_Saad/publication/259117704_Botnet_detection_based_on_traffic_behavior_analysis_and_flow_intervals/links/5a303435aca27271ec89f8e5/Botnet-detection-based-on-traffic-behavior-analysis-and-flow-intervals.pdf)


### Threat characterization papers
* **A Taxonomy of Network Threats and the Effect of Current Datasets on Intrusion Detection Systems**. Hanan Hindy, David Brosset, Ethan Bayne, Amar Seeam, Christos Tachtatzis, Robert Atkinson, Xavier Bellekens. *IEEE Access, 2020*. [[PDF]](https://ieeexplore.ieee.org/iel7/6287639/8948470/09108270.pdf)
* **A lustrum of malware network communication: Evolution and insights**. Chaz Lever, Platon Kotzias, Davide Balzarotti, Juan Caballero and Manos Antonakakis. *IEEE Symposium on Security and Privacy, 2017*. [[PDF]](http://www.ieee-security.org/TC/SP2017/papers/409.pdf)
* **A comprehensive measurement study of domain generating malware**. Daniel Plohmann, Khaled Yakdan, Michael Klatt, Johannes Bader, Elmar Gerhards-Padilla. *25th USENIX Security Symposium, 2016*. [[PDF]](https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_plohmann.pdf)
* **A Survey on Botnet Architectures, Detection and Defences**. Muhammad Mahmoud, Manjinder Nir and Ashraf Matrawy. *International Journal of Network Security, 2015*. [[PDF]](http://ijns.jalaxy.com.tw/contents/ijns-v17-n3/ijns-v17-n3.pdf#page=48)
* **Practical Comprehensive Bounds on Surreptitious Communication over DNS**. Vern Paxson, Mihai Christodorescu, Mobin Javed, Josyula Rao, Reiner Sailer, Douglas Lee Schales, Marc Ph. Stoecklin, Kurt Thomas, Wietse Venema and Nicholas Weaver. *22nd USENIX Security Symposium, 2013*. [[PDF]](https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_paxson.pdf)
* **Analysis of security data from a large computing organization**. A. Sharma, Z. Kalbarczyk, J. Barlow and R. Iyer. *IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011*. [[PDF]](http://www.academia.edu/download/40319777/Analysis_of_security_data_from_a_large_c20151123-15766-14wy5bo.pdf)


### Machine learning systems and operationalization papers
* **A survey of methods for explaining black box models**. Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, Dino Pedreschi. *ACM Computing Surveys, 2018*. [[PDF]](https://dl.acm.org/doi/pdf/10.1145/3236009)
* **Hidden Technical Debt in Machine Learning Systems**. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison. *Advances in Neural Information Processing Systems (NIPS), 2015*. [[PDF]](http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)
* **Local Outlier Detection with Interpretation**. Xuan Hong Dang, Barbora Micenková, Ira Assent and Raymond T. Ng. *European Conference on Machine Learning and Knowledge Discovery in Databases, 2013*. [[PDF]](https://link.springer.com/content/pdf/10.1007/978-3-642-40994-3_20.pdf)
* **Interpreting and unifying outlier scores**. Hans-Peter Kriegel, Peer Kröger, Erich Schubert and Arthur Zimek. *SIAM International Conference on Data Mining, 2011*. [[PDF]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.232.2719&rep=rep1&type=pdf)
* **Outside the Closed World: On Using Machine Learning for Network Intrusion Detection**. Robin Sommer and Vern Paxson. *IEEE Symposium on Security and Privacy, 2010*. [[PDF]](https://www.icir.org/robin/papers/oakland10-ml.pdf)
* **Converting output scores from outlier detection algorithms into probability estimates**. Jing Gao and Pang-Ning Tan. *International Conference on Data Mining (ICDM), 2006*. [[PDF]](https://core.ac.uk/download/pdf/193238184.pdf)


### PatternEx papers
* **The Holy Grail of “Systems for Machine Learning”: Teaming humans and machine learning for detecting cyber threats**. Ignacio Arnaldo and Kalyan Veeramachaneni. *ACM SIGKDD Explorations Newsletter 21, 2019*. [[PDF]](https://www.kdd.org/exploration_files/5._CR_18._The_challenges_in_teaming_humans_-_Final.pdf)
* **Shooting the moving target: machine learning in cybersecurity**. Ankit Arun and Ignacio Arnaldo. *USENIX Conference on Operational Machine Learning (OpML), 2019*. [[PDF]](https://www.usenix.org/system/files/opml19papers-arun.pdf)
* **eX2: a framework for interactive anomaly detection**. Ignacio Arnaldo, Kalyan Veeramachaneni, Mei Lam. *Intelligent User Interfaces Workshops, 2019*. [[PDF]](http://ceur-ws.org/Vol-2327/IUI19WS-ESIDA-2.pdf)
* **Acquire, adapt, and anticipate: continuous learning to block malicious domains**. Ignacio Arnaldo, Ankit Arun, Sumeeth Kyathanahalli, Kalyan Veeramachaneni. *IEEE International Conference on Big Data, 2018*. [[IEEE Link]](https://ieeexplore.ieee.org/document/8622197)
* **Learning representations for log data in cybersecurity**. Ignacio Arnaldo, Alfredo Cuesta-Infante, Ankit Arun, Mei Lam, Costas Bassias and Kalyan Veeramachaneni. *International Conference on Cyber Security Cryptography and Machine Learning, 2017*. [[PDF]](https://dai.lids.mit.edu/wp-content/uploads/2018/02/2017_CSCML_Learning_log_representations_camera_ready_v2-3-1-1.pdf)
* **AI2: Training a Big Data Machine to Defend**. Kalyan Veeramachaneni, Ignacio Arnaldo, Vamsi Korrapati, Constantinos Bassias and Ke Li. *2nd IEEE International Conference on Big Data Security on Cloud, 2016*. [[PDF]](https://dai.lids.mit.edu/wp-content/uploads/2017/10/AI2_Paper.pdf)


### Other machine learning for cybersecurity repos
* [Awesome Machine Learning for Cyber Security](https://github.com/jivoi/awesome-ml-for-cybersecurity)
* [Awesome Machine Learning And Cybersecurity](https://github.com/mebiux/Awesome-ML-Cybersecurity)
* [Machine Learning for Cyber Security](https://github.com/wtsxDev/Machine-Learning-for-Cyber-Security)
* [Machine Learning and Cyber Security Resources](https://github.com/dleyanlin/Machine-Learning-and-Cyber-Security-Resources)


## Note

The initial intent was to create a repo pointing only to our own papers (the PatternEx papers), but we thought it made sense to also include papers that shaped our understanding of this space. Enjoy!