├── ML-notes.md ├── README.md ├── TF-IDF.md ├── Tricks-improve-windows-PC-performance.md ├── anonymous-presence-on-internet.md ├── big-data-topics.md ├── block-email-attachment.md ├── blue-team-tips.md ├── bookmarks.md ├── bro_conn_history.md ├── bro_conn_states.md ├── building-word-list.md ├── critical-infra-security.md ├── cyber-security.md ├── detect-compromised-linux-machine.md ├── dns.md ├── encrypted-traffic-fingerprinting.md ├── full-text-search.md ├── icmp-codes.md ├── interview-fun.md ├── linux-auth-log.md ├── linux-forensics.md ├── log-files-and-journalctl.md ├── logs vs metrics.md ├── machine learning terms ├── malware-detection-methods.md ├── malwares.md ├── netflow traffic classification use cases.md ├── network-security-monitoring.md ├── nmap-nse-scripts.md ├── osquery-threat-hunting.md ├── pandas scaling.md ├── quantum-notes.md ├── replace-linux-on-smartphone.md ├── sandbox-drawbacks.md ├── scap-security-compliance.md ├── scoring classification.md ├── security-guidance.md ├── security-testing.md ├── signs-of-compromise.md ├── source port 0 traffic ├── system-base-line-building.md ├── tap-vs-span port.md ├── things-to-explore.md ├── threat-feeds.md ├── useful-commands.md ├── vulnerability-management.md ├── web-logs-iocs.md ├── weekly-report-template.md └── why-time-series-databases.md /ML-notes.md: -------------------------------------------------------------------------------- 1 | ## Machine learning vs traditional programming 2 | 3 | Artificial intelligence is an umbrella term that contains other realms like image processing, cognitive science, neural networks and much more. 4 | The core idea is that the computer does not just execute a pre-written algorithm, but learns how to solve the problem itself. 5 | 6 | Arthur Samuel's definition - ML is a field of study that gives computers the ability to learn without being explicitly programmed. 7 | 8 | In traditional programming you hard-code the behaviour of the program.
In machine learning, you leave much of that to the machine, which learns from data. 9 | 10 | ML is used in cases where the traditional programming strategy falls behind and is not enough to fully implement a certain task, e.g. prediction of currency prices. The price depends on many factors like country, location, its image, GDP etc. To improve accuracy, you may need many parameters. If you write your program logic with fixed parameters, your algorithm's accuracy will not grow. 11 | So, instead of hand-crafting the algorithm, you need to collect historical data that can be used for model building. 12 | The end result is a model that can predict the result more accurately. 13 | 14 | Ref - https://towardsdatascience.com/machine-learning-vs-traditional-programming-c066e39b5b17 15 | 16 | ## Data science vs Machine learning 17 | ML and statistics are part of data science. The word 'learning' means it depends on some kind of data. This encompasses many techniques such as regression, naive bayes or clustering. But not all techniques fit in this category, e.g. unsupervised clustering - clusters are formed without any prior knowledge, and humans label the clusters afterwards. 18 | 19 | Data science is more than ML. Data in data science may not come from a machine or a mechanical process. It encompasses many aspects of data processing, not just the algorithmic or statistical aspect - data integration, data visualization, data engineering, data in production mode, data-driven decisions etc. 20 | 21 | ### Machine learning vs rule based learning 22 | ML and rule-based systems are widely used to make inferences from data. Forget the hype about ML; rule-based systems still have a place in system design. 23 | Rule-based systems are a simple kind of artificial intelligence which use a series of IF-THEN-ELSE statements to guide the computer to a conclusion. 24 | A rule-based system has a set of facts and a set of rules. 25 | 26 | Set-of-facts: It is the knowledge base. It's used for the formation of rules.
27 | Set-of-rules: It's the rule engine. Rules describe the 28 | relationship between IF and THEN statements. 29 | 30 | Full rule-based systems are built from the combined knowledge of human experts in the problem domain. The domain experts specify all the steps to make a decision and how to handle special cases. The number of special rule cases may grow over a period of time! 31 | 32 | In machine learning, instead of emulating the decision-making process of an expert, you take outcomes from experts. Focussing on outcomes (rather than on the decision-making process) makes machine learning more flexible and less susceptible to the problems of rule-based systems. 33 | ML also uses probabilistic and statistical methods rather than rules. The basic motto is - given what we know about historical outcomes, what can we say about future outcomes? 34 | Ref - https://deparkes.co.uk/2017/11/24/machine-learning-vs-rules-systems/ 35 | 36 | ### ML vs Deep learning 37 | The main difference between deep and machine learning is that machine learning models become better progressively but the model still needs some guidance. If a machine learning model returns an inaccurate prediction, the programmer needs to fix that problem explicitly, but in the case of deep learning, the model does it by itself. Self-driving cars are a good example of deep learning. 38 | 39 | Deep learning and machine learning are both subsets of AI. 40 | “AI is the ability of a computer program to function like a human brain.” 41 | 42 | Machine learning is empowering computer systems with the ability to “learn”. The intention of ML is to enable machines to learn by themselves using the provided data and make accurate predictions. ML is a subset of artificial intelligence; in fact, it’s simply a technique for realizing AI.
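The rule-based vs ML contrast described earlier can be sketched with a toy example (the spam heuristics, sample data and threshold-fitting below are made up for illustration, not a real system):

```python
# Rule-based: a human expert hard-codes the decision logic as IF-THEN rules.
def rule_based_is_spam(msg: str) -> bool:
    if "free money" in msg.lower():
        return True
    elif msg.isupper():        # shouting in all caps
        return True
    else:
        return False

# ML-style: instead of encoding the expert's decision process, learn a
# decision boundary from historical outcomes labeled by the expert.
def fit_threshold(samples):
    # samples: [(caps_ratio, is_spam), ...] - historical outcomes
    spam = [r for r, y in samples if y]
    ham = [r for r, y in samples if not y]
    return (min(spam) + max(ham)) / 2   # midpoint separating the two classes

history = [(0.9, True), (0.8, True), (0.2, False), (0.1, False)]
threshold = fit_threshold(history)      # learned from data, not hand-written

def learned_is_spam(caps_ratio: float) -> bool:
    return caps_ratio > threshold
```

Adding a special case to the rule-based version means editing code; adding one to the learned version just means adding another labeled outcome to `history`.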
43 | Ref - 44 | * https://towardsdatascience.com/clearing-the-confusion-ai-vs-machine-learning-vs-deep-learning-differences-fce69b21d5eb 45 | * https://emerj.com/ai-glossary-terms/what-is-machine-learning/ 46 | 47 | 48 | ## Deep learning disadvantages 49 | * It does not work well with small data. For high accuracy, you need large datasets. 50 | * In practice, it is hard and expensive. You need computing and data resources along with human expertise. 51 | * Deep learning is not easily interpreted. It's difficult to validate. Hyper-parameters and network design are also a challenge due to the absence of a theoretical foundation. 52 | 53 | ## Disadvantages of ML 54 | * ML requires massive data sets to train, and these should be unbiased and of good quality. So, you have to wait for the data to be generated. 55 | * ML requires enough time to let the algorithm learn and achieve sufficient accuracy and relevancy. So, it needs massive computing and storage resources. 56 | * Selection of the right algorithm and interpretation of results is a major challenge. 57 | * ML is autonomous and is susceptible to errors. If you train your algorithm on small datasets, it may end up making biased predictions. 58 | 59 | Ref - https://data-flair.training/blogs/advantages-and-disadvantages-of-machine-learning/ 60 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Notes 2 | My notes on various topics in Cyber security 3 | -------------------------------------------------------------------------------- /TF-IDF.md: -------------------------------------------------------------------------------- 1 | ## TF-IDF 2 | 3 | TF-IDF is a method to generate features from text by multiplying the frequency of a term (usually a word) in a document (the Term Frequency, or TF) by the importance (the Inverse Document Frequency, or IDF) of the same term in an entire corpus.
This last term weights down less important words (e.g. the, it, and) and weights up words that don’t occur frequently. 4 | 5 | IDF is calculated as: 6 | 7 | IDF(t) = log_e(Total number of documents / Number of documents with term t in it). 8 | 9 | An example (from www.tfidf.com/) illustrates the concept nicely: 10 | 11 | Consider a document containing 100 words in which the word cat appears 3 times. The term frequency (i.e., tf) for cat is then (3 / 100) = 0.03. Now, assume we have 10 million documents and the word cat appears in one thousand of these. Then, the inverse document frequency (i.e., idf) is calculated as log(10,000,000 / 1,000) = 4. Thus, the Tf-idf weight is the product of these quantities: 0.03 * 4 = 0.12. 12 | 13 | TF-IDF is very useful in text classification and text clustering. It is used to transform documents into numeric vectors that can easily be compared. 14 | 15 | For string matching, algorithms like the Jaro-Winkler or Levenshtein distance measures are commonly used. However, these algorithms are not suitable for finding string similarities in large datasets because they are slow. TF-IDF over N-grams can be used to find similar strings instead, as it turns the problem into a matrix multiplication, which is computationally much cheaper. 16 | -------------------------------------------------------------------------------- /Tricks-improve-windows-PC-performance.md: -------------------------------------------------------------------------------- 1 | ## Some tips to improve Windows PC Performance 2 | 3 | ### High CPU or Disk usage caused by ntoskrnl.exe process in Windows 10 4 | 5 | The Windows NT kernel is responsible for managing various services like memory management, process management, hardware resource management etc. Sometimes this process utilizes too much CPU/disk and causes the computer to slow down, especially at startup. 6 | A minor registry tweak can solve the high CPU/disk utilization issue.
For this, you have to open the registry editor (open "Run" -> "Regedit") and modify the following registry setting: 7 | 8 | 1. Go to section - HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management 9 | 2. Change/modify the key value of "ClearPageFileAtShutdown" from 0 to 1. 10 | 11 | ### System File Checker (SFC) 12 | System File Checker (SFC) is a utility in Windows that allows users to scan for corruptions in Windows system files and restore corrupted files. To run it, open a command prompt ("Run" -> "cmd") and type 13 | ``` 14 | C:\Windows\system32> sfc /scannow 15 | ``` 16 | Note - the sfc program should be run in administrative mode. 17 | 18 | ### Deployment Image Servicing and Management (DISM) 19 | A DISM scan can be used to repair and prepare Windows images, including the Windows Recovery Environment, Windows Setup, and Windows PE. To run a DISM scan, open Command Prompt as administrator and type this command: "DISM /Online /Cleanup-Image /RestoreHealth". Press Enter on the keyboard to execute it. 20 | ``` 21 | C:\Windows\system32> DISM /Online /Cleanup-Image /RestoreHealth 22 | ``` 23 | 24 | * Ref - https://blog.pcrisk.com/windows/12536-ntoskrnlexe-process-is-causing-high-cpu-or-disk-usage-how-to-fix-it 25 | 26 | -------------------------------------------------------------------------------- /anonymous-presence-on-internet.md: -------------------------------------------------------------------------------- 1 | ### Keeping anonymous on Internet 2 | * TOR is the proven de-facto service to provide general anonymity 3 | * Various free/commercial VPN services also provide good privacy 4 | * There are some peer-to-peer networks like i2p and freenet in development. You have to
6 | * Virtual machine(s) - in one's own environment to provide security and in some way privacy; 7 | depends on the use 8 | * Use of proxychains ( You can use them with Kali VM) - yes, these are not 9 | completely anonymous; but the traffic will be tunneled through various routes 10 | * Tails operating system is a good option as it uses Tor in the background: available as easy to setup VM and deletes all the user activity after you close the VM. 11 | 12 | -------------------------------------------------------------------------------- /big-data-topics.md: -------------------------------------------------------------------------------- 1 | ## MapReduce 2 | Mapreduce(MR) is the computing paradigm used in Hadoop cluster for parallel processing of large datasets.Its hypothesis is designed by google to achieve: 3 | * parallel execution 4 | * data distribution 5 | * fault tolerance 6 | MR processes data in the form of key-value pairs. A key-value(KV) pair is a mapping element between the linked data items - key and its value. 7 | Mapreduce architecture consists of two stages- map stage and reduce stage ( along with intermediate process like shuffling, splitting, sorting). Actual MR process happens in task traker. 8 | 9 | ## How Hadoop provides solution to big data problems: 10 | 11 | ### Storage issues 12 | HDFS provides distributed way to store big data. Data is stored in blocks across datanodes and you can specify size of blocks.e.g. 512MB is datasize and you have 128MB hadoop data blocks, HDFS will create 4 blocks and store it across different nodes. It will also replicate data blocks on different datanodes. It focuses on horizontal scaling instead of vertical scaling. 13 | 14 | 15 | ### Variety of data 16 | You can store all kinds of data - structured, unstrucutred or semi-structured data. There is no pre-dumping schema validation. It follows write onces and read many model. 
17 | 18 | ### Accessing and processing data in a faster way 19 | This is one of the challenges of big data. To solve it, the processing is moved towards the data rather than moving the data towards the processing/computing node (i.e. pulling all data to the master node and processing it there). In MapReduce, the processing logic is sent to the various slave nodes and the data is processed in parallel across them. The processed results are sent to the master node, where they are merged and the response is sent to the client. 20 | 21 | ### YARN 22 | YARN is used for resource management between data nodes and master nodes. In the YARN architecture, we have a ResourceManager and NodeManagers. The ResourceManager might or might not be configured on the same machine as the NameNode. But NodeManagers should be configured on the same machines where DataNodes are present. 23 | 24 | ### Use cases where Hadoop is not effective 25 | * Low-latency data access - quick access to small parts of data 26 | * Multiple data modifications - Hadoop is a better fit only if we are concerned with reading the data, not modifying it 27 | * Lots of small files - Hadoop is suitable for scenarios where we have few but large files.
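The map → shuffle/sort → reduce flow described above can be sketched as a toy in-memory word count (a single-process illustration of the key-value paradigm; a real Hadoop job distributes these stages across slave nodes):

```python
from itertools import groupby

def map_phase(lines):
    # Map: emit a (word, 1) key-value pair for every word in the input split.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle/sort: group the pairs by key; Reduce: sum the values per key.
    out = {}
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        out[key] = sum(v for _, v in group)
    return out

counts = reduce_phase(map_phase(["big data big cluster", "big data"]))
# counts -> {'big': 3, 'cluster': 1, 'data': 2}
```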
28 | 29 | ### References 30 | * http://a4academics.com/tutorials/83-hadoop/840-map-reduce-architecture 31 | * https://www.edureka.co/blog/what-is-hadoop/ 32 | 33 | ## Cloud computing vs Grid computing vs Cluster computing 34 | 35 | ### Grid Computing 36 | * Loosely coupled (decentralization) 37 | * Diversity and dynamism 38 | * Distributed job management & scheduling 39 | 40 | ### Cloud computing 41 | * Dynamic computing infrastructure 42 | * IT service-centric approach 43 | * Self-service based usage model 44 | * Minimally or self-managed platform 45 | 46 | ### Cluster computing 47 | * Tightly coupled systems 48 | * Single system image 49 | * Centralized job management & scheduling system 50 | 51 | ### Distributed computing 52 | It is a technique to solve a single large problem by breaking it down into several tasks, where each task is computed on an individual computer of the distributed system. 53 | 54 | ### CAP theorem 55 | 56 | This was proposed by Eric Brewer in 2000 as a set of 3 basic requirements for a distributed system consisting of multiple nodes: 57 | * Consistency - All the servers will have the same data. So, users will get the same copy regardless of which server they query 58 | * Availability - The system will always respond to requests (even if it does not have the latest data) 59 | * Partition tolerance - The system will continue to operate as a whole even if individual servers fail or can't be reached. 60 | 61 | It is impossible to meet all 3 requirements at once. So, a combination of 2 is chosen, and this is the deciding factor when a technology is selected.
62 | 63 | * Ref - https://www.quora.com/What-is-the-relation-between-SQL-NoSQL-the-CAP-theorem-and-ACID 64 | -------------------------------------------------------------------------------- /block-email-attachment.md: -------------------------------------------------------------------------------- 1 | ### Block unsafe file types in email messages (attachments) 2 | Google and Microsoft are the heavyweights of the Internet world and receive huge volumes of spam for their Gmail and Outlook services. In the support pages for each of these services, they have listed the file type extensions that are blocked. If you are managing your own e-mail server, it's time to take advantage of these extension lists to secure your organization. 3 | 4 | #### Gmail blocked extensions: 5 | ``` 6 | ade 7 | adp 8 | apk 9 | appx 10 | appxbundle 11 | bat 12 | cab 13 | chm 14 | cmd 15 | com 16 | cpl 17 | dll 18 | dmg 19 | exe 20 | hta 21 | ins 22 | isp 23 | iso 24 | jar 25 | js 26 | jse 27 | lib 28 | lnk 29 | mde 30 | msc 31 | msi 32 | msix 33 | msixbundle 34 | msp 35 | mst 36 | nsh 37 | pif 38 | ps1 39 | scr 40 | sct 41 | shb 42 | sys 43 | vb 44 | vbe 45 | vbs 46 | vxd 47 | wsc 48 | wsf 49 | wsh 50 | ``` 51 | #### Microsoft blocked extensions 52 | ``` 53 | ade 54 | adp 55 | app 56 | asp 57 | aspx 58 | asx 59 | bas 60 | bat 61 | cer 62 | chm 63 | cmd 64 | cnt 65 | com 66 | cpl 67 | crt 68 | csh 69 | der 70 | diagcab 71 | exe 72 | fxp 73 | gadget 74 | grp 75 | hlp 76 | hpj 77 | hta 78 | htc 79 | inf 80 | ins 81 | isp 82 | its 83 | jar 84 | jnlp 85 | js 86 | jse 87 | ksh 88 | lnk 89 | mad 90 | maf 91 | mag 92 | mam 93 | maq 94 | mar 95 | mas 96 | mat 97 | mau 98 | mav 99 | maw 100 | mcf 101 | mda 102 | mdb 103 | mde 104 | mdt 105 | mdw 106 | mdz 107 | msc 108 | msh 109 | msh1 110 | msh2 111 | mshxml 112 | msh1xml 113 | msh2xml 114 | msi 115 | msp 116 | mst 117 | msu 118 | ops 119 | osd 120 | pcd 121 | pif 122 | pl 123 | plg 124 | prf 125 | prg 126 | printerexport 127 | ps1 128 | ps1xml 129 |
ps2 130 | ps2xml 131 | psc1 132 | psc2 133 | psd1 134 | psdm1 135 | pst 136 | py 137 | pyc 138 | pyo 139 | pyw 140 | pyz 141 | pyzw 142 | reg 143 | scf 144 | scr 145 | sct 146 | shb 147 | shs 148 | theme 149 | tmp 150 | url 151 | vb 152 | vbe 153 | vbp 154 | vbs 155 | vhd 156 | vhdx 157 | vsmacros 158 | vsw 159 | webpnp 160 | website 161 | ws 162 | wsc 163 | wsf 164 | wsh 165 | xbap 166 | xll 167 | xnk 168 | ``` 169 | Since there are duplicates, I have merged them to form a combined list and you can use it to block in postfix/sendmail configurations. 170 | ``` 171 | ade 172 | adp 173 | apk 174 | app 175 | appx 176 | appxbundle 177 | asp 178 | aspx 179 | asx 180 | bas 181 | bat 182 | cab 183 | cer 184 | chm 185 | cmd 186 | cnt 187 | com 188 | cpl 189 | crt 190 | csh 191 | der 192 | diagcab 193 | dll 194 | dmg 195 | exe 196 | fxp 197 | gadget 198 | grp 199 | hlp 200 | hpj 201 | hta 202 | htc 203 | inf 204 | ins 205 | iso 206 | isp 207 | its 208 | jar 209 | jnlp 210 | js 211 | jse 212 | ksh 213 | lib 214 | lnk 215 | mad 216 | maf 217 | mag 218 | mam 219 | maq 220 | mar 221 | mas 222 | mat 223 | mau 224 | mav 225 | maw 226 | mcf 227 | mda 228 | mdb 229 | mde 230 | mdt 231 | mdw 232 | mdz 233 | msc 234 | msh 235 | msh1 236 | msh1xml 237 | msh2 238 | msh2xml 239 | mshxml 240 | msi 241 | msix 242 | msixbundle 243 | msp 244 | mst 245 | msu 246 | nsh 247 | ops 248 | osd 249 | pcd 250 | pif 251 | pl 252 | plg 253 | prf 254 | prg 255 | printerexport 256 | ps1 257 | ps1xml 258 | ps2 259 | ps2xml 260 | psc1 261 | psc2 262 | psd1 263 | psdm1 264 | pst 265 | py 266 | pyc 267 | pyo 268 | pyw 269 | pyz 270 | pyzw 271 | reg 272 | scf 273 | scr 274 | sct 275 | shb 276 | shs 277 | sys 278 | theme 279 | tmp 280 | url 281 | vb 282 | vbe 283 | vbp 284 | vbs 285 | vhd 286 | vhdx 287 | vsmacros 288 | vsw 289 | vxd 290 | webpnp 291 | website 292 | ws 293 | wsc 294 | wsf 295 | wsh 296 | xbap 297 | xll 298 | xnk 299 | ``` 300 | ### References: 301 | * Blocked attachments in outlook - 
https://support.office.com/en-us/article/Blocked-attachments-in-Outlook-434752E1-02D3-4E90-9124-8B81E49A8519 302 | * Blocked attachments in gmail - https://support.google.com/mail/answer/6590?hl=en 303 | 304 | -------------------------------------------------------------------------------- /blue-team-tips.md: -------------------------------------------------------------------------------- 1 | ## Blue team tips 2 | 3 | ### Network diagram 4 | 5 | A good network diagram that presents a high-level overview of the network, including ingress/egress points across all sites, is a MUST and should include: 6 | 7 | * All routing devices, proxies or gateways that affect the flow of traffic 8 | * External/internal IP addresses of routing devices, proxies and gateways 9 | * Workstations, servers or other devices - IP address ranges, custom groupings 10 | 11 | Scan the network using nmap and build your attack surface. Also find vulnerabilities associated with the services using nmap NSE scripts. 12 | 13 | ### Control the local admin account 14 | Microsoft's Local Administrator Password Solution (Microsoft LAPS) should be used. If you have a single admin password for every machine, it's easy to compromise all the machines in the lateral movement stage once the attacker is in the network. LAPS creates a unique admin password for each endpoint, which is securely stored and managed in the Active Directory environment. However, it requires agent installation and GPO creation. 15 | 16 | ### Gain visibility into PowerShell execution 17 | PowerShell has become a de-facto standard tool for attackers penetrating any network. Some of the popular exploit frameworks are PowerSploit, Empire, Nishang, PoshC2 and many others. PowerShell now comes by default in every Windows installation, which makes the attacker's job easy. On top of that, no PowerShell logging is enabled in Windows by default. 18 | So, as a defender, you have to enable PowerShell logging to gain visibility.
Full logging, however, is available only from PowerShell v5 onwards and allows PowerShell module logging, script block logging (input/output commands) and automatic suspicious script detection. It also de-obfuscates encoded PowerShell commands. PowerShell logging produces Windows event log entries such as event ID 4103 (module logging) and 4104 (script block logging). 19 | 20 | ### Track process creation 21 | Tracking process execution is key to spotting anything malicious infecting or running on your systems. Process creation logging generates event ID 4688. 22 | 23 | ### Advanced system logging with Sysmon 24 | Sysmon provides more information about parent processes, file hashes, network connections, loading of DLLs/drivers etc. On top of that, it's a free tool from Microsoft, and every organization should deploy it in their environment. 25 | 26 | ### Visualization of the Active Directory environment 27 | In a Windows environment, it is essential to keep track of attack paths to the domain controller, and BloodHound is the tool you should definitely go for - https://github.com/BloodHoundAD/BloodHound 28 | It gives information like attack paths to high-value targets, group memberships and active sessions. 29 | 30 | ### Active Directory defense 31 | You should definitely follow the recommendations by Sean Metcalf available at https://adsecurity.org/?p=1684 32 | 33 | ### Audit endpoints 34 | It is recommended to audit endpoints using CIS benchmarks or the "HardeningAuditor" tool - https://github.com/cottinghamd/HardeningAuditor 35 | The Australian Signals Directorate also has a great guide on Windows hardening - https://www.asd.gov.au/publications/protect/Hardening_Win10.pdf 36 | Please go through it carefully.
37 | 38 | If you wish, there are many good tips available here - 39 | https://www.sneakymonkey.net/2018/06/25/blue-team-tips/ 40 | 41 | 42 | 43 | 44 | -------------------------------------------------------------------------------- /bookmarks.md: -------------------------------------------------------------------------------- 1 | ### Interesting Bookmarks 2 | * Useful checklist for backend applications, covering networking, monitoring, logging, backups, secrets etc - https://medium.com/@aleksei.kornev/production-readiness-checklist-for-backend-applications-8d2b0c57ccec 3 | * Attack pattern detection and prediction - https://medium.com/@ensarseker1/attack-pattern-detection-and-prediction-108fc3d47f03 4 | * Malware analysis with visual pattern recognition - https://medium.com/@nkent/malware-analysis-with-visual-pattern-recognition-5a4d087c9d26 5 | * Malware classification using CNN - https://medium.com/@hugom1997/malware-classification-using-convolutional-neural-networks-step-by-step-tutorial-a3e8d97122f 6 | * Rangeforce Cyber security simulation training platform - https://rangeforce.com/wp-content/uploads/2020/03/A-Market-Guide-to-CyberSecurity-Simulation-Training-2020b.pdf 7 | * Reconstruct process trees from event logs - https://github.com/williballenthin/process-forest 8 | * Url analysis using Unfurl - https://lospi.net/python/unfurl/abrade/hacking/2018/02/08/unfurl-url-analysis.html 9 | * Tracking potential malicious files Belkasoft paper - https://belkasoft.com/whitepaper_tracking_potentially_malicious_files 10 | * Automating PCAP analysis using bash,Security Onion - https://medium.com/@mikecybersec/automating-pcap-parsing-with-linux-cli-bash-security-onion-780cb2b08b6e 11 | * Monitoring linux logs with Kibana - https://medium.com/@solnichkin.antoine/monitoring-linux-logs-with-kibana-and-rsyslog-4dfbbd287807 12 | * Learning cyber security, good links - https://github.com/1d8/CybersecLearning 13 | * Thematic for success in offensive cyber operations, NCC group 
- https://research.nccgroup.com/wp-content/uploads/2020/07/1992-Insight-Space-Technical-Deep-Dive-June-v2.pdf 14 | 15 | 16 | ### Modern web development 17 | * Access AJAX, Websockets, SSE in HTML - https://htmx.org/ 18 | * Build blazing fast, modern apps and websites with React - https://www.gatsbyjs.org 19 | 20 | ### Webinars 21 | * Fishing for network health using batfish - https://www.brighttalk.com/webcast/17628/391789 22 | 23 | ### Tracking evidence of program execution on windows 24 | * Forensic Artifacts: evidences of program execution on Windows systems - https://www.andreafortuna.org/2018/05/23/forensic-artifacts-evidences-of-program-execution-on-windows-systems/ 25 | * Evidence of execution on Windows - 26 | * https://blog.1234n6.com/2018/10/available-artifacts-evidence-of.html 27 | * https://blog.1234n6.com/2019/01/available-artifacts-evidence-of.html 28 | ### ICS related 29 | * Spire is an open-source intrusion-tolerant SCADA system for the power grid - http://www.dsn.jhu.edu/spire/ 30 | * Prime: Byzantine Replication Under Attack - http://www.dsn.jhu.edu/prime/ 31 | * Spines is a generic messaging infrastructure that provides transparent unicast, multicast and anycast communication over dynamic, multi-hop networking environments - http://spines.org/ 32 | * SMesh is a seamless wireless mesh network - http://www.smesh.org/ 33 | * pvBrowser, cross platform process visualization engine - https://pvbrowser.de/pvbrowser/index.php 34 | 35 | ### Time series 36 | * Flexible time series feature extraction & processing library in python - https://github.com/predict-idlab/tsflex 37 | ### Log analysis 38 | * Predictive log analysis - https://github.com/animeshdutta888/System-Failure-Prediction-using-log-analysis 39 | ### Free Book 40 | * Joy of cryptography Book - https://joyofcryptography.com 41 | ### Virtual machines 42 | * Desired state configuration of VM - https://octo.vmware.com/introducing-virtual-machine-desired-state-configuration/ 43 | ### IoT 44 | * 
Fileless attacks on Linux based IoT devices - https://www.ics.uci.edu/~alfchen/fan_mobisys19.pdf 45 | -------------------------------------------------------------------------------- /bro_conn_history.md: -------------------------------------------------------------------------------- 1 | ### Connection history 2 | 3 | **Letter**|**Meaning** 4 | :-----:|:-----: 5 | s|SYN w/o the ACK bit set 6 | h|SYN+ACK ("handshake") 7 | a|pure ACK 8 | d|packet with payload ("data") 9 | f|packet with FIN bit set 10 | r|packet with RST bit set 11 | c|packet with a bad checksum 12 | t|packet with retransmitted payload 13 | i|inconsistent packet (e.g. FIN+RST bits set) 14 | q|multi-flag packet (SYN+FIN or SYN+RST bits set) 15 | ^|connection direction was flipped by Bro's heuristic 16 | -------------------------------------------------------------------------------- /bro_conn_states.md: -------------------------------------------------------------------------------- 1 | **Connection state**|**Meaning** 2 | :-----:|:-----: 3 | S0|Connection attempt seen, no reply 4 | S1|Connection established, not terminated 5 | SF|Normal establishment and termination. Note that this is the same symbol as for state S1. You can tell the two apart because for S1 there will not be any byte counts in the summary 6 | REJ|Connection attempt rejected. 7 | S2|Connection established and close attempt by originator seen (but no reply from responder). 8 | S3|Connection established and close attempt by responder seen (but no reply from originator). 9 | RSTO|Connection established, originator aborted (sent a RST) 10 | RSTR|Responder sent a RST.
11 | RSTOS0|Originator sent a SYN followed by a RST 12 | RSTRH|Responder sent a SYN ACK followed by a RST 13 | SH|Originator sent a SYN followed by a FIN 14 | SHR|Responder sent a SYN ACK followed by a FIN 15 | OTH|No SYN seen, just midstream traffic 16 | -------------------------------------------------------------------------------- /building-word-list.md: -------------------------------------------------------------------------------- 1 | ### Building word list 2 | Building word lists is absolutely essential if you wish to do red teaming activities like password spraying or participate in CTFs. 3 | 4 | There are 4 main techniques that are generally used to generate word lists: 5 | * Using regular expressions 6 | * Extraction of words from a website 7 | * Generation of words based on human heuristics or human profiling 8 | * Word lists from random keyboard key walks 9 | 10 | #### Word list using regular expressions 11 | Most humans have a tendency to set passwords based on some pattern. In many organizations, these patterns are fixed. We can make use of regular expressions to generate word lists that potentially match the existing patterns and find out passwords. 12 | We can use the python module ```Exrex``` (https://pypi.org/project/exrex/) - a command line tool that generates all — or random — matching strings for a given regular expression, and more. 13 | 14 | How to install and its usage 15 | ``` 16 | $ pip install exrex 17 | $ exrex --help 18 | ``` 19 | 20 | #### Word list from extraction of words from a website 21 | CeWL (https://github.com/digininja/CeWL/) is a ruby app which spiders a given url to a specified depth, optionally following external links, and returns a list of words which can then be used for password crackers such as John the Ripper. 22 | 23 | CeWL also has an associated command line app, FAB (Files Already Bagged), which uses the same meta data extraction techniques to create author/creator lists.
24 | 25 | How to install and its usage 26 | ``` 27 | $ apt install cewl 28 | $ cewl --help 29 | $ cewl -d 2 -m 5 -w docswords.txt https://example.com 30 | ``` 31 | ### Word list from human profiling 32 | In many past security incidents, it has been found that employees are the weak link and can be an easy target for setting an initial foothold inside the organization. People tend to set weak passwords that match their personal interests and personal information. There is a tool ```CUPP``` (https://github.com/Mebus/cupp) that generates potential passwords based on personal information. 33 | 34 | How to install and its usage 35 | ``` 36 | $ git clone https://github.com/Mebus/cupp.git 37 | $ cd cupp 38 | $ python3 cupp.py -h 39 | $ python3 cupp.py -i 40 | ``` 41 | ### Word list from keyboard random key walks 42 | A keyboard random walk refers to a word list made up of adjacent keys on the keyboard, like 12345678 or 1qazxsw2. Of course, there are many ways key walks can be generated. There is a tool ```kwprocessor``` (https://github.com/hashcat/kwprocessor) for generating such key walks. 43 | 44 | 45 | How to install and its usage 46 | ``` 47 | $ git clone https://github.com/hashcat/kwprocessor.git 48 | $ cd kwprocessor 49 | $ make 50 | ``` 51 | The keymaps folder contains keyboard layouts for multiple languages and the routes folder has 7 pre-configured keymap walks that can be used to generate a word list. 52 | 53 | Usage 54 | ``` 55 | $ ./kwp basechars/full.base keymaps/en.keymap routes/2-to-10-max-3-direction-changes.route 56 | ``` 57 | This causes kwp to create multiple keymap walk combinations of 2–10 characters with a maximum of 3 direction changes. 58 | 59 | In addition, there are popular tools like crunch, a wordlist generator (https://sourceforge.net/projects/crunch-wordlist/files/crunch-wordlist/crunch-3.6.tgz/download), which can generate wordlists from a standard character set or a character set you specify.
crunch can generate wordlists using both combinations and permutations. 60 | 61 | Ref: 62 | * https://medium.com/owasp-chennai/building-word-lists-for-red-teamers-a8ba2d79ee3 63 | -------------------------------------------------------------------------------- /critical-infra-security.md: -------------------------------------------------------------------------------- 1 | ### Security of Critical infrastructure 2 | 3 | Some key areas to enable a secure, connected critical infrastructure: 4 | * Enabling methodologies for communicating between a combination of trusted, untrusted, and adversarial networks as well as trusted, untrusted, and potentially adversarial equipment. 5 | * Developing policy, guidelines, and suggestions for inspecting and whitelisting communications involving critical infrastructure including but not limited to ICS, SCADA, OT, and IoT. 6 | * Deterministic timelines for security and functionality software/firmware updates to critical infrastructure 7 | * Making available innovative, trusted architectures 8 | 9 | ### Cybersecurity Maturity Model Certification (CMMC) 10 | By developing the Cybersecurity Maturity Model Certification (CMMC), it is possible to normalize and standardize cybersecurity preparedness. CMMC removes the competitive disadvantage of investing in cybersecurity: it requires an independently certified maturity level with well-defined guidelines in order to participate in certain acquisitions or to supply certain types of goods. This is good for cybersecurity because it reduces the competitive threat from poorly secured Original Equipment Manufacturers (OEMs) and incentivizes positive security behaviors. 11 | 12 | ### Network segmentation 13 | To mitigate risks from potentially compromised equipment, it is best to assume that it is compromised already but needs to be used regardless. The most common approach adopted is widespread physical network segmentation.
14 | In network segmentation, you separate devices into multiple subnets and block packet routing across the gateway. By enforcing this, you limit an attacker's ability to propagate laterally to additional targets. 15 | Generally, network segmentation is achieved by logically dividing the devices into small subnets, placing a Next-Generation Firewall (NGFW) at the gateway, and only allowing approved communications (port, protocol, application, and recipient) to pass. 16 | Another type of network segmentation is physical separation, or “air-gapping”, where a set of subnets operates on a separate network that cannot be routed in or out of. This is widely used in national defense and nuclear applications because it makes the execution of zero-day exploits exponentially more difficult. 17 | 18 | ### Traffic inspection 19 | It's important to build traffic inspection mechanisms into critical networks to capture unknown traffic patterns. 20 | Although implementing physical network segmentation is the often-recommended approach, it has pros and cons - it keeps bad things out, but it keeps good things out as well. ICS and OT networks need to be able to pass information such as control instructions and operating metrics. Further, technicians and engineers need to be 21 | able to perform maintenance. 22 | 23 | To move this data across the air-gap, sometimes a data diode is used. A data diode passes data in one direction only and it cannot be reversed. Although diodes help get data across the gap, they are simple devices; they don’t check to make sure it’s valid data and there are no policy violations. 24 | To overcome these limitations, the concept of a data guard is introduced. Data guards inspect the traffic moving between 2 or more air-gapped networks and provide byte-level deep content inspection, data validation and filtering that can be tailored to customer-specific security policies, requirements, and risks.
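The data-guard idea (inspect, validate, filter before anything crosses the gap) can be sketched in a few lines of Python; the field names, sensor whitelist and value limits below are hypothetical:

```python
# Sketch of a data guard: only well-formed telemetry records that satisfy
# policy are passed across the one-way link. All names/limits are hypothetical.
ALLOWED_SENSORS = {"pump-1", "pump-2", "turbine-a"}

def guard(record: dict) -> bool:
    """Return True only if the record passes byte-level/content policy checks."""
    if set(record) != {"sensor", "reading"}:
        return False                      # unexpected or missing fields -> drop
    if record["sensor"] not in ALLOWED_SENSORS:
        return False                      # unknown source -> drop
    reading = record["reading"]
    # type and range validation: reject strings, injected commands, out-of-range values
    return isinstance(reading, (int, float)) and 0 <= reading <= 1000

print(guard({"sensor": "pump-1", "reading": 42}))          # True
print(guard({"sensor": "pump-1", "reading": "rm -rf /"}))  # False
```

A real guard does this at the byte level against a formal message schema; the point is the same — nothing crosses unless it matches an explicit, validated format.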
25 | 26 | With strong traffic inspection and enforcement across the air-gap, it's possible to eliminate major attack vectors. 27 | 28 | ### References: 29 | * https://securitydelta.nl/media/com_hsd/report/246/document/HSD-Rapport-Data-Diodes.pdf 30 | * https://www.zadara.com/wp-content/uploads/AirGap_Arrosoft_Solution_Brief.pdf 31 | -------------------------------------------------------------------------------- /cyber-security.md: -------------------------------------------------------------------------------- 1 | ### Cyber security aims 2 | * reduce the likelihood of a damaging cyber intrusion 3 | * detect potential intrusions 4 | * ensure that the organization is prepared to respond if an intrusion occurs 5 | * maximize the organization's resilience to destructive cyber incidents 6 | 7 | ### Reduce the likelihood of a damaging cyber intrusion 8 | * Validate all remote access to the organization's network and investigate non-genuine access 9 | * Require multi-factor authentication for privileged or admin access 10 | * Keep all software up-to-date and patch all known vulnerabilities. 11 | * Close all ports and protocols that are not essential 12 | * Ensure that all cloud service accesses are reviewed and strong authentication controls are in place 13 | 14 | ### Detect potential intrusions 15 | * Identify unexpected and/or unusual behaviour in the network. Enable logging to better investigate issues.
16 | * Ensure the entire network is protected by antivirus/antimalware software with up-to-date signatures 17 | -------------------------------------------------------------------------------- /detect-compromised-linux-machine.md: -------------------------------------------------------------------------------- 1 | ## Detecting a compromised linux server - some commands 2 | 3 | ### Verify md5 checksums of RPM files 4 | 5 | ``` 6 | # rpm -qa | xargs rpm -V 7 | ``` 8 | 9 | ### Track network connections 10 | ``` 11 | # netstat -an 12 | # netstat -nalp 13 | # netstat -plant 14 | # ss -a -e -i 15 | ``` 16 | 17 | ### Watch traffic in detail on demand for a specific port 18 | ``` 19 | # tcpdump src port 6697 20 | ``` 21 | 22 | ### Process tree 23 | ``` 24 | # ps -auxwf 25 | ``` 26 | 27 | ### Deleted binaries still running 28 | ``` 29 | # ls -alR /proc/*/exe 2> /dev/null | grep -i deleted 30 | ``` 31 | 32 | ### Process command name/cmdline 33 | ``` 34 | # strings /proc/<pid>/comm 35 | # strings /proc/<pid>/cmdline 36 | ``` 37 | 38 | ### Real process path 39 | ``` 40 | # ls -la /proc/<pid>/exe 41 | ``` 42 | ### Process environment 43 | ``` 44 | # strings /proc/<pid>/environ 45 | ``` 46 | ### Process working directory 47 | ``` 48 | # ls -alR /proc/*/cwd 49 | ``` 50 | ### Processes running from tmp, dev directories 51 | ``` 52 | # ls -alR /proc/*/cwd 2> /dev/null | grep tmp 53 | # ls -alR /proc/*/cwd 2> /dev/null | grep dev 54 | ``` 55 | ### List all hidden directories 56 | ``` 57 | # find / -type d -name ".*" 58 | ``` 59 | 60 | ### Check for zero size logs 61 | ``` 62 | # ls -al /var/log/* 63 | ``` 64 | ### Dump audit logs 65 | ``` 66 | # utmpdump /var/log/wtmp 67 | # utmpdump /var/run/utmp 68 | # utmpdump /var/log/btmp 69 | ``` 70 | ### Track last logins 71 | ``` 72 | # last 73 | # lastb 74 | ``` 75 | ### Find logs with binary content 76 | ``` 77 | # grep [[:cntrl:]] /var/log/*.log 78 | ``` 79 | ### Check scheduled tasks 80 | ``` 81 | # crontab -l 82 | # atq 83 | # systemctl list-timers --all 84 | ``` 85 |
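The /proc checks above (e.g. deleted binaries still running) can also be scripted; a minimal Python sketch, assuming a Linux /proc filesystem:

```python
import os

def deleted_executables():
    """Return (pid, exe) pairs for running processes whose on-disk binary was deleted."""
    if not os.path.isdir("/proc"):   # not on Linux / procfs unavailable
        return []
    hits = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            exe = os.readlink(f"/proc/{pid}/exe")
        except OSError:              # kernel thread, exited process, or permission denied
            continue
        if exe.endswith(" (deleted)"):   # the kernel appends this suffix to unlinked binaries
            hits.append((int(pid), exe))
    return hits

print(deleted_executables())
```

On a clean system this prints an empty list; any hit is a strong lead for further investigation.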
### Look for UID-0 and GID-0 86 | ``` 87 | # grep ":0:" /etc/passwd 88 | ``` 89 | ### Check sudoers file 90 | ``` 91 | # cat /etc/sudoers 92 | # cat /etc/group 93 | ``` 94 | ### Find all ssh authorized_keys files 95 | ``` 96 | # find / -name authorized_keys 97 | ``` 98 | ### History files for users 99 | ``` 100 | # find / -name "*history" 101 | ``` 102 | ### History files linked to /dev/null 103 | ``` 104 | # ls -laR / 2> /dev/null | grep history | grep null 105 | ``` 106 | ### Find all hidden directories 107 | ``` 108 | # find / -type d -name ".*" 109 | ``` 110 | ### Find files modified within the last day 111 | ``` 112 | # find / -mtime -1 113 | ``` 114 | ### Files/directories with no user/group name 115 | ``` 116 | # find / \( -nouser -o -nogroup \) -exec ls -lg {} \; 117 | ``` 118 | ### Immutable files and directories 119 | ``` 120 | # lsattr / -R 2> /dev/null | grep "\----i" 121 | ``` 122 | ### Find SUID/SGID files 123 | ``` 124 | # find / -type f \( -perm -04000 -o -perm -02000 \) -exec ls -lg {} \; 125 | ``` 126 | ### Find all executable files 127 | ``` 128 | # find / -type f -exec file -p '{}' \; | grep ELF 129 | ``` 130 | ### Find all executable files in the tmp directory 131 | ``` 132 | # find /tmp -type f -exec file -p '{}' \; | grep ELF 133 | ``` 134 | Thanks to Sandfly Security - https://www.sandflysecurity.com/wp-content/uploads/2018/11/Linux.Compromise.Detection.Command.Cheatsheet.pdf 135 | -------------------------------------------------------------------------------- /dns.md: -------------------------------------------------------------------------------- 1 | ### Recursive vs Non-Recursive (Iterative) DNS queries 2 | 3 | * Iterative DNS queries are ones in which a DNS server is queried and returns an answer without querying other DNS servers. Iterative queries are non-recursive queries.
4 | 5 | * Recursive queries occur when a DNS client requests information from a DNS server that is set to query subsequent DNS servers until a definitive answer is returned to the client. The queries made to subsequent DNS servers from the first DNS server are iterative queries. It may be noted that root servers are always iterative servers. 6 | 7 | A DNS server that supports recursive resolution is vulnerable to DOS (denial of service) attacks, DNS cache poisoning, unauthorized use of resources, and root name server performance degradation. 8 | 9 | #### Ref: 10 | * https://www.slashroot.in/difference-between-iterative-and-recursive-dns-query 11 | 12 | ### Some flags in a DNS packet 13 | 14 | #### AA - Authoritative answer 15 | Specifies whether the responding name server is the authority for the domain name in question 16 | * 0 - non-authoritative 17 | * 1 - authoritative 18 | 19 | #### TC - Truncated 20 | Indicates that only the first 512 bytes of the reply were returned 21 | * 0 - message not truncated 22 | * 1 - message truncated 23 | 24 | #### RD - Recursion desired 25 | The name server is directed to pursue the query recursively 26 | * 0 - recursion not desired 27 | * 1 - recursion desired 28 | 29 | #### RA - Recursion available 30 | Indicates whether recursive query support is available on the name server 31 | * 0 - recursive query support not available 32 | * 1 - recursive query support available 33 | 34 | #### Z 35 | This flag is reserved for future use 36 | 37 | ## Tracking evil in DNS logs 38 | DNS logs (either from PassiveDNS or Bro/Zeek logs) contain a lot of useful information that can be used to track down malware. Please find below some of the queries that you can use to spot malicious domains. 39 | 40 | ### Multiple levels of subdomains 41 | DNS queries usually do not use multiple subdomains. A high number of subdomains might indicate 42 | that a domain is malicious. However, Content Delivery Networks (CDNs) can be an exception to this type 43 | of queries.
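This depth check is a one-line function in Python (using the same sample query as this section):

```python
def subdomain_depth(name: str) -> int:
    # Counting dots gives the number of label boundaries, the same number the
    # tr/wc one-liner in this section produces.
    return name.count(".")

q = "W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software"
print(subdomain_depth(q))              # 7 -> above the "more than 5" threshold
print(subdomain_depth("example.com"))  # 1 -> normal
```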
44 | ``` 45 | $ echo W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | tr -cd '.' | wc -c 46 | ``` 47 | If the number of dots is more than 5, it's safe to assume that something is not right! 48 | 49 | ### Domain length / DNS query length 50 | The total length of a DNS query is a good indicator of malicious communication, as long queries are often used for data exfiltration or communication with C&C servers. The longer the request query, the larger the risk that it is malicious. 51 | 52 | ``` 53 | $ echo -n W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | wc 54 | ``` 55 | The rule of thumb you can take is: 56 | if you find a 63-character part (subdomain label), the request is likely malicious. 57 | 58 | ### Domain Entropy 59 | It is seen that the entropy of malicious domains is higher than the entropy of legitimate domains. 60 | ``` 61 | $ echo -n W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | ent 62 | ``` 63 | The average entropy of the top 11k Alexa domains is around 3.1. 64 | However, relying on entropy alone can result in a large number of false positives, as the entropy of CDN hostnames is also high. You need to combine this indicator with others, such as domain query length, for it to be effective. 65 | 66 | ### Mix of uppercase and lowercase letters 67 | If there are mixed upper/lower case characters in a domain, it should be investigated as it might be base64 encoded data. 68 | Usually, most of the domain characters are either in lowercase or uppercase. 69 | ``` 70 | $ echo W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | tr -cd '[A-Z]' | wc -c 71 | 13 72 | $ echo W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | tr -cd '[a-z]' | wc -c 73 | 25 74 | ``` 75 | 76 | ### Non-alphanumeric DNS requests 77 | Domain name registration is allowed only with alphabet letters, digits and hyphens. Domains which use other characters, like Punycode ones, are rare.
78 | Non-alphanumeric DNS requests are very rare and in most cases they are related to malicious behaviour. 79 | 80 | ``` 81 | $ host cmVkdGVhbS5wbA==.redteam.pl 82 | cmVkdGVhbS5wbA==.redteam.pl is an alias for redteam.pl. 83 | ``` 84 | ### Use of Punycode 85 | Punycode (RFC 3492) allows the use of special letters from alphabets other than English (e.g. Polish) in domain names. 86 | Punycode is not very popular and is often used in phishing or malware campaigns. So, it is a good practice to keep track of 87 | puny domains. 88 | 89 | ### Types of DNS requests 90 | Common DNS queries are of type A, AAAA, and PTR. If you encounter unusual queries like AXFR, ANY, or TXT, these need a closer look. 91 | 92 | ### TTL (time to live) 93 | A lower TTL value raises the probability of malicious behaviour. But this is not true for CDNs such as Cloudflare, where TTL is 300 seconds. 94 | If you observe a TTL of 0-1 seconds, then it's likely a malicious domain. 95 | 96 | #### Ref: 97 | The following article covers a lot of DNS query patterns that indicate possible malicious DNS communication and it is highly recommended for more information - https://blog.redteam.pl/2019/08/threat-hunting-dns-firewall.html 98 | 99 | 100 | 101 | -------------------------------------------------------------------------------- /encrypted-traffic-fingerprinting.md: -------------------------------------------------------------------------------- 1 | ## Encrypted traffic analysis 2 | 3 | Most network traffic uses the HTTPS protocol, and it's difficult to get information about data payloads unless you have access to the endpoints.
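What makes handshake-based analysis possible is that the TLS record header and initial handshake messages travel in cleartext even on encrypted connections. A minimal sketch of reading a record header (the sample bytes are synthetic):

```python
import struct

def parse_tls_record_header(data: bytes):
    """Parse the 5-byte TLS record header, which is never encrypted."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes")
    # 1 byte content type, 2 bytes version, 2 bytes body length, big-endian
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    kinds = {20: "change_cipher_spec", 21: "alert",
             22: "handshake", 23: "application_data"}
    return kinds.get(content_type, "unknown"), f"{major}.{minor}", length

# First bytes of a synthetic ClientHello record: type 22 (handshake),
# record version 3.1, 512-byte body.
sample = bytes([22, 3, 1, 2, 0])
print(parse_tls_record_header(sample))  # ('handshake', '3.1', 512)
```

Fingerprinting tools go further and hash the cipher suites and extensions inside the ClientHello/ServerHello, but all of that is visible at this same pre-encryption stage.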
4 | 5 | By analyzing initial TLS handshakes, good visibility can be achieved into encrypted traffic to support the following use cases: 6 | 7 | * Breach Detection 8 | * Insider and Advanced Threat Detection 9 | * High Risk Application Detection 10 | * Policy Violations 11 | * Encrypted Traffic Analytics 12 | 13 | ### Why fingerprinting 14 | Fingerprints in the digital world are similar to what human fingerprints are in the real world. 15 | A fingerprint is a group of information that can be used to detect software, network protocols, operating systems or hardware devices. 16 | 17 | Fingerprinting is used to correlate data sets in order to identify, with high probability, network services, operating system number and version, software applications, databases, configurations and more. Once the penetration tester has enough information, this fingerprinting data can be used as part of an exploit strategy against the target. 18 | 19 | ### How does OS and network fingerprinting work? 20 | In order to detect OS, network, service and application names and versions, attackers launch custom packets at the target. The responses from the victim form a digital signature. This signature is one of the keys to identifying what software, protocols and OS are running on the target device. 21 | 22 | Fingerprinting techniques are based on detecting certain patterns and differences in network packets generated by operating systems. These often analyze different types of packets and information such as TCP window size, TCP options in TCP SYN and SYN+ACK packets, ICMP requests, HTTP packets, DHCP requests, IP TTL values as well as IP ID values, etc. 23 | 24 | ### Active fingerprinting 25 | Active fingerprinting is the most popular type of fingerprinting in use. It consists of sending packets to a victim and analyzing the victim's replies. This is often the easiest way to detect remote OS, networks and services.
It's also the most risky, as it can be easily detected by intrusion detection systems (IDS) and packet filtering firewalls. 26 | A popular platform used to launch active fingerprint tests is Nmap. This handy tool can help you detect specific operating systems and network service applications when you launch TCP, UDP or ICMP packets against any given target. 27 | 28 | ### Passive fingerprinting 29 | Passive fingerprinting is an alternative approach to avoid detection while performing your reconnaissance activities. 30 | The main difference between active and passive fingerprinting is that passive fingerprinting does not actively send packets to the target system. Instead, it acts as a network scanner in the form of a sniffer, merely watching the traffic data on a network without performing network alteration. 31 | 32 | In cybersecurity fingerprinting, one of the most popular methods involves OS name and version detection and is part of the usual data intelligence gathering when running your OSINT research. While many tools may fit into this particular category, the following tools are popular in the security community: 33 | 34 | ### Nmap 35 | Nmap has many features as a port scanner, but also works as OS detection software. 36 | 37 | A simple OS detection query using nmap looks like this: 38 | ``` 39 | $ sudo nmap -O X.X.X.X 40 | ``` 41 | In case there is a firewall blocking your probes, you can add the -Pn option, as shown below: 42 | ``` 43 | $ sudo nmap -O X.X.X.X -Pn 44 | ``` 45 | A more aggressive approach can be taken by using the -A option, but this may result in firewall detection by the remote host: 46 | ``` 47 | $ sudo nmap -A X.X.X.X 48 | ``` 49 | 50 | ### P0f (http://lcamtuf.coredump.cx/p0f3/) 51 | P0f offers a good alternative to Nmap and can be used as a passive fingerprinting tool to analyze network traffic and identify patterns behind TCP/IP based communications that are often blocked for Nmap active fingerprinting techniques.
52 | It includes powerful network-level fingerprinting features, as well as one that analyzes application-level payloads such as HTTP. It's also useful for detecting NAT, proxy and load balancing setups. 53 | 54 | Once installed, you can perform fingerprinting against the network by running: 55 | ``` 56 | $ p0f -i eth0 57 | ``` 58 | 59 | It is also possible to read an offline pcap file 60 | 61 | ``` 62 | $ p0f -r some_capture.cap 63 | ``` 64 | 65 | ### Ettercap (http://ettercap.github.io/ettercap/) 66 | Ettercap is a network sniffing tool that supports many different protocols including Telnet, FTP, IMAP, SMB, MySQL, LDAP, NFS and encrypted ones like SSH and HTTPS. 67 | 68 | This tool is often used by attackers to launch man-in-the-middle attacks. However, it is also useful as a fingerprinting tool that can help identify local and remote operating systems along with running services, open ports, IPs, mac addresses and network adapter vendors. 69 | 70 | Ettercap can be easily installed on most Unix/Linux platforms. In order to perform OS and service detection, it will sniff the entire network and save the results in profiles. 71 | 72 | 73 | ## Service fingerprinting 74 | In addition to fingerprinting remote OS names and versions, it is also possible to fingerprint specific network services. 75 | 76 | ### SSH Fingerprinting 77 | 78 | Hassh (https://github.com/salesforce/hassh) has become the de facto SSH fingerprinting standard to accurately detect and identify specific client and server SSH deployments. These fingerprints use MD5 as the default storage format for later analysis and comparison. 79 | 80 | While SSH is a fairly secure protocol, it has a few drawbacks when it comes to analyzing the interaction between client and server. In this case, using Hassh can help in situations that include: 81 | 82 | * Managing alerts and automatically blocking SSH clients using a Hassh fingerprint outside of a known “good set”.
83 | * Detecting exfiltration of data by using anomaly detection on SSH clients with multiple distinct Hassh values 84 | * In forensic investigations, SSH connection attempts can be tracked with greater granularity and can be followed up by source IPs. Since a Hassh-based hash is associated with the SSH client software, it's possible to detect the origin even if the IP is behind a NAT and is shared by different SSH clients. 85 | * Detecting and identifying specific client and server SSH implementations. 86 | 87 | Hassh works by taking MD5 hashes (“hassh” and “hasshServer”) of the specific sets of algorithms advertised by SSH client and SSH server software while setting up the SSH channel. This generates a unique identification string that can be used to fingerprint client and server applications. e.g. 88 | 89 | ``` 90 | c1c596caaeb93c566b8ecf3cae9b5a9e SSH-2.0-dropbear_2016.74 91 | d93f46d063c4382b6232a4d77db532b2 SSH-2.0-dropbear_2016.72 92 | 2dd9a9b3dbebfaeec8b8aabd689e75d2 SSH-2.0-AWSCodeCommit 93 | ``` 94 | 95 | ### SSL fingerprinting using JA3 (https://github.com/salesforce/ja3) 96 | JA3/JA3S, developed by the Salesforce team, is an SSL/TLS fingerprinting method. This tool allows you to create fingerprints that can be produced on any platform for threat intelligence analysis. 97 | 98 | In many cases, using JA3/JA3S as a fingerprinting technique for the TLS negotiation between both ends (client and server) produces a more accurate identification of the encrypted communications and helps identify clients and servers with high probability. e.g. 99 | 100 | ``` 101 | Standard Tor Client: 102 | 103 | JA3 = e7d705a3286e19ea42f587b344ee6865 (Tor Client) 104 | JA3S = a95ca7eab4d47d051a5cd4fb7b6005dc (Tor Server Response) 105 | ``` 106 | ### DNS fingerprinting using fpdns (https://github.com/kirei/fpdns) 107 | Tools like fpdns can identify, based on DNS queries, the software that is used as the DNS server.
This works even if the DNS server (e.g. BIND) has version printing disabled. 108 | 109 | ``` 110 | $ sudo apt install fpdns 111 | $ sudo fpdns -D site.com 112 | 113 | Replace site.com with the actual site of your interest! 114 | ``` 115 | 116 | ### Interesting links: 117 | * https://securitytrails.com/blog/cybersecurity-fingerprinting 118 | * https://blogs.cisco.com/security/tls-fingerprinting-in-the-real-world 119 | * https://www.netresec.com/?page=Blog&tag=Satori 120 | * https://www.grc.com/fingerprints.htm 121 | * https://jis-eurasipjournals.springeropen.com/articles/10.1186/s13635-016-0030-7 122 | * Various DNS tools - https://www.dns-oarc.net/tools 123 | 124 | -------------------------------------------------------------------------------- /full-text-search.md: -------------------------------------------------------------------------------- 1 | ### Why full text search 2 | 3 | A query like: 4 | 5 | SELECT * FROM table_name WHERE Foo LIKE '%Bar'; 6 | 7 | cannot take advantage of an index. It has to look at every single row and see if it matches. A full text index can give you an instant answer! A full text index also offers a lot of flexibility in terms of the order of matching words and how close together they must be. 8 | 9 | #### Stemming 10 | A fulltext search can stem words. If you search for "run", you can get results for "ran" or "running". Most fulltext engines have stem dictionaries in a variety of languages. 11 | 12 | #### Weighted results 13 | A fulltext index can encompass multiple columns. e.g. if you search for "peach pie", the index can include a title, keywords and a body. Results that match the title can be weighted higher as more relevant and can be sorted to show near the top. 14 | 15 | #### Disadvantages 16 | A fulltext index can be potentially huge, many times larger than a standard B-tree index. So, most hosted providers who offer database instances disable this feature or at least charge extra for it. 17 | Fulltext indexes can be slower to update.
If the data changes a lot, there might be some lag updating indexes compared to standard indexes. 18 | 19 | The following stackoverflow answer nicely explains what full text search means. 20 | 21 | In general, there is a tradeoff between "precision" and "recall". High precision means that fewer irrelevant results are presented (no false positives), while high recall means that fewer relevant results are missing (no false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall. 22 | 23 | Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords. 24 | 25 | The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.
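The inverted index described above can be sketched in a few lines of Python (toy documents; an AND query is just set intersection over the posting sets):

```python
from functools import reduce

# Toy corpus: document id -> text
docs = {
    1: "peach pie recipe with fresh peach",
    2: "apple pie recipe",
    3: "fresh peach smoothie",
}

# Build the inverted index: term -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def search_all(*terms):
    """AND query: intersect the posting sets of every term."""
    return reduce(set.intersection, (index.get(t, set()) for t in terms))

print(sorted(search_all("peach", "pie")))    # [1]
print(sorted(search_all("fresh", "peach")))  # [1, 3]
```

Real engines add tokenization, stemming and a ranking function on top, but the posting-set intersection shown here is the core lookup.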
26 | 27 | Other features typical of full-text search are 28 | 29 | * lexical analysis or tokenization—breaking a block of unstructured text into individual words, phrases, and special tokens 30 | * morphological analysis, or stemming—collapsing variations of a given word into one index term; for example, treating "mice" and "mouse", or "electrification" and "electric" as the same word 31 | * ranking—measuring the similarity of a matching record to the query string 32 | 33 | Ref - https://stackoverflow.com/questions/224714/what-is-full-text-search-vs-like 34 | 35 | ### Hadoop-Spark-Elasticsearch 36 | Hadoop is a distributed storage (HDFS) and computing framework which allows you to develop distributed data processing applications under the map-reduce model. 37 | Spark is a processing layer on top of Hadoop which allows you to develop applications in Scala or Python and can improve the performance of iterative processing by up to a factor of 100. 38 | Elasticsearch is a distributed RESTful search engine which stores documents NoSQL-style as JSON. 39 | -------------------------------------------------------------------------------- /icmp-codes.md: -------------------------------------------------------------------------------- 1 | ### ICMP destination unreachable (type 3) codes 2 | **Code**|**Description**|**References** 3 | :-----:|:-----:|:-----: 4 | 0|Network unreachable error|RFC 792 5 | 1|Host unreachable error|RFC 792 6 | 2|Protocol unreachable error. Sent when the designated transport protocol is not supported|RFC 792 7 | 3|Port unreachable error. Sent when the designated transport protocol is unable to demultiplex the datagram but has no protocol mechanism to inform the sender|RFC 792 8 | 4|The datagram is too big. Packet fragmentation is required but the DF bit in the IP header is set|RFC 792 9 | 5|Source route failed error|RFC 792 10 | 6|Destination network unknown error|RFC 1122 11 | 7|Destination host unknown error|RFC 1122 12 | 8|Source host isolated error (Obsolete)|RFC 1122 13 | 9|The destination network is administratively prohibited|RFC 1122 14 | 10|The
destination host is administratively prohibited|RFC 1122 15 | 11|The network is unreachable for Type Of Service|RFC 1122 16 | 12|The host is unreachable for Type Of Service|RFC 1122 17 | 13|Communication Administratively Prohibited. This is generated if a router cannot forward a packet due to administrative filtering|RFC 1812 18 | 14|Host precedence violation. Sent by the first hop router to a host to indicate that a requested precedence is not permitted for the particular combination of source/destination host or network, or upper layer protocol|RFC 1812 19 | 15|Precedence cutoff in effect. The network operators have imposed a minimum level of precedence required for operation; the datagram was sent with a precedence below this level|RFC 1812 20 | 16-255|Not assigned| 21 | -------------------------------------------------------------------------------- /interview-fun.md: -------------------------------------------------------------------------------- 1 | ### Why do people hack websites? 2 | * To render a website useless or shut it down. 3 | * To digitally steal your money, especially through banking Trojans and malicious lines of code. 4 | * Politically driven defacing of rivals' websites, i.e., defacing a website belonging to a contestant in some election. 5 | * Purely mischievous fun, e.g., a school's own students attacking its website 6 | 7 | ### Is it possible to detect that a website has been hacked? 8 | * Your website is redirected to another URL that in most cases is a pornographic website. 9 | * A Google alert appears on the website informing that the site has been hacked. 10 | * Strange looking JavaScript appears in the source code of the site. 11 | * You find new admin, database and FTP users which were not created by you. 12 | * Spam advertisements and pop-ups on the website due to malicious code. 13 | * The site is no longer accessible via Google. 14 | 15 | ### What steps do you take to restore?
16 | Some steps - not in order: 17 | * Inform your hosting service provider/web designer. 18 | * Run a full virus scan of your computers. 19 | * Determine how severe the attack was and exactly how much damage it caused. 20 | * Shut down the site 21 | * Change passwords 22 | * Request a Google review 23 | 24 | ### What is cybersquatting? 25 | Registering a domain similar to an actual domain, e.g. google.com typed as g00gle.com. Users inadvertently type the squatted name, and somebody has deliberately registered this domain for malicious purposes (this look-alike misspelling variant is also known as typosquatting). 26 | 27 | 28 | -------------------------------------------------------------------------------- /linux-auth-log.md: -------------------------------------------------------------------------------- 1 | ### Auth.log analysis 2 | 3 | In linux, the file (/var/log/auth.log) contains authorization information like: 4 | * remote logins 5 | * usage of the sudo command 6 | * instances where a user password is required for authorization 7 | 8 | The file is stored in plain text, and rolled/archived logs will be compressed with gzip. 9 | Analysis of this file allows us to track anomalous activities. Some of the use-cases that can be tracked are given below: 10 | 11 | #### sudo commands 12 | 'sudo' allows a user to execute a command with superuser privileges, or as another user (superuser is the default). These are authorization events that you should definitely keep track of from a security perspective. 13 | [[screenshot]] 14 | 15 | Parsing this file allows us to see what folder the command was issued from, the user, as well as the command itself! 16 | 17 | #### root session 18 | Aside from sudo logs, which may be used to run a command with elevated privileges, it is possible to keep track of users escalating to root. From the logs, one can also find out the number of sessions as well as their durations (opening time/closing time), and these metrics are useful for tracking malicious activities.
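Parsing the sudo entries mentioned above is straightforward; a sketch in Python against a synthetic auth.log line in the standard sudo format:

```python
import re

# Pull user, working directory and command out of a sudo line in auth.log.
# The log line below is a synthetic example, not from a real machine.
SUDO_RE = re.compile(
    r"sudo:\s+(?P<user>\S+)\s+:.*?PWD=(?P<pwd>\S+)\s+;.*?COMMAND=(?P<cmd>.+)$"
)

line = ("Mar  3 10:31:22 host sudo:  alice : TTY=pts/0 ; PWD=/home/alice ; "
        "USER=root ; COMMAND=/usr/bin/apt update")

m = SUDO_RE.search(line)
print(m.group("user"), m.group("pwd"), m.group("cmd"))
# alice /home/alice /usr/bin/apt update
```

Applied over the whole file (and its gzipped rotations), this gives per-user counts of elevated commands, a useful baseline for spotting anomalies.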
19 | [[screenshot]] 20 | 21 | #### ssh activity 22 | It is also possible to keep track of ssh remote login activity - how many connection attempts, what the success and failure rates of ssh connections are, the set of commands executed once the ssh session is established, and so on. 23 | [[screenshot]] 24 | 25 | If the number of ssh login failures is excessive, one can block the user on the basis of IP range, ASN number or city/country and filter out the noise in the logs. 26 | -------------------------------------------------------------------------------- /linux-forensics.md: -------------------------------------------------------------------------------- 1 | ### Collecting volatile data 2 | * Date and time 3 | * network interfaces 4 | * promiscuous mode 5 | * network connections 6 | * open ports (TCP as well as UDP), Listening ports/services 7 | * running processes and their ports 8 | * open files 9 | * routing tables 10 | * mounted filesystem(s) 11 | * loaded kernel modules 12 | * kernel version 13 | * uptime 14 | * last reboot time 15 | * filesystem datetime stamps 16 | * hash values of system files 17 | * current logged in users 18 | * current users with noshell, current users with active shell 19 | * login history, login times 20 | * user accounts, inactive accounts 21 | * user history files 22 | * hidden files and directories 23 | * suid/sgid files 24 | ### Dumping RAM 25 | * using fmem kernel module 26 | * using lime 27 | * using /proc/kcore 28 | ### Acquiring filesystem images 29 | * using dd 30 | * using dcfldd 31 | * write blocking options 32 | * using forensics linux distributions like SIFT, Kali 33 | * udev rule based blocker for devices like USB 34 | * Analysis of strange file 35 | * regular files in /dev 36 | * user history files 37 | * hidden files 38 | * suid/sgid files 39 | * too old date files 40 | * finding deleted files in last 7 days/last month 41 | ### Timeline analysis 42 | * use of Autopsy to establish timeline 43 | * when was the system
installed, rebooted, upgraded etc 44 | * changed files 45 | * newly created files 46 | ### Network forensics 47 | * usage of snort for detection of malicious packets 48 | * bro for detailed log analysis of http/dns/https traffic 49 | * using tcpstat 50 | * conversation analysis using tcpflow 51 | ### Writing reports 52 | * Autopsy 53 | * dradis 54 | * openoffice/MS-office 55 | ### File forensics 56 | * comparing file hashes to known values 57 | * unknown file analysis using 58 | * file command 59 | * strings command 60 | * viewing symbols using nm 61 | * reading objects using objdump 62 | * analysis using gdb 63 | -------------------------------------------------------------------------------- /log-files-and-journalctl.md: -------------------------------------------------------------------------------- 1 | ### Various log files under /var/log 2 | 3 | * alternatives.log -- "run with" suggestions from update-alternatives 4 | * apport.log -- information on intercepted crashes 5 | * auth.log -- user logins and authentication mechanisms used 6 | * boot.log -- boot time messages 7 | * btmp -- failed login attempts 8 | * dpkg.log -- information on when packages were installed or removed 9 | * lastlog -- recent logins (use the lastlog command to view) 10 | * faillog -- information on failed login attempts -- all zeroes if none have transpired (use the faillog command to view) 11 | * kern.log -- kernel log messages 12 | * mail.err -- information on errors detected by the mail server 13 | * mail.log -- information from the mail server 14 | * syslog -- system services log 15 | * wtmp -- login records 16 | 17 | 18 | ### Journalctl 19 | In addition to log files (/var/log), you should also watch the journal via journalctl. The journal represents an important collection of information on user and kernel activity and this information is retrieved from a variety of sources on the system.
20 | 21 | Some useful commands that you should run: 22 | ``` 23 | Return the total number of lines 24 | $ journalctl | wc -l 25 | ``` 26 | ``` 27 | Journal logs since a date 28 | $ journalctl --since "2018-10-06 10:00" 29 | ``` 30 | ``` 31 | Log entries for a particular unit/service 32 | $ journalctl -u networking.service 33 | ``` 34 | ``` 35 | Disk usage 36 | $ journalctl --disk-usage 37 | ``` 38 | ``` 39 | Activity for a specific process 40 | $ journalctl _PID=780 41 | ``` 42 | 43 | -------------------------------------------------------------------------------- /logs vs metrics.md: -------------------------------------------------------------------------------- 1 | ## Logs vs Metrics 2 | ### What are logs 3 | A log message is a system generated set of data describing an event that has happened. Log data contain details about the event such as what resource was accessed, who accessed it and the time. Each event in a system is going to have a different set of data in the message. In general, there are five different categories of logs - informational, debug, warning, error and alert. 4 | 5 | ### What are metrics 6 | While logs are about a specific event, metrics are a measurement at a point in time for the system. A metric can have a value, a timestamp and an identifier (tag). While logs may be collected at any time after an event has happened, metrics are typically collected at a fixed time interval known as the resolution. The collection of such data is referred to as a time-series metric and can be visualized in different types of graphs such as gauges, counters and timers. 7 | Although measurements of system health can be stored in a performance log file, it is costly to collect them that way. A metric will normalize log file data, and the size of a metric file will be a fraction of the size of the entire log file. 8 | 9 | In summary, a log is an event that happened and a metric is a measurement of the health of a system.
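The relationship can be made concrete with a tiny sketch: take a handful of hypothetical log events and collapse them into a per-minute time-series metric. The event timestamps and messages below are made up for illustration.

```python
from collections import Counter

# Hypothetical log events: each is a (timestamp, message) pair
events = [
    ("2018-10-06 10:00:01", "Failed password for root"),
    ("2018-10-06 10:00:41", "Failed password for admin"),
    ("2018-10-06 10:01:12", "Failed password for root"),
    ("2018-10-06 10:01:30", "Accepted password for alice"),
]

# Metric: count of failed logins per minute. ts[:16] truncates the timestamp
# to minute resolution -- identifier + timestamp + value replace raw event text.
failed_per_minute = Counter(
    ts[:16] for ts, msg in events if msg.startswith("Failed password")
)

for minute, count in sorted(failed_per_minute.items()):
    print(("failed_logins", minute, count))
```

Four raw events become two metric samples, which is exactly the size reduction the paragraph above describes.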
10 | 11 | Based on a nice explanation from - https://www.sumologic.com/blog/logs-metrics-overview/ 12 | Another interesting take on logs vs metrics is here - https://whiteink.com/2019/logs-vs-metrics-a-false-dichotomy/ 13 | -------------------------------------------------------------------------------- /machine learning terms: -------------------------------------------------------------------------------- 1 | ### Common terms used in machine learning 2 | 3 | * Statistics: Statistics is the science of collecting, organising, summarising, analysing and interpreting data. 4 | 5 | * Data mining - the process of automatically discovering useful information in large data repositories. 6 | 7 | * Machine learning - a set of techniques that allow you to deal with huge datasets in an intelligent way (by developing algorithms or sets of logical rules) to derive actionable insights (e.g. delivering search results to users) 8 | 9 | * Quantitative variables - take numerical values whose size is meaningful. 10 | Quantitative variables typically have measurement units, such as pounds, dollars, years, volts, gallons, megabytes, inches, degrees, miles per hour, pounds per square inch, BTUs, and so on. So, it makes sense to add, to subtract, and to compare two persons’ weights, or two families’ incomes. 11 | 12 | * Qualitative variables - Qualitative (categorical) variables typically do not have units; e.g. gender, hair color, or ethnicity group individuals. Qualitative and categorical variables have neither a “size” nor, typically, a natural ordering to their values. 13 | Some variables such as social security numbers and zip codes take numerical values but are not quantitative. The sum of two zip codes or social security numbers is not meaningful. So, they are qualitative or categorical variables.
14 | 15 | * Types of data: 16 | * Numerical data - continuous data, discrete data 17 | * Categorical data - nominal level, ordinal level 18 | * Measurements: 19 | * Nominal - values of variables are names, so used for categorical/qualitative analysis 20 | * Ordinal - collecting information in which order is important e.g. tracking of student grades 21 | * Interval - distances between values have a special meaning - e.g. difference in temperature 22 | * Ratio - estimation of the ratio between magnitudes of a continuous quantity 23 | -------------------------------------------------------------------------------- /malware-detection-methods.md: -------------------------------------------------------------------------------- 1 | ## Malware detection methods 2 | 3 | #### Signature based methods 4 | 5 | A signature is a unique feature of a file, something like the fingerprint of an executable. Signature based methods use patterns extracted from various malwares to identify them and are more efficient and faster than any other methods. Signature based methods have small error rates and this is the reason they are often used in commercial applications. 6 | 7 | But, signature based methods are unable to detect unknown malware variants and require high amounts of manpower, time and money to extract unique signatures. Further, it is difficult to identify infections such as polymorphic and metamorphic code. 8 | 9 | #### Behaviour based methods 10 | 11 | Behaviour based malware detection techniques observe the behaviour of a program to conclude whether it is malicious or not. In these methods, programs with the same behaviour are collected and a behaviour signature is developed. This signature can identify various samples of malware of the same family.
A behaviour based detector basically consists of the following components: 12 | 13 | Data collector - Collects dynamic/static information about the executable 14 | Interpreter - Converts raw information collected by the data collector into an intermediate representation 15 | Matcher - Compares the interpreter's representation with the signature 16 | 17 | One example of a behaviour based detection approach is histogram based malicious code detection by Symantec. 18 | 19 | The main advantage of behaviour based detection is the ability to detect unknown or polymorphic malware variants. But the disadvantages are a high False Positive Rate (FPR) and long scanning times. 20 | 21 | #### Heuristic methods 22 | 23 | Heuristic malware detection methods use data mining and machine learning to learn the behaviour of malicious files - e.g. Naive Bayes and Multi-Naive Bayes are employed to classify malware and benign files (https://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6620049). Typically, these use API/system calls, N-gram and op-code features. 24 | 25 | #### Concealment strategies 26 | 27 | Malware authors try to hide the malware's presence by adopting techniques such as: 28 | 29 | Obfuscation - Actions such as garbage commands, unnecessary jumps etc. 30 | 31 | Code encryption - The malware contains a defensive mechanism to encrypt itself or its malicious activities. Encrypted malware is a complex that consists of a decryption algorithm, an encryption algorithm, encryption keys and encrypted malicious code. When the malware runs, the key and decryption algorithm are used to decrypt its malicious part. 32 | 33 | More details - https://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6620049 34 | -------------------------------------------------------------------------------- /malwares.md: -------------------------------------------------------------------------------- 1 | ### Fast flux Networks 2 | Fast flux refers to networks used by several botnets to hide the domains used to download malware or host phishing sites.
It can also refer to the type of network used to host command-and-control centers or proxies used by those botnets, making them difficult to find and even more difficult to dismantle. 3 | In a fast flux network, multiple IPs are associated with a domain name and these IPs change as frequently as every few minutes! e.g. the Avalanche botnet had 800,000 domains under its control. 4 | 5 | Most machines on such a network are not actually responsible for hosting and serving malicious content to victims. That task is reserved for a few servers while the rest act as re-directors that help botnet owners to mask the real addresses of the systems. 6 | 7 | #### Single flux network 8 | It is characterized by multiple individual nodes registering and de-registering IP addresses as part of the DNS A records for a single domain name. These registrations have very short lifespans (5 min or less) and create a constantly changing flow of addresses when attempting to access a specific domain. 9 | 10 | Moreover, the domains used are hosted on bulletproof servers with good reputations and it's difficult to take them down at short notice. 11 | 12 | #### Double flux network 13 | This type of network is similar to a single flux network but with additional sophistication, which makes it difficult to locate the machine serving the malware. 14 | In this case, zombie computers that are part of the botnet are used as proxies, which prevents the victim from interacting directly with the server hosting the malware. This is a concealment strategy adopted by cyber criminals to keep the infrastructure running. 15 | In fact, these networks are typically characterized by multiple nodes registering and de-registering as part of the DNS NS records. Both DNS A records and authoritative NS records for malicious domains are continually changed in a round robin manner and advertised into the fast flux network.
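A crude way to spot single-flux behaviour is to repeat DNS lookups for a domain and watch for a churning set of A records with very low TTLs. The sketch below runs that heuristic on hypothetical, pre-collected answers (no live DNS queries); the thresholds are illustrative, not authoritative.

```python
# Hypothetical (ttl_seconds, ip) pairs collected from repeated lookups
# of one suspect domain over a few minutes
answers = [
    (120, "203.0.113.10"),
    (120, "198.51.100.7"),
    (60, "192.0.2.44"),
    (60, "203.0.113.99"),
    (120, "198.51.100.23"),
]

distinct_ips = {ip for _, ip in answers}
avg_ttl = sum(ttl for ttl, _ in answers) / len(answers)

# Illustrative thresholds: many distinct IPs plus sub-5-minute TTLs
is_suspicious = len(distinct_ips) >= 5 and avg_ttl <= 300
print(len(distinct_ips), avg_ttl, is_suspicious)
```

A double-flux check would apply the same churn test to the NS records as well as the A records.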
16 | 17 | 18 | Ref: 19 | * https://www.thesecuritybuddy.com/dos-ddos-prevention/what-is-fast-flux-network/ 20 | * https://resources.infosecinstitute.com/fast-flux-networks-working-detection-part-1/#gref 21 | * https://www.welivesecurity.com/2017/01/12/fast-flux-networks-work/ 22 | * https://www.akamai.com/uk/en/multimedia/documents/white-paper/digging-deeper-in-depth-analysis-of-fast-flux-network.pdf 23 | 24 | ### Crypto mining malware 25 | 26 | These malwares are specially developed to take over computer resources and use them for cryptocurrency mining without explicit permission. Cyber criminals have turned to writing cryptomining malware as a way to harness the computing power of a large number of computers and smartphones to help them generate revenue from cryptocurrency mining. A single cryptocurrency mining botnet can net cyber criminals more than $30,000 per month as per a Kaspersky report. 27 | 28 | In addition to malwares specifically designed to mine cryptocurrency, cyber criminals are using browser based cryptocurrency mining to help them generate revenue. Coinhive is a software program that packages all the tools needed to easily enable website owners to use stealth scripting to force visitors into cryptocurrency mining while visiting their site, in most cases without explicit permission. 29 | 30 | ### What is a RAT 31 | A RAT (Remote Access Trojan) is a sort of swiss army knife program consisting of many malicious functionalities.
32 | * Stealing of usernames and passwords 33 | * Logging of keystrokes 34 | * Gathering system information 35 | * Exfiltration of data 36 | * Command-and-control activities 37 | * Downloading of malwares for further actions 38 | * Accessing and uploading sensitive files 39 | * Recording of audio/video 40 | 41 | Typical infection vectors - email attachments and malicious downloads 42 | 43 | -------------------------------------------------------------------------------- /netflow traffic classification use cases.md: -------------------------------------------------------------------------------- 1 | ### Explanation of Flows vs Connection vs Session 2 | A network flow is defined as a unidirectional sequence of packets between two network 3 | endpoints and has the following attributes: 4 | * Source IP 5 | * Destination IP 6 | * Protocol 7 | * Source Port 8 | * Destination Port 9 | * Type of service (ToS) 10 | * Input interface 11 | 12 | A connection, on the other hand, is simply a bidirectional flow (a forward flow and a reverse flow). 13 | 14 | In a session, you will have many connections between the same source and destination. In addition, the routing 15 | policies and paths followed by packets in flows and sessions are different, as explained below: 16 | 17 | IP routing is stateless and routes each packet based on the IP address and port. A stateless 18 | connection is one in which no information is retained by either sender or receiver. TCP is used 19 | to manage state and is managed by the end servers. Firewalls are stateful and keep track of 20 | TCP/UDP sessions. The firewall tracks the attributes of the session such as sequence numbers 21 | and keeps this information in dynamic state tables. Load balancers and WAN Optimizers are 22 | added to networks to manage the state of a session to solve the problem of stateless routers. 23 | 24 | IP routing creates a fixed path between two networks.
While routes can change based on network 25 | outages, it is not possible to dynamically route a flow over multiple paths in a stateless 26 | network. Flows are packet based whereas sessions are services/application based. In a flow, 27 | all packets that are alike are treated the same. For instance, if there are six concurrent 28 | cloud based video streams, the router will treat all the UDP packets the same once the flow 29 | is established. Session based networking allows each session to be dynamically treated 30 | differently, e.g. in terms of priority and bandwidth shaping. 31 | 32 | ### Credits: 33 | * https://talkingpointz.com/flows-vs-sessions/ 34 | 35 | ## Netflow traffic classification 36 | 37 | ### Name: Good traffic 38 | * Class: Good 39 | * Score: 10 40 | * Note: A list of known hosts/netblocks extracted from nDPI source code, e.g. netblocks of google, twitter etc. 41 | 42 | ### Name: Alienvault Bad IP 43 | * Class: Bad 44 | * Score: 10 45 | * Note: IP addresses from the Alienvault threat intelligence database are used to flag malicious flows. 46 | 47 | ### Name: Emerging threats Bad IP 48 | * Class: Bad 49 | * Score: 10 50 | * Note: IP addresses from the Emerging Threats feed are used to flag malicious flows. 51 | 52 | ### Name: Insecure TCP ports 53 | * Class: Bad 54 | * Score: 10 55 | * Note: Any traffic flow that includes traffic to a list of TCP or UDP ports for insecure protocols - e.g. telnet, ftp, rsh 56 | 57 | ### Name: Unknown port conversation 58 | * Class: Bad 59 | * Score: 10 60 | * Note: Any traffic flow between unknown or un-assigned source and destination TCP or UDP ports is flagged as an anomaly. 61 | 62 | ### Name: SYN_Flood 63 | * Class: Bad 64 | * Score: 10 65 | * Note: Any traffic flow that contains only SYN packets is treated as malicious. 66 | 67 | ### Name: SSH Brute Force scan 68 | * Class: Bad 69 | * Score: 20 70 | * Note: Any flow to port 22 having 11 to 51 packets is treated as a possible SSH brute force attack.
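The SSH brute force rule above is easy to express over exported flow records. A minimal sketch, using made-up flow tuples rather than a real netflow export:

```python
# Hypothetical flow records: (dst_port, packets_in_flow)
flows = [(22, 3), (22, 25), (443, 40), (22, 51), (80, 12), (22, 70)]

# Rule from the note above: flows to port 22 with 11 to 51 packets
flagged = [f for f in flows if f[0] == 22 and 11 <= f[1] <= 51]
print(flagged)
```

Flows with very few packets (likely scans) and very many packets (likely legitimate sessions) fall outside the 11-51 band and are not flagged.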
71 | 72 | ### Name: Possible TCP SCAN 73 | * Class: Bad 74 | * Score: 20 75 | * Note: For this test, flows are grouped by destination port and counted along with the average number of packets per flow. Any group with more than 1000 flows and less than 4 packets per flow is considered to indicate possible scanning of the destination port associated with the group. The reasoning is that a large number of flows to a port with very few packets per flow is an indication of a scanning attempt. 76 | -------------------------------------------------------------------------------- /network-security-monitoring.md: -------------------------------------------------------------------------------- 1 | ### Network monitoring 2 | It is a system that constantly monitors a computer network for slow or failing systems and notifies the network administrator in case of outages via email, pager or other alarms. 3 | 4 | Network monitoring can be done in two ways: 5 | * Active network monitoring 6 | * Passive network monitoring 7 | 8 | #### Active network monitoring 9 | In active monitoring, test traffic is injected onto the network and the subsequent flows are monitored. This kind of monitoring is useful when you want data on a particular aspect of network performance, e.g. latency between two end-points, packet drops, jitter, analysis of malicious payloads and so on. 10 | 11 | #### Passive monitoring 12 | It's a continuous observation of network traffic followed by its detailed study. 13 | 14 | Typically, instead of injecting artificial traffic onto your network, passive monitoring involves monitoring traffic that is already on the network. This kind of monitoring requires a device on the network to capture network packets for analysis. This can be done with specialized probes designed to capture network data or with built-in capabilities on switches or other network devices.
Passive network monitoring can collect large volumes of data and from that we can derive a wide range of information. For example, TCP headers contain information that can be used to derive network topology, identify services and operating systems running on networked devices, and detect potentially malicious probes. 15 | 16 | Through passive monitoring, a security admin can gain a thorough understanding of the network's health. Much of this data can be gathered in an automated, non-intrusive manner through the use of standard tools. Passive monitoring tools can record, analyze, correlate and produce highly valuable security intelligence specific to a network. 17 | 18 | ### Why Network security monitoring 19 | Network security monitoring involves collecting the full spectrum of data types (event, session, full content and statistical) needed to identify and validate intrusions. The goal is to detect and respond to threats as early as possible to prevent data loss or disruption and restore normalcy in operations. Often, this is complicated when mountains of security-related events and log data are continuously produced by multiple disparate security tools. 20 | 21 | Most of the commercial products do generate useful security alerts, say, "event X happened", but they do not provide enough context for the user to act on them. The end users are always in a dilemma about whether the alert is relevant or not and in many cases, it is just overlooked/ignored. The usage of network security monitoring tools from the open source domain allows security analysts to drill down into the minute details of alerts and make decisions.
22 | 23 | Network security monitoring can be done in two ways: 24 | 25 | ### Active security monitoring 26 | Active (in-line) monitoring typically includes “bump in the wire” type solutions – 27 | 28 | * Firewalls/Proxies 29 | * Malware/Virus scanners (Spam, Phishing, Virus) 30 | * Whitelisting / blacklisting at various layers 31 | * Encryption/Man-in-the-middle 32 | 33 | Active measures are good first steps but they are only as effective as the signature data and/or configuration driving them. Each organization’s traffic profile is different, so active measures are often not sufficient or very effective, and they go stale very quickly with the new attacks appearing in the wild every day. 34 | 35 | Most firewalls are configured to block or allow combinations of IP / port / protocol. There are next-gen firewalls that also do deep packet inspection to block malicious IPs/URLs. Malware scanners depend on pre-configured patterns of known bad attachments or phishing URLs. Whitelisting / blacklisting rules need to be updated on a regular basis to be effective. 36 | 37 | ### Passive security monitoring 38 | As indicated earlier, a passive monitoring system can be configured to parse a copy of live network traffic, flag known anomalies and take action (machine learning) or log it for a human (security admin) to look at.
A good passive monitoring solution typically has the following capabilities: 39 | * Can keep up and watch all 7 layers of network traffic 40 | * Can parse and de-construct connection flows on the fly 41 | * Can log traffic meta-data for correlation 42 | * Can apply pre-defined identification rules and flag off suspicious activities 43 | * Supports flexible configuration to define new patterns on the fly 44 | 45 | There are several open source tools available for passive security monitoring and the most commonly used are described below: 46 | 47 | #### Snort/Suricata: 48 | Snort used to be the de facto IDS / IPS engine of choice for anyone looking to run an IDS. Suricata is another popular IDS project that allows efficient monitoring of very high speed links above 1Gbps. Snort / Suricata engines have a rich set of community and open/commercial rule sets available. It is possible to run them on an edge machine (router / firewall) or on an intranet machine to watch bad traffic that is flowing through and raise alerts. 49 | 50 | #### Bro: 51 | Bro is a general purpose traffic analysis platform that can also function as an IDS! The Bro engine is driven by program-like scripts that define patterns to be matched, ignored or alerted on. Bro can run on commodity hardware and can be scaled up to 100Gbps. 52 | 53 | ### Wireshark 54 | 55 | ### Network Miner 56 | 57 | 58 | #### Useful presentations: 59 | Principle of network security monitoring - https://www.mycert.org.my/mycert-sig/mycert-sig-08/slides/MyCERT-NSM-presentation.pdf 60 | 61 | -------------------------------------------------------------------------------- /nmap-nse-scripts.md: -------------------------------------------------------------------------------- 1 | ### How to install the latest nmap scripts 2 | I often forget the process to install the latest nmap scripts from the official Nmap repository. So, here is a quick note for myself!
3 | #### Find location of nmap scripts directory 4 | Under Windows 5 | ``` 6 | Windows Key + F, *.nse 7 | ``` 8 | Under linux 9 | ``` 10 | $ sudo find / -name '*.nse' 11 | ``` 12 | ``` 13 | $ sudo locate *.nse 14 | ``` 15 | The most common places for nse scripts are 16 | ``` 17 | c:\Program Files\Nmap\Scripts 18 | /usr/share/nmap/scripts 19 | /usr/local/share/nmap/scripts 20 | ``` 21 | #### Download nse scripts from official nmap site 22 | Nmap ```.nse``` scripts are located under the https://svn.nmap.org/nmap/scripts/ repository. 23 | 24 | Its official Git-based mirror is here - https://github.com/nmap/nmap/tree/master/scripts 25 | 26 | So, download the repository using the git clone command (or git pull in an existing clone). 27 | 28 | Extract the scripts folder and copy/overwrite it over the existing ```scripts``` directory. 29 | 30 | #### Update scripts database (optional) 31 | If you have an internet connection, you can use Nmap's script update command 32 | ``` 33 | $ nmap --script-updatedb 34 | ``` 35 | 36 | Now, you are ready to run nmap with the latest version of the NSE scripts and this is particularly useful for finding vulnerabilities. 37 | 38 | Cheers!
39 | -------------------------------------------------------------------------------- /osquery-threat-hunting.md: -------------------------------------------------------------------------------- 1 | ## Threat hunting using Facebook OSQuery 2 | 3 | ### List logged in users in the system at present 4 | ``` 5 | osquery> select * from logged_in_users; 6 | ``` 7 | 8 | ### Find all previous logins 9 | ``` 10 | osquery> select * from last; 11 | ``` 12 | 13 | ### List Firewall rules 14 | ``` 15 | osquery> select * from iptables; 16 | osquery> select chain, policy, src_ip, dst_ip from iptables; 17 | ``` 18 | 19 | ### Find all jobs scheduled by crontab 20 | ``` 21 | osquery> select command,path from crontab; 22 | ``` 23 | 24 | ### Find all files with setuid enabled (suid bit set) 25 | ``` 26 | osquery> select * from suid_bin; 27 | ``` 28 | 29 | ### Find list of kernel modules 30 | ``` 31 | osquery> select name, used_by, status from kernel_modules where status='Live'; 32 | ``` 33 | 34 | ### Find listening ports for any backdoors 35 | ``` 36 | osquery> select * from listening_ports; 37 | ``` 38 | 39 | ### Find file activity in server along with responsible user 40 | ``` 41 | osquery> select * from file_events; 42 | ``` 43 | 44 | ### Find top 10 largest processes by resident memory size 45 | ``` 46 | osquery> select pid, name, uid, resident_size from processes order by resident_size desc limit 10; 47 | ``` 48 | 49 | ### Find all running processes 50 | ``` 51 | osquery> select * from processes; 52 | ``` 53 | 54 | ### Find the process count and name for top 10 active processes 55 | ``` 56 | osquery> select count(pid) as total, name from processes group by name order by total desc limit 10; 57 | ``` 58 | 59 | ### Find any listening ports/addresses that are not as per organization policy 60 | ``` 61 | osquery> select distinct process.name, listening.port, listening.address, process.pid from processes as process JOIN listening_ports as listening ON process.pid = listening.pid; 62 | ```
63 | ### Attackers often delete malicious binary files after running them in the system. Find all such processes with no corresponding disk file (valid file path) 64 | ``` 65 | osquery> select name,path,pid from processes where on_disk=0; 66 | ``` 67 | 68 | ### Find any malware reverse shell 69 | ``` 70 | osquery> select * from processes where cmdline like '%bash -i >& /dev/tcp%'; 71 | ``` 72 | ### Arp spoofing attack 73 | ``` 74 | osquery> select * from ( select count(1) as mac_count, mac from arp_cache group by mac) where mac_count>1; 75 | ``` 76 | ### Watch a process with strict RSS limits 77 | ``` 78 | osquery> SELECT i.pid, i.version, p.resident_size, p.user_time, p.system_time, uptime.total_seconds FROM osquery_info i, processes p, uptime WHERE p.pid = i.pid; 79 | ``` 80 | 81 | #### Ref: 82 | * OSQuery to monitor linux - https://linoxide.com/monitoring-2/setup-osquery-monitor-security-threat-ubuntu/ 83 | 84 | -------------------------------------------------------------------------------- /pandas scaling.md: -------------------------------------------------------------------------------- 1 | ### Scaling in Pandas (Pre-processing of data values) 2 | #### Standard scaler 3 | The StandardScaler assumes your data is normally distributed within each feature and will scale the values such that the distribution is now centred around 0, with a standard deviation of 1. 4 | If the data is not normally distributed, this is not the best scaler to use. 5 | 6 | #### Min-Max scaler 7 | It essentially shrinks the range such that it is now between 0 and 1 (or -1 to 1 if there are negative values). This scaler works better for cases in which the standard scaler might not work so well: if the distribution is not Gaussian or the standard deviation is very small, the min-max scaler works better. However, it is sensitive to outliers, so if there are outliers in the data, you might want to consider the Robust Scaler.
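The outlier sensitivity mentioned above is easy to see by applying the min-max formula x' = (x - min) / (max - min) by hand. The sketch below uses plain Python instead of scikit-learn so the arithmetic stays visible; the data values are made up.

```python
data = [1.0, 2.0, 3.0, 4.0, 100.0]  # last value is an outlier

# Min-max scaling: x' = (x - min) / (max - min), result in [0, 1]
lo, hi = min(data), max(data)
scaled = [(x - lo) / (hi - lo) for x in data]

print([round(v, 3) for v in scaled])  # [0.0, 0.01, 0.02, 0.03, 1.0]
```

The single outlier squashes the other four values into the bottom few percent of the range, which is exactly the failure mode that motivates switching to an interquartile-range based scaler.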
8 | 9 | #### Robust scaler 10 | The RobustScaler uses a method similar to the Min-Max scaler but it instead uses the interquartile range, rather than the min-max, so that it is robust to outliers. Of course this means it uses less of the data for scaling, so it's more suitable for when there are outliers in the data. 11 | 12 | #### Normalizer 13 | The normalizer scales each sample by dividing each value by its magnitude in n-dimensional space for n number of features. 14 | 15 | * Ref link - http://benalexkeen.com/feature-scaling-with-scikit-learn/ 16 | -------------------------------------------------------------------------------- /quantum-notes.md: -------------------------------------------------------------------------------- 1 | ### Notes 2 | Microsoft has released a preview version of the Quantum Development Kit with a new language - Q#. Simulations can be done locally or on the Azure cloud platform. The platform offers rich libraries and code samples. 3 | 4 | Quantum systems are highly susceptible to decoherence. The states of quantum bits are quickly randomized by interference from the environment. The Q-CTRL toolkit helps teams design and deploy control for their quantum hardware to suppress these errors. 5 | 6 | Google has released Cirq, an open source software toolkit that lets developers create algorithms without needing a background in quantum physics. Google has also released OpenFermion-Cirq for the creation of 7 | algorithms that simulate molecules and properties of materials.
8 | 9 | ### Quantum algorithms and their implementation using Qiskit 10 | * Introduction to Coding Quantum Algorithms - https://arxiv.org/pdf/1903.04359.pdf 11 | * Fundamentals in quantum algorithms - https://arxiv.org/pdf/2008.10647.pdf 12 | 13 | * Quantum implementation of the Shor code on multiple simulator platforms - 14 | * https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11167/111670B/Quantum-implementation-of-the-Shor-code-on-multiple-simulator-platforms/10.1117/12.2532539.full?SSO=1 15 | * https://www.spiedigitallibrary.org/proceedings/Download?fullDOI=10.1117/12.2532539 16 | 17 | * Prototype Container-Based Platform for Extreme Quantum Computing Algorithm Development - https://ieeexplore.ieee.org/document/8916430 18 | * Comparison of quantum computing platforms through quantum algorithm implementations - http://csis.pace.edu/~ctappert/srd/a12.pdf 19 | * Gate implementation and cancer detection with quantum computing - http://reports.ias.ac.in/report/19342/gate-implementation-and-cancer-detection-with-quantum-computing 20 | * Assessment of IBM-Q computer and its software environment - http://dice.cyfronet.pl/publications/source/MSc_theses/ZuzannaChrzastek-MSc-Thesis-June-2018.pdf 21 | * Introduction to quantum computing - https://cerfacs.fr/wp-content/uploads/2018/09/CSG_Suau-final_report.pdf 22 | * Quantum computing tutorial - https://pythonprogramming.net/qubits-gates-quantum-computer-programming-tutorial/ 23 | 24 | 25 | ### Shor's algorithm implementation 26 | * Implementation of Shor's algorithm using Qiskit - https://github.com/ttlion/ShorAlgQiskit 27 | * Quantum computing examples using QISKit - https://github.com/mrtkp9993/QuantumComputingExamples 28 | * Source code of MPI programs for simulating quantum algorithms and its post-processing - https://github.com/ekera/qunundrum 29 | * Complexity analysis for Shor's algorithm - https://github.com/pkaran57/quantum-computing-final-project 30 | * An implementation of Shor's Quantum Algorithm with
sequential QFT - https://github.com/nikoSchoinas/ShorsQuantumAlgorithm 31 | * Implementing Shor's algorithm in Cirq - https://github.com/dmitrifried/Shors-Algorithm-in-Cirq 32 | 33 | ### Some interesting links 34 | * Simulation of a 45-qubit quantum circuit - https://arxiv.org/pdf/1704.01127.pdf 35 | * Quantum circuit analyzer tool - https://iopscience.iop.org/article/10.1088/1367-2630/ab60f6#references 36 | * Quantum computing presentations: 37 | * https://appliedtech.iit.edu/sites/sat/files/pdfs/ITM/Quantum%20Computiing.pdf 38 | * https://www1.icts.res.in/admin/wysiwyg_editor/downloads/1_Monday/1_Ronald_de_Wolf/qip07.pdf#page=69&zoom=auto,531,-171 39 | * Analysis and implementation of quantum computing algorithms - https://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1027&context=soars 40 | * Quantum algorithm implementations for beginners - https://arxiv.org/pdf/1804.03719.pdf 41 | * Introduction to quantum algorithms - https://people.cs.umass.edu/~strubell/doc/quantum_tutorial.pdf 42 | * QuEST and high performance simulation of quantum computers - https://europepmc.org/backend/ptpmcrender.fcgi?accid=PMC6656884&blobtype=pdf 43 | 44 | * Shor's algorithm - https://github.com/mett29/Shor-s-Algorithm 45 | * Teach me quantum - https://github.com/msramalho/Teach-Me-Quantum 46 | * Awesome quantum computing - https://github.com/krishnakumarsekar/awesome-quantum-machine-learning 47 | * https://github.com/zommiommy/quantum_research 48 | 49 | 50 | * Introduction to quantum computing - https://medium.com/qc-applied-approach-to-build-your-own-quantum/introduction-to-quantum-computing-a5af5127de0d 51 | * Introduction to quantum logical gates - https://medium.com/qc-applied-approach-to-build-your-own-quantum/introduction-to-quantum-logical-gates-part-i-80f95fa851a2 52 | * Intel quantum simulator - https://arxiv.org/pdf/2001.10554v1.pdf 53 | 54 | ### Shor's Algorithm for factoring large integers 55 | 56 | * https://github.com/lialkaas/qiskit-shors 57 | *
https://github.com/toddwildey/shors-python 58 | 59 | * Distributed Memory Techniques for Classical Simulation of Quantum Circuits - https://www.groundai.com/project/distributed-memory-techniques-for-classical-simulation-of-quantum-circuits/1 60 | * Intel quantum simulator - https://arxiv.org/pdf/2001.10554v1.pdf 61 | * QuEST and High Performance Simulation of Quantum Computers - https://europepmc.org/article/pmc/pmc6656884 62 | * 0.5 Petabyte Simulation of a 45-Qubit Quantum Circuit - https://arxiv.org/pdf/1704.01127.pdf 63 | 64 | 65 | 66 | -------------------------------------------------------------------------------- /replace-linux-on-smartphone.md: -------------------------------------------------------------------------------- 1 | ## Replacing Android with Linux on smartphone 2 | Linux can support almost any computer hardware, and Android smartphone hardware is no different. It's possible to run Linux on an Android smartphone if you wish. 3 | 4 | As you know, the core of Android is Linux - i.e. the kernel used in Android is based on the Linux kernel. Even though Android uses the Linux kernel, it does not come with the other software of a Linux distribution. 5 | Android does not run typical Linux applications, as it uses the Dalvik virtual machine to run applications written in Java. Android apps are specifically programmed to work on Android devices. So, Android is a bit different from Linux! 6 | 7 | ### Why Linux on smartphone 8 | Though Android is open source, many people do not consider it truly open source due to the presence of proprietary software. This software makes Android a less privacy-focused OS. 9 | 10 | Linux offers a completely open source OS, so we can use our smartphones without any proprietary software. This helps keep our data private and improves privacy. Further, Linux is considered more secure than Android, so installing Linux on a smartphone will make our devices more secure.
11 | 12 | Linux has good support for older hardware, which is beneficial for smartphones. Usually, smartphones get software updates for only 3-4 years after their initial release, but deployment of Linux enables long-term software updates for smartphones, up to ten years! This will increase the life span of smartphones and result in substantial cost savings. 13 | 14 | Some Linux distributions have support for Android apps, and this will give users a plethora of app choices (Android and Linux based apps combined). So Linux can become a nice alternative to Android. 15 | 16 | Ref: 17 | * https://lotoftech.com/is-it-possible-to-replace-android-with-linux-on-a-smartphone/ 18 | * https://blog.mobian-project.org/posts/2021/01/15/mobian-community-edition/ 19 | * https://www.ubuntupit.com/top-20-best-linux-voip-and-video-chat-software/ 20 | -------------------------------------------------------------------------------- /sandbox-drawbacks.md: -------------------------------------------------------------------------------- 1 | 2 | ### Why Sandboxing is not enough 3 | * It's hard to have a generic sandbox configuration that works with all kinds of malware. e.g. it's possible that a malware sample sleeps for 6 hours after infection, while in a sandbox you are running most samples for 4 | up to 5 minutes only; as a result, not all samples will get caught. So, using a sandbox is not a silver bullet. It is to be remembered that each tool has a purpose and you have to see the solution 5 | as a means of achieving your aim. 6 | 7 | * Some malware makes heavy use of techniques that allow it to detect the environment it is running in. Such malware has anti-VM techniques built into the malware itself, and as a result it will not run when it encounters a sandbox. 8 | This again emphasizes the fact that sandbox-based malware analysis cannot guarantee 100% results.
9 | 10 | * It is to be remembered that hackers often submit samples to public sandboxes to test their detection rate. A very low detection rate is a good sign for the hacker and gives him confidence that a variant of the actual sample can be used for an attack. 11 | 12 | So, it is recommended to supplement sandbox investigation with other options. This includes static analysis using IDA or another round of 13 | dynamic analysis using a debugger. 14 | 15 | In summary, you have to use the available information and resources intelligently to do malware analysis efficiently using a combination of open source and/or commercial tools. 16 | -------------------------------------------------------------------------------- /scap-security-compliance.md: -------------------------------------------------------------------------------- 1 | ### Security Content Automation Protocol (SCAP) based security compliance 2 | 3 | In the present world, security compliance is a must in many industries like finance, health and pharma, and it has become a legal requirement in many cases. Regulatory standards like PCI-DSS, BITS, HIPAA and ISO 27001 prescribe security recommendations for protecting data and improving information security management in the organization. By fulfilling the requirements of security compliance, you can mitigate many network and/or web application security attacks and achieve the specific IT security goals that protect an organization's reputation. 4 | 5 | On one hand, organizations are confronted with increasing audit and security compliance obligations and increased privacy concerns, while on the other hand, the use of cloud services, mobile ubiquity, BYOD and other mechanisms has made achieving security compliance more complex. Further, since each security standard involves an evolving set of specific requirements, achieving security compliance has become a complicated and costly affair.
In order to gain protection from liabilities in case of a security breach, organizations are spending large amounts of time and money on regulatory compliance efforts. 6 | 7 | Improving security posture is never an easy journey and there will be many hurdles in implementation; but with common security terminology and standardized tools like SCAP, it's achievable. 8 | 9 | The Security Content Automation Protocol (SCAP) allows guidance documents like CIS benchmarks to be expressed in an open and machine-readable form. SCAP validation allows users to draw conclusions about their organization's security posture in a complex environment. SCAP allows machine processing of raw security data - e.g. naming of security flaws, tests for the presence of flaws, and the status of configuration options. This provides the potential for dramatically better, more automated security posture determination for the organization. The same assessment would take much more time if done manually. 10 | 11 | Typically, SCAP-based validation is done in the following way: 12 | * Scan the system against open cybersecurity standards 13 | * Calculate a score to evaluate security posture 14 | * Interoperate with other SCAP-validated scanners to present results in a standard way 15 | 16 | SCAP community discussions are based on deep analysis of technology and field testing of operational systems. Combined with threat information from the security community, SCAP guidance presents the best translation of vulnerability knowledge into the language of system administrators - e.g. CIS benchmarks translate threat knowledge into system configurations that will prevent the spread of many attack vectors. 17 | 18 | In spite of this, many organizations find it difficult to maintain compliance due to lack of resources and expertise.
19 | But, it is necessary to pursue compliance, as it will result in: 20 | * Reduced cyber risk, by following best practice guides and expediting the overall compliance process 21 | * Fulfilment of information security reporting and auditing requirements 22 | -------------------------------------------------------------------------------- /scoring classification.md: -------------------------------------------------------------------------------- 1 | ## ML Scoring classification 2 | 3 | * Precision is the ratio of correctly predicted positive observations to total predicted positive observations. 4 | Precision = TP / (TP + FP) 5 | 6 | * Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. 7 | Recall = TP / (TP + FN) 8 | 9 | * F1 score is the harmonic mean of precision and recall. This takes into account both false positives and false negatives. 10 | F1 = 2 * (recall * precision) / (recall + precision) 11 | 12 | #### Interesting links 13 | * Scoring classifier models - http://benalexkeen.com/scoring-classifier-models-using-scikit-learn/ 14 | * ROC plots - https://go2analytics.wordpress.com/2016/07/26/implement-classification-in-python-and-roc-plotting-svc-example/ 15 | -------------------------------------------------------------------------------- /security-guidance.md: -------------------------------------------------------------------------------- 1 | ## Block attachments in mail gateway 2 | You should block the following attachment types in the mail gateway: 3 | ``` 4 | .ADE, .ADP, .APK, .BAT, .CHM, .CMD, .COM, .CPL, .DLL, .DMG, .EXE, .HTA, .INS, .ISP, .JAR, .JS, .JSE, .LIB, .LNK, .MDE, .MSC, .MSI, .MSP, .MST, .NSH, .PIF, .SCR, .SCT, .SHB, .SYS, .VB, .VBE, .VBS, .VXD, .WSC, .WSF, .WSH, .CAB 5 | ``` 6 | Ref link - https://support.google.com/mail/answer/6590?hl=en 7 | 8 | ## Improve effectiveness of ClamAV 9 | To improve the effectiveness of ClamAV, include signatures from Sanesecurity.com and Securiteinfo.com 10 | 11 | #### Interesting links: 12 | *
http://sanesecurity.com/usage/signatures/ 13 | * https://www.securiteinfo.com/services/improve-detection-rate-of-zero-day-malwares-for-clamav.shtml 14 | * https://portal.smartertools.com/community/a2583/how-to-greatly-improve-clamav-even-zero-hour-style-protection-for-free.aspx 15 | * https://portal.smartertools.com/community/a90798/are-clamav-cryen-basically-useless.aspx 16 | * Virus statistics on a yearly basis - http://www.shadowserver.org/wiki/pmwiki.php/AV/VirusYearlyStats 17 | -------------------------------------------------------------------------------- /security-testing.md: -------------------------------------------------------------------------------- 1 | ## Security testing of applications 2 | 3 | 90% of security incidents result from attackers exploiting known software bugs. If you can eliminate bugs in the development phase of software, you can reduce the information security risks facing many organizations. 4 | The following techniques are most commonly used for security testing of applications. 5 | 6 | #### Static Application Security Testing (SAST) 7 | It checks whether the code conforms to guidelines and standards. SAST does not find runtime errors. SAST can be easily automated and integrated into a project's workflow. 8 | 9 | #### Dynamic Application Security Testing (DAST) 10 | It is also known as blackbox testing and is used for finding vulnerabilities in web applications. DAST also allows you to identify flaws in authentication and configuration issues. DAST does not flag coding errors. 11 | 12 | #### Hybrid (SAST and DAST) 13 | Often SAST and DAST are used in tandem to improve coverage. 14 | 15 | #### Interactive Application Security Testing (IAST) 16 | SAST and DAST are older technologies and cannot fully handle modern web and mobile applications wherein extensive AJAX and other interactive technologies are used; IAST addresses this by instrumenting the running application to observe its behaviour during testing. 17 | 18 | #### Run-time Application Security Protection (RASP) 19 | RASP works inside the application and is more of a security tool.
It is plugged into the application and controls application execution. RASP lets the application run continuous security checks on itself and respond to live attacks by terminating the attacker's session and alerting the defender to the attack. 20 | -------------------------------------------------------------------------------- /signs-of-compromise.md: -------------------------------------------------------------------------------- 1 | ### Use cases for signs of compromise 2 | #### Network artifacts 3 | * Unusual DNS queries 4 | * High or low volume port scanning 5 | * DNS tunneling and zone transfers 6 | * Low volume periodic command and control traffic 7 | * Unusual HTTP headers 8 | * Unknown IoT devices 9 | * Unusual RDP traffic 10 | * Unusual user agent strings 11 | * Detection of Tor exit node addresses 12 | * Traffic to and from unknown geographic locations 13 | 14 | #### Host artifacts 15 | * Unknown running service(s) 16 | * Unknown running programs 17 | * Unusual startup locations for known programs 18 | * Unusual network connections for a program 19 | * Sudden appearance of advertisements 20 | * Slow system response 21 | 22 | * Spot unknown malware, zero-days and rogue behaviour by insiders - by leveraging baselines and known patterns of bad behaviour 23 | * Detect unusual lateral movement - look for trends in outbound communication 24 | * Uncover APTs - uncover hidden patterns in network traffic to unusual geographic locations based on time, frequency and contextual information 25 | 26 | -------------------------------------------------------------------------------- /source port 0 traffic: -------------------------------------------------------------------------------- 1 | ## Traffic on source port 0 in NetFlow 2 | 3 | NetFlow will separate TCP communications longer than 5 minutes into separate flows, which can be identified because the source port is '0'.
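When sifting exported flow records, these port-0 entries are easy to surface programmatically. A minimal Python sketch, assuming records already parsed into dicts - the field names (`src_ip`, `dst_ip`, `src_port`, `proto`) are illustrative and not tied to any particular NetFlow exporter:

```python
# Sketch: flag flow records whose source port is 0, which (per the note
# above) can indicate a long TCP session split into multiple flow records.
# All field names and sample data here are made up for illustration.

def port_zero_flows(records):
    """Return port-0 records grouped by their endpoints and protocol."""
    flagged = {}
    for r in records:
        if r["src_port"] == 0:
            key = (r["src_ip"], r["dst_ip"], r["proto"])
            flagged.setdefault(key, []).append(r)
    return flagged

flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 44123, "dst_port": 80, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 0, "dst_port": 80, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 0, "dst_port": 80, "proto": "tcp"},
]

suspects = port_zero_flows(flows)
for key, recs in suspects.items():
    print(key, "->", len(recs), "port-0 records")
```

Grouping by endpoints lets you eyeball whether the port-0 records line up with a long-running session (flow splitting) or look more like the fragment case described below.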
4 | 5 | In addition, packets that exceed the maximum transmission unit (MTU) size are fragmented into several packets, but only the first packet will contain a valid TCP port. The remaining fragments have no layer 4 header and thus have a destination port set to 0. 6 | 7 | IANA's Service Name and Transport Protocol Port Number Registry lists port 0 as reserved, but valid, for both TCP and UDP. Because the specification does not define behavior for connections established on those ports, attackers may use responses to fingerprint the operating systems of destination hosts. Furthermore, hackers may craft 'impossible' packets to DDoS firewalls, because some routers prevent administrators from entering port 0 in the access control list since it's supposedly impossible for traffic to be on that port. Crafting such packets requires raw socket calls that specify everything after the Ethernet header byte by byte. 8 | 9 | Ref - 10 | * The strange history of port 0 - http://www.lovemytool.com/blog/2013/08/the-strange-history-of-port-0-by-jim-macleod.html 11 | 12 | -------------------------------------------------------------------------------- /system-base-line-building.md: -------------------------------------------------------------------------------- 1 | The following commands are helpful to establish a system baseline.
2 | ## User Information 3 | #### Users 4 | ``` 5 | $ cat /etc/passwd | cut -d ":" -f 1 6 | ``` 7 | #### Uid information 8 | ``` 9 | $ cat /etc/passwd | cut -d ":" -f 3 10 | ``` 11 | #### Gid information 12 | ``` 13 | $ cat /etc/passwd | cut -d ":" -f 4 14 | ``` 15 | #### Root users 16 | ``` 17 | $ grep -v -E "^#" /etc/passwd | awk -F: '$3 == 0 { print $1 }' 18 | ``` 19 | ### Cron jobs 20 | #### Own cron jobs 21 | ``` 22 | $ crontab -l -u `whoami` 23 | ``` 24 | #### Job list (own as well as other users) 25 | ``` 26 | $ ls -la /etc/cron* 27 | $ ls -laR /etc/cron* 28 | ``` 29 | #### Spool cron jobs 30 | ``` 31 | $ sudo ls -la /var/spool/cron/crontabs 32 | ``` 33 | ### System information 34 | #### Kernel 35 | ``` 36 | $ uname -r 37 | ``` 38 | #### Hostname 39 | ``` 40 | $ uname -n 41 | ``` 42 | #### Architecture 43 | ``` 44 | $ uname -m 45 | ``` 46 | #### Shells present 47 | ``` 48 | $ cat /etc/shells | grep "bin" | cut -d "/" -f3 49 | ``` 50 | #### Environment 51 | ``` 52 | $ env 53 | ``` 54 | #### Path information 55 | ``` 56 | $ echo $PATH 57 | ``` 58 | ### Password information 59 | #### Umask 60 | ``` 61 | $ grep -i "^umask" /etc/login.defs 62 | ``` 63 | #### Password - max days 64 | ``` 65 | $ grep -i "^pass_max" /etc/login.defs 66 | ``` 67 | #### Password - min days 68 | ``` 69 | $ grep -i "^pass_min" /etc/login.defs 70 | ``` 71 | #### Password - warning days 72 | ``` 73 | $ grep -i "^pass_warn" /etc/login.defs 74 | ``` 75 | #### Password - encryption method 76 | ``` 77 | $ grep -i "^encrypt_method" /etc/login.defs 78 | ``` 79 | -------------------------------------------------------------------------------- /tap-vs-span port.md: -------------------------------------------------------------------------------- 1 | ### TAPs vs SPAN ports 2 | There are two common methods to extract traffic directly from the system: TAPs and SPANs. A network TAP is a hardware component that connects into the cabling infrastructure to copy packets for monitoring purposes.
A SPAN (Switched Port ANalyzer) is a software function of a switch or router that duplicates traffic from incoming or outgoing ports and forwards the copied traffic to a special SPAN (sometimes called mirror) port. In general, network TAPs are preferred over SPAN ports for the following reasons: 3 | 4 | * SPAN ports are easily oversubscribed and have the lowest priority when it comes to forwarding, which results in dropped packets 5 | * The SPAN application is processor-intensive and can have a negative performance impact on the switch itself, possibly affecting network traffic 6 | * Because SPAN traffic is easily reconfigured, SPAN output can change from day to day, resulting in inconsistent reporting 7 | 8 | However, there are some situations where inserting a TAP is not practical. For example, traffic could be running on a physical infrastructure outside your direct control, or maintenance windows may not allow for timely TAP deployments. Perhaps a remote location may not be able to justify a permanent TAP, but has SPAN access for occasional troubleshooting needs, since a SPAN can be added without bringing down a link. 9 | 10 | ### Passive TAPs 11 | A passive TAP requires no power of its own and does not actively interact with other components of the network. It uses an optical splitter to create a copy of the signal and is sometimes referred to as a "photonic" TAP. Most passive TAPs have no moving parts, are highly reliable and do not require configuration. 12 | 13 | ### Active TAPs 14 | Active TAPs require their own power source to regenerate the signals. There is no split ratio to consider because the TAP receives the signal and then retransmits it to both the network and monitoring destinations. From a high-level perspective this would appear to be a positive feature. Even so, passive TAPs are preferred: during a power outage, an active TAP cannot regenerate the signal, so it becomes a point of failure.
Since a passive TAP is not powered, it would be unaffected during a power outage, and the packets (originating from a source that still has power) would continue to flow. 15 | 16 | * Ref - https://www.gigamon.com/content/dam/resource-library/english/white-paper/wp-network-taps-first-step-to-visibility.pdf 17 | 18 | 19 | -------------------------------------------------------------------------------- /things-to-explore.md: -------------------------------------------------------------------------------- 1 | * Reproducible Jupyter notebooks - https://blog.reviewnb.com/reproducible-notebooks/ 2 | ### Prads - passive asset fingerprinting 3 | * prads presentation - https://www.slideshare.net/huayrass/pradsdagenatificleanen 4 | * Visualization of prads - https://www.duo.uio.no/bitstream/handle/10852/42155/Desta-Dawit-Master.pdf 5 | -------------------------------------------------------------------------------- /threat-feeds.md: -------------------------------------------------------------------------------- 1 | * Coin-miners - https://github.com/ntop/nDPI/blob/dev/example/mining_hosts.txt 2 | -------------------------------------------------------------------------------- /useful-commands.md: -------------------------------------------------------------------------------- 1 | # Useful commands 2 | ### OS statistics 3 | ``` 4 | $ ifconfig -a 5 | $ netstat -s 6 | $ netstat -ni 7 | $ vmstat -S m 1 8 | ``` 9 | ### NIC configuration with ethtool 10 | Format: 11 | Show ; Set 12 | ``` 13 | $ ethtool -S eth0 // Statistics 14 | $ ethtool -S eth0 | egrep '(rx_missed|no_buffer)' // Drop Values 15 | $ ethtool -g eth0 ; ethtool -G eth0 rx 4096 tx 4096 // FIFO RX Descriptors 16 | $ ethtool -k eth0 ; ethtool -K eth0 gro on gso on rx on // Offloading 17 | $ ethtool -a eth0 ; ethtool -A eth0 rx off autoneg off // Pause Frames 18 | $ ethtool -c eth0 ; ethtool -C eth0 rx-usecs 100 // Interrupt Coalescence 19 | ``` 20 | ### PCAP statistical data 21 | ``` 22 | $ capinfos file.pcap 23 | $ tcpslice -r
file.pcap 24 | $ tcpdstat file.pcap 25 | $ tcpprof -S lipn -P 30000 -r file.pcap 26 | ``` 27 | ### Number System Conversions 28 | ``` 29 | $ printf "%d" 0x2d 30 | $ printf "%x" 45 31 | $ printf '\x47\x45\x54\x0a' 32 | $ echo "GET" | hexdump -c 33 | $ echo "GET" | hexdump -C 34 | ``` 35 | ### Session & Flow Data 36 | ``` 37 | $ iftop -i eth0.pcap // live only, replay for same effect 38 | ``` 39 | // use -i instead of -r for interface 40 | ``` 41 | $ tcpflow -c -e -r file.pcap 'tcp and port (80 or 443)' 42 | $ tcpflow -r file.pcap tcp and port \(80 or 443\) 43 | $ tcpick -r file.pcap -C -yP -h 'port (25 or 587)' 44 | ``` 45 | // [-wRu] write both flows; [-wRC] write client flows only ; [-wRS] write server flows only 46 | ``` 47 | $ tcpick -r file.pcap -wRu 48 | ``` 49 | 50 | ### Replay 51 | ``` 52 | $ tcpreplay -M10 -i eth0 file.pcap 53 | $ netsniff-ng --in file.pcap --out eth0 54 | $ netsniff-ng --in eth0.pcap --out eth1.pcap 55 | $ trafgen --dev eth0 --conf trafgen.txf --bind-cpu 0 56 | ``` 57 | ### Audit Record Generation And Utilization System 58 | ``` 59 | $ argus -r file.pcap -w file.argus 60 | $ ra -nnr file.argus ; ra -Z b -nnr file.argus 61 | $ ra -nnr file.argus - host 192.168.1.1 and port 80 62 | $ racluster -M rmon -m saddr -r file.argus 63 | $ ra -nnr file.argus -w - - port 22 | racluster -M rmon -m saddr -r - | rasort -m bytes -r - 64 | $ racluster -M rmon -m proto -r file.argus -w - | rasort -m pkts -r - 65 | $ racluster -M rmon -m proto sport -r file.argus 66 | $ ragraph bytes -M 30s -r file.argus -w bytes.png 67 | $ ragraph pkts -M 30s -r file.argus -w pkts.png 68 | $ ra -nnr file2.argus -s saddr,daddr,loss | sort -nr -k 3 | head -20 69 | $ ragraph dbytes sbytes -M 30s -r file.argus - dst port 80 and dst port 443 70 | $ ragraph dbytes sbytes dport sport -fill -M 30s -r file.argus 71 | ``` 72 | ### Network Forensics - File Extraction 73 | ``` 74 | $ tcpdump -nni eth0 -w image.pcap port 80 & 75 | $ wget 
http://upload.wikimedia.org/wikipedia/en/5/55/Bsd_daemon.jpg 76 | $ jobs 77 | $ kill %1 78 | $ tcpflow -r image.pcap 79 | $ tcpxtract -f file.pcap -o xtract/ 80 | ``` 81 | 82 | ### Change MACs 83 | ``` 84 | $ tcprewrite --enet-dmac=00:44:66:FC:29:AF,00:55:22:AF:C6:37 85 | --enet-smac=00:66:AA:D1:32:C2,00:22:55:AC:DE:AC --infile=in.pcap 86 | --outfile=out.pcap 87 | ``` 88 | ### Randomize IPs 89 | ``` 90 | $ tcprewrite --seed=423 --infile=in.pcap --outfile=out.pcap 91 | ``` 92 | Ref - http://www.draconyx.net/talks/pcapworksheet.txt and many thanks to John Schipp. 93 | 94 | -------------------------------------------------------------------------------- /vulnerability-management.md: -------------------------------------------------------------------------------- 1 | ### Vulnerability management 2 | Vulnerabilities in software are a practical reality for IT professionals/sysadmins, and many cyberattacks can be attributed to the failure of sysadmins to identify vulnerabilities in time or to patch known vulnerabilities. The latest statistics have shown that such attacks cover more than 90% of cases. 3 | 4 | In a broad sense, vulnerabilities can be categorized as: 5 | #### Known vulnerabilities 6 | * Known to the world but unknown to the sysadmin (lack of awareness) 7 | * Known to the sysadmin but not patched in time 8 | 9 | #### Unknown vulnerabilities 10 | * Vulnerabilities not yet discovered 11 | * Vulnerabilities known to very few people (zero-day) 12 | 13 | In many cases, public exploits for these vulnerabilities are available on the internet and malicious actors/hackers will use them on a case-to-case basis. In the case of state-sponsored hackers, these exploits are specifically written for a targeted environment. 14 | 15 | So, no matter how large or small the network that you oversee, it is critical that every organization have a vulnerability management program.
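The "known to the sysadmin but not patched in time" gap is exactly what such a program should close, and even simple automation helps. A minimal Python sketch of cross-checking an installed-package inventory against advisories - both data sets below are made up for illustration; a real program would pull the inventory from a scanner and the advisories from a feed such as a CVE database:

```python
# Sketch: flag installed packages that are older than the version in which
# a known vulnerability was fixed. Package names, versions and the advisory
# data are all illustrative, not real advisories.

def parse_version(v):
    """'1.2.10' -> (1, 2, 10) so versions compare numerically, not lexically."""
    return tuple(int(x) for x in v.split("."))

def find_vulnerable(installed, advisories):
    """advisories: {package: fixed_in_version}; flag anything below the fix."""
    findings = []
    for pkg, ver in installed.items():
        fixed_in = advisories.get(pkg)
        if fixed_in and parse_version(ver) < parse_version(fixed_in):
            findings.append((pkg, ver, fixed_in))
    return findings

installed = {"openssl": "1.0.2", "nginx": "1.18.0"}
advisories = {"openssl": "1.0.3", "sudo": "1.9.5"}

for pkg, ver, fixed in find_vulnerable(installed, advisories):
    print(f"{pkg} {ver} is below the fixed version {fixed} - patch needed")
```

The numeric version parsing matters: comparing version strings lexically would rank "1.0.9" above "1.0.10".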
16 | 17 | Vulnerability management is a never-ending process and you have to be continuously proactive in identifying and mitigating existing vulnerabilities as soon as they are announced. 18 | 19 | #### Vulnerability scans 20 | You need to do periodic vulnerability scanning of all your network devices, servers, hosts and any other devices on the network, and list the products and programs installed. Further, you have to ensure that this list is up-to-date, as users/admins may install/uninstall products/programs in day-to-day operations. 21 | 22 | A good vulnerability scanner (e.g. GFI LanGuard) not only maintains an up-to-date list of installed products/programs on the network but also keeps track of underlying operating system vulnerabilities (Linux, Windows). 23 | 24 | #### Importance of patching 25 | Many cyberattacks have happened due to unpatched vulnerabilities, for reasons like an improper patch schedule, over-confident sysadmins ("it may not happen to my server") and lack of awareness of the security vulnerability. 26 | 27 | So, it is important to have a patching schedule as part of your vulnerability management policy. This schedule should allow enough time to test the patches for bugs or side-effects in staging environments, and only then should patching be applied to production machines. For zero-day or critical vulnerabilities, it is important to have a contingency plan so that patches can be applied to production systems as soon as possible to minimize the damage. 28 | -------------------------------------------------------------------------------- /web-logs-iocs.md: -------------------------------------------------------------------------------- 1 | ## Web logs analysis - Some indicators of compromise (IOCs) 2 | 3 | ### IP-level statistics: 4 | High frequency, periodicity or volume from a single IP address or subnet is suspicious.
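The IP-level check above can be sketched with a simple frequency count. A minimal Python example - the log lines and the cutoff are illustrative; in practice the threshold would be derived from the site's historical per-IP baseline:

```python
# Sketch: count requests per source IP in a combined-format access log and
# flag IPs whose volume crosses a (here made-up) threshold.
from collections import Counter

log_lines = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36] "GET /login HTTP/1.1" 200',
    '203.0.113.7 - - [10/Oct/2023:13:55:37] "GET /admin HTTP/1.1" 404',
    '203.0.113.7 - - [10/Oct/2023:13:55:38] "GET /backup HTTP/1.1" 404',
    '198.51.100.2 - - [10/Oct/2023:13:56:01] "GET /index HTTP/1.1" 200',
]

# The source IP is the first whitespace-separated field of each line.
hits = Counter(line.split()[0] for line in log_lines)

threshold = 3  # illustrative cutoff; tune against normal per-IP volume
noisy = [ip for ip, n in hits.items() if n >= threshold]
print(noisy)  # -> ['203.0.113.7']
```

The same counting approach extends to the periodicity check by bucketing each IP's timestamps and looking for regular inter-request intervals.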
5 | 6 | ### URL string abbreviations: 7 | Self-referencing paths (/./) or backreferences (/../) are used in path traversal attacks 8 | 9 | ### Decoded URLs and HTML entities, escaped characters, null byte string termination 10 | These are used to evade simple signature/rule engines 11 | 12 | ### Unusual referrer patterns: 13 | Page accesses with an abnormal referrer URL are often a signal of unwelcome access to an HTTP endpoint 14 | 15 | ### Sequence of accesses to endpoints: 16 | Out-of-order access to HTTP endpoints that does not correspond to the website's logical flow is indicative of fuzzing or malicious exploration. e.g. typically a user accesses the website by logging in (POST to /login) followed by three successive GETs to /a, /b and /c. If a particular IP repeatedly makes GET requests to /b and /c without a corresponding login or /a request, this could be a sign of bot automation or manual reconnaissance activity. 17 | 18 | ### User agent patterns 19 | Perform frequency analysis on user agent strings to alert on unusual user agents or extremely old clients. 20 | 21 | Web logs provide enough information about the different OWASP Top Ten web application attacks. 22 | 23 | Good book - Machine Learning and Security: Protecting Systems with Data and Algorithms - Clarence Chio and David Freeman, O'Reilly 24 | 25 | -------------------------------------------------------------------------------- /weekly-report-template.md: -------------------------------------------------------------------------------- 1 | ## Weekly report template 2 | 3 | ### Objective 4 | * Describe the short-term objective(s) of the project, keeping in mind the big picture (targets of the project) 5 | * If required, add long-term objectives to achieve the final goal. 6 | 7 | ### Work you did 8 | * Explain the steps/work you did to achieve your goal for this week.
9 | 10 | ### Remarks 11 | * Summarize your weekly activities 12 | * What you have learned or gained 13 | * Describe the work you liked 14 | 15 | ### Follow up 16 | * Identify the activities that you plan to take up in the next or coming week 17 | 18 | ### Meetings 19 | * List the meetings that you attended, their purpose and any specific contributions from your side. Also include any task(s) assigned to you. 20 | * Any follow-up action required. 21 | 22 | -------------------------------------------------------------------------------- /why-time-series-databases.md: -------------------------------------------------------------------------------- 1 | Relational databases offer SQL, which is far better than key-value or other lower-level ways to manipulate big data sets. However, SQL's expressive power is very limited in the time-series domain. Relational tables grow "downwards" by adding rows, and for that SQL is reasonably expressive and fast. But time-series data is different - typically a row consists of a primary key and a series of other attributes, following a "wide rows" model. SQL's row-oriented model does not fit time-series data well. You essentially end up building an entity-attribute-value model, which is inefficient, contains tons of repeated data and is difficult to query. 2 | 3 | Secondly, the size of the time-series data that you are dealing with is huge - millions of entries. Single-node databases have only limited capacity, and to run a production time-series service you need a distributed database. RRD files are not a good foundation for building this type of system. People have tried to build time-series databases on top of NoSQL databases; popular examples are OpenTSDB and KairosDB. But the problem with these solutions is getting experienced people to run the DBs. Also, it's hard to get good read performance from these DBs. That's why people have now turned their attention to native time-series databases.
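The wide-row point can be made concrete with plain Python structures (purely illustrative): the relational entity-attribute-value layout repeats the series key on every sample, while the wide-row layout stores it once per series.

```python
# Sketch: the same three samples laid out two ways. Note how the series key
# rides along on every EAV row but appears once in the wide row - the
# repetition that makes the relational layout inefficient for time series.

samples = [(1000, 0.5), (1010, 0.6), (1020, 0.4)]  # (timestamp, value) pairs

# Relational/EAV layout: one row per sample, key repeated each time.
eav_rows = [("cpu.load.host1", ts, val) for ts, val in samples]

# Wide-row layout: one key, with samples appended "sideways".
wide_row = {"key": "cpu.load.host1", "points": samples}

print(len(eav_rows), "EAV rows vs 1 wide row holding", len(wide_row["points"]), "points")
```

With millions of samples per series, that repeated key (plus per-row storage overhead) is what the text means by "tons of repeated data".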
4 | 5 | Unique characteristics of time-series data include append-mostly writes, rare updates, sequential reads and occasional bulk deletes. The datastore needs to be optimized for all of these. Good examples of time-series databases are OpenTSDB and InfluxDB. In addition, you may also have to look at other parameters like volume of data, flexibility of storage, horizontal scalability, high availability etc. 6 | --------------------------------------------------------------------------------