├── ML-notes.md ├── README.md ├── TF-IDF.md ├── Tricks-improve-windows-PC-performance.md ├── anonymous-presence-on-internet.md ├── big-data-topics.md ├── block-email-attachment.md ├── blue-team-tips.md ├── bookmarks.md ├── bro_conn_history.md ├── bro_conn_states.md ├── building-word-list.md ├── critical-infra-security.md ├── cyber-security.md ├── detect-compromised-linux-machine.md ├── dns.md ├── encrypted-traffic-fingerprinting.md ├── full-text-search.md ├── icmp-codes.md ├── interview-fun.md ├── linux-auth-log.md ├── linux-forensics.md ├── log-files-and-journalctl.md ├── logs vs metrics.md ├── machine learning terms ├── malware-detection-methods.md ├── malwares.md ├── netflow traffic classification use cases.md ├── network-security-monitoring.md ├── nmap-nse-scripts.md ├── osquery-threat-hunting.md ├── pandas scaling.md ├── quantum-notes.md ├── replace-linux-on-smartphone.md ├── sandbox-drawbacks.md ├── scap-security-compliance.md ├── scoring classification.md ├── security-guidance.md ├── security-testing.md ├── signs-of-compromise.md ├── source port 0 traffic ├── system-base-line-building.md ├── tap-vs-span port.md ├── things-to-explore.md ├── threat-feeds.md ├── useful-commands.md ├── vulnerability-management.md ├── web-logs-iocs.md ├── weekly-report-template.md └── why-time-series-databases.md /ML-notes.md: -------------------------------------------------------------------------------- 1 | ## Machine learning vs traditional programming 2 | 3 | Artificial intelligence is an umbrella term that contains other realms like image processing, cognitive science, neural networks and much more. 4 | The core idea is that the computer does not just execute a pre-written algorithm, but learns how to solve the problem itself. 5 | 6 | Arthur Samuel's definition - ML is a field of study that gives computers the ability to learn without being explicitly programmed. 7 | 8 | In traditional programming you hard-code the behaviour of the program.
In machine learning, you leave much of that to the machine, which learns from data. 9 | 10 | ML is used in cases where the traditional programming strategy falls behind and is not enough to fully implement a certain task, e.g. prediction of currency prices. The price depends on many factors like country, location, its image, GDP etc. To improve accuracy, you may need many parameters. If you write your program logic with fixed parameters, your algorithm's accuracy will not grow. 11 | So, instead of hand-crafting the algorithm, you need to collect historical data that can be used for model building. 12 | The end result is a model that can predict the result more accurately. 13 | 14 | Ref - https://towardsdatascience.com/machine-learning-vs-traditional-programming-c066e39b5b17 15 | 16 | ## Data science vs Machine learning 17 | ML and statistics are part of data science. The word 'learning' means it depends on some kind of data. This encompasses many techniques such as regression, naive bayes or clustering. But not all techniques fit in this category, e.g. unsupervised clustering - clusters are formed without any prior knowledge, and humans label the clusters afterwards. 18 | 19 | Data science is more than ML. Data in data science may not come from a machine or a mechanical process. It encompasses many aspects of data processing, not just the algorithmic or statistical aspect - data integration, data visualization, data engineering, data in production mode, data-driven decisions etc. 20 | 21 | ### Machine learning vs rule based learning 22 | ML and rule-based systems are widely used to make inferences from data. Forget the hype about ML; rule-based systems still have a place in system design. 23 | Rule-based systems are a simple kind of artificial intelligence which use a series of IF-THEN-ELSE statements to guide the computer to a conclusion. 24 | A rule-based system has a set of facts and a set of rules. 25 | 26 | Set-of-facts: It is the knowledge base. It's used for the formation of rules.
27 | Set-of-rules: It's the rule engine. Rules describe the 28 | relationship between IF and THEN statements. 29 | 30 | Full rule-based systems are built from the combined knowledge of human experts in the problem domain. The domain experts specify all the steps to make a decision and how to handle special cases. The number of special rule cases may grow over a period of time! 31 | 32 | In machine learning, instead of emulating the decision-making process of an expert, you take outcomes from experts. Focussing on outcomes (rather than on the decision-making process) makes machine learning more flexible and less susceptible to the problems of rule-based systems. 33 | ML also uses probabilistic and statistical methods rather than rules. The basic motto is - given what we know about historical outcomes, what can we say about future outcomes? 34 | Ref - https://deparkes.co.uk/2017/11/24/machine-learning-vs-rules-systems/ 35 | 36 | ### ML vs Deep learning 37 | The main difference between deep and machine learning is that machine learning models become better progressively but the model still needs some guidance. If a machine learning model returns an inaccurate prediction, the programmer needs to fix that problem explicitly, but in the case of deep learning, the model does it by itself. Self-driving cars are a good example of deep learning. 38 | 39 | Deep learning and machine learning are both subsets of AI. 40 | “AI is the ability of a computer program to function like a human brain.” 41 | 42 | Machine learning is empowering computer systems with the ability to “learn”. The intention of ML is to enable machines to learn by themselves using the provided data and make accurate predictions. ML is a subset of artificial intelligence; in fact, it’s simply a technique for realizing AI.
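The rule-based vs ML contrast described earlier can be sketched with a toy example (the spam heuristics, sample data and threshold-fitting below are made up for illustration, not a real system):

```python
# Rule-based: a human expert hard-codes the decision logic as IF-THEN rules.
def rule_based_is_spam(msg: str) -> bool:
    if "free money" in msg.lower():
        return True
    elif msg.isupper():        # shouting in all caps
        return True
    else:
        return False

# ML-style: instead of encoding the expert's decision process, learn a
# decision boundary from historical outcomes labeled by the expert.
def fit_threshold(samples):
    # samples: [(caps_ratio, is_spam), ...] - historical outcomes
    spam = [r for r, y in samples if y]
    ham = [r for r, y in samples if not y]
    return (min(spam) + max(ham)) / 2   # midpoint separating the two classes

history = [(0.9, True), (0.8, True), (0.2, False), (0.1, False)]
threshold = fit_threshold(history)      # learned from data, not hand-written

def learned_is_spam(caps_ratio: float) -> bool:
    return caps_ratio > threshold
```

Adding a special case to the rule-based version means editing code; adding one to the learned version just means adding another labeled outcome to `history`.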
43 | Ref - 44 | * https://towardsdatascience.com/clearing-the-confusion-ai-vs-machine-learning-vs-deep-learning-differences-fce69b21d5eb 45 | * https://emerj.com/ai-glossary-terms/what-is-machine-learning/ 46 | 47 | 48 | ## Deep learning disadvantages 49 | * It does not work well with small data. For high accuracy, you need large datasets. 50 | * In practice, it is hard and expensive. You need computing and data resources along with human expertise. 51 | * Deep learning is not easily interpreted. It's difficult to validate. Hyper-parameters and network design are also a challenge due to the absence of a theoretical foundation. 52 | 53 | ## Disadvantages of ML 54 | * ML requires massive data sets to train, and these should be unbiased and of good quality. So, you have to wait for the data to be generated. 55 | * ML requires enough time to let the algorithm learn and achieve sufficient accuracy and relevancy. So, it needs massive computing and storage resources. 56 | * Selection of the right algorithm and interpretation of results is a major challenge. 57 | * ML is autonomous and is susceptible to errors. If you train your algorithm on small datasets, it may end up making biased predictions. 58 | 59 | Ref - https://data-flair.training/blogs/advantages-and-disadvantages-of-machine-learning/ 60 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Notes 2 | My notes on various topics in Cyber security 3 | -------------------------------------------------------------------------------- /TF-IDF.md: -------------------------------------------------------------------------------- 1 | ## TF-IDF 2 | 3 | TF-IDF is a method to generate features from text by multiplying the frequency of a term (usually a word) in a document (the Term Frequency, or TF) by the importance (the Inverse Document Frequency, or IDF) of the same term in an entire corpus.
This last term weights down less important words (e.g. the, it, and) and weights up words that don’t occur frequently. 4 | 5 | IDF is calculated as: 6 | 7 | IDF(t) = log_e(Total number of documents / Number of documents with term t in it). 8 | 9 | An example (from www.tfidf.com/) illustrates the concept nicely: 10 | 11 | Consider a document containing 100 words in which the word cat appears 3 times. The term frequency (i.e., tf) for cat is then (3 / 100) = 0.03. Now, assume we have 10 million documents and the word cat appears in one thousand of these. Then, the inverse document frequency (i.e., idf) is calculated as log(10,000,000 / 1,000) = 4. Thus, the Tf-idf weight is the product of these quantities: 0.03 * 4 = 0.12. 12 | 13 | TF-IDF is very useful in text classification and text clustering. It is used to transform documents into numeric vectors that can easily be compared. 14 | 15 | For string matching, algorithms like the Jaro-Winkler or Levenshtein distance measures are commonly used. However, these algorithms are not suitable for finding string similarities in large datasets because they are slow. TF-IDF over N-grams can be used to find similar strings instead, as it turns the problem into a matrix multiplication, which is computationally much cheaper. 16 | -------------------------------------------------------------------------------- /Tricks-improve-windows-PC-performance.md: -------------------------------------------------------------------------------- 1 | ## Some tips to improve Windows PC Performance 2 | 3 | ### High CPU or Disk usage caused by ntoskrnl.exe process in Windows 10 4 | 5 | The Windows NT kernel is responsible for managing various services like memory management, process management, hardware resource management etc. Sometimes this process utilizes too much CPU/disk and causes the computer to slow down, especially at startup. 6 | A minor registry tweak can solve the high CPU/disk utilization issue.
For this, you have to open the registry editor (open "Run" -> "Regedit") and modify the following registry setting: 7 | 8 | 1. Go to section - HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management 9 | 2. Change/modify the key value of "ClearPageFileAtShutdown" from 0 to 1. 10 | 11 | ### System File Checker (SFC) 12 | System File Checker (SFC) is a utility in Windows that allows users to scan for corruptions in Windows system files and restore corrupted files. To run it, open a command prompt ("Run" -> "cmd") and type 13 | ``` 14 | C:\Windows\system32> sfc /scannow 15 | ``` 16 | Note - the sfc program should be run in administrative mode. 17 | 18 | ### Deployment Image Servicing and Management (DISM) 19 | A DISM scan can be used to repair and prepare Windows images, including the Windows Recovery Environment, Windows Setup, and Windows PE. To run a DISM scan, open Command Prompt as administrator and type this command: "DISM /Online /Cleanup-Image /RestoreHealth". Press Enter on the keyboard to execute it. 20 | ``` 21 | C:\Windows\system32> DISM /Online /Cleanup-Image /RestoreHealth 22 | ``` 23 | 24 | * Ref - https://blog.pcrisk.com/windows/12536-ntoskrnlexe-process-is-causing-high-cpu-or-disk-usage-how-to-fix-it 25 | 26 | -------------------------------------------------------------------------------- /anonymous-presence-on-internet.md: -------------------------------------------------------------------------------- 1 | ### Keeping anonymous on Internet 2 | * TOR is the proven de-facto service to provide general anonymity 3 | * Various free/commercial VPN services also provide good privacy 4 | * There are some peer-to-peer networks like i2p and freenet in development. You have to
6 | * Virtual machine(s) - in one's own environment to provide security and in some way privacy; 7 | depends on the use 8 | * Use of proxychains ( You can use them with Kali VM) - yes, these are not 9 | completely anonymous; but the traffic will be tunneled through various routes 10 | * Tails operating system is a good option as it uses Tor in the background: available as easy to setup VM and deletes all the user activity after you close the VM. 11 | 12 | -------------------------------------------------------------------------------- /big-data-topics.md: -------------------------------------------------------------------------------- 1 | ## MapReduce 2 | Mapreduce(MR) is the computing paradigm used in Hadoop cluster for parallel processing of large datasets.Its hypothesis is designed by google to achieve: 3 | * parallel execution 4 | * data distribution 5 | * fault tolerance 6 | MR processes data in the form of key-value pairs. A key-value(KV) pair is a mapping element between the linked data items - key and its value. 7 | Mapreduce architecture consists of two stages- map stage and reduce stage ( along with intermediate process like shuffling, splitting, sorting). Actual MR process happens in task traker. 8 | 9 | ## How Hadoop provides solution to big data problems: 10 | 11 | ### Storage issues 12 | HDFS provides distributed way to store big data. Data is stored in blocks across datanodes and you can specify size of blocks.e.g. 512MB is datasize and you have 128MB hadoop data blocks, HDFS will create 4 blocks and store it across different nodes. It will also replicate data blocks on different datanodes. It focuses on horizontal scaling instead of vertical scaling. 13 | 14 | 15 | ### Variety of data 16 | You can store all kinds of data - structured, unstrucutred or semi-structured data. There is no pre-dumping schema validation. It follows write onces and read many model. 
17 | 18 | ### Accessing and processing data in a faster way 19 | This is one of the challenges of big data. To solve it, the processing is moved towards the data rather than moving the data towards the processing/computing node (i.e. pulling all data to the master node and processing it there). In MapReduce, the processing logic is sent to the various slave nodes and the data is processed in parallel across them. The processed results are sent to the master node, where they are merged and the response is sent to the client. 20 | 21 | ### YARN 22 | YARN is used for resource management between data nodes and master nodes. In the YARN architecture, we have a ResourceManager and NodeManagers. The ResourceManager might or might not be configured on the same machine as the NameNode. But NodeManagers should be configured on the same machines where DataNodes are present. 23 | 24 | ### Use cases where Hadoop is not effective 25 | * Low-latency data access - quick access to small parts of data 26 | * Multiple data modifications - Hadoop is a better fit only if we are concerned with reading the data, not modifying it 27 | * Lots of small files - Hadoop is suitable for scenarios where we have few but large files.
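The map → shuffle/sort → reduce flow described above can be sketched as a toy in-memory word count (a single-process illustration of the key-value paradigm; a real Hadoop job distributes these stages across slave nodes):

```python
from itertools import groupby

def map_phase(lines):
    # Map: emit a (word, 1) key-value pair for every word in the input split.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle/sort: group the pairs by key; Reduce: sum the values per key.
    out = {}
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        out[key] = sum(v for _, v in group)
    return out

counts = reduce_phase(map_phase(["big data big cluster", "big data"]))
# counts -> {'big': 3, 'cluster': 1, 'data': 2}
```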
28 | 29 | ### References 30 | * http://a4academics.com/tutorials/83-hadoop/840-map-reduce-architecture 31 | * https://www.edureka.co/blog/what-is-hadoop/ 32 | 33 | ## Cloud computing vs Grid computing vs Cluster computing 34 | 35 | ### Grid Computing 36 | * Loosely coupled (decentralization) 37 | * Diversity and dynamism 38 | * Distributed job management & scheduling 39 | 40 | ### Cloud computing 41 | * Dynamic computing infrastructure 42 | * IT service-centric approach 43 | * Self-service based usage model 44 | * Minimally or self-managed platform 45 | 46 | ### Cluster computing 47 | * Tightly coupled systems 48 | * Single system image 49 | * Centralized job management & scheduling system 50 | 51 | ### Distributed computing 52 | It is a technique to solve a single large problem by breaking it down into several tasks, where each task is computed on an individual computer of the distributed system. 53 | 54 | ### CAP theorem 55 | 56 | This was proposed by Eric Brewer in 2000 as a set of 3 basic requirements for a distributed system consisting of multiple nodes: 57 | * Consistency - All the servers will have the same data. So, users will get the same copy regardless of which server they query 58 | * Availability - The system will always respond to requests (even if it does not have the latest data) 59 | * Partition tolerance - The system will continue to operate as a whole even if individual servers fail or can't be reached. 60 | 61 | It is impossible to meet all 3 requirements at once. So, a combination of 2 is chosen, and this is the deciding factor when a technology is selected.
62 | 63 | * Ref - https://www.quora.com/What-is-the-relation-between-SQL-NoSQL-the-CAP-theorem-and-ACID 64 | -------------------------------------------------------------------------------- /block-email-attachment.md: -------------------------------------------------------------------------------- 1 | ### Block unsafe file types in email messages (attachments) 2 | Google and Microsoft are the heavyweights of the Internet world and receive huge volumes of spam for their Gmail and Outlook services. In the support pages for each of these services, they have listed the file type extensions that are blocked. If you are managing your own e-mail server, it's time to take advantage of these extension lists to secure your organization. 3 | 4 | #### Gmail blocked extensions: 5 | ``` 6 | ade 7 | adp 8 | apk 9 | appx 10 | appxbundle 11 | bat 12 | cab 13 | chm 14 | cmd 15 | com 16 | cpl 17 | dll 18 | dmg 19 | exe 20 | hta 21 | ins 22 | isp 23 | iso 24 | jar 25 | js 26 | jse 27 | lib 28 | lnk 29 | mde 30 | msc 31 | msi 32 | msix 33 | msixbundle 34 | msp 35 | mst 36 | nsh 37 | pif 38 | ps1 39 | scr 40 | sct 41 | shb 42 | sys 43 | vb 44 | vbe 45 | vbs 46 | vxd 47 | wsc 48 | wsf 49 | wsh 50 | ``` 51 | #### Microsoft blocked extensions 52 | ``` 53 | ade 54 | adp 55 | app 56 | asp 57 | aspx 58 | asx 59 | bas 60 | bat 61 | cer 62 | chm 63 | cmd 64 | cnt 65 | com 66 | cpl 67 | crt 68 | csh 69 | der 70 | diagcab 71 | exe 72 | fxp 73 | gadget 74 | grp 75 | hlp 76 | hpj 77 | hta 78 | htc 79 | inf 80 | ins 81 | isp 82 | its 83 | jar 84 | jnlp 85 | js 86 | jse 87 | ksh 88 | lnk 89 | mad 90 | maf 91 | mag 92 | mam 93 | maq 94 | mar 95 | mas 96 | mat 97 | mau 98 | mav 99 | maw 100 | mcf 101 | mda 102 | mdb 103 | mde 104 | mdt 105 | mdw 106 | mdz 107 | msc 108 | msh 109 | msh1 110 | msh2 111 | mshxml 112 | msh1xml 113 | msh2xml 114 | msi 115 | msp 116 | mst 117 | msu 118 | ops 119 | osd 120 | pcd 121 | pif 122 | pl 123 | plg 124 | prf 125 | prg 126 | printerexport 127 | ps1 128 | ps1xml 129 |
ps2 130 | ps2xml 131 | psc1 132 | psc2 133 | psd1 134 | psdm1 135 | pst 136 | py 137 | pyc 138 | pyo 139 | pyw 140 | pyz 141 | pyzw 142 | reg 143 | scf 144 | scr 145 | sct 146 | shb 147 | shs 148 | theme 149 | tmp 150 | url 151 | vb 152 | vbe 153 | vbp 154 | vbs 155 | vhd 156 | vhdx 157 | vsmacros 158 | vsw 159 | webpnp 160 | website 161 | ws 162 | wsc 163 | wsf 164 | wsh 165 | xbap 166 | xll 167 | xnk 168 | ``` 169 | Since there are duplicates, I have merged them to form a combined list and you can use it to block in postfix/sendmail configurations. 170 | ``` 171 | ade 172 | adp 173 | apk 174 | app 175 | appx 176 | appxbundle 177 | asp 178 | aspx 179 | asx 180 | bas 181 | bat 182 | cab 183 | cer 184 | chm 185 | cmd 186 | cnt 187 | com 188 | cpl 189 | crt 190 | csh 191 | der 192 | diagcab 193 | dll 194 | dmg 195 | exe 196 | fxp 197 | gadget 198 | grp 199 | hlp 200 | hpj 201 | hta 202 | htc 203 | inf 204 | ins 205 | iso 206 | isp 207 | its 208 | jar 209 | jnlp 210 | js 211 | jse 212 | ksh 213 | lib 214 | lnk 215 | mad 216 | maf 217 | mag 218 | mam 219 | maq 220 | mar 221 | mas 222 | mat 223 | mau 224 | mav 225 | maw 226 | mcf 227 | mda 228 | mdb 229 | mde 230 | mdt 231 | mdw 232 | mdz 233 | msc 234 | msh 235 | msh1 236 | msh1xml 237 | msh2 238 | msh2xml 239 | mshxml 240 | msi 241 | msix 242 | msixbundle 243 | msp 244 | mst 245 | msu 246 | nsh 247 | ops 248 | osd 249 | pcd 250 | pif 251 | pl 252 | plg 253 | prf 254 | prg 255 | printerexport 256 | ps1 257 | ps1xml 258 | ps2 259 | ps2xml 260 | psc1 261 | psc2 262 | psd1 263 | psdm1 264 | pst 265 | py 266 | pyc 267 | pyo 268 | pyw 269 | pyz 270 | pyzw 271 | reg 272 | scf 273 | scr 274 | sct 275 | shb 276 | shs 277 | sys 278 | theme 279 | tmp 280 | url 281 | vb 282 | vbe 283 | vbp 284 | vbs 285 | vhd 286 | vhdx 287 | vsmacros 288 | vsw 289 | vxd 290 | webpnp 291 | website 292 | ws 293 | wsc 294 | wsf 295 | wsh 296 | xbap 297 | xll 298 | xnk 299 | ``` 300 | ### References: 301 | * Blocked attachments in outlook - 
https://support.office.com/en-us/article/Blocked-attachments-in-Outlook-434752E1-02D3-4E90-9124-8B81E49A8519 302 | * Blocked attachments in gmail - https://support.google.com/mail/answer/6590?hl=en 303 | 304 | -------------------------------------------------------------------------------- /blue-team-tips.md: -------------------------------------------------------------------------------- 1 | ## Blue team tips 2 | 3 | ### Network diagram 4 | 5 | A good network diagram that presents a high-level overview of the network, including ingress/egress points across all sites, is a MUST and should include: 6 | 7 | * All routing devices, proxies or gateways that affect the flow of traffic 8 | * External/internal IP addresses of routing devices, proxies and gateways 9 | * Workstations, servers or other devices - IP address ranges, custom groupings 10 | 11 | Scan the network using nmap and build your attack surface. Also find vulnerabilities associated with the services using nmap NSE scripts. 12 | 13 | ### Control the local admin account 14 | Microsoft's Local Administrator Password Solution (Microsoft LAPS) should be used. If you have a single admin password for every machine, it's easy to compromise all the machines in the lateral movement stage once the attacker is in the network. LAPS creates a unique admin password for each endpoint, which is securely stored and managed in the Active Directory environment. However, it requires agent installation and GPO creation. 15 | 16 | ### Gain visibility into PowerShell execution 17 | PowerShell has become a de-facto standard tool for attackers penetrating any network. Some of the popular exploit frameworks are PowerSploit, Empire, Nishang, PoshC2 and many others. PowerShell now comes by default in every Windows installation, which makes the attacker's job easy. On top of that, no PowerShell logging is enabled in Windows by default. 18 | So, as a defender, you have to enable PowerShell logging to gain visibility.
Full logging, however, is available only from PowerShell v5 onwards and allows PowerShell module logging, script block logging (input/output commands) and automatic suspicious script detection. It also de-obfuscates encoded PowerShell commands. PowerShell logging produces Windows event log entries such as event ID 4103 (module logging) and 4104 (script block logging). 19 | 20 | ### Track process creation 21 | Tracking process execution is key to spotting anything malicious infecting or running on your systems. Process creation logging generates event ID 4688. 22 | 23 | ### Advanced system logging with Sysmon 24 | Sysmon provides more information about parent processes, file hashes, network connections, loading of DLLs/drivers etc. On top of that, it's a free tool from Microsoft, and every organization should deploy it in their environment. 25 | 26 | ### Visualization of the Active Directory environment 27 | In a Windows environment, it is essential to keep track of attack paths to the domain controller, and BloodHound is the tool you should definitely go for - https://github.com/BloodHoundAD/BloodHound 28 | It gives information like attack paths to high-value targets, group memberships and active sessions. 29 | 30 | ### Active Directory defense 31 | You should definitely follow the recommendations by Sean Metcalf available at https://adsecurity.org/?p=1684 32 | 33 | ### Audit endpoints 34 | It is recommended to audit endpoints using CIS benchmarks or the "HardeningAuditor" tool - https://github.com/cottinghamd/HardeningAuditor 35 | The Australian Signals Directorate also has a great guide on Windows hardening - https://www.asd.gov.au/publications/protect/Hardening_Win10.pdf 36 | Please go through it carefully.
37 | 38 | If you wish, there are many good tips available here - 39 | https://www.sneakymonkey.net/2018/06/25/blue-team-tips/ 40 | 41 | 42 | 43 | 44 | -------------------------------------------------------------------------------- /bookmarks.md: -------------------------------------------------------------------------------- 1 | ### Interesting Bookmarks 2 | * Useful checklist for backend applications, covering networking, monitoring, logging, backups, secrets etc - https://medium.com/@aleksei.kornev/production-readiness-checklist-for-backend-applications-8d2b0c57ccec 3 | * Attack pattern detection and prediction - https://medium.com/@ensarseker1/attack-pattern-detection-and-prediction-108fc3d47f03 4 | * Malware analysis with visual pattern recognition - https://medium.com/@nkent/malware-analysis-with-visual-pattern-recognition-5a4d087c9d26 5 | * Malware classification using CNN - https://medium.com/@hugom1997/malware-classification-using-convolutional-neural-networks-step-by-step-tutorial-a3e8d97122f 6 | * Rangeforce Cyber security simulation training platform - https://rangeforce.com/wp-content/uploads/2020/03/A-Market-Guide-to-CyberSecurity-Simulation-Training-2020b.pdf 7 | * Reconstruct process trees from event logs - https://github.com/williballenthin/process-forest 8 | * Url analysis using Unfurl - https://lospi.net/python/unfurl/abrade/hacking/2018/02/08/unfurl-url-analysis.html 9 | * Tracking potential malicious files Belkasoft paper - https://belkasoft.com/whitepaper_tracking_potentially_malicious_files 10 | * Automating PCAP analysis using bash,Security Onion - https://medium.com/@mikecybersec/automating-pcap-parsing-with-linux-cli-bash-security-onion-780cb2b08b6e 11 | * Monitoring linux logs with Kibana - https://medium.com/@solnichkin.antoine/monitoring-linux-logs-with-kibana-and-rsyslog-4dfbbd287807 12 | * Learning cyber security, good links - https://github.com/1d8/CybersecLearning 13 | * Thematic for success in offensive cyber operations, NCC group 
- https://research.nccgroup.com/wp-content/uploads/2020/07/1992-Insight-Space-Technical-Deep-Dive-June-v2.pdf 14 | 15 | 16 | ### Modern web development 17 | * Access AJAX, Websockets, SSE in HTML - https://htmx.org/ 18 | * Build blazing fast, modern apps and websites with React - https://www.gatsbyjs.org 19 | 20 | ### Webinars 21 | * Fishing for network health using batfish - https://www.brighttalk.com/webcast/17628/391789 22 | 23 | ### Tracking evidence of program execution on windows 24 | * Forensic Artifacts: evidences of program execution on Windows systems - https://www.andreafortuna.org/2018/05/23/forensic-artifacts-evidences-of-program-execution-on-windows-systems/ 25 | * Evidence of execution on Windows - 26 | * https://blog.1234n6.com/2018/10/available-artifacts-evidence-of.html 27 | * https://blog.1234n6.com/2019/01/available-artifacts-evidence-of.html 28 | ### ICS related 29 | * Spire is an open-source intrusion-tolerant SCADA system for the power grid - http://www.dsn.jhu.edu/spire/ 30 | * Prime: Byzantine Replication Under Attack - http://www.dsn.jhu.edu/prime/ 31 | * Spines is a generic messaging infrastructure that provides transparent unicast, multicast and anycast communication over dynamic, multi-hop networking environments - http://spines.org/ 32 | * SMesh is a seamless wireless mesh network - http://www.smesh.org/ 33 | * pvBrowser, cross platform process visualization engine - https://pvbrowser.de/pvbrowser/index.php 34 | 35 | ### Time series 36 | * Flexible time series feature extraction & processing library in python - https://github.com/predict-idlab/tsflex 37 | ### Log analysis 38 | * Predictive log analysis - https://github.com/animeshdutta888/System-Failure-Prediction-using-log-analysis 39 | ### Free Book 40 | * Joy of cryptography Book - https://joyofcryptography.com 41 | ### Virtual machines 42 | * Desired state configuration of VM - https://octo.vmware.com/introducing-virtual-machine-desired-state-configuration/ 43 | ### IoT 44 | * 
Fileless attacks on Linux based IoT devices - https://www.ics.uci.edu/~alfchen/fan_mobisys19.pdf 45 | -------------------------------------------------------------------------------- /bro_conn_history.md: -------------------------------------------------------------------------------- 1 | ### Connection history 2 | 3 | **Letter**|**Meaning** 4 | :-----:|:-----: 5 | s|SYN w/o the ACK bit set 6 | h|SYN+ACK ("handshake") 7 | a|pure ACK 8 | d|packet with payload ("data") 9 | f|packet with FIN bit set 10 | r|packet with RST bit set 11 | c|packet with a bad checksum 12 | t|packet with retransmitted payload 13 | i|inconsistent packet (e.g. FIN+RST bits set) 14 | q|multi-flag packet (SYN+FIN or SYN+RST bits set) 15 | ^|connection direction was flipped by Bro's heuristic 16 | -------------------------------------------------------------------------------- /bro_conn_states.md: -------------------------------------------------------------------------------- 1 | **Connection state**|**Meaning** 2 | :-----:|:-----: 3 | S0|Connection attempt seen, no reply 4 | S1|Connection established, not terminated 5 | SF|Normal establishment and termination. Note that this is the same symbol as for state S1. You can tell the two apart because for S1 there will not be any byte counts in the summary 6 | REJ|Connection attempt rejected. 7 | S2|Connection established and close attempt by originator seen (but no reply from responder). 8 | S3|Connection established and close attempt by responder seen (but no reply from originator). 9 | RSTO|Connection established, originator aborted (sent a RST) 10 | RSTR|Responder sent a RST.
11 | RSTOS0|Originator sent a SYN followed by a RST 12 | RSTRH|Responder sent a SYN ACK followed by a RST 13 | SH|Originator sent a SYN followed by a FIN 14 | SHR|Responder sent a SYN ACK followed by a FIN 15 | OTH|No SYN seen, just midstream traffic 16 | -------------------------------------------------------------------------------- /building-word-list.md: -------------------------------------------------------------------------------- 1 | ### Building word list 2 | Building word lists is absolutely essential if you wish to do red teaming activities like password spraying or participate in CTFs. 3 | 4 | There are 4 main techniques that are generally used to generate word lists: 5 | * Using regular expressions 6 | * Extraction of words from a website 7 | * Generation of words based on human heuristics or human profiling 8 | * Word lists from random keyboard key walks 9 | 10 | #### Word list using regular expressions 11 | Most humans have a tendency to set passwords based on some pattern. In many organizations, these patterns are fixed. We can make use of regular expressions to generate word lists that potentially match the existing patterns and find out passwords. 12 | We can use the python module ```Exrex``` (https://pypi.org/project/exrex/) - a command line tool that generates all — or random — matching strings for a given regular expression, and more. 13 | 14 | How to install and its usage 15 | ``` 16 | $ pip install exrex 17 | $ exrex --help 18 | ``` 19 | 20 | #### Word list from extraction of words from a website 21 | CeWL (https://github.com/digininja/CeWL/) is a ruby app which spiders a given url to a specified depth, optionally following external links, and returns a list of words which can then be used for password crackers such as John the Ripper. 22 | 23 | CeWL also has an associated command line app, FAB (Files Already Bagged), which uses the same meta data extraction techniques to create author/creator lists.
24 | 25 | How to install and its usage 26 | ``` 27 | $ apt install cewl 28 | $ cewl --help 29 | $ cewl -d 2 -m 5 -w docswords.txt https://example.com 30 | ``` 31 | ### Word list from human profiling 32 | In many past security incidents, it has been found that employees are the weak link and can be an easy target for setting an initial foothold inside the organization. People tend to set weak passwords that match their personal interests and personal information. There is a tool ```CUPP``` (https://github.com/Mebus/cupp) that generates potential passwords based on personal information. 33 | 34 | How to install and its usage 35 | ``` 36 | $ git clone https://github.com/Mebus/cupp.git 37 | $ cd cupp 38 | $ python3 cupp.py -h 39 | $ python3 cupp.py -i 40 | ``` 41 | ### Word list from keyboard random key walks 42 | A keyboard random walk refers to a word list made up of adjacent keys on the keyboard, like 12345678 or 1qazxsw2. Of course, there are many ways key walks can be generated. There is a tool ```kwprocessor``` (https://github.com/hashcat/kwprocessor) for generating such key walks. 43 | 44 | 45 | How to install and its usage 46 | ``` 47 | $ git clone https://github.com/hashcat/kwprocessor.git 48 | $ cd kwprocessor 49 | $ make 50 | ``` 51 | The keymaps folder contains keyboard layouts for multiple languages and the routes folder has 7 pre-configured keymap walks that can be used to generate a word list. 52 | 53 | Usage 54 | ``` 55 | $ ./kwp basechars/full.base keymaps/en.keymap routes/2-to-10-max-3-direction-changes.route 56 | ``` 57 | This causes kwp to create multiple keymap walk combinations of 2–10 characters with a maximum of 3 direction changes. 58 | 59 | In addition, there are popular tools like crunch, a wordlist generator (https://sourceforge.net/projects/crunch-wordlist/files/crunch-wordlist/crunch-3.6.tgz/download), which can generate wordlists from a standard character set or a character set you specify.
crunch can generate wordlists using both combinations and permutations. 60 | 61 | Ref: 62 | * https://medium.com/owasp-chennai/building-word-lists-for-red-teamers-a8ba2d79ee3 63 | -------------------------------------------------------------------------------- /critical-infra-security.md: -------------------------------------------------------------------------------- 1 | ### Security of Critical infrastructure 2 | 3 | Some key areas to enable a secure, connected critical infrastructure: 4 | * Enabling methodologies for communicating between a combination of trusted, untrusted, and adversarial networks as well as trusted, untrusted, and potentially adversarial equipment. 5 | * Developing policy, guidelines, and suggestions for inspecting and whitelisting communications involving critical infrastructure including but not limited to ICS, SCADA, OT, and IoT. 6 | * Deterministic timelines for security and functionality software/firmware updates to critical infrastructure 7 | * Making available innovative, trusted architectures 8 | 9 | ### Cybersecurity Maturity Model Certification (CMMC) 10 | By developing the Cybersecurity Maturity Model Certification (CMMC), it is possible to normalize and standardize cybersecurity preparedness. CMMC removes the competitive disadvantage of investing in cybersecurity: it requires an independently certified maturity level with well-defined guidelines in order to participate in certain acquisitions or to supply certain types of goods. This is good for cybersecurity because it reduces the competitive threat from poorly secured Original Equipment Manufacturers (OEMs) and incentivizes positive security behaviors. 11 | 12 | ### Network segmentation 13 | To mitigate risks from potentially compromised equipment, it is best to assume that it is compromised already but needs to be used regardless. The most common approach adopted is widespread physical network segmentation.
14 | In network segmentation, you separate devices into multiple subnets and block packet routing across the gateway. By enforcing this, you limit an attacker's ability to propagate laterally to additional targets. 15 | Generally, network segmentation is achieved by logically dividing the devices into small subnets, placing a Next-Generation Firewall (NGFW) at the gateway, and only allowing approved communications (port, protocol, application, and recipient) to pass. 16 | Another type of network segmentation is physical separation, or “air-gapping”, where a set of subnets operates on a separate network that cannot be routed in or out of. This is widely used in national defense and nuclear applications because it makes the execution of zero-day exploits exponentially more difficult. 17 | 18 | ### Traffic inspection 19 | It's important to build traffic inspection mechanisms into critical networks to capture unknown traffic patterns. 20 | Although implementing physical network segmentation is the often-recommended approach, it has pros and cons - it keeps bad things out, but it keeps good things out as well. ICS and OT networks need to be able to pass information such as control instructions and operating metrics. Further, technicians and engineers need to be 21 | able to perform maintenance. 22 | 23 | To move this data across the air-gap, sometimes a data diode is used. A data diode passes data in one direction only and it cannot be reversed. Although diodes help get data across the gap, they are simple devices; they don’t check to make sure it’s valid data and there are no policy violations. 24 | To overcome these limitations, the concept of a data guard is introduced. Data guards inspect the traffic moving between 2 or more air-gapped networks and provide byte-level deep content inspection, data validation and filtering that can be tailored to customer-specific security policies, requirements, and risks.
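The data-guard idea (inspect, validate, filter before anything crosses the gap) can be sketched in a few lines of Python; the field names, sensor whitelist and value limits below are hypothetical:

```python
# Sketch of a data guard: only well-formed telemetry records that satisfy
# policy are passed across the one-way link. All names/limits are hypothetical.
ALLOWED_SENSORS = {"pump-1", "pump-2", "turbine-a"}

def guard(record: dict) -> bool:
    """Return True only if the record passes byte-level/content policy checks."""
    if set(record) != {"sensor", "reading"}:
        return False                      # unexpected or missing fields -> drop
    if record["sensor"] not in ALLOWED_SENSORS:
        return False                      # unknown source -> drop
    reading = record["reading"]
    # type and range validation: reject strings, injected commands, out-of-range values
    return isinstance(reading, (int, float)) and 0 <= reading <= 1000

print(guard({"sensor": "pump-1", "reading": 42}))          # True
print(guard({"sensor": "pump-1", "reading": "rm -rf /"}))  # False
```

A real guard does this at the byte level against a formal message schema; the point is the same — nothing crosses unless it matches an explicit, validated format.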
25 | 26 | With strong traffic inspection and enforcement across the air-gap, it's possible to eliminate major attack vectors. 27 | 28 | ### References: 29 | * https://securitydelta.nl/media/com_hsd/report/246/document/HSD-Rapport-Data-Diodes.pdf 30 | * https://www.zadara.com/wp-content/uploads/AirGap_Arrosoft_Solution_Brief.pdf 31 | -------------------------------------------------------------------------------- /cyber-security.md: -------------------------------------------------------------------------------- 1 | ### Cyber security aims 2 | * reduce the likelihood of a damaging cyber intrusion 3 | * detect potential intrusions 4 | * ensure that the organization is prepared to respond if an intrusion occurs 5 | * maximize the organization's resilience to destructive cyber incidents 6 | 7 | ### Reduce the likelihood of a damaging cyber intrusion 8 | * Validate all remote access to the organization's network and investigate non-genuine access 9 | * Require multi-factor authentication for privileged or admin access 10 | * Keep all software up-to-date and patch all known vulnerabilities. 11 | * Close all ports and protocols that are not essential 12 | * Ensure that all cloud service accesses are reviewed and strong authentication controls are in place 13 | 14 | ### Detect potential intrusions 15 | * Identify unexpected and/or unusual behaviour in the network. Enable logging to better investigate issues.
16 | * Ensure the entire network is protected by antivirus/antimalware software with up-to-date signatures 17 | -------------------------------------------------------------------------------- /detect-compromised-linux-machine.md: -------------------------------------------------------------------------------- 1 | ## Detecting a compromised linux server - some commands 2 | 3 | ### Verify md5 checksums of RPM files 4 | 5 | ``` 6 | # rpm -qa | xargs rpm -V 7 | ``` 8 | 9 | ### Track network connections 10 | ``` 11 | # netstat -an 12 | # netstat -nalp 13 | # netstat -plant 14 | # ss -a -e -i 15 | ``` 16 | 17 | ### Watch traffic in detail on demand for a specific port 18 | ``` 19 | # tcpdump src port 6697 20 | ``` 21 | 22 | ### Process tree 23 | ``` 24 | # ps -auxwf 25 | ``` 26 | 27 | ### Deleted binaries still running 28 | ``` 29 | # ls -alR /proc/*/exe 2> /dev/null | grep -i deleted 30 | ``` 31 | 32 | ### Process command name/cmdline 33 | ``` 34 | # strings /proc/<pid>/comm 35 | # strings /proc/<pid>/cmdline 36 | ``` 37 | 38 | ### Real process path 39 | ``` 40 | # ls -la /proc/<pid>/exe 41 | ``` 42 | ### Process environment 43 | ``` 44 | # strings /proc/<pid>/environ 45 | ``` 46 | ### Process working directory 47 | ``` 48 | # ls -alR /proc/*/cwd 49 | ``` 50 | ### Processes running from tmp, dev directories 51 | ``` 52 | # ls -alR /proc/*/cwd 2> /dev/null | grep tmp 53 | # ls -alR /proc/*/cwd 2> /dev/null | grep dev 54 | ``` 55 | ### List all hidden directories 56 | ``` 57 | # find / -type d -name ".*" 58 | ``` 59 | 60 | ### Check for zero size logs 61 | ``` 62 | # ls -al /var/log/* 63 | ``` 64 | ### Dump audit logs 65 | ``` 66 | # utmpdump /var/log/wtmp 67 | # utmpdump /var/run/utmp 68 | # utmpdump /var/log/btmp 69 | ``` 70 | ### Track last logins 71 | ``` 72 | # last 73 | # lastb 74 | ``` 75 | ### Find logs with binary content 76 | ``` 77 | # grep [[:cntrl:]] /var/log/*.log 78 | ``` 79 | ### Check scheduled tasks 80 | ``` 81 | # crontab -l 82 | # atq 83 | # systemctl list-timers --all 84 | ``` 85 |
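The /proc checks above (e.g. deleted binaries still running) can also be scripted; a minimal Python sketch, assuming a Linux /proc filesystem:

```python
import os

def deleted_executables():
    """Return (pid, exe) pairs for running processes whose on-disk binary was deleted."""
    if not os.path.isdir("/proc"):   # not on Linux / procfs unavailable
        return []
    hits = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            exe = os.readlink(f"/proc/{pid}/exe")
        except OSError:              # kernel thread, exited process, or permission denied
            continue
        if exe.endswith(" (deleted)"):   # the kernel appends this suffix to unlinked binaries
            hits.append((int(pid), exe))
    return hits

print(deleted_executables())
```

On a clean system this prints an empty list; any hit is a strong lead for further investigation.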
### Look for UID-0 and GID-0 86 | ``` 87 | # grep ":0:" /etc/passwd 88 | ``` 89 | ### Check sudoers file 90 | ``` 91 | # cat /etc/sudoers 92 | # cat /etc/group 93 | ``` 94 | ### Find all ssh authorized_keys files 95 | ``` 96 | # find / -name authorized_keys 97 | ``` 98 | ### History files for users 99 | ``` 100 | # find / -name "*history" 101 | ``` 102 | ### History files linked to /dev/null 103 | ``` 104 | # ls -laR / 2> /dev/null | grep history | grep null 105 | ``` 106 | ### Find all hidden directories 107 | ``` 108 | # find / -type d -name ".*" 109 | ``` 110 | ### Find files modified within the last day 111 | ``` 112 | # find / -mtime -1 113 | ``` 114 | ### Files/directories with no user/group name 115 | ``` 116 | # find / \( -nouser -o -nogroup \) -exec ls -lg {} \; 117 | ``` 118 | ### Immutable files and directories 119 | ``` 120 | # lsattr / -R 2> /dev/null | grep "\----i" 121 | ``` 122 | ### Find SUID/SGID files 123 | ``` 124 | # find / -type f \( -perm -04000 -o -perm -02000 \) -exec ls -lg {} \; 125 | ``` 126 | ### Find all executable files 127 | ``` 128 | # find / -type f -exec file -p '{}' \; | grep ELF 129 | ``` 130 | ### Find all executable files in the tmp directory 131 | ``` 132 | # find /tmp -type f -exec file -p '{}' \; | grep ELF 133 | ``` 134 | Thanks to Sandfly Security - https://www.sandflysecurity.com/wp-content/uploads/2018/11/Linux.Compromise.Detection.Command.Cheatsheet.pdf 135 | -------------------------------------------------------------------------------- /dns.md: -------------------------------------------------------------------------------- 1 | ### Recursive vs Non-Recursive (Iterative) DNS queries 2 | 3 | * Iterative DNS queries are ones in which a DNS server is queried and returns an answer without querying other DNS servers. Iterative queries are non-recursive queries.
4 | 5 | * Recursive queries occur when a DNS client requests information from a DNS server that is set to query subsequent DNS servers until a definitive answer is returned to the client. The queries made to subsequent DNS servers from the first DNS server are iterative queries. It may be noted that root servers are always iterative servers. 6 | 7 | A DNS server that supports recursive resolution is vulnerable to DOS (denial of service) attacks, DNS cache poisoning, unauthorized use of resources, and root name server performance degradation. 8 | 9 | #### Ref: 10 | * https://www.slashroot.in/difference-between-iterative-and-recursive-dns-query 11 | 12 | ### Some flags in a DNS packet 13 | 14 | #### AA - Authoritative answer 15 | Specifies whether the responding name server is the authority for the domain name in question 16 | * 0 - non-authoritative 17 | * 1 - authoritative 18 | 19 | #### TC - Truncated 20 | Indicates that only the first 512 bytes of the reply were returned 21 | * 0 - message not truncated 22 | * 1 - message truncated 23 | 24 | #### RD - Recursion desired 25 | The name server is directed to pursue the query recursively 26 | * 0 - recursion not desired 27 | * 1 - recursion desired 28 | 29 | #### RA - Recursion available 30 | Indicates whether recursive query support is available on the name server 31 | * 0 - recursive query support not available 32 | * 1 - recursive query support available 33 | 34 | #### Z 35 | This flag is reserved for future use 36 | 37 | ## Tracking evil in DNS logs 38 | DNS logs (either from PassiveDNS or Bro/Zeek logs) contain a lot of useful information that can be used to track down malware. Please find below some of the queries that you can use to spot malicious domains. 39 | 40 | ### Multiple levels of subdomains 41 | DNS queries usually do not use multiple subdomains. A high number of subdomains might indicate 42 | that a domain is malicious. However, Content Delivery Networks (CDNs) can be an exception to this type 43 | of queries.
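This depth check is a one-line function in Python (using the same sample query as this section):

```python
def subdomain_depth(name: str) -> int:
    # Counting dots gives the number of label boundaries, the same number the
    # tr/wc one-liner in this section produces.
    return name.count(".")

q = "W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software"
print(subdomain_depth(q))              # 7 -> above the "more than 5" threshold
print(subdomain_depth("example.com"))  # 1 -> normal
```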
44 | ``` 45 | $ echo W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | tr -cd '.' | wc -c 46 | ``` 47 | If the number of dots is more than 5, it's safe to assume that something is not right! 48 | 49 | ### Domain length / DNS query length 50 | The total length of a DNS query is a good indicator of malicious communication, as long queries are often used for data exfiltration or communication with C&C servers. The longer the request query, the larger the risk that it is malicious. 51 | 52 | ``` 53 | $ echo -n W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | wc 54 | ``` 55 | The rule of thumb you can take is: 56 | if you find a 63-character part (subdomain label), the request is likely malicious. 57 | 58 | ### Domain Entropy 59 | It is seen that the entropy of malicious domains is higher than the entropy of legitimate domains. 60 | ``` 61 | $ echo -n W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | ent 62 | ``` 63 | The average entropy of the top 11k Alexa domains is around 3.1. 64 | However, relying on entropy alone can result in a large number of false positives, as the entropy of CDN hostnames is also high. You need to combine this indicator with others, such as domain query length, for it to be effective. 65 | 66 | ### Mix of uppercase and lowercase letters 67 | If there are mixed upper/lower case characters in a domain, it should be investigated as it might be base64 encoded data. 68 | Usually, most of the domain characters are either in lowercase or uppercase. 69 | ``` 70 | $ echo W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | tr -cd '[A-Z]' | wc -c 71 | 13 72 | $ echo W.0228452040.I0.aHR0cHM6Ly9zc2wuZ3N0YXRpYy5jb20v.19.x.wpad.software | tr -cd '[a-z]' | wc -c 73 | 25 74 | ``` 75 | 76 | ### Non-alphanumeric DNS requests 77 | Domain name registration is allowed only with alphabet letters, digits and hyphens. Domains which use other characters, like Punycode ones, are rare.
78 | Non-alphanumeric DNS requests are very rare and in most cases they are related to malicious behaviour. 79 | 80 | ``` 81 | $ host cmVkdGVhbS5wbA==.redteam.pl 82 | cmVkdGVhbS5wbA==.redteam.pl is an alias for redteam.pl. 83 | ``` 84 | ### Use of Punycode 85 | Punycode (RFC 3492) allows the use of special letters from alphabets other than English (e.g. Polish) in domain names. 86 | Punycode is not very popular and is often used in phishing or malware campaigns. So, it is a good practice to keep track of 87 | puny domains. 88 | 89 | ### Types of DNS requests 90 | Common DNS queries are of type A, AAAA, and PTR. If you encounter unusual queries like AXFR, ANY, or TXT, these need a closer look. 91 | 92 | ### TTL (time to live) 93 | A lower TTL value raises the probability of malicious behaviour. But this is not true for CDNs such as Cloudflare, where TTL is 300 seconds. 94 | If you observe a TTL of 0-1 seconds, then it's likely a malicious domain. 95 | 96 | #### Ref: 97 | The following article covers a lot of DNS query patterns that indicate possible malicious DNS communication and it is highly recommended for more information - https://blog.redteam.pl/2019/08/threat-hunting-dns-firewall.html 98 | 99 | 100 | 101 | -------------------------------------------------------------------------------- /encrypted-traffic-fingerprinting.md: -------------------------------------------------------------------------------- 1 | ## Encrypted traffic analysis 2 | 3 | Most network traffic uses the HTTPS protocol, and it's difficult to get information about data payloads unless you have access to the endpoints.
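What makes handshake-based analysis possible is that the TLS record header and initial handshake messages travel in cleartext even on encrypted connections. A minimal sketch of reading a record header (the sample bytes are synthetic):

```python
import struct

def parse_tls_record_header(data: bytes):
    """Parse the 5-byte TLS record header, which is never encrypted."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes")
    # 1 byte content type, 2 bytes version, 2 bytes body length, big-endian
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    kinds = {20: "change_cipher_spec", 21: "alert",
             22: "handshake", 23: "application_data"}
    return kinds.get(content_type, "unknown"), f"{major}.{minor}", length

# First bytes of a synthetic ClientHello record: type 22 (handshake),
# record version 3.1, 512-byte body.
sample = bytes([22, 3, 1, 2, 0])
print(parse_tls_record_header(sample))  # ('handshake', '3.1', 512)
```

Fingerprinting tools go further and hash the cipher suites and extensions inside the ClientHello/ServerHello, but all of that is visible at this same pre-encryption stage.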
4 | 5 | By analyzing initial TLS handshakes, good visibility can be achieved into encrypted traffic to support the following use cases: 6 | 7 | * Breach Detection 8 | * Insider and Advanced Threat Detection 9 | * High Risk Application Detection 10 | * Policy Violations 11 | * Encrypted Traffic Analytics 12 | 13 | ### Why fingerprinting 14 | Fingerprints in the digital world are similar to what human fingerprints are in the real world. 15 | A fingerprint is a group of information that can be used to detect software, network protocols, operating systems or hardware devices. 16 | 17 | Fingerprinting is used to correlate data sets in order to identify, with high probability, network services, operating system number and version, software applications, databases, configurations and more. Once the penetration tester has enough information, this fingerprinting data can be used as part of an exploit strategy against the target. 18 | 19 | ### How does OS and network fingerprinting work? 20 | In order to detect OS, network, service and application names and versions, attackers launch custom packets at the target. The responses from the victim form a digital signature. This signature is one of the keys to identifying what software, protocols and OS are running on the target device. 21 | 22 | Fingerprinting techniques are based on detecting certain patterns and differences in network packets generated by operating systems. These often analyze different types of packets and information such as TCP window size, TCP options in TCP SYN and SYN+ACK packets, ICMP requests, HTTP packets, DHCP requests, IP TTL values as well as IP ID values, etc. 23 | 24 | ### Active fingerprinting 25 | Active fingerprinting is the most popular type of fingerprinting in use. It consists of sending packets to a victim and analyzing the victim's replies. This is often the easiest way to detect remote OS, networks and services.
It's also the most risky, as it can be easily detected by intrusion detection systems (IDS) and packet filtering firewalls. 26 | A popular platform used to launch active fingerprint tests is Nmap. This handy tool can help you detect specific operating systems and network service applications when you launch TCP, UDP or ICMP packets against any given target. 27 | 28 | ### Passive fingerprinting 29 | Passive fingerprinting is an alternative approach to avoid detection while performing your reconnaissance activities. 30 | The main difference between active and passive fingerprinting is that passive fingerprinting does not actively send packets to the target system. Instead, it acts as a network scanner in the form of a sniffer, merely watching the traffic data on a network without performing network alteration. 31 | 32 | In cybersecurity fingerprinting, one of the most popular methods involves OS name and version detection and is part of the usual data intelligence gathering when running your OSINT research. While many tools may fit into this particular category, the following tools are popular in the security community: 33 | 34 | ### Nmap 35 | Nmap has many features as a port scanner, but also works as OS detection software. 36 | 37 | A simple OS detection query using nmap looks like this: 38 | ``` 39 | $ sudo nmap -O X.X.X.X 40 | ``` 41 | In case there is a firewall blocking your probes, you can add the -Pn option, as shown below: 42 | ``` 43 | $ sudo nmap -O X.X.X.X -Pn 44 | ``` 45 | A more aggressive approach can be taken by using the -A option, but this may result in firewall detection by the remote host: 46 | ``` 47 | $ sudo nmap -A X.X.X.X 48 | ``` 49 | 50 | ### P0f (http://lcamtuf.coredump.cx/p0f3/) 51 | P0f offers a good alternative to Nmap and can be used as a passive fingerprinting tool to analyze network traffic and identify patterns behind TCP/IP based communications that are often blocked for Nmap active fingerprinting techniques.
52 | It includes powerful network-level fingerprinting features, as well as one that analyzes application-level payloads such as HTTP. It's also useful for detecting NAT, proxy and load balancing setups. 53 | 54 | Once installed, you can perform fingerprinting against the network by running: 55 | ``` 56 | $ p0f -i eth0 57 | ``` 58 | 59 | It is also possible to read an offline pcap file 60 | 61 | ``` 62 | $ p0f -r some_capture.cap 63 | ``` 64 | 65 | ### Ettercap (http://ettercap.github.io/ettercap/) 66 | Ettercap is a network sniffing tool that supports many different protocols including Telnet, FTP, IMAP, SMB, MySQL, LDAP, NFS and encrypted ones like SSH and HTTPS. 67 | 68 | This tool is often used by attackers to launch man-in-the-middle attacks. However, it is also useful as a fingerprinting tool that can help identify local and remote operating systems along with running services, open ports, IPs, mac addresses and network adapter vendors. 69 | 70 | Ettercap can be easily installed on most Unix/Linux platforms. In order to perform OS and service detection, it will sniff the entire network and save the results in profiles. 71 | 72 | 73 | ## Service fingerprinting 74 | In addition to fingerprinting remote OS names and versions, it is also possible to fingerprint specific network services. 75 | 76 | ### SSH Fingerprinting 77 | 78 | Hassh (https://github.com/salesforce/hassh) has become the de facto SSH fingerprinting standard to accurately detect and identify specific client and server SSH deployments. These fingerprints use MD5 as the default storage format for later analysis and comparison. 79 | 80 | While SSH is a fairly secure protocol, it has a few drawbacks when it comes to analyzing the interaction between client and server. In this case, using Hassh can help in situations that include: 81 | 82 | * Managing alerts and automatically blocking SSH clients using a Hassh fingerprint outside of a known “good set”.
83 | * Detecting exfiltration of data by using anomaly detection on SSH clients with multiple distinct Hassh values 84 | * In forensic investigations, SSH connection attempts can be tracked with greater granularity and can be followed up by source IPs. Since a Hassh-based hash is associated with the SSH client software, it's possible to detect the origin even if the IP is behind a NAT and is shared by different SSH clients. 85 | * Detecting and identifying specific client and server SSH implementations. 86 | 87 | Hassh works by taking MD5 hashes (“hassh” and “hasshServer”) of the specific sets of algorithms advertised by SSH client and SSH server software while setting up the SSH channel. This generates a unique identification string that can be used to fingerprint client and server applications. e.g. 88 | 89 | ``` 90 | c1c596caaeb93c566b8ecf3cae9b5a9e SSH-2.0-dropbear_2016.74 91 | d93f46d063c4382b6232a4d77db532b2 SSH-2.0-dropbear_2016.72 92 | 2dd9a9b3dbebfaeec8b8aabd689e75d2 SSH-2.0-AWSCodeCommit 93 | ``` 94 | 95 | ### SSL fingerprinting using JA3 (https://github.com/salesforce/ja3) 96 | JA3/JA3S, developed by the Salesforce team, is an SSL/TLS fingerprinting method. This tool allows you to create fingerprints that can be produced on any platform for threat intelligence analysis. 97 | 98 | In many cases, using JA3/JA3S as a fingerprinting technique for the TLS negotiation between both ends (client and server) produces a more accurate identification of the encrypted communications and helps identify clients and servers with high probability. e.g. 99 | 100 | ``` 101 | Standard Tor Client: 102 | 103 | JA3 = e7d705a3286e19ea42f587b344ee6865 (Tor Client) 104 | JA3S = a95ca7eab4d47d051a5cd4fb7b6005dc (Tor Server Response) 105 | ``` 106 | ### DNS fingerprinting using fpdns (https://github.com/kirei/fpdns) 107 | Tools like fpdns can identify, based on DNS queries, the software that is used as the DNS server.
This works even if the DNS server (e.g. BIND) has version printing disabled. 108 | 109 | ``` 110 | $ sudo apt install fpdns 111 | $ sudo fpdns -D site.com 112 | 113 | Replace site.com with the actual site of your interest! 114 | ``` 115 | 116 | ### Interesting links: 117 | * https://securitytrails.com/blog/cybersecurity-fingerprinting 118 | * https://blogs.cisco.com/security/tls-fingerprinting-in-the-real-world 119 | * https://www.netresec.com/?page=Blog&tag=Satori 120 | * https://www.grc.com/fingerprints.htm 121 | * https://jis-eurasipjournals.springeropen.com/articles/10.1186/s13635-016-0030-7 122 | * Various DNS tools - https://www.dns-oarc.net/tools 123 | 124 | -------------------------------------------------------------------------------- /full-text-search.md: -------------------------------------------------------------------------------- 1 | ### Why full text search 2 | 3 | A query like: 4 | 5 | SELECT * FROM table_name WHERE Foo LIKE '%Bar'; 6 | 7 | cannot take advantage of an index. It has to look at every single row and see if it matches. A full text index can give you an instant answer! A full text index also offers a lot of flexibility in terms of the order of matching words and how close together they must be. 8 | 9 | #### Stemming 10 | A fulltext search can stem words. If you search for "run", you can get results for "ran" or "running". Most fulltext engines have stem dictionaries in a variety of languages. 11 | 12 | #### Weighted results 13 | A fulltext index can encompass multiple columns. e.g. if you search for "peach pie", the index can include a title, keywords and a body. Results that match the title can be weighted higher as more relevant and can be sorted to show near the top. 14 | 15 | #### Disadvantages 16 | A fulltext index can be potentially huge, many times larger than a standard B-tree index. So, most hosted providers who offer database instances disable this feature or at least charge extra for it. 17 | Fulltext indexes can be slower to update.
If the data changes a lot, there might be some lag updating indexes compared to standard indexes. 18 | 19 | The following stackoverflow answer nicely explains what full text search means. 20 | 21 | In general, there is a tradeoff between "precision" and "recall". High precision means that fewer irrelevant results are presented (no false positives), while high recall means that fewer relevant results are missing (no false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall. 22 | 23 | Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords. 24 | 25 | The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.
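The inverted index described above can be sketched in a few lines of Python (toy documents; an AND query is just set intersection over the posting sets):

```python
from functools import reduce

# Toy corpus: document id -> text
docs = {
    1: "peach pie recipe with fresh peach",
    2: "apple pie recipe",
    3: "fresh peach smoothie",
}

# Build the inverted index: term -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def search_all(*terms):
    """AND query: intersect the posting sets of every term."""
    return reduce(set.intersection, (index.get(t, set()) for t in terms))

print(sorted(search_all("peach", "pie")))    # [1]
print(sorted(search_all("fresh", "peach")))  # [1, 3]
```

Real engines add tokenization, stemming and a ranking function on top, but the posting-set intersection shown here is the core lookup.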
26 | 27 | Other features typical of full-text search are 28 | 29 | * lexical analysis or tokenization—breaking a block of unstructured text into individual words, phrases, and special tokens 30 | * morphological analysis, or stemming—collapsing variations of a given word into one index term; for example, treating "mice" and "mouse", or "electrification" and "electric" as the same word 31 | * ranking—measuring the similarity of a matching record to the query string 32 | 33 | Ref - https://stackoverflow.com/questions/224714/what-is-full-text-search-vs-like 34 | 35 | ### Hadoop-Spark-Elasticsearch 36 | Hadoop is a distributed storage (HDFS) and computing framework which allows you to develop distributed data processing applications under the map-reduce model. 37 | Spark is a processing layer on top of Hadoop which allows you to develop applications in Scala or Python and can improve the performance of iterative processing by up to a factor of 100. 38 | Elasticsearch is a distributed RESTful search engine which stores documents NoSQL-style as JSON. 39 | -------------------------------------------------------------------------------- /icmp-codes.md: -------------------------------------------------------------------------------- 1 | ### ICMP destination unreachable (type 3) codes 2 | **Code**|**Description**|**References** 3 | :-----:|:-----:|:-----: 4 | 0|Network unreachable error|RFC 792 5 | 1|Host unreachable error|RFC 792 6 | 2|Protocol unreachable error. Sent when the designated transport protocol is not supported|RFC 792 7 | 3|Port unreachable error. Sent when the designated transport protocol is unable to demultiplex the datagram but has no protocol mechanism to inform the sender|RFC 792 8 | 4|The datagram is too big. Packet fragmentation is required but the DF bit in the IP header is set|RFC 792 9 | 5|Source route failed error|RFC 792 10 | 6|Destination network unknown error|RFC 1122 11 | 7|Destination host unknown error|RFC 1122 12 | 8|Source host isolated error (Obsolete)|RFC 1122 13 | 9|The destination network is administratively prohibited|RFC 1122 14 | 10|The
destination host is administratively prohibited|RFC 1122 15 | 11|The network is unreachable for Type Of Service|RFC 1122 16 | 12|The host is unreachable for Type Of Service|RFC 1122 17 | 13|Communication Administratively Prohibited. This is generated if a router cannot forward a packet due to administrative filtering|RFC 1812 18 | 14|Host precedence violation. Sent by the first hop router to a host to indicate that a requested precedence is not permitted for the particular combination of source/destination host or network, or upper layer protocol|RFC 1812 19 | 15|Precedence cutoff in effect. The network operators have imposed a minimum level of precedence required for operation; the datagram was sent with a precedence below this level|RFC 1812 20 | 16-255|Not assigned| 21 | -------------------------------------------------------------------------------- /interview-fun.md: -------------------------------------------------------------------------------- 1 | ### Why do people hack websites? 2 | * To render a website useless or shut it down. 3 | * To digitally steal your money, especially through banking Trojans and malicious lines of code. 4 | * Politically driven defacing of rivals' websites, i.e., defacing a website belonging to a contestant in some election. 5 | * Purely mischievous fun, e.g., a school's own students attacking its website 6 | 7 | ### Is it possible to detect that a website has been hacked? 8 | * Your website is redirected to another URL that in most cases is a pornographic website. 9 | * A Google alert appears on the website informing that the site has been hacked. 10 | * Strange looking JavaScript appears in the source code of the site. 11 | * You find new admin, database and FTP users which were not created by you. 12 | * Spam advertisements and pop-ups on the website due to malicious code. 13 | * The site is no longer accessible via Google. 14 | 15 | ### What steps do you take to restore?
16 | Some steps - not in order: 17 | * Inform your hosting service provider/web designer. 18 | * Run a full virus scan of your computers. 19 | * Determine how severe the attack was and exactly how much damage it caused. 20 | * Shut down the site 21 | * Change passwords 22 | * Request a Google review 23 | 24 | ### What is cybersquatting? 25 | Registering a domain similar to an actual domain, e.g. google.com typed as g00gle.com. Users inadvertently type the squatted name, and somebody has deliberately registered this domain for malicious purposes (this look-alike misspelling variant is also known as typosquatting). 26 | 27 | 28 | -------------------------------------------------------------------------------- /linux-auth-log.md: -------------------------------------------------------------------------------- 1 | ### Auth.log analysis 2 | 3 | In linux, the file (/var/log/auth.log) contains authorization information like: 4 | * remote logins 5 | * usage of the sudo command 6 | * instances where a user password is required for authorization 7 | 8 | The file is stored in plain text, and rolled/archived logs will be compressed with gzip. 9 | Analysis of this file allows us to track anomalous activities. Some of the use-cases that can be tracked are given below: 10 | 11 | #### sudo commands 12 | 'sudo' allows a user to execute a command with superuser privileges, or as another user (superuser is the default). These are authorization events that you should definitely keep track of from a security perspective. 13 | [[screenshot]] 14 | 15 | Parsing this file allows us to see what folder the command was issued from, the user, as well as the command itself! 16 | 17 | #### root session 18 | Aside from sudo logs, which may be used to run a command with elevated privileges, it is possible to keep track of users escalating to root. From the logs, one can also find out the number of sessions as well as their durations (opening time/closing time), and these metrics are useful for tracking malicious activities.
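Parsing the sudo entries mentioned above is straightforward; a sketch in Python against a synthetic auth.log line in the standard sudo format:

```python
import re

# Pull user, working directory and command out of a sudo line in auth.log.
# The log line below is a synthetic example, not from a real machine.
SUDO_RE = re.compile(
    r"sudo:\s+(?P<user>\S+)\s+:.*?PWD=(?P<pwd>\S+)\s+;.*?COMMAND=(?P<cmd>.+)$"
)

line = ("Mar  3 10:31:22 host sudo:  alice : TTY=pts/0 ; PWD=/home/alice ; "
        "USER=root ; COMMAND=/usr/bin/apt update")

m = SUDO_RE.search(line)
print(m.group("user"), m.group("pwd"), m.group("cmd"))
# alice /home/alice /usr/bin/apt update
```

Applied over the whole file (and its gzipped rotations), this gives per-user counts of elevated commands, a useful baseline for spotting anomalies.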
19 | [[screenshot]] 20 | 21 | #### ssh activity 22 | It is also possible to keep track of ssh remote login activity - how many connection attempts, what the success and failure rates of ssh connections are, the set of commands executed once the ssh session is established, and so on. 23 | [[screenshot]] 24 | 25 | If the number of ssh login failures is excessive, one can block the user on the basis of IP range, ASN number or city/country and filter out the noise in the logs. 26 | -------------------------------------------------------------------------------- /linux-forensics.md: -------------------------------------------------------------------------------- 1 | ### Collecting volatile data 2 | * Date and time 3 | * network interfaces 4 | * promiscuous mode 5 | * network connections 6 | * open ports (TCP as well as UDP), Listening ports/services 7 | * running processes and their ports 8 | * open files 9 | * routing tables 10 | * mounted filesystem(s) 11 | * loaded kernel modules 12 | * kernel version 13 | * uptime 14 | * last reboot time 15 | * filesystem datetime stamps 16 | * hash values of system files 17 | * current logged in users 18 | * current users with noshell, current users with active shell 19 | * login history, login times 20 | * user accounts, inactive accounts 21 | * user history files 22 | * hidden files and directories 23 | * suid/sgid files 24 | ### Dumping RAM 25 | * using fmem kernel module 26 | * using lime 27 | * using /proc/kcore 28 | ### Acquiring filesystem images 29 | * using dd 30 | * using dcfldd 31 | * write blocking options 32 | * using forensics linux distributions like SIFT, Kali 33 | * udev rule based blocker for devices like USB 34 | * Analysis of strange file 35 | * regular files in /dev 36 | * user history files 37 | * hidden files 38 | * suid/sgid files 39 | * too old date files 40 | * finding deleted files in last 7 days/last month 41 | ### Timeline analysis 42 | * use of Autopsy to establish timeline 43 | * when was the system
installed, rebooted, upgraded etc 44 | * changed files 45 | * newly created files 46 | ### Network forensics 47 | * usage of snort for detection of malicious packets 48 | * bro for detailed log analysis of http/dns/https traffic 49 | * using tcpstat 50 | * conversation analysis using tcpflow 51 | ### Writing reports 52 | * Autopsy 53 | * dradis 54 | * openoffice/MS-office 55 | ### File forensics 56 | * comparing file hashes to known values 57 | * unknown file analysis using 58 | * file command 59 | * strings command 60 | * viewing symbols using nm 61 | * reading objects using objdump 62 | * analysis using gdb 63 | -------------------------------------------------------------------------------- /log-files-and-journalctl.md: -------------------------------------------------------------------------------- 1 | ### Various log files under /var/log 2 | 3 | * alternatives.log -- "run with" suggestions from update-alternatives 4 | * apport.log -- information on intercepted crashes 5 | * auth.log -- user logins and authentication mechanisms used 6 | * boot.log -- boot time messages 7 | * btmp -- failed login attempts 8 | * dpkg.log -- information on when packages were installed or removed 9 | * lastlog -- recent logins (use the lastlog command to view) 10 | * faillog -- information on failed login attempts -- all zeroes if none have transpired (use the faillog command to view) 11 | * kern.log -- kernel log messages 12 | * mail.err -- information on errors detected by the mail server 13 | * mail.log -- information from the mail server 14 | * syslog -- system services log 15 | * wtmp -- login records 16 | 17 | 18 | ### Journalctl 19 | In addition to log files (/var/log), you should also watch the journal via journalctl. The journal represents an important collection of information on user and kernel activity and this information is retrieved from a variety of sources on the system.
20 | 21 | Some useful commands that you should run: 22 | ``` 23 | Return the total number of lines 24 | $ journalctl | wc -l 25 | ``` 26 | ``` 27 | Journal logs since a date 28 | $ journalctl --since "2018-10-06 10:00" 29 | ``` 30 | ``` 31 | Log entries for a particular unit/service 32 | $ journalctl -u networking.service 33 | ``` 34 | ``` 35 | Disk usage 36 | $ journalctl --disk-usage 37 | ``` 38 | ``` 39 | Activity for a specific process 40 | $ journalctl _PID=780 41 | ``` 42 | 43 | -------------------------------------------------------------------------------- /logs vs metrics.md: -------------------------------------------------------------------------------- 1 | ## Logs vs Metrics 2 | ### What are logs 3 | A log message is a system generated set of data describing an event that has happened. Log data contain details about the event such as what resource was accessed, who accessed it and the time. Each event in a system is going to have a different set of data in the message. In general, there are five different categories of logs - informational, debug, warning, error and alert. 4 | 5 | ### What are metrics 6 | While logs are about a specific event, metrics are a measurement at a point in time for the system. A metric can have a value, a timestamp and an identifier (tag). While logs may be collected at any time after an event has happened, metrics are typically collected at a fixed time interval known as the resolution. The collection of such data is referred to as a time-series metric and can be visualized in different types of graphs such as gauges, counters and timers. 7 | Although measurements of system health can be stored in a performance log file, it is costly to collect them that way. A metric will normalize log file data, and the size of a metric file will be a fraction of the size of the entire log file. 8 | 9 | In summary, a log is an event that happened and a metric is a measurement of the health of a system.
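The relationship can be made concrete with a tiny sketch: take a handful of hypothetical log events and collapse them into a per-minute time-series metric. The event timestamps and messages below are made up for illustration.

```python
from collections import Counter

# Hypothetical log events: each is a (timestamp, message) pair
events = [
    ("2018-10-06 10:00:01", "Failed password for root"),
    ("2018-10-06 10:00:41", "Failed password for admin"),
    ("2018-10-06 10:01:12", "Failed password for root"),
    ("2018-10-06 10:01:30", "Accepted password for alice"),
]

# Metric: count of failed logins per minute. ts[:16] truncates the timestamp
# to minute resolution -- identifier + timestamp + value replace raw event text.
failed_per_minute = Counter(
    ts[:16] for ts, msg in events if msg.startswith("Failed password")
)

for minute, count in sorted(failed_per_minute.items()):
    print(("failed_logins", minute, count))
```

Four raw events become two metric samples, which is exactly the size reduction the paragraph above describes.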
10 | 11 | Based on a nice explanation from - https://www.sumologic.com/blog/logs-metrics-overview/ 12 | Another interesting take on logs vs metrics is here - https://whiteink.com/2019/logs-vs-metrics-a-false-dichotomy/ 13 | -------------------------------------------------------------------------------- /machine learning terms: -------------------------------------------------------------------------------- 1 | ### Common terms used in machine learning 2 | 3 | * Statistics: Statistics is the science of collecting, organising, summarising, analysing and interpreting data. 4 | 5 | * Data mining - the process of automatically discovering useful information in large data repositories. 6 | 7 | * Machine learning - a set of techniques that allow you to deal with huge datasets in an intelligent way (by developing algorithms or sets of logical rules) to derive actionable insights (e.g. delivering search results to users) 8 | 9 | * Quantitative variables - take numerical values whose size is meaningful. 10 | Quantitative variables typically have measurement units, such as pounds, dollars, years, volts, gallons, megabytes, inches, degrees, miles per hour, pounds per square inch, BTUs, and so on. So, it makes sense to add, to subtract, and to compare two persons’ weights, or two families’ incomes. 11 | 12 | * Qualitative variables - Qualitative (categorical) variables typically do not have units; e.g. gender, hair color, or ethnicity group individuals. Qualitative and categorical variables have neither a “size” nor, typically, a natural ordering to their values. 13 | Some variables such as social security numbers and zip codes take numerical values but are not quantitative. The sum of two zip codes or social security numbers is not meaningful. So, they are qualitative or categorical variables.
14 | 15 | * Types of data: 16 | * Numerical data - continuous data, discrete data 17 | * Categorical data - nominal level, ordinal level 18 | * Measurements: 19 | * Nominal - values of variables are names, so used for categorical/qualitative analysis 20 | * Ordinal - collecting information in which order is important e.g. tracking of student grades 21 | * Interval - distances between values have a special meaning - e.g. difference in temperature 22 | * Ratio - estimation of the ratio between magnitudes of a continuous quantity 23 | -------------------------------------------------------------------------------- /malware-detection-methods.md: -------------------------------------------------------------------------------- 1 | ## Malware detection methods 2 | 3 | #### Signature based methods 4 | 5 | A signature is a unique feature of a file, something like the fingerprint of an executable. Signature based methods use patterns extracted from various malwares to identify them and are more efficient and faster than any other methods. Signature based methods have small error rates and this is the reason they are often used in commercial applications. 6 | 7 | But, signature based methods are unable to detect unknown malware variants and require high amounts of manpower, time and money to extract unique signatures. Further, it is difficult to identify infections such as polymorphic and metamorphic code. 8 | 9 | #### Behaviour based methods 10 | 11 | Behaviour based malware detection techniques observe the behaviour of a program to conclude whether it is malicious or not. In these methods, programs with the same behaviour are collected and a behaviour signature is developed. This signature can identify various samples of malware of the same family.
A behaviour based detector basically consists of the following components: 12 | 13 | Data collector - Collects dynamic/static information about the executable 14 | Interpreter - Converts raw information collected by the data collector into an intermediate representation 15 | Matcher - Compares the interpreter's representation with the signature 16 | 17 | One example of a behaviour based detection approach is histogram based malicious code detection by Symantec. 18 | 19 | The main advantage of behaviour based detection is the ability to detect unknown or polymorphic malware variants. But the disadvantages are a high False Positive Rate (FPR) and long scanning times. 20 | 21 | #### Heuristic methods 22 | 23 | Heuristic malware detection methods use data mining and machine learning to learn the behaviour of malicious files - e.g. Naive Bayes and Multi-Naive Bayes are employed to classify malware and benign files (https://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6620049). Typically, these use API/system calls, N-gram and op-code features. 24 | 25 | #### Concealment strategies 26 | 27 | Malware authors try to hide the malware's presence by adopting techniques such as: 28 | 29 | Obfuscation - Actions such as garbage commands, unnecessary jumps etc. 30 | 31 | Code encryption - The malware contains a defensive mechanism to encrypt itself or its malicious activities. Encrypted malware is a complex that consists of a decryption algorithm, an encryption algorithm, encryption keys and encrypted malicious code. When the malware runs, the key and decryption algorithm are used to decrypt its malicious part. 32 | 33 | More details - https://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6620049 34 | -------------------------------------------------------------------------------- /malwares.md: -------------------------------------------------------------------------------- 1 | ### Fast flux Networks 2 | Fast flux refers to networks used by several botnets to hide the domains used to download malware or host phishing sites.
It can also refer to the type of network used to host command-and-control centers or proxies used by those botnets, making them difficult to find and even more difficult to dismantle. 3 | In a fast flux network, multiple IPs are associated with a domain name and these IPs change as frequently as every few minutes! e.g. the Avalanche botnet had 800,000 domains under its control. 4 | 5 | Most machines on such a network are not actually responsible for hosting and serving malicious content to victims. That task is reserved for a few servers while the rest act as re-directors that help botnet owners to mask the real addresses of the systems. 6 | 7 | #### Single flux network 8 | It is characterized by multiple individual nodes registering and de-registering IP addresses as part of the DNS A records for a single domain name. These registrations have very short lifespans (5 min or less) and create a constantly changing flow of addresses when attempting to access a specific domain. 9 | 10 | Moreover, the domains used are hosted on bulletproof servers with good reputations and it's difficult to take them down at short notice. 11 | 12 | #### Double flux network 13 | This type of network is similar to a single flux network but with additional sophistication, which makes it difficult to locate the machine serving the malware. 14 | In this case, zombie computers that are part of the botnet are used as proxies, which prevents the victim from interacting directly with the server hosting the malware. This is a concealment strategy adopted by cyber criminals to keep the infrastructure running. 15 | In fact, these networks are typically characterized by multiple nodes registering and de-registering as part of the DNS NS records. Both DNS A records and authoritative NS records for malicious domains are continually changed in a round robin manner and advertised into the fast flux network.
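A crude way to spot single-flux behaviour is to repeat DNS lookups for a domain and watch for a churning set of A records with very low TTLs. The sketch below runs that heuristic on hypothetical, pre-collected answers (no live DNS queries); the thresholds are illustrative, not authoritative.

```python
# Hypothetical (ttl_seconds, ip) pairs collected from repeated lookups
# of one suspect domain over a few minutes
answers = [
    (120, "203.0.113.10"),
    (120, "198.51.100.7"),
    (60, "192.0.2.44"),
    (60, "203.0.113.99"),
    (120, "198.51.100.23"),
]

distinct_ips = {ip for _, ip in answers}
avg_ttl = sum(ttl for ttl, _ in answers) / len(answers)

# Illustrative thresholds: many distinct IPs plus sub-5-minute TTLs
is_suspicious = len(distinct_ips) >= 5 and avg_ttl <= 300
print(len(distinct_ips), avg_ttl, is_suspicious)
```

A double-flux check would apply the same churn test to the NS records as well as the A records.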
16 | 17 | 18 | Ref: 19 | * https://www.thesecuritybuddy.com/dos-ddos-prevention/what-is-fast-flux-network/ 20 | * https://resources.infosecinstitute.com/fast-flux-networks-working-detection-part-1/#gref 21 | * https://www.welivesecurity.com/2017/01/12/fast-flux-networks-work/ 22 | * https://www.akamai.com/uk/en/multimedia/documents/white-paper/digging-deeper-in-depth-analysis-of-fast-flux-network.pdf 23 | 24 | ### Crypto mining malware 25 | 26 | These malwares are specially developed to take over computer resources and use them for cryptocurrency mining without explicit permission. Cyber criminals have turned to writing cryptomining malware as a way to harness the computing power of a large number of computers and smartphones to help them generate revenue from cryptocurrency mining. A single cryptocurrency mining botnet can net cyber criminals more than $30,000 per month as per a Kaspersky report. 27 | 28 | In addition to malwares specifically designed to mine cryptocurrency, cyber criminals are using browser based cryptocurrency mining to help them generate revenue. Coinhive is a software program that packages all the tools needed to easily enable website owners to use stealth scripting to force visitors into cryptocurrency mining while visiting their site, in most cases without explicit permission. 29 | 30 | ### What is a RAT 31 | A RAT (Remote Access Trojan) is a sort of swiss army knife program consisting of many malicious functionalities.
32 | * Stealing of usernames and passwords 33 | * Logging of keystrokes 34 | * Gathering system information 35 | * Exfiltration of data 36 | * Command-and-control activities 37 | * Downloading of malwares for further actions 38 | * Accessing and uploading sensitive files 39 | * Recording of audio/video 40 | 41 | Typical infection vectors - email attachments and malicious downloads 42 | 43 | -------------------------------------------------------------------------------- /netflow traffic classification use cases.md: -------------------------------------------------------------------------------- 1 | ### Explanation of Flows vs Connection vs Session 2 | A network flow is defined as a unidirectional sequence of packets between two network 3 | endpoints and has the following attributes: 4 | * Source IP 5 | * Destination IP 6 | * Protocol 7 | * Source Port 8 | * Destination Port 9 | * Type of service (ToS) 10 | * Input interface 11 | 12 | A connection, on the other hand, is simply a bidirectional flow (a forward flow and a reverse flow). 13 | 14 | In a session, you will have many connections between the same source and destination. In addition, the routing 15 | policies and paths followed by packets in flows and sessions are different, as explained below: 16 | 17 | IP routing is stateless and routes each packet based on the IP address and port. A stateless 18 | connection is one in which no information is retained by either sender or receiver. TCP is used 19 | to manage state and is managed by the end servers. Firewalls are stateful and keep track of 20 | TCP/UDP sessions. The firewall tracks the attributes of the session such as sequence numbers 21 | and keeps this information in dynamic state tables. Load balancers and WAN Optimizers are 22 | added to networks to manage the state of a session to solve the problem of stateless routers. 23 | 24 | IP routing creates a fixed path between two networks.
While routes can change based on network 25 | outages, it is not possible to dynamically route a flow over multiple paths in a stateless 26 | network. Flows are packet based whereas sessions are services/application based. In a flow, 27 | all packets that are alike are treated the same. For instance, if there are six concurrent 28 | cloud based video streams, the router will treat all the UDP packets the same once the flow 29 | is established. Session based networking allows each session to be dynamically treated 30 | differently, e.g. in terms of priority and bandwidth shaping. 31 | 32 | ### Credits: 33 | * https://talkingpointz.com/flows-vs-sessions/ 34 | 35 | ## Netflow traffic classification 36 | 37 | ### Name: Good traffic 38 | * Class: Good 39 | * Score: 10 40 | * Note: A list of known hosts/netblocks extracted from nDPI source code, e.g. netblocks of google, twitter etc. 41 | 42 | ### Name: Alienvault Bad IP 43 | * Class: Bad 44 | * Score: 10 45 | * Note: IP addresses from the Alienvault threat intelligence database are used to flag malicious flows. 46 | 47 | ### Name: Emerging threats Bad IP 48 | * Class: Bad 49 | * Score: 10 50 | * Note: IP addresses from the Emerging Threats feed are used to flag malicious flows. 51 | 52 | ### Name: Insecure TCP ports 53 | * Class: Bad 54 | * Score: 10 55 | * Note: Any traffic flow that includes traffic to a list of TCP or UDP ports for insecure protocols - e.g. telnet, ftp, rsh 56 | 57 | ### Name: Unknown port conversation 58 | * Class: Bad 59 | * Score: 10 60 | * Note: Any traffic flow between unknown or un-assigned source and destination TCP or UDP ports is flagged as an anomaly. 61 | 62 | ### Name: SYN_Flood 63 | * Class: Bad 64 | * Score: 10 65 | * Note: Any traffic flow that contains only SYN packets is treated as malicious. 66 | 67 | ### Name: SSH Brute Force scan 68 | * Class: Bad 69 | * Score: 20 70 | * Note: Any flow to port 22 having 11 to 51 packets is treated as a possible SSH brute force attack.
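The SSH brute force rule above is easy to express over exported flow records. A minimal sketch, using made-up flow tuples rather than a real netflow export:

```python
# Hypothetical flow records: (dst_port, packets_in_flow)
flows = [(22, 3), (22, 25), (443, 40), (22, 51), (80, 12), (22, 70)]

# Rule from the note above: flows to port 22 with 11 to 51 packets
flagged = [f for f in flows if f[0] == 22 and 11 <= f[1] <= 51]
print(flagged)
```

Flows with very few packets (likely scans) and very many packets (likely legitimate sessions) fall outside the 11-51 band and are not flagged.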
71 | 72 | ### Name: Possible TCP SCAN 73 | * Class: Bad 74 | * Score: 20 75 | * Note: For this test, flows are grouped by destination port and counted along with the average number of packets per flow. Any group with more than 1000 flows and less than 4 packets per flow is considered to indicate possible scanning of the destination port associated with the group. The reasoning is that a large number of flows to a port with very few packets per flow is an indication of a scanning attempt. 76 | -------------------------------------------------------------------------------- /network-security-monitoring.md: -------------------------------------------------------------------------------- 1 | ### Network monitoring 2 | It is a system that constantly monitors a computer network for slow or failing systems and notifies the network administrator in case of outages via email, pager or other alarms. 3 | 4 | Network monitoring can be done in two ways: 5 | * Active network monitoring 6 | * Passive network monitoring 7 | 8 | #### Active network monitoring 9 | In active monitoring, test traffic is injected onto the network and the subsequent flows are monitored. This kind of monitoring is useful when you want data on a particular aspect of network performance, e.g. latency between two end-points, packet drops, jitter, analysis of malicious payloads and so on. 10 | 11 | #### Passive monitoring 12 | It's a continuous observation of network traffic followed by its detailed study. 13 | 14 | Typically, instead of injecting artificial traffic onto your network, passive monitoring involves monitoring traffic that is already on the network. This kind of monitoring requires a device on the network to capture network packets for analysis. This can be done with specialized probes designed to capture network data or with built-in capabilities on switches or other network devices.
Passive network monitoring can collect large volumes of data and from that we can derive a wide range of information. For example, TCP headers contain information that can be used to derive network topology, identify services and operating systems running on networked devices, and detect potentially malicious probes. 15 | 16 | Through passive monitoring, a security admin can gain a thorough understanding of the network's health. Much of this data can be gathered in an automated, non-intrusive manner through the use of standard tools. Passive monitoring tools can record, analyze, correlate and produce highly valuable security intelligence specific to a network. 17 | 18 | ### Why Network security monitoring 19 | Network security monitoring involves collecting the full spectrum of data types (event, session, full content and statistical) needed to identify and validate intrusions. The goal is to detect and respond to threats as early as possible to prevent data loss or disruption and restore normalcy in operations. Often, this is complicated when mountains of security-related events and log data are continuously produced by multiple disparate security tools. 20 | 21 | Most of the commercial products do generate useful security alerts, say, "event X happened", but they do not provide enough context for the user to act on them. The end users are always in a dilemma about whether the alert is relevant or not and in many cases, it is just overlooked/ignored. The usage of network security monitoring tools from the open source domain allows security analysts to drill down into the minute details of alerts and make decisions.
22 | 23 | Network security monitoring can be done in two ways: 24 | 25 | ### Active security monitoring 26 | Active (in-line) monitoring typically includes “bump in the wire” type solutions – 27 | 28 | * Firewalls/Proxies 29 | * Malware/Virus scanners (Spam, Phishing, Virus) 30 | * Whitelisting / blacklisting at various layers 31 | * Encryption/Man-in-the-middle 32 | 33 | Active measures are good first steps but they are only as effective as the signature data and/or configuration driving them. Each organization’s traffic profile is different, so active measures are often not sufficient or very effective, and they go stale very quickly with the new attacks appearing in the wild every day. 34 | 35 | Most firewalls are configured to block or allow combinations of IP / port / protocol. There are next-gen firewalls that also do deep packet inspection to block malicious IPs/URLs. Malware scanners depend on pre-configured patterns of known bad attachments or phishing URLs. Whitelisting / blacklisting rules need to be updated on a regular basis to be effective. 36 | 37 | ### Passive security monitoring 38 | As indicated earlier, a passive monitoring system can be configured to parse a copy of live network traffic, flag known anomalies and take action (machine learning) or log it for a human (security admin) to look at.
A good passive monitoring solution typically has the following capabilities: 39 | * Can keep up and watch all 7 layers of network traffic 40 | * Can parse and de-construct connection flows on the fly 41 | * Can log traffic meta-data for correlation 42 | * Can apply pre-defined identification rules and flag off suspicious activities 43 | * Supports flexible configuration to define new patterns on the fly 44 | 45 | There are several open source tools available for passive security monitoring and the most commonly used are described below: 46 | 47 | #### Snort/Suricata: 48 | Snort used to be the de facto IDS / IPS engine of choice for anyone looking to run an IDS. Suricata is another popular IDS project that allows efficient monitoring of very high speed links above 1Gbps. Snort / Suricata engines have a rich set of community and open/commercial rule sets available. It is possible to run them on an edge machine (router / firewall) or on an intranet machine to watch bad traffic that is flowing through and raise alerts. 49 | 50 | #### Bro: 51 | Bro is a general purpose traffic analysis platform that can also function as an IDS! The Bro engine is driven by program-like scripts that define patterns to be matched, ignored or alerted on. Bro can run on commodity hardware and can be scaled up to 100Gbps. 52 | 53 | ### Wireshark 54 | 55 | ### Network Miner 56 | 57 | 58 | #### Useful presentations: 59 | Principle of network security monitoring - https://www.mycert.org.my/mycert-sig/mycert-sig-08/slides/MyCERT-NSM-presentation.pdf 60 | 61 | -------------------------------------------------------------------------------- /nmap-nse-scripts.md: -------------------------------------------------------------------------------- 1 | ### How to install the latest nmap scripts 2 | I often forget the process to install the latest nmap scripts from the official Nmap repository. So, here is a quick note for myself!
3 | #### Find location of nmap scripts directory 4 | Under Windows 5 | ``` 6 | Windows Key + F, *.nse 7 | ``` 8 | Under linux 9 | ``` 10 | $ sudo find / -name '*.nse' 11 | ``` 12 | ``` 13 | $ sudo locate *.nse 14 | ``` 15 | The most common places for nse scripts are 16 | ``` 17 | c:\Program Files\Nmap\Scripts 18 | /usr/share/nmap/scripts 19 | /usr/local/share/nmap/scripts 20 | ``` 21 | #### Download nse scripts from official nmap site 22 | Nmap ```.nse``` scripts are located under the https://svn.nmap.org/nmap/scripts/ repository. 23 | 24 | Its official Git-based mirror is here - https://github.com/nmap/nmap/tree/master/scripts 25 | 26 | So, download the repository using the git clone command (or git pull in an existing clone). 27 | 28 | Extract the scripts folder and copy/overwrite it over the existing ```scripts``` directory. 29 | 30 | #### Update scripts database (optional) 31 | If you have an internet connection, you can use Nmap's script update command 32 | ``` 33 | $ nmap --script-updatedb 34 | ``` 35 | 36 | Now, you are ready to run nmap with the latest version of the NSE scripts and this is particularly useful for finding vulnerabilities. 37 | 38 | Cheers!
39 | -------------------------------------------------------------------------------- /osquery-threat-hunting.md: -------------------------------------------------------------------------------- 1 | ## Threat hunting using Facebook OSQuery 2 | 3 | ### List logged in users in the system at present 4 | ``` 5 | osquery> select * from logged_in_users; 6 | ``` 7 | 8 | ### Find all previous logins 9 | ``` 10 | osquery> select * from last; 11 | ``` 12 | 13 | ### List Firewall rules 14 | ``` 15 | osquery> select * from iptables; 16 | osquery> select chain, policy, src_ip, dst_ip from iptables; 17 | ``` 18 | 19 | ### Find all jobs scheduled by crontab 20 | ``` 21 | osquery> select command,path from crontab; 22 | ``` 23 | 24 | ### Find all files with setuid enabled (suid bit set) 25 | ``` 26 | osquery> select * from suid_bin; 27 | ``` 28 | 29 | ### Find list of kernel modules 30 | ``` 31 | osquery> select name, used_by, status from kernel_modules where status='Live'; 32 | ``` 33 | 34 | ### Find listening ports for any backdoors 35 | ``` 36 | osquery> select * from listening_ports; 37 | ``` 38 | 39 | ### Find file activity in server along with responsible user 40 | ``` 41 | osquery> select * from file_events; 42 | ``` 43 | 44 | ### Find top 10 largest processes by resident memory size 45 | ``` 46 | osquery> select pid, name, uid, resident_size from processes order by resident_size desc limit 10; 47 | ``` 48 | 49 | ### Find all running processes 50 | ``` 51 | osquery> select * from processes; 52 | ``` 53 | 54 | ### Find the process count and name for top 10 active processes 55 | ``` 56 | osquery> select count(pid) as total, name from processes group by name order by total desc limit 10; 57 | ``` 58 | 59 | ### Find any listening ports/addresses that are not as per organization policy 60 | ``` 61 | osquery> select distinct process.name, listening.port, listening.address, process.pid from processes as process JOIN listening_ports as listening ON process.pid = listening.pid; 62 | ```
63 | ### Attackers often delete malicious binary files after running them in the system. Find all such processes with no corresponding disk file (valid file path) 64 | ``` 65 | osquery> select name,path,pid from processes where on_disk=0; 66 | ``` 67 | 68 | ### Find any malware reverse shell 69 | ``` 70 | osquery> select * from processes where cmdline like '%bash -i >& /dev/tcp%'; 71 | ``` 72 | ### Arp spoofing attack 73 | ``` 74 | osquery> select * from ( select count(1) as mac_count, mac from arp_cache group by mac) where mac_count>1; 75 | ``` 76 | ### Watch a process with strict RSS limits 77 | ``` 78 | osquery> SELECT i.pid, i.version, p.resident_size, p.user_time, p.system_time, uptime.total_seconds FROM osquery_info i, processes p, uptime WHERE p.pid = i.pid; 79 | ``` 80 | 81 | #### Ref: 82 | * OSQuery to monitor linux - https://linoxide.com/monitoring-2/setup-osquery-monitor-security-threat-ubuntu/ 83 | 84 | -------------------------------------------------------------------------------- /pandas scaling.md: -------------------------------------------------------------------------------- 1 | ### Scaling in Pandas (Pre-processing of data values) 2 | #### Standard scaler 3 | The StandardScaler assumes your data is normally distributed within each feature and will scale the values such that the distribution is now centred around 0, with a standard deviation of 1. 4 | If the data is not normally distributed, this is not the best scaler to use. 5 | 6 | #### Min-Max scaler 7 | It essentially shrinks the range such that it is now between 0 and 1 (or -1 to 1 if there are negative values). This scaler works better for cases in which the standard scaler might not work so well: if the distribution is not Gaussian or the standard deviation is very small, the min-max scaler works better. However, it is sensitive to outliers, so if there are outliers in the data, you might want to consider the Robust Scaler.
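The outlier sensitivity mentioned above is easy to see by applying the min-max formula x' = (x - min) / (max - min) by hand. The sketch below uses plain Python instead of scikit-learn so the arithmetic stays visible; the data values are made up.

```python
data = [1.0, 2.0, 3.0, 4.0, 100.0]  # last value is an outlier

# Min-max scaling: x' = (x - min) / (max - min), result in [0, 1]
lo, hi = min(data), max(data)
scaled = [(x - lo) / (hi - lo) for x in data]

print([round(v, 3) for v in scaled])  # [0.0, 0.01, 0.02, 0.03, 1.0]
```

The single outlier squashes the other four values into the bottom few percent of the range, which is exactly the failure mode that motivates switching to an interquartile-range based scaler.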
8 | 9 | #### Robust scaler 10 | The RobustScaler uses a method similar to the Min-Max scaler but it instead uses the interquartile range, rather than the min-max, so that it is robust to outliers. Of course this means it uses less of the data for scaling, so it's more suitable for when there are outliers in the data. 11 | 12 | #### Normalizer 13 | The normalizer scales each sample by dividing each value by its magnitude in n-dimensional space for n number of features. 14 | 15 | * Ref link - http://benalexkeen.com/feature-scaling-with-scikit-learn/ 16 | -------------------------------------------------------------------------------- /quantum-notes.md: -------------------------------------------------------------------------------- 1 | ### Notes 2 | Microsoft has released a preview version of the Quantum Development Kit with a new language - Q#. Simulations can be done locally or on the Azure cloud platform. The platform offers rich libraries and code samples. 3 | 4 | Quantum systems are highly susceptible to decoherence. The states of quantum bits are quickly randomized by interference from the environment. The Q-CTRL toolkit helps teams design and deploy control for their quantum hardware to suppress these errors. 5 | 6 | Google has released Cirq, an open source software toolkit that lets developers create algorithms without needing a background in quantum physics. Google has also released OpenFermion-Cirq for the creation of 7 | algorithms that simulate molecules and properties of materials.
8 | 9 | ### Quantum algorithms and their implementation using Qiskit 10 | * Introduction to Coding Quantum Algorithms - https://arxiv.org/pdf/1903.04359.pdf 11 | * Fundamentals in quantum algorithms - https://arxiv.org/pdf/2008.10647.pdf 12 | 13 | * Quantum implementation of the Shor code on multiple simulator platforms - 14 | * https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11167/111670B/Quantum-implementation-of-the-Shor-code-on-multiple-simulator-platforms/10.1117/12.2532539.full?SSO=1 15 | * https://www.spiedigitallibrary.org/proceedings/Download?fullDOI=10.1117/12.2532539 16 | 17 | * Prototype Container-Based Platform for Extreme Quantum Computing Algorithm Development - https://ieeexplore.ieee.org/document/8916430 18 | * Comparison of quantum computing platforms through quantum algorithm implementations - http://csis.pace.edu/~ctappert/srd/a12.pdf 19 | * Gate implementation and cancer detection with quantum computing - http://reports.ias.ac.in/report/19342/gate-implementation-and-cancer-detection-with-quantum-computing 20 | * Assessment of IBM-Q computer and its software environment - http://dice.cyfronet.pl/publications/source/MSc_theses/ZuzannaChrzastek-MSc-Thesis-June-2018.pdf 21 | * Introduction to quantum computing - https://cerfacs.fr/wp-content/uploads/2018/09/CSG_Suau-final_report.pdf 22 | * Quantum computing tutorial - https://pythonprogramming.net/qubits-gates-quantum-computer-programming-tutorial/ 23 | 24 | 25 | ### Shor's algorithm implementation 26 | * Implementation of Shor's algorithm using Qiskit - https://github.com/ttlion/ShorAlgQiskit 27 | * Quantum computing examples using QISKit - https://github.com/mrtkp9993/QuantumComputingExamples 28 | * Source code of MPI programs for simulating quantum algorithms and its post-processing - https://github.com/ekera/qunundrum 29 | * Complexity analysis for Shor's algorithm - https://github.com/pkaran57/quantum-computing-final-project 30 | * An implementation of Shor's Quantum Algorithm with
sequential QFT - https://github.com/nikoSchoinas/ShorsQuantumAlgorithm 31 | * Implementing Shor's algorithm in Cirq - https://github.com/dmitrifried/Shors-Algorithm-in-Cirq 32 | 33 | ### Some interesting links 34 | * Simulation of a 45-qubit quantum circuit - https://arxiv.org/pdf/1704.01127.pdf 35 | * Quantum circuit analyzer tool - https://iopscience.iop.org/article/10.1088/1367-2630/ab60f6#references 36 | * Quantum computing presentations: 37 | * https://appliedtech.iit.edu/sites/sat/files/pdfs/ITM/Quantum%20Computiing.pdf 38 | * https://www1.icts.res.in/admin/wysiwyg_editor/downloads/1_Monday/1_Ronald_de_Wolf/qip07.pdf#page=69&zoom=auto,531,-171 39 | * Analysis and implementation of quantum computing algorithms - https://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1027&context=soars 40 | * Quantum algorithm implementations for beginners - https://arxiv.org/pdf/1804.03719.pdf 41 | * Introduction to quantum algorithms - https://people.cs.umass.edu/~strubell/doc/quantum_tutorial.pdf 42 | * QuEST and high performance simulation of quantum computers - https://europepmc.org/backend/ptpmcrender.fcgi?accid=PMC6656884&blobtype=pdf 43 | 44 | * Shor's algorithm - https://github.com/mett29/Shor-s-Algorithm 45 | * Teach me quantum - https://github.com/msramalho/Teach-Me-Quantum 46 | * Awesome quantum computing - https://github.com/krishnakumarsekar/awesome-quantum-machine-learning 47 | * https://github.com/zommiommy/quantum_research 48 | 49 | 50 | * Introduction to quantum computing - https://medium.com/qc-applied-approach-to-build-your-own-quantum/introduction-to-quantum-computing-a5af5127de0d 51 | * Introduction to quantum logical gates - https://medium.com/qc-applied-approach-to-build-your-own-quantum/introduction-to-quantum-logical-gates-part-i-80f95fa851a2 52 | * Intel quantum simulator - https://arxiv.org/pdf/2001.10554v1.pdf 53 | 54 | ### Shor's Algorithm for factoring large integers 55 | 56 | * https://github.com/lialkaas/qiskit-shors 57 | *
https://github.com/toddwildey/shors-python 58 | 59 | * Distributed Memory Techniques for Classical Simulation of Quantum Circuits - https://www.groundai.com/project/distributed-memory-techniques-for-classical-simulation-of-quantum-circuits/1 60 | * Intel quantum simulator - https://arxiv.org/pdf/2001.10554v1.pdf 61 | * QuEST and High Performance Simulation of Quantum Computers - https://europepmc.org/article/pmc/pmc6656884 62 | * 0.5 Petabyte Simulation of a 45-Qubit Quantum Circuit - https://arxiv.org/pdf/1704.01127.pdf 63 | 64 | 65 | 66 | -------------------------------------------------------------------------------- /replace-linux-on-smartphone.md: -------------------------------------------------------------------------------- 1 | ## Replacing Android with Linux on smartphone 2 | Linux can support almost any computer hardware, and Android smartphone hardware is no different. It's possible to run Linux on an Android smartphone if you wish. 3 | 4 | As you know, the core of Android is Linux - i.e. the kernel used in Android is based on the Linux kernel. Even though Android uses the Linux kernel, it does not come with the other software of a Linux distribution. 5 | Android does not run typical Linux applications, as it uses the Dalvik virtual machine to run applications written in Java. Android apps are specifically programmed to work on Android devices. So, Android is a bit different from Linux! 6 | 7 | ### Why Linux on smartphone 8 | Though Android is open source, many people do not consider it truly open source due to the presence of proprietary software. This software makes Android a less privacy-focused OS. 9 | 10 | Linux offers a completely open source OS, so we can use our smartphones without any proprietary software. This helps keep our data private and improves privacy. Further, Linux is considered more secure than Android, so installing Linux on a smartphone will make our devices more secure.
11 | 12 | Linux has good support for older hardware, which is beneficial for smartphones. Usually, smartphones get software updates for only 3-4 years after their initial release, but deployment of Linux enables long-term software updates for smartphones, up to ten years! This will increase the life span of smartphones and result in substantial cost savings. 13 | 14 | Some Linux distributions have support for Android apps, and this will give users a plethora of app choices (Android and Linux based apps combined). So Linux can become a nice alternative to Android. 15 | 16 | Ref: 17 | * https://lotoftech.com/is-it-possible-to-replace-android-with-linux-on-a-smartphone/ 18 | * https://blog.mobian-project.org/posts/2021/01/15/mobian-community-edition/ 19 | * https://www.ubuntupit.com/top-20-best-linux-voip-and-video-chat-software/ 20 | -------------------------------------------------------------------------------- /sandbox-drawbacks.md: -------------------------------------------------------------------------------- 1 | 2 | ### Why Sandboxing is not enough 3 | * It's hard to have a generic sandbox configuration that works with all kinds of malware. e.g. it's possible that a malware sample sleeps for 6 hours after infection, while in a sandbox you are running most samples for 4 | up to 5 minutes only; as a result, not all samples will get caught. So, using a sandbox is not a silver bullet. It is to be remembered that each tool has a purpose and you have to see the solution 5 | as a means of achieving your aim. 6 | 7 | * Some malware makes heavy use of techniques that allow it to detect the environment it is running in. Such malware has anti-VM techniques built into the malware itself, and as a result it will not run when it encounters a sandbox. 8 | This again emphasizes the fact that sandbox-based malware analysis cannot guarantee 100% results.
9 | 10 | * It is to be remembered that hackers often submit samples to public sandboxes to test their detection rate. A very low detection rate is a good sign for the hacker and gives him confidence that a variant of the actual sample can be used for an attack. 11 | 12 | So, it is recommended to supplement sandbox investigation with other options. This includes static analysis using IDA or another round of 13 | dynamic analysis using a debugger. 14 | 15 | In summary, you have to use the available information and resources intelligently to do malware analysis efficiently using a combination of open source and/or commercial tools. 16 | -------------------------------------------------------------------------------- /scap-security-compliance.md: -------------------------------------------------------------------------------- 1 | ### Security Content Automation Protocol (SCAP) based security compliance 2 | 3 | In the present world, security compliance is a must in many industries like finance, health and pharma, and it has become a legal requirement in many cases. Regulatory standards like PCI-DSS, BITS, HIPAA and ISO 27001 prescribe security recommendations for protecting data and improving information security management in the organization. By fulfilling the requirements of security compliance, you can mitigate many network and/or web application security attacks and achieve the specific IT security goals that protect an organization's reputation. 4 | 5 | On one hand, organizations are confronted with increasing audit and security compliance obligations and increased privacy concerns, while on the other hand, the use of cloud services, mobile ubiquity, BYOD and other mechanisms has made achieving security compliance more complex. Further, since each security standard involves an evolving set of specific requirements, achieving security compliance has become a complicated and costly affair.
In order to gain protection from liabilities in case of a security breach, organizations are spending large amounts of time and money on regulatory compliance efforts. 6 | 7 | Improving security posture is never an easy journey and there will be many hurdles in implementation; but with common security terminology and standardized tools like SCAP, it's achievable. 8 | 9 | The Security Content Automation Protocol (SCAP) allows guidance documents like CIS benchmarks to be expressed in an open and machine-readable form. SCAP validation allows users to draw conclusions about their organization's security posture in a complex environment. SCAP allows machine processing of raw security data - e.g. naming of security flaws, tests for the presence of flaws, and the status of configuration options. This provides the potential for dramatically better, more automated security posture determination for the organization. The same assessment would take much more time if done manually. 10 | 11 | Typically, SCAP-based validation is done in the following way: 12 | * Scan the system against open cybersecurity standards 13 | * Calculate a score to evaluate security posture 14 | * Interoperate with other SCAP-validated scanners to present results in a standard way 15 | 16 | SCAP community discussions are based on deep analysis of technology and field testing of operational systems. Combined with threat information from the security community, SCAP guidance presents the best translation of vulnerability knowledge into the language of system administrators - e.g. CIS benchmarks translate threat knowledge into system configurations that will prevent the spread of many attack vectors. 17 | 18 | In spite of this, many organizations find it difficult to maintain compliance due to lack of resources and expertise.
19 | But, it is necessary to pursue compliance, as it will result in: 20 | * Reduced cyber risk, by following best practice guides and expediting the overall compliance process 21 | * Fulfilment of information security reporting and auditing requirements 22 | -------------------------------------------------------------------------------- /scoring classification.md: -------------------------------------------------------------------------------- 1 | ## ML Scoring classification 2 | 3 | * Precision is the ratio of correctly predicted positive observations to total predicted positive observations. 4 | Precision = TP / (TP + FP) 5 | 6 | * Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. 7 | Recall = TP / (TP + FN) 8 | 9 | * F1 score is the harmonic mean of precision and recall. This takes into account both false positives and false negatives. 10 | F1 = 2 * (recall * precision) / (recall + precision) 11 | 12 | #### Interesting links 13 | * Scoring classifier models - http://benalexkeen.com/scoring-classifier-models-using-scikit-learn/ 14 | * ROC plots - https://go2analytics.wordpress.com/2016/07/26/implement-classification-in-python-and-roc-plotting-svc-example/ 15 | -------------------------------------------------------------------------------- /security-guidance.md: -------------------------------------------------------------------------------- 1 | ## Block attachments in mail gateway 2 | You should block the following attachment types in the mail gateway: 3 | ``` 4 | .ADE, .ADP, .APK, .BAT, .CHM, .CMD, .COM, .CPL, .DLL, .DMG, .EXE, .HTA, .INS, .ISP, .JAR, .JS, .JSE, .LIB, .LNK, .MDE, .MSC, .MSI, .MSP, .MST, .NSH, .PIF, .SCR, .SCT, .SHB, .SYS, .VB, .VBE, .VBS, .VXD, .WSC, .WSF, .WSH, .CAB 5 | ``` 6 | Ref link - https://support.google.com/mail/answer/6590?hl=en 7 | 8 | ## Improve effectiveness of ClamAV 9 | To improve the effectiveness of ClamAV, include signatures from Sanesecurity.com and Securiteinfo.com 10 | 11 | #### Interesting links: 12 | *
http://sanesecurity.com/usage/signatures/ 13 | * https://www.securiteinfo.com/services/improve-detection-rate-of-zero-day-malwares-for-clamav.shtml 14 | * https://portal.smartertools.com/community/a2583/how-to-greatly-improve-clamav-even-zero-hour-style-protection-for-free.aspx 15 | * https://portal.smartertools.com/community/a90798/are-clamav-cryen-basically-useless.aspx 16 | * Virus statistics on a yearly basis - http://www.shadowserver.org/wiki/pmwiki.php/AV/VirusYearlyStats 17 | -------------------------------------------------------------------------------- /security-testing.md: -------------------------------------------------------------------------------- 1 | ## Security testing of applications 2 | 3 | 90% of security incidents result from attackers exploiting known software bugs. If you can eliminate bugs in the development phase of software, you can reduce the information security risks facing many organizations. 4 | The following techniques are most commonly used for security testing of applications. 5 | 6 | #### Static Application Security Testing (SAST) 7 | It checks whether the code conforms to guidelines and standards. SAST does not find runtime errors. SAST can be easily automated and integrated into a project's workflow. 8 | 9 | #### Dynamic Application Security Testing (DAST) 10 | It is also known as blackbox testing and is used for finding vulnerabilities in web applications. DAST also allows you to identify flaws in authentication and configuration issues. DAST does not flag coding errors. 11 | 12 | #### Hybrid (SAST and DAST) 13 | Often SAST and DAST are used in tandem to improve coverage. 14 | 15 | #### Interactive Application Security Testing (IAST) 16 | SAST and DAST are older technologies and cannot fully handle modern web and mobile applications wherein extensive AJAX and other interactive technologies are used; IAST addresses this by instrumenting the running application to observe its behaviour during testing. 17 | 18 | #### Run-time Application Security Protection (RASP) 19 | RASP works inside the application and is more of a security tool.
It is plugged into the application and controls application execution. RASP lets the application run continuous security checks on itself and respond to live attacks by terminating the attacker's session and alerting the defender to the attack. 20 | -------------------------------------------------------------------------------- /signs-of-compromise.md: -------------------------------------------------------------------------------- 1 | ### Use cases for signs of compromise 2 | #### Network artifacts 3 | * Unusual DNS queries 4 | * High or low volume port scanning 5 | * DNS tunneling and zone transfers 6 | * Low volume periodic command and control traffic 7 | * Unusual HTTP headers 8 | * Unknown IoT devices 9 | * Unusual RDP traffic 10 | * Unusual user agent strings 11 | * Detection of Tor exit node addresses 12 | * Traffic to and from unknown geographic locations 13 | 14 | #### Host artifacts 15 | * Unknown running service(s) 16 | * Unknown running programs 17 | * Unusual startup locations for known programs 18 | * Unusual network connections for a program 19 | * Sudden appearance of advertisements 20 | * Slow system response 21 | 22 | * Spot unknown malware, zero-days and rogue behaviour by insiders - by leveraging baselines and known patterns of bad behaviour 23 | * Detect unusual lateral movement - look for trends in outbound communication 24 | * Uncover APTs - uncover hidden patterns in network traffic to unusual geographic locations based on time, frequency and contextual information 25 | 26 | -------------------------------------------------------------------------------- /source port 0 traffic: -------------------------------------------------------------------------------- 1 | ## Traffic on source port 0 in NetFlow 2 | 3 | NetFlow will separate TCP communications longer than 5 minutes into separate flows, which can be identified because the source port is '0'.
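When sifting exported flow records, these port-0 entries are easy to surface programmatically. A minimal Python sketch, assuming records already parsed into dicts - the field names (`src_ip`, `dst_ip`, `src_port`, `proto`) are illustrative and not tied to any particular NetFlow exporter:

```python
# Sketch: flag flow records whose source port is 0, which (per the note
# above) can indicate a long TCP session split into multiple flow records.
# All field names and sample data here are made up for illustration.

def port_zero_flows(records):
    """Return port-0 records grouped by their endpoints and protocol."""
    flagged = {}
    for r in records:
        if r["src_port"] == 0:
            key = (r["src_ip"], r["dst_ip"], r["proto"])
            flagged.setdefault(key, []).append(r)
    return flagged

flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 44123, "dst_port": 80, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 0, "dst_port": 80, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 0, "dst_port": 80, "proto": "tcp"},
]

suspects = port_zero_flows(flows)
for key, recs in suspects.items():
    print(key, "->", len(recs), "port-0 records")
```

Grouping by endpoints lets you eyeball whether the port-0 records line up with a long-running session (flow splitting) or look more like the fragment case described below.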
4 | 5 | In addition, packets that exceed the maximum transmission unit (MTU) size are fragmented into several packets, but only the first packet will contain a valid TCP port. The remaining fragments have no layer 4 header and thus have a destination port set to 0. 6 | 7 | IANA's Service Name and Transport Protocol Port Number Registry lists port 0 as reserved, but valid, for both TCP and UDP. Because the specification does not define behavior for connections established on those ports, attackers may use responses to fingerprint the operating systems of destination hosts. Furthermore, hackers may craft 'impossible' packets to DDoS firewalls, because some routers prevent administrators from entering port 0 in the access control list since it's supposedly impossible for traffic to be on that port. Crafting such packets requires raw socket calls that specify everything after the Ethernet header byte by byte. 8 | 9 | Ref - 10 | * The strange history of port 0 - http://www.lovemytool.com/blog/2013/08/the-strange-history-of-port-0-by-jim-macleod.html 11 | 12 | -------------------------------------------------------------------------------- /system-base-line-building.md: -------------------------------------------------------------------------------- 1 | The following commands are helpful to establish a system baseline.
2 | ## User Information 3 | #### Users 4 | ``` 5 | $ cat /etc/passwd | cut -d ":" -f 1 6 | ``` 7 | #### Uid information 8 | ``` 9 | $ cat /etc/passwd | cut -d ":" -f 3 10 | ``` 11 | #### Gid information 12 | ``` 13 | $ cat /etc/passwd | cut -d ":" -f 4 14 | ``` 15 | #### Root users 16 | ``` 17 | $ grep -v -E "^#" /etc/passwd | awk -F: '$3 == 0 { print $1 }' 18 | ``` 19 | ### Cron jobs 20 | #### Own cron jobs 21 | ``` 22 | $ crontab -l -u `whoami` 23 | ``` 24 | #### Job list (own as well as other users) 25 | ``` 26 | $ ls -la /etc/cron* 27 | $ ls -laR /etc/cron* 28 | ``` 29 | #### Spool cron jobs 30 | ``` 31 | $ sudo ls -la /var/spool/cron/crontabs 32 | ``` 33 | ### System information 34 | #### Kernel 35 | ``` 36 | $ uname -r 37 | ``` 38 | #### Hostname 39 | ``` 40 | $ uname -n 41 | ``` 42 | #### Architecture 43 | ``` 44 | $ uname -m 45 | ``` 46 | #### Shells present 47 | ``` 48 | $ cat /etc/shells | grep "bin" | cut -d "/" -f3 49 | ``` 50 | #### Environment 51 | ``` 52 | $ env 53 | ``` 54 | #### Path information 55 | ``` 56 | $ echo $PATH 57 | ``` 58 | ### Password information 59 | #### Umask 60 | ``` 61 | $ grep -i "^umask" /etc/login.defs 62 | ``` 63 | #### Password - max days 64 | ``` 65 | $ grep -i "^pass_max" /etc/login.defs 66 | ``` 67 | #### Password - min days 68 | ``` 69 | $ grep -i "^pass_min" /etc/login.defs 70 | ``` 71 | #### Password - warning days 72 | ``` 73 | $ grep -i "^pass_warn" /etc/login.defs 74 | ``` 75 | #### Password - encryption method 76 | ``` 77 | $ grep -i "^encrypt_method" /etc/login.defs 78 | ``` 79 | -------------------------------------------------------------------------------- /tap-vs-span port.md: -------------------------------------------------------------------------------- 1 | ### TAPs vs SPAN ports 2 | There are two common methods to extract traffic directly from the system: TAPs and SPANs. A network TAP is a hardware component that connects into the cabling infrastructure to copy packets for monitoring purposes.
A SPAN (Switched Port ANalyzer) is a software function of a switch or router that duplicates traffic from incoming or outgoing ports and forwards the copied traffic to a special SPAN (sometimes called mirror) port. In general, network TAPs are preferred over SPAN ports for the following reasons: 3 | 4 | * SPAN ports are easily oversubscribed and have the lowest priority when it comes to forwarding, which results in dropped packets 5 | * The SPAN application is processor-intensive and can have a negative performance impact on the switch itself, possibly affecting network traffic 6 | * Because SPAN traffic is easily reconfigured, SPAN output can change from day to day, resulting in inconsistent reporting 7 | 8 | However, there are some situations where inserting a TAP is not practical. For example, traffic could be running on a physical infrastructure outside your direct control, or maintenance windows may not allow for timely TAP deployments. Perhaps a remote location may not be able to justify a permanent TAP, but has SPAN access for occasional troubleshooting needs, since a SPAN can be added without bringing down a link. 9 | 10 | ### Passive TAPs 11 | A passive TAP requires no power of its own and does not actively interact with other components of the network. It uses an optical splitter to create a copy of the signal and is sometimes referred to as a "photonic" TAP. Most passive TAPs have no moving parts, are highly reliable and do not require configuration. 12 | 13 | ### Active TAPs 14 | Active TAPs require their own power source to regenerate the signals. There is no split ratio to consider because the TAP receives the signal and then retransmits it to both the network and monitoring destinations. From a high-level perspective this would appear to be a positive feature. Even so, passive TAPs are preferred: during a power outage, an active TAP cannot regenerate the signal, so it becomes a point of failure.
Since a passive TAP is not powered, it would be unaffected during a power outage, and the packets (originating from a source that still has power) would continue to flow. 15 | 16 | * Ref - https://www.gigamon.com/content/dam/resource-library/english/white-paper/wp-network-taps-first-step-to-visibility.pdf 17 | 18 | 19 | -------------------------------------------------------------------------------- /things-to-explore.md: -------------------------------------------------------------------------------- 1 | * Reproducible Jupyter notebooks - https://blog.reviewnb.com/reproducible-notebooks/ 2 | ### Prads - passive asset fingerprinting 3 | * prads presentation - https://www.slideshare.net/huayrass/pradsdagenatificleanen 4 | * Visualization of prads - https://www.duo.uio.no/bitstream/handle/10852/42155/Desta-Dawit-Master.pdf 5 | -------------------------------------------------------------------------------- /threat-feeds.md: -------------------------------------------------------------------------------- 1 | * Coin-miners - https://github.com/ntop/nDPI/blob/dev/example/mining_hosts.txt 2 | -------------------------------------------------------------------------------- /useful-commands.md: -------------------------------------------------------------------------------- 1 | # Useful commands 2 | ### OS statistics 3 | ``` 4 | $ ifconfig -a 5 | $ netstat -s 6 | $ netstat -ni 7 | $ vmstat -S m 1 8 | ``` 9 | ### NIC configuration with ethtool 10 | Format: 11 | Show ; Set 12 | ``` 13 | $ ethtool -S eth0 // Statistics 14 | $ ethtool -S eth0 | egrep '(rx_missed|no_buffer)' // Drop Values 15 | $ ethtool -g eth0 ; ethtool -G eth0 rx 4096 tx 4096 // FIFO RX Descriptors 16 | $ ethtool -k eth0 ; ethtool -K eth0 gro on gso on rx on // Offloading 17 | $ ethtool -a eth0 ; ethtool -A eth0 rx off autoneg off // Pause Frames 18 | $ ethtool -c eth0 ; ethtool -C eth0 rx-usecs 100 // Interrupt Coalescence 19 | ``` 20 | ### PCAP statistical data 21 | ``` 22 | $ capinfos file.pcap 23 | $ tcpslice -r
file.pcap 24 | $ tcpdstat file.pcap 25 | $ tcpprof -S lipn -P 30000 -r file.pcap 26 | ``` 27 | ### Number System Conversions 28 | ``` 29 | $ printf "%d" 0x2d 30 | $ printf "%x" 45 31 | $ printf '\x47\x45\x54\x0a' 32 | $ echo "GET" | hexdump -c 33 | $ echo "GET" | hexdump -C 34 | ``` 35 | ### Session & Flow Data 36 | ``` 37 | $ iftop -i eth0.pcap // live only, replay for same effect 38 | ``` 39 | // use -i instead of -r for interface 40 | ``` 41 | $ tcpflow -c -e -r file.pcap 'tcp and port (80 or 443)' 42 | $ tcpflow -r file.pcap tcp and port \(80 or 443\) 43 | $ tcpick -r file.pcap -C -yP -h 'port (25 or 587)' 44 | ``` 45 | // [-wRu] write both flows; [-wRC] write client flows only ; [-wRS] write server flows only 46 | ``` 47 | $ tcpick -r file.pcap -wRu 48 | ``` 49 | 50 | ### Replay 51 | ``` 52 | $ tcpreplay -M10 -i eth0 file.pcap 53 | $ netsniff-ng --in file.pcap --out eth0 54 | $ netsniff-ng --in eth0.pcap --out eth1.pcap 55 | $ trafgen --dev eth0 --conf trafgen.txf --bind-cpu 0 56 | ``` 57 | ### Audit Record Generation And Utilization System 58 | ``` 59 | $ argus -r file.pcap -w file.argus 60 | $ ra -nnr file.argus ; ra -Z b -nnr file.argus 61 | $ ra -nnr file.argus - host 192.168.1.1 and port 80 62 | $ racluster -M rmon -m saddr -r file.argus 63 | $ ra -nnr file.argus -w - - port 22 | racluster -M rmon -m saddr -r - | rasort -m bytes -r - 64 | $ racluster -M rmon -m proto -r file.argus -w - | rasort -m pkts -r - 65 | $ racluster -M rmon -m proto sport -r file.argus 66 | $ ragraph bytes -M 30s -r file.argus -w bytes.png 67 | $ ragraph pkts -M 30s -r file.argus -w pkts.png 68 | $ ra -nnr file2.argus -s saddr,daddr,loss | sort -nr -k 3 | head -20 69 | $ ragraph dbytes sbytes -M 30s -r file.argus - dst port 80 and dst port 443 70 | $ ragraph dbytes sbytes dport sport -fill -M 30s -r file.argus 71 | ``` 72 | ### Network Forensics - File Extraction 73 | ``` 74 | $ tcpdump -nni eth0 -w image.pcap port 80 & 75 | $ wget 
http://upload.wikimedia.org/wikipedia/en/5/55/Bsd_daemon.jpg 76 | $ jobs 77 | $ kill %1 78 | $ tcpflow -r image.pcap 79 | $ tcpxtract -f file.pcap -o xtract/ 80 | ``` 81 | 82 | ### Change MACs 83 | ``` 84 | $ tcprewrite --enet-dmac=00:44:66:FC:29:AF,00:55:22:AF:C6:37 85 | --enet-smac=00:66:AA:D1:32:C2,00:22:55:AC:DE:AC --infile=in.pcap 86 | --outfile=out.pcap 87 | ``` 88 | ### Randomize IPs 89 | ``` 90 | $ tcprewrite --seed=423 --infile=in.pcap --outfile=out.pcap 91 | ``` 92 | Ref - http://www.draconyx.net/talks/pcapworksheet.txt and many thanks to John Schipp. 93 | 94 | -------------------------------------------------------------------------------- /vulnerability-management.md: -------------------------------------------------------------------------------- 1 | ### Vulnerability management 2 | Vulnerabilities in software are a practical reality for IT professionals/sysadmins, and many cyberattacks can be attributed to the failure of sysadmins to identify vulnerabilities in time or to patch known vulnerabilities. The latest statistics have shown that such attacks cover more than 90% of cases. 3 | 4 | In a broad sense, vulnerabilities can be categorized as: 5 | #### Known vulnerabilities 6 | * Known to the world but unknown to the sysadmin (lack of awareness) 7 | * Known to the sysadmin but not patched in time 8 | 9 | #### Unknown vulnerabilities 10 | * Vulnerabilities not yet discovered 11 | * Vulnerabilities known to very few people (zero-day) 12 | 13 | In many cases, public exploits for these vulnerabilities are available on the internet and malicious actors/hackers will use them on a case-to-case basis. In the case of state-sponsored hackers, these exploits are specifically written for a targeted environment. 14 | 15 | So, no matter how large or small the network that you oversee, it is critical that every organization have a vulnerability management program.
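The "known to the sysadmin but not patched in time" gap is exactly what such a program should close, and even simple automation helps. A minimal Python sketch of cross-checking an installed-package inventory against advisories - both data sets below are made up for illustration; a real program would pull the inventory from a scanner and the advisories from a feed such as a CVE database:

```python
# Sketch: flag installed packages that are older than the version in which
# a known vulnerability was fixed. Package names, versions and the advisory
# data are all illustrative, not real advisories.

def parse_version(v):
    """'1.2.10' -> (1, 2, 10) so versions compare numerically, not lexically."""
    return tuple(int(x) for x in v.split("."))

def find_vulnerable(installed, advisories):
    """advisories: {package: fixed_in_version}; flag anything below the fix."""
    findings = []
    for pkg, ver in installed.items():
        fixed_in = advisories.get(pkg)
        if fixed_in and parse_version(ver) < parse_version(fixed_in):
            findings.append((pkg, ver, fixed_in))
    return findings

installed = {"openssl": "1.0.2", "nginx": "1.18.0"}
advisories = {"openssl": "1.0.3", "sudo": "1.9.5"}

for pkg, ver, fixed in find_vulnerable(installed, advisories):
    print(f"{pkg} {ver} is below the fixed version {fixed} - patch needed")
```

The numeric version parsing matters: comparing version strings lexically would rank "1.0.9" above "1.0.10".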
16 | 17 | Vulnerability management is a never-ending process and you have to be continuously proactive in identifying and mitigating existing vulnerabilities as soon as they are announced. 18 | 19 | #### Vulnerability scans 20 | You need to do periodic vulnerability scanning of all your network devices, servers, hosts and any other devices on the network, and list the products and programs installed. Further, you have to ensure that this list is up-to-date, as users/admins may install/uninstall products/programs in day-to-day operations. 21 | 22 | A good vulnerability scanner (e.g. GFI LanGuard) not only maintains an up-to-date list of installed products/programs on the network but also keeps track of underlying operating system vulnerabilities (Linux, Windows). 23 | 24 | #### Importance of patching 25 | Many cyberattacks have happened due to unpatched vulnerabilities, for reasons like an improper patch schedule, over-confident sysadmins ("it may not happen to my server") and lack of awareness of the security vulnerability. 26 | 27 | So, it is important to have a patching schedule as part of your vulnerability management policy. This schedule should allow enough time to test the patches for bugs or side-effects in staging environments, and only then should patching be applied to production machines. For zero-day or critical vulnerabilities, it is important to have a contingency plan so that patches can be applied to production systems as soon as possible to minimize the damage. 28 | -------------------------------------------------------------------------------- /web-logs-iocs.md: -------------------------------------------------------------------------------- 1 | ## Web logs analysis - Some indicators of compromise (IOCs) 2 | 3 | ### IP-level statistics: 4 | High frequency, periodicity or volume from a single IP address or subnet is suspicious.
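The IP-level check above can be sketched with a simple frequency count. A minimal Python example - the log lines and the cutoff are illustrative; in practice the threshold would be derived from the site's historical per-IP baseline:

```python
# Sketch: count requests per source IP in a combined-format access log and
# flag IPs whose volume crosses a (here made-up) threshold.
from collections import Counter

log_lines = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36] "GET /login HTTP/1.1" 200',
    '203.0.113.7 - - [10/Oct/2023:13:55:37] "GET /admin HTTP/1.1" 404',
    '203.0.113.7 - - [10/Oct/2023:13:55:38] "GET /backup HTTP/1.1" 404',
    '198.51.100.2 - - [10/Oct/2023:13:56:01] "GET /index HTTP/1.1" 200',
]

# The source IP is the first whitespace-separated field of each line.
hits = Counter(line.split()[0] for line in log_lines)

threshold = 3  # illustrative cutoff; tune against normal per-IP volume
noisy = [ip for ip, n in hits.items() if n >= threshold]
print(noisy)  # -> ['203.0.113.7']
```

The same counting approach extends to the periodicity check by bucketing each IP's timestamps and looking for regular inter-request intervals.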
5 | 6 | ### URL string abbreviations: 7 | Self-referencing paths (/./) or backreferences (/../) are used in path traversal attacks 8 | 9 | ### Decoded URLs and HTML entities, escaped characters, null byte string termination 10 | These are used to evade simple signature/rule engines 11 | 12 | ### Unusual referrer patterns: 13 | Page accesses with an abnormal referrer URL are often a signal of unwelcome access to an HTTP endpoint 14 | 15 | ### Sequence of accesses to endpoints: 16 | Out-of-order access to HTTP endpoints that does not correspond to the website's logical flow is indicative of fuzzing or malicious exploration. e.g. typically a user accesses the website by logging in (POST to /login) followed by three successive GETs to /a, /b and /c. If a particular IP repeatedly makes GET requests to /b and /c without a corresponding login or /a request, this could be a sign of bot automation or manual reconnaissance activity. 17 | 18 | ### User agent patterns 19 | Perform frequency analysis on user agent strings to alert on unusual user agents or extremely old clients. 20 | 21 | Web logs provide enough information about the different OWASP Top Ten web application attacks. 22 | 23 | Good book - Machine Learning and Security: Protecting Systems with Data and Algorithms - Clarence Chio and David Freeman, O'Reilly 24 | 25 | -------------------------------------------------------------------------------- /weekly-report-template.md: -------------------------------------------------------------------------------- 1 | ## Weekly report template 2 | 3 | ### Objective 4 | * Describe the short-term objective(s) of the project, keeping in mind the big picture (targets of the project) 5 | * If required, add long-term objectives to achieve the final goal. 6 | 7 | ### Work you did 8 | * Explain the steps/work you did to achieve your goal for this week.
9 | 10 | ### Remarks 11 | * Summarize your weekly activities 12 | * What you have learned or gained 13 | * Describe the work you liked 14 | 15 | ### Follow up 16 | * Identify the activities that you plan to take up in the next or coming week 17 | 18 | ### Meetings 19 | * List the meetings that you attended, their purpose and any specific contributions from your side. Also include any task(s) assigned to you. 20 | * Any follow-up action required. 21 | 22 | -------------------------------------------------------------------------------- /why-time-series-databases.md: -------------------------------------------------------------------------------- 1 | Relational databases offer SQL, which is far better than key-value or other lower-level ways to manipulate big data sets. However, SQL's expressive power is very limited in the time-series domain. Relational tables grow "downwards" by adding rows, and for that SQL is reasonably expressive and fast. But time-series data is different - typically a row consists of a primary key and a series of other attributes, following a "wide rows" model. SQL's row-oriented model does not fit time-series data well. You essentially end up building an entity-attribute-value model, which is inefficient, contains tons of repeated data and is difficult to query. 2 | 3 | Secondly, the size of the time-series data that you are dealing with is huge - millions of entries. Single-node databases have only limited capacity, and to run a production time-series service you need a distributed database. RRD files are not a good foundation for building this type of system. People have tried to build time-series databases on top of NoSQL databases; popular examples are OpenTSDB and KairosDB. But the problem with these solutions is getting experienced people to run the DBs. Also, it's hard to get good read performance from these DBs. That's why people have now turned their attention to native time-series databases.
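The wide-row point can be made concrete with plain Python structures (purely illustrative): the relational entity-attribute-value layout repeats the series key on every sample, while the wide-row layout stores it once per series.

```python
# Sketch: the same three samples laid out two ways. Note how the series key
# rides along on every EAV row but appears once in the wide row - the
# repetition that makes the relational layout inefficient for time series.

samples = [(1000, 0.5), (1010, 0.6), (1020, 0.4)]  # (timestamp, value) pairs

# Relational/EAV layout: one row per sample, key repeated each time.
eav_rows = [("cpu.load.host1", ts, val) for ts, val in samples]

# Wide-row layout: one key, with samples appended "sideways".
wide_row = {"key": "cpu.load.host1", "points": samples}

print(len(eav_rows), "EAV rows vs 1 wide row holding", len(wide_row["points"]), "points")
```

With millions of samples per series, that repeated key (plus per-row storage overhead) is what the text means by "tons of repeated data".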
4 | 5 | Unique characteristics of time-series data include append-mostly writes, rare updates, sequential reads and occasional bulk deletes. The datastore needs to be optimized for all of these. Good examples of time-series databases are OpenTSDB and InfluxDB. In addition, you may also have to look at other parameters like volume of data, flexibility of storage, horizontal scalability, high availability etc. 6 | --------------------------------------------------------------------------------