├── .gitignore
├── Introduction.md
├── LICENSE
├── README.md
├── SUMMARY.md
├── assets
    ├── images
    │   └── video
    │   │   └── The rules of teaching.png
    └── video
    │   └── The rules of teaching.mp4
├── book.json
├── chapter1
    ├── section0.md
    ├── section1.md
    ├── section2.md
    └── section3.md
├── chapter2
    ├── section0.md
    ├── section1.md
    ├── section1
    │   ├── section1.1.md
    │   └── section1.2.md
    ├── section2.md
    └── section2
    │   ├── section2.1.md
    │   ├── section2.2.md
    │   └── section2.3.md
├── chapter3
    ├── section0.md
    ├── section1.md
    ├── section2.md
    ├── section3.md
    ├── section3
    │   ├── section3.1.md
    │   ├── section3.2.md
    │   ├── section3.3.md
    │   ├── section3.4.md
    │   ├── section3.5.md
    │   ├── section3.6.md
    │   ├── section3.7.md
    │   └── section3.8.md
    ├── section4.md
    └── section5.md
├── chapter4
    ├── section0.md
    ├── section1.md
    ├── section2.md
    └── section3.md
├── chapter5
    ├── section0.md
    ├── section1.md
    ├── section2.md
    └── section3.md
└── styles
    └── website.css


/.gitignore:
--------------------------------------------------------------------------------
1 | _book/*
2 | html/*
3 | node_modules/*


--------------------------------------------------------------------------------
/Introduction.md:
--------------------------------------------------------------------------------
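1 | # Overview
2 | 
3 | `TH-Nebula` parses an enterprise's network traffic (both `http` and `https`) to reconstruct the events taking place inside the business, then analyzes those events in depth to uncover risks.
4 | 
5 | This is the technical documentation for `TH-Nebula`. For various reasons it is still incomplete in many places and will keep being revised. If you run into a problem while using the system that the documentation does not mention or does not resolve, please get in touch with us. We sincerely appreciate it!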
6 | 
7 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                                  Apache License
  2 |                            Version 2.0, January 2004
  3 |                         http://www.apache.org/licenses/
  4 | 
  5 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 |    1. Definitions.
  8 | 
  9 |       "License" shall mean the terms and conditions for use, reproduction,
 10 |       and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |       "Licensor" shall mean the copyright owner or entity authorized by
 13 |       the copyright owner that is granting the License.
 14 | 
 15 |       "Legal Entity" shall mean the union of the acting entity and all
 16 |       other entities that control, are controlled by, or are under common
 17 |       control with that entity. For the purposes of this definition,
 18 |       "control" means (i) the power, direct or indirect, to cause the
 19 |       direction or management of such entity, whether by contract or
 20 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |       outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |       "You" (or "Your") shall mean an individual or Legal Entity
 24 |       exercising permissions granted by this License.
 25 | 
 26 |       "Source" form shall mean the preferred form for making modifications,
 27 |       including but not limited to software source code, documentation
 28 |       source, and configuration files.
 29 | 
 30 |       "Object" form shall mean any form resulting from mechanical
 31 |       transformation or translation of a Source form, including but
 32 |       not limited to compiled object code, generated documentation,
 33 |       and conversions to other media types.
 34 | 
 35 |       "Work" shall mean the work of authorship, whether in Source or
 36 |       Object form, made available under the License, as indicated by a
 37 |       copyright notice that is included in or attached to the work
 38 |       (an example is provided in the Appendix below).
 39 | 
 40 |       "Derivative Works" shall mean any work, whether in Source or Object
 41 |       form, that is based on (or derived from) the Work and for which the
 42 |       editorial revisions, annotations, elaborations, or other modifications
 43 |       represent, as a whole, an original work of authorship. For the purposes
 44 |       of this License, Derivative Works shall not include works that remain
 45 |       separable from, or merely link (or bind by name) to the interfaces of,
 46 |       the Work and Derivative Works thereof.
 47 | 
 48 |       "Contribution" shall mean any work of authorship, including
 49 |       the original version of the Work and any modifications or additions
 50 |       to that Work or Derivative Works thereof, that is intentionally
 51 |       submitted to Licensor for inclusion in the Work by the copyright owner
 52 |       or by an individual or Legal Entity authorized to submit on behalf of
 53 |       the copyright owner. For the purposes of this definition, "submitted"
 54 |       means any form of electronic, verbal, or written communication sent
 55 |       to the Licensor or its representatives, including but not limited to
 56 |       communication on electronic mailing lists, source code control systems,
 57 |       and issue tracking systems that are managed by, or on behalf of, the
 58 |       Licensor for the purpose of discussing and improving the Work, but
 59 |       excluding communication that is conspicuously marked or otherwise
 60 |       designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |       on behalf of whom a Contribution has been received by Licensor and
 64 |       subsequently incorporated within the Work.
 65 | 
 66 |    2. Grant of Copyright License. Subject to the terms and conditions of
 67 |       this License, each Contributor hereby grants to You a perpetual,
 68 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |       copyright license to reproduce, prepare Derivative Works of,
 70 |       publicly display, publicly perform, sublicense, and distribute the
 71 |       Work and such Derivative Works in Source or Object form.
 72 | 
 73 |    3. Grant of Patent License. Subject to the terms and conditions of
 74 |       this License, each Contributor hereby grants to You a perpetual,
 75 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |       (except as stated in this section) patent license to make, have made,
 77 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |       where such license applies only to those patent claims licensable
 79 |       by such Contributor that are necessarily infringed by their
 80 |       Contribution(s) alone or by combination of their Contribution(s)
 81 |       with the Work to which such Contribution(s) was submitted. If You
 82 |       institute patent litigation against any entity (including a
 83 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |       or a Contribution incorporated within the Work constitutes direct
 85 |       or contributory patent infringement, then any patent licenses
 86 |       granted to You under this License for that Work shall terminate
 87 |       as of the date such litigation is filed.
 88 | 
 89 |    4. Redistribution. You may reproduce and distribute copies of the
 90 |       Work or Derivative Works thereof in any medium, with or without
 91 |       modifications, and in Source or Object form, provided that You
 92 |       meet the following conditions:
 93 | 
 94 |       (a) You must give any other recipients of the Work or
 95 |           Derivative Works a copy of this License; and
 96 | 
 97 |       (b) You must cause any modified files to carry prominent notices
 98 |           stating that You changed the files; and
 99 | 
100 |       (c) You must retain, in the Source form of any Derivative Works
101 |           that You distribute, all copyright, patent, trademark, and
102 |           attribution notices from the Source form of the Work,
103 |           excluding those notices that do not pertain to any part of
104 |           the Derivative Works; and
105 | 
106 |       (d) If the Work includes a "NOTICE" text file as part of its
107 |           distribution, then any Derivative Works that You distribute must
108 |           include a readable copy of the attribution notices contained
109 |           within such NOTICE file, excluding those notices that do not
110 |           pertain to any part of the Derivative Works, in at least one
111 |           of the following places: within a NOTICE text file distributed
112 |           as part of the Derivative Works; within the Source form or
113 |           documentation, if provided along with the Derivative Works; or,
114 |           within a display generated by the Derivative Works, if and
115 |           wherever such third-party notices normally appear. The contents
116 |           of the NOTICE file are for informational purposes only and
117 |           do not modify the License. You may add Your own attribution
118 |           notices within Derivative Works that You distribute, alongside
119 |           or as an addendum to the NOTICE text from the Work, provided
120 |           that such additional attribution notices cannot be construed
121 |           as modifying the License.
122 | 
123 |       You may add Your own copyright statement to Your modifications and
124 |       may provide additional or different license terms and conditions
125 |       for use, reproduction, or distribution of Your modifications, or
126 |       for any such Derivative Works as a whole, provided Your use,
127 |       reproduction, and distribution of the Work otherwise complies with
128 |       the conditions stated in this License.
129 | 
130 |    5. Submission of Contributions. Unless You explicitly state otherwise,
131 |       any Contribution intentionally submitted for inclusion in the Work
132 |       by You to the Licensor shall be under the terms and conditions of
133 |       this License, without any additional terms or conditions.
134 |       Notwithstanding the above, nothing herein shall supersede or modify
135 |       the terms of any separate license agreement you may have executed
136 |       with Licensor regarding such Contributions.
137 | 
138 |    6. Trademarks. This License does not grant permission to use the trade
139 |       names, trademarks, service marks, or product names of the Licensor,
140 |       except as required for reasonable and customary use in describing the
141 |       origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 |    7. Disclaimer of Warranty. Unless required by applicable law or
144 |       agreed to in writing, Licensor provides the Work (and each
145 |       Contributor provides its Contributions) on an "AS IS" BASIS,
146 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |       implied, including, without limitation, any warranties or conditions
148 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |       PARTICULAR PURPOSE. You are solely responsible for determining the
150 |       appropriateness of using or redistributing the Work and assume any
151 |       risks associated with Your exercise of permissions under this License.
152 | 
153 |    8. Limitation of Liability. In no event and under no legal theory,
154 |       whether in tort (including negligence), contract, or otherwise,
155 |       unless required by applicable law (such as deliberate and grossly
156 |       negligent acts) or agreed to in writing, shall any Contributor be
157 |       liable to You for damages, including any direct, indirect, special,
158 |       incidental, or consequential damages of any character arising as a
159 |       result of this License or out of the use or inability to use the
160 |       Work (including but not limited to damages for loss of goodwill,
161 |       work stoppage, computer failure or malfunction, or any and all
162 |       other commercial damages or losses), even if such Contributor
163 |       has been advised of the possibility of such damages.
164 | 
165 |    9. Accepting Warranty or Additional Liability. While redistributing
166 |       the Work or Derivative Works thereof, You may choose to offer,
167 |       and charge a fee for, acceptance of support, warranty, indemnity,
168 |       or other liability obligations and/or rights consistent with this
169 |       License. However, in accepting such obligations, You may act only
170 |       on Your own behalf and on Your sole responsibility, not on behalf
171 |       of any other Contributor, and only if You agree to indemnify,
172 |       defend, and hold each Contributor harmless for any liability
173 |       incurred by, or claims asserted against, such Contributor by reason
174 |       of your accepting any such warranty or additional liability.
175 | 
176 |    END OF TERMS AND CONDITIONS
177 | 
178 |    APPENDIX: How to apply the Apache License to your work.
179 | 
180 |       To apply the Apache License to your work, attach the following
181 |       boilerplate notice, with the fields enclosed by brackets "[]"
182 |       replaced with your own identifying information. (Don't include
183 |       the brackets!)  The text should be enclosed in the appropriate
184 |       comment syntax for the file format. We also recommend that a
185 |       file or class name and description of purpose be included on the
186 |       same "printed page" as the copyright notice for easier
187 |       identification within third-party archives.
188 | 
189 |    Copyright [yyyy] [name of copyright owner]
190 | 
191 |    Licensed under the Apache License, Version 2.0 (the "License");
192 |    you may not use this file except in compliance with the License.
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.
202 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
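 1 | # Information
 2 | 
 3 | - Version: v 1.0
 4 | - Title: `Nebula_doc`
 5 | - Category: technical documentation
 6 | 
 7 | # File structure
 8 | 
 9 | - chapter *: directories that hold the chapter files
10 | - `book.json`: `Gitbook` configuration file
11 | - `Introduction.md`: foreword
12 | - `SUMMARY.md`: table-of-contents configuration file
13 | 
14 | # Running locally
15 | 
16 | - Requirement: `Node.js` (v4.0.0 or later)
17 | 
18 | - Install `Gitbook` (via `NPM`)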
19 | 
20 | ```shell
21 | npm install gitbook-cli -g
22 | ```
23 | `gitbook-cli` is the command-line tool for `Gitbook`; it lets you install and manage multiple `Gitbook` versions on your machine.
24 | 
25 | - `clone` the repository and run it
26 |   
27 | ```shell
28 | git clone https://github.com/threathunterX/nebula_doc.git
29 | cd nebula_doc
30 | gitbook install
31 | gitbook serve
32 | ```
33 |   
34 | - Open http://localhost:4000/ in a browser to view the site
35 | 
36 | 
37 | 
38 | 
39 | 


--------------------------------------------------------------------------------
/SUMMARY.md:
--------------------------------------------------------------------------------
 1 | # Summary
 2 | 
 3 | * [Introduction](README.md)
 4 | * [Overview](Introduction.md)
 5 | * [1. About Nebula](chapter1/section0.md)
 6 |     * [1.1. Nebula Overview](chapter1/section1.md)
 7 |     * [1.2. Nebula Features](chapter1/section2.md)
 8 |     * [1.3. Problems Nebula Solves](chapter1/section3.md)
 9 | * [2. Quick Integration](chapter2/section0.md)
10 |     * [2.1. Quick Start](chapter2/section1.md)
11 |         * [2.1.1. Nebula System Architecture](chapter2/section1/section1.1.md)
12 |         * [2.1.2. How Nebula Works](chapter2/section1/section1.2.md)
13 |     * [2.2. Installation](chapter2/section2.md)
14 |         * [2.2.1. Requirements](chapter2/section2/section2.1.md)
15 |         * [2.2.2. Binary Installation](chapter2/section2/section2.2.md)
16 |         * [2.2.3. Installing from Source](chapter2/section2/section2.3.md)
17 | * [3. User Manual](chapter3/section0.md)
18 |     * [3.1. Basic Features](chapter3/section1.md)
19 |     * [3.2. Common Usage Guide](chapter3/section2.md)
20 |     * [3.3. Business Integration](chapter3/section3.md)
21 |         * [3.3.1. Scenarios](chapter3/section3/section3.1.md)
22 |         * [3.3.2. Events](chapter3/section3/section3.2.md)
23 |         * [3.3.3. Variables](chapter3/section3/section3.3.md)
24 |         * [3.3.4. Rule Review](chapter3/section3/section3.4.md)
25 |         * [3.3.5. Script Customization](chapter3/section3/section3.5.md)
26 |         * [3.3.6. Policy Configuration](chapter3/section3/section3.6.md)
27 |         * [3.3.7. Operational Decisions](chapter3/section3/section3.7.md)
28 |         * [3.3.8. Rule Iteration](chapter3/section3/section3.8.md)
29 |     * [3.4. Nebula System Configuration](chapter3/section4.md)
30 |     * [3.5. Blocking Risks Found by Nebula](chapter3/section5.md)
31 | * [4. Design Philosophy](chapter4/section0.md)
32 |     * [4.1. Data Collection](chapter4/section1.md)
33 |     * [4.2. Data Analysis](chapter4/section2.md)
34 |     * [4.3. Architecture Design](chapter4/section3.md)
35 | * [5. Custom Development](chapter5/section0.md)
36 |     * [5.1. Sniffer Internals and Driver Customization](chapter5/section1.md)
37 |     * [5.2. Sniffer nginx and kafka Driver Support](chapter5/section2.md)
38 |     * [5.3. Sniffer Testing and Debugging](chapter5/section3.md)
39 | 
40 | 


--------------------------------------------------------------------------------
/assets/images/video/The rules of teaching.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/threathunterX/nebula_doc/a112fee74d1e64b881875767a552c85435138023/assets/images/video/The rules of teaching.png


--------------------------------------------------------------------------------
/assets/video/The rules of teaching.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/threathunterX/nebula_doc/a112fee74d1e64b881875767a552c85435138023/assets/video/The rules of teaching.mp4


--------------------------------------------------------------------------------
/book.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "title": "Nebula",
 3 |     "author": "Threat Hunter",
 4 |     "description": "This book is powered by Threat Hunter.",
 5 | 	"output.name": "site",
 6 |     "language": "zh-hans",
 7 |     "gitbook": "3.2.3",
 8 |     "root": ".",
 9 | 	"structure": {
10 |         "readme": "Introduction.md"
11 |     },
12 |     "plugins": [
13 |         "disqus@0.1.0",
14 | 		"-lunr",
15 | 		"-search",
16 | 		"search-plus@1.0.3",
17 | 		"-highlight",
18 | 		"prism@2.4.0",
19 | 		"advanced-emoji@0.2.2",
20 | 		"github@2.0.0",
21 | 		"edit-link@2.0.2",
22 | 		"splitter@0.0.8",
23 | 		"mermaid-gb3@2.1.0",
24 | 		"todo@0.1.3",
25 | 		"tbfed-pagefooter@0.0.1",
26 | 		"expandable-chapters-small@0.1.7",
27 | 		"anchor-navigation-ex@1.0.13",
28 | 		"terminal@0.3.2",
29 | 		"alerts@0.2.0",
30 | 		"copy-code-button@0.0.2",
31 | 		"local-video@^1.0.1"
32 |     ],
33 |     "pluginsConfig": {
34 |         "disqus": {
35 |             "shortName": "test"
36 |         },
37 |         "github": {
38 |             "url": "https://lab.threathunter.cn/Nebula/Nebula_doc"
39 |             },
40 | 		"edit-link": {
41 | 		    "base": "https://lab.threathunter.cn/Nebula/Nebula_doc/edit/master",
42 | 			"label": "Edit this page"
43 | 		},
44 |         "tbfed-pagefooter": {
45 |             "copyright": "Copyright &copy Threat Hunter 2018",
46 |             "modify_label": "Last modified: ",
47 |             "modify_format": "YYYY-MM-DD HH:mm:ss"
48 |         },
49 |         "anchor-navigation-ex":{
50 | 			"showLevel": true,
51 | 			"associatedWithSummary": true,
52 | 			"printLog": false,
53 | 			"multipleH1": true,
54 | 			"mode": "float",
55 | 			"showGoTop":true,
56 | 			"float": {
57 | 				"floatIcon": "fa fa-navicon",
58 | 				"showLevelIcon": true,
59 | 				"level1Icon": "fa fa-hand-o-right",
60 | 				"level2Icon": "fa fa-hand-o-right",
61 | 				"level3Icon": "fa fa-hand-o-right"
62 | 			},
63 | 			"pageTop": {
64 | 				"showLevelIcon": true,
65 | 				"level1Icon": "fa fa-hand-o-right",
66 | 				"level2Icon": "fa fa-hand-o-right",
67 | 				"level3Icon": "fa fa-hand-o-right"
68 | 			}
69 | 		}
70 |     }
71 | }
72 | 


--------------------------------------------------------------------------------
/chapter1/section0.md:
--------------------------------------------------------------------------------
1 | # 1. About Nebula
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter1/section1.md:
--------------------------------------------------------------------------------
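1 | # 1.1. Nebula Overview
2 | 
3 | `TH-Nebula` parses an enterprise's network traffic (both `http` and `https`) to reconstruct the events taking place inside the business, then analyzes those events in depth to uncover risks.
4 | 
5 | This is the technical documentation for `TH-Nebula`. For various reasons it is still incomplete in many places and will keep being revised. If you run into a problem while using the system that the documentation does not mention or does not resolve, please get in touch with us. We sincerely appreciate it!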
6 | 


--------------------------------------------------------------------------------
/chapter1/section2.md:
--------------------------------------------------------------------------------
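 1 | # 1.2. Nebula Features
 2 | 
 3 | **1. Lightweight deployment**
 4 | 
 5 | Nebula collects business information entirely by parsing out-of-band (bypass) traffic, so deployment only requires cooperation from the operations team. Notably, even as the business grows and changes, the enterprise can quickly capture business behavior such as page visits, logins, registrations, orders, and campaign participation.
 6 | 
 7 | **2. Built-in risk identification rules, simple to use**
 8 | 
 9 | Nebula ships with a large set of attack-and-defense rules for common business scenarios and provides a visual rule editor, so the enterprise can quickly edit policies and test them in a real environment.
10 | 
11 | **3. No instrumentation, no risk of leaking sensitive data**
12 | 
13 | Nebula collects data on visits, logins, registrations, profile changes, and more in real time without requiring developers to add instrumentation. No sensitive data leaves the enterprise, so its data privacy is better protected.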
14 | 


--------------------------------------------------------------------------------
/chapter1/section3.md:
--------------------------------------------------------------------------------
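1 | # 1.3. Problems Nebula Solves
2 | 
3 | At its core, a risk-control system exists to give an enterprise the ability to discover business risks proactively. We hope that open-sourcing Nebula helps enterprises move quickly through the early infrastructure-building stage and into the stage of improving attack-and-defense efficiency. With the Nebula risk-control system, an enterprise can counter attacks across its different business scenarios.
4 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9h25nhpj21cs0bkwzg.jpg)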
5 | 


--------------------------------------------------------------------------------
/chapter2/section0.md:
--------------------------------------------------------------------------------
1 | # 2. Quick Integration
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter2/section1.md:
--------------------------------------------------------------------------------
1 | # 2.1. Quick Start
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter2/section1/section1.1.md:
--------------------------------------------------------------------------------
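 1 | # 2.1.1. Nebula System Architecture
 2 | 
 3 | Unlike common lightweight security tools, Nebula is essentially a complete, self-contained data analysis platform. Logically, it needs to provide the following functions:
 4 | 
 5 |     * Data collection and integration platform. Connects to the raw data that the customer's existing systems produce in various forms, including traffic, real-time logs, and log files.
 6 |     * Data normalization and business-log extraction. Nebula cleans the raw data, converts it to a standard form, and derives standard business logs according to configuration, ready for further analysis.
 7 |     * Data persistence. Logs entering the system are persisted for later offline computation and attack tracing.
 8 |     * Real-time computation engine for massive data. Runs large-scale parallel real-time computation over incoming data to produce real-time statistical features of users.
 9 |     * Offline batch computation engine for massive data. Periodically runs offline batch computation over incoming data to produce fixed features of users.
10 |     * High-performance policy engine. Uses the real-time and offline results to evaluate every user access against policies and identify risky traffic for further handling.
11 |     * Risk event and blacklist/whitelist management. Manages and queries the risk events the system identifies, together with the related blacklists and whitelists.
12 |     * Data visualization and self-service risk analysis. Makes it easy to review raw data and trace risks back to their source.
13 |     * Data export and API integration. Exports blacklists, whitelists, and risk events for integration into the user's systems, and can also export the system's internal data.
14 |     * System configuration and management. A complex system needs matching management tools.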
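
The way a real-time statistic feeds a policy decision can be illustrated with a toy example. The following is a minimal Python 3 sketch under stated assumptions, not Nebula's actual implementation: the names `SlidingWindowCounter` and `evaluate`, the 60-second window, and the threshold of 3 requests are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def add(self, key, timestamp):
        """Record one event and return the key's count inside the window."""
        times = self._events[key]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times)


def evaluate(requests, window_seconds=60, max_requests=3):
    """Yield (ip, count) for every request that exceeds the per-IP limit.

    `requests` is an iterable of (ip, timestamp) pairs, assumed to be
    sorted by timestamp. The "policy" here is a plain threshold.
    """
    counter = SlidingWindowCounter(window_seconds)
    for ip, timestamp in requests:
        count = counter.add(ip, timestamp)
        if count > max_requests:
            yield ip, count
```

In this sketch the counter plays the role of the real-time statistic and the threshold check plays the role of a policy; a real deployment would combine many such features, including offline ones.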
15 | 
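16 | The figure below gives a rough view of these functions and how they interact with the customer's systems:
17 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9hc6jupj21oj0su119.jpg) As a result, the Nebula system as a whole is fairly complete and complex, so it cannot be delivered as a single process or piece of software. Physically it is made up of many components. The pure business modules are:
18 | 
19 |     * Data collection and transformation module. A single physical module provides both data collection and normalization.
20 |     * Real-time computation module and rule engine. Provides the system's real-time processing: real-time computation, the rule engine, and data persistence. For simplicity these currently all live in the real-time module.
21 |     * Offline computation module. Handles statistical computation over offline data and its presentation.
22 |     * System configuration and management module. A separate web application handles configuration and all data management.
23 |     * Nebula front-end presentation module. The Nebula front end follows a JS+API model; the front-end module supports most of the data presentation features.
24 | 
25 | The system also relies on a number of underlying platform components:
26 | 
27 |     * redis system cache. redis backs the cached data, mainly serving as the message middleware and storing monitoring data.
28 |     * File system. An in-house file database provides storage and querying for massive data.
29 |     * mysql data store. mysql stores and serves all data that requires strong persistence.
30 |     * aerospike, a kv database supporting user profiles. aerospike is a kv database that provides high-performance reads and writes of user-profile data.
31 |     * Others. Includes the nginx load balancer, offline management scripts, a process monitoring platform, custom kernel modules, and more.
32 | 
33 | The figure below shows the system's physical modules and how the logical modules are divided among them:
34 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9hg3pabj21oj0kg7ay.jpg)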
35 | 


--------------------------------------------------------------------------------
/chapter2/section1/section1.2.md:
--------------------------------------------------------------------------------
1 | # 2.1.2. How Nebula Works
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter2/section2.md:
--------------------------------------------------------------------------------
1 | # 2.2. Installation
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter2/section2/section2.1.md:
--------------------------------------------------------------------------------
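 1 | # 2.2.1. Requirements
 2 | 
 3 | # Installation
 4 | 
 5 | ## Before you install
 6 | 
 7 | `TH-Nebula` has the following software and hardware requirements:
 8 | 
 9 | **Server requirements**
10 | 
11 | |Item|Value to confirm|Notes|
12 | |:-------:|:-------:|:-------|
13 | | Server operating system | centos 6.5/7.2 | The prebuilt packages currently support only `redhat`-family operating systems, with native support for `centos 6.5` and `centos 7.2`. Deployments that do not use traffic mirroring can run other minor versions |
14 | | CPU/memory | 8 cores, 16 GB or more | Size the hardware according to the expected traffic volume. 8 cores and 16 GB suit most environments; for other cases, see the `TH-Nebula` hardware requirements documentation |
15 | | Hard disk | 500 GB or more; confirm the partition | 500 GB is enough in most cases. Disk capacity determines ... |
16 | | SSD | Confirm the partition | An SSD speeds up the user-profile module; confirm whether one is available |
17 | | Dedicated traffic machine | Whether one is needed | If traffic is heavy, confirm whether a dedicated traffic-analysis server is available |
18 | 
19 | **Network requirements**
20 | 
21 | |Item|Value to confirm|Notes|
22 | |:-------:|:-------:|:-------|
23 | | Network mirroring | Confirm the server NIC that receives mirrored traffic | With hardware traffic mirroring, make sure the switch is already configured, and confirm which NIC on the server receives the mirrored traffic |
24 | | Software traffic mirroring | `agent` deployed | On cloud platforms you can try the software traffic-mirroring plugin. Make sure the traffic-capture `agent` is deployed and configured with the ports to capture, and that the network between the traffic host and the `TH-Nebula` server is open |
25 | | `https` traffic | Node where `https` is offloaded | Encrypted `https` traffic cannot be analyzed, so make sure the captured traffic has already gone through `ssl` offloading and is therefore plaintext |
26 | | Management port open | Port 9001 open | The `TH-Nebula` management interface uses port 9001; make sure the firewall allows it or that internal load balancing is already configured |
27 | | Dedicated traffic machine | Traffic machine and `TH-Nebula` machine can reach each other | If traffic is heavy, run traffic capture on a separate machine and make sure it can reach port 9001 (configuration retrieval) and port 6379 (message middleware) on the `TH-Nebula` machine |
28 | 
29 | Note:
30 | *`TH-Nebula` opens at least port 9001; make sure the firewall allows it.*
31 | 
32 | `TH-Nebula` can be installed in two main ways:
33 | 
34 | - Binary installation
35 | - Installing from source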
36 | 
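
The network requirements above can be checked before installation with a short script. This is a minimal Python 3 sketch, not part of `TH-Nebula`: the helper names `port_is_open` and `check_ports` are hypothetical, and `NEBULA_HOST` is a placeholder to replace with the address of your own `TH-Nebula` server. Ports 9001 and 6379 are the ones named in the table.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    NEBULA_HOST = "127.0.0.1"  # assumption: replace with your TH-Nebula server
    for port, ok in check_ports(NEBULA_HOST, [9001, 6379]).items():
        state = "reachable" if ok else "NOT reachable"
        print(f"{NEBULA_HOST}:{port} -> {state}")
```

Run it from the traffic-capture machine: a port reported as not reachable usually points to a firewall rule or to a service that has not started yet.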


--------------------------------------------------------------------------------
/chapter2/section2/section2.2.md:
--------------------------------------------------------------------------------
  1 | # 2.2.2. Binary Installation
  2 | 
  3 | ## Installing Docker
  4 | ### Installing Docker
  5 | Install the required system tools:
  6 | 
  7 | ```
  8 | sudo yum install -y yum-utils device-mapper-persistent-data lvm2
  9 | ```
 10 | 
 11 | Add the package repository:
 12 | 
 13 | ```
 14 | sudo yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
 15 | ```
 16 | 
 17 | Refresh the yum cache:
 18 | 
 19 | ```
 20 | sudo yum makecache fast
 21 | ```
 22 | 
 23 | Install Docker-ce:
 24 | 
 25 | ```
 26 | sudo yum -y install docker-ce
 27 | ```
 28 | 
 29 | Docker-ce is now installed. Run the following command to confirm:
 30 | 
 31 | ```
 32 | docker -v
 33 | ```
 34 | 
 35 | Expected output (your version number may differ):
 36 | 
 37 | ```
 38 | Docker version 18.09.0, build 4d60db4
 39 | ```
 40 | 
 41 | ![2141eb541d4d8721c.png](http://www.z4a.net/images/2018/12/06/2141eb541d4d8721c.png)
 42 | 
 43 | Enable Docker at boot:
 44 | 
 45 | ```
 46 | sudo systemctl enable docker
 47 | ```
 48 | 
 49 | ### Installing docker-compose
 50 | 
 51 | Next, install docker-compose. First update curl:
 52 | 
 53 | ```
 54 | yum update curl
 55 | ```
 56 | 
 57 | Then download docker-compose into `/usr/local/bin` with the following command:
 58 | 
 59 | ```
 60 | sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
 61 | ```
 62 | 
 63 | ![docker-compose success](http://www.z4a.net/images/2018/12/06/1f1ac0f349eef4d18.png)
 64 | 
 65 | Make it executable:
 66 | 
 67 | ```
 68 | sudo chmod +x /usr/local/bin/docker-compose
 69 | ```
 70 | 
 71 | Next, verify the installation with the following command. You may see the result shown in the screenshot below:
 72 | 
 73 | ```
 74 | docker-compose -v
 75 | ```
 76 | 
 77 | ![32f4a2ce3aca7acbc.png](http://www.z4a.net/images/2018/12/06/32f4a2ce3aca7acbc.png)
 78 | 
 79 | In that case, go to /usr/local/bin/ to check that docker-compose exists, and run it from there to verify:
 80 | 
 81 | ```
 82 | cd  /usr/local/bin/
 83 | ./docker-compose -v
 84 | ```
 85 | 
 86 | ![44da24b4eda922b77.png](http://www.z4a.net/images/2018/12/06/44da24b4eda922b77.png)
 87 | 
 88 | If docker-compose works there but not from other directories, the cause is the PATH environment variable. To fix it, first open the following file:
 89 | 
 90 | ```
 91 | vim /etc/profile
 92 | ```
 93 | 
 94 | Then append the following line at the end of the file (see also the screenshot):
 95 | 
 96 | ```
 97 | export PATH="/usr/local/bin/:$PATH"
 98 | ```
 99 | 
100 | ![5ed2c12c5f1d2b350.png](http://www.z4a.net/images/2018/12/06/5ed2c12c5f1d2b350.png)
101 | 
102 | Finally, run the following command to apply the change. docker-compose can then be used from any directory:
103 | 
104 | ```
105 | source /etc/profile
106 | ```
107 | 
108 | 
109 | ## Installing TH-Nebula
110 | 
111 | ### Installation steps
112 | 
113 | 
114 | - Pull the Docker images:
115 | 
116 | 	```
117 | 	git clone --recursive https://github.com/threathunterX/nebula.git
118 | 	cd nebula
119 | 	docker-compose pull
120 | 	```
121 | 
122 | - Run the install script
123 | 
124 | 	```
125 | 	./ctrl.sh install
126 | 	```
127 | 
128 | 	![a2. Installation screenshot](http://www.z4a.net/images/2018/11/29/a2.png)
129 | 
130 | - Start the system
131 | 	```
132 | 	./ctrl.sh start
133 | 	```
134 | 	![a3. Startup screenshot](http://www.z4a.net/images/2018/11/29/a3.png)
135 | 
136 | - Check the running status
137 |   ```
138 |   ./ctrl.sh status
139 |   ```
140 |   ![a4. Running status](http://www.z4a.net/images/2018/11/29/a4.png)
141 | 
142 | 
143 | ## Installing the sniffer traffic-capture client
144 | 
145 | 
146 | 
147 | ### Installation steps
148 | 
149 | - Pull the Docker images:
150 | 	```
151 | 	git clone --recursive https://github.com/threathunterX/sniffer.git
152 | 	cd sniffer
153 | 	docker-compose pull
154 | 	```
155 | 
156 | - Change into the directory (skip this if you are still in it from the previous step):
157 | 	```
158 | 	cd sniffer
159 | 	```
160 | 
161 | - Edit the configuration:
162 | 	```
163 | 	# Configuration file: docker-compose.yml (edit this file directly)
164 | 	
165 | 	  environment:
166 | 	   - REDIS_HOST=127.0.0.1  # remote redis IP
167 | 	   - REDIS_PORT=16379      # remote redis port
168 | 	   - NEBULA_HOST=127.0.0.1 # remote nebula service IP
169 | 	   - NEBULA_PORT=9001      # remote nebula service port
170 | 
171 | 	   - SOURCES=default       # data sources; multiple sources are supported
172 | 	   
173 | 	   # default: capture NIC traffic with bro
174 | 	   - DRIVER_INTERFACE=eth0 # NIC to listen on
175 | 	   - DRIVER_PORT=80,9001   # ports of the business services
176 | 	   - BRO_PORT=47000
177 | 
178 | 	```
179 | 
180 | - Start and stop the containers:
181 | 	```
182 | 	# 1) Start the containers
183 | 	docker-compose up -d
184 | 	# 2) Stop the containers
185 | 	docker-compose down
186 | 	```
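
Before starting the containers, it can help to sanity-check the values set under `environment` in `docker-compose.yml`. The following is a minimal Python 3 sketch, assuming the entries have been copied into a list of `KEY=value` strings. The helpers `parse_env` and `validate_sniffer_env` and the rules they apply are hypothetical and are not part of sniffer; only the variable names come from the configuration shown above.

```python
REQUIRED_KEYS = (
    "REDIS_HOST", "REDIS_PORT", "NEBULA_HOST", "NEBULA_PORT",
    "SOURCES", "DRIVER_INTERFACE", "DRIVER_PORT",
)


def parse_env(entries):
    """Turn ['KEY=value', ...] into a dict, trimming surrounding spaces."""
    env = {}
    for entry in entries:
        key, _, value = entry.partition("=")
        env[key.strip()] = value.strip()
    return env


def _valid_port(text):
    """True for an ASCII decimal number in the range 1..65535."""
    return text.isascii() and text.isdigit() and 1 <= int(text) <= 65535


def validate_sniffer_env(env):
    """Return a list of problems found; an empty list means no problem."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not env.get(key)]
    for key in ("REDIS_PORT", "NEBULA_PORT", "BRO_PORT"):
        if env.get(key) and not _valid_port(env[key]):
            problems.append(f"{key} is not a valid port: {env[key]!r}")
    # DRIVER_PORT may hold several comma-separated ports.
    for port in env.get("DRIVER_PORT", "").split(","):
        port = port.strip()
        if port and not _valid_port(port):
            problems.append(f"DRIVER_PORT contains an invalid port: {port!r}")
    return problems
```

Calling `validate_sniffer_env(parse_env([...]))` with the values from the sample configuration returns an empty list; a mistyped port or a missing host shows up as a readable message instead of a container that fails at startup.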
187 | 
188 | 
189 | ## Other notes
190 | 
191 | Port `9001` is the `http` port of `TH-Nebula`; open `http://IP:9001` to reach the `TH-Nebula` interface.
192 | 
193 | `Administrator`: threathunter_test : threathunter
194 | 
195 | `Super administrator`: threathunter : threathunter
196 | 
197 | 


--------------------------------------------------------------------------------
/chapter2/section2/section2.3.md:
--------------------------------------------------------------------------------
 1 | # 2.2.3. Installing from Source
 2 | 
 3 | 
 4 | ## Preparing the environment
 5 | 
 6 | ### Front-end environment
 7 | 
 8 | - Node.js: 11.x
 9 | - npm: 6.x
10 | - webpack: 4.x
11 | 
12 | ### Back-end environment
13 | 
14 | - java: 1.8
15 | - maven: 3.x
16 | - python: 2.7.x
17 | 
18 | These are the recommended versions; test any other version yourself before relying on it.
19 | 
20 | ## Building and installing
21 | - Building nebula:
22 | 
23 |   ```
24 |   git clone --recursive https://github.com/threathunterX/nebula.git
25 |   cd nebula
26 |   ```
27 |   Build the `apps`:
28 |   ```
29 |   # For convenience, the `build.sh` script in the project root handles both building and installing.
30 |   ./build.sh -u -v 1.1.0 --apps
31 |   ```
32 |   Build the `docker` image:
33 |   ```
34 |   # For convenience, the `build.sh` script in the project root handles both building and installing.
35 |   ./build.sh -u -v 1.1.0 --image
36 |   ```
37 | 
38 | - sniffer:
39 |   Build the `docker` image:
40 |   ```
41 |   git clone --recursive https://github.com/threathunterX/sniffer.git
42 |   cd sniffer
43 |   docker-compose build
44 |   ```
45 | 
46 | 
47 | To install and run the `sniffer` traffic-capture service and the main `nebula` project, see the binary installation section.
48 | 


--------------------------------------------------------------------------------
/chapter3/section0.md:
--------------------------------------------------------------------------------
1 | # 3. User Manual
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter3/section1.md:
--------------------------------------------------------------------------------
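 1 | # 3.1. Basic Features
 2 | 
 3 | **Overview:** Shows the overall state of site traffic and risk events. It tells you the trend of the risk events Nebula has detected and the system's running status, and from there you can move from the big picture into the details of each risk event.
 4 | 
 5 | **Analyzing risk events**
 6 | A risk event is rarely just one or two access records: a crawler incident or a credential-stuffing incident can involve thousands of records, which cannot be analyzed or understood without aggregation. In Nebula, each event that triggers an alert is therefore defined as a fact (a piece of evidence), and a set of such events makes up one risk event.
 7 | 
 8 | **Risk list management:** Risk list entries are produced when a strategy you configured is triggered. The risk list management page shows the list of entries and lets you query, delete, and manually add them.
 9 | 
10 | **Risk event management:** A risk event consists of a set of basic events linked to risk list entries. Risk events group different attacks so that analysts can quickly review a related set of events.
11 | 
12 | **Risk analysis:** The risk analysis page offers four dimensions (IP, USER, PAGE, and DEVICE ID), so analysts can inspect the details of a given IP, user, device, or page and reconstruct the full course of a risk event.
13 | 
14 | **Log query:** Search historical log data with custom queries.
15 | 
16 | **Strategy management:** Provides a visual strategy editor for creating and editing strategies through the UI. By changing a strategy's status you can quickly test how effective it is and take production strategies offline.
17 | 
18 | See the product feature guide for details.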
19 | 


--------------------------------------------------------------------------------
/chapter3/section2.md:
--------------------------------------------------------------------------------
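 1 | # 3.2. Common Usage Guide
 2 | 
 3 | 
 4 | ### Viewing risk events or risk lists
 5 | 
 6 | TH-Nebula normally ships with preset strategies for common problems. Once network traffic is flowing in correctly, these strategies start analyzing it and raising alerts. You can learn which risks TH-Nebula has found through two channels: email or the overview page.
 7 | 
 8 | ✧ If you receive risk event alerts by email, follow the link in the email to open the risk event page.
 9 | 
10 | ✧ If you spot a risk event on the overview page, use the dimension navigation to go straight to the risk analysis page and inspect the event in detail.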
13 | 
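14 | ### Analyzing an IP address
15 | 
16 | To investigate an IP address in TH-Nebula, follow these steps:
17 | 
18 | ● In the function menu, select Risk Analysis and click IP Analysis.
19 | 
20 | ● The IP Analysis page lists every IP address that has visited the site. If you already know which IP you want to analyze, enter it in the search box at the top right and click Query.
21 | 
24 | ● The IP analysis statistics page shows information about the current IP:
25 | 
26 | ✧ The map shows the geographic location of the IP address; the location also appears in the IP information box.
27 | 
30 | ✧ The click details at the top show the number of clicks from this IP address over the last few days.
31 | 
32 | ✧ The trend statistics show, for each hour, the number of distinct users, devices, and dynamic pages associated with the IP, as well as the number of associated risk list entries.
33 | 
36 | ✧ The risk statistics panel shows how many risk list entries the IP triggered in each scenario during the selected hour.
37 | 
38 | ✧ The information panel below lists, for the selected hour, the users, USERAGENT values, device IDs (with their clicks), and pages associated with the IP.
39 | 
42 | ● Click the click details on the right side of the page to open the details page and view the IP's clicks for the selected hour.
43 | 
44 | ● The click details page shows information about the IP:
45 | 
46 | ✧ The IP's clicks within an hour are shown as an event river chart. A click event associated with an alert is drawn as a red dot so that analysts can locate problems quickly. In the chart the horizontal axis is time and the vertical axis groups clicks by user: if one IP has click events from several users, they are split into separate rows by user, and clicks with no user are collected in a single row.
47 | 
50 | ✧ Below the chart is the list of individual clicks; selecting an event in the list shows its full details underneath.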
51 | 
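52 | ### Analyzing a user
53 | 
54 | To investigate a user in TH-Nebula, follow these steps:
55 | 
56 | ● In the function menu, select Risk Analysis and click USER Analysis.
57 | 
58 | ● The USER Analysis page lists every user who has visited the site. If you already know which user name you want to analyze, enter it in the search box at the top right and click Query. (Note: the user name here is the field that TH-Nebula is configured to extract when parsing login or registration data; it is usually a parameter of the network request.)
59 | 
60 | ● The USER analysis statistics page shows information about the current user:
61 | 
62 | ✧ The map shows where the user's visits came from during a given hour; if there are several source addresses, each is marked with its own point.
63 | 
66 | ✧ The click details at the top show the user's visits over the last few days.
67 | 
68 | ✧ The trend statistics show, for each hour, the number of distinct IPs, devices, and dynamic pages associated with the user, as well as the number of associated risk list entries.
69 | 
72 | ✧ The risk statistics panel shows how many risk list entries the user triggered in each scenario during the selected hour.
73 | 
74 | ✧ The information panel below lists, for the selected hour, the IPs, USERAGENT values, device IDs (with their clicks), and pages associated with the user.
75 | 
78 | ● Click the click details on the right side of the page to open the details page and view the user's clicks for the selected hour.
79 | ● The click details page shows information about the user:
80 | ✧ The user's clicks within an hour are shown as an event river chart; a click event associated with an alert is drawn as a red dot. In the chart the horizontal axis is time and the vertical axis groups clicks by IP: if one user's visits come from several IPs, they are split into separate rows by IP.
81 | ✧ Below the chart is the list of individual clicks; selecting an event in the list shows its full details underneath.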
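82 | ### Analyzing a page
83 | 
84 | Unlike the IP, USER, and DEVICE dimensions, a risk event sometimes first shows up not as a specific IP address or user name but as activity on a page. For these cases TH-Nebula provides a page analysis dimension. To investigate the users and IP addresses associated with a page, follow these steps:
85 | 
86 | ● In the function menu, select Risk Analysis and click PAGE Analysis.
87 | ● The PAGE Analysis page lists every page that has been visited. If you already know which page you want to analyze, enter its URL in the search box at the top right and click Query.
88 | ● The PAGE analysis statistics page shows information about the current page:
89 | 
90 | ✧ The map shows where visits to the page came from during a given hour.
91 | 
92 | ✧ The click details at the top show visits to the page over the last few days (by hour).
93 | 
94 | ✧ Switch the map to trend statistics to see, for each hour, the number of distinct IPs, users, and devices associated with the page, as well as the number of risk events.
95 | 
98 | ✧ In the risk statistics panel you can switch between the lists of IPs, USERs, and DEVICEs associated with the page.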
99 | 


--------------------------------------------------------------------------------
/chapter3/section3.md:
--------------------------------------------------------------------------------
1 | # 3.3. Business Integration
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.1.md:
--------------------------------------------------------------------------------
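 1 | # 3.3.1. Scenario Introduction
 2 | 
 3 | 
 4 | When you first meet `TH-Nebula`, it can be unclear how to break a company's business down into scenarios, and then how to write strategies and `TH-Nebula` scripts for them. The rest of this section therefore builds a strategy for each of several scenarios, to show in depth how strategies and `TH-Nebula` scripts are customized.
 5 | 
 6 | ## Credential stuffing from a single IP
 7 | Suppose one `IP` keeps hitting the login page with different accounts or passwords. Once the number of attempts reaches a configured count, the `IP` can be captured and then handled, for example by showing a CAPTCHA or blocking the `IP`.
 8 | 
 9 | ### Defining the strategy
10 | 
11 | When creating the strategy, start with the company's login endpoint: choose [Event-Dynamic resource request] and select the attribute [`page`], so that any `URL` containing `login` is treated as the login endpoint. Adjust this to match your own login endpoint, since every company's is different.
12 | Enter the company's login endpoint at ①.
13 | 
14 | ![4. Credential stuffing strategy configuration 1](http://wx1.sinaimg.cn/large/0060lm7Tly1fxnn7rv3uij317z0diwg7.jpg)
15 | 
16 | Next, click the + sign circled at ② in the image and choose [`Condition`-`Conditional statistics`-`Dynamic resource request`].
17 | 
18 | ![5. Credential stuffing strategy configuration 2](http://wx4.sinaimg.cn/large/0060lm7Tly1fxnnah8wggj30ps0bp0u0.jpg)
19 | 
20 | Then edit the parameters and select the number of visits from the same `IP`, so that the count is computed per IP.
21 | 
22 | ![6. Credential stuffing strategy configuration 3](http://wx3.sinaimg.cn/large/0060lm7Tly1fxnncg4iydj30ri0fb769.jpg)
23 | 
24 | The configuration above counts, in real time, how often the same `IP` accesses the login endpoint within 5 minutes and marks that `IP` for review or adds it to the blacklist. The blacklist and review list can be retrieved through the `TH-Nebula` query API for further handling.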
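
The counting behind this strategy can be sketched in Python. This is only an illustration of a 5-minute sliding-window count per `c_ip`, not how `TH-Nebula` implements it; `LoginCounter`, `LOGIN_MARKER`, and the `THRESHOLD` of 3 are hypothetical names and values chosen for the example (the real threshold is whatever you set in the strategy editor).

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60   # the 5-minute window described above
LOGIN_MARKER = "login"    # assumed URL fragment that identifies the login endpoint
THRESHOLD = 3             # hypothetical count, for illustration only


class LoginCounter:
    """Counts login-endpoint hits per client IP inside a sliding window."""

    def __init__(self):
        self._hits = defaultdict(deque)  # c_ip -> timestamps of recent login hits

    def observe(self, c_ip, page, ts):
        """Record one dynamic request; return True once the IP reaches THRESHOLD."""
        if LOGIN_MARKER not in page:
            return False
        hits = self._hits[c_ip]
        hits.append(ts)
        # Drop hits that have fallen out of the window.
        while hits and hits[0] <= ts - WINDOW_SECONDS:
            hits.popleft()
        return len(hits) >= THRESHOLD
```

An IP for which `observe` returns `True` corresponds to one the strategy would mark for review or blacklist.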
25 | 
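26 | ## An IP sends many requests but loads no static resources
27 | 
28 | Suppose an `IP` sends a large number of requests but never loads static resources. That `IP` is most likely driven by an automated program that only calls the API endpoints, so it carries some risk.
29 | 
30 | ### Defining the strategy
31 | 
32 | Create a new strategy, choose [`Event`-`Dynamic resource request`], select the attribute `c_ip`, and set the condition to contains [ . ]. This captures every `IP`; the conditions added next filter out the IPs that send many requests without loading static resources.
33 | 
34 | ![7. IP mass request strategy configuration 1](http://wx2.sinaimg.cn/large/0060lm7Tly1fxnndt8p62j30zx0ev40g.jpg)
35 | 
36 | Choose [`Condition`-`Conditional statistics`-`IP request count 5M`] and set the parameter [`c_ip`] to greater than 50, meaning the same `IP` has visited the site more than 50 times within 5 minutes.
37 | Then choose [`Condition`-`Conditional statistics`-`Static resource request count`] and set the parameter [`c_ip`] to equal [0].
38 | 
39 | ![8. IP mass request strategy configuration 2](http://wx1.sinaimg.cn/large/0060lm7Tly1fxnnf6zj59j30zv0g1q57.jpg)
40 | With these settings the strategy identifies an `IP` that sends many API requests within 5 minutes without loading static resources and marks it for review or adds it to the blacklist. The blacklist and review list can be retrieved through the `TH-Nebula` query API for further handling.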
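
The two conditions can be combined as in the following sketch. It assumes the events of one 5-minute window are available as dictionaries with `name` (`HTTP_DYNAMIC` or `HTTP_STATIC`) and `c_ip` keys; the function name `flag_ips` and the in-memory list are assumptions made for illustration, not part of `TH-Nebula`.

```python
from collections import Counter


def flag_ips(events, min_dynamic=50):
    """Return IPs with more than `min_dynamic` dynamic requests and no static ones.

    `events` is assumed to hold the events of a single 5-minute window.
    """
    dynamic = Counter()
    static = Counter()
    for event in events:
        if event["name"] == "HTTP_DYNAMIC":
            dynamic[event["c_ip"]] += 1
        elif event["name"] == "HTTP_STATIC":
            static[event["c_ip"]] += 1
    # Counter returns 0 for IPs that never requested a static resource.
    return {ip for ip, n in dynamic.items() if n > min_dynamic and static[ip] == 0}
```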
41 | 
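42 | ## One IP associated with multiple users
43 | 
44 | Suppose one `IP` logs in as many different users. That `IP` is most likely an automated program operating multiple accounts, so it carries some risk.
45 | 
46 | ### Defining the strategy
47 | 
48 | Create a new strategy, choose [`Event`-`Dynamic resource request`], select the attribute [`c_ip`], and set the condition to contains [ . ]. This captures every `IP`; the conditions added next filter out the IPs associated with multiple users.
49 | 
50 | ![9. IP with multiple users strategy configuration 1](http://wx3.sinaimg.cn/large/0060lm7Tly1fxnngxgjukj310h0cgmyd.jpg)
51 | 
52 | Then add a filter condition.
53 | 
54 | ![10. IP with multiple users strategy configuration 2](http://wx3.sinaimg.cn/large/0060lm7Tly1fxnnknu0ezj310u0e075w.jpg)
55 | 
56 | Take care when editing the parameters.
57 | 
58 | ![11. IP with multiple users strategy configuration 3](http://wx4.sinaimg.cn/large/0060lm7Tly1fxnnllmmafj31070ewtak.jpg)
59 | 
60 | Finally, add the action to take.
61 | 
62 | ![12. IP with multiple users strategy configuration 4](http://wx4.sinaimg.cn/large/0060lm7Tly1fxnnmo9lnkj31090hkwh3.jpg)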
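
The idea behind the multiple-users check is a distinct count of users per IP. The sketch below assumes events are dictionaries with `c_ip` and `uid` keys; `ips_with_many_users` and the limit of 5 are hypothetical, since the actual parameters appear only in the screenshots.

```python
from collections import defaultdict


def ips_with_many_users(events, max_users=5):
    """Return IPs seen with more than `max_users` distinct non-empty user IDs."""
    users_by_ip = defaultdict(set)
    for event in events:
        uid = event.get("uid")
        if uid:  # requests without a user do not count toward the total
            users_by_ip[event["c_ip"]].add(uid)
    return {ip for ip, users in users_by_ip.items() if len(users) > max_users}
```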
63 | 
64 | 
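65 | The three examples above, drawn from three different business scenarios, show how to define a scenario; they should help you understand how to build strategies around your own business.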
66 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.2.md:
--------------------------------------------------------------------------------
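  1 | # 3.3.2. Event Introduction
  2 | 
  3 | 
  4 | ## What an event is
  5 | 
  6 | An event classifies captured traffic. Once traffic is classified into events, the fields of an event are mapped to the fields of the traffic, and those event fields can then be counted, computed on, used to trigger a review, or used to add an entry to the blacklist.
  7 | 
  8 | ## How to use an event
  9 | 
 10 | To use an event you define a strategy. After opening the `TH-Nebula` admin console, click New Strategy, as shown:
 11 | 
 12 | ![13. Event 1](http://wx1.sinaimg.cn/large/0060lm7Tly1fxnnnrtbb0j31gc0mrq64.jpg)
 13 | 
 14 | Then enter the strategy's name and a short description.
 15 | 
 16 | ![14. Event 2](http://wx1.sinaimg.cn/large/0060lm7Tly1fxnnoq31x9j30qh0bjmxx.jpg)
 17 | 
 18 | 
 19 | Then add an event. There are 18 kinds of events in total; for example, pick [`Account`-`Real-name verification`].
 20 | 
 21 | ![15. Event 3](http://wx1.sinaimg.cn/large/0060lm7Tly1fxnnpvp5qfj30q70gata4.jpg)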
 22 | 
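 23 | Next select [`c_ip`], which stands for `client_ip`, the client's `IP`.
 24 | 
 25 | ![16. Event 4](http://wx4.sinaimg.cn/large/0060lm7Tly1fxnnr1nuahj30q40gomyr.jpg)
 26 | 
 27 | Next select the client, then choose the condition [ contains ] with the value `.` so that, in practice, every `IP` matches the condition.
 28 | 
 29 | ![17. Event 5](http://wx3.sinaimg.cn/large/0060lm7Tly1fxnns14c8uj30qd0erdhb.jpg)
 30 | 
 31 | Next click Add Risk List Entry and edit the parameters.
 32 | 
 33 | ![18. Event 6](http://wx3.sinaimg.cn/large/0060lm7Tly1fxnnsqyrrjj30q908tgmu.jpg)
 34 | 
 35 | Here the risk type is `USER` and the risk value is `c_ip`; once assigned, the entry is visible in Risk Analysis. The risk decision is review, so when this `IP` is captured it is marked for review. That status can be read through the `TH-Nebula` query API, and the risk can then be handled according to your business needs.
 36 | 
 37 | You have now used the event named [ Account-Real-name verification ].
 38 | 
 39 | ## Event attributes
 40 | 
 41 | There are 18 kinds of events in total. The tables below list the attributes of each event; these attributes can be used in calculations, as explained later.
 42 | 
 43 | Events: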
 44 | 
 45 | - Name: `TRANSACTION_DEPOSIT`
 46 | - Remark: deposit of funds
 47 | 
 48 | |Field|Type|Remark|
 49 | |:-------:|:-------:|:-------:|
 50 | | id | string | none |
 51 | | pid | string | none |
 52 | | c_ip | string | client IP |
 53 | | sid | string | none |
 54 | | uid | string | none |
 55 | | did | string | none |
 56 | | platform | string | none |
 57 | | page | string | none |
 58 | | notices | string | none |
 59 | | c_port | long | client port |
 60 | | c_bytes | long | request size |
 61 | | c_body | string | request body |
 62 | | c_type | string | none |
 63 | | s_ip | string | server IP |
 64 | | s_port | long | server port |
 65 | | s_bytes | long | response size |
 66 | | s_body | string | response body |
 67 | | s_type | string | none |
 68 | | host | string | host address |
 69 | | uri_stem | string | URL part before `?` |
 70 | | uri_query | string | URL part after `?` |
 71 | | referer | string | none |
 72 | | method | string | request method |
 73 | | status | long | request status |
 74 | | cookie | string | none |
 75 | | useragent | string | none |
 76 | | xforward | string | none |
 77 | | request_time | long | none |
 78 | | request_type | string | none |
 79 | | referer_hit | string | none |
 80 | | geo_city | string | none |
 81 | | user_name | string | none |
 82 | | transaction_id | string | none |
 83 | | deposit_amount | string | none |
 84 | | card_number | string | none |
 85 | | counterpart_user | string | none |
 86 | | account_balance_before | string | none |
 87 | | result | string | operation result |
 88 | 
 89 | 
 90 | - Name: `ACCOUNT_CERTIFICATION`
 91 | - Remark: account real-name verification
 92 | 
 93 | |Field|Type|Remark|
 94 | |:-------:|:-------:|:-------:|
 95 | | id | string | none |
 96 | | pid | string | none |
 97 | | c_ip | string | client IP |
 98 | | sid | string | none |
 99 | | uid | string | none |
100 | | did | string | none |
101 | | platform | string | none |
102 | | page | string | none |
103 | | notices | string | none |
104 | | c_port | long | client port |
105 | | c_bytes | long | request size |
106 | | c_body | string | request body |
107 | | c_type | string | none |
108 | | s_ip | string | server IP |
109 | | s_port | long | server port |
110 | | s_bytes | long | response size |
111 | | s_body | string | response body |
112 | | s_type | string | none |
113 | | host | string | host address |
114 | | uri_stem | string | URL part before `?` |
115 | | uri_query | string | URL part after `?` |
116 | | referer | string | none |
117 | | method | string | request method |
118 | | status | long | request status |
119 | | cookie | string | none |
120 | | useragent | string | none |
121 | | xforward | string | none |
122 | | request_time | long | none |
123 | | request_type | string | none |
124 | | referer_hit | string | none |
125 | | user_name | string | none |
126 | | person_id | string | none |
127 | | real_name | string | none |
128 | | result | string | operation result |
129 | | geo_city | string | none |
130 | 
131 | 
132 | - Name: `ACCOUNT_LOGIN`
133 | - Remark: account login
134 | 
135 | |Field|Type|Remark|
136 | |:-------:|:-------:|:-------:|
137 | | id | string | none |
138 | | pid | string | none |
139 | | c_ip | string | client IP |
140 | | sid | string | none |
141 | | uid | string | none |
142 | | did | string | none |
143 | | platform | string | none |
144 | | page | string | none |
145 | | notices | string | none |
146 | | c_port | long | client port |
147 | | c_bytes | long | request size |
148 | | c_body | string | request body |
149 | | c_type | string | none |
150 | | s_ip | string | server IP |
151 | | s_port | long | server port |
152 | | s_bytes | long | response size |
153 | | s_body | string | response body |
154 | | s_type | string | none |
155 | | host | string | host address |
156 | | uri_stem | string | URL part before `?` |
157 | | uri_query | string | URL part after `?` |
158 | | referer | string | none |
159 | | method | string | request method |
160 | | status | long | request status |
161 | | cookie | string | none |
162 | | useragent | string | none |
163 | | xforward | string | none |
164 | | request_time | long | none |
165 | | request_type | string | none |
166 | | referer_hit | string | none |
167 | | user_name | string | none |
168 | | login_verification_type | string | none |
169 | | password | string | password MD5 |
170 | | captcha | string | CAPTCHA |
171 | | result | string | operation result |
172 | | remember_me | string | none |
173 | | login_channel | string | none |
174 | | geo_city | string | none |
175 | 
176 | 
177 | - Name: `ACCOUNT_REFERRALCODE_CREATE`
178 | - Remark: `http` dynamic resource access
179 | 
180 | |Field|Type|Remark|
181 | |:-------:|:-------:|:-------:|
182 | | id | string | none |
183 | | pid | string | none |
184 | | c_ip | string | client IP |
185 | | sid | string | none |
186 | | uid | string | none |
187 | | did | string | none |
188 | | platform | string | none |
189 | | page | string | none |
190 | | notices | string | none |
191 | | c_port | long | client port |
192 | | c_bytes | long | request size |
193 | | c_body | string | request body |
194 | | c_type | string | none |
195 | | s_ip | string | server IP |
196 | | s_port | long | server port |
197 | | s_bytes | long | response size |
198 | | s_body | string | response body |
199 | | s_type | string | none |
200 | | host | string | host address |
201 | | uri_stem | string | URL part before `?` |
202 | | uri_query | string | URL part after `?` |
203 | | referer | string | none |
204 | | method | string | request method |
205 | | status | long | request status |
206 | | cookie | string | none |
207 | | useragent | string | none |
208 | | xforward | string | none |
209 | | request_time | long | none |
210 | | request_type | string | none |
211 | | referer_hit | string | none |
212 | | geo_city | string | none |
213 | | user_name | string | none |
214 | | referralcode | string | none |
215 | | code_type | string | none |
216 | 
217 | 
218 | - Name: `HTTP_STATIC`
219 | - Remark: `http` static resource access
220 | 
221 | |Field|Type|Remark|
222 | |:-------:|:-------:|:-------:|
223 | | id | string | none |
224 | | pid | string | none |
225 | | c_ip | string | client IP |
226 | | sid | string | none |
227 | | uid | string | none |
228 | | did | string | none |
229 | | platform | string | none |
230 | | page | string | none |
231 | | notices | string | none |
232 | | c_port | long | client port |
233 | | c_bytes | long | request size |
234 | | c_body | string | request body |
235 | | c_type | string | none |
236 | | s_ip | string | server IP |
237 | | s_port | long | server port |
238 | | s_bytes | long | response size |
239 | | s_body | string | response body |
240 | | s_type | string | none |
241 | | host | string | host address |
242 | | uri_stem | string | URL part before `?` |
243 | | uri_query | string | URL part after `?` |
244 | | referer | string | none |
245 | | method | string | request method |
246 | | status | long | request status |
247 | | cookie | string | none |
248 | | useragent | string | none |
249 | | xforward | string | none |
250 | | request_time | long | none |
251 | | request_type | string | none |
252 | | referer_hit | string | none |
253 | | geo_city | string | none |
254 | 
255 | 
256 | - Name: `HTTP_DYNAMIC`
257 | - Remark: `http` dynamic resource access
258 | 
259 | |Field|Type|Remark|
260 | |:-------:|:-------:|:-------:|
261 | | id | string | none |
262 | | pid | string | none |
263 | | c_ip | string | client IP |
264 | | sid | string | none |
265 | | uid | string | none |
266 | | did | string | none |
267 | | platform | string | none |
268 | | page | string | none |
269 | | notices | string | none |
270 | | c_port | long | client port |
271 | | c_bytes | long | request size |
272 | | c_body | string | request body |
273 | | c_type | string | none |
274 | | s_ip | string | server IP |
275 | | s_port | long | server port |
276 | | s_bytes | long | response size |
277 | | s_body | string | response body |
278 | | s_type | string | none |
279 | | host | string | host address |
280 | | uri_stem | string | URL part before `?` |
281 | | uri_query | string | URL part after `?` |
282 | | referer | string | none |
283 | | method | string | request method |
284 | | status | long | request status |
285 | | cookie | string | none |
286 | | useragent | string | none |
287 | | xforward | string | none |
288 | | request_time | long | none |
289 | | request_type | string | none |
290 | | referer_hit | string | none |
291 | | geo_city | string | none |
292 | 
293 | 
294 | - Name: `ACCOUNT_TOKEN_CHANGE`
295 | - Remark: account credential change
296 | 
297 | |Field|Type|Remark|
298 | |:-------:|:-------:|:-------:|
299 | | id | string | none |
300 | | pid | string | none |
301 | | c_ip | string | client IP |
302 | | sid | string | none |
303 | | uid | string | none |
304 | | did | string | none |
305 | | platform | string | none |
306 | | page | string | none |
307 | | notices | string | none |
308 | | c_port | long | client port |
309 | | c_bytes | long | request size |
310 | | c_body | string | request body |
311 | | c_type | string | none |
312 | | s_ip | string | server IP |
313 | | s_port | long | server port |
314 | | s_bytes | long | response size |
315 | | s_body | string | response body |
316 | | s_type | string | none |
317 | | host | string | host address |
318 | | uri_stem | string | URL part before `?` |
319 | | uri_query | string | URL part after `?` |
320 | | referer | string | none |
321 | | method | string | request method |
322 | | status | long | request status |
323 | | cookie | string | none |
324 | | useragent | string | none |
325 | | xforward | string | none |
326 | | request_time | long | none |
327 | | request_type | string | none |
328 | | referer_hit | string | none |
329 | | user_name | string | none |
330 | | old_token | string | none |
331 | | new_token | string | none |
332 | | token_type | string | none |
333 | | captcha | string | CAPTCHA |
334 | | result | string | operation result |
335 | | geo_city | string | none |
336 | 
337 | 
338 | - Name: `TRANSACTION_ESCROW`
339 | - Remark: third-party payment
340 | 
341 | |Field|Type|Remark|
342 | |:-------:|:-------:|:-------:|
343 | | id | string | none |
344 | | pid | string | none |
345 | | c_ip | string | client IP |
346 | | sid | string | none |
347 | | uid | string | none |
348 | | did | string | none |
349 | | platform | string | none |
350 | | page | string | none |
351 | | notices | string | none |
352 | | c_port | long | client port |
353 | | c_bytes | long | request size |
354 | | c_body | string | request body |
355 | | c_type | string | none |
356 | | s_ip | string | server IP |
357 | | s_port | long | server port |
358 | | s_bytes | long | response size |
359 | | s_body | string | response body |
360 | | s_type | string | none |
361 | | host | string | host address |
362 | | uri_stem | string | URL part before `?` |
363 | | uri_query | string | URL part after `?` |
364 | | referer | string | none |
365 | | method | string | request method |
366 | | status | long | request status |
367 | | cookie | string | none |
368 | | useragent | string | none |
369 | | xforward | string | none |
370 | | request_time | long | none |
371 | | request_type | string | none |
372 | | referer_hit | string | none |
373 | | user_name | string | none |
374 | | transaction_id | string | none |
375 | | escrow_type | string | none |
376 | | escrow_account | string | none |
377 | | pay_amount | double | none |
378 | | result | string | operation result |
379 | | geo_city | string | none |
380 | 
381 | 
382 | - Name: `ORDER_SUBMIT`
383 | - Remark: order submission
384 | 
385 | |Field|Type|Remark|
386 | |:-------:|:-------:|:-------:|
387 | | id | string | none |
388 | | pid | string | none |
389 | | c_ip | string | client IP |
390 | | sid | string | none |
391 | | uid | string | none |
392 | | did | string | none |
393 | | platform | string | none |
394 | | page | string | none |
395 | | notices | string | none |
396 | | c_port | long | client port |
397 | | c_bytes | long | request size |
398 | | c_body | string | request body |
399 | | c_type | string | none |
400 | | s_ip | string | server IP |
401 | | s_port | long | server port |
402 | | s_bytes | long | response size |
403 | | s_body | string | response body |
404 | | s_type | string | none |
405 | | host | string | host address |
406 | | uri_stem | string | URL part before `?` |
407 | | uri_query | string | URL part after `?` |
408 | | referer | string | none |
409 | | method | string | request method |
410 | | status | long | request status |
411 | | cookie | string | none |
412 | | useragent | string | none |
413 | | xforward | string | none |
414 | | request_time | long | none |
415 | | request_type | string | none |
416 | | referer_hit | string | none |
417 | | order_id | string | none |
418 | | user_name | string | none |
419 | | product_id | string | none |
420 | | product_type | string | none |
421 | | product_attribute | string | none |
422 | | product_count | long | none |
423 | | product_total_count | long | none |
424 | | merchant | string | merchant |
425 | | order_money_amount | double | none |
426 | | order_coupon_amount | double | none |
427 | | order_point_amount | double | none |
428 | | transaction_id | string | none |
429 | | receiver_mobile | string | none |
430 | | receiver_address_country | string | none |
431 | | receiver_address_province | string | none |
432 | | receiver_address_city | string | none |
433 | | receiver_address_detail | string | none |
434 | | receiver_realname | string | none |
435 | | result | string | operation result |
436 | | geo_city | string | none |
437 | 
438 | 
439 | - Name: `ACTIVITY_DO`
440 | - Remark: marketing activity
441 | 
442 | |Field|Type|Remark|
443 | |:-------:|:-------:|:-------:|
444 | | id | string | none |
445 | | pid | string | none |
446 | | c_ip | string | client IP |
447 | | sid | string | none |
448 | | uid | string | none |
449 | | did | string | none |
450 | | platform | string | none |
451 | | page | string | none |
452 | | c_port | long | client port |
453 | | c_bytes | long | request size |
454 | | c_body | string | request body |
455 | | c_type | string | none |
456 | | s_ip | string | server IP |
457 | | s_port | long | server port |
458 | | s_bytes | long | response size |
459 | | s_body | string | response body |
460 | | s_type | string | none |
461 | | host | string | host address |
462 | | uri_stem | string | URL part before `?` |
463 | | uri_query | string | URL part after `?` |
464 | | referer | string | none |
465 | | method | string | request method |
466 | | status | long | request status |
467 | | cookie | string | none |
468 | | useragent | string | none |
469 | | xforward | string | none |
470 | | request_time | long | none |
471 | | request_type | string | none |
472 | | referer_hit | string | none |
473 | | username | string | none |
474 | | activity_name | string | none |
475 | | activity_type | string | none |
476 | | activity_gain_count | long | none |
477 | | activity_gain_amount | long | none |
478 | | activity_pay_amount | long | none |
479 | | acticity_counterpart_user | string | none |
480 | | result | string | operation result |
481 | | geo_city | string | none |
482 | 
483 | 
484 | - Name: `HTTP_DYNAMIC_DELAY`
485 | - Remark: `http` dynamic resource access `Delayevent`
486 | 
487 | |Field|Type|Remark|
488 | |:-------:|:-------:|:-------:|
489 | | id | string | none |
490 | | pid | string | none |
491 | | c_ip | string | client IP |
492 | | sid | string | none |
493 | | uid | string | none |
494 | | did | string | none |
495 | | platform | string | none |
496 | | page | string | none |
497 | | notices | string | none |
498 | | c_port | long | client port |
499 | | c_bytes | long | request size |
500 | | c_body | string | request body |
501 | | c_type | string | none |
502 | | s_ip | string | server IP |
503 | | s_port | long | server port |
504 | | s_bytes | long | response size |
505 | | s_body | string | response body |
506 | | s_type | string | none |
507 | | host | string | host address |
508 | | uri_stem | string | URL part before `?` |
509 | | uri_query | string | URL part after `?` |
510 | | referer | string | none |
511 | | method | string | request method |
512 | | status | long | request status |
513 | | cookie | string | none |
514 | | useragent | string | none |
515 | | xforward | string | none |
516 | | request_time | long | none |
517 | | request_type | string | none |
518 | | referer_hit | string | none |
519 | | geo_city | string | none |
520 | | delay_strategy | string | none |
521 | 
522 | 
523 | - Name: `ORDER_CANCEL`
524 | - Remark: order cancellation
525 | 
526 | |Field|Type|Remark|
527 | |:-------:|:-------:|:-------:|
528 | | id | string | none |
529 | | pid | string | none |
530 | | c_ip | string | client IP |
531 | | sid | string | none |
532 | | uid | string | none |
533 | | did | string | none |
534 | | platform | string | none |
535 | | page | string | none |
536 | | notices | string | none |
537 | | c_port | long | client port |
538 | | c_bytes | long | request size |
539 | | c_body | string | request body |
540 | | c_type | string | none |
541 | | s_ip | string | server IP |
542 | | s_port | long | server port |
543 | | s_bytes | long | response size |
544 | | s_body | string | response body |
545 | | s_type | string | none |
546 | | host | string | host address |
547 | | uri_stem | string | URL part before `?` |
548 | | uri_query | string | URL part after `?` |
549 | | referer | string | none |
550 | | method | string | request method |
551 | | status | long | request status |
552 | | cookie | string | none |
553 | | useragent | string | none |
554 | | xforward | string | none |
555 | | request_time | long | none |
556 | | request_type | string | none |
557 | | referer_hit | string | none |
558 | | order_id | string | none |
559 | | user_name | string | none |
560 | | merchant | string | merchant |
561 | | cancel_reason | string | none |
562 | | transaction_id | string | none |
563 | | result | string | operation result |
564 | | geo_city | string | none |
565 | 
566 | 
567 | 
568 | - Name: `ACCOUNT_PW_CHANGE`
569 | - Remark: account password change
570 | 
571 | |Field|Type|Remark|
572 | |:-------:|:-------:|:-------:|
573 | | id | string | none |
574 | | pid | string | none |
575 | | c_ip | string | client IP |
576 | | sid | string | none |
577 | | uid | string | none |
578 | | did | string | none |
579 | | platform | string | none |
580 | | page | string | none |
581 | | notices | string | none |
582 | | c_port | long | client port |
583 | | c_bytes | long | request size |
584 | | c_body | string | request body |
585 | | c_type | string | none |
586 | | s_ip | string | server IP |
587 | | s_port | long | server port |
588 | | s_bytes | long | response size |
589 | | s_body | string | response body |
590 | | s_type | string | none |
591 | | host | string | host address |
592 | | uri_stem | string | URL part before `?` |
593 | | uri_query | string | URL part after `?` |
594 | | referer | string | none |
595 | | method | string | request method |
596 | | status | long | request status |
597 | | cookie | string | none |
598 | | useragent | string | none |
599 | | xforward | string | none |
600 | | request_time | long | none |
601 | | request_type | string | none |
602 | | referer_hit | string | none |
603 | | user_name | string | none |
604 | | old_password | string | old password |
605 | | new_password | string | new password |
606 | | verification_token | string | none |
607 | | verification_token_type | string | none |
608 | | captcha | string | CAPTCHA |
609 | | result | string | operation result |
610 | | geo_city | string | none |
611 | 
612 | 
613 | - Name: `ACCOUNT_REGISTRATION`
614 | - Remark: account registration
615 | 
616 | |Field|Type|Remark|
617 | |:-------:|:-------:|:-------:|
618 | | id | string | none |
619 | | pid | string | none |
620 | | c_ip | string | client IP |
621 | | sid | string | none |
622 | | uid | string | none |
623 | | did | string | none |
624 | | platform | string | none |
625 | | page | string | none |
626 | | notices | string | none |
627 | | c_port | long | client port |
628 | | c_bytes | long | request size |
629 | | c_body | string | request body |
630 | | c_type | string | none |
631 | | s_ip | string | server IP |
632 | | s_port | long | server port |
633 | | s_bytes | long | response size |
634 | | s_body | string | response body |
635 | | s_type | string | none |
636 | | host | string | host address |
637 | | uri_stem | string | URL part before `?` |
638 | | uri_query | string | URL part after `?` |
639 | | referer | string | none |
640 | | method | string | request method |
641 | | status | long | request status |
642 | | cookie | string | none |
643 | | useragent | string | none |
644 | | xforward | string | none |
645 | | request_time | long | none |
646 | | request_type | string | none |
647 | | referer_hit | string | none |
648 | | user_name | string | none |
649 | | password | string | password MD5 |
650 | | register_verification_token | string | none |
651 | | register_verification_token_type | string | none |
652 | | captcha | string | CAPTCHA |
653 | | result | string | operation result |
654 | | register_realname | string | none |
655 | | register_channel | string | none |
656 | | invite_code | string | none |
657 | | geo_city | string | none |
658 | 
659 | 
660 | - Name: `TRANSACTION_WITHDRAW`
661 | - Remark: cash withdrawal
662 | 
663 | |Field|Type|Remark|
664 | |:-------:|:-------:|:-------:|
665 | | id | string | none |
666 | | pid | string | none |
667 | | c_ip | string | client IP |
668 | | sid | string | none |
669 | | uid | string | none |
670 | | did | string | none |
671 | | platform | string | none |
672 | | page | string | none |
673 | | notices | string | none |
674 | | c_port | long | client port |
675 | | c_bytes | long | request size |
676 | | c_body | string | request body |
677 | | c_type | string | none |
678 | | s_ip | string | server IP |
679 | | s_port | long | server port |
680 | | s_bytes | long | response size |
681 | | s_body | string | response body |
682 | | s_type | string | none |
683 | | host | string | host address |
684 | | uri_stem | string | URL part before `?` |
685 | | uri_query | string | URL part after `?` |
686 | | referer | string | none |
687 | | method | string | request method |
688 | | status | long | request status |
689 | | cookie | string | none |
690 | | useragent | string | none |
691 | | xforward | string | none |
692 | | request_time | long | none |
693 | | request_type | string | none |
694 | | referer_hit | string | none |
695 | | geo_city | string | none |
696 | | user_name | string | none |
697 | | transaction_id | string | none |
698 | | withdraw_amount | string | none |
699 | | withdraw_type | string | none |
700 | | card_number | string | none |
701 | | counterpart_user | string | none |
702 | | account_balance_before | string | none |
703 | | result | string | operation result |
704 | 
705 | 
706 | - Name: `HTTP_CLICK`
707 | - Remark: `httpclick` event
708 | 
709 | |Field|Type|Remark|
710 | |:-------:|:-------:|:-------:|
711 | | id | string | none |
712 | | pid | string | none |
713 | | c_ip | string | client IP |
714 | | sid | string | none |
715 | | uid | string | none |
716 | | did | string | none |
717 | | platform | string | none |
718 | | page | string | none |
719 | | notices | string | none |
720 | | c_port | long | client port |
721 | | c_bytes | long | request size |
722 | | c_body | string | request body |
723 | | c_type | string | none |
724 | | s_ip | string | server IP |
725 | | s_port | long | server port |
726 | | s_bytes | long | response size |
727 | | s_body | string | response body |
728 | | s_type | string | none |
729 | | host | string | host address |
730 | | uri_stem | string | URL part before `?` |
731 | | uri_query | string | URL part after `?` |
732 | | referer | string | none |
733 | | method | string | request method |
734 | | status | long | request status |
735 | | cookie | string | none |
736 | | useragent | string | none |
737 | | xforward | string | none |
738 | | request_time | long | none |
739 | | request_type | string | none |
740 | | referer_hit | string | none |
741 | | geo_city | string | none |
742 | 
743 | 
744 | 
745 | - Name: `HTTP_INCIDENT`
746 | - Remark: risk event
747 | 
748 | |Field|Type|Remark|
749 | |:-------:|:-------:|:-------:|
750 | | c_ip | string | client IP |
751 | | page | string | none |
752 | | uid | string | none |
753 | | did | string | none |
754 | | timestamp | long | none |
755 | | notices | string | none |
756 | | tags | string | none |
757 | | scores | double | none |
758 | | strategies | string | none |
759 | 
760 | 
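761 | ## Calculating on events
762 | Events contain many fields, and any of them can be counted and used when tallying occurrences or evaluating other conditions. For example, each visit by a `c_ip` counts `result` as 1 and the counts accumulate, so that on reaching 10 the IP can be added to the blacklist directly.
763 | Screenshots:
764 | 
765 | ![19. Event 10](http://wx1.sinaimg.cn/large/0060lm7Tly1fxnnua2jgpj30qc09nmy6.jpg)
766 | 
767 | ![20. Event 11](http://wx2.sinaimg.cn/large/0060lm7Tly1fxnnv7valmj30qe0fmmyv.jpg)
768 | This is the cumulative count of `result` in the [`Account`-`Real-name verification`] event.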
769 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.3.md:
--------------------------------------------------------------------------------
1 | # 3.3.3. Variable Introduction
2 | 
3 | Not yet available; to be updated.
4 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.4.md:
--------------------------------------------------------------------------------
1 | # 3.3.4. Rule Overview
2 | 
3 | Not yet available; to be updated.
4 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.5.md:
--------------------------------------------------------------------------------
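  1 | # 3.3.7. Operational Decisions
  2 | 
  3 | 
  4 | A risk-control decision engine is a collection of risk-control rules evaluated through branches and successive layers of rules. Because the rules work in combination, the order and priority in which they run matters a great deal.
  5 | 
  6 | **In-house rules run before external rules**
  7 | 
  8 | For example:
  9 | The local in-house blacklist is checked before external blacklist data sources. If the local blacklist is hit, evaluation can stop immediately and return a "reject" result.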
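
The ordering can be illustrated with a short-circuit check. This is a sketch of the principle only, not the engine's code; `decide`, `external_lookup`, and the `"accept"` result are hypothetical names introduced for the example.

```python
def decide(key, local_blacklist, external_lookup):
    """Check the in-house blacklist first and stop as soon as it is hit.

    `external_lookup` stands in for any external blacklist source; it is
    only called when the local blacklist has no entry for `key`.
    """
    if key in local_blacklist:
        return "reject"  # local hit: end evaluation without any external call
    return "reject" if external_lookup(key) else "accept"
```

Checking the local list first avoids a slower external lookup whenever the in-house data already gives a verdict.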
 10 | 
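 11 | ## Strategy tuning and optimization
 12 | 
 13 | - Early on, use the strategies that ship with the system as a reference, then enable the existing local risk-control strategies.
 14 | - Later, different business scenarios call for different strategies and rules. Risk-control operators need to follow up continuously, tuning existing strategies and adding new rules as needed.
 15 | 
 16 | 
 17 | ## Backtracking
 18 | 
 19 | 1. Use the log query feature to query and export data in bulk for anything that looks questionable, then analyze that data to refine existing strategies and add new ones.
 20 | 
 21 | *A log export looks like this:*
 22 | 
 23 | ![23. Operational decisions.png](http://www.z4a.net/images/2018/11/28/23.png)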
 24 | 
 25 | 
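 26 | ## Blacklisting and blocking mechanism
 27 | 
 28 | So that business systems can blacklist and block, the system provides two ways to obtain risk data:
 29 | 
 30 | ### Getting risk data through the API
 31 | 
 32 | **`/checkRisk`** is the risk check endpoint. The business side actively sends a risk check request to determine whether a `USER`, `IP`, `DID`, or `ORDER ID` is risky.
 33 | 
 34 | Parameters:  
 35 | `query`:
 36 | The content to query, in the following format (must be `url`-encoded):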
 37 | 
 38 | ```json
 39 | {
 40 | 	"check_item": [
 41 | 		{
 42 | 			"k": "USER", // check type
 43 | 			"v": "threathunter_test" // value
 44 | 		},
 45 | 		{
 46 | 			"k": "USER",
 47 | 			"v": "threathunter"
 48 | 		}
 49 | 	],
 50 | 	"full_respond": true, // return all valid events
 51 | 	"scene_type": "ORDER" // target scenario
 52 | }
 53 | ```
 54 | Notes:
 55 | >   Check types: `IP`, `DEVICE ID`, `USER`, `ORDERID`  
 56 | >   Scenarios: `OTHER`, `VISITOR`, `ACCOUNT`, `MARKETING`, `ORDER`, `TRANSACTION`
 57 | 
 58 | `auth`:
 59 |  Used for authentication; configured in `setting.py` of `nebula_web`:
 60 | 
 61 | Example:
 62 | 
 63 | ```
 64 | http://127.0.0.1:9001/checkRisk?auth=7a7c4182f1bef7504a1d3d5eaa51a242&query=%7b"check_item"%3a%5b%7b"k"%3a"USER"%2c"v"%3a"threathunter_test"%7d%2c%7b"k"%3a"USER"%2c"v"%3a"threathunter"%7d%5d%2c"full_respond"%3atrue%2c"scene_type"%3a"ORDER"%7d
 65 | ```
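
The request URL above can also be assembled programmatically. The following is a minimal Python 3 sketch, assuming the endpoint accepts a compact JSON document in `query` that is percent-encoded by a standard URL encoder; the helper name `build_check_risk_url` is hypothetical, and the response format is not covered because this section does not document it.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit


def build_check_risk_url(base_url, auth, items, scene_type, full_respond=True):
    """Build a /checkRisk URL whose `query` parameter is URL-encoded JSON.

    `items` is a list of (check_type, value) pairs, e.g. ("USER", "alice").
    """
    query = {
        "check_item": [{"k": k, "v": v} for k, v in items],
        "full_respond": full_respond,
        "scene_type": scene_type,
    }
    # Compact separators keep the JSON short before it is percent-encoded.
    params = {"auth": auth, "query": json.dumps(query, separators=(",", ":"))}
    return "{}/checkRisk?{}".format(base_url.rstrip("/"), urlencode(params))


# Values taken from the example in this section.
url = build_check_risk_url(
    "http://127.0.0.1:9001",
    "7a7c4182f1bef7504a1d3d5eaa51a242",
    [("USER", "threathunter_test"), ("USER", "threathunter")],
    "ORDER",
)
print(url)
```

Unlike the hand-written example, `urlencode` also percent-encodes the double quotes, which is the safer choice for HTTP clients that do not encode them automatically.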
 66 | 
 67 | ![auth.png](http://www.z4a.net/images/2018/12/10/e990124abe12f054ceafdca6975a6179.png)
 68 | 
 69 | ### Reading data directly from redis
 70 | 
 71 | **Recommended approach:**   
 72 | Use redis publish/subscribe: the business side subscribes to the channel `nebula.realtime.notice` to receive real-time notices.
 73 | 
 74 | *Sample data:*
 75 | ```json
 76 | {
 77 | 	"remark": ">100, avg < 0.8, in 5m, web", // risk remark
 78 | 	"geo_city": "深圳市", // city
 79 | 	"checkpoints": "",
 80 | 	"timestamp": 1552635533123, // trigger time
 81 | 	"decision": "review", // risk decision; depends on the strategy configuration
 82 | 	"tip": "IP页面停留时间过短Web",
 83 | 	"variable_values": "",
 84 | 	"risk_score": 0, // risk score
 85 | 	"test": 0, // whether this is a test strategy
 86 | 	"strategy_name": "IP页面停留时间过短Web", // name of the triggered strategy
 87 | 	"expire": 1552635833123, // expiry time; depends on the strategy configuration
 88 | 	"key": "183.15.177.209", // risk value
 89 | 	"scene_name": "VISITOR", // risk scene: `OTHER`, `VISITOR`, `ACCOUNT`, `MARKETING`, `ORDER`, `TRANSACTION`
 90 | 	"uri_stem": "112.74.58.210/user", // associated page
 91 | 	"trigger_event": "{\"s_ip\": \"172.18.16.169\", \"app\": \"nebula\", \"pid\": \"000000000000000000000000\", \"s_type\": \"text/html; charset=utf-8\", \"uri_stem\": \"112.74.58.210/user\", \"c_bytes\": 0, \"id\": \"5c8b568c12a3650017e176e4\", \"uid\": \"\", \"request_time\": 3, \"platform\": \"\", \"s_body\": \"\", \"sid\": \"\", \"s_port\": 9001, \"method\": \"GET\", \"status\": 200, \"is_static\": false, \"geo_city\": \"\深\圳\市\", \"c_body\": \"\", \"timestamp\": 1552635532701, \"geo_province\": \"\广\东\省\", \"host\": \"112.74.58.210\", \"referer\": \"\", \"c_ip\": \"183.15.177.209\", \"key\": \"183.15.177.209\", \"useragent\": \"python-requests/2.11.1\", \"c_port\": 18311, \"xforward\": \"\", \"c_type\": \"\", \"name\": \"HTTP_DYNAMIC\", \"did\": \"\", \"s_bytes\": 648, \"request_type\": \"\", \"value\": 1.0, \"cookie\": \"group_id=2|1:0|10:1540368603|8:group_id|4:Mg==|30789028a7e8399f5f5ef115fc050e1b3424df25b966755be7554e0048dd29a4; user_id=2|1:0|10:1540368603|7:user_id|4:Mg==|141555b29c711f95ddebea3987820fb90c30a48f506eae9f8cee9b334372ae79; user=attack_test; auth=2|1:0|10:1540368603|4:auth|44:NGJlZTQwMDI0NTExYjM2NDVkNjkzOTM1ZTJmMDllMWY=|34718dcbcac603ee1a38daaa0a05fbafc524873af0fafb0013077a53a28b2d4b\", \"page\": \"112.74.58.210/user\", \"uri_query\": \"\", \"referer_hit\": \"F\"}", //触发事件
 92 | 	"geo_province": "广东省", // province
 93 | 	"check_type": "IP" // value type, e.g. `IP`, `DEVICE ID`, `USER`, `ORDERID`
 94 | }
 95 | ```
 96 | 
 97 | ![image](https://user-images.githubusercontent.com/31437628/54415966-e9eee900-4738-11e9-86fa-7b0229242216.png)
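
A subscriber for this channel could look like the sketch below. It assumes Python 3 and the third-party `redis` (redis-py) client, that each published message is a plain JSON document without the `//` annotations shown in the sample, and that `trigger_event` is a JSON string nested inside it. The functions `parse_notice`, `should_block` and `listen` are hypothetical names, and the blocking policy is only an illustration.

```python
import json

CHANNEL = "nebula.realtime.notice"


def parse_notice(raw):
    """Decode one notice; `trigger_event` is itself a JSON-encoded string."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    notice = json.loads(raw)
    event = notice.get("trigger_event")
    if isinstance(event, str) and event:
        try:
            notice["trigger_event"] = json.loads(event)
        except ValueError:
            pass  # keep the raw string when it is not valid JSON
    return notice


def should_block(notice, now_ms):
    """Illustrative policy: act on non-test notices that have not expired."""
    return notice.get("test") == 0 and notice.get("expire", 0) > now_ms


def listen(host="127.0.0.1", port=6379):
    """Print every notice published on the channel (blocks forever)."""
    import redis  # third-party redis-py client, imported lazily

    pubsub = redis.Redis(host=host, port=port).pubsub()
    pubsub.subscribe(CHANNEL)
    for message in pubsub.listen():
        if message["type"] != "message":
            continue  # skip subscribe confirmations
        notice = parse_notice(message["data"])
        print(notice["check_type"], notice["key"], notice["decision"])
```

Keeping the parsing in a pure function makes it possible to unit-test the handling logic without a running redis instance.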
 98 | 
 99 | 
100 | **Fallback approach:**   
101 | Key name:
102 | `{metrics.db.write}db.write.notice_*`
103 | 
104 | Data expiry:
105 | 30 minutes
106 | 
107 | How to read it:
108 | Read the members `db.write.notice_*` from the key `{metrics.db.write}db.write.notice` (a `zset` sorted by timestamp), prepend `{metrics.db.write}` to each member to get the risk-data key `{metrics.db.write}db.write.notice_*` (a `hash`), then read that key's fields to get a single risk record.
109 | 
110 | Example:
111 | 
112 | ![redis.png](https://www.z4a.net/images/2019/01/21/redis.png)
113 | 
114 | *Note:*  
115 | >
116 | > - Data obtained this way is extremely fresh, but it is not aggregated.
117 | > - If there is currently no risk data, any of the keys mentioned above may not exist.
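
The fallback lookup can be sketched as follows, assuming Python 3 and a client object that exposes `zrange` and `hgetall` the way redis-py does. The function names `notice_key` and `read_notices` are hypothetical; the key names come from the description above.

```python
INDEX_KEY = "{metrics.db.write}db.write.notice"  # zset sorted by timestamp
KEY_PREFIX = "{metrics.db.write}"


def notice_key(member):
    """Turn a zset member such as db.write.notice_123 into its hash key."""
    if isinstance(member, bytes):
        member = member.decode("utf-8")
    return KEY_PREFIX + member


def read_notices(client):
    """Yield (key, fields) for every risk record that still exists.

    `client` is anything with redis-py style zrange() and hgetall().
    """
    for member in client.zrange(INDEX_KEY, 0, -1):
        key = notice_key(member)
        fields = client.hgetall(key)
        if fields:  # the hash may already have expired (30 minutes)
            yield key, fields
```

With redis-py this would be called as `list(read_notices(redis.Redis(host, port)))`; an empty result is normal when no risk data is present.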
118 | 
119 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.6.md:
--------------------------------------------------------------------------------
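  1 | # 3.3.6. Strategy Configuration
  2 | 
  3 | 
  4 | ### Preface
  5 | - Introduction to business scenarios
  6 | - Examples of business scenarios
  7 | - Closing remarks
  8 | 
  9 | #### Introduction to business scenarios
 10 | When you first meet Nebula, it can be unclear how a company's business breaks down into scenarios, how strategies are built for them, and where Nebula scripts fit in. The sections below therefore build a strategy for each of several scenarios to show how strategies and Nebula scripts are customized. The scenarios are: credential stuffing by repeated logins from one IP, malicious registration from one IP, and an IP crawling the business system.
 11 | #### Examples of business scenarios
 12 | #### Example 1: credential stuffing by repeated logins from one IP
 13 | >Credential stuffing: attackers collect usernames and passwords leaked on the internet, build a dictionary from them, and try bulk logins on other sites to find accounts that work. Many users reuse the same username and password across sites, so an attacker who obtains a user's credentials for site A can try them on site B. This is a credential stuffing attack.
 14 | 
 15 | Suppose one IP keeps hitting the login page with different usernames or passwords. Once it exceeds the count set in the strategy, the IP is captured. The business system queries the Nebula risk API, gets the risk verdict for this IP, and responds by showing a CAPTCHA, banning the IP, and so on.
 16 | 
 17 | >Nebula risk query API: see section 3.3.7, Operational Decisions, of the Nebula documentation for details.
 18 | 
 19 | ##### Building the strategy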
 20 | 
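 21 | 1. Open the Nebula site. After logging in you will see the page below; open link ①. The page may differ slightly between Nebula versions, but it looks roughly like this: ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijllr7oej21hc0pqq6q.jpg)
 22 | 2. Click New Strategy and set the strategy's parameters ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijm7eu5xj21hc0q0q6e.jpg)
 23 | 3. Fill in the strategy's basic information and choose its category ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijmcjikxj21hb0pydjj.jpg)
 24 | 4. Next, set the strategy's terms. This is the core part, so take care. Under [Event - Dynamic Resource Request], select the attribute page. Suppose company A's login endpoint is /login; the strategy must then capture requests whose page contains login, that is, any page containing login is treated as the login endpoint. Adjust this to your own company's login endpoint, since every business differs; some companies' login endpoint may even be /register ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijmgtj5rj21hc0pun07.jpg)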
 25 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijsxj9zkj21ha0pv41q.jpg)
 26 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijt64rwzj21hb0q00vr.jpg)
 27 | 
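 28 | 5. Use a condition to compute a value. For example, if an IP visits the same page more than 5 times within 5 minutes, treat the IP as a credential-stuffing risk. This part is harder to grasp; follow the screenshots first and explore from there  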
 29 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijteeq4wj21h80q341n.jpg)
 30 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijtkfr0aj21hc0q2q62.jpg)
 31 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijtob4g6j21hb0q077f.jpg)
 32 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijttez1uj21hc0pn77n.jpg)
 33 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijtx9j9pj21h70q3gos.jpg)
 34 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iju08b1yj21h80q0adm.jpg)
 35 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iju2ztnwj21hc0q2djm.jpg)
 36 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iju75e4mj21hc0q2gpb.jpg)
 37 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijuag6ioj21ha0q00wd.jpg)
 38 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijud8xs1j21h70q3juv.jpg)
 39 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijuhalatj21hc0q2424.jpg)
 40 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijukbjzdj21ha0pzq6g.jpg)
 41 | >Note: the "page contains login" condition in the screenshot above must match the login endpoint exactly. login was entered earlier, so it must be login here too
 42 | 
 43 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijunlqm1j21h80pygoq.jpg)
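 44 | 6. Next, fill in the Handling Measures / Add to Risk List step. This step handles IPs that meet the condition, for example by adding the IP to the risk list. When the business system then requests the Nebula risk list, Nebula returns this IP, and the business system can respond at different levels: show a CAPTCHA, require a fresh login, or even ban the IP outright so it can no longer reach the business system ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijuqd483j21h40pqq6j.jpg)
 45 | 7. The strategy is now configured. Click 'Test' and then 'Go Live' to put it into effect. You can then check whether the credential-stuffing strategy works ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijut5n6aj21h40nltbp.jpg)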
 46 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijuwub4jj21hc0pw77f.jpg)
 47 | 8. Hit /login repeatedly from one IP. The triggered strategy appears on the 'Overview' page and the IP is stored in the system's risk list. Querying the Nebula API then reports this IP as risky ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ijv24nc9j21h50q976x.jpg)
 48 | 
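 49 | Summary: the steps above configure a strategy that counts, online and in real time, the login-endpoint requests from each IP over 5 minutes in order to detect credential stuffing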
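
The rule built in this example (more than 5 requests to a page containing login from one IP within 5 minutes) can be pictured as a sliding-window counter. The Python 3 sketch below only illustrates the logic; it is not how Nebula computes the variable internally, and the class name `SlidingWindowCounter` is hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Count matching requests per IP inside a moving time window."""

    def __init__(self, window_seconds=300, threshold=5, keyword="login"):
        self.window = window_seconds
        self.threshold = threshold
        self.keyword = keyword
        self.hits = defaultdict(deque)  # ip -> timestamps of matching requests

    def record(self, ip, page, now):
        """Record one request; return True once the IP exceeds the threshold."""
        if self.keyword not in page:
            return False  # mirrors the "page contains login" condition
        timestamps = self.hits[ip]
        timestamps.append(now)
        # Drop requests that have fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps) > self.threshold
```

Calling `record(ip, page, now)` for each request returns True from the sixth matching request inside the window onward, which corresponds to the point where the strategy would add the IP to the risk list.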
 50 | 
 51 | 
 52 | 
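 53 | #### Example 2: malicious registration from one IP
 54 | 
 55 | Suppose one IP keeps registering accounts with random usernames, passwords and email addresses, which puts the enterprise's accounts at serious risk. Once the IP exceeds the registration count set in the strategy, it is captured. The business system queries the Nebula risk API, gets the risk verdict for this IP, and responds by showing a CAPTCHA, banning the IP, and so on.
 56 | 
 57 | >Nebula risk query API: see section 3.3.7, Operational Decisions, of the Nebula documentation for details.
 58 | 
 59 | ##### Building the strategy
 60 | 1. Open the Nebula site. After logging in you will see the page below; open link ①. The page may differ slightly between Nebula versions, but it looks roughly like this: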
 61 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iks4g9lqj21hc0pqq6q.jpg)
 62 | 2. Click New Strategy and set the strategy's parameters
 63 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikt6bj1xj21hc0q0q6e.jpg)
 64 | 3. Fill in the strategy's basic information and choose its category
 65 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iku0pt8xj21hc0pz762.jpg)
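 66 | 4. Next, set the strategy's terms. This is the core part, so take care. Under [Event - Dynamic Resource Request], select the attribute page. Suppose company A's registration endpoint is /register; the strategy must then capture requests whose page contains register, that is, any page containing register is treated as the registration endpoint. Adjust this to your own company's registration endpoint, since every business differs; some companies' registration endpoint may even be /login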
 67 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikujar52j21hb0q30v3.jpg)
 68 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikurw2tkj21hb0pzmzf.jpg)
 69 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iky1d9agj21hb0pzmzf.jpg)
 70 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iky485inj21h70pugo9.jpg)
 71 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1iky7io42j21h80pxjtr.jpg)
 72 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikyajz1aj21ha0q0wh9.jpg)
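 73 | 5. Use a condition to compute a value. For example, if an IP visits the same page more than 5 times within 5 minutes, treat the IP as a malicious-registration risk. This part is harder to grasp; follow the screenshots first and explore from there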
 74 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikye47cxj21hc0q2go7.jpg)
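 75 | >Note: the "page contains register" condition in the screenshot above must match the registration endpoint exactly. register was entered earlier, so it must be register here too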
 76 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikyjbleqj21hc0q2q5u.jpg)
 77 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikymdd9qj21hc0q441d.jpg)
 78 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikyp53g8j21hc0q176m.jpg)
 79 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikyssjc3j21ha0pyacf.jpg)
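 80 | 6. Next, fill in the Handling Measures / Add to Risk List step. This step handles IPs that meet the condition, for example by adding the IP to the risk list. When the business system then requests the Nebula risk list, Nebula returns this IP, and the business system can respond at different levels: show a CAPTCHA, require a fresh login, or even ban the IP outright so it can no longer reach the business system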
 81 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikyxqspyj21ha0pjdio.jpg)
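 82 | 7. After clicking Save, find the new strategy in Strategy Management and click Test, then Go Live, so that it takes effect. This step is mandatory; without it the strategy does nothing. After going live, the strategy takes effect within a short while.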
 83 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikzgw9wsj21h80pvgon.jpg)
 84 | 8. Finally, test the registration endpoint. Risk Event Management shows that the risk was captured and the IP was placed in review status
 85 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1ikz6ysilj21hc0pvwg3.jpg)
 86 | 
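 87 | Summary: the steps above configure a strategy that counts, online and in real time, registrations from each IP over 5 minutes in order to detect malicious registration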
 88 | 
 89 | 
 90 | 
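 91 | #### Example 3: an IP crawling the business system
 92 | 
 93 | >Web crawler: a web crawler (also called a web spider or web robot, and in the FOAF community more often a web chaser) is a program or script that automatically fetches information from the World Wide Web according to set rules. Less common names include ant, automatic indexer, emulator and worm
 94 | 
 95 | Suppose one IP uses a crawler script to keep requesting pages and scrape data from the business system, which does serious harm to the enterprise. The strategy sets a threshold on how many dynamic resources an IP may request. The threshold should be a volume an ordinary user could not reach in a short time, and it has to be tuned to the scale of your own business system. Once an IP exceeds the threshold it is tagged as risky. The business system queries the Nebula risk API, gets the risk verdict for this IP, and responds by showing a CAPTCHA, banning the IP, and so on.
 96 | 
 97 | >Nebula risk query API: see section 3.3.7, Operational Decisions, of the Nebula documentation for details.
 98 | 
 99 | ##### Building the strategy
100 | 1. Open the Nebula site. After logging in you will see the page below; open link ①. The page may differ slightly between Nebula versions, but it looks roughly like this: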
101 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il2rh5emj21hc0pqq6q.jpg)
102 | 2. Click New Strategy and set the strategy's parameters
103 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il2u49itj21hc0q0q6e.jpg)
104 | 3. Fill in the strategy's basic information and choose its category
105 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il2wizpxj21hc0pxjty.jpg)
106 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il2zby0bj21h90pwtbf.jpg)
107 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3c0npij21hb0pztbl.jpg)
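108 | 4. Next, set the strategy's terms. This is the core part, so take care. Under [Event - Dynamic Resource Request], select the attribute c_ip and set the condition c_ip contains "." (the ASCII full stop). This means every IP is included in the calculation. Filter conditions are specified afterwards to pick out the IPs that match; the next step uses a condition for that.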
109 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3g4rf7j21h60pwju6.jpg)
110 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3nbphbj21hb0q10vp.jpg)
111 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3psfepj21h50pwacw.jpg)
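112 | 5. Use a condition to compute a value. For example, if an IP requests dynamic resources more than 500 times within 5 minutes, treat the IP as crawling the site. Set the threshold to suit your company's business. This part may be harder to grasp; follow the screenshots first and explore from there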
113 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3s3yt3j21h60px0wc.jpg)
114 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3utqcoj21hc0pxtbs.jpg)
115 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3x39qkj21h60pwgoj.jpg)
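116 | 6. Next, fill in the Handling Measures / Add to Risk List step. This step handles IPs that meet the condition, for example by adding the IP to the risk list. When the business system then requests the Nebula risk list, Nebula returns this IP, and the business system can respond at different levels: show a CAPTCHA, require a fresh login, or even ban the IP outright so it can no longer reach the business system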
117 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il3z7qbjj21h60q0777.jpg)
118 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il41wdshj21h80ppadf.jpg)
119 | 7. The strategy is now configured. Click 'Test' and then 'Go Live' to put it into effect. You can then test the crawler strategy
120 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il445mglj21h80q00vu.jpg)
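121 | 8. Crawl the site repeatedly with a script from one IP. The triggered strategy appears on the 'Overview' page and the IP is stored in the system's risk list. Querying the Nebula API then reports this IP as risky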
122 | ![](http://ww1.sinaimg.cn/large/66d0828fly1g1il46nyc1j21hb0px0v6.jpg)
123 | 
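124 | Summary: the steps above configure a strategy that counts, online and in real time, page requests from each IP over 5 minutes, catches an IP that keeps crawling the business system with a script, and marks it as a risky IP
125 | 
126 | #### Closing remarks
127 | These 3 examples walk through every step of building a strategy and are meant as an entry-level guide for strategy authors. Nebula can do far more than this as a risk-control system. We hope these tutorials help you get started with strategy customization, so that you can set up strategies that fit your business system and help it withstand risks.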
128 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.7.md:
--------------------------------------------------------------------------------
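  1 | ### Preface
  2 | - How traffic, events, strategies and risks relate
  3 | - Introduction to log parsing
  4 | - Log parsing examples
  5 | - Closing remarks
  6 | 
  7 | #### How traffic, events, strategies and risks relate
  8 | 
  9 | * Traffic: one page visit or one API request is recorded in the system as one unit of traffic
 10 | * Event: when a unit of traffic matches an event's rule, it is converted into that event
 11 | * Strategy: when an event matches the rules in a strategy, a risk is triggered
 12 | * Risk: defined by the strategy's rules, along dimensions such as count, time and device
 13 | 
 14 | #### Introduction to log parsing
 15 | 
 16 | Business logic varies endlessly and `TH-Nebula` cannot anticipate every business scenario, so it has to be customizable: you can write one or several log-parsing rules for a single endpoint to trigger strategies. Log parsing exists to make rules more freely customizable. It depends on strategy management, so make sure you understand strategy management first. Put simply, log parsing turns dynamic traffic into arbitrary events through rules; those events then trigger strategies according to the strategies' conditions, which is how a risk is caught. You can then act on the risk and fend it off.
 17 | 
 18 | #### How do you customize TH-Nebula log parsing?
 19 | 1. First, collect the endpoints and pages (page) of the site you want to monitor
 20 | 2. Collect each endpoint's parameters and make sure you understand them
 21 | 3. Classify each endpoint under an existing strategy type, for example Account-Login (ACCOUNT_LOGIN) or Click (HTTP_CLICK). Use the endpoint's parameters in log parsing to build rules and attach them to strategies. When traffic arrives it is converted into events and the events are routed to strategies; whether a strategy fires depends on its rules. The detail page then shows the triggered strategy and the traffic details (account information, IP and so on).
 22 | 
 23 | 
 24 | #### Example 1: log parsing for user visits
 25 | 
 26 | Log parsing supports very elaborate customization. This example focuses on explaining the feature itself; real business risks call for rules tailored to your own understanding of the business. Still, the example covers the basic operations.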
 27 | 
 28 | 1. Open the `TH-Nebula` admin console, then open the Log Parsing tab.
 29 | 
 30 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25qnxoc4fj217o0midhd.jpg)
 31 | 
 32 | 2. Click New Parsing to define a rule that converts dynamic traffic into the event you choose.
 33 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25qupv4txj217r0mcjsx.jpg)
 34 | 
 35 | 3. Select `HTTP_DYNAMIC` so that every `HTTP_DYNAMIC` record received is converted into the target event
 36 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r2ighr8j217m0m7767.jpg)
 37 | 
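 38 | 4. As the screenshot shows, the condition is c_ip contains ".". This means traffic from every visiting IP is converted into the target event, because every IP address necessarily contains a "."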
 39 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r4h3rrsj217s0mhmz2.jpg)
 40 | 
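 41 | 5. In the screenshot, the target is set to `ORDER_SUBMIT`. The upper box lists the fields of `HTTP_DYNAMIC` and the lower box lists the fields of `ORDER_SUBMIT`. All of these fields can be used in strategy calculations. That touches on strategy complexity, which is not covered here; see the strategy chapter for details.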
 42 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r9g3xjhj217f0mamzu.jpg)
 43 | 
 44 | 6. Next, in the lower box, user_name is set to the value of the user field taken from the cookie variable. How this is set up depends on how your company's business populates the cookie.
 45 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rcgwz2rj217s0mfq5m.jpg)
 46 | 
 47 | 7. That completes this simple rule. Enable the log-parsing rule and it takes effect.
 48 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25re9b95fj217o0mdmys.jpg)
 49 | 
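 50 | 8. Although the traffic is now converted into concrete events, nothing is catching risks yet. The key step is to define a strategy that works together with log parsing: only events that match the strategy's rules trigger it and are counted as risks. This lets administrators clearly see which users are malicious and respond by banning accounts or taking other measures.
 51 | 
 52 | 9. Open `Strategy Management` and create a strategy under the appropriate category.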
 53 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rlsxq33j217r0mewgj.jpg)
 54 | 
 55 | 10. Click `New Strategy` and fill in the strategy details
 56 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25roukha0j217n0mg40t.jpg)
 57 | 
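 58 | 11. The strategy rule is kept simple to show what log parsing does. Log parsing converts traffic whose c_ip contains "." into Order Submit (`ORDER_SUBMIT`) events, takes user from the traffic's cookie and assigns it to user_name. In the strategy, an event whose user_name contains threathunter marks the traffic as risky. This is a test, so it is as simple as possible: when attacking the page, I add the key-value pair user=threathunter to the cookie to check that log parsing produces the event and the event triggers the strategy.
 59 | 
 60 | 12. Next, handle the risk: click Edit Parameters.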
 61 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rwmdgltj21gz0pvn1d.jpg)
 62 | 
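 63 | 13. Fill in the risk handling roughly as shown: set the value type to IP and the risk value to c_ip. This data shows up elsewhere; other pages give a more detailed analysis of the IP that triggered the risk.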
 64 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rxjpcboj21h60pyae7.jpg)
 65 | 
 66 | 14. Test this strategy and put it online.
 67 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25s0e8i9hj21h90o50us.jpg)
 68 | 
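 69 | 15. All settings are now complete. Any traffic hitting a page or API with user=threathunter in its cookie will trigger this risk, and the risky IP is shown on the Overview page and the Risk List page.
 70 | 
 71 | 16. The triggered risk looks like this.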
 72 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25s3ik5gpj21h40p4n0p.jpg)
 73 | 
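 74 | Summary of example 1: you should now have a rough picture. Try it yourself to see whether you can reproduce the simplest flow: log parsing converts traffic into events, events are routed to a strategy, and the strategy triggers a risk. This is an important first step towards advanced rules tailored to your company's business.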
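
The flow configured in example 1 can be mimicked in a few lines of Python 3. This is only an illustration of the rule logic under the assumption that events are plain dictionaries; the function names `cookie_value`, `to_order_submit` and `is_risky` are hypothetical and do not exist in `TH-Nebula`.

```python
def cookie_value(cookie_header, name):
    """Return the value of one cookie from a raw Cookie header string."""
    for part in cookie_header.split(";"):
        key, _, value = part.strip().partition("=")
        if key == name:
            return value
    return ""


def to_order_submit(http_dynamic):
    """Log-parsing rule: c_ip contains "." converts to ORDER_SUBMIT."""
    if "." not in http_dynamic.get("c_ip", ""):
        return None
    event = dict(http_dynamic, name="ORDER_SUBMIT")
    # Field mapping from step 6: user_name comes from the `user` cookie.
    event["user_name"] = cookie_value(http_dynamic.get("cookie", ""), "user")
    return event


def is_risky(event):
    """Strategy rule from step 11: user_name contains threathunter."""
    return event is not None and "threathunter" in event.get("user_name", "")
```

The split of responsibilities matches the UI: `to_order_submit` plays the role of the log-parsing rule, and `is_risky` plays the role of the strategy condition that is evaluated on the resulting event.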
 75 | 
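 76 | #### Example 2: log parsing for GET requests
 77 | The common GET request is used here to show how powerful and flexible log parsing is. This matters because company business is complex and a risk-control system cannot anticipate that complexity in advance, so parsing GET parameters through log parsing gives a lot of flexibility.
 78 | 
 79 | 1. Open the `TH-Nebula` admin console, then open the Log Parsing tab.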
 80 | 
 81 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25qnxoc4fj217o0midhd.jpg)
 82 | 
 83 | 2. Click New Parsing to define a rule that converts dynamic traffic into the event you choose.
 84 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25qupv4txj217r0mcjsx.jpg)
 85 | 
 86 | 3. Select `HTTP_DYNAMIC` so that every `HTTP_DYNAMIC` record received is converted into the target event
 87 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r2ighr8j217m0m7767.jpg)
 88 | 
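 89 | 4. As the screenshot shows, the condition is c_ip contains ".". This means traffic from every visiting IP is converted into the target event, because every IP address necessarily contains a "."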
 90 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r4h3rrsj217s0mhmz2.jpg)
 91 | 
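 92 | 5. In the screenshot, the target is set to `ORDER_SUBMIT`. The upper box lists the fields of `HTTP_DYNAMIC` and the lower box lists the fields of `ORDER_SUBMIT`. All of these fields can be used in strategy calculations. That touches on strategy complexity, which is not covered here; see the strategy chapter for details.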
 93 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r9g3xjhj217f0mamzu.jpg)
 94 | 
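 95 | 6. Next, in the lower box, receiver_email is set to the value of the username field taken from the uri_query variable. What you can use depends on your company's business and on which parameters the GET request carries; here the value of username is assumed to be an email address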
 96 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26t2l907kj21760kp76y.jpg)
 97 | 
 98 | 7. That completes this simple rule. Enable the log-parsing rule and it takes effect.
 99 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25re9b95fj217o0mdmys.jpg)
100 | 
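101 | 8. Although the traffic is now converted into concrete events, nothing is catching risks yet. The key step is to define a strategy that works together with log parsing: only events that match the strategy's rules trigger it and are counted as risks. This lets administrators clearly see which users are malicious and respond by banning accounts or taking other measures.
102 | 
103 | 9. Open `Strategy Management` and create a strategy under the appropriate category.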
104 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rlsxq33j217r0mewgj.jpg)
105 | 
106 | 10. Click `New Strategy` and fill in the strategy details
107 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26tgfh12rj217n0ktac5.jpg)
108 | 
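109 | 11. The strategy rule is kept simple to show what log parsing does. Log parsing converts traffic whose c_ip contains "." into Order Submit (`ORDER_SUBMIT`) events, takes the GET parameter username from the traffic and assigns it to receiver_email. In the strategy, an event whose receiver_email contains @ marks the traffic as risky. This is a test, so it is as simple as possible: when attacking the page, I add the key-value pair username=threathunter@threathunter.cn to the GET request to check that log parsing produces the event and the event triggers the strategy.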
110 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26tlltu5fj216m0kotb8.jpg)
111 | 
112 | 12. Next, handle the risk: click Edit Parameters.
113 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rwmdgltj21gz0pvn1d.jpg)
114 | 
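115 | 13. Fill in the risk handling roughly as shown: set the value type to IP and the risk value to c_ip. This data shows up elsewhere; other pages give a more detailed analysis of the IP that triggered the risk.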
116 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rxjpcboj21h60pyae7.jpg)
117 | 
118 | 14. Test this strategy and put it online.
119 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26tp1cxm4j217k0kqmyx.jpg)
120 | 
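121 | 15. All settings are now complete. Any traffic hitting a page or API whose GET parameter username contains @ will trigger this risk, and the risky IP is shown on the Overview page and the Risk List page.
122 | 
123 | 16. The triggered risk looks like this.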
124 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26tu1ywexj21750ku41g.jpg)
125 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26tun9uvxj217f0knjte.jpg)
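
The GET mapping from example 2 can be illustrated with the Python 3 standard library. The sketch assumes that `uri_query` holds the raw query string of the request; the function names are hypothetical and only mirror the rule configured in the UI.

```python
from urllib.parse import parse_qs


def receiver_email_from_query(uri_query):
    """Field mapping from step 6: receiver_email comes from `username`."""
    values = parse_qs(uri_query.lstrip("?")).get("username", [])
    return values[0] if values else ""


def triggers_strategy(uri_query):
    """Strategy rule from step 11: receiver_email contains @."""
    return "@" in receiver_email_from_query(uri_query)
```

`parse_qs` also decodes percent-encoded values, so a request carrying `username=threathunter%40threathunter.cn` is treated the same as one carrying a literal `@`.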
126 | 
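127 | #### Example 3: log parsing for POST requests
128 | The common POST request is used here to show how powerful and flexible log parsing is. This matters because company business is complex and a risk-control system cannot anticipate that complexity in advance, so parsing POST parameters through log parsing gives a lot of flexibility.
129 | 
130 | 1. Open the `TH-Nebula` admin console, then open the Log Parsing tab.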
131 | 
132 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25qnxoc4fj217o0midhd.jpg)
133 | 
134 | 2. Click New Parsing to define a rule that converts dynamic traffic into the event you choose.
135 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25qupv4txj217r0mcjsx.jpg)
136 | 
137 | 3. Select `HTTP_DYNAMIC` so that every `HTTP_DYNAMIC` record received is converted into the target event
138 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r2ighr8j217m0m7767.jpg)
139 | 
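140 | 4. As the screenshot shows, the condition is c_ip contains ".". This means traffic from every visiting IP is converted into the target event, because every IP address necessarily contains a "."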
141 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r4h3rrsj217s0mhmz2.jpg)
142 | 
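143 | 5. In the screenshot, the target is set to `ORDER_SUBMIT`. The upper box lists the fields of `HTTP_DYNAMIC` and the lower box lists the fields of `ORDER_SUBMIT`. All of these fields can be used in strategy calculations. That touches on strategy complexity, which is not covered here; see the strategy chapter for details.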
144 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25r9g3xjhj217f0mamzu.jpg)
145 | 
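146 | 6. Next, in the lower box, transaction_id is set to the value of the order_number field taken from the POST parameters. What you can use depends on your company's business and on which parameters the POST request carries; here order_number is assumed to be an order number that can be used in strategy calculations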
147 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26u4fjxyjj216y0ksq5p.jpg)
148 | 
149 | 7. That completes this simple rule. Enable the log-parsing rule and it takes effect.
150 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25re9b95fj217o0mdmys.jpg)
151 | 
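152 | 8. Although the traffic is now converted into concrete events, nothing is catching risks yet. The key step is to define a strategy that works together with log parsing: only events that match the strategy's rules trigger it and are counted as risks. This lets administrators clearly see which users are malicious and respond by banning accounts or taking other measures.
153 | 
154 | 9. Open `Strategy Management` and create a strategy under the appropriate category.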
155 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rlsxq33j217r0mewgj.jpg)
156 | 
157 | 10. Click `New Strategy` and fill in the strategy details
158 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26ud5ldj4j217q0krmzf.jpg)
159 | 
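160 | 11. The strategy rule is kept simple to show what log parsing does. Log parsing converts traffic whose c_ip contains "." into Order Submit (`ORDER_SUBMIT`) events, takes the POST parameter order_number from the traffic and assigns it to transaction_id. In the strategy, an event whose transaction_id contains 0 (assuming every order number contains at least one 0) marks the traffic as risky. This is a test, so it is as simple as possible: when attacking the page, I add the key-value pair order_number=0888888880 to the POST request to check that log parsing produces the event and the event triggers the strategy.
161 | 
162 | 12. Next, handle the risk: click Edit Parameters.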
163 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rwmdgltj21gz0pvn1d.jpg)
164 | 
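165 | 13. Fill in the risk handling roughly as shown: set the value type to IP and the risk value to c_ip. This data shows up elsewhere; other pages give a more detailed analysis of the IP that triggered the risk.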
166 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g25rxjpcboj21h60pyae7.jpg)
167 | 
168 | 14. Test this strategy and put it online.
169 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26uowzboaj21780ksac4.jpg)
170 | 
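171 | 15. All settings are now complete. Any traffic hitting a page or API whose POST parameter order_number contains 0 will trigger this risk, and the risky IP is shown on the Overview page and the Risk List page.
172 | 
173 | 16. The triggered risk looks like this.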
174 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26wgk56r1j21770ksgof.jpg)
175 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g26wnww89hj21770ktdhp.jpg)
176 | 
177 | 
178 | 


--------------------------------------------------------------------------------
/chapter3/section3/section3.8.md:
--------------------------------------------------------------------------------
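  1 | # 3.3.5. Script Customization
  2 | 
  3 | 
  4 | ## What is a TH-Nebula script?
  5 | 
  6 | A `TH-Nebula` script is a program that works alongside strategies in the `TH-Nebula` system: it triggers strategies and takes part in computing the variables that strategies define.
  7 | 
  8 | ## Why customize TH-Nebula scripts?
  9 | 
 10 | Business logic varies endlessly and `TH-Nebula` cannot anticipate every business scenario, so it has to be customizable: you can write one or several scripts for a single endpoint to trigger strategies.
 11 | 
 12 | #### How do you customize a TH-Nebula script?
 13 | 
 14 | Overall flow of a `TH-Nebula` script
 15 | 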
 16 | ![21.脚本定制1](http://www.z4a.net/images/2018/11/28/21.png)
 17 | 
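 18 | Outline:
 19 | 
 20 | 1. First, collect the endpoints and pages (`page`) of the site you want to monitor
 21 | 2. Collect each endpoint's parameters and make sure you understand them
 22 | 3. Classify each endpoint under the strategy it belongs to. Based on the endpoint's parameters, write a matching class (that is, a handler function) in the custom script and attach it to the strategy. When an event fires, the handler routes it to the strategy, and the detail page shows the custom traffic data.
 23 | 
 24 | ### A brief example
 25 | 
 26 |    Suppose a user submits an order on the order page of the monitored site. The custom script watches the order endpoint, picks up the order traffic and stores all order parameters in the event as strategy fields. The event then triggers whichever deployed strategies it matches and shows up in `TH-Nebula` under Overview, Risk List Management and Risk Event Management; the detail page shows everything about the order.
 27 | 
 28 | ### A complete example
 29 | 
 30 | Continuing the previous example, a new event called Account-Real-Name Verification was created. As the screenshot shows, its type is `ACCOUNT_CERTIFICATION`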
 31 | 
 32 | ![22.脚本定制2](http://www.z4a.net/images/2018/11/28/22.png)
 33 | 
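 34 | With the strategy in place, define the matching class in the script. A test class follows.
 35 | The script name is configured in /etc/nebula/sniffer/sniffer.conf; `testparser` below is the script name, and the script lives in the `TH-Nebula_sniffer` project:
 36 | 
 37 | Location of `sniffer.conf`: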
 38 | 
 39 | ```
 40 | /etc/nebula/sniffer/sniffer.conf
 41 | ```
 42 | 
 43 | Configuration:
 44 | ```
 45 | sources: [eth0]
 46 | en0:
 47 |     driver: bro
 48 |     interface: en0
 49 |     ports: [80, 81, 1080, 3128, 8000, 8080, 8888, 9001, 8081-8083]
 50 |     start_port: 48880
 51 |     instances: 1
 52 |     parser:
 53 |         name: test
 54 |         module: testparser
 55 | ```
 56 | Location of `testparser.py`:
 57 | ```
 58 | TH-Nebula_sniffer\TH-Nebula_sniffer\customparsers\testparser.py
 59 | ```
 60 | 
 61 | Contents:
 62 | 
 63 | ```python
 64 | #!/usr/bin/env python
 65 | # -*- coding: utf-8 -*-
 66 | import re
 67 | 
 68 | import logging
 69 | 
 70 | from threathunter_common.util import millis_now
 71 | from threathunter_common.event import Event
 72 | 
 73 | from ..parser import Parser, extract_common_properties, extract_http_log_event
 74 | from ..parserutil import extract_value_from_body, get_md5, get_json_obj
 75 | from ..msg import HttpMsg
 76 | 
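 77 | logger = logging.getLogger("sniffer.parser.{}".format("testparser"))
 78 | # ASSUMPTION: r_mobile_pattern was undefined in the original; this pattern is only a placeholder
 79 | r_mobile_pattern = re.compile(r"mobile=(\d+)")
 80 | 
 81 | def arg_from_get(uri):
 82 |     # Parse "host/path?a=1&b=2" into {"a": "1", "b": "2"}; partition() tolerates a missing "?" or "="
 83 |     query = uri.partition('?')[2]
 84 |     pairs = [e.partition('=') for e in query.split('&') if e]
 85 |     return dict((a, b) for a, _, b in pairs)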
 86 | 
 87 | def account_test(httpmsg):
 88 |     if not isinstance(httpmsg, HttpMsg):
 89 |         return
 90 |     if "/login" not in httpmsg.uri:
 91 |         return
 92 |     # For GET requests, the request parameters come from the uri field of httpmsg
 93 |     data = httpmsg.get_dict()    
 94 |     data = arg_from_get(data['uri'])
 95 |     properties = extract_common_properties(httpmsg)
 96 |     properties["result"] = "T"
 97 |     properties["new_password"] = data.get('new_password')
 98 |     properties["old_password"] = data.get('old_password')
 99 |     properties["register_realname"] = ""
100 |     properties["register_channel"] = "not"
101 |     properties["email"] = data.get('email')
102 |     properties["user_name"] = extract_value_from_body(r_mobile_pattern, httpmsg.req_body)
103 |     properties["password"] = data.get('password')
104 |     properties["captcha"] = ""
105 |     properties["register_verification_token"] = "none"
106 |     properties["register_verification_token_type"] = "null"
107 |     return Event("nebula", "ACCOUNT_CERTIFICATION", "", millis_now(), properties)
108 | 
109 | 
110 | #############Parser############################
111 | class TestParser(Parser):
112 |     def __init__(self):
113 |         super(TestParser, self).__init__()
114 |         self.http_msg_parsers = [
115 |             account_test,
116 |         ]
117 | 
118 |     def name(self):
119 |         return "test customparsers"
120 | 
121 |     def get_logbody_config(self):
122 |         return ["login", "user"]
123 | 
124 |     def get_events_from_http_msg(self, http_msg):
125 |         if not http_msg:
126 |             return []
127 | 
128 |         result = list()
129 |         for p in self.http_msg_parsers:
130 |             try:
131 |                 ev = p(http_msg)
132 |                 if ev:
133 |                     result.append(ev)
134 |             except Exception:
135 |                 logger.debug("fail to parse with {}".format(p))
136 | 
137 |         return result
138 | 
139 |     def get_events_from_text_msg(self, text_msg):
140 |         return []
141 | 
142 |     def filter(self, msg):
143 |         if not isinstance(msg, HttpMsg):
144 |             return False
145 | 
146 |         return False
147 | 
148 | 
149 | Parser.add_parser("test", TestParser())
150 | ```
151 | 
152 | 
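153 | Note that the second argument of the `Event` in the final `return` is `ACCOUNT_CERTIFICATION`.
154 | This argument determines the type of strategy that is triggered: the system matches strategies by this type, and if one is triggered, it is handled according to that strategy's handling flow.
155 | 
156 | ## What is httpmsg?
157 | 
158 | `httpmsg` is the structure defined by the `sniffer` in `TH-Nebula` that holds a request's data: device information, `IP` and everything else that can be captured
159 | 
160 | ## How do requests in the traffic appear in httpmsg?
161 | 
162 | ### `GET` requests in `httpmsg`
163 | 
164 | Note that the request parameter order_id=88888888888888888888 is in `uri`; all request parameters can be found this way. This is the key point when customizing scripts!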
165 | ```json
166 | {
167 | 	'uid': '',
168 | 	'status_code': 404,
169 | 	'resp_content_type': 'text/plain; charset=utf-8',
170 | 	'resp_headers': {
171 | 		'CONTENT-LENGTH': '356',
172 | 		'VARY': 'Accept-Encoding',
173 | 		'SERVER': 'openresty',
174 | 		'CONNECTION': 'close',
175 | 		'DATE': 'Fri, 26 Oct 2018 03:58:30 GMT',
176 | 		'CONTENT-TYPE': 'text/plain; charset=utf-8'
177 | 	},
178 | 	'id': ObjectId('5bd290e612a3650c2c1c440f'),
179 | 	'dest_port': 9001,
180 | 	'resp_body_len': 356,
181 | 	'log_body': False,
182 | 	'resp_body': '',
183 | 	'req_content_type': '',
184 | 	'debug_processing': False,
185 | 	'req_headers': {
186 | 		'ACCEPT-ENCODING': 'gzip, deflate',
187 | 		'HOST': '112.74.58.210:9001',
188 | 		'ACCEPT': '*/*',
189 | 		'USER-AGENT': 'python-requests/2.11.1',
190 | 		'CONNECTION': 'close',
191 | 		'COOKIE': 'auth=2|1:0|10:1540368603|4:auth|44:NGJlZTQwMDI0NTExYjM2NDVkNjkzOTM1ZTJmMDllMWY=|34718dcbcac603ee1a38daaa0a05fbafc524873af0fafb0013077a53a28b2d4b; group_id=2|1:0|10:1540368603|8:group_id|4:Mg==|30789028a7e8399f5f5ef115fc050e1b3424df25b966755be7554e0048dd29a4; user=2|1:0|10:1540368603|4:user|16:Ymlnc2VjX3Rlc3Q=|4a26e0eb0a60184839666d67af744a3fc869efc82d4c413182814d8732c2875b; user_id=2|1:0|10:1540368603|7:user_id|4:Mg==|141555b29c711f95ddebea3987820fb90c30a48f506eae9f8cee9b334372ae79'
192 | 	},
193 | 	'method': 'GET',
194 | 	'req_body': '',
195 | 	'req_body_len': 0,
196 | 	'host': '112.74.58.210',
197 | 	'referer': '',
198 | 	'xforward': '',
199 | 	'did': '',
200 | 	'status_msg': 'not found',
201 | 	'source_ip': '116.24.64.28',
202 | 	 // Pay attention to the value of the uri field
203 | 	'uri': '112.74.58.210/history?order_id=88888888888888888888',
204 | 	'user_agent': 'python-requests/2.11.1',
205 | 	'source_port': 60563,
206 | 	'dest_ip': '172.18.16.169'
207 | }
208 | ```
209 | 
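210 | ### `POST` requests in `httpmsg`
211 | 
212 | Note that the request fields are in the `req_body` field; all request parameters can be found this way. This is the key point when customizing scripts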
213 | ```json
214 | {
215 | 	'uid': '',
216 | 	'status_code': 404,
217 | 	'resp_content_type': 'text/plain;charset=utf-8',
218 | 	'resp_headers': {
219 | 		'CONTENT-LENGTH': '356',
220 | 		'VARY': 'Accept-Encoding',
221 | 		'SERVER': 'openresty',
222 | 		'CONNECTION': 'close',
223 | 		'DATE': 'Fri,26Oct201808:13:52GMT',
224 | 		'CONTENT-TYPE': 'text/plain;charset=utf-8'
225 | 	},
226 | 	'id': ObjectId('5bd2ccc012a365682aee1fbd'),
227 | 	'dest_port': 9001,
228 | 	'resp_body_len': 356,
229 | 	'log_body': False,
230 | 	'resp_body': 'Traceback(mostrecentcalllast):\nFile"/home/threathunter/nebula/nebula_web/venv/lib/python2.7/site-packages/tornado/web.py",line1422,in_execute\nresult=self.prepare()\nFile"/home/threathunter/nebula/nebula_web/venv/lib/python2.7/site-packages/tornado/web.py",line2149,inprepare\nraiseHTTPError(self._status_code)\nHTTPError:HTTP404:NotFound\n',
231 | 	'req_content_type': 'application/json',
232 | 	'debug_processing': False,
233 | 	'req_headers': {
234 | 		'CONTENT-LENGTH': '78',
235 | 		'ACCEPT-ENCODING': 'gzip,deflate',
236 | 		'HOST': '112.74.58.210:9001',
237 | 		'ACCEPT': '*/*',
238 | 		'USER-AGENT': 'python-requests/2.11.1',
239 | 		'CONNECTION': 'close',
240 | 		'COOKIE': 'auth=2|1:0|10:1540368603|4:auth|44:NGJlZTQwMDI0NTExYjM2NDVkNjkzOTM1ZTJmMDllMWY=|34718dcbcac603ee1a38daaa0a05fbafc524873af0fafb0013077a53a28b2d4b;group_id=2|1:0|10:1540368603|8:group_id|4:Mg==|30789028a7e8399f5f5ef115fc050e1b3424df25b966755be7554e0048dd29a4;user=weihong;user_id=2|1:0|10:1540368603|7:user_id|4:Mg==|141555b29c711f95ddebea3987820fb90c30a48f506eae9f8cee9b334372ae79',
241 | 		'CONTENT-TYPE': 'application/json'
242 | 	},
243 | 	'method': 'POST',
244 | 	 // Pay attention to the value of the req_body field
245 | 	 'req_body': '{"password":"777777777","order":"999999999999999999","user":"weihong999"}',
246 | 	'req_body_len': 78,
247 | 	'host': '112.74.58.210',
248 | 	'referer': '',
249 | 	'xforward': '',
250 | 	'did': '',
251 | 	'status_msg': 'notfound',
252 | 	'source_ip': '61.141.65.209',
253 | 	'uri': '112.74.58.210/history',
254 | 	'user_agent': 'python-requests/2.11.1',
255 | 	'source_port': 62840,
256 | 	'dest_ip': '172.18.16.169'
257 | }
258 | ```
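
Reading the parameters of a POST request from an `httpmsg` dictionary like the one above might look as follows. This is a Python 3 sketch, while the sniffer project itself appears to run on Python 2.7; the function name `params_from_body` is hypothetical, and treating any non-JSON body as form-encoded is an assumption.

```python
import json
from urllib.parse import parse_qs


def params_from_body(httpmsg):
    """Return the POST parameters of an httpmsg dict as a flat dict."""
    body = httpmsg.get("req_body") or ""
    content_type = httpmsg.get("req_content_type", "")
    if "json" in content_type:
        try:
            parsed = json.loads(body)
        except ValueError:
            return {}  # malformed JSON: nothing to extract
        return parsed if isinstance(parsed, dict) else {}
    # Assumption: any other body is treated as application/x-www-form-urlencoded.
    return {key: values[0] for key, values in parse_qs(body).items()}
```

A custom parser could call this helper and copy the values it needs (for example `order`) into the `properties` of the `Event` it returns.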
259 | 
260 | 


--------------------------------------------------------------------------------
/chapter3/section4.md:
--------------------------------------------------------------------------------
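 1 | # 3.4. Nebula System Configuration
 2 | 
 3 | System configuration lets administrators fine-tune TH-Nebula to suit different needs. It consists of the following 4 parts:
 4 | 
 5 | **Email Alerts**
 6 | 
 7 | Email alerts send risk-event reports through an internal mail server
 8 | ![](https://i.loli.net/2019/04/04/5ca5ecf89684c.png)To obtain the SMTP settings, you may need to ask your company's IT department for an internal mail account dedicated to sending TH-Nebula alerts, and enter it here.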
 9 | 
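10 | **Network Traffic Filtering**
11 | 
12 | An enterprise network usually carries a lot of traffic that does not need analysis. Network traffic filtering lets you filter out such requests by IP, port, domain name and so on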
13 | ![](https://i.loli.net/2019/04/04/5ca5ee063906f.png)
14 | 
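15 | **Log Display Management**
16 | 
17 | After traffic filtering, all remaining requests enter computation, but some of them still get in the way when analysts review user behavior, such as requests that load static images. Log display management lets you filter out such requests; once filtered, they no longer appear in the risk-analysis details.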
18 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9i1y0bwj20xy0p0q7o.jpg)
19 | 
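20 | **Sensitive Field Encryption**
21 | 
22 | When business requests contain sensitive fields that must be masked, use this setting to hash them with SHA1. "Fields" here means sensitive parameters in the request query, for example [www.xxx.com/login?password=love123&username=test](http://www.xxx.com/login?password=love123&username=test)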
23 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9i5y82lj20xk0h8wh3.jpg)
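
The effect of masking a sensitive query parameter can be illustrated with the Python 3 standard library. This sketch only shows the idea; whether TH-Nebula hashes the raw value exactly this way (encoding, hex output, no salt) is an assumption, and the function name `mask_query` is hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode


def mask_query(query, sensitive=("password",)):
    """Replace the values of sensitive query parameters with their SHA1 hex digest."""
    pairs = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key in sensitive:
            # SHA1 is a one-way hash: the original value cannot be read back.
            value = hashlib.sha1(value.encode("utf-8")).hexdigest()
        pairs.append((key, value))
    return urlencode(pairs)


print(mask_query("password=love123&username=test"))
```

Because SHA1 is a hash rather than reversible encryption, identical inputs always produce the same 40-character digest, so analysts can still tell that two requests carried the same value without seeing it.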
24 | 


--------------------------------------------------------------------------------
/chapter3/section5.md:
--------------------------------------------------------------------------------
  1 | # 3.5. 阻断星云中发现的风险
  2 | 
  3 | 
  4 | 由于 TH-Nebula 属于旁路分析模式，所以无法主动拦截风险事件，需要与企业端应用进行集成后实现自动阻断的功能。
  5 | 
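  6 | ### Common blocking methods on the enterprise side
  7 | 
  8 | ● Block IPs on in-line network devices such as firewalls and switches
  9 | 
 10 | ● Block IPs or sessions in business middleware
 11 | 
 12 | ● Block IPs with a generic bot-interception page
 13 | 
 14 | ● Block account risks in the business disposal center
 15 | 
 16 | ● Block risky business actions (orders, transactions) through a risk decision API to contain funds-account risks
 17 | 
 18 | ● Invalidate the login state or require step-up verification to block account risks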
 19 | 
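 20 | The methods above are only the common ones. More is not necessarily better: choose according to the characteristics of your own business, and focus on coverage. Coverage means that any automated risk analysis system can find its place in the blocking framework, and that when a risk occurs there is always some layer of the application architecture, reachable through one of several dimensions, where the risky behavior can be stopped. The recommended blocking structure is shown below:
 21 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9iktozgj20tu0kqtg3.jpg) **The outermost layer is the visitor layer:** when a user who is not logged in visits the site or app, they can only be identified by IP (device fingerprinting, clientID, and similar techniques are out of scope here), so the main blocking dimension at the visitor layer is IP.
 22 | 
 23 | **After a visitor logs in to an account, they enter the account layer:** the main entrances to this layer are login and registration; online password recovery is another. There are two kinds of blocking at this layer:
 24 | 
 25 | The first is passive risk verification at the entrance. For example, at login the risk decision API determines whether to show the user a CAPTCHA, require stronger phone verification, or refuse the login outright.
 26 | 
 27 | The second is active risk blocking. After login, the user's behavior is analyzed continuously. If certain behavior triggers a risk rule while the user is already logged in, the login state can be invalidated, so the risky user has to log in again at their next action, and at that re-login the risk can be blocked with step-up verification or an account lock.
 28 | 
 29 | **Inside the account layer is the funds-account layer:** many enterprises do not separate this layer clearly. But when we pay for an order from account funds, we usually use a payment password or payment-verification phone that differs from the login password, which means we have entered the most important layer, the funds layer. Blocking here focuses on protecting funds from theft: freezing funds, locking the payment account, or intercepting payment requests or orders.
 30 | 
 31 | Note that risks at each layer can also be covered by the blocking mechanism of the layer outside it. But the further inward a behavior sits, the more business value it carries, so blocking there should be more fine-grained; blocking an IP just to stop one order risks blocking legitimate users.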
 32 | 
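 33 | ### TH-Nebula's risk interception mechanism
 34 | 
 35 | As the previous section shows, a mature blocking mechanism cannot be delivered by any single system. TH-Nebula connects to the nodes in the interception chain through the two mechanisms described below.
 36 | 
 37 | 
 38 | 
 39 | ## Blacklist blocking mechanism
 40 | 
 41 | For blacklist-based blocking in business systems, the system offers two ways to obtain risk data:
 42 | 
 43 | Active push: TH-Nebula pushes the risks it finds to interception nodes for automatic blocking.
 44 | 
 45 | Passive call: TH-Nebula exposes the risk list it finds through an API that interception nodes call to check for risk.
 46 | 
 47 | ### Passive call
 48 | 
 49 | **`/checkRisk`** Risk check API. The business side sends a risk check request to determine whether a `USER`, `IP`, `DID`, or `ORDER ID` is risky.
 50 | 
 51 | Parameters:  
 52 | `query`:
 53 | The content to query, in the following format (must be URL-encoded):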
 54 | 
 55 | ```json
 56 | {
 57 | 	"check_item": [
 58 | 		{
 59 | 			"k": "USER", // check type
 60 | 			"v": "threathunter_test" // value
 61 | 		},
 62 | 		{
 63 | 			"k": "USER",
 64 | 			"v": "threathunter"
 65 | 		}
 66 | 	],
 67 | 	"full_respond": true, // return all valid events
 68 | 	"scene_type": "ORDER" // scene to check
 69 | }
 70 | ```
 71 | Notes:
 72 | >   Check types: `IP`, `DEVICE ID`, `USER`, `ORDERID`  
 73 | >   Scenes: `OTHER`, `VISITOR`, `ACCOUNT`, `MARKETING`, `ORDER`, `TRANSACTION`
 74 | 
 75 | `auth`:
 76 |  Used for authentication; configured in `setting.py` in `nebula_web`.
 77 | 
 78 | Example:
 79 | 
 80 | ```
 81 | http://127.0.0.1:9001/checkRisk?auth=7a7c4182f1bef7504a1d3d5eaa51a242&query=%7b"check_item"%3a%5b%7b"k"%3a"USER"%2c"v"%3a"threathunter_test"%7d%2c%7b"k"%3a"USER"%2c"v"%3a"threathunter"%7d%5d%2c"full_respond"%3atrue%2c"scene_type"%3a"ORDER"%7d
 82 | ```
 83 | 
 84 | ![auth.png](http://www.z4a.net/images/2018/12/10/e990124abe12f054ceafdca6975a6179.png)
 85 | 
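The request URL for the passive call can be assembled as in the sketch below. It assumes only what is documented above (the `/checkRisk` path and the `auth` and `query` parameters); the helper name `build_check_risk_url` is hypothetical, and the response format is not described in this document, so the sketch stops at building the URL.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_check_risk_url(base_url, auth, items, scene_type, full_respond=True):
    """Build a /checkRisk URL; items is a list of (check_type, value) pairs."""
    query = {
        "check_item": [{"k": kind, "v": value} for kind, value in items],
        "full_respond": full_respond,
        "scene_type": scene_type,
    }
    # urlencode percent-encodes the JSON text, as the `query` parameter requires.
    params = {"auth": auth, "query": json.dumps(query, separators=(",", ":"))}
    return base_url.rstrip("/") + "/checkRisk?" + urlencode(params)


url = build_check_risk_url(
    "http://127.0.0.1:9001",
    "7a7c4182f1bef7504a1d3d5eaa51a242",
    [("USER", "threathunter_test"), ("USER", "threathunter")],
    "ORDER",
)
print(url)
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`.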
 86 | ### Active push
 87 | 
 88 | **Recommended approach:**   
 89 | Use Redis publish/subscribe: the business side subscribes to the channel `nebula.realtime.notice` to receive real-time notifications.
 90 | 
 91 | *Sample data:*
 92 | ```json
 93 | {
 94 | 	"remark": ">100, avg < 0.8, in 5m, web", // risk remark
 95 | 	"geo_city": "深圳市", // city
 96 | 	"checkpoints": "",
 97 | 	"timestamp": 1552635533123, // trigger time
 98 | 	"decision": "review", // decision; the business side acts on this (depends on the strategy configuration)
 99 | 	"tip": "IP页面停留时间过短Web",
100 | 	"variable_values": "",
101 | 	"risk_score": 0,
102 | 	"test": 0, // whether this is a test strategy
103 | 	"strategy_name": "IP页面停留时间过短Web", // name of the triggered strategy
104 | 	"expire": 1552635833123, // expiry time; depends on the strategy configuration
105 | 	"key": "183.15.177.209", // the risky value
106 | 	"scene_name": "VISITOR", // risk scene: `OTHER`, `VISITOR`, `ACCOUNT`, `MARKETING`, `ORDER`, `TRANSACTION`
107 | 	"uri_stem": "112.74.58.210/user", // related page
108 | 	"trigger_event": "{\"s_ip\": \"172.18.16.169\", \"app\": \"nebula\", \"pid\": \"000000000000000000000000\", \"s_type\": \"text/html; charset=utf-8\", \"uri_stem\": \"112.74.58.210/user\", \"c_bytes\": 0, \"id\": \"5c8b568c12a3650017e176e4\", \"uid\": \"\", \"request_time\": 3, \"platform\": \"\", \"s_body\": \"\", \"sid\": \"\", \"s_port\": 9001, \"method\": \"GET\", \"status\": 200, \"is_static\": false, \"geo_city\": \"\深\圳\市\", \"c_body\": \"\", \"timestamp\": 1552635532701, \"geo_province\": \"\广\东\省\", \"host\": \"112.74.58.210\", \"referer\": \"\", \"c_ip\": \"183.15.177.209\", \"key\": \"183.15.177.209\", \"useragent\": \"python-requests/2.11.1\", \"c_port\": 18311, \"xforward\": \"\", \"c_type\": \"\", \"name\": \"HTTP_DYNAMIC\", \"did\": \"\", \"s_bytes\": 648, \"request_type\": \"\", \"value\": 1.0, \"cookie\": \"group_id=2|1:0|10:1540368603|8:group_id|4:Mg==|30789028a7e8399f5f5ef115fc050e1b3424df25b966755be7554e0048dd29a4; user_id=2|1:0|10:1540368603|7:user_id|4:Mg==|141555b29c711f95ddebea3987820fb90c30a48f506eae9f8cee9b334372ae79; user=attack_test; auth=2|1:0|10:1540368603|4:auth|44:NGJlZTQwMDI0NTExYjM2NDVkNjkzOTM1ZTJmMDllMWY=|34718dcbcac603ee1a38daaa0a05fbafc524873af0fafb0013077a53a28b2d4b\", \"page\": \"112.74.58.210/user\", \"uri_query\": \"\", \"referer_hit\": \"F\"}", //触发事件
109 | 	"geo_province": "广东省", // province
110 | 	"check_type": "IP" // value type, e.g. `IP`, `DEVICE ID`, `USER`, `ORDERID`
111 | }
112 | ```
113 | 
114 | ![image](https://user-images.githubusercontent.com/31437628/54415966-e9eee900-4738-11e9-86fa-7b0229242216.png)
115 | 
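A subscriber might process each pushed notice roughly as follows. This is a sketch under stated assumptions: the field names (`check_type`, `key`, `expire`, `decision`, `test`) come from the sample data above, while `handle_notice`, `is_blocked`, and the in-memory `blacklist` dict are hypothetical names introduced for illustration. Which `decision` values should lead to blocking depends on your strategy configuration, so the sketch records the decision and leaves that policy to the caller.

```python
import json


def handle_notice(raw, blacklist, now_ms):
    """Record one notice in blacklist; return True if it was stored.

    raw: JSON text as published on the nebula.realtime.notice channel
    blacklist: dict mapping (check_type, key) -> (expire_ms, decision)
    """
    notice = json.loads(raw)
    if notice.get("test"):           # ignore notices from test strategies
        return False
    if notice["expire"] <= now_ms:   # ignore notices that have already expired
        return False
    entry = (notice["expire"], notice["decision"])
    blacklist[(notice["check_type"], notice["key"])] = entry
    return True


def is_blocked(blacklist, check_type, key, now_ms):
    """Return True while an unexpired entry exists for (check_type, key)."""
    entry = blacklist.get((check_type, key))
    return entry is not None and entry[0] > now_ms


# A shortened copy of the sample notice above.
sample = json.dumps({
    "timestamp": 1552635533123,
    "expire": 1552635833123,
    "decision": "review",
    "test": 0,
    "key": "183.15.177.209",
    "check_type": "IP",
    "scene_name": "VISITOR",
})

blacklist = {}
handle_notice(sample, blacklist, now_ms=1552635533123)
print(is_blocked(blacklist, "IP", "183.15.177.209", now_ms=1552635600000))
```

With the `redis-py` client, `raw` would typically be the `data` field of the messages yielded by a `pubsub()` object subscribed to `nebula.realtime.notice`.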
116 | 
117 | The table below shows how TH-Nebula's two risk notification mechanisms apply at each business stage
118 | 
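119 | | | Visitor layer | Account layer | Funds-account layer |
120 | |--- |--- |--- |--- |
121 | | IP | Push IPs to business middleware for interception; push IPs to in-line network blocking devices (an API must be provided); push IPs to a generic CAPTCHA interception page | At entrances, call the Nebula risk decision API to show a CAPTCHA to risky IPs and prevent credential stuffing | At order time, call the Nebula risk decision API to show a CAPTCHA to risky IPs and prevent malicious automated ordering |
122 | | Account (username, session ID) | | At entrances, call the TH-Nebula risk decision API to stop risky accounts from logging in; after login, TH-Nebula pushes risky accounts to the login-state management service to block risky accounts that are already logged in | Call the TH-Nebula risk decision API so that a risky account's request to change the transaction password is verified through the bound transaction phone instead of the old password |
123 | | Funds account (order number, transaction serial number, etc.) | | | At order time, call the TH-Nebula risk decision API to stop risky transactions; TH-Nebula pushes risky order numbers to the order cancellation API |
124 | 
125 | 
126 | 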
127 | 
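128 | ### Recommended blocking solutions
129 | 
130 | **Option 1: blocking with an NGINX + Lua plugin**
131 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9ipvrrkj20xq0fmq7i.jpg) Add an NGINX cluster in front of the Tomcat cluster as a proxy. The NGINX + Lua plugin can block risks based on information in the HTTP request, such as the IP or device information.
132 | 
133 | **GOOD SIDE**: We already have a standard solution for the NGINX plugin. It adds a transparent layer, which makes the architecture more flexible, and it does not depend on the backend web container or programming language. Mature internet companies are increasingly adopting this approach.
134 | **BAD SIDE**: More traditional, stable architectures need to be reworked, which takes some effort.
135 | **Recommended for:** users whose middleware is fairly mature and who want the best result at the lowest cost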
136 | 
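137 | ### Blocking best practices
138 | 
139 | **User A: mainly uses TH-Nebula to defend against crawlers**
140 | 
141 | Implementation: the customer's product pages had long been scraped heavily by web crawlers for price information. After TH-Nebula strategies captured crawler IPs automatically, automatic blocking was wired up. The final solution is as follows:
142 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9j4wbhvj211e0ckjw3.jpg) The customer has a Redis blacklist service for blocking risky IPs, which stops crawler requests before they reach the application service. The customer's system reads the risk list pushed by TH-Nebula from Redis in real time, extracts the IPs that triggered crawler strategies, and stores them in the Redis blacklist. NGINX then intercepts crawler requests and returns a fixed page telling the user that their access looks abnormal.
143 | 
144 | **User B: mainly uses TH-Nebula to defend against account takeover**
145 | 
146 | Implementation: the customer has several login entrances of varying strength. The customer's system reads the risk list pushed by TH-Nebula from Redis in real time; once credential stuffing was detected, automatic blocking was wired up. The final solution is as follows:
147 | ![](http://ww1.sinaimg.cn/large/66d0828fgy1g1p9j7at0hj210w0d20w0.jpg) User B's single sign-on (covering mobile and the new web pages) and the old official website pages are now connected to the risk decision API for account protection. Because all risk decision requests are completed within the intranet, latency stays under 50 ms, which meets the business side's response-time requirement.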
148 | 


--------------------------------------------------------------------------------
/chapter4/section0.md:
--------------------------------------------------------------------------------
1 | # 4. Design Philosophy
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter4/section1.md:
--------------------------------------------------------------------------------
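 1 | # 4.1. Data Collection
 2 | 
 3 | 
 4 | Business risk control relies mainly on business data. Without data, none of the later analysis and processing is possible, so data collection largely determines whether a risk control system succeeds. Below are our considerations for data collection in a risk control system.
 5 | 
 6 | ## Data volume
 7 | 
 8 | When collecting data, we should obtain as much data as possible, in as much detail as possible. Take account risk as an example. With login and registration data, we can analyze features such as the frequency, time, and location of logins and registrations. If we can also get the context around those actions, such as the pages visited and resources loaded beforehand and the pages or data accessed afterwards, the user's behavior trail gives us more dimensions to analyze.
 9 | 
10 | ## Data format
11 | 
12 | Once we know what data is available, we need to define a standard data format; the data analysis section covers this in detail. Common events such as login, registration, ordering, and settlement each need a standard format, and attribute field names must be consistent. This simplifies later management and computation and avoids unnecessary trouble from an avoidable oversight such as inconsistent field naming.
13 | 
14 | ## Data quality
15 | 
16 | Data quality is considered from two aspects:
17 | 
18 | ### Field completeness
19 | Fields such as IP address, port, User-Agent, and Cookie are indispensable for later analysis; if they are missing, much of the subsequent work cannot proceed. We need an explicit field list at collection time so that later work can build on it.
20 | 
21 | ### Data accuracy
22 | Data accuracy depends mainly on field mapping and on the collection method.
23 | When data is stored, the mapping between data and fields must be correct. This needs to be confirmed with the customer early in deployment to guarantee service quality.
24 | There are two main collection methods, active and passive:
25 | 
26 | - Active
27 | Active collection means reading data from the customer's databases and logs. Its timeliness is poor, and there is no guarantee that the data contains the fields we want. Some companies do have a mature message-processing capability that we can use directly as a data source, but this is relatively rare overall.
28 | 
29 | - Passive
30 | Passive collection means the customer provides an interface and we receive the data in the standard format.
31 | There are two variants: the first sends traffic data through instrumentation embedded in the customer's front end; the second sends it through out-of-band (bypass) traffic forwarding. Passive collection takes longer to coordinate, but it yields high-quality data, so it is the more common way to build a risk control system.
32 | 
33 | 
34 | These are only some of our basic ideas. Real deployments involve far more considerations, which we will not enumerate here.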
35 | 


--------------------------------------------------------------------------------
/chapter4/section2.md:
--------------------------------------------------------------------------------
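  1 | # 4.2. Data Analysis
  2 | 
  3 | 
  4 | Once we have enough data (called raw data below), the next question is how to process it and extract the information we need. Two things need to be considered:
  5 | 
  6 | - Data modeling
  7 | - Data computation
  8 | 
  9 | Let us go through them in turn:
 10 | 
 11 | ## Data modeling
 12 | 
 13 | Data modeling `is the abstract organization of real-world data, determining the scope the database must cover, how the data is organized, and so on, until it is turned into an actual database` (definition borrowed from an online encyclopedia).
 14 | 
 15 | Put simply, the data we collect is disordered and differs from customer to customer. If we analyzed it directly, we would need one analysis scheme per customer, and the project would be pointless. So we abstract the data: each record becomes an event, and each data field in an event becomes a property, before any further processing.
 16 | 
 17 | Events alone, however, do not give computation and analysis much to work with, so we introduce two more models, variables and strategies. Below we briefly describe the three models.
 18 | 
 19 | 
 20 | ### Event model
 21 | 
 22 | We first abstracted the following event model, tentatively called version `0.1`: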
 23 | 
 24 | ```json
 25 | {
 26 | 	"properties":[
 27 | 		{
 28 | 			"name": "c_ip",
 29 | 			"type": "string",
 30 | 			"subtype": "",
 31 | 			"visible_name": "客户端ip",
 32 | 			"remark": "客户端IP(默认取xforward最后一个)"
 33 | 		}, {
 34 | 			"name": "sid",
 35 | 			"type": "string",
 36 | 			"subtype": "",
 37 | 			"visible_name": "session会话ID",
 38 | 			"remark": "session会话ID"
 39 | 		}, {
 40 | 			"name": "uid",
 41 | 			"type": "string",
 42 | 			"subtype": "",
 43 | 			"visible_name": "用户ID",
 44 | 			"remark": "用户ID"
 45 | 		}, {
 46 | 			"name": "did",
 47 | 			"type": "string",
 48 | 			"subtype": "",
 49 | 			"visible_name": "设备ID",
 50 | 			"remark": "设备ID"
 51 | 		}, {
 52 | 			"name": "platform",
 53 | 			"type": "string",
 54 | 			"subtype": "",
 55 | 			"visible_name": "客户端类型",
 56 | 			"remark": "客户端类型"
 57 | 		}, {
 58 | 			"name": "page",
 59 | 			"type": "string",
 60 | 			"subtype": "",
 61 | 			"visible_name": "伪静态页面加工后地址",
 62 | 			"remark": "伪静态页面加工后地址(全部小写, 去端口)"
 63 | 		}, {
 64 | 			"name": "c_port",
 65 | 			"type": "long",
 66 | 			"subtype": "",
 67 | 			"visible_name": "客户端端口",
 68 | 			"remark": "客户端端口"
 69 | 		}, {
 70 | 			"name": "c_bytes",
 71 | 			"type": "long",
 72 | 			"subtype": "",
 73 | 			"visible_name": "请求内容大小",
 74 | 			"remark": "请求内容大小"
 75 | 		},
 76 |         ……
 77 |         , {
 78 | 			"name": "geo_province",
 79 | 			"type": "string",
 80 | 			"subtype": "",
 81 | 			"visible_name": "地理位置",
 82 | 			"remark": "地理位置"
 83 | 		}
 84 | 	]
 85 | }
 86 | ```
 87 | 
 88 | We later found an obvious flaw: every event parsed with this model belongs to a single type, which makes later computation and analysis very difficult. So we built version `0.2`:
 89 | 
 90 | ```json
 91 | {
 92 |     "name": "HTTP_DYNAMIC",
 93 | 	"visible_name": "动态资源请求",
 94 | 	"properties":[
 95 | 		{
 96 | 			"name": "c_ip",
 97 | 			"type": "string",
 98 | 			"subtype": "",
 99 | 			"visible_name": "客户端ip",
100 | 			"remark": "客户端IP(默认取xforward最后一个)"
101 | 		}, {
102 | 			"name": "sid",
103 | 			"type": "string",
104 | 			"subtype": "",
105 | 			"visible_name": "session会话ID",
106 | 			"remark": "session会话ID"
107 | 		}, {
108 | 			"name": "uid",
109 | 			"type": "string",
110 | 			"subtype": "",
111 | 			"visible_name": "用户ID",
112 | 			"remark": "用户ID"
113 | 		}, {
114 | 			"name": "did",
115 | 			"type": "string",
116 | 			"subtype": "",
117 | 			"visible_name": "设备ID",
118 | 			"remark": "设备ID"
119 | 		}, {
120 | 			"name": "platform",
121 | 			"type": "string",
122 | 			"subtype": "",
123 | 			"visible_name": "客户端类型",
124 | 			"remark": "客户端类型"
125 | 		}, {
126 | 			"name": "page",
127 | 			"type": "string",
128 | 			"subtype": "",
129 | 			"visible_name": "伪静态页面加工后地址",
130 | 			"remark": "伪静态页面加工后地址(全部小写, 去端口)"
131 | 		}, {
132 | 			"name": "c_port",
133 | 			"type": "long",
134 | 			"subtype": "",
135 | 			"visible_name": "客户端端口",
136 | 			"remark": "客户端端口"
137 | 		}, {
138 | 			"name": "c_bytes",
139 | 			"type": "long",
140 | 			"subtype": "",
141 | 			"visible_name": "请求内容大小",
142 | 			"remark": "请求内容大小"
143 | 		},
144 |         ……
145 |         , {
146 | 			"name": "geo_province",
147 | 			"type": "string",
148 | 			"subtype": "",
149 | 			"visible_name": "地理位置",
150 | 			"remark": "地理位置"
151 | 		}
152 | 	]
153 | }
154 | ```
155 | 
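156 | In this version we split events, by how the resource is requested, into dynamic resource requests `HTTP_DYNAMIC` and static resource requests `HTTP_STATIC`. This was still far from what analysis needs, so we added further event types on top of these two for different business scenarios (login, registration, ordering, payment, and so on), which gave us version `0.3`: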
157 | 
158 | ```json
159 | {
160 |     "name": "HTTP_DYNAMIC",
161 | 	"visible_name": "动态资源请求",
162 |     "remark": "动态资源请求",
163 |     "source": [],
164 | 	"properties":[
165 | 		{
166 | 			"name": "c_ip",
167 | 			"type": "string",
168 | 			"subtype": "",
169 | 			"visible_name": "客户端ip",
170 | 			"remark": "客户端IP(默认取xforward最后一个)"
171 | 		}, {
172 | 			"name": "sid",
173 | 			"type": "string",
174 | 			"subtype": "",
175 | 			"visible_name": "session会话ID",
176 | 			"remark": "session会话ID"
177 | 		}, {
178 | 			"name": "uid",
179 | 			"type": "string",
180 | 			"subtype": "",
181 | 			"visible_name": "用户ID",
182 | 			"remark": "用户ID"
183 | 		}, {
184 | 			"name": "did",
185 | 			"type": "string",
186 | 			"subtype": "",
187 | 			"visible_name": "设备ID",
188 | 			"remark": "设备ID"
189 | 		}, {
190 | 			"name": "platform",
191 | 			"type": "string",
192 | 			"subtype": "",
193 | 			"visible_name": "客户端类型",
194 | 			"remark": "客户端类型"
195 | 		}, {
196 | 			"name": "page",
197 | 			"type": "string",
198 | 			"subtype": "",
199 | 			"visible_name": "伪静态页面加工后地址",
200 | 			"remark": "伪静态页面加工后地址(全部小写, 去端口)"
201 | 		}, {
202 | 			"name": "c_port",
203 | 			"type": "long",
204 | 			"subtype": "",
205 | 			"visible_name": "客户端端口",
206 | 			"remark": "客户端端口"
207 | 		}, {
208 | 			"name": "c_bytes",
209 | 			"type": "long",
210 | 			"subtype": "",
211 | 			"visible_name": "请求内容大小",
212 | 			"remark": "请求内容大小"
213 | 		},
214 |         ……
215 |         , {
216 | 			"name": "geo_province",
217 | 			"type": "string",
218 | 			"subtype": "",
219 | 			"visible_name": "地理位置",
220 | 			"remark": "地理位置"
221 | 		}
222 | 	]
223 | }
224 | ```
225 | 
226 | 
227 | 
228 | The dynamic resource request model above may not show the difference clearly, so let us look at an account login model:
229 | 
230 | 
231 | 
232 | ```json
233 | {
234 | 	"app": "nebula",
235 | 	"name": "ACCOUNT_LOGIN",
236 | 	"visible_name": "账号-登录",
237 | 	"remark": "账号-登录",
238 | 	"source": [{
239 | 		"app": "nebula",
240 | 		"name": "HTTP_DYNAMIC"
241 | 	}],
242 | 	"properties": [{
243 | 		"name": "c_ip",
244 | 		"type": "string",
245 | 		"subtype": "",
246 | 		"visible_name": "客户端ip",
247 | 		"remark": "客户端IP(默认取xforward最后一个)"
248 | 	}, {
249 | 		"name": "sid",
250 | 		"type": "string",
251 | 		"subtype": "",
252 | 		"visible_name": "session会话ID",
253 | 		"remark": "session会话ID"
254 | 	}, {
255 | 		"name": "did",
256 | 		"type": "string",
257 | 		"subtype": "",
258 | 		"visible_name": "设备ID",
259 | 		"remark": "设备ID"
260 | 	}, {
261 | 		"name": "platform",
262 | 		"type": "string",
263 | 		"subtype": "",
264 | 		"visible_name": "客户端类型",
265 | 		"remark": "客户端类型"
266 | 	}, {
267 | 		"name": "page",
268 | 		"type": "string",
269 | 		"subtype": "",
270 | 		"visible_name": "伪静态页面加工后地址",
271 | 		"remark": "伪静态页面加工后地址(全部小写, 去端口)"
272 | 	}, {
273 | 		"name": "c_port",
274 | 		"type": "long",
275 | 		"subtype": "",
276 | 		"visible_name": "客户端端口",
277 | 		"remark": "客户端端口"
278 | 	}, {
279 | 		"name": "c_bytes",
280 | 		"type": "long",
281 | 		"subtype": "",
282 | 		"visible_name": "请求内容大小",
283 | 		"remark": "请求内容大小"
284 | 	},
285 | 	……
286 | 	, {
287 | 		"name": "login_channel",
288 | 		"type": "string",
289 | 		"subtype": "",
290 | 		"visible_name": "登陆渠道",
291 | 		"remark": "登陆渠道"
292 | 	}]
293 | }
294 | ```
295 | 
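296 | This version adds a `source` field, which indicates that the event is generated from the events listed in `source`; the account login event above is exactly such a case. With this, the processing of raw data is complete and the event model is essentially settled.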
297 | 
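The role of `source` can be illustrated with a small sketch. It is not the actual Nebula event engine: the `match` predicate (here, a page ending in `/login`) is a hypothetical stand-in for whatever rule decides that an `HTTP_DYNAMIC` event is also an `ACCOUNT_LOGIN` event, and `derive_events` is an illustrative name.

```python
def derive_events(event, definitions):
    """Return the events derived from `event` by every matching definition."""
    derived = []
    for definition in definitions:
        sources = {(s["app"], s["name"]) for s in definition["source"]}
        if (event["app"], event["name"]) not in sources:
            continue                      # this definition does not build on the event
        if not definition["match"](event):
            continue                      # hypothetical per-definition rule
        new_event = dict(event)           # derived events inherit the properties
        new_event["name"] = definition["name"]
        derived.append(new_event)
    return derived


DEFINITIONS = [{
    "app": "nebula",
    "name": "ACCOUNT_LOGIN",
    "source": [{"app": "nebula", "name": "HTTP_DYNAMIC"}],
    # Assumption for illustration only: a login is a request to a page ending in /login.
    "match": lambda e: e.get("page", "").endswith("/login"),
}]

request = {"app": "nebula", "name": "HTTP_DYNAMIC",
           "c_ip": "183.15.177.209", "page": "112.74.58.210/login"}
for ev in derive_events(request, DEFINITIONS):
    print(ev["name"], ev["c_ip"])
```

The point of the design is that business events such as login never need their own collection path; they are views over the base HTTP events.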
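298 | ### Variable model
299 | 
300 | As the name suggests, the variable model was added to make later computation and analysis easier.
301 | Variables bring to mind computing, classifying, filtering, counting, and aggregating, and ours are no different; we model variables separately precisely to implement these concepts better and more conveniently. Like events, variables need to be classified by scenario and by computation requirement, for example:
302 | 
303 | - By how the variable is generated, for example base variables versus variables generated from strategy engine configuration.
304 | - By computation mode, for example top and filter.
305 | - ……
306 | 
307 | So in the final version, the variable model `type` is one of filter, aggregate, collector, delaycollector, dual, event, internal, sequence, and top. Note that, as with the event model, every non-event variable is configured on top of event variables, so a `source` field is needed here as well.
308 | 
309 | At this point something is still missing. Variables exist for computation, so they also need computation-related fields such as the computation function, filter conditions, and the variable's time-to-live. Variable model design cannot be done off the top of one's head; it takes gradual exploration and accumulation. There is no single best model design, and steady refinement over time is the right approach. Here is an example of our final model: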
310 | 
311 | ```
312 | {
313 | 	"status": "enable",
314 | 	"filter": {
315 | 		"type": "and",
316 | 		"condition": [{
317 | 			"object_subtype": "",
318 | 			"object": "value",
319 | 			"object_type": "long",
320 | 			"value": "5",
321 | 			"source": "_ip__strategy__1542255171782__4950E585B3E88194E5A49AE8AEBEE5A487E8AFB7E6B182E799BBE5BD95__counter__2__rt",
322 | 			"param": "",
323 | 			"operation": ">",
324 | 			"type": "simple"
325 | 		}]
326 | 	},
327 | 	"remark": "collector for strategy IP关联多设备请求登录",
328 | 	"name": "_ip__strategy__1542255171782__4950E585B3E88194E5A49AE8AEBEE5A487E8AFB7E6B182E799BBE5BD95__collect__rt",
329 | 	"hint": {},
330 | 	"value_category": "",
331 | 	"app": "nebula",
332 | 	"period": {},
333 | 	"module": "realtime",
334 | 	"value_subtype": "",
335 | 	"visible_name": "collector for strategy IP关联多设备请求登录",
336 | 	"source": [{
337 | 		"app": "nebula",
338 | 		"name": "_ip__strategy__1542255171782__4950E585B3E88194E5A49AE8AEBEE5A487E8AFB7E6B182E799BBE5BD95__trigger__rt"
339 | 	}, {
340 | 		"app": "nebula",
341 | 		"name": "_ip__strategy__1542255171782__4950E585B3E88194E5A49AE8AEBEE5A487E8AFB7E6B182E799BBE5BD95__counter__2__rt"
342 | 	}],
343 | 	"value_type": "",
344 | 	"groupbykeys": ["c_ip"],
345 | 	"function": {
346 | 		"object_subtype": "",
347 | 		"object": "",
348 | 		"object_type": "",
349 | 		"param": "IP关联多设备请求登录",
350 | 		"source": "_ip__strategy__1542255171782__4950E585B3E88194E5A49AE8AEBEE5A487E8AFB7E6B182E799BBE5BD95__trigger__rt",
351 | 		"config": {
352 | 			"trigger": "_ip__strategy__1542255171782__4950E585B3E88194E5A49AE8AEBEE5A487E8AFB7E6B182E799BBE5BD95__trigger__rt"
353 | 		},
354 | 		"method": "setblacklist"
355 | 	},
356 | 	"type": "collector",
357 | 	"dimension": "ip"
358 | }
359 | ```
360 | 
361 | 
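362 | ### Strategy model
363 | 
364 | Business risk control naturally involves configuring strategies. Risk control depends on development early on and on operations later, and strategies are the core of those later operations.
365 | 
366 | A strategy is a computation rule: it tells the computation engine how to compute over the data. In business terms, it tells the system which events are risk events; in terms of our overall system design, it tells the system how to generate computed variables. So the strategy model is a user-facing model for configuring computed variables. Below we briefly outline the thinking behind its design.
367 | 
368 | First, classification is indispensable; since strategies configure computed variables, their classification should match that of variables. Next, configuring a computed variable requires fields for comparing left and right values, the comparison rule (basic operations, regex matching, and so on), and the effective time-to-live. This is not achieved in one step; it takes repeated trial and refinement. Finally, after the machine-readable fields, we add a few human-readable remark fields, and the design of the strategy model is complete. Here is an example of our final design.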
369 | 
370 | ```json
371 | {
372 |     "status": "inedit",
373 |     "terms": [{
374 |       "scope": "realtime",
375 |       "remark": "",
376 |       "op": "!regex",
377 |       "right": {
378 |         "subtype": "",
379 |         "config": {
380 |           "value": "^\\s*$"
381 |         },
382 |         "type": "constant"
383 |       },
384 |       "left": {
385 |         "subtype": "",
386 |         "config": {
387 |           "field": "page",
388 |           "event": ["nebula", "ORDER_SUBMIT"]
389 |         },
390 |         "type": "event"
391 |       }
392 |     }]}
393 | ```
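The term in the example above reads as "the `page` field of an `ORDER_SUBMIT` event does not match `^\s*$`", in other words the page is not blank. A minimal evaluator for such terms might look like the sketch below. The function names `resolve` and `eval_term` are hypothetical, and only the `constant` and `event` operand types and the `regex` / `!regex` operators are handled; `regex` is assumed to be the positive counterpart of the `!regex` shown in the example.

```python
import re


def resolve(side, event):
    """Return the value of one side of a term, or None if it does not apply."""
    config = side["config"]
    if side["type"] == "constant":
        return config["value"]
    if side["type"] == "event":
        app, name = config["event"]
        if (event.get("app"), event.get("name")) != (app, name):
            return None                   # the term targets a different event
        return event.get(config["field"])
    raise ValueError("unsupported operand type: %r" % side["type"])


def eval_term(term, event):
    """Evaluate one strategy term against one event."""
    left = resolve(term["left"], event)
    right = resolve(term["right"], event)
    if left is None:
        return False
    if term["op"] == "regex":             # assumed positive form of "!regex"
        return re.search(right, left) is not None
    if term["op"] == "!regex":
        return re.search(right, left) is None
    raise ValueError("unsupported op: %r" % term["op"])


# The term from the example above, reduced to the fields the sketch uses.
term = {
    "op": "!regex",
    "left": {"type": "event",
             "config": {"field": "page", "event": ["nebula", "ORDER_SUBMIT"]}},
    "right": {"type": "constant", "config": {"value": "^\\s*$"}},
}
order = {"app": "nebula", "name": "ORDER_SUBMIT", "page": "112.74.58.210/order"}
print(eval_term(term, order))
```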
394 | 
395 | 
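396 | ## Data computation
397 | 
398 | Based on how often results are needed and how many resources the computation consumes, we split data computation into two parts:
399 | 
400 | - Real-time computation: handles variable computation and statistics over short time spans
401 | - Offline computation: handles variable computation and statistics over long time spans, plus data persistence
402 | 
403 | ### Real-time computation
404 | 
405 | Real-time computation is further divided, by computation frequency, into real-time and near-real-time computation.
406 | 
407 | - Real-time computation
408 | 
409 |   Real-time computation provides live, accurate statistics and computed values over a sliding window, in practice over the data of the last 5 minutes.
410 | 
411 |   For example:
412 | 
413 |   (1) Total requests, total static resource requests, or total requests to a given page by an IP within 5 minutes. Filtering and aggregation are enough for this.
414 | 
415 |   (2) POST ratio of an IP within 5 minutes. This combines several variables to produce a new one.
416 | 
417 |   (3) Crawler index of an IP within 5 minutes. This is business-oriented and combines several variables.
418 | 
419 |   (4) Number of distinct IPs of a user within 5 minutes.
420 | 
421 |   These metrics are generated from raw events and reflect a user's real-time characteristics, which is very valuable for security. They are computed frequently, over short periods and small amounts of data.
422 | 
423 | - Near-real-time computation
424 | 
425 |   Near-real-time computation runs somewhat less often but reflects statistics over longer periods, for example:
426 | 
427 |   (1) Total requests by an IP today
428 | 
429 |   (2) Total POST requests by an IP in the current hour
430 | 
431 | 
432 | ### Offline computation
433 | 
434 | Offline computation covers variable computation over long periods and data persistence. Persistence, that is, data storage, is described in more detail under architecture design, so here we focus on computation.
435 | 
436 | Offline computation produces statistics over longer periods, for example:
437 | 
438 | (1) Total requests by an IP within 5 days
439 | 
440 | (2) Total POST requests by an IP within 30 days
441 | 
442 | (3) Total static resource requests by an IP within 3 hours
443 | 
444 | Statistics of this kind:
445 | 
446 | (1) Span a long time and are expensive to compute
447 | 
448 | (2) Therefore cannot be queried too frequently
449 | 
450 | (3) Cannot be 100% precise in value or in time (for implementation reasons)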
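The 5-minute sliding-window statistics described above can be sketched with a per-key queue of timestamps. This is an illustrative data structure, not the Nebula real-time engine; the class name `SlidingWindowCounter` is hypothetical, and the sketch assumes events for a key arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Count events per key within the last `window_ms` milliseconds."""

    def __init__(self, window_ms=5 * 60 * 1000):
        self.window_ms = window_ms
        self._events = defaultdict(deque)

    def add(self, key, timestamp_ms):
        # Assumes timestamps for a key are added in non-decreasing order.
        self._events[key].append(timestamp_ms)

    def count(self, key, now_ms):
        queue = self._events[key]
        # Drop everything that has slid out of the window ending at now_ms.
        while queue and queue[0] <= now_ms - self.window_ms:
            queue.popleft()
        return len(queue)


# Example (2) above: the POST ratio of an IP combines two counters.
requests = SlidingWindowCounter()
posts = SlidingWindowCounter()
ip = "183.15.177.209"
for ts, method in [(0, "GET"), (60000, "POST"), (120000, "POST")]:
    requests.add(ip, ts)
    if method == "POST":
        posts.add(ip, ts)

now = 200000
print(posts.count(ip, now) / requests.count(ip, now))
```

Keeping every timestamp gives exact counts but costs memory proportional to traffic, which is one reason longer periods are handled by the less frequent, less precise computations described in this section.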
451 | 


--------------------------------------------------------------------------------
/chapter4/section3.md:
--------------------------------------------------------------------------------
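 1 | # 4.3. Architecture Design
 2 | 
 3 | 
 4 | `TH-Nebula` is an internet risk control analysis and detection platform. It performs detailed analysis across many business risk scenarios, including traffic attacks, account attacks, and payment attacks, identifies threat traffic, and helps users reduce losses.
 5 | 
 6 | Unlike common lightweight security tools, `TH-Nebula` is essentially a complete, standalone data analysis platform. Logically, it needs to provide the following functions:
 7 | 
 8 | - Data collection and integration platform. Connects to the raw data that exists in various forms in the customer's systems, including traffic, real-time logs, and log files.
 9 | 
10 | - Data normalization and business log extraction. `TH-Nebula` cleans and standardizes the raw data and, based on configuration, abstracts standard business logs for further analysis.
11 | 
12 | - Data persistence. Logs entering the system are persisted for later offline computation and attack tracing.
13 | 
14 | - Real-time computation engine for massive data. Runs large-scale parallel real-time computation on incoming data to obtain real-time statistical features of users.
15 | 
16 | - Offline batch computation engine for massive data. Periodically runs offline batch computation on incoming data to obtain stable features of users.
17 | 
18 | - High-performance strategy engine. Uses real-time and offline results to evaluate strategies against all user visits and identify risky traffic for further handling.
19 | 
20 | - Risk event and blacklist/whitelist management. Manages and queries the risk events identified by the system and the related blacklists and whitelists.
21 | 
22 | - Data visualization and self-service risk analysis. Makes it easy to `review` raw data and trace risks back to their source.
23 | 
24 | - Data export and `API` integration. Exports blacklists, whitelists, and risk events for integration into customer systems, and can also export internal system data.
25 | 
26 | - System configuration and management. A complex system needs matching management tools.
27 | 
28 | A rough architecture diagram of the system:
29 | 
30 | ![24. TH-Nebula architecture](http://www.z4a.net/images/2018/11/28/24.png)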
31 | 
32 | 
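33 | The `TH-Nebula` system as a whole is fairly complete and complex, and cannot be delivered as a single process or program. Physically, it consists of several independent components. The pure business modules are:
34 | 
35 | - Data collection and conversion module. Data collection and normalization are provided by a single physical module.
36 | 
37 | - Real-time computation module and rule engine. Provides the system's real-time processing, including real-time computation, near-real-time computation, and the strategy engine. For simplicity, these currently all live in the real-time module.
38 | 
39 | - Offline computation module. Responsible for offline statistics, data presentation, and data persistence.
40 | 
41 | - System configuration and management module. Configuration and all data management are handled by a separate web application.
42 | 
43 | - `TH-Nebula` front-end module. The `TH-Nebula` front end uses a `JS` + `API` model, and much of the data presentation is supported by the front-end module.
44 | 
45 | The system also relies on a number of underlying platforms:
46 | 
47 | - System cache `Redis`. `Redis` backs cached data, mainly serving as message middleware and as storage for monitoring data.
48 | 
49 | - File system. A file-based database provides storage and querying of massive data.
50 | 
51 | - Data storage `MySQL`. `MySQL` handles writing and reading all data with strong persistence requirements.
52 | 
53 | - `Key-Value` database `AeroSpike` for user profiles. `AeroSpike` is a `Key-Value` database that supports high-performance access to user profile data.
54 | 
55 | - Others. Includes the `Nginx` load balancer, offline management scripts, a process monitoring platform, custom kernel modules, and more.
56 | 
57 | The figure below shows the physical modules of the system and how the logical modules map onto them
58 | 
59 | ![25. TH-Nebula module composition](http://www.z4a.net/images/2018/11/28/25.png)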
60 | 


--------------------------------------------------------------------------------
/chapter5/section0.md:
--------------------------------------------------------------------------------
1 | # 5. Custom Development
2 | 
3 | 


--------------------------------------------------------------------------------
/chapter5/section1.md:
--------------------------------------------------------------------------------
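  1 | # 5.1. How Sniffer Works and Driver Customization
  2 | 
  3 | 
  4 | 
  5 | By default, sniffer uses a data source driver based on bro traffic analysis. It also ships demo drivers for other data sources, such as redis, kafka, rabbitmq, logstash, UDPServer, syslog, and file, for reference (they cannot be used as-is and must be adapted to the format of your source data).
  6 | 
  7 | #### How it works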
  8 | ---------
  9 | ![工作方式.png](https://i.loli.net/2019/04/01/5ca1eae1d280a.png)
 10 | 
 11 | 
 12 | 
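 13 | #### Generating messages
 14 | ---------
 15 | 
 16 |     **This part must be adapted to your own business**
 17 | 
 18 | 
 19 | Taking Nginx + Logstash as an example (the Nginx log format configuration is for reference only)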
 20 | 
 21 | ```     
 22 | 1) Line format
 23 | log_format log_line '"$remote_addr" "$remote_port" "$server_addr" "$server_port" "$request_length" \
 24 | "$content_length" "$body_bytes_sent" "$request_uri" "$host" "$http_user_agent" "$status" "$http_cookie" \
 25 | "$request_method" "$http_referer" "$http_x_forwarded_for" "$request_time" "$sent_http_set_cookie" \
 26 | "$content_type" "$upstream_http_content_type" "$request_data"';
 27 | 
 28 | Example:
 29 | <14>Jun 26 17:13:21 10-10-92-198 NGINX[26113]: "114.242.250.233" "65033" "10.10.92.198" "80" "726" "134" \
 30 | "9443" "/gateway/shop/getStroeForDistance" "m.lechebang.com" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_1_3
 31 | like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12B466 MicroMessenger/6.1.4 NetType/3G+" "200" \
 32 | "-" "POST" "http://m.lechebang.com/webapp/shop/list?cityId=10101&locationId=0&brandTypeId=\
 33 | 6454&maintenancePlanId=227223&oilInfoId=3906" "0.114"
 34 | 
 35 | 2) JSON format
 36 | log_format log_json '{ "@timestamp": "$time_local", '
 37 | '"remote_addr": "$remote_addr", '
 38 | '"referer": "$http_referer", '
 39 | '"request": "$request", '
 40 | '"status": $status, '
 41 | '"bytes": $body_bytes_sent, '
 42 | '"agent": "$http_user_agent", '
 43 | '"x_forwarded": "$http_x_forwarded_for", '
 44 | '"up_addr": "$upstream_addr",'
 45 | '"up_host": "$upstream_http_host",'
 46 | '"up_resp_time": "$upstream_response_time",'
 47 | '"request_time": "$request_time"'
 48 | ' }';
 49 | 
 50 | Example:
 51 | { "@timestamp": "12/Dec/2017:14:30:40 +0800", "remote_addr": "10.88.122.108", "referer": "-", "request": "GET / HTTP/1.1", "status": 304, "bytes":0, "agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36", "x_forwarded": "-", "up_addr": "-","up_host": "-","up_resp_time": "-","request_time": "0.000" }
 52 | 
 53 | 
 54 | For Nginx log configuration, see:
 55 |     http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
 56 | 
 57 | Note:
 58 |     The fields written to the log, their order, and their names must match the message-formatting logic of the corresponding sniffer driver
 59 | 
 60 | ```
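As an illustration of the note above, the sketch below turns one line in the JSON log format into a flat message. The input keys (`remote_addr`, `request`, `status`, and so on) come from the `log_json` format above; the output keys (`c_ip`, `method`, `uri`, `status`, `useragent`, `referer`) and the function name `parse_nginx_json` are assumptions chosen for the example, and a real driver must use whatever names its parser expects.

```python
import json


def _blank_dash(value):
    """Nginx writes "-" for empty values; normalize it to an empty string."""
    return "" if value == "-" else value


def parse_nginx_json(line):
    """Convert one JSON-format access log line into a flat message dict."""
    record = json.loads(line)
    # "$request" looks like "GET /path HTTP/1.1".
    method, _, rest = record.get("request", "").partition(" ")
    uri = rest.rsplit(" ", 1)[0] if rest else ""
    return {
        "c_ip": record.get("remote_addr", ""),
        "method": method,
        "uri": uri,
        "status": int(record.get("status", 0)),
        "useragent": _blank_dash(record.get("agent", "")),
        "referer": _blank_dash(record.get("referer", "")),
    }


line = ('{ "@timestamp": "12/Dec/2017:14:30:40 +0800", '
        '"remote_addr": "10.88.122.108", "referer": "-", '
        '"request": "GET / HTTP/1.1", "status": 304, "bytes":0, '
        '"agent": "Mozilla/5.0", "x_forwarded": "-", '
        '"request_time": "0.000" }')
print(parse_nginx_json(line))
```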
 61 | 
 62 | Deploy a Logstash client to ship the Nginx log output:
 63 | ```
 64 | # Logstash configuration example:
 65 | input {
 66 |     file {
 67 |         path => ['/data/nginx/logs/access.log']
 68 |         start_position => "beginning"
 69 |         codec => "json"
 70 |         tags => ['user']
 71 |         type => "nginx"
 72 |     }
 73 | }
 74 | output {
 75 |     if [type] == "nginx" {
 76 |         # Sniffer logstash driver mode
 77 |         tcp {
 78 |             host => "127.0.0.1"
 79 |             port => "5044"
 80 |             key => "nginx"
 81 |             db => "10"
 82 |             data_type => "list"
 83 |             codec => "line"
 84 |         }
 85 |         # Sniffer redislist driver mode
 86 |         redis {
 87 |             host => "127.0.0.1"
 88 |             port => "6379"
 89 |             key => "nginx_access_log"
 90 |             db => "0"
 91 |             data_type => "list"
 92 |             codec => "line"
 93 |         }
 94 |     }
 95 | }
 96 | ```
 97 | 
 98 | #### Reading data
 99 | ---------
100 | 
101 |     **This part must be adapted to your own business**
102 | 
103 | Edit sniffer.conf
104 | ```
105 | # Multiple sources can be enabled at the same time
106 | sources: [logstash,redislist]
107 | 
108 | # Corresponds to redislistdriver.py
109 | redislist:
110 |     driver: redislist
111 |     host: 127.0.0.1
112 |     port: 6379
113 |     interface: any
114 |     instances: 1
115 |     parser:
116 |         name: test
117 |         module: testparser
118 | 
119 | # Corresponds to logstashdriver.py
120 | # This is actually a TCPServer
121 | logstash:
122 |     driver: logstash
123 |     port: 5044
124 |     instances: 1
125 |     interface: any
126 |     parser:
127 |         name: test
128 |         module: testparser
129 | 
130 | # Corresponds to rabbitmqdriver.py
131 | rabbitmq:
132 |     driver: rabbitmq
133 |     amqp_url: redis://localhost:6379/
134 |     queue_name: test_queue
135 |     exchange_name: test_queue
136 |     exchange_type: direct
137 |     durable: true
138 |     routing_key: test
139 |     instances: 1
140 |     interface: any
141 |     parser:
142 |         name: test
143 |         module: testparser
144 | 
145 | # Corresponds to brohttpdriver.py
146 | default:
147 |     driver: bro
148 |     interface: eth0
149 |     ports: [80, 81, 1080, 3128, 8000, 8080, 8888, 9001, 8081]
150 |     start_port: 48880
151 |     instances: 1
152 |     parser:
153 |         name: test
154 |         module: testparser
155 | 
156 | # Corresponds to syslogdriver.py
157 | syslog:
158 |     driver: syslog
159 |     interface: eth0
160 |     port: 9514
161 |     parser:
162 |         name: test
163 |         module: testparser
164 | 
165 | # Corresponds to kafkadriver.py
166 | kafka:
167 |     driver: kafka
168 |     interface: any
169 |     bootstrap_servers: 127.0.0.1:9092
170 |     parser:
171 |         name: test
172 |         module: testparser
173 | 
174 | ...rest omitted...
175 | 
176 | ```
177 | 
178 | #### Driver customization (message formatting)
179 | --------
180 | 
181 | 
182 | 
183 | All drivers are located in:
184 | 
185 |     Directory:
186 |         sniffer/nebula_sniffer/nebula_sniffer/drivers/
187 |         # bro driver
188 |             brohttpdriver.py
189 |         # file-based driver
190 |             filedriver.py
191 |         # kafka driver
192 |             kafkadriver.py
193 |         # logstash driver
194 |             logstashdriver.py
195 |         # UDPServer driver
196 |             pktdriver.py
197 |         # rabbitmq driver
198 |             rabbitmqdriver.py
199 |         # redis driver
200 |             redislistdriver.py
201 |         # syslog driver
202 |             syslogdriver.py
203 |             syslogtextdriver.py
204 |         # tshark driver
205 |             tsharkhttpsdriver.py
206 | 
207 | 
208 | 
209 | sniffer loads the corresponding driver according to its configuration:
210 | 
211 | ```
212 | def get_driver(config, interface, parser, idx):
213 |     """ global c """
214 | 
215 |     from complexconfig.configcontainer import configcontainer
216 | 
217 |     # each driver is initialized differently
218 |     name = config['driver']
219 |     if name == "bro":
220 |         from nebula_sniffer.drivers.brohttpdriver import BroHttpDriver
221 |         embedded = config.get("embedded", True)
222 |         ports = config['ports']
223 |         from nebula_sniffer.utils import expand_ports
224 |         ports = expand_ports(ports)  # extend it
225 |         start_port = int(config['start_port'])
226 |         bpf_filter = config.get("bpf_filter", "")
227 | 
228 |         home = configcontainer.get_config("sniffer").get_string("sniffer.bro.home")
229 | 
230 |         if ports and home:
231 |             driver = BroHttpDriver(interface=interface, embedded_bro=embedded, idx=idx, ports=ports, bro_home=home,
232 |                                    start_port=start_port, bpf_filter=bpf_filter)
233 |         elif ports:
234 |             driver = BroHttpDriver(interface=interface, embedded_bro=embedded, idx=idx, ports=ports,
235 |                                    start_port=start_port, bpf_filter=bpf_filter)
236 |         elif home:
237 |             driver = BroHttpDriver(interface=interface, embedded_bro=embedded, idx=idx, bro_home=home,
238 |                                    start_port=start_port, bpf_filter=bpf_filter)
239 |         else:
240 |             driver = BroHttpDriver(interface=interface, embedded_bro=embedded, idx=idx,
241 |                                    start_port=start_port, bpf_filter=bpf_filter)
242 |         return driver
243 | 
244 |     if name == "tshark":
245 |         from nebula_sniffer.drivers.tsharkhttpsdriver import TsharkHttpsDriver
246 |         interface = interface
247 |         ports = config["ports"]
248 |         bpf_filter = config.get("bpf_filter", "")
249 |         if ports:
250 |             driver = TsharkHttpsDriver(interface=interface, ports=ports, bpf_filter=bpf_filter)
251 |         else:
252 |             driver = TsharkHttpsDriver(interface=interface, bpf_filter=bpf_filter)
253 |         return driver
254 | 
255 |     if name == "syslog":
256 |         from nebula_sniffer.drivers.syslogdriver import SyslogDriver
257 |         port = int(config["port"])
258 |         driver = SyslogDriver(port)
259 |         return driver
260 | 
261 |     if name == "packetbeat":
262 |         from nebula_sniffer.drivers.pktdriver import PacketbeatDriver
263 |         port = int(config["port"])
264 |         driver = PacketbeatDriver(port)
265 |         return driver
266 | 
267 |     if name == "redislist":
268 |         from nebula_sniffer.drivers.redislistdriver import RedisListDriver
269 |         host = config["host"]
270 |         port = int(config['port'])
271 |         password = config.get('password', '')
272 |         driver = RedisListDriver(host, port, password)
273 |         return driver
274 | 
275 |     if name == "logstash":
276 |         from nebula_sniffer.drivers.logstashdriver import LogstashDriver
277 |         port = int(config['port'])
278 |         driver = LogstashDriver(port)
279 |         return driver
280 | 
281 |     if name == "rabbitmq":
282 |         from nebula_sniffer.drivers.rabbitmqdriver import RabbitmqDriver
283 |         amqp_url = config['amqp_url']
284 |         queue_name = config['queue_name']
285 |         exchange_name = config['exchange_name']
286 |         exchange_type = config['exchange_type']
287 |         durable = bool(config['durable'])
288 |         routing_key = config['routing_key']
289 |         driver = RabbitmqDriver(amqp_url, queue_name, exchange_name, exchange_type, durable, routing_key)
290 |         return driver
291 | 
292 |     if name == "kafka":
293 |         from nebula_sniffer.drivers.kafkadriver import KafkaDriver
294 |         topics = config['topics']
295 |         # config['bootstrap_servers']
296 |         # For the configuration parameters kafka supports,
297 |         # refer to the usage documentation of the Python kafka library
298 |         driver = KafkaDriver(topics, **config)
299 |         return driver
300 | 
301 |     return None
302 | ```
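
One detail in the factory above deserves care: `bool(config['durable'])` is true for any non-empty string, including `"False"`. Assuming the config values arrive as strings (the source does not show how `config` is loaded), a small helper such as the hypothetical `parse_bool` below may be a safer way to read boolean options. This is a minimal Python 3 sketch and is not part of nebula_sniffer.

```python
def parse_bool(value, default=False):
    """Interpret common textual spellings of a boolean config value.

    Hypothetical helper for illustration; not part of nebula_sniffer.
    """
    if isinstance(value, bool):
        return value
    if value is None:
        return default
    text = str(value).strip().lower()
    if text in ("1", "true", "yes", "on"):
        return True
    if text in ("0", "false", "no", "off", ""):
        return False
    raise ValueError("not a boolean config value: %r" % (value,))


# bool("False") is True because the string is non-empty;
# parse_bool("False") returns False.
config = {"durable": "False"}
durable = parse_bool(config["durable"])
```

Raising on unknown spellings is a deliberate choice here: a typo in the config then fails loudly instead of silently enabling or disabling an option.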
303 | 
304 | 
305 | Below is an example of the logstash driver:
306 | 
307 | ```
308 | # The rest of the code is omitted; only the core code is shown
309 | # This example handles nginx logs in JSON format
310 | 
311 | class LogstashDriver(Driver):
312 | 
313 |     ...rest omitted...
314 | 
315 |     # Format the message sent by the logstash client
316 |     def _recv_msg_fn_in(self, msg, addr):
317 |         """
318 |         log-line:
319 |         36.7.130.69 - [16/Jul/2017:23:58:42 +0800] ffp.hnair.com 1 "GET /FFPClub/upload/index/e9b1bb4a-e1dd-47e1-8699-9828685004b4.jpg HTTP/1.1" 200 487752 - "http://ffp.hnair.com/FFPClub/cn/index.html" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0; NetworkBench/8.0.1.309-4992258-2148837) like Gecko" "-"
320 | 
321 |         log-json:
322 |         {"@timestamp_scs": "2017-09-21T12:18:17+08:00", "scs_request_uri": "/site-wap/my/transferin.htm",
323 |          "scs_status": "200", "scs_bytes_sent": "26829", "scs_upstream_cache_status": "-", "scs_request_time": "0.570",
324 |          "scs_upstream_response_time": "0.570", "scs_host": "pay.autohome.com.cn", "scs_remote_addr": "10.20.2.23",
325 |          "scs_server_addr": "10.20.252.33", "scs_upstream_addr": "10.20.252.20:8253", "scs_upstream_status": "200",
326 |          "scs_http_referer": "https://pay.autohome.com.cn/site-wap/activity/upin.htm?__sub_from=A2002027782510100",
327 |          "scs_http_user_agent": "Mozilla/5.0 (Linux; Android 5.1; OPPO A59m Build/LMY47I; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.121 Mobile Safari/537.36 autohomeapp/1.0+%28auto_android%3B8.4.0%3BOi-CR_rnyywoODk5jv0ve5luKLNxl7AfnEsGsBBPNdWdDtP8ZDdRHA3ePFiDlWOr%3B5.1%3BOPPO%2BA59m%29 auto_android/8.4.0 nettype/wifi",
328 |          "scs_http_X_Forwarded_For": "220.166.199.167"}
329 | 
330 |         """
331 | 
332 |         try:
333 |             if not msg:
334 |                 return
335 |             msg = msg.strip()
336 |             if not msg:
337 |                 return
338 |             self.logger.debug("get log msg %s from address %s", msg, addr)
339 | 
340 |             # After receiving the message, parse it according to its format
341 |             try:
342 |                 msg = json.loads(msg)
343 |             except Exception as e:
344 |                 return
345 | 
346 |             # Extract the field data from the message
347 |             c_ip = msg.get('scs_http_X_Forwarded_For', '')
348 |             if c_ip:
349 |                 c_ip_group = c_ip.split(',')
350 |                 if c_ip_group:
351 |                     c_ip = c_ip_group[-1]
352 |             c_port = 0
353 |             s_port = 80
354 |             c_bytes = 0
355 |             s_bytes = msg.get('scs_bytes_sent', 0)
356 |             if s_bytes == '-':
357 |                 s_bytes = 0
358 |             else:
359 |                 s_bytes = int(s_bytes)
360 |             status = int(msg.get('scs_status', 0))
361 |             req_body = ''
362 | 
363 |             args = dict()
364 |             args["method"] = 'GET'
365 |             args["host"] = msg.get('scs_host', '').lower()
366 |             args["uri"] = msg.get('scs_request_uri', '').lower()
367 |             args["referer"] = msg.get('scs_http_referer', '').lower()
368 |             args["user_agent"] = msg.get('scs_http_user_agent', '').lower()
369 |             args["status_code"] = status
370 |             args["status_msg"] = ""
371 |             args["source_ip"] = c_ip
372 |             args["source_port"] = c_port
373 |             args["dest_ip"] = ''
374 |             args["dest_port"] = s_port
375 | 
376 |             request_time = 0.0
377 |             try:
378 |                 ctime = msg['@timestamp_scs']
379 |                 ctime = ctime.replace('T', ' ').replace('+08:00', '')
380 |                 time_array = time.strptime(ctime, "%Y-%m-%d %H:%M:%S")
381 |                 # Convert to a Unix timestamp
382 |                 request_time = time.mktime(time_array)
383 |             except Exception as e:
384 |                 pass
385 | 
386 |             args["req_time"] = int(request_time * 1000)
387 | 
388 |             # headers
389 |             args["req_headers"] = {}
390 | 
391 |             args["resp_headers"] = {}
392 | 
393 |             # no body for logstash
394 |             args["log_body"] = False
395 |             args["req_body"] = ""
396 |             args["resp_body"] = ""
397 |             args["req_body_len"] = c_bytes
398 |             args["resp_body_len"] = s_bytes
399 |             args["req_content_type"] = ''
400 |             args["resp_content_type"] = ''
401 |             args["req_body"] = req_body
402 | 
403 |             args["debug_processing"] = False
404 | 
405 |             self.logger.debug("get http data from logstash: %s", args)
406 | 
407 |             try:
408 |                 # Finally, wrap the fields in an HttpMsg
409 |                 new_msg = HttpMsg(**args)
410 |             except BeFilteredException as bfe:
411 |                 return
412 |             except Exception as err:
413 |                 self.logger.debug("fail to parse: %s", args)
414 |                 return
415 | 
416 |             self.logger.debug("get http msg from logstash: %s", new_msg)
417 | 
418 |             # Put the message on the queue for event extraction
419 |             self.put_msg(new_msg)
420 |             self.count += 1
421 |             if self.count % 1000 == 0:
422 |                 print "has put {}".format(self.count)
423 |             return new_msg
424 | 
425 |         except Exception as ex:
426 |             self.logger.error("fail to parse logstash data: %s", ex)
427 | 
428 |     ...rest omitted...
429 | 
430 | ```
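
Two steps in the driver above can be illustrated in isolation: picking the client address out of `scs_http_X_Forwarded_For`, and turning `@timestamp_scs` into epoch milliseconds. The sketch below is written for Python 3 and keeps the UTC offset instead of stripping `+08:00` and relying on the host's local time zone as the driver does. The helper names `client_ip_from_xff` and `to_epoch_millis` are hypothetical and not part of nebula_sniffer.

```python
from datetime import datetime


def client_ip_from_xff(header):
    """Return the last address of an X-Forwarded-For value (the driver also takes the last one)."""
    # Unlike the driver, surrounding whitespace is stripped from each entry.
    parts = [p.strip() for p in header.split(",") if p.strip()]
    return parts[-1] if parts else ""


def to_epoch_millis(timestamp):
    """Convert an ISO 8601 timestamp with a UTC offset to epoch milliseconds."""
    # datetime.fromisoformat accepts offsets such as "+08:00" (Python 3.7+).
    parsed = datetime.fromisoformat(timestamp)
    return int(parsed.timestamp() * 1000)


msg = {
    "@timestamp_scs": "2017-09-21T12:18:17+08:00",
    "scs_http_X_Forwarded_For": "220.166.199.167",
}
source_ip = client_ip_from_xff(msg.get("scs_http_X_Forwarded_For", ""))
req_time = to_epoch_millis(msg["@timestamp_scs"])
```

Keeping the offset means the result does not depend on the time zone of the machine that runs the sniffer.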
431 | 
432 | #### Event extraction
433 | --------
434 | 
435 |     **Generic module; no modification or customization is needed**
436 | 
437 | 
438 | ```
439 | # Source:
440 | #   sniffer/nebula_sniffer/nebula_sniffer/main.py
441 | 
442 | class Main(object):
443 |         def event_processor(self):
444 |             ...rest omitted...
445 | 
446 |             events = []
447 |             if isinstance(msg, HttpMsg):
448 |                 # Process the HTTP message and return a list of events
449 |                 events = self.parser.get_events_from_http_msg(msg)
450 | 
451 |             ...rest omitted...
452 | ```
453 | 
454 | 
455 | #### Event output
456 | --------
457 | 
458 |     **Generic module; no modification or customization is needed**
459 | 


--------------------------------------------------------------------------------
/chapter5/section2.md:
--------------------------------------------------------------------------------
  1 | # 5.2. Sniffer nginx kafka driver support
  2 | 
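  3 | * About openresty
  4 | * Installing openresty
  5 | * Testing that openresty works
  6 | * Using the new nginx
  7 |     * Installing the openresty-kafka module
  8 |     * Importing kafka.lua and its dependencies
  9 |     * Editing the configuration in kafka.lua
 10 | * Host with an existing nginx
 11 |     * Installing lua
 12 |     * Rebuilding nginx with its dependencies
 13 |     * Importing kafka.lua and its dependencies
 14 |     * Editing the configuration in kafka.lua
 15 |     * Importing the openresty dependencies
 16 |     * Installing the lua-resty-kafka module
 17 | 
 18 | * nginx configuration
 19 |     * Configuration for a new nginx
 20 |     * Configuration for an existing nginx
 21 | * How traffic is collected and computed
 22 | * sniffer configuration
 23 | * Troubleshooting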
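 25 | ### About openresty
 26 | ----
 27 | OpenResty consists of the Nginx core plus many third-party modules, and it bundles a Lua development environment by default, so Nginx can be used as a full web server for application logic. Nginx has many strengths, but developing on it is hard: modules must be written in C, must follow a complex set of rules, and above all require familiarity with the Nginx source code, which puts many developers off. OpenResty is a framework that integrates Nginx and Lua to make this easier: business logic can be written in Lua, and OpenResty takes care of the build order of the individual modules.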
 28 | 
 29 | ### Installing openresty
 30 | ----
 31 | Install the dependencies
 32 | yum install readline-devel pcre-devel openssl-devel gcc
 33 | 
 34 | -- 1. Download the openresty source: http://openresty.org/cn/download.html
 35 | $ wget https://openresty.org/download/openresty-1.15.8.1rc1.tar.gz
 36 | 
 37 | 
 38 | -- 2. Extract the tarball
 39 | $ tar -zxvf openresty-1.15.8.1rc1.tar.gz
 40 | 
 41 | -- 3. From inside the extracted directory, configure the build options; add or remove modules to suit your setup
 42 | $ ./configure --prefix=/opt/openresty --with-luajit --without-http_redis2_module --with-http_iconv_module
 43 | 
 44 | -- 4. Build and install
 45 | $ make
 46 | $ make install
 47 | ### Testing that the installation works
 48 | -- 1. Edit the configuration file as follows:
 49 | $ cat /opt/openresty/nginx/conf/nginx.conf
 50 | worker_processes  1;
 51 | error_log logs/error.log info;
 52 | 
 53 | 
 54 | events {
 55 |     worker_connections 1024;
 56 | }
 57 | 
 58 | 
 59 | http {
 60 |     server {
 61 |         listen 8003;
 62 | 
 63 | 
 64 |         location / {
 65 |             content_by_lua 'ngx.say("hello world.")';
 66 |         }
 67 |     }
 68 | }
 69 | 
 70 | 
 71 | -- 2. Start nginx
 72 | $ /opt/openresty/nginx/sbin/nginx
 73 | 
 74 | 
 75 | -- 3. Check nginx
 76 | $ curl http://127.0.0.1:8003/
 77 | hello world.
 78 | 
 79 | If hello world is returned, the installation is working
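
The same check can be scripted, for example as part of a deployment step. The following is a minimal Python 3 sketch using only the standard library; the function name `openresty_is_up` is hypothetical, and the default URL and expected body simply mirror the test configuration above (port 8003, `hello world.`).

```python
import urllib.request


def openresty_is_up(url="http://127.0.0.1:8003/", expected="hello world.", timeout=3.0):
    """Return True if the test location answers 200 with the expected body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
    return status == 200 and body.strip() == expected
```

Calling `openresty_is_up()` right after starting nginx gives a quick yes/no answer without reading the curl output by hand.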
 80 | 
 81 | 
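 82 | ### Using the new nginx
 83 | 
 84 | #### Installing the openresty-kafka module
 85 | 1. First download the open-source lua-resty-kafka from github
 86 | wget https://github.com/doujiang24/lua-resty-kafka/archive/v0.06.tar.gz
 87 | 2. Then extract it
 88 | tar -zxvf v0.06.tar.gz
 89 | 3. Change to the following directory and copy the module
 90 | cd lua-resty-kafka-0.06/lib/resty
 91 | cp -rf kafka /opt/openresty/lualib/resty/
 92 | The target is the resty directory inside the openresty installation directory; adjust the path to wherever you installed openresty
 93 | 4. That completes the installation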
 94 | 
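 95 | #### Importing kafka.lua and its dependencies
 96 | To convert the traffic that nginx receives into the format sniffer expects, first import the kafka.lua script
 97 | kafka.lua is located in the lua directory of sniffer
 98 | Copy it to /opt/openresty/lualib/
 99 | 
100 | #### Editing the configuration in kafka.lua
101 | The kafka IP and port in the kafka script need to be changed
102 | Run vim kafka.lua and edit the following address
103 | 
104 | -- Define the kafka broker addresses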
105 | local broker_list = {
106 |     { host = "172.x.x.x", port = 9092 },
107 | }
108 | 
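109 | ### Host with an existing nginx
110 | 
111 | #### Installing lua
112 | The openresty package you downloaded already bundles lua (LuaJIT), so it only needs to be installed
113 | Change into the extracted openresty package
114 | 
115 | cd bundle/LuaJIT-2.1-20190228
116 | If your version differs, use the matching LuaJIT directory name
117 | 
118 | make
119 | 
120 | make install
121 | 
122 | The installation output looks like this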
123 | ```
124 | ==== Installing LuaJIT 2.1.0-beta3 to /usr/local ====
125 | mkdir -p /usr/local/bin /usr/local/lib /usr/local/include/luajit-2.1 /usr/local/share/man/man1 /usr/local/lib/pkgconfig /usr/local/share/luajit-2.1.0-beta3/jit /usr/local/share/lua/5.1 /usr/local/lib/lua/5.1
126 | cd src && install -m 0755 luajit /usr/local/bin/luajit-2.1.0-beta3
127 | cd src && test -f libluajit.a && install -m 0644 libluajit.a /usr/local/lib/libluajit-5.1.a || :
128 | rm -f /usr/local/lib/libluajit-5.1.so.2.1.0 /usr/local/lib/libluajit-5.1.so /usr/local/lib/libluajit-5.1.so.2
129 | cd src && test -f libluajit.so && \
130 |   install -m 0755 libluajit.so /usr/local/lib/libluajit-5.1.so.2.1.0 && \
131 |   ldconfig -n /usr/local/lib && \
132 |   ln -sf libluajit-5.1.so.2.1.0 /usr/local/lib/libluajit-5.1.so && \
133 |   ln -sf libluajit-5.1.so.2.1.0 /usr/local/lib/libluajit-5.1.so.2 || :
134 | cd etc && install -m 0644 luajit.1 /usr/local/share/man/man1
135 | cd etc && sed -e "s|^prefix=.*|prefix=/usr/local|" -e "s|^multilib=.*|multilib=lib|" luajit.pc > luajit.pc.tmp && \
136 |   install -m 0644 luajit.pc.tmp /usr/local/lib/pkgconfig/luajit.pc && \
137 |   rm -f luajit.pc.tmp
138 | cd src && install -m 0644 lua.h lualib.h lauxlib.h luaconf.h lua.hpp luajit.h /usr/local/include/luajit-2.1
139 | cd src/jit && install -m 0644 bc.lua bcsave.lua dump.lua p.lua v.lua zone.lua dis_x86.lua dis_x64.lua dis_arm.lua dis_arm64.lua dis_arm64be.lua dis_ppc.lua dis_mips.lua dis_mipsel.lua dis_mips64.lua dis_mips64el.lua vmdef.lua /usr/local/share/luajit-2.1.0-beta3/jit
140 | ln -sf luajit-2.1.0-beta3 /usr/local/bin/luajit
141 | ==== Successfully installed LuaJIT 2.1.0-beta3 to /usr/local ====
142 | ```
143 | 
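144 | #### Rebuilding nginx with its dependencies
145 | The dependencies below are configured against nginx's default installation directories; if your setup differs, adjust the paths accordingly.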
146 | ```
147 | Default nginx directories
148 |   nginx path prefix: "/usr/local/nginx"
149 |   nginx binary file: "/usr/local/nginx/sbin/nginx"
150 |   nginx modules path: "/usr/local/nginx/modules"
151 |   nginx configuration prefix: "/usr/local/nginx/conf"
152 |   nginx configuration file: "/usr/local/nginx/conf/nginx.conf"
153 |   nginx pid file: "/usr/local/nginx/logs/nginx.pid"
154 |   nginx error log file: "/usr/local/nginx/logs/error.log"
155 |   nginx http access log file: "/usr/local/nginx/logs/access.log"
156 | ```
157 | Download the nginx source and build it
158 | ```
159 | wget http://nginx.org/download/nginx-1.16.0.tar.gz
160 | tar -zxvf nginx-1.16.0.tar.gz
161 | cd nginx-1.16.0
162 | 
163 | export LUAJIT_LIB=/usr/local/lib LUAJIT_INC=/usr/local/include/luajit-2.1
164 | 
165 | Re-run configure; note that the openresty package was extracted under /home/root
166 | ./configure --prefix=/usr/local/nginx --with-cc-opt=-O2 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_devel_kit-0.3.1rc1 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/echo-nginx-module-0.61 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/xss-nginx-module-0.06 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_coolkit-0.2 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/set-misc-nginx-module-0.32 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/form-input-nginx-module-0.12 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/encrypted-session-nginx-module-0.08 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/srcache-nginx-module-0.31 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_lua-0.10.14 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_lua_upstream-0.07 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/headers-more-nginx-module-0.33 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/array-var-nginx-module-0.05 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/memc-nginx-module-0.19 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/redis2-nginx-module-0.15 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/redis-nginx-module-0.3.7 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/rds-json-nginx-module-0.15 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/rds-csv-nginx-module-0.09 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_stream_lua-0.0.6 --with-ld-opt=-Wl,-rpath,/usr/local/lib/ --with-stream --with-stream_ssl_module --with-http_ssl_module
167 | 
168 | Parameter notes:
169 | --prefix=/usr/local/nginx: the nginx installation directory
170 | --add-module=/home/root/openresty-1.15.8.1rc1/bundle/: modules from the openresty package downloaded earlier
171 | --with-ld-opt=-Wl,-rpath,/usr/local/lib/: the lua installation path; the lua installation above uses this location by default
173 | 
174 | Rebuild
175 | make
176 | 
177 | Do NOT run make install, or it will overwrite the existing nginx.
180 | 
181 | Use objs/nginx -V to verify that the dependencies are correct
182 | 
183 | nginx version: nginx/1.16.0
184 | built by gcc 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
185 | built with OpenSSL 1.1.0g  2 Nov 2017
186 | TLS SNI support enabled
187 | configure arguments: --prefix=/usr/local/nginx --with-cc-opt=-O2 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_devel_kit-0.3.1rc1 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/echo-nginx-module-0.61 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/xss-nginx-module-0.06 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_coolkit-0.2 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/set-misc-nginx-module-0.32 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/form-input-nginx-module-0.12 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/encrypted-session-nginx-module-0.08 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/srcache-nginx-module-0.31 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_lua-0.10.14 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_lua_upstream-0.07 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/headers-more-nginx-module-0.33 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/array-var-nginx-module-0.05 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/memc-nginx-module-0.19 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/redis2-nginx-module-0.15 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/redis-nginx-module-0.3.7 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/rds-json-nginx-module-0.15 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/rds-csv-nginx-module-0.09 --add-module=/home/root/openresty-1.15.8.1rc1/bundle/ngx_stream_lua-0.0.6 --with-ld-opt=-Wl,-rpath,/usr/local/lib/ --with-stream --with-stream_ssl_module --with-http_ssl_module
188 | 
189 | The above is the resulting set of dependencies
190 | 
191 | Back up the previous nginx
192 | cd /usr/local/nginx/sbin/
193 | cp ./nginx ./nginx.old
194 | 
195 | Replace the nginx binary
196 | cd /home/root/nginx-1.16.0/objs
197 | cp ./nginx  /usr/local/nginx/sbin/
198 | 
199 | Test after replacing it
200 | cd /usr/local/nginx/sbin
201 | 
202 | ./nginx -t
203 | This reports whether the configuration has any problems
204 | 
205 | ./nginx   (starts nginx)
206 | 
207 | Test the lua module
208 | 
209 | cd /usr/local/nginx
210 | 
211 | mkdir lua
212 | cd lua
213 | vim hello.lua
214 | Write the following into it
215 | ngx.log(ngx.ERR,"hello");
216 | 
217 | Change the nginx configuration file
218 | 
219 | location / {
220 |         root /workspace/hexo/public/; index index.html index.htm; access_by_lua_file lua/hello.lua;
221 | }
222 | Restart nginx
223 | Request the port nginx listens on, then check the error log; if everything is working you will see a line like the following
224 | 
225 | 2019/05/09 11:39:26 [error] 14225#0: *1 [lua] hello.lua:1: hello, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", host: "127.0.0.1"
226 | 
227 | ```
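
Rather than reading the error log by eye, the `[lua]` entries can be picked out with a short script. The following is a minimal Python 3 sketch; the function name `find_lua_entries` is hypothetical, and the pattern is inferred only from the single sample line shown above, so other nginx builds or log formats may need a different expression.

```python
import re

# Matches the "[lua] <file>:<line>: <message>" part of an nginx error log entry.
# The message is cut at the first comma, so messages containing commas are truncated.
LUA_ENTRY = re.compile(r"\[lua\] (?P<file>[^:\s]+):(?P<line>\d+): (?P<message>[^,]*)")


def find_lua_entries(lines, script="hello.lua"):
    """Return (line_number, message) pairs logged by the given lua script."""
    entries = []
    for text in lines:
        match = LUA_ENTRY.search(text)
        if match and match.group("file") == script:
            entries.append((int(match.group("line")), match.group("message")))
    return entries


sample = [
    '2019/05/09 11:39:26 [error] 14225#0: *1 [lua] hello.lua:1: hello, client: 127.0.0.1, '
    'server: localhost, request: "GET / HTTP/1.1", host: "127.0.0.1"',
]
print(find_lua_entries(sample))  # prints [(1, 'hello')]
```

To use it on a live system, pass the lines of /usr/local/nginx/logs/error.log (the default error log path listed earlier) instead of `sample`.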
228 | 
229 | 
230 | #### Importing kafka.lua and its dependencies
231 | ```
232 | cd /usr/local/nginx/
233 | mkdir -p lua
234 | 
235 | The import path is /usr/local/nginx/lua
236 | There is 1 file in total
237 | Import it from the lua directory of sniffer
238 | ```
239 | 
240 | #### Editing the configuration in kafka.lua
241 | The kafka IP and port in the kafka script need to be changed
242 | Run vim kafka.lua and edit the following address
243 | 
244 | -- Define the kafka broker addresses
245 | local broker_list = {
246 |     { host = "172.x.x.x", port = 9092 },
247 | }
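248 | #### Importing the openresty dependencies
249 | 
250 | The commands below assume openresty is installed in /opt/openresty; if you installed it elsewhere, adjust the paths
251 | 
252 | cp -rf /opt/openresty/lualib/resty  /usr/local/share/luajit-2.1.0-beta3/
253 | 
254 | If this does not work, check your luajit version; normally there should be no problem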
255 | 
256 | #### Installing the lua-resty-kafka module
257 | 
258 | 1. First download the open-source lua-resty-kafka from github
259 | wget https://github.com/doujiang24/lua-resty-kafka/archive/v0.06.tar.gz
260 | 2. Then extract it
261 | tar -zxvf v0.06.tar.gz
262 | 3. Change to the following directory and copy the module
263 | cd lua-resty-kafka-0.06/lib/resty
264 | cp -rf kafka  /usr/local/share/luajit-2.1.0-beta3/resty/
265 | 
266 | The target directory is luajit-2.1.0-beta3/resty/; adjust the path to wherever luajit is installed on your machine
267 | 4. That completes the installation
268 | 
269 | 
270 | 
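271 | ### nginx configuration
272 | #### Configuration for a new nginx
273 | 1. Next, modify the nginx configuration to enable the kafka.lua script
274 | 2. First enable the error log: find #error_log  logs/error.log; and remove the '#'
275 | 3. Then add the following parameters inside the server block
276 |         set $httplog_to_kafka 1;
277 |         set $lualib_path  /opt/openresty/lualib;
278 | 4. And forward the traffic to lua
279 | if ( $httplog_to_kafka ) {
280 |                 #content_by_lua 'ngx.print("run kafka.lua.")';
281 |                 log_by_lua_file $lualib_path/kafka.lua;
282 | }
283 | 5. The above is a simplified version for explanation only; the complete version follows so you can compare it with your own nginx configuration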
284 | ```
285 | #user  nobody;
286 | worker_processes  1;
287 | 
288 | error_log  logs/error.log;
289 | #error_log  logs/error.log  notice;
290 | #error_log  logs/error.log  info;
291 | 
292 | pid        logs/nginx.pid;
293 | 
294 | 
295 | events {
296 |     worker_connections  1024;
297 | }
298 | 
299 | 
300 | http {
301 |     include       mime.types;
302 |     default_type  application/octet-stream;
303 | 
304 |     #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
305 |     #                  '$status $body_bytes_sent "$http_referer" '
306 |     #                  '"$http_user_agent" "$http_x_forwarded_for"';
307 | 
308 |     #access_log  logs/access.log  main;
309 | 
310 |     sendfile        on;
311 |     #tcp_nopush     on;
312 | 
313 |     #keepalive_timeout  0;
314 |     keepalive_timeout  65;
315 | 
316 |     #gzip  on;
317 | 
318 |     server {
319 |         listen       8003;
320 | 
321 | 
322 |         #charset koi8-r;
323 | 
324 |         #access_log  logs/host.access.log  main;
325 |         set $nginx_metrics_enable 0;
326 |         set $httplog_to_kafka 1;
327 |         set $lualib_path  /opt/openresty/lualib;
328 |         location / {
329 |             if ( $httplog_to_kafka ) {
330 |                 log_by_lua_file $lualib_path/kafka.lua;
331 |             }
332 |         }
333 | 
334 |         #error_page  404              /404.html;
335 | 
336 |         # redirect server error pages to the static page /50x.html
337 |         #
338 |         error_page   500 502 503 504  /50x.html;
339 |         location = /50x.html {
340 |             root   html;
341 |         }
342 | 
343 |         # proxy the PHP scripts to Apache listening on 127.0.0.1:80
344 |         #
345 |         #location ~ \.php$ {
346 |         #    proxy_pass   http://127.0.0.1;
347 |         #}
348 | 
349 |         # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
350 |         #
351 |         #location ~ \.php$ {
352 |         #    root           html;
353 |         #    fastcgi_pass   127.0.0.1:9000;
354 |         #    fastcgi_index  index.php;
355 |         #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
356 |         #    include        fastcgi_params;
357 |         #}
358 | 
359 |         # deny access to .htaccess files, if Apache's document root
360 |         # concurs with nginx's one
361 |         #
362 |         #location ~ /\.ht {
363 |         #    deny  all;
364 |         #}
365 |     }
366 | 
367 | 
368 |     # another virtual host using mix of IP-, name-, and port-based configuration
369 |     #
370 |     #server {
371 |     #    listen       8000;
372 |     #    listen       somename:8080;
373 |     #    server_name  somename  alias  another.alias;
374 | 
375 |     #    location / {
376 |     #        root   html;
377 |     #        index  index.html index.htm;
378 |     #    }
379 |     #}
380 | 
381 | 
382 |     # HTTPS server
383 |     #
384 |     #server {
385 |     #    listen       443 ssl;
386 |     #    server_name  localhost;
387 | 
388 |     #    ssl_certificate      cert.pem;
389 |     #    ssl_certificate_key  cert.key;
390 | 
391 |     #    ssl_session_cache    shared:SSL:1m;
392 |     #    ssl_session_timeout  5m;
393 | 
394 |     #    ssl_ciphers  HIGH:!aNULL:!MD5;
395 |     #    ssl_prefer_server_ciphers  on;
396 | 
397 |     #    location / {
398 |     #        root   html;
399 |     #        index  index.html index.htm;
400 |     #    }
401 |     #}
402 | 
403 | }
404 | ```
405 | 
406 | 
407 | #### Configuration for an existing nginx
408 | The key part is
409 | ```
410 | log_by_lua_file lua/kafka.lua;
411 | ```
412 | The script that runs is kafka.lua in the lua directory
413 | The imported modules are located in /usr/local/share/luajit-2.1.0-beta3/resty/
414 | 
415 | ```
416 | 
417 | #user  nobody;
418 | worker_processes  1;
419 | 
420 | #error_log  logs/error.log;
421 | #error_log  logs/error.log  notice;
422 | #error_log  logs/error.log  info;
423 | 
424 | #pid        logs/nginx.pid;
425 | 
426 | 
427 | events {
428 |     worker_connections  1024;
429 | }
430 | 
431 | 
432 | http {
433 |     include       mime.types;
434 |     default_type  application/octet-stream;
435 | 
436 |     #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
437 |     #                  '$status $body_bytes_sent "$http_referer" '
438 |     #                  '"$http_user_agent" "$http_x_forwarded_for"';
439 | 
440 |     #access_log  logs/access.log  main;
441 | 
442 |     sendfile        on;
443 |     #tcp_nopush     on;
444 | 
445 |     #keepalive_timeout  0;
446 |     keepalive_timeout  65;
447 | 
448 |     #gzip  on;
449 | 
450 |     server {
451 |         listen       80;
452 |         server_name  localhost;
453 | 
454 |         #charset koi8-r;
455 | 
456 |         #access_log  logs/host.access.log  main;
457 |         set $httplog_to_kafka 1;
458 |         location / {
459 |             root   html;
460 |             index  index.html index.htm;
461 |             if ( $httplog_to_kafka ) {
462 |                 log_by_lua_file lua/kafka.lua;
463 |             }
464 |         }
465 | 
466 |         #error_page  404              /404.html;
467 | 
468 |         # redirect server error pages to the static page /50x.html
469 |         #
470 |         error_page   500 502 503 504  /50x.html;
471 |         location = /50x.html {
472 |             root   html;
473 |         }
474 | 
475 |         # proxy the PHP scripts to Apache listening on 127.0.0.1:80
476 |         #
477 |         #location ~ \.php$ {
478 |         #    proxy_pass   http://127.0.0.1;
479 |         #}
480 | 
481 |         # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
482 |         #
483 |         #location ~ \.php$ {
484 |         #    root           html;
485 |         #    fastcgi_pass   127.0.0.1:9000;
486 |         #    fastcgi_index  index.php;
487 |         #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
488 |         #    include        fastcgi_params;
489 |         #}
490 | 
491 |         # deny access to .htaccess files, if Apache's document root
492 |         # concurs with nginx's one
493 |         #
494 |         #location ~ /\.ht {
495 |         #    deny  all;
496 |         #}
497 |     }
498 | 
499 | 
500 |     # another virtual host using mix of IP-, name-, and port-based configuration
501 |     #
502 |     #server {
503 |     #    listen       8000;
504 |     #    listen       somename:8080;
505 |     #    server_name  somename  alias  another.alias;
506 | 
507 |     #    location / {
508 |     #        root   html;
509 |     #        index  index.html index.htm;
510 |     #    }
511 |     #}
512 | 
513 | 
514 |     # HTTPS server
515 |     #
516 |     #server {
517 |     #    listen       443 ssl;
518 |     #    server_name  localhost;
519 | 
520 |     #    ssl_certificate      cert.pem;
521 |     #    ssl_certificate_key  cert.key;
522 | 
523 |     #    ssl_session_cache    shared:SSL:1m;
524 |     #    ssl_session_timeout  5m;
525 | 
526 |     #    ssl_ciphers  HIGH:!aNULL:!MD5;
527 |     #    ssl_prefer_server_ciphers  on;
528 | 
529 |     #    location / {
530 |     #        root   html;
531 |     #        index  index.html index.htm;
532 |     #    }
533 |     #}
534 | 
535 | }
536 | ```
537 | 
538 | 
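539 | ### How traffic is collected and computed
540 | This part is relatively simple: it explains how traffic flows through the system until risks are computed. Traffic first passes through nginx. openresty adds lua scripting support to nginx, so the traffic passes through the lua script we configured. The script parses each request and packs it into a structure in the format that sniffer supports; see the code in kafka.lua and the other lua files it depends on for details. kafka.lua ultimately sends the parsed traffic to kafka in that format. sniffer, configured for kafka through docker-compose, starts its kafka driver, fetches the traffic from kafka, processes it, and sends it on to the computation module.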
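
The flow described above can be sketched in a few lines of Python 3. This is only an illustration of the hand-off between the two sides: an in-memory queue stands in for the kafka topic, the function names are hypothetical, and the record fields (`method`, `host`, `uri`, `status`) are assumptions, since the actual message format is defined by kafka.lua and is not reproduced in this document.

```python
import json
import queue


def lua_side_encode(request):
    """Stand-in for kafka.lua: serialise one request as a JSON message."""
    return json.dumps(request).encode("utf-8")


def sniffer_side_decode(payload):
    """Stand-in for the sniffer kafka driver: decode one message into a dict."""
    return json.loads(payload.decode("utf-8"))


topic = queue.Queue()  # in-memory stand-in for the kafka topic

# 1. nginx handles a request and the lua script publishes a record (fields are assumed).
topic.put(lua_side_encode({"method": "GET", "host": "example.com", "uri": "/", "status": 200}))

# 2. The sniffer driver consumes the record and hands it on for event extraction.
record = sniffer_side_decode(topic.get())
```

The point of the sketch is that nginx and sniffer never talk to each other directly; they only need to agree on the topic and on the message format.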
541 | 
542 | ### sniffer configuration
543 | 
544 | ```
545 | version: "2"
546 | 
547 | services:
548 |  nebula_sniffer:
549 |   image : threathunterx/nebula_sniffer:latest
550 |   container_name: nebula_sniffer_9003
551 |   network_mode: "host"
552 |   volumes:
553 |    - ./logs:/home/nebula_sniffer/logs
554 |   environment:
555 |    - DEBUG=True
556 |    - REDIS_HOST=127.0.0.1
557 |    - REDIS_PORT=36379
558 |    - NEBULA_HOST=127.0.0.1
559 |    - NEBULA_PORT=9003
560 |    - SOURCES=kafka
561 |    #bro
562 |    - DRIVER_INTERFACE=eth0
563 |    - DRIVER_PORT=80,8080,9003
564 |    - BRO_PORT=47000
565 |    #kafka
566 |    - BOOTSTRAP_SERVERS=127.0.0.1:9092
567 |    - TOPICS=nebula_nginx_lua
568 |    - GROUP_ID=nebula
569 | ```
570 | 
571 | The key settings are
572 | 1. BOOTSTRAP_SERVERS= the IP and port of kafka
573 | 2. SOURCES=kafka, which makes sniffer obtain traffic through the kafka driver
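
A quick sanity check of these settings before starting the container can catch typos early. The following is a minimal Python 3 sketch that works on the `KEY=VALUE` entries of the `environment` list above. The function names are hypothetical, and treating `SOURCES` and `BOOTSTRAP_SERVERS` as comma-separated lists is an assumption; the document only shows single values.

```python
def parse_environment(entries):
    """Turn docker-compose style "KEY=VALUE" entries into a dict."""
    env = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def check_kafka_settings(env):
    """Return a list of problems with the kafka-related settings (empty if none)."""
    problems = []
    sources = [s.strip() for s in env.get("SOURCES", "").split(",") if s.strip()]
    if "kafka" not in sources:
        problems.append("SOURCES does not include kafka")
    servers = [s.strip() for s in env.get("BOOTSTRAP_SERVERS", "").split(",") if s.strip()]
    if not servers:
        problems.append("BOOTSTRAP_SERVERS is empty")
    for server in servers:
        host, sep, port = server.rpartition(":")
        if not (sep and host and port.isdigit()):
            problems.append("malformed broker address: %s" % server)
    if not env.get("TOPICS"):
        problems.append("TOPICS is empty")
    return problems


env = parse_environment([
    "SOURCES=kafka",
    "BOOTSTRAP_SERVERS=127.0.0.1:9092",
    "TOPICS=nebula_nginx_lua",
    "GROUP_ID=nebula",
])
problems = check_kafka_settings(env)
```

With the values from the compose file above, `problems` is an empty list.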
574 | 
575 | 
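576 | ### Troubleshooting
577 | Most errors occur in nginx and the openresty lua code, so error logging needs to be enabled in nginx, as described in the steps above. Any error raised while the lua script handles incoming traffic shows up in error.log; if there are no errors there, the nginx side is most likely fine.
578 | 
579 | If sniffer still cannot obtain any traffic, check the kafka and sniffer logs to resolve the problem.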
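
One cheap first check on the kafka side is whether the broker address is reachable at all from the nginx host and from the sniffer host. The following is a minimal Python 3 sketch using only the standard library; the function name `broker_reachable` is hypothetical. A successful TCP connection only shows that the port is open, not that kafka or the topic is healthy.

```python
import socket


def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For the configuration shown earlier, the call would be `broker_reachable("127.0.0.1", 9092)`; run it with the address from `broker_list` in kafka.lua as well, since the two must point at the same broker.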
580 | 


--------------------------------------------------------------------------------
/chapter5/section3.md:
--------------------------------------------------------------------------------
  1 | # 5.3. Sniffer testing and debugging
  2 | 
  3 | 
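  4 | * bro mode
  5 |     * Successful startup log
  6 |     * Troubleshooting steps for bro
  7 | * kafka mode
  8 |     * Successful startup log
  9 |     * Troubleshooting steps for kafka
 10 | * Configuration verification program
 11 |     * Installation
 12 |     * What it can check
 13 |     * Verifying the results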
 14 | 
 15 | ### Bro mode
 16 | 
 17 | #### Successful startup log
 18 | The expected startup output in bro mode
 19 | ```
 20 | >docker-compose up
 21 | /usr/lib/python2.7/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.22) or chardet (2.2.1) doesn't match a supported version!
 22 | RequestsDependencyWarning)
 23 | Creating nebula_sniffer_9003 ... done
 24 | Attaching to nebula_sniffer_9003
 25 | nebula_sniffer_9003 | No handlers could be found for logger "config"
 26 | nebula_sniffer_9003 | global conf path using the local path /home/nebula_sniffer/conf
 27 | nebula_sniffer_9003 | sniffer conf path using the local path /home/nebula_sniffer/conf
 28 | nebula_sniffer_9003 | creat logger nebula.produce
 29 | nebula_sniffer_9003 | creat logger nebula.sniffer
 30 | nebula_sniffer_9003 | logging debug level is 'debug'
 31 | nebula_sniffer_9003 | 2019-06-19 17:05:16: starting sniffer
 32 | nebula_sniffer_9003 | 2019-06-19 17:05:16: start to init config
 33 | nebula_sniffer_9003 | 2019-06-19 17:05:17: successfully loaded the file config from /home/nebula_sniffer/conf/sniffer.conf
 34 | nebula_sniffer_9003 | WebLoader: web_config_loader, sniffer.web_config.config_url:http://127.0.0.1:9001/platform/config, params:{'auth': '1ac1a08630d68a2fdd0b719d5c07f915'}
 35 | nebula_sniffer_9003 | 2019-06-19 17:05:17: successfully loaded the web config from http://127.0.0.1:9001/platform/config
 36 | nebula_sniffer_9003 | 2019-06-19 17:05:17: successfully loaded config
 37 | nebula_sniffer_9003 | 2019-06-19 17:05:17: start to init sentry
 38 | nebula_sniffer_9003 | 2019-06-19 17:05:18: start to init Produce
 39 | nebula_sniffer_9003 | 2019-06-19 17:05:18: start to init redis
 40 | nebula_sniffer_9003 | 2019-06-19 17:05:18: successfully init redis[host=127.0.0.1,port=36379,password=]
 41 | nebula_sniffer_9003 | 2019-06-19 17:05:18: start to init metrics
 42 | nebula_sniffer_9003 | 2019-06-19 17:05:18: successfully initializing metrics with config {'redis': {'host': '127.0.0.1', 'type': 'redis', 'port': 36379}, 'server': 'redis'}
 43 | nebula_sniffer_9003 | 2019-06-19 17:05:18: successfully init auto parsers, event from: http://127.0.0.1:9001/platform/event_models, parser from:
 44 | nebula_sniffer_9003 | 2019-06-19 17:05:18: start to processing
 45 | nebula_sniffer_9003 | creat logger sniffer.httpmsg
 46 | nebula_sniffer_9003 | creat logger sniffer.parser.defaultparser
 47 | nebula_sniffer_9003 | creat logger sniffer.driver.bro.1
 48 | nebula_sniffer_9003 | *** failed to set config parameter work-stealing.moderate-sleep-duration-us: invalid name
 49 | nebula_sniffer_9003 | *** failed to set config parameter work-stealing.relaxed-sleep-duration-us: invalid name
 50 | nebula_sniffer_9003 | listening on eth0
 51 | nebula_sniffer_9003 |
 52 | nebula_sniffer_9003 | bro server start on 127.0.0.1:47001
 53 | nebula_sniffer_9003 | receive update_staticresourcesuffix=gif,png,ico,css,js,csv,txt,jpeg,jpg,woff,ttf
 54 | nebula_sniffer_9003 | receive update_filteredhosts=
 55 | nebula_sniffer_9003 | receive update_filteredurls=
 56 | nebula_sniffer_9003 | receive update_filteredservers=
 57 | ```
 58 | #### bro的排查流程
 59 | 启动日志有一些 warning 没啥关系, 这些是 bro 引擎中的
 60 | 可以在启动提示中首先确认
 61 | 
 62 | 
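 63 | 1. Whether the logging level is debug. In debug mode, extra logs are written to the logs folder, so disk usage keeps growing. The log to watch is sniffer.parser.defaultparser: in debug mode, the details of the captured traffic are written to it. When traffic is heavy, remember to turn debug mode off, because the logs become very large.
 64 | 2. Whether the loggers nebula.produce, nebula.sniffer, sniffer.httpmsg, sniffer.parser.defaultparser and sniffer.driver.bro.1 were created. Their log files are in /home/nebula_sniffer/logs/, where you can inspect the details.
 65 | 3. The redis connection. redis carries the traffic captured by sniffer to nebula, so this connection is critical: if it fails, the system does not evaluate whether traffic is a threat. A successful connection is marked by the log line successfully init redis[host=127.0.0.1,port=36379,password=]
 66 | 4. The connection from sniffer to nebula. Many nebula settings are loaded into sniffer in real time: a change made in the nebula web UI refreshes the sniffer configuration promptly, so sniffer must be able to reach nebula. A successful connection is marked by the log line successfully loaded the web config from http://127.0.0.1:9001/platform/config
 67 | 5. Produce, a feature for freer, more advanced traffic parsing. If you are new to nebula there is no need to dig into it. Its startup log line is: nebula_sniffer_9003 | 2019-06-19 17:05:18: start to processing. It is also recorded in logs, where you can check it for errors.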
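
The checks above can be partly automated by scanning a saved copy of the startup output for the marker lines. The following is a minimal sketch, not part of sniffer itself: the marker strings are copied from the startup logs in this section, while the names `STARTUP_MARKERS` and `missing_startup_steps` are hypothetical.

```python
# Marker substrings copied from the successful startup logs in this section.
# The dictionary keys are labels chosen for this sketch only.
STARTUP_MARKERS = {
    "file config": "successfully loaded the file config",
    "web config": "successfully loaded the web config",
    "redis": "successfully init redis",
    "metrics": "successfully initializing metrics",
    "processing": "start to processing",
}


def missing_startup_steps(log_text):
    """Return the labels of startup steps whose marker line is absent."""
    return [name for name, marker in STARTUP_MARKERS.items()
            if marker not in log_text]
```

One way to use it is to save the container output to a file, pass the file's text to `missing_startup_steps`, and look into whichever steps it returns. An empty list only means the marker lines are present; it does not prove that traffic is actually being captured.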
 68 | 
 69 | kafka mode
 70 | 
 71 | #### Log of a successful startup
 72 | Startup log when sniffer connects to kafka successfully:
 73 | ```
 74 | [root@test-02 /home/wei/sniffer]# docker-compose up
 75 | /usr/lib/python2.7/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.22) or chardet (2.2.1) doesn't match a supported version!
 76 | RequestsDependencyWarning)
 77 | Starting nebula_sniffer_9003 ... done
 78 | Attaching to nebula_sniffer_9003
 79 | nebula_sniffer_9003 | No handlers could be found for logger "config"
 80 | nebula_sniffer_9003 | global conf path using the local path /home/nebula_sniffer/conf
 81 | nebula_sniffer_9003 | sniffer conf path using the local path /home/nebula_sniffer/conf
 82 | nebula_sniffer_9003 | creat logger nebula.produce
 83 | nebula_sniffer_9003 | creat logger nebula.sniffer
 84 | nebula_sniffer_9003 | logging debug level is 'debug'
 85 | nebula_sniffer_9003 | 2019-06-19 17:34:35: starting sniffer
 86 | nebula_sniffer_9003 | 2019-06-19 17:34:35: start to init config
 87 | nebula_sniffer_9003 | 2019-06-19 17:34:35: successfully loaded the file config from /home/nebula_sniffer/conf/sniffer.conf
 88 | nebula_sniffer_9003 | WebLoader: web_config_loader, sniffer.web_config.config_url:http://127.0.0.1:9001/platform/config, params:{'auth': '1ac1a08630d68a2fdd0b719d5c07f915'}
 89 | nebula_sniffer_9003 | 2019-06-19 17:34:35: successfully loaded the web config from http://127.0.0.1:9001/platform/config
 90 | nebula_sniffer_9003 | 2019-06-19 17:34:35: successfully loaded config
 91 | nebula_sniffer_9003 | 2019-06-19 17:34:35: start to init sentry
 92 | nebula_sniffer_9003 | 2019-06-19 17:34:35: start to init Produce
 93 | nebula_sniffer_9003 | 2019-06-19 17:34:35: start to init redis
 94 | nebula_sniffer_9003 | 2019-06-19 17:34:35: successfully init redis[host=127.0.0.1,port=36379,password=]
 95 | nebula_sniffer_9003 | 2019-06-19 17:34:35: start to init metrics
 96 | nebula_sniffer_9003 | 2019-06-19 17:34:35: successfully initializing metrics with config {'redis': {'host': '127.0.0.1', 'type': 'redis', 'port': 36379}, 'server': 'redis'}
 97 | nebula_sniffer_9003 | 2019-06-19 17:34:35: successfully init auto parsers, event from: http://127.0.0.1:9001/platform/event_models, parser from:
 98 | nebula_sniffer_9003 | 2019-06-19 17:34:35: start to processing
 99 | nebula_sniffer_9003 | creat logger sniffer.httpmsg
100 | nebula_sniffer_9003 | creat logger sniffer.parser.defaultparser
101 | nebula_sniffer_9003 | creat logger sniffer.driver.kafka
102 | ```
103 | #### Troubleshooting in kafka mode
104 | If sniffer cannot connect to kafka, try the following:
105 | 
106 | 
107 | 1. Test the kafka service yourself and check its configuration. Note that sniffer uses the topic nebula_nginx_lua and the group id nebula.
108 | 2. Change the sniffer configuration in nebula's conf directory:
109 | 
110 | vim nebula_sniffer/conf/sniffer.conf
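111 | Change the first line from
112 | `sources: [kafka]` to `sources: [default]`
113 | This switches to bro, the default traffic-capture engine.
114 | Restart the container for the change to take effect.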
115 | 
116 | 3. Ways to check whether the kafka connection works
117 | 
118 | First download kafka and install a Java environment.
119 | Download:
120 | wget http://apache.lauf-forum.at/kafka/2.2.1/kafka_2.11-2.2.1.tgz
121 | Extract:
122 | > tar -xzf kafka_2.11-2.2.1.tgz
123 | > cd kafka_2.11-2.2.1
124 | Set the kafka listening port and the zookeeper port in the configuration.
125 | The default kafka port is 9092.
126 | Consume messages from inside the kafka directory:
127 | > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic nebula_nginx_lua --from-beginning
128 | If messages arrive, the connection is working.
129 | 
130 | 4. When startup fails because kafka cannot be reached, the following log appears:
131 | 
132 | ```
133 | [root@test-02 /home/wei/sniffer]# docker-compose up
134 | /usr/lib/python2.7/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.22) or chardet (2.2.1) doesn't match a supported version!
135 | RequestsDependencyWarning)
136 | Creating nebula_sniffer ... done
137 | Attaching to nebula_sniffer
138 | nebula_sniffer | No handlers could be found for logger "config"
139 | nebula_sniffer | using the local path /home/nebula_sniffer/conf
140 | nebula_sniffer | using the local path /home/nebula_sniffer/conf
141 | nebula_sniffer | creat logger nebula.produce
142 | nebula_sniffer | creat logger nebula.sniffer
143 | nebula_sniffer | WebLoader: web_config_loader, sniffer.web_config.config_url:http://127.0.0.1:9001/platform/config, params:{'auth': '1ac1a08630d68a2fdd0b719d5c07f915'}
144 | nebula_sniffer | creat logger sniffer.httpmsg
145 | nebula_sniffer | creat logger sniffer.parser.defaultparser
146 | nebula_sniffer | creat logger sniffer.driver.kafka
147 | nebula_sniffer | Process Process-1:
148 | nebula_sniffer | Traceback (most recent call last):
149 | nebula_sniffer | File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
150 | nebula_sniffer | self.run()
151 | nebula_sniffer | File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
152 | nebula_sniffer | self._target(*self._args, **self._kwargs)
153 | nebula_sniffer | File "/home/nebula_sniffer/sniffer.py", line 131, in run_task
154 | nebula_sniffer | main.start()
155 | nebula_sniffer | File "/home/nebula_sniffer/nebula_sniffer/main.py", line 71, in start
156 | nebula_sniffer | self.driver.start()
157 | nebula_sniffer | File "/home/nebula_sniffer/nebula_sniffer/drivers/kafkadriver.py", line 138, in start
158 | nebula_sniffer | self.consumer = KafkaConsumer(self.topics,**self.config)
159 | nebula_sniffer | File "/usr/lib/python2.7/site-packages/kafka/consumer/group.py", line 353, in __init__
160 | nebula_sniffer | self._client = KafkaClient(metrics=self._metrics, **self.config)
161 | nebula_sniffer | File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 239, in __init__
162 | nebula_sniffer | self.config['api_version'] = self.check_version(timeout=check_timeout)
163 | nebula_sniffer | File "/usr/lib/python2.7/site-packages/kafka/client_async.py", line 865, in check_version
164 | nebula_sniffer | raise Errors.NoBrokersAvailable()
165 | nebula_sniffer | NoBrokersAvailable: NoBrokersAvailable
166 | nebula_sniffer | creat logger main.client-any-1
167 | nebula_sniffer | terminating
168 | nebula_sniffer exited with code 0
169 | ```
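
When the log ends in `NoBrokersAvailable` as above, a quick first check is whether the broker port accepts TCP connections at all. The sketch below uses only the Python standard library; the function name `broker_reachable` is hypothetical, and the host and port to test are whatever your sniffer configuration points at (9092 is only the kafka default mentioned earlier).

```python
import socket


def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unresolved host).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `broker_reachable("localhost", 9092)` returning False suggests a network, firewall or listener problem rather than a topic or group id problem. A True result only shows that the port is open; the console consumer check described above is still needed to confirm that messages flow.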
170 | 
171 | 
172 | Configuration check program
173 | 
174 | #### Installation
175 | 
176 | 1. Place sniffer_nebula_test.py and requirements.txt in the sniffer folder.
177 | 2. Install the dependencies:
178 | ```
179 | pip3 install -r requirements.txt
180 | 
181 | # Contents of requirements.txt:
182 | certifi==2019.6.16
183 | chardet==3.0.4
184 | idna==2.8
185 | kafka==1.3.5
186 | numpy==1.16.4
187 | pandas==0.24.2
188 | python-dateutil==2.8.0
189 | pytz==2019.1
190 | PyYAML==5.1.1
191 | redis==3.2.1
192 | requests==2.22.0
193 | six==1.12.0
194 | urllib3==1.25.3
195 | ```
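196 | 3. Run python3 sniffer_nebula_test.py
197 | 4. Enter the URL, or the IP address and port, being monitored. This is the address whose traffic openresty nginx passes to the lua script, which then forwards it to kafka.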
198 | 
199 | #### What the program can check
200 | 
201 | 
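202 | 1. The configuration file.
203 | 2. Whether debug mode is on. Debug mode generates more logs and takes up disk space, so use it with care.
204 | 3. The traffic capture source (sources mode).
205 | 4. Whether nebula_web is reachable. sniffer depends on it to pull nebula's configuration and for operations after login.
206 | 5. Whether redis can be connected to, written to and read from.
207 | 6. If the sources mode is kafka, whether kafka can be connected to, written to and read from.
208 | 7. If the sources mode is kafka, whether openresty nginx is working and sending data to kafka through the lua script.
209 | 8. If the sources mode is bro (default), the program currently cannot check it easily; analyze the startup log instead.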
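
Several of the checks above depend on the sources mode. As an illustration only, and not the code of sniffer_nebula_test.py, the mode could be read from the text of sniffer.conf like this, assuming the line has the form `sources: [kafka]` shown earlier; the function name `read_sources` is hypothetical.

```python
import re


def read_sources(conf_text):
    """Return the entries of the first `sources: [...]` line, or [] if absent."""
    match = re.search(r"^\s*sources:\s*\[([^\]]*)\]", conf_text, re.MULTILINE)
    if not match:
        return []
    # Split the bracket contents on commas and drop empty entries.
    return [item.strip() for item in match.group(1).split(",") if item.strip()]
```

With the result in hand, a script can decide which checks apply: items 6 and 7 when `"kafka"` is in the list, and the manual log analysis of item 8 when it is `"default"`.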
210 | 
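211 | #### Verifying the result
212 | A result like the one below is normal. If something is wrong, the program prints a corresponding hint; change the configuration and run the check again.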
213 | ![ZexLNR.png](https://s2.ax1x.com/2019/06/26/ZexLNR.png)
214 | 
215 | 
216 | 
217 | 


--------------------------------------------------------------------------------
/styles/website.css:
--------------------------------------------------------------------------------
1 | /* CSS for website */
2 | .video-js {
3 |     width:100%;
4 |     height: 100%;
5 | }


--------------------------------------------------------------------------------