├── log.png
├── LICENSE
└── README.md


/log.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xtaci/log_analysis/HEAD/log.png


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2016 xtaci
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Practical Log Analysis
 2 | 
 3 | ## scenario
 4 | ![scenario](log.png)
 5 | 
 6 | Tested on the versions below:
 7 | * apache-hive-2.1.0-bin.tar.gz
 8 | * elasticsearch-5.0.1.tar.gz
 9 | * kafka_2.11-0.10.1.0.tgz
10 | * kibana-5.0.1-linux-x86_64.tar.gz
11 | * logstash-5.0.0.tar.gz
12 | * mysql-connector-java-5.1.40.tar.gz
13 | * spark-1.6.3-bin-hadoop2-without-hive.tgz
14 | * hadoop-2.6.5.tar.gz
15 | 
16 | ## hadoop
17 | * http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html -- single-node HDFS deployment
18 | * https://github.com/chrislusf/gleam -- Fast, efficient, and scalable distributed map/reduce system written in Go and LuaJIT
19 | 
20 | ## kafka
21 | * https://kafka.apache.org/documentation -- official Kafka documentation
22 | * https://www.elastic.co/blog/just-enough-kafka-for-the-elastic-stack-part1 -- best practices for Elasticsearch with Kafka
23 | * https://www.elastic.co/blog/just-enough-kafka-for-the-elastic-stack-part2
24 | * https://github.com/travisjeffery/jocko -- a Kafka reimplementation in Go
25 | * https://github.com/oldratlee/translations/blob/master/log-what-every-software-engineer-should-know-about-real-time-datas-unifying/README.md -- a classic
26 | * https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations -- Kafka papers and presentations
27 | * https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
28 | * https://www.youtube.com/watch?v=77huw-31oZg
29 | * https://www.youtube.com/watch?v=k_Y5ieFHGbs
30 | * https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
31 | 
32 | ## logstash
33 | * https://www.elastic.co/guide/en/logstash/current/index.html -- Centralize, Transform & Stash Your Data
34 | * https://github.com/influxdata/telegraf -- The plugin-driven server agent for collecting & reporting metrics.
35 | * https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html -- Logstash deployment
36 | 
37 | ## hive
38 | * https://cwiki.apache.org/confluence/display/Hive/GettingStarted -- Hive configuration
39 | * https://cwiki.apache.org/confluence/display/Hive/LanguageManual -- Hive SQL language manual
40 | * https://github.com/xtaci/json2hive -- build a Hive schema from JSON
41 | 
42 | ## metastore
43 | * https://hub.docker.com/_/mysql/ -- a MySQL image that can back the metastore
44 | * https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf -- metastore structure
45 | * https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin -- metastore configuration
46 | * https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool -- schema creation
47 | 
48 | ## spark
49 | * https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started -- Hive and Spark integration
50 | * http://spark.apache.org/docs/latest/spark-standalone.html -- Spark configuration
51 | * http://mangocool.com/1467770109867.html -- version issues with Hive on Spark
52 | * http://www.csdn.net/article/2015-04-24/2824545 -- Li Rui (Intel): an analysis of Hive on Spark
53 | 
54 | ## elasticsearch
55 | * https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html -- Elasticsearch and Hive integration
56 | * https://www.elastic.co/blog/found-sizing-elasticsearch -- Elasticsearch index planning and capacity planning
57 | * https://www.elastic.co/blog/performance-indexing-2-0 -- Elasticsearch indexing
58 | * https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up -- Elasticsearch internals
59 | * https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html -- index templates
60 | * https://www.elastic.co/blog/found-elasticsearch-in-production -- Elasticsearch in production
61 | * https://www.smashingmagazine.com/2012/05/stop-redesigning-start-tuning-your-site/
62 | * https://www.elastic.co/blog/customizing-your-document-routing -- Elasticsearch read optimization
63 | * https://www.elastic.co/videos/big-data-search-and-analytics
64 | * https://www.elastic.co/blog/disk-based-field-data-a-k-a-doc-values
65 | * https://aphyr.com/posts/288-the-network-is-reliable
66 | * https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
67 | * https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html -- rebuilding mappings
68 | * http://www.cnblogs.com/Creator/p/3722408.html -- rebuilding mappings
69 | * http://wzktravel.github.io/2016/05/11/elasticsearch-reindex/ -- rebuilding mappings
70 | 
71 | ## s3
72 | * https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html -- Elasticsearch data backup
73 | * https://www.elastic.co/guide/en/elasticsearch/plugins/5.0/repository-s3.html -- plugin for backing up Elasticsearch to S3
74 | * https://github.com/minio/minio -- S3-compatible storage
75 | 
76 | ## mongodb
77 | * https://github.com/mongodb/mongo-hadoop
78 | * https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage -- Hive and MongoDB integration
79 | * https://docs.mongodb.com/manual/tutorial/deploy-replica-set/ -- deploying a MongoDB replica set
80 | * https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-1-introduction-setup -- MongoDB integration with Spark and Hive
81 | * https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-2-hive-example
82 | * https://www.mongodb.com/blog/post/using-mongodb-hadoop-spark-part-3-spark-example-key-takeaways
83 | 
84 | ## application library
85 | * https://github.com/gliderlabs/logspout -- collects the standard output of Docker containers
86 | * https://github.com/Sirupsen/logrus -- structured log output
87 | 


--------------------------------------------------------------------------------