├── columns
│   ├── starrocks
│   │   └── StarRocks技术内幕.md
│   ├── flink
│   │   ├── Flink实战系列.md
│   │   ├── Flink 相关论文.md
│   │   ├── Flink开源项目汇总.md
│   │   ├── Flink零基础入门.md
│   │   ├── Flink进阶教程.md
│   │   ├── Flink架构、源码分析专栏.md
│   │   └── Apache Flink 漫谈系列.md
│   ├── kudu
│   │   ├── Kudu原理论文.md
│   │   └── 网易云Kudu技术文章.md
│   ├── spark
│   │   └── Apache Spark的设计与实现.md
│   ├── opensource
│   │   └── 数仓相关开源项目汇总.md
│   ├── doris
│   │   ├── Doris最佳实践.md
│   │   └── Doris全面解析.md
│   ├── presto
│   │   ├── Presto资料汇总、会议资讯专栏.md
│   │   ├── Presto最佳实践、调优、踩坑专栏.md
│   │   └── Presto架构、源码分析专栏.md
│   └── hive
│       └── hive教程.md
└── README.md

/columns/starrocks/StarRocks技术内幕.md:
--------------------------------------------------------------------------------
# StarRocks Internals

- [Design and Implementation of Multi-Table Materialized Views](https://blog.csdn.net/StarRocks/article/details/127863764) 2022-11
--------------------------------------------------------------------------------

/columns/flink/Flink实战系列.md:
--------------------------------------------------------------------------------
# Flink in Practice Series

- [Building a Flink SQL Computing Platform from Scratch - 1: Platform Setup Overview](https://www.cnblogs.com/pyx0/p/12348114.html)
- [Building a Flink SQL Computing Platform from Scratch - 2: Implementing Job Submission](https://www.cnblogs.com/pyx0/p/12387509.html)
- [Building a Flink SQL Computing Platform from Scratch - 3: Implementing Validation and Debugging](https://www.cnblogs.com/pyx0/p/12441367.html)

- [Building Streaming ETL on Flink at NetEase Games](http://www.whitewood.me/2020/12/20/%E7%BD%91%E6%98%93%E6%B8%B8%E6%88%8F%E5%9F%BA%E4%BA%8E-Flink-%E7%9A%84%E6%B5%81%E5%BC%8F-ETL-%E5%BB%BA%E8%AE%BE/) 2020-12
--------------------------------------------------------------------------------

/columns/flink/Flink 相关论文.md:
--------------------------------------------------------------------------------
# Flink-Related Papers

- [Distributed Snapshots: Determining Global States of Distributed Systems](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/Determining-Global-States-of-a-Distributed-System.pdf?ranMID=24542&ranEAID=J84DHJLQkR4&ranSiteID=J84DHJLQkR4-mVoVymFnAblBx3zwyf98Pw&epi=J84DHJLQkR4-mVoVymFnAblBx3zwyf98Pw&irgwc=1&OCID=AID2000142_aff_7593_1243925&tduid=%28ir__1hs2uuow6wkfq3oxkk0sohzzwm2xpc33lxd0o6g200%29%287593%29%281243925%29%28J84DHJLQkR4-mVoVymFnAblBx3zwyf98Pw%29%28%29&irclickid=_1hs2uuow6wkfq3oxkk0sohzzwm2xpc33lxd0o6g200)
--------------------------------------------------------------------------------

/columns/kudu/Kudu原理论文.md:
--------------------------------------------------------------------------------
# Kudu Internals

- [Apache Kudu Read & Write Paths](https://blog.cloudera.com/apache-kudu-read-write-paths/) 2017-04
- [Kudu Storage Internals](https://github.com/collabH/repository/blob/master/bigdata/olap/kudu/Kudu%E5%8E%9F%E7%90%86%E5%88%86%E6%9E%90.md)

# Kudu-Related Papers

- [LSM Tree](https://www.cs.umb.edu/~poneil/lsmtree.pdf)
- [Kudu Paper Walkthrough: Fast Analytics on Fast Data (Part 1)](https://zhuanlan.zhihu.com/p/137238298) 2020-04
- [Kudu Paper Walkthrough: Fast Analytics on Fast Data (Part 2)](https://zhuanlan.zhihu.com/p/137243163) 2020-04
--------------------------------------------------------------------------------

/columns/kudu/网易云Kudu技术文章.md:
--------------------------------------------------------------------------------
# NetEase Cloud Kudu Technical Articles

- [Big Data / Data Warehouse: Technology Selection Log](https://sq.sf.163.com/blog/article/174995941069086720) 2018-07
- [Big Data / Data Warehouse: Defects in the Kudu Java Client Driver](https://sq.sf.163.com/blog/article/169595475122905088) 2018-06
- [Big Data / Data Warehouse: Analysis of a Kudu Performance Test Report](https://sq.sf.163.com/blog/article/174995336187535360) 2018-07

- [A Brief Analysis and Comparison of the Distributed Storage Systems Kudu and HBase](https://sq.163yun.com/blog/article/198870236065431552) 2018-11

- [Kudu vs. Parquet: Runtime Filter in Practice](https://sq.sf.163.com/blog/article/174993565549518848) 2018-07
- [Kudu vs. Parquet: Comparative Analysis of TPC-H Query 2](https://sq.sf.163.com/blog/article/175000124925075456) 2018-07
--------------------------------------------------------------------------------

/columns/flink/Flink开源项目汇总.md:
--------------------------------------------------------------------------------
# Flink Open-Source Project Roundup

- [flink-sql-gateway](https://github.com/ververica/flink-sql-gateway#readme)

- [flink-jdbc-driver](https://github.com/ververica/flink-jdbc-driver)

- [flinkStreamSQL](https://github.com/DTStack/flinkStreamSQL)

- [flinkx](https://github.com/DTStack/flinkx)

- [waterdrop](https://github.com/InterestingLab/waterdrop)

- [streamx](https://github.com/streamxhub/streamx)

- [flink-streaming-platform-web](https://github.com/zhp8341/flink-streaming-platform-web)

- [dlink](https://github.com/DataLinkDC/dlink)

- [plink](https://github.com/hairless/plink)
--------------------------------------------------------------------------------

/columns/spark/Apache Spark的设计与实现.md:
--------------------------------------------------------------------------------
# Design and Implementation of Apache Spark

> Spark Version: 1.0.2 Doc Version: 1.0.2.0

- [Introduction](https://spark-internals.books.yourtion.com/index.html)
- [Overview](https://spark-internals.books.yourtion.com/markdown/1-Overview.html)
- [Job Logical Plan](https://spark-internals.books.yourtion.com/markdown/2-JobLogicalPlan.html)
- [Job Physical Plan](https://spark-internals.books.yourtion.com/markdown/3-JobPhysicalPlan.html)
- [Shuffle Process](https://spark-internals.books.yourtion.com/markdown/4-shuffleDetails.html)
- [Architecture](https://spark-internals.books.yourtion.com/markdown/5-Architecture.html)
- [Cache and Checkpoint](https://spark-internals.books.yourtion.com/markdown/6-CacheAndCheckpoint.html)
- [Broadcast](https://spark-internals.books.yourtion.com/markdown/7-Broadcast.html)

- [SparkInternals - GitHub](https://github.com/JerryLead/SparkInternals)
--------------------------------------------------------------------------------

/columns/flink/Flink零基础入门.md:
--------------------------------------------------------------------------------
# Flink for Absolute Beginners

Date: 2019
Source: Ververica Chinese community

- [Apache Flink for Beginners (1 & 2): Basic Concepts](https://ververica.cn/developers/flink-basic-tutorial-1-basic-concept/)
- [Apache Flink for Beginners (3): Setting Up the Development Environment; Configuring, Deploying, and Running Applications](https://ververica.cn/developers/flink-basic-tutorial-1-environmental-construction/)
- [Apache Flink for Beginners (4): DataStream API Programming](https://ververica.cn/developers/apache-flink-basic-zero-iii-datastream-api-programming/)
- [Apache Flink for Beginners (5): Client Operations](https://ververica.cn/developers/apache-flink-zero-basic-introduction-iv-client-operation/)
- [Apache Flink for Beginners (6): Flink Time & Window Explained](https://ververica.cn/developers/time-window/)
- [Apache Flink for Beginners (7): State Management and Fault Tolerance](https://ververica.cn/developers/state-management/)
- [Apache Flink for Beginners (8): Table API Programming](https://ververica.cn/developers/table-api-programming/)
- [Apache Flink for Beginners (9): Flink SQL Programming in Practice](https://ververica.cn/developers/flink-sql-programming-practice/)
--------------------------------------------------------------------------------

/columns/opensource/数仓相关开源项目汇总.md:
--------------------------------------------------------------------------------
# Data Warehouse Open-Source Project Roundup

## Metadata and Data Governance
- [atlas](https://github.com/apache/atlas)
- [datahub](https://github.com/linkedin/datahub)

## Data Integration
- [DataX](https://github.com/alibaba/DataX)
- [datax-web](https://github.com/WeiYe-Jing/datax-web)

## Data Computation
- [streamx](https://github.com/streamxhub/streamx)
- [plink](https://github.com/hairless/plink) Platform for Flink
- [FlinkSQL](https://github.com/ambition119/FlinkSQL)
- [flinkStreamSQL](https://github.com/DTStack/flinkStreamSQL)
- [waterdrop](https://github.com/InterestingLab/waterdrop)

## Scheduling
- [dolphinscheduler](https://github.com/apache/dolphinscheduler)

## Development Platforms and Others
- [davinci](https://github.com/edp963/davinci)
- [DataSphereStudio](https://github.com/WeBankFinTech/DataSphereStudio) (WeBank)
- [wormhole](https://github.com/edp963/wormhole) (CreditEase)
- [big-whale](https://github.com/MeetYouDevs/big-whale)
- [lark](https://github.com/wxgzgl/lark)
--------------------------------------------------------------------------------

/columns/flink/Flink进阶教程.md:
--------------------------------------------------------------------------------
# Flink Advanced Tutorials

Date: 2019
Source: Ververica Chinese community

- [Apache Flink Advanced Tutorial (1): Dissecting the Core Runtime Mechanisms](https://ververica.cn/developers/advanced-tutorial-1-analysis-of-the-core-mechanism-of-runtime/)
- [Apache Flink Advanced Tutorial (2): Time in Depth](https://ververica.cn/developers/advanced-tutorial-2-time-depth-analysis/)
- [Apache Flink Advanced Tutorial (3): Checkpoints in Practice](https://ververica.cn/developers/advanced-tutorial-2-checkpoint-application-practice/)
- [Apache Flink Advanced Tutorial (4): Flink on YARN/K8s, Internals and Practice](https://ververica.cn/developers/advanced-tutorial-2-flink-on-yarn-k8s/)
- [Apache Flink Advanced Tutorial (5): Data Types and Serialization](https://ververica.cn/developers/advanced-tutorial-2-serialize/)
- [Apache Flink Advanced Tutorial (6): Flink Job Execution in Depth](https://ververica.cn/developers/advanced-tutorial-2-flink-job-execution-depth-analysis/)
- [Apache Flink Advanced Tutorial (7): Network Flow Control and Backpressure](https://ververica.cn/developers/advanced-tutorial-2-analysis-of-network-flow-control-and-back-pressure/)
- [Apache Flink Advanced Tutorial (8): Metrics Explained, Principles and Practice](https://ververica.cn/developers/advanced-tutorial-2-principles-and-practice-of-metrics/)
--------------------------------------------------------------------------------

/columns/doris/Doris最佳实践.md:
--------------------------------------------------------------------------------
# Doris Best Practices

## Tuning
- [Compaction Tuning (1)](https://mp.weixin.qq.com/s/Kv71HomwNioHQDz8NUec1A) 2021-06
- [Compaction Tuning (2)](https://mp.weixin.qq.com/s/mJrxpvYIoE9rgP9Hvo1Dnw) 2021-06
- [Compaction Tuning (3)](https://mp.weixin.qq.com/s/cZmXEsNPeRMLHp379kc2aA) 2021-06
- [Apache Doris Join Implementation and Tuning in Practice](https://mp.weixin.qq.com/s/pukjERSOW-D-BM4z1G9JlA) 2021-09

## Business Implementations
- [Exact Deduplication and User Behavior Analysis with Bitmaps in Apache Doris](https://mp.weixin.qq.com/s/e0IrXgkinpeEDKi0etfGKA) 2020-01
- [Applying Doris to User-Profile Audience Segmentation](https://mp.weixin.qq.com/s/HGyIgqCIIXfeJtNdKbj-fQ) 2020-10

## Integration with Other Components
- [Extending Doris's Data Lake Capabilities with Iceberg](https://mp.weixin.qq.com/s/Vgo2kWED8oxg45x6zumEYQ) 2021-07
- [Flink Consuming Kafka and Writing to Apache Doris in Real Time (KFD)](https://mp.weixin.qq.com/s/nUeHwFBQs50EvPukqnrinQ) 2021-09
- [Spark Doris Connector Best Practices](https://mp.weixin.qq.com/s/c8zE7ymv6jC1WTlV44dldQ) 2020-04
- [Doris FE High Availability with ProxySQL](https://mp.weixin.qq.com/s/XHgtIzekxkiGCjqcRbqndw) 2020-08

## Other
- [An In-Depth Analysis of Apache Doris and ClickHouse](https://mp.weixin.qq.com/s/fyVSRB3wxmsZUx4kY1eQRQ) 2021-10
--------------------------------------------------------------------------------

/columns/doris/Doris全面解析.md:
--------------------------------------------------------------------------------
# Doris In Depth

## Principles
- [Apache Doris: Architecture and Practice of an Open-Source MPP Database](https://www.jianshu.com/p/d3742af8ecce)

## Storage
- [Storage Layer Design 1: Storage Structure](https://mp.weixin.qq.com/s/aJ3FwDI6KprYYUwXzhl_-A) 2020-07
- [Storage Layer Design 2: Write and Delete Paths](https://mp.weixin.qq.com/s/xl4ePcsSVPPNQDGBw-KoKA) 2020-07
- [Storage Layer Design 3: Read and Compaction Paths](https://mp.weixin.qq.com/s/U9w3VxCKhTk_3Sglo9J-aA) 2020-08
- [Doris Compaction Mechanism Explained](https://mp.weixin.qq.com/s/5D1gAOEiFWM7N6KPwqHHdw) 2021-02
- [Design and Implementation of Parquet File Reading in Apache Doris](https://mp.weixin.qq.com/s/5D6G_kvl9TzYCMIgynhERA) 2019-08
- [Doris Core Features: Data Models and Materialized Views](https://mp.weixin.qq.com/s/eRUg1du8AQxLvqYjJ621fA) 2020-07

## Computation
- [How Apache Doris Executes Queries](https://blog.bcmeng.com/post/apache-doris-query.html) 2020-03
- [Doris SQL Internals](https://mp.weixin.qq.com/s/v1jI1MxEHPT5czCWd0kRxw) 2021-01
- [Doris Stream Load Internals](https://mp.weixin.qq.com/s/NUSHwAUsFskSXG5R0mw8kg) 2021-06
- [Apache Doris Indexing Mechanisms](https://mp.weixin.qq.com/s/KdCdXb9Z3MdUZ5S0RV726Q) 2021-09
- [Design and Implementation of the Spark Doris Sink](https://mp.weixin.qq.com/s/uoPLfFBv9Vt2gg9HEriR0Q) 2019-08

## Other
- [Design and Implementation of a Hive-Table-Based Global Dictionary in Doris](https://mp.weixin.qq.com/s/YlZnlMTTI8xhULmk1y-N6w) 2020-08
--------------------------------------------------------------------------------

/columns/presto/Presto资料汇总、会议资讯专栏.md:
--------------------------------------------------------------------------------
# Presto Resources, Conferences, and News

## 1. Official Sites and Tech Blogs
### 1.1 Official Sites
- [PrestoDB Official Site](https://prestodb.io/)
- [Trino Official Site](https://trino.io/) (formerly PrestoSQL)
- [PrestoDB Blog](https://prestodb.io/blog/index.html)
- [Trino Blog](https://trino.io/blog/)
- [PrestoDB GitHub](https://github.com/prestodb/presto)
- [Trino GitHub](https://github.com/trinodb/trino)

### 1.2 Discussion Channels (Groups, WeChat Accounts, etc.)
- [Google Presto Group](https://groups.google.com/g/presto-users)
- [PrestoDB Slack](https://prestodb.slack.com)
- [Trino Slack](https://trinodb.slack.com)
- WeChat official account: Presto News
- WeChat official account: FFCompute

### 1.3 Tech Blogs
- [Presto Zhihu Column](https://www.zhihu.com/column/presto-cn)
- [Ruofei (若飞) - Tech Blog](http://armsword.com/archives/)

## 2. Books
- [Presto: The Definitive Guide](https://trino.io/blog/2020/04/11/the-definitive-guide.html)
- [Presto Internals (《Presto技术内幕》)](https://book.douban.com/subject/26855863/) by the JD.com Presto team

## 3. Conferences and News
### 3.1 Conferences
- [Presto Meetup Oct 2019](https://zhuanlan.zhihu.com/p/88350254) 2019-10
- [PrestoCon 2020](https://prestocon2020.sched.com/)
- [PrestoCon 2021](https://prestocon2021.sched.com/)
- [PrestoCon 2022](https://prestocon2022.sched.com/)

### 3.2 News
- [Shocking News: The Team Behind Facebook's Open-Source Big Data Engine Presto Is Splitting](https://zhuanlan.zhihu.com/p/55628236) 2019-01
- [After Parting Ways with Facebook, PrestoSQL Is Once Again Forced to Rename over Trademark Infringement](https://www.infoq.cn/article/WmH0WXhqsWqpHDm6PpjC) 2021-01
--------------------------------------------------------------------------------

/columns/flink/Flink架构、源码分析专栏.md:
--------------------------------------------------------------------------------
# Flink Architecture and Source Code Analysis

## Stream Processing Fundamentals
- [Streaming 101: The world beyond batch](https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/)
- [Streaming 102: The world beyond batch](https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/)

## DataSet, DataStream

## Table, SQL

## Time, Watermark
- [A Brief Look at Flink's Watermark Mechanism](http://www.whitewood.me/2018/06/01/Flink-Watermark-%E6%9C%BA%E5%88%B6%E6%B5%85%E6%9E%90/) 2018-06

## State
- [Flink State Best Practices](https://ververica.cn/developers/flink-state-best-practices/) 2020-04

## Checkpoint, Savepoint
- Keyword: unaligned barriers
- [Distributed Snapshot Algorithms: The Chandy-Lamport Algorithm](https://zhuanlan.zhihu.com/p/53482103) 2020-11
- [Flink Checkpoints: How They Work and Common Causes of Failure](https://tech.youzan.com/flink_checkpoint_mechanism/) 2019-12
- [How Flink's Lightweight Asynchronous Barrier Snapshots (ABS) Work](http://www.whitewood.me/2018/05/13/Flink-%E8%BD%BB%E9%87%8F%E7%BA%A7%E5%BC%82%E6%AD%A5%E5%BF%AB%E7%85%A7-ABS-%E5%AE%9E%E7%8E%B0%E5%8E%9F%E7%90%86/) 2018-05
- [Differences Between Flink Checkpoints and Savepoints](http://www.whitewood.me/2018/09/06/Flink-Checkpoint-Savepoint-%E5%B7%AE%E5%BC%82/) 2018-09

## Operators
### Windows

### Joining

### ProcessFunction

## Connector
- [On the Refactoring of Flink's Source Interface](http://www.whitewood.me/2020/02/11/%E6%BC%AB%E8%B0%88-Flink-Source-%E6%8E%A5%E5%8F%A3%E9%87%8D%E6%9E%84/) 2020-02
- [Flink JDBC Connector: Best Practices for Integrating Flink with Databases](https://developer.aliyun.com/article/776069)

## Flink on YARN
- [Flink on YARN (Part 1): The Basic Architecture and Startup Process in One Diagram](https://developer.aliyun.com/article/719262)
- [Flink on YARN (Part 2): Common Problems and Troubleshooting](https://developer.aliyun.com/article/719703)
--------------------------------------------------------------------------------

/columns/hive/hive教程.md:
--------------------------------------------------------------------------------
# Hive Tutorial

## The Road to Learning Hive (2018)
- [The Road to Learning Hive (1): Getting Started with Hive](https://www.cnblogs.com/qingyunzong/p/8707885.html)
- [The Road to Learning Hive (2): Installing Hive](https://www.cnblogs.com/qingyunzong/p/8708057.html)
- [The Road to Learning Hive (3): Mapping Hive Metadata to MySQL Tables](https://www.cnblogs.com/qingyunzong/p/8710356.html)
- [The Road to Learning Hive (4): Three Ways to Connect to Hive](https://www.cnblogs.com/qingyunzong/p/8715925.html)
- [The Road to Learning Hive (5): Configuring DbVisualizer to Connect to Hive](https://www.cnblogs.com/qingyunzong/p/8715250.html)
- [The Road to Learning Hive (6): Hive SQL Data Types and Storage Formats](https://www.cnblogs.com/qingyunzong/p/8733924.html)
- [The Road to Learning Hive (7): Hive DDL Operations](https://www.cnblogs.com/qingyunzong/p/8723271.html)
- [The Road to Learning Hive (8): Garbled Chinese Characters in Hive](https://www.cnblogs.com/qingyunzong/p/8724155.html)
- [The Road to Learning Hive (9): Hive Built-in Functions](https://www.cnblogs.com/qingyunzong/p/8744593.html)
- [The Road to Learning Hive (10): Advanced Hive Operations](https://www.cnblogs.com/qingyunzong/p/8746159.html)
- [The Road to Learning Hive (11): Five Hive Interview Questions](https://www.cnblogs.com/qingyunzong/p/8747656.html)
- [The Road to Learning Hive (12): Hive SQL Exercise, a Movie Review Case Study](https://www.cnblogs.com/qingyunzong/p/8727264.html)
- [The Road to Learning Hive (13): Hive Analytic Window Functions (1): SUM, AVG, MIN, MAX](https://www.cnblogs.com/qingyunzong/p/8782794.html)
- [The Road to Learning Hive (14): Hive Analytic Window Functions (2): NTILE, ROW_NUMBER, RANK, DENSE_RANK](https://www.cnblogs.com/qingyunzong/p/8798102.html)
- [The Road to Learning Hive (15): Hive Analytic Window Functions (3): CUME_DIST and PERCENT_RANK](https://www.cnblogs.com/qingyunzong/p/8798382.html)
- [The Road to Learning Hive (16): Hive Analytic Window Functions (4): LAG, LEAD, FIRST_VALUE, and LAST_VALUE](https://www.cnblogs.com/qingyunzong/p/8798606.html)
- [The Road to Learning Hive (17): Hive Analytic Window Functions (5): GROUPING SETS, GROUPING__ID, CUBE, and ROLLUP](https://www.cnblogs.com/qingyunzong/p/8798987.html)
- [The Road to Learning Hive (18): Hive Shell Operations](https://www.cnblogs.com/qingyunzong/p/8847532.html)
- [The Road to Learning Hive (19): Data Skew in Hive](https://www.cnblogs.com/qingyunzong/p/8847597.html)
- [The Road to Learning Hive (20): A Walkthrough of Hive's Execution Process](https://www.cnblogs.com/qingyunzong/p/8847651.html)
- [The Road to Learning Hive (21): Hive Optimization Strategies](https://www.cnblogs.com/qingyunzong/p/8847775.html)
--------------------------------------------------------------------------------

/columns/flink/Apache Flink 漫谈系列.md:
--------------------------------------------------------------------------------
# Apache Flink Talks Series (Alibaba Cloud Realtime Compute for Flink)

## Tutorials
- [Apache Flink Talks (01) - Preface](https://developer.aliyun.com/article/666043?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink Talks (02) - Overview](https://developer.aliyun.com/article/666052?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink Talks (03) - Watermark](https://developer.aliyun.com/article/666056?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink Talks (04) - State](https://developer.aliyun.com/article/667562?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink Talks (05) - Fault Tolerance](https://developer.aliyun.com/article/667564?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink Talks (06) - Stream-Table Duality](https://developer.aliyun.com/article/667566?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink Talks (07) - Continuous Queries](https://developer.aliyun.com/article/667700?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink Talks (08) - SQL Overview](https://developer.aliyun.com/article/670202?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink Talks (09) - JOIN Operators](https://developer.aliyun.com/article/672760?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink Talks (10) - JOIN LATERAL](https://developer.aliyun.com/article/674345?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink Talks (11) - Temporal Table JOIN](https://developer.aliyun.com/article/679659?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink Talks (12) - Time Interval (Time-windowed) JOIN](https://developer.aliyun.com/article/683681?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink Talks (13) - Table API Overview](https://developer.aliyun.com/article/685085?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink Talks (14) - DataStream Connectors: Kafka](https://developer.aliyun.com/article/686809?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)

## Resources
- [Alibaba Cloud Realtime Compute for Flink](https://developer.aliyun.com/group/sc?spm=a2c6h.12873639.0.0.e12d59b2IvG4B2#/?_k=9flh5j)
--------------------------------------------------------------------------------

/columns/presto/Presto最佳实践、调优、踩坑专栏.md:
--------------------------------------------------------------------------------
# Presto Best Practices, Tuning, and Pitfalls

## 1. Best Practices
- [Presto's Road to ETL](https://zhuanlan.zhihu.com/p/53996153) 2019-01
- [Presto Use Cases and Enterprise Case Studies](https://zhuanlan.zhihu.com/p/260653669) 2020-10

### 1.1 Technology Selection
- [PrestoDB vs. PrestoSQL: How Their Development Compares](https://zhuanlan.zhihu.com/p/87621360) 2019-10
- [PrestoDB vs. PrestoSQL: Comparison and How to Choose](http://armsword.com/2020/05/02/the-difference-between-prestodb-and-prestosql/) 2020-05

### 1.2 Practices at Large Companies
- [Presto at Bilibili](https://www.bilibili.com/read/cv16043517) 2022-04
- [Presto at ByteDance: Internal Practices and Optimizations (Optimization Part)](https://xie.infoq.cn/article/061bb0935a8575e01ea243852) 2021-12
- [Presto at Tencent at Scale (PDF)](https://static.sched.com/hosted_files/prestocon2021/ed/Presto%20at%20Tencent%20at%20Scale%20%281%29.pdf) 2021-12
- [Presto at Chehaoduo (车好多)](https://mp.weixin.qq.com/s/Bmqv54sVZgTqQ82I_RfmsA) 2020-12
- [Presto at DiDi: Exploration and Practice](https://zhuanlan.zhihu.com/p/266162270) 2020-10
- [Presto's Journey at Youzan](https://tech.youzan.com/presto-zai-you-zan-de-shi-jian-zhi-lu/) 2020-04
- [PrestoCon 2020: Presto in Practice at Cloud-Native Data Lake Analytics (DLA)](https://zhuanlan.zhihu.com/p/260784762) 2020-03
- [The Evolution of Presto at Ctrip](https://zhuanlan.zhihu.com/p/41538472) 2018-08
- [How Presto Works and How Meituan Uses It](https://tech.meituan.com/2014/06/16/presto.html) 2014-06
- [Alibaba Data Lake: Compute Isolation for Presto Analytics](https://mp.weixin.qq.com/s/lV_nzLI6_Ott7Abyaik_bw)

## 2. Performance Tuning
- [Five Tips for Presto Performance Tuning](https://zhuanlan.zhihu.com/p/162809568) 2020-07
- [Presto Memory Management: Principles and Tuning](http://armsword.com/2018/05/22/the-memory-management-and-tuning-experience-of-presto/) 2018-05
- [Configuring Presto Memory Management Parameters](http://armsword.com/2019/11/13/the-configuration-settings-of-presto-memory-management/) 2019-11
- [Presto's Protection Mechanisms When a Cluster Runs Low on Memory](http://armsword.com/2020/02/18/presto-memory-kill-policy/) 2020-02
- [Using Flame Graphs to Optimize Young GC in Presto](https://mp.weixin.qq.com/s/BZG7Av5f9HH9gueVF8ABvQ) 2020-03
- [Locating OLAP Engine Bottlenecks with Flame Graphs](https://mp.weixin.qq.com/s/pIYdeF0TtbGgV0Va35ejQg) 2021-03
- [How to Make The Presto Query Engine Run Fastest](https://ahana.io/learn/presto/making-the-presto-query-engine-run-faster/)

## 3. Troubleshooting (Pitfalls)
- [JVM Bugs That Slow Down Presto Queries, and Their Fixes](http://armsword.com/2021/02/07/jvm-bug-causes-Presto-queries-to-slow-down/) 2021-02
- [Investigating JVM Core Dumps on the Presto Master](http://armsword.com/2020/12/10/solve-presto-jvm-coredump/) 2020-12
- [Tracking Down an Off-Heap Memory Leak in Presto Caused by Jetty](http://armsword.com/2020/06/23/jetty-cause-presto-memory-leak/) 2020-06
- [Hunting Down a Presto Worker OOM](http://armsword.com/2020/06/03/the-solution-of-presto-oom-caused-by-orc-statistics/) 2020-06
- [Investigating High System Load in Presto](http://armsword.com/2019/09/18/solve-presto-system-load-too-high/) 2019-09
- [Diagnosing a Presto Connection-Limit-Exceeded Issue](https://zhuanlan.zhihu.com/p/57956341) 2019-03
- [A Presto Codegen Troubleshooting Case](https://zhuanlan.zhihu.com/p/66243773) 2019-05
- [Presto Coordinator CPU Kept Rising, and the Cause Was Surprising](https://mayunlei.github.io/2019/05/20/Presto-coordinator%E7%9A%84CPU%E6%8C%81%E7%BB%AD%E4%B8%8A%E6%B6%A8%EF%BC%8C%E5%8E%9F%E5%9B%A0%E7%AB%9F%E7%84%B6%E6%98%AF%E8%BF%99%E6%A0%B7/) 2019-05
- [Investigating a Presto Memory Leak](https://mayunlei.github.io/2019/09/02/Presto%E5%86%85%E5%AD%98%E6%B3%84%E9%9C%B2%E9%97%AE%E9%A2%98%E8%B0%83%E6%9F%A5/) 2019-09
--------------------------------------------------------------------------------

/columns/presto/Presto架构、源码分析专栏.md:
--------------------------------------------------------------------------------
# Presto Architecture and Source Code Analysis

## 1. Principles and Architecture
- [Presto Overview: Features, Principles, and Architecture](https://zhuanlan.zhihu.com/p/260399749) 2020-10
- [Introduction to the Distributed SQL Query Engine Presto](http://armsword.com/2017/12/05/presto/) 2017-12
- [Understanding Presto in Depth](https://zhuanlan.zhihu.com/p/101366898) 2020-01
- [Principles of Distributed SQL Query Engines (Using Presto SQL as an Example)](https://zhuanlan.zhihu.com/p/293775390) 2020-11
- [Understanding Presto in Depth: Presto's Internal Architecture](https://mayunlei.github.io/2020/08/16/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3Presto-Presto%E7%9A%84%E5%86%85%E9%83%A8%E6%9E%B6%E6%9E%84/) 2020-08
- [Presto: A Distributed SQL Query Engine and How It Works](https://mp.weixin.qq.com/s?__biz=MzI5MDEzMzg5Nw==&mid=2660400264&idx=1&sn=ebff65980ef45f7dffea1e5ec7d51fdc&chksm=f7425e6ec035d778dcc5704babe5241d8c80f3d21059434b00d8d4c46d9ce0bd232467ec92a6&scene=21#wechat_redirect) 2020-05

## 2. Source Code Analysis

### 2.1 Preparation
- [How to Quickly Master the Presto Source Code: Approach and Experience](https://zhuanlan.zhihu.com/p/262236892) 2020-10
- [Reading the Presto Source: Overview](https://zhuanlan.zhihu.com/p/51393518) 2018-12
- [Some Basic Presto Concepts](http://armsword.com/2018/08/11/the-basic-concepts-of-presto/) 2018-08
- [Notes on the Presto/Trino Definitive Guide and the Official Design Documents](https://www.jianshu.com/p/d3600d2a115d) 2021-05

### 2.2 Data Types and the Query Execution Model
- [A First Look at Presto's Type System](https://zhuanlan.zhihu.com/p/55299409) 2019-01
- [Presto Source Analysis: Data Types](https://zhuanlan.zhihu.com/p/52713533) 2018-12
- [Presto Core Data Structures: Slice, Block & Page](https://zhuanlan.zhihu.com/p/60813087) 2019-03
- [Presto Source Analysis: Slice](https://zhuanlan.zhihu.com/p/52735465) 2018-12
- [Presto Driver, Split and Pipeline](https://www.lewuathe.com/presto-driver,split-and-pipeline.html) 2017-05

### 2.3 SQL Parsing, Execution Plan Generation, and Optimization
- [Presto Source Analysis: The Coordinator](https://www.infoq.cn/article/VNe0A9yKszPCmp32akCa) 2019-12
- [Presto SQL Parser Source Analysis](https://zhuanlan.zhihu.com/p/57438825) 2019-02
- [Reading the Presto Source: Optimizers](https://zhuanlan.zhihu.com/p/52154130) 2019-01
- [Logical Plan Generation in Presto](https://zhuanlan.zhihu.com/p/57395047) 2019-02
- [Presto Source Analysis: IterativeOptimizer](https://zhuanlan.zhihu.com/p/52879375) 2018-12
- [Presto Source Analysis: Pattern Matching](https://zhuanlan.zhihu.com/p/52916774) 2018-12
- [Presto Source Code Walkthrough: The Fantastic Journey of a SQL Query (Part 1)](https://www.jianshu.com/p/3fccfa82e1ec) 2019-04
- [Presto Source Code Walkthrough: The Fantastic Journey of a SQL Query (Part 2)](https://www.jianshu.com/p/d8a3d7488358) 2019-04
- [Presto Query Execution and Index Predicate Pushdown](https://mp.weixin.qq.com/s?src=11&timestamp=1616394200&ver=2961&signature=E7fzfl-wO5wGpohLLkE8v9hRKn5GR1TbVwU-N6Hl11T0Xl6TtlgCbhJmisPs*Z-hYiprO0yYK91O5GR0m-V-s5kvv6NudfeWMGW4iPXdAdetAfDAo4EITB9l*yZajiJS&new=1) 2020-05

### 2.4 Distributed Task Scheduling, Split Generation and Scheduling Policies, Worker Selection
- [A Brief Look at the Presto Runtime](https://zhuanlan.zhihu.com/p/345733460) 2021-01
- [Reading the Presto Source: How Metadata Is Fetched from Hive (HMS + HDFS)](https://blog.csdn.net/huang_quanlong/article/details/80380474) 2018-07
- [How Presto Builds and Uses Massive Numbers of Hive Splits](https://zhuanlan.zhihu.com/p/344559757) 2021-01
- [Presto's Task Execution Framework](https://zhuanlan.zhihu.com/p/54172313) 2019-01
- [How Does Presto Schedule Tasks?](https://zhuanlan.zhihu.com/p/58959725) 2019-03
- [Presto: The Journey from Stage to Task](https://zhuanlan.zhihu.com/p/55785284) 2019-01
- [How Presto Selects Workers When Scheduling Tasks](http://armsword.com/2020/04/08/presto-scheduling-task/) 2020-04
- [AllAtOnce vs. Phased Scheduling in Presto](https://zhuanlan.zhihu.com/p/61656233) 2019-05
- [Presto Task Scheduling: Where Tasks Get Assigned](https://mayunlei.github.io/2020/05/30/Presto-%E4%BB%BB%E5%8A%A1%E8%B0%83%E5%BA%A6%EF%BC%9A-%E4%BB%BB%E5%8A%A1%E5%88%86%E9%85%8D%E5%88%B0%E5%93%AA%E9%87%8C/) 2020-05
- [Presto Splits Explained](https://blog.csdn.net/zhanyuanlin/article/details/109215177)

### 2.5 Common Operators and How Common SQL Is Implemented Under the Hood
- [Window Functions and WindowOperator Source Analysis](https://zhuanlan.zhihu.com/p/59550902) 2019-03
- [Implementation of the coalesce Function in Presto and Expression Codegen](https://zhuanlan.zhihu.com/p/64131496) 2019-04
- [Analysis of Limit-Type Operators in Presto](https://zhuanlan.zhihu.com/p/62448395) 2019-04
- [Overview of Pagination in Presto](https://zhuanlan.zhihu.com/p/57030465) 2019-02

#### Join and Shuffle
- [How Presto Shuffles Data](https://zhuanlan.zhihu.com/p/61565957) 2019-04
- [Hash Join in Presto](https://zhuanlan.zhihu.com/p/54731892) 2019-03

#### Grouped Aggregation
- [The Grouped Aggregation Query Flow in Presto](https://zhuanlan.zhihu.com/p/54385845) 2019-01
- [Understanding Group By Queries in Presto in Depth](https://zhuanlan.zhihu.com/p/67742519) 2019-09

### 2.6 Functions and UDFs

### 2.7 The Connector Mechanism and Common Connectors
- [ORC & Presto](https://zhuanlan.zhihu.com/p/110013789) 2020-02
- [Presto ORC and Its Performance Optimization](http://armsword.com/2019/09/30/presto-orc-and-performance-optimization/) 2019-09
- [Analysis of Presto's Hive MetaStore Code](https://zhuanlan.zhihu.com/p/109033118) 2020-02
- [Presto Connectors: SystemTable](https://zhuanlan.zhihu.com/p/60934739) 2019-03
- [How Can Presto Connect to HBase? (With a Detailed Walkthrough of Developing Hbase-Connect)](https://www.analysys.cn/article/detail/20019023) 2018-11

### 2.8 Other
- [Presto Source Analysis: TupleDomain](https://zhuanlan.zhihu.com/p/53113638) 2018-12
- [Presto's Caching Mechanism](https://zhuanlan.zhihu.com/p/196398077) 2020-08
- [Presto Caching](https://zhuanlan.zhihu.com/p/147769024) 2020-06
- [Presto Codegen: Introduction and Optimization Attempts](https://zhuanlan.zhihu.com/p/53469238) 2018-12
- [Presto Procedure](https://zhuanlan.zhihu.com/p/59159147) 2019-03
- [How is data inserted into Presto?](https://zhuanlan.zhihu.com/p/59846328) 2019-03
- [Modifications to Make Presto Compatible with Hive SQL](http://armsword.com/2019/03/31/presto-compatible-hive-syntax/) 2019-03
- [Making the Presto Coordinator Distributed](https://mayunlei.github.io/2019/11/26/Presto-Coordinator%E5%88%86%E5%B8%83%E5%BC%8F%E6%94%B9%E9%80%A0/) 2019-11
- [Visualize Execution Plan in Presto](https://www.lewuathe.com/visualize-execution-plan-in-presto.html) 2019-09
- [Supporting Hive Implicit Type Conversion in Presto](https://mp.weixin.qq.com/s/1hn3nVBdBtBeiPl3wxvHfQ) 2021-02
- [Overview of Scalar Function Registration and Invocation in Presto](https://mp.weixin.qq.com/s/vd65OVeIOH7YFQ0QOAmsUg) 2020-09
- [Overview of Function Implementation in Presto](https://mp.weixin.qq.com/s/1Z_qik61N3hKwWqG8QR69w) 2020-07
- [Improved Hive Bucketing](https://trino.io/blog/2019/05/29/improved-hive-bucketing.html)

## 3. Related Papers
- [Official paper: Presto: SQL on Everything](https://trino.io/Presto_SQL_on_Everything.pdf) [Chinese translation](https://www.jianshu.com/p/de0a1de9f26e)
- [Reflections on F1 Query: Declarative Querying at Scale](https://zhuanlan.zhihu.com/p/53299556) 2018-12
- [Reflections on Column-Stores vs. Row-Stores](https://zhuanlan.zhihu.com/p/54433448) 2019-01, abei (Zhihu)
- [Reading Notes: Column-Stores vs. Row-Stores](https://zhuanlan.zhihu.com/p/54484592) 2019-01, 萌豆 (Zhihu)
- [Reflections on Wander Join: Online Aggregation via Random Walks](https://zhuanlan.zhihu.com/p/55050773) 2020-03
- [Reflections on The Snowflake Elastic Data Warehouse](https://zhuanlan.zhihu.com/p/55577067) 2019-01
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# DPKB

A big-data knowledge base, covering:
* Data storage layer and databases (HDFS, Hive, HBase, Kudu, Doris, StarRocks, ClickHouse, TiDB, etc.)
* Data processing layer and OLAP engines (Spark, Flink, Presto, Trino, etc.)
* Data lakes (Iceberg, Hudi, Delta, etc.)
* Big data development and applications (mainly ETL, scheduling, data warehousing, and data applications, e.g. SeaTunnel, DolphinScheduler)
* Data governance (metadata management, data modeling, data standards, data quality, data security, etc.)

Continuously updated (2024-12)

## 1. Data Storage Layer and Databases (HDFS, Hive, HBase, Kudu, Doris, StarRocks, ClickHouse, TiDB, etc.)

### ▶ HDFS

### ▶ YARN
#### 1) Principles
- [Hadoop YARN: Understanding YARN's Architecture and How It Works](https://www.cnblogs.com/liangzilx/p/14837562.html)

### ▶ Hive
#### 1) Official Sites, Communities, Blogs
- [Hive Official Site](https://hive.apache.org/)

#### 2) Columns
- [Hive Tutorial](columns/hive/hive教程.md)

#### 3) Practices at Large Companies
- [HiveCube at Youzan](https://tech.youzan.com/cube/) 2019-11
- [Hive Metastore Federation at DiDi](https://blog.didiyun.com/index.php/2019/03/25/hive-metastore-federation/) 2019-03

### ▶ HBase
#### 1) Official Sites, Communities, Blogs
- [HBase Official Site](https://hbase.apache.org/)
- [hbasefly](http://hbasefly.com/)

#### 2) Columns

#### 3) Practices at Large Companies

#### 4) Other
- [A Discussion of HBase Bulkload in Practice](https://tech.youzan.com/hbase-bulkloadshi-practice/) 2019-12

### ▶ Kudu
#### 1) Official Sites, Communities, Blogs
- [Kudu Official Site](https://kudu.apache.org/)

#### 2) Columns
- [Kudu Internals and Papers](columns/kudu/Kudu原理论文.md)
- [NetEase Cloud Kudu Technical Column](columns/kudu/网易云Kudu技术文章.md)

#### 3) Practices at Large Companies
- [Apache Kudu at NetEase](https://www.infoq.cn/article/kgwyqb5wer5wl8cquweq) 2021-08
- [Apache Kudu in NetEase's Real-Time Data Warehouse](https://www.infoq.cn/article/QETxjyIu5tAJTZ9ksMdu) 2020-02
- [Kudu Architecture and Its Use at Xiaomi](https://www.modb.pro/db/119708) 2017-06

#### 4) Other
- [How I Became an Apache Kudu Committer & PMC Member](https://cloud.tencent.com/developer/article/1450749) 2019-06

### ▶ Doris
#### 1) Official Sites, Communities, Blogs
- [Doris Official Site](https://doris.apache.org/)
- [Doris GitHub](https://github.com/apache/doris)
- [Doris Forum](https://github.com/apache/incubator-doris/discussions)

#### 2) Columns
- [Doris In Depth](columns/doris/Doris全面解析.md)
- [Doris Best Practices](columns/doris/Doris最佳实践.md)

#### 3) Case Studies
- [Apache Doris in Meituan Waimai's Data Warehouse](https://tech.meituan.com/2020/04/09/doris-in-meituan-waimai.html) 2020-04
- [Apache Doris in Logistics at Yunda](https://mp.weixin.qq.com/s/Z_PhWk92ctZ7slz4SrVZ9Q) 2021-07
- [Apache Doris at Shuhai Supply Chain](https://mp.weixin.qq.com/s/SHuE-KCsIyh6jfo0DqLD6w) 2021-07
- [JD Logistics' Self-Service Exploration of 100-Million-Row-Scale Data with Doris](https://mp.weixin.qq.com/s/qVFa40yMg0_N9Lsb10ACQA) 2021-07
- [Doris on ES: Best Practices in Kuaishou's Commercialization Business](https://mp.weixin.qq.com/s/5Pc5ewVFWPgauG4hNLH9xw) 2021-08
- [Building the Youdao Premium Courses Data Middle Platform on Doris](https://mp.weixin.qq.com/s/Gz-au9CHJ4lHrs5MkzeAJg) 2020-12
- [Building Meituan Waimai's Real-Time Data Warehouse](https://mp.weixin.qq.com/s/-JPWqa_-at7F5hZ0zekVSQ) 2020-10
- [Doris in Zuoyebang's Real-Time Data Warehouse](https://mp.weixin.qq.com/s/hjbMM8CbElO04VLN5cfJtQ) 2020-09
- [Xiaomi's Growth Analytics Platform Built on Apache Doris](https://mp.weixin.qq.com/s/WeNAItPJ4b7fsqW4kf0dSA) 2020-08
- [Apache Doris During JD.com's Double 11 Shopping Festival](https://mp.weixin.qq.com/s/8XnwJXm4kzq56SvElwL6kA) 2020-03
- [Apache Doris for Large-Scale End-to-End Microservice Monitoring in Baidu's Commercial Business](https://mp.weixin.qq.com/s/k7CcCdHPTK1ZTDs_qKgh5w) 2020-02

### ▶ StarRocks
#### 1) Official Sites, Communities, Blogs
- [StarRocks](https://www.starrocks.com/zh-CN/index)
- [StarRocks Documentation](https://docs.starrocks.com/zh-cn/main/introduction/StarRocks_intro)
- [编程小梦 (blog of Kang Kaisen, 康凯森)](https://blog.bcmeng.com/)

#### 2) Columns
- [StarRocks Internals](columns/starrocks/StarRocks技术内幕.md)

### ▶ ClickHouse
#### 1) Official Sites, Communities, Blogs
- [ClickHouse Official Site](https://clickhouse.com/)

#### 2) Columns

#### 3) Practices at Large Companies
- [ClickHouse's Journey at Youzan](https://tech.youzan.com/clickhouse-zai-you-zan-de-shi-jian-zhi-lu/) 2021-01

#### 4) Other

## 2. Data Processing Layer and OLAP Engines (Spark, Flink, Presto, Trino, etc.)

### ▶ Spark
#### 1) Official Sites, Communities, Blogs
- [Spark Official Site](https://spark.apache.org/)

#### 2) Columns
- [Design and Implementation of Apache Spark](columns/spark/Apache%20Spark的设计与实现.md)

#### 3) Practices at Large Companies
- [SparkSQL at Youzan](https://tech.youzan.com/sparksql-in-youzan/) 2019-01
- [SparkSQL in Youzan's Big Data Platform (Part 2)](https://tech.youzan.com/sparksql-in-youzan-2/) 2020-01

### ▶ Flink
#### 1) Official Sites, Communities, Blogs
- [Flink Official Site](https://flink.apache.org/)
- [Flink Confluence](https://cwiki.apache.org/confluence/display/FLINK/)
- [Flink Blog](https://flink.apache.org/blog/)
- [Ververica Blog](https://www.ververica.com/blog?hsLang=en)
- [Ververica (Chinese)](https://ververica.cn/developers-resources/)
- [Flink Knowledge Map](https://ververica.cn/wp-content/uploads/2020/03/Apache-Flink-Stateful-Computations-over-Data-Streams.pdf)
- [Jark's Blog - WuChong - 云邪](http://wuchong.me/)

#### 2) Columns
- [Flink Architecture and Source Code Analysis](columns/flink/Flink架构、源码分析专栏.md)
- [Flink in Practice Series](columns/flink/Flink实战系列.md)
- [Flink Open-Source Project Roundup](columns/flink/Flink开源项目汇总.md)
##### Tutorials
- [Flink SQL Cookbook - Ververica](https://github.com/ververica/flink-sql-cookbook/)
- [Flink for Absolute Beginners](columns/flink/Flink零基础入门.md)
- [Flink Advanced Tutorials](columns/flink/Flink进阶教程.md)
- [Apache Flink 
漫谈系列](columns/flink/Apache%20Flink%20漫谈系列.md) 180 | - [Flink 相关论文](columns/flink/Flink%20相关论文.md) 181 | 182 | 183 | #### 3)大厂实践 184 | - [flink-forward-asia-hackathon-2021](https://github.com/flink-china/flink-forward-asia-hackathon-2021/issues) 185 | 186 | 187 | 188 | 189 | ### ▶ Presto、Trino 190 | #### 1)官网、社区、博客 191 | - [PrestoDB 官网](https://prestodb.io/) 192 | - [Trino 官网](https://trino.io/) 原PrestoSQL 193 | - [Google Presto Group](https://groups.google.com/g/presto-users) 194 | - [Presto 知乎专栏](https://www.zhihu.com/column/presto-cn) 195 | - [若飞-技术博客](http://armsword.com/archives/) 196 | 197 | 198 | #### 2)专栏 199 | - [Presto 架构、源码分析专栏](columns/presto/Presto架构、源码分析专栏.md) 200 | - [Presto 最佳实践、调优、踩坑专栏](columns/presto/Presto最佳实践、调优、踩坑专栏.md) 201 | - [Presto 资料汇总、会议资讯专栏](columns/presto/Presto资料汇总、会议资讯专栏.md) 202 | 203 | 204 | #### 3)大厂实践 205 | - [Presto 在车好多的实践](https://mp.weixin.qq.com/s/Bmqv54sVZgTqQ82I_RfmsA) 2020-12 206 | - [Presto 在滴滴的探索与实践](https://zhuanlan.zhihu.com/p/266162270) 2020-10 207 | - [Presto 在有赞的实践之路](https://tech.youzan.com/presto-zai-you-zan-de-shi-jian-zhi-lu/) 2020-04 208 | - [PrestoCon 2020:云原生数据湖分析DLA的Presto实践](https://zhuanlan.zhihu.com/p/260784762) 2020-03 209 | - [携程 Presto 技术演进之路](https://zhuanlan.zhihu.com/p/41538472) 2018-08 210 | - [Presto 实现原理和美团的使用实践](https://tech.meituan.com/2014/06/16/presto.html) 2014-06 211 | - [Presto 高性能引擎在美图的实践](https://zhuanlan.zhihu.com/p/408957032) 2021-09 212 | 213 | 214 | 215 | 216 | ## 三、数据湖(Iceberg、Hudi、Delta等) 217 | 218 | - [一文看懂:什么是数据库、数据湖、数据仓库、湖仓一体、智能湖仓?](https://www.smartcity.team/consultingskills/experience/shujukuyushujuhu/#comments) 2021-08 219 | 220 | 221 | ### ▶ Iceberg 222 | #### 1)官网、社区、博客 223 | - [Iceberg 官网](https://iceberg.apache.org/) 224 | 225 | 226 | #### 2)应用 227 | - [数据湖 Iceberg | 
实时数据仓库的发展、架构和趋势](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485623&idx=1&sn=9f03a36dbfc06c712b6132faabaa1dfd&chksm=972ef820a05971360311fd69c686e4b420222cfa639a1bcb5648bece4c3d886ae8f981712d8c&scene=21#wechat_redirect) 2021-03 228 | - [数据湖 Iceberg | Apache Iceberg 快速入门](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485637&idx=1&sn=0489f233e3bda2bcef221c9532bb001e&chksm=972ef852a0597144538b7807948443a27e58f99ba33d17a7bcb12ccb8b382fd1d712d6e80cbc&cur_album_id=1746684202856579076&scene=190#rd) 2021-03 229 | - [数据湖 Iceberg | 如何正确使用 Iceberg](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485644&idx=1&sn=b2194d8f3c1e7cf7e8e8d9296b9025e2&chksm=972ef85ba059714dc69472e3860497389f2ca4503d2cddeedd348695b5c314da49aad0278978&cur_album_id=1746684202856579076&scene=190#rd) 2021-04 230 | - [数据湖 Iceberg | 在网易云音乐的实践](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485718&idx=1&sn=34347ac54e97877e4401ad37f1d15577&chksm=972ef981a059709724b7abab56786ef047a68f31fd829031d2214fa4994b9ec0f1b04e25318c&cur_album_id=1746684202856579076&scene=190#rd) 2021-04 231 | 232 | 233 | 234 | 235 | ### ▶ Hudi 236 | #### 1)官网、社区、博客 237 | - [Hudi 官网](https://hudi.apache.org/) 238 | 239 | #### 2)应用 240 | - [Flink CDC + Hudi + Hive + Presto 构建实时数据湖最佳实践](https://mp.weixin.qq.com/s/079VeDeIM_MQPyiiDX2l_w) 241 | 242 | 243 | 244 | 245 | ### ▶ Delta 246 | 247 | 248 | 249 | 250 | ## 四、大数据开发、应用(主要包括ETL、调度、数仓、数据应用等,例如Seatunnel、Dolphinscheduler等) 251 | 252 | ### ▶ Seatunnel 253 | 254 | 255 | 256 | ### ▶ DolphinScheduler 257 | 258 | 259 | 260 | ### ▶ 大数据架构 261 | - [SQL on Hadoop 在快手大数据平台的实践与优化](https://www.infoq.cn/article/BN9cJjg1t-QSWE6fqkoR) 2019-06 262 | - [携程机票大数据架构最佳实践](https://dbaplus.cn/news-73-1420-1.html) 2017-08 263 | - [火山引擎DataLeap一站式数据治理解决方案及平台架构](https://www.cnblogs.com/bytedata/p/17745908.html) 2023-10 264 | 265 | 266 | 267 | ### ▶ 数仓相关 268 | - [有赞数据仓库实践之路](https://tech.youzan.com/dw-in-youzan/) 2020-03 269 | - [OneData 建设探索之路:SaaS 
收银运营数仓建设](https://tech.meituan.com/2019/10/17/meituan-saas-data-warehouse.html) 2019-10 270 | - [面向AI技术的工程架构实践 | 贝壳一站式大数据开发平台实践](https://www.infoq.cn/article/mmnwzdlcyjg83qm0tgqm) 2020-11 271 | 272 | 273 | 274 | 275 | ### ▶ 报表平台 276 | - [有赞 BI 平台实现原理](https://tech.youzan.com/principle-on-bi-platform/) 2021-01 277 | 278 | 279 | 280 | 281 | ## 五、数据治理(元数据管理、数据指标、数据标准、数据质量、数据安全等) 282 | 283 | ### ▶ 数据治理 284 | - [美团配送数据治理实践](https://tech.meituan.com/2020/03/12/delivery-data-governance.html) 2020-03 285 | - [全链路数据治理在网易严选的实践](https://www.infoq.cn/article/FOV6aEWRGNOfhD91YVcr) 2020-10 286 | - [数据资产、数据治理 - 有赞](https://tech.youzan.com/shu-ju-zi-chan-zan-zhi-zhi-li/) 2019-11 287 | - [美团酒旅起源数据治理平台的建设与实践](https://tech.meituan.com/2018/12/27/onedata-origin.html) 2018-12 288 | - [滴滴数据仓库指标体系建设实践](https://mp.weixin.qq.com/s/-pLpLD_HMiasyyRxo5oTRQ) 2020-08 289 | - [有赞指标库实践](https://tech.youzan.com/you-zan-zhi-biao-ku-shi-jian/) 2020-03 290 | - [浅谈有赞大数据安全体系](https://tech.youzan.com/you-zan-da-shu-ju-an-quan-ti-xi-jian-she-shi-jian/) 2021-01 291 | 292 | 293 | 294 | ### ▶ 元数据管理 295 | - [字节跳动构建Data Catalog数据目录系统的实践](https://www.cnblogs.com/bytedata/p/16189474.html) 2022-04 296 | - [有赞数据仓库元数据系统实践](https://tech.youzan.com/youzan-metadata/) 2018-08 297 | - [饿了么元数据管理实践之路](https://dbaplus.cn/news-73-2143-1.html) 2018-07 298 | - [数据治理方案技术调研 Atlas VS Datahub VS Amundsen](https://cloud.tencent.com/developer/article/1746714) 2020-11 299 | - [数据资产治理-元数据采集那点事 - 有赞](https://tech.youzan.com/zi-chan-zhi-li-yuan-shu-ju-cai-ji-na-dian-shi/) 2020-12 300 | - [来看看字节跳动内部的数据血缘用例与设计](https://segmentfault.com/a/1190000041452770) 2022-02 301 | - [携程数据血缘构建及应用](https://mp.weixin.qq.com/s/LGK3YPZCe6oPTf48QaAIqA) 2021-09 302 | - [Datahub](https://datahubproject.io/) A Metadata Platform for the Modern Data Stack 303 | 304 | 305 | 306 | 307 | 308 | ## 六、机器学习、AI 309 | 310 | ### ▶ 机器学习平台 311 | - [机器学习平台建设指南](https://mp.weixin.qq.com/s/HEg_6Gly2WMrcPD5Ao2n6g) 2021-04 312 | - 
[一站式机器学习平台建设实践](https://mp.weixin.qq.com/s/ZDRD0vAxkSqe4UeXi9avKQ) 2020-02 313 | - [汽车之家机器学习平台的架构与实践](https://blog.csdn.net/hellozhxy/article/details/107210015) 2020-07 314 | - [微博推荐算法实践与机器学习平台演进](https://blog.csdn.net/m0_37586850/article/details/116465255) 2021-05 315 | - [爱奇艺机器学习平台的建设实践](https://mp.weixin.qq.com/s/Np4w7RC2JFlB7ZGIduu71w) 2020-11 316 | - [爱奇艺一站式机器学习平台Deepthought的建设与初探](https://mp.weixin.qq.com/s?__biz=MzI0MjczMjM2NA==&mid=2247487206&idx=1&sn=c8db1e12378376722a1521f409149d44&chksm=e97692c5de011bd3f1b42a8112cd04c24907cb101ac5474b0054c95941ff5c4769a42d496f3a&scene=21#wechat_redirect) 2020-06 317 | - [一站式机器学习平台在 vivo AI 的实践](https://www.infoq.cn/article/THlkStomYLRgXL2hzm8w) 2020-02 318 | - [再见,Yarn!滴滴机器学习平台架构演进](https://mp.weixin.qq.com/s/iTfHv8EFx4O4G1sNxsuMkg) 2019-03 319 | - [网易严选机器学习平台建设实践](https://www.6aiq.com/article/1661745581086) 2022 320 | - [Sunfish-有赞智能平台实践](https://tech.youzan.com/sunfish/) 2020-06 321 | - [同程-利用已有的大数据技术,如何构建机器学习平台](https://www.infoq.cn/news/build-machine-learning-platform-bigdata) 2017-11 322 | 323 | 324 | 325 | 326 | ## 七、LLM应用 327 | 328 | ### ▶ Text2SQL 329 | - [NL2SQL基础系列(1):业界顶尖排行榜、权威测评数据集及LLM大模型(Spider vs BIRD)全面对比优劣分析](https://blog.csdn.net/sinat_39620217/article/details/137603846) 330 | - [NL2SQL基础系列(2):主流大模型与微调方法精选集,Text2SQL经典算法技术回顾七年发展脉络梳理](https://blog.csdn.net/sinat_39620217/article/details/137603958) 331 | - [NL2SQL进阶系列(1):DB-GPT-Hub、SQLcoder、Text2SQL开源应用实践详解](https://blog.csdn.net/sinat_39620217/article/details/137674671) 332 | 333 | 334 | 335 | 336 | ## 八、资源汇总 337 | 338 | ### ▶ 大厂技术博客 339 | - [美团技术团队](https://tech.meituan.com/) 340 | - [有赞技术团队](https://tech.youzan.com/) 341 | - [滴滴云博客](https://blog.didiyun.com/) 342 | 343 | 344 | 345 | ### ▶ 大数据相关网站 346 | - [dbaplus](https://dbaplus.cn/) 347 | 348 | 349 | 350 | ### ▶ 相关开源项目 351 | - [数仓相关开源项目汇总](columns/opensource/数仓相关开源项目汇总.md) 352 | 353 | 354 | 355 | ### ▶ 相关论文 356 | - [raft 中文翻译](https://github.com/maemual/raft-zh_cn/blob/master/raft-zh_cn.md) 357 | 358 | 
359 | --------------------------------------------------------------------------------