└── README.md

/README.md:
--------------------------------------------------------------------------------
Big Data Learning
=======================

A curated collection of resources for learning big data.

Classic Articles
-------------------

Articles that give either an **overall picture** of big data (architecture, scenarios, solution overviews) or a **focused explanation** (key components and their characteristics).

1. [100 open source Big Data architecture papers for data professionals](https://www.linkedin.com/pulse/100-open-source-big-data-architecture-papers-anil-madan)
    \# Chinese translation: [`PayPal` senior engineering director: read these 100 papers to become a big data expert](http://www.csdn.net/article/2015-07-07/2825148)
1. [The Log: What every software engineer should know about real-time data's unifying abstraction](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
    \# Chinese translation: [The Log: What every software engineer should know about real-time data's unifying abstraction](https://github.com/oldratlee/translations/tree/master/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
    A blog post by `Kreps` of `LinkedIn`. It is long, but it has been called [an ***epic must-read*** for programmers](http://bryanpendleton.blogspot.hk/2014/01/the-log-epic-software-engineering.html).
    Logs used to be the concern of operations staff; today developers must care about them too, which is in line with `DevOps` principles.
1. Big data papers published by `Google`
    1. [Big Data beyond MapReduce: Google's Big Data papers](http://blog.mikiobraun.de/2013/02/big-data-beyond-map-reduce-googles-papers.html)
        \# Chinese translation: [The big data papers `Google` has published over the years](http://www.csdn.net/article/2013-02-28/2814298-google-bigdata-papers)
    1. [More Google Big Data papers: Megastore and Spanner](http://blog.mikiobraun.de/2013/03/more-google-papers-megastore-spanner-voted-commits.html)

Existing Resource Compilations
-------------------

1. [Distributed System resources](https://github.com/ty4z2008/Qix/blob/master/ds.md) by @ty4z2008
1. [Big Data Applications and Technologies - Introductory Resource Compilation](https://github.com/memect/hao/blob/master/awesome/learn-big-data.md) by @memect
1. [A detailed list by domain - Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
1. [The Hadoop Ecosystem Table](http://hadoopecosystemtable.github.io/)

Books
-------------------

My personally curated [Douban list of big data books](http://www.douban.com/doulist/40606671/)

Discussions & Introductions
-------------------

### Typical Technologies

- [A first look at big data: `Hadoop`, `Spark`, `Storm`](http://blog.csdn.net/hyj_13/article/details/43021357)
- [How would you describe the big data technology ecosystem with a vivid analogy? How do `Hadoop`, `Hive`, and `Spark` relate to each other?](http://www.zhihu.com/question/27974418)
- [Trends in big data processing: an introduction to five open source technologies](http://www.thebigdata.cn/QiTa/9698.html)
- [`Storm` vs. `Spark`: which is our real-time processing tool of choice?](http://developer.51cto.com/art/201412/460116.htm)

### `Data Scientist`

- [An introduction to the `Data Scientist` role](http://www.douban.com/note/247983915/)
- [Software engineer’s guide to getting started with data science](http://www.r-bloggers.com/software-engineers-guide-to-getting-started-with-data-science/)
- [How can an ordinary IT worker become a data scientist?](http://www.open-open.com/news/view/1787f79)
- [`R`, or `python` with `numpy`, `scipy`, and `pandas`: which would you choose?](http://www.zhihu.com/question/20388507)
- [Stanford Machine Learning](https://class.coursera.org/ml-003/lecture/index)
--------------------------------------------------------------------------------