├── README.md
├── archive-2014.md
└── awesome
    ├── bayesian-network-python.md
    ├── chinese-word-segmentation.md
    ├── chinese-word-similarity.md
    ├── computer-vision-dataset.md
    ├── crawler.md
    ├── dataset.md
    ├── deep-learning-introduction.md
    ├── entity-linking.md
    ├── fenci.md
    ├── health-data.md
    ├── image-cbr.md
    ├── imbalanced-data-classification.md
    ├── influential-user-social-network.md
    ├── learn-big-data.md
    ├── machine-learning-guide.md
    ├── machine-learning-reading.md
    ├── manifold-learning.md
    ├── mlss.md
    ├── multiclass-boosting.md
    ├── multitask-learning.md
    ├── nlp.md
    ├── ocr-tools.md
    ├── opendata-gbif.md
    ├── outlier-text-mining.md
    ├── phonetic_algorithm.md
    ├── piecewise-linear-regression.md
    ├── query-intent.md
    ├── question-answer.md
    ├── rdb-rdf.md
    ├── recurrent-neural-networks.md
    ├── reverse-proxy-load-balancer.md
    ├── semanticweb-dl
    ├── sparse-representation-cv.md
    ├── speech-recognition.md
    ├── stanford-cs224w.md
    └── test-recent.md


/README.md:
--------------------------------------------------------------------------------
 1 | # <img align="right" width=150 height=150 src="http://u.memect.com/shared/image/hao.png"/>  好东西传送门
 2 | [http://www.weibo.com/haoawesome](http://www.weibo.com/haoawesome)
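 3 | * [Introduction](README.md#introduction): [Question answering service](README.md#question-answering-service), [Subscription](README.md#subscription), [License](README.md#license)
 4 | * [Archive of answers and links](README.md#archive-of-answers-and-links)
 5 | * [Notices and statements](README.md#notices-and-statements)
 6 | 
 7 | 
 8 | ## Introduction
 9 | *好东西传送门* ("Good Stuff Portal") supports knowledge sharing on Weibo: it brings together helpful people and good resources from Weibo, helps you solve problems quickly, and curates specialist knowledge for you.
10 | * [Suggestions are welcome](https://github.com/memect/hao/issues/new)
11 | 
12 | ### Question answering service
13 | 1. Weibo users: [visit our Weibo page](http://www.weibo.com/haoawesome/)
14 |   * Post a question on Weibo and mention @好东西传送门
15 |   * Send a private message to 好东西传送门
16 |  
17 | 2. GitHub users:
18 |   * [Ask a question](https://github.com/memect/hao/issues/new)
19 |   * [Track the progress of questions](https://github.com/memect/hao/issues); you are welcome to claim questions that have not been answered yet
20 | 
21 | ### Subscription
22 | 1. Subscribe to the WeChat official account 好东西传送门 (it sends selected recommendations from 好东西传送门 and the 机器学习日报, "Machine Learning Daily")
23 | 
24 | <img width=150 height=150 src="http://u.memect.com/shared/image/hao-wechat.jpeg"/>
25 | 
26 | 2. [Subscribe to 好东西周报](http://haoweekly.memect.com/) (a mailing list with a weekly digest of Q&A and recommended resources, sent around every Friday)
27 | 
28 | ### License
29 | 
30 | Content on this site is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/)
31 | <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
32 | 
33 | 
34 | 
35 | ## Archive of answers and links
36 | 
37 | For the latest content, see the weekly newsletter 好东西周报 at [http://haoweekly.memect.com/](http://haoweekly.memect.com/), updated every week.
38 | 
39 | For content from before November 2014, see the [archive](https://github.com/memect/hao/blob/master/archive-2014.md).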
40 | 


--------------------------------------------------------------------------------
/awesome/bayesian-network-python.md:
--------------------------------------------------------------------------------
 1 | # A Hands-On Introduction to Bayesian Networks and Probabilistic Programming in Python
 2 | contributors: @西瓜大丸子汤 @王威廉 @不确定的世界2012 @Rebecca1020
 3 | 
 4 | 
 5 | ## [1. Introductory lecture slides on Bayesian networks](http://bigdata.memect.com/?tag=hao71)
 6 | 
 7 | http://www.cs.cmu.edu/~epxing/Class/10708/lectures/lecture2-BNrepresentation.pdf Directed Graphical Models: Bayesian Networks
 8 | * Recommended by 王威廉
 9 | 
10 | http://www.ee.columbia.edu/~vittorio/Lecture12.pdf Inference and Learning in Bayesian Networks 
11 | 
12 | http://courses.cs.washington.edu/courses/cse515/09sp/slides/bnets.pdf Bayesian networks
13 | 
14 | 
15 | ## [2. Hands-on introduction with Python](http://python.memect.com/?tag=hao71)
16 | 
17 | [Bayesian Methods for Hackers](http://python.memect.com/?p=6737)  a book with 6000+ stars on GitHub
18 | * Recommended by 西瓜大丸子汤
19 | * 小猴机器人 recommends a Chinese-language introduction written by 张天雷, 《概率编程语言与贝叶斯方法实践》 ("Probabilistic Programming Languages and Bayesian Methods in Practice"): http://www.infoq.com/cn/news/2014/07/programming-language-bayes
20 | 
21 | 
22 | [Frequentists and Bayesians series](http://python.memect.com/?tag=fb-series)  four blog posts
23 | 
24 | [PyMC tutorial](http://python.memect.com/?p=8536)  pretty short
25 | 
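Before working through the resources above, it may help to see how small a Bayesian network can be. The sketch below does exact inference by enumeration on the textbook rain / sprinkler / wet-grass network. It is a plain-Python illustration and does not use PyMC; the network and its probabilities are illustrative assumptions, not taken from any of the linked material.

```python
from itertools import product

# Conditional probability tables for a three-node network:
# Rain -> Sprinkler, and (Sprinkler, Rain) -> GrassWet.
# All numbers are illustrative assumptions.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {
    True: {True: 0.01, False: 0.99},   # P(sprinkler | rain=True)
    False: {True: 0.4, False: 0.6},    # P(sprinkler | rain=False)
}
P_WET = {                              # P(wet=True), keyed by (sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability as the product of each node given its parents."""
    p_wet = P_WET[(sprinkler, rain)]
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain_given_wet():
    """P(rain=True | wet=True), summing the joint over the hidden sprinkler node."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(posterior_rain_given_wet(), 4))
```

Enumeration only scales to tiny networks. The MCMC-based tools covered in the book and tutorial above are the usual route once a model has more than a handful of variables.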
26 | 
27 | ----
28 | 
29 | ## Supplementary material
30 | ### Hands-on introduction with R
31 | 
32 | http://site.douban.com/182577/widget/notes/12817482/note/273585095/ Implementing Bayesian networks in R
33 | 
34 | 
35 | ### Further reading
36 | http://bayes.cs.ucla.edu/BOOK-2K/index.html Causality: Models, Reasoning, and Inference
37 | * Book by Judea Pearl: http://en.wikipedia.org/wiki/Judea_Pearl
38 | 
39 | http://www.biostat.jhsph.edu/~cfrangak/papers/preffects.pdf Principal Stratification in Causal Inference - Biostatistics (2002)
40 | * Don Rubin
41 | * Recommended by Rebecca1020
42 | 
43 | http://www.cs.cmu.edu/~epxing/Class/10708/lecture.html Probabilistic Graphical Models by Eric Xing (CMU)
44 | * Recommended by 王威廉
45 | 
46 | 
47 | http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage Bayesian Reasoning and Machine Learning by David Barber
48 | * Recommended by @诸神善待民科组 (easier to read than Koller's PGM; a plus is that it has many figures)
49 | 
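50 | ### Related Weibo posts
51 | 
52 | @王威廉: Professor Eric Xing of CMU's Machine Learning Department has been teaching Probabilistic Graphical Models for ten years now, and this semester seems to be the first time the videos have been put online: http://t.cn/zTh9OqO This semester's course has only just started.
53 | January 23, 15:21
54 | http://weibo.com/1657470871/AtrlldqAU
55 | 
56 | @西瓜大丸子汤: Another recommendation, a book I have been reading lately: Probabilistic Programming and Bayesian Methods for Hackers. It explains a range of probabilistic inference methods in Python, with code to back everything up. Built on the PyMC package, it dissects concepts and applications such as MCMC, the law of large numbers, and financial analysis. It already has 5,000 stars on GitHub.
57 | July 8, 20:06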
58 | http://weibo.com/1932835417/BcKj0k0Wx
59 | 
60 | 
61 | 
62 | @不确定的世界2012: [Implementing Bayesian networks in R, part 1: gRain (1)] This article introduces some R tools for working with Bayesian networks. A Bayesian network, also called a belief network or probabilistic directed acyclic graphical m... http://t.cn/zToro0U
63 | 2013-7-2 19:52
64 | http://weibo.com/1768506843/zEfzDsln9
65 | 
66 | 
67 | 
68 | 
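69 | @Rebecca1020: Causal inference in the USA is split into two major schools. Statistics alone cannot really do causal inference, so to obtain any inference you have to add assumptions, and different assumptions gave rise to the two schools. In the west, centered on Berkeley, Jordan and colleagues work with Bayesian networks, using directed graphs to represent causal relationships. In the east, Rubin proposed principal stratification in 2003 and uses it as the main assumption for statistical inference.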
70 | 2013-4-11 02:44
71 | http://weibo.com/1669820502/zrFNJv8DI
72 | 
73 | 张天雷 provides a Chinese-language introduction, 《概率编程语言与贝叶斯方法实践》 //@小猴机器人: Here, have an introduction in Chinese: http://t.cn/RPwbEPz
74 | http://www.weibo.com/5220650532/BmkyPihT4
75 | 


--------------------------------------------------------------------------------
/awesome/chinese-word-segmentation.md:
--------------------------------------------------------------------------------
 1 | 
 2 | ## Overview
 3 | http://www.zhihu.com/question/19929473
 4 | 
 5 | ## Topics
 6 | 
 7 | http://www.52nlp.cn/%E4%B8%AD%E6%96%87%E5%88%86%E8%AF%8D%E5%85%A5%E9%97%A8%E4%B9%8B%E8%B5%84%E6%BA%90  Chinese word segmentation
 8 | 
 9 | http://www.google.com/patents/US20100306139 Google patent on Chinese name recognition
10 | 
11 | https://github.com/fxsjy/jieba Chinese word segmentation
12 | 
13 | http://nlp.stanford.edu/software/CRF-NER.shtml  Stanford Named Entity Recognizer
14 | 


--------------------------------------------------------------------------------
/awesome/chinese-word-similarity.md:
--------------------------------------------------------------------------------
  1 | # Methods and Tools for Computing Semantic Similarity of Chinese Words (Chinese Word Similarity)
  2 | contributors: 
  3 |    杜振东_java, 
  4 |    刘知远THU, 
  5 |    昊奋, 
  6 |    算文解字, 
  7 |    Mr_UnderWaterrrrrr, 
  8 |    朱鉴, 
  9 |    董力at北航, 
 10 |    尘绳聋-SYSU, 
 11 |    西瓜大丸子汤
 12 | 
 13 | card list:  http://hao.memect.com/?tag=ChineseWordSimilarity
 14 | more to read:
 15 |  * word2vec: http://bigdata.memect.com/?tag=word2vec
 16 |  * GloVe:  http://hao.memect.com/?s=glove
 17 |  * Explicit Semantic Analysis (ESA):  http://nlp.memect.com/?tag=esa
 18 |  * python gensim: http://nlp.memect.com/?tag=gensim
 19 | 
 20 | 
 21 | discussion:  https://github.com/memect/hao/issues/67
 22 | 
 23 | https://github.com/memect/hao/blob/master/awesome/chinese-word-similarity.md
 24 | 
 25 | ## readings
 26 | ### word2vec
 27 | https://github.com/danielfrg/word2vec
 28 | 
 29 | http://radimrehurek.com/2013/09/deep-learning-with-word2vec-and-gensim/
 30 | 
 31 | http://radimrehurek.com/2014/02/word2vec-tutorial/
 32 | 
 33 | @Mr_UnderWaterrrrrr :
 34 | http://t.cn/8Fc67pF How to train word2vec on a Chinese corpus and obtain the distances between words
 35 | http://www.weibo.com/1969853791/Atq0vz18S
 36 | 
 37 | @朱鉴 :
 38 | LDA or Word2Vec: http://t.cn/8DkHrFg
 39 | http://www.weibo.com/1656097544/AiJDZbfQ5
 40 | 
 41 | @朱鉴 :
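 42 | I have spent the last couple of days looking at Google's word2vec. So far Google's version still seems the easier one to understand, with its emphasis on the algorithm. The idea resembles a latent factor model: assume that any word can be represented by latent factors, then train those latent factors with SGD. A very nice assumption!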
 43 | http://www.weibo.com/1656097544/AhM49jMYL
 44 | 
 45 | ### glove 
 46 | http://stanford.edu/~jpennin/papers/glove.pdf  Richard Socher, EMNLP2014, GloVe: Global Vectors for Word Representation
 47 | 
 48 |  “Word similarity. 
 49 |  While the analogy task is our 
 50 |  primary focus since it tests for interesting vector
 51 |  space substructures, we also evaluate our model on
 52 |  a variety of word similarity tasks in Table 3. These
 53 |  include WordSim-353 (Finkelstein et al., 2001),
 54 |  MC (Miller and Charles, 1991), RG (Rubenstein
 55 |  and Goodenough, 1965), SCWS (Huang et al.,
 56 |  2012), and RW (Luong et al., 2013)”
 57 | 
 58 | http://blog.csdn.net/adooadoo/article/details/38505497 A hands-on introduction to GloVe
 59 | 
 60 | http://nlp.stanford.edu/projects/glove/
 61 | 
 62 | http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes Improving Word Representations Via Global Context And Multiple Word Prototypes  earlier work 
 63 | 
 64 | 
 65 | @杜振东_java :
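 66 | Late at night I finally finished writing 《glove入门实战》 ("A Hands-On Introduction to GloVe"). Here are two figures showing clustering results obtained with GloVe; for details see http://t.cn/RP0xXNx , and the code is at http://t.cn/RP0xOx0 . Thanks to Professor @刘知远THU for the information about GloVe and to @张成_ICT for the help; also tagging Professor @夏睿 and Professor @章成志
 67 | http://www.weibo.com/1247953577/BhRfpyyJw
 68 | 
 69 | @刘知远THU :
 70 | Stanford's Richard Socher has a new paper at EMNLP 2014, GloVe: Global Vectors for Word Representation. At first glance it incorporates ideas from algorithms such as LSA and uses global word co-occurrence information to improve word vector learning. Very interesting: on the word analogy task, accuracy is 11% higher than word2vec. http://t.cn/RPohHyc
 71 | http://www.weibo.com/1464484735/BhbLD70wa
 72 | 
 73 | @董力at北航 :
 74 | Yoav Goldberg wrote up an evaluation. The rough conclusion is that in a fair comparison GloVe and word2vec perform about the same, and the gap is nowhere near the claimed 11%. Link: http://t.cn/RP0gMXB
 75 | http://www.weibo.com/1895401411/BhVDWofI5
 76 | 
 77 | 
 78 | @康积华_绩点侠: Richard Socher has a 2012 paper that uses neural networks for this, Improving Word Representations Via Global Context And Multiple Word Prototypes; from there he went on to use deep learning extensively for these tasks. His homepage is worth a look. (today 08:03)
 79 | * http://www.socher.org/uploads/Main/HuangSocherManning_ACL2012.pdf
 80 | http://www.weibo.com/5220650532/BnmMGBraU
 81 | 
 82 | 
 83 | @刘知远THU, reposted on 2014-11-23 10:09
 84 | Many students are following GloVe and word2vec and are curious about how they are similar and how they differ. 史天泽, an undergraduate in our department, used the result from the NIPS paper Neural Word Embedding as Implicit Matrix Factorization to carry out a simple analysis and experimental check of the two optimization objectives. The conclusions are written up as Linking GloVe with word2vec and posted on arXiv for reference; discussion and suggestions are welcome. http://t.cn/RzyMrkm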
 85 | 
 86 | 
 87 | ### ESA (Explicit Semantic Analysis)
 88 | 
 89 | http://en.wikipedia.org/wiki/Explicit_semantic_analysis
 90 | 
 91 | http://www.cs.technion.ac.il/~gabr/papers/ijcai-2007-sim.pdf Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, (2007) IJCAI
 92 | 
 93 | 
 94 | @刘知远THU : 
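 95 | You could consider traditional distributional representation/similarity methods, that is, represent each keyword by the words that occur in its contexts and build a classifier on top. Or use explicit semantic analysis (ESA), which represents a keyword by its occurrences across Wikipedia articles. Both should be more discriminative than LDA topic distributions.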
 96 | http://www.weibo.com/1464484735/BfMxEh40q
 97 | 
 98 | @昊奋 : 
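 99 | For ESA, if you rely on Wikipedia alone, the Chinese Wikipedia corpus is much smaller than the English one, so it does not deliver the high coverage that ESA depends on; you need to process 百度百科 (Baidu Baike) or 互动百科 (Hudong Baike) yourself. We will consider using zhishi.me to offer an ESA service for everyone.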
100 | http://www.weibo.com/2045933955/BhWfr2LYv
101 | 
102 | @Siegfried围脖:
103 | We are doing similar work as well. Put simply, we use topic learning to fill the gaps in existing concept systems...
104 | http://www.weibo.com/1578099090/Bj2N9kyhc?mod=weibotime
105 | 
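The distributional approach that @刘知远THU describes above (represent a word by the words in its contexts, then compare representations) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the toy corpus is hypothetical and already segmented, the window size is arbitrary, and raw co-occurrence counts stand in for the weighting schemes a real system would use.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy, pre-segmented corpus (hypothetical sentences, for illustration only).
corpus = [
    ["我", "喜欢", "吃", "苹果"],
    ["我", "喜欢", "吃", "香蕉"],
    ["他", "喜欢", "开", "汽车"],
]
vectors = context_vectors(corpus)
print(cosine(vectors["苹果"], vectors["香蕉"]))  # words sharing all their contexts
print(cosine(vectors["苹果"], vectors["汽车"]))  # words sharing only some contexts
```

On real text the input would first pass through a word segmenter, and ESA would swap the context-word dimensions for encyclopedia articles while keeping the same cosine comparison.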
106 | ### python gensim
107 | 
108 | https://github.com/piskvorky/gensim/
109 | 
110 | @算文解字 :
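111 | Distribution-based: Python gensim is usually enough. It includes models built on the traditional bag-of-words (one-hot) vector representation, several common similarity measures, and the recent word2vec.
112 | Resource-based: there is no free WordNet-like resource for Chinese, and HowNet requires payment. One free resource that may help, though, is HIT's extended edition of 同义词词林 (Tongyici Cilin).
113 | 
114 | @西瓜大丸子汤 : 
115 | Speaking of Python optimization just now, here is a concrete example. The author of gensim applied several classic optimizations to word2vec (deep learning): loops, NumPy/BLAS, Cython, and multithreading (it really works). The result was a speedup of over a thousand times, ending up 3 times faster than the original C version open-sourced by Google. He also recently wrote a word2vec tutorial. Whether you are learning word2vec or Python optimization, it is a must-read: http://t.cn/Rvkt0Hk
116 | http://www.weibo.com/1932835417/BcSwEc2iu
117 | 
118 | @尘绳聋-SYSU: It bugs me that sklearn has no LDA/LSA, but luckily there is the handy gensim: http://t.cn/8k2M2tU PS. Python makes NLP so convenient!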
119 | http://www.weibo.com/1254062861/B8WGG8Yii
120 | 
121 | 
122 | ### more readings
123 | http://cs.tju.edu.cn/szdw/jsfjs/fengwei/papers/ICASSP2013_Nie/icassp2013.pdf MEASURING SEMANTIC SIMILARITY BY CONTEXTUAL WORD CONNECTIONS IN CHINESE NEWS STORY SEGMENTATION
124 | 
125 | 
126 | http://www.cs.york.ac.uk/semeval-2012/task4/  Peng Jin, Yunfang Wu,  Evaluating Chinese Word Similarity
127 | 
128 | 


--------------------------------------------------------------------------------
/awesome/computer-vision-dataset.md:
--------------------------------------------------------------------------------
 1 | # A Non-Exhaustive Collection of Computer Vision Datasets
 2 | contributors: @丕子 @邹宇华 @李岩ICT人脸识别 @网路冷眼 @王威廉 @金连文 @数据堂  zhubenfulovepoem@cnblog
 3 | 
 4 | created: 2014-09-24
 5 | 
 6 | keywords: computer vision,  dataset
 7 | 
 8 | discussion: https://github.com/memect/hao/issues/222
 9 | 
10 | 
11 | ## Classic and popular computer vision datasets
12 | * http://yann.lecun.com/exdb/mnist/ The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. Collected by Yann LeCun, Corinna Cortes, Christopher J.C. Burges
13 | * http://www.cs.toronto.edu/~kriz/cifar.html CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. 
14 | * http://en.wikipedia.org/wiki/Caltech_101 Caltech 101 is a data set of digital images created in September, 2003, compiled by Fei-Fei Li, Marco Andreetto, Marc'Aurelio Ranzato and Pietro Perona at the California Institute of Technology. It is intended to facilitate Computer Vision research and techniques. It is most applicable to techniques involving recognition, classification, and categorization. 
15 | * http://www.image-net.org/ ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Competitions at CVPR in recent years have used this dataset for evaluation
16 | * http://yahoolabs.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images-for Recommended by @网路冷眼: Yahoo Labs releases 100 million Flickr images and videos for research use. One Hundred Million Creative Commons Flickr Images for Research
17 | * http://sourceforge.net/projects/oirds/ Overhead Imagery Research Data Set (OIRDS) - an annotated data library & tools to aid in the development of computer vision algorithms
18 | 
19 | ## Computer vision datasets: directories
20 | * http://riemenschneider.hayko.at/vision/dataset/ Recommended by @邹宇华: a fairly new index site for computer vision datasets, Yet Another Computer Vision Index To Datasets (YACVID), with over 200 datasets
21 | * http://www.computervisiononline.com/datasets   Recommended by @丕子 and Richard Szeliski; over a hundred datasets
22 | * http://www.cvpapers.com/datasets.html Over a hundred datasets
23 | * http://datasets.visionbib.com/  Recommended by Richard Szeliski; categorized
24 | * http://homepages.inf.ed.ac.uk/rbf/CVonline/  Recommended by @李岩ICT人脸识别 and Richard Szeliski; categorized
25 | * http://blog.csdn.net/zhubenfulovepoem/article/details/7191794 Compiled by [zhubenfulovepoem](http://my.csdn.net/zhubenfulovepoem) (cnblog) from Computer Vision: Algorithms and Applications by Richard Szeliski
26 | 
27 | * http://vision.ucsd.edu/datasetsAll  UCSD datasets
28 | * http://www-cvr.ai.uiuc.edu/ponce_grp/data/ UIUC Datasets
29 | * http://www.vcipl.okstate.edu/otcbvs/bench/ OTCBVS Datasets
30 | * http://www.nicta.com.au/research/projects/AutoMap/computer_vision_datasets  Recommended by @数据堂: NICTA Pedestrian Dataset (pedestrian database from National ICT Australia); paper: http://www.nicta.com.au/pub?doc=1245
31 | * http://clickdamage.com/sourcecode/cv_datasets.php Dozens of datasets, categorized
32 | * http://www.iapr-tc11.org/mediawiki/index.php/Datasets_List Recommended by @金连文: the official IAPR TC11 site has many datasets related to document processing, such as online and offline handwriting data, text, and document images in natural scenes
33 | * http://en.wikipedia.org/wiki/Category:Datasets_in_computer_vision Wikipedia's list, covering a few classic datasets
34 | * http://webscope.sandbox.yahoo.com/catalog.php?datatype=i  Recommended by @王威廉
35 | 
36 | 
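37 | ## Computer vision datasets: face recognition: directories
38 | * http://www.face-rec.org/databases/  Dozens of datasets
39 | * http://en.wikipedia.org/wiki/Comparison_of_facial_image_datasets  A comparison table of 11 datasets
40 | 
41 | ## Basic strategy
42 | You can usually start from relevant papers or competitions and follow the trail to the datasets; sometimes you also need to contact the original authors. ICCV and CVPR should both offer some leads.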
43 | 


--------------------------------------------------------------------------------
/awesome/crawler.md:
--------------------------------------------------------------------------------
 1 | # Web Crawler Resources
 2 | 
 3 | ## Concepts
 4 | http://en.wikipedia.org/wiki/Web_crawler A Web crawler is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. A Web crawler may also be called a Web spider, an ant, an automatic indexer, or (in the FOAF software context) a Web scutter.
 5 | 
 6 | http://zh.wikipedia.org/zh-cn/%E7%B6%B2%E8%B7%AF%E8%9C%98%E8%9B%9B (Chinese Wikipedia) A Web spider, also called a Web crawler, an ant, an automatic indexer, or (in the FOAF software context) a Web scutter, is a program that "browses the Web automatically", in other words a kind of Internet bot. Such programs are widely used by search engines and similar sites to fetch or update site content and indexes. They can automatically collect every page they are able to access so that the search engine can process it further (sorting and organizing the downloaded pages), letting users find the information they need more quickly.
 7 | 
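 8 | ## Basic crawler architecture and the simplest examples
 9 | 
10 | ![](http://upload.wikimedia.org/wikipedia/commons/thumb/d/df/WebCrawlerArchitecture.svg/300px-WebCrawlerArchitecture.svg.png)
11 | * URL list (queue): a table holding a set of URLs. It must be seeded initially, and after each iteration the unvisited URLs are added to it. It needs deduplication; more advanced crawlers also have to avoid crawler traps.
12 | * Scheduler: picks URLs from the queue and calls the downloader at a configured rate, either sequentially or concurrently. The simplest implementation is a for loop.
13 |   * Be sure to honor [robots.txt, the robots exclusion standard](http://en.wikipedia.org/wiki/Robots_exclusion_standard).
14 | * Downloader: given a URL, downloads the page content and its metadata (HTTP headers) and writes them to storage. Open-source HTTP client implementations are generally available.
15 |   * Link extractor: parses the page text, extracts URLs, and finally writes them to the queue. It can be implemented with string matching, regular expressions, or an HTML/XML parser.
16 | * Storage: keeps both the page content (text, images, ...) and the metadata recorded at download time (URL, download time, file size, server-side last-modified time, ...).
17 | 
18 | Here are two very simple runnable code samples: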
19 | * https://cs.nyu.edu/courses/fall02/G22.3033-008/WebCrawler.java  - java 
20 | * https://github.com/kezakez/python-web-crawler  - python
21 | 
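The components listed above (queue, scheduler, downloader, link extractor, storage) can also be sketched with nothing but the Python standard library. This is a minimal single-threaded illustration, not one of the linked samples: the names `crawl`, `download`, and `LinkExtractor`, the `allowed` hook, and the default limits are all assumptions made for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen
import time


class LinkExtractor(HTMLParser):
    """Link extractor: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def download(url):
    """Downloader: returns (content, headers) for a URL using urllib."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace"), dict(response.headers)


def crawl(seeds, fetch=download, allowed=lambda url: True, max_pages=10, delay=0.0):
    """Scheduler: a plain loop over the queue, one page at a time."""
    queue = deque(seeds)   # URL list, seeded up front
    seen = set(seeds)      # deduplication of URLs already queued
    storage = {}           # url -> page content plus download metadata
    while queue and len(storage) < max_pages:
        url = queue.popleft()
        if not allowed(url):   # hook for a robots.txt check
            continue
        try:
            content, headers = fetch(url)
        except OSError:        # urllib's URLError and HTTPError are OSError subclasses
            continue
        storage[url] = {"content": content, "headers": headers, "fetched_at": time.time()}
        parser = LinkExtractor()
        parser.feed(content)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url   # absolute URL without #fragment
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)      # crude rate limiting between requests
    return storage
```

To honor robots.txt, the `allowed` hook can wrap `urllib.robotparser.RobotFileParser` (call `set_url` and `read`, then `can_fetch(useragent, url)`). The `max_pages` cap is only a blunt guard against crawler traps; a production crawler would add per-host limits and URL normalization.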
22 | 
23 | ## Advanced lecture notes
24 | * http://www.slideshare.net/denshe/icwe13-tutorial-webcrawling
25 | 
26 | ## Open-source tools
27 | * http://java-source.net/open-source/crawlers Open Source Crawlers in Java
28 | * http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers Open-source crawlers
29 | 


--------------------------------------------------------------------------------
/awesome/dataset.md:
--------------------------------------------------------------------------------
 1 | ## dataset catalogs
 2 | 
 3 | https://snap.stanford.edu/data/ Stanford Large Network Dataset Collection
 4 | 
 5 | http://www.rdatamining.com/resources/data Free Datasets for R
 6 | 
 7 | http://aws.amazon.com/publicdatasets/
 8 | 
 9 | http://catalog.data.gov/dataset 
10 | 
11 | http://data.worldbank.org/
12 | 
13 | http://www.infochimps.com/datasets/
14 | 
15 | http://ckan.org/instances/#
16 | 
17 | http://archive.ics.uci.edu/ml/datasets.html
18 | 
19 | http://www.kdnuggets.com/datasets/index.html
20 | 
21 | 
22 | ## individual datasets
23 | https://developers.google.com/freebase/data Freebase
24 | 
25 | https://archive.org/details/stackexchange Stack Exchange data dump (includes Stack Overflow)
26 | 
27 | http://commoncrawl.org/data/accessing-the-data/ Common Crawl
28 | 
29 | http://km.aifb.kit.edu/projects/btc-2012/ Billion Triple Challenge (including DBpedia, DBLP, tumbler ...)
30 | 


--------------------------------------------------------------------------------
/awesome/deep-learning-introduction.md:
--------------------------------------------------------------------------------
  1 | ## Introductory and Survey Material on Deep Learning
  2 | 
  3 | contributors:  @自觉自愿来看老婆微博 @邓侃 @星空下的巫师
  4 | 
  5 | created: 2014-09-16
  6 | 
  7 | 
  8 | ## Getting started
  9 | http://en.wikipedia.org/wiki/Deep_learning Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations.
 10 | 
 11 | 
 12 | ### Short popular-science articles
 13 | 
 14 | http://cacm.acm.org/magazines/2013/6/164601-deep-learning-comes-of-age/abstract Deep Learning Comes of Age
 15 | * Gary Anthes. 2013. Commun. ACM 56, 6 (June 2013). PDF download: http://phdtree.org/pdf/29093526-deep-learning-comes-of-age/
 16 | * Jointly recommended by @星空下的巫师 and @自觉自愿来看老婆微博
 17 | 
 18 | <img width=100 src="http://cacm.acm.org/system/assets/0001/1870/052213_CACMpg13_Deep-Learning2.large.jpg?1369232776&1369232776">
 19 | 
 20 | http://www.datarobot.com/blog/a-primer-on-deep-learning/ A Primer on Deep Learning (2014)
 21 | 
 22 | <img width=100 src= "https://s3.amazonaws.com/datarobotblog/images/deepLearningIntro/009.png">
 23 | 
 24 | ### Hands-on introductions through programming
 25 | * http://deeplearning.net/tutorial/gettingstarted.html  Getting Started (learn the basic concepts by programming in Python)
 26 | * http://karpathy.github.io/neuralnets/  Neural networks explained from a distinctive angle (JavaScript, ConvNetJS)
 27 | 
 28 | 
 29 | ### Introductory guides
 30 | 
 31 | http://deeplearning.net/tutorial/ Deep Learning Tutorials 
 32 | * [600+ stars on GitHub](https://github.com/lisa-lab/DeepLearningTutorials) 
 33 | 
 34 | 
 35 | http://neuralnetworksanddeeplearning.com/index.html  Michael Nielsen (2014); explains the concepts in great detail
 36 | * Also recommended by @自觉自愿来看老婆微博
 37 | 
 38 | 邓侃's Deep Learning series
 39 | * http://blog.sina.com.cn/s/blog_46d0a3930101fswl.html  Deep Learning and Knowledge Graph ignite the big data revolution
 40 | * http://blog.sina.com.cn/s/blog_46d0a3930101gs5h.html Deep Learning [2, 3]
 41 | * http://blog.sina.com.cn/s/blog_46d0a3930101h6nf.html  Deep Learning tutorial translation
 42 | 
 43 | http://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckdqtpe  Professor Michael Jordan of Berkeley on deep learning, with [study notes](http://www.weibo.com/5220650532/BmtY3eXDx) attached
 44 | 1. Layers, parallelism, and ensembles are useful; we should not confine ourselves to imitating how the human brain thinks
 45 | 2. Backpropagation is the key; at heart it is supervised learning
 46 | 3. Many success stories come from large-scale samples plus supervised learning
 47 | 4. It rarely comes up in industry consulting, where many other problems arise (7 examples)
 48 | 5. Machine learning is more than AI; it also needs to get closer to systems and databases
 49 | 
 50 | 
 51 | ## Surveys and branches
 52 | Note that vision, text, and speech all use deep learning, but not in quite the same way.
 53 | 
 54 | 
 55 | http://research.microsoft.com/pubs/204048/APSIPA-Trans2013-revised-final.pdf
 56 | Li Deng, A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning, in APSIPA Transactions on Signal and Information Processing, Cambridge University Press, 2014
 57 | * There is also a hefty monograph: http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf  Deep Learning: Methods and Applications, Li Deng and Dong Yu
 58 | 
 59 | 
 60 | 
 61 | ### Text NLP
 62 | http://nlp.stanford.edu/courses/NAACL2013/ Deep Learning for Natural Language Processing (without Magic)
 63 | * Area: natural language processing (NLP), mainly text
 64 | 
 65 | ### Speech NLP
 66 | http://research.microsoft.com/pubs/217165/ICASSP_DeepTextLearning_v07.pdf  Deep learning for natural language processing and related applications (Tutorial at ICASSP)
 67 | * Xiaodong He, Jianfeng Gao, and Li Deng
 68 | * Area: natural language processing (NLP), mainly speech but also text
 69 | * spoken language understanding (SLU), machine translation (MT), and semantic information retrieval (IR) from text.
 70 | 
 71 | ### Computer Vision
 72 | https://sites.google.com/site/deeplearningcvpr2014/  TUTORIAL ON DEEP LEARNING FOR VISION
 73 | * Computer vision,  CVPR 2014 Tutorial 
 74 | * Area: computer vision
 75 | * cardbox  http://bigdata.memect.com/?tag=cvpr2014+vision
 76 | 
 77 | 
 78 | Yann LeCun's Lecture on Computer Perception with Deep Learning in Course 9.S912: "Vision and learning - computers and brains", Nov 12, 2013:
 79 | * Part1: http://techtv.mit.edu/videos/26739-yann-lecun-computer-perception-with-deep-learning-part-1
 80 | * Part2: http://techtv.mit.edu/videos/26740-yann-lecun-computer-perception-with-deep-learning-part-2
 81 | * Area: computer vision
 82 | 
 83 | 
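 84 | ## Earlier related recommendations
 85 | 
 86 | * https://github.com/memect/hao/issues/90 Articles and material on deep learning for recommendation
 87 | * https://github.com/memect/hao/issues/39 Resources on machine learning and deep learning in multimedia information retrieval
 88 | * https://github.com/memect/hao/issues/31 Deep learning or machine learning in image retrieval
 89 | * https://github.com/memect/hao/issues/181 Image retrieval material related to deep learning
 90 | * https://github.com/memect/hao/issues/30 Good resources on deep learning for text mining or natural language processing (NLP)
 91 | * https://github.com/memect/hao/issues/168  Michael Jordan on deep learning
 92 | * https://github.com/memect/hao/issues/184  Introduction to deep learning
 93 | * https://github.com/memect/hao/issues/190  Deep learning toolboxes
 94 | 
 95 | ## Computing tools
 96 | ### theano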
 97 | <img src="https://cloud.githubusercontent.com/assets/8302062/4296833/99106f56-3dfe-11e4-9437-10c29aefee67.jpg"/>
 98 | 
 99 | ### caffe
100 | <img  src="http://emma.memect.com/t/6d9fcce0c36ac740b5c9ebddfb6b89b2ad308408a026671cfbd27657dde4439f/caffe.jpg"/>
101 | 
102 | ### Torch-7
103 | <img  src="http://emma.memect.com/t/101449380b840a422860b5a2be6524394d646f812417e4315fb100369ca4b169/torch7.jpg"/>
104 | 
105 | 
106 | ### matlab deeplearning toolbox
107 | <img height=400 src="http://emma.memect.com/t/b8398867d7e5d7184192306fd2c19da0ceee23eec8357be34bd6184d6dceabec/content.jpg"/>
108 | 
109 | 


--------------------------------------------------------------------------------
/awesome/entity-linking.md:
--------------------------------------------------------------------------------
1 | # reading lists
2 | http://nlp.cs.rpi.edu/kbp/2014/elreading.html Entity linking paper reading list, by Heng Ji.
3 | 
4 | # tutorial 
5 | http://nlp.cs.rpi.edu/paper/wikificationtutorial.pdf ACL 2014 wikification tutorial by Dan Roth (UIUC), Heng Ji (RPI), Ming-Wei Chang (MSR), and Taylor Cassidy (ARL, IBM)
6 | 


--------------------------------------------------------------------------------
/awesome/fenci.md:
--------------------------------------------------------------------------------
 1 | Ansj Chinese word segmentation (Java)
 2 | 
 3 | http://t.cn/zWDqIRw
 4 | 
 5 | 结巴 (jieba) word segmentation (Python)
 6 | 
 7 | http://t.cn/zlfOaMU
 8 | 
 9 | C++ version of the 结巴 (jieba) Chinese word segmenter
10 | 
11 | http://t.cn/RPICG0o
12 | 
13 | Technical articles:
14 | 
15 | Basics (either one of these two is enough):
16 | 
17 | http://t.cn/RPICqae
18 | 
19 | http://t.cn/zHm2KHK
20 | 
21 | Common algorithms
22 | CRF
23 | 
24 | http://t.cn/RPIC5fy
25 | 
26 | HMM
27 | 
28 | http://t.cn/zOec8CW
29 | 
30 | Data structures
31 | 
32 | Trie
33 | 
34 | http://t.cn/RPIC5mA
35 | 
36 | Double-array trie
37 | 
38 | http://t.cn/ar6lK9
39 | 


--------------------------------------------------------------------------------
/awesome/health-data.md:
--------------------------------------------------------------------------------
 1 | Health statistics from international organizations
 2 | 
 3 | http://t.cn/8FDT5pG
 4 | 
 5 | http://t.cn/RPSIhDv
 6 | 
 7 | http://t.cn/RPSIhDZ
 8 | 
 9 | http://t.cn/RPSIhDP
10 | 
11 | US health statistics are scattered across various agencies
12 | 
13 | http://t.cn/RPSIhDh 
14 | 
15 | http://t.cn/RPSIhDz
16 | 
17 | http://t.cn/RPSIhD7 
18 | 
19 | Health statistics for China
20 | 
21 | http://t.cn/zYK9zeF
22 | 
23 | The University of Chicago has a page that collects some health statistics
24 | 
25 | http://t.cn/RPSIhDw
26 | 


--------------------------------------------------------------------------------
/awesome/image-cbr.md:
--------------------------------------------------------------------------------
 1 | http://www.openimaj.org/
 2 | 
 3 | http://www.openimaj.org/tutorial-pdf.pdf
 4 | 
 5 | https://code.google.com/p/lire/
 6 | 
 7 | http://demo-itec.uni-klu.ac.at/liredemo/
 8 | 
 9 | http://www.phash.org/
10 | 
11 | http://www.phash.org/docs/pubs/thesis_zauner.pdf
12 | 


--------------------------------------------------------------------------------
/awesome/imbalanced-data-classification.md:
--------------------------------------------------------------------------------
 1 | # Imbalanced Data Classification
 2 | 
 3 | contributors: AixinSG, 刘知远THU , xierqi , eacl_newsmth 
 4 | 
 5 | https://github.com/memect/hao/blob/master/awesome/imbalanced-data-classification.md
 6 | 
 7 | card list: http://bigdata.memect.com/?tag=imbalanceddataclassification
 8 | 
 9 | discussion: https://github.com/memect/hao/issues/47
10 | 
11 | keywords:
12 |   Positive only,
13 |   Imbalanced data,
14 |   classification,
15 |   
16 |   
17 | ## readings
18 | 
19 | ### survey
20 | http://www.cs.cmu.edu/~qyj/IR-Lab/ImbalancedSummary.html  Yanjun Qi, A Brief Literature Review of Class Imbalanced Problem
21 | (2004)
22 | 
23 | ### classic
24 | http://homes.cs.washington.edu/~pedrod/papers/kdd99.pdf  (recommended by @xierqi) Domingos, MetaCost: A General Method for Making Classifiers Cost-Sensitive, KDD 1999
25 | 
26 | https://www.jair.org/media/953/live-953-2037-jair.pdf  SMOTE: Synthetic Minority Over-sampling Technique (2002) JAIR
27 | 
28 | 
29 | http://cseweb.ucsd.edu/~elkan/posonly.pdf  Learning Classifiers from Only Positive and Unlabeled Data (2008)
30 | 
31 | http://www.ele.uri.edu/faculty/he/PDFfiles/ImbalancedLearning.pdf Haibo He,  Edwardo A. Garcia . (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284.
32 | 
33 | http://www.computer.org/csdl/proceedings/icnc/2008/3304/04/3304d192-abs.html Guo, X., Yin, Y., Dong, C., Yang, G., & Zhou, G. (2008). On the Class Imbalance Problem. 2008 Fourth International Conference on Natural Computation (pp. 192-201).
34 | 
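For readers who want a feel for what SMOTE does before reading the paper listed above, here is a minimal sketch of its core step: pick a minority sample, pick one of its nearest minority neighbours, and create a synthetic point somewhere on the segment between them. The function name, the brute-force neighbour search, and the toy points are assumptions for this example; the SMOTE-based tool linked in the tools section is the practical choice.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating toward minority-class neighbours.

    `minority` is a list of at least two equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Brute-force k nearest minority neighbours of `base`, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy two-dimensional minority class (hypothetical points).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority points, the method fills in the minority region rather than duplicating samples, which is the contrast with plain random oversampling.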
35 | 
36 | 
37 | ### current
38 | http://www.aclweb.org/anthology/P/P13/P13-2141.pdf (recommended by @eacl_newsmth)  Towards Accurate Distant Supervision for Relational Facts Extraction, ACL 2013
39 | 
40 | http://link.springer.com/article/10.1007/s10618-012-0295-5 Training and assessing classification rules with imbalanced data (2014) Data Mining and Knowledge Discovery
41 | 
42 | http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/viewFile/6353/6827 An Effective Approach for Imbalanced Classification: Unevenly Balanced Bagging (2013) AAAI
43 | 
44 | 
45 | 
46 | 
47 | ### further readings
48 | http://stackoverflow.com/questions/12877153/tools-for-multiclass-imbalanced-classification-in-statistical-packages
49 | 
50 | 
51 | ## tools
52 | 
53 | http://www.nltk.org/_modules/nltk/classify/positivenaivebayes.html   nltk
54 | 
55 | http://weka.wikispaces.com/MetaCost  Weka
56 | 
57 | http://tokestermw.github.io/posts/imbalanced-datasets-random-forests/ SMOTE
58 | 
59 | https://github.com/fmfn/UnbalancedDataset based on SMOTE
60 | 
61 | ## datasets
62 | 
63 | http://pages.cs.wisc.edu/~dpage/kddcup2001/  Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin
64 | 
65 | http://code.google.com/p/imbalanced-data-sampling/ Imbalanced Data Sampling Using Sample Subset Optimization
66 | 
67 | #### dataset list
68 | https://archive.ics.uci.edu/ml/datasets.html?format=&task=cla&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table  UCI dataset repo, classification category
69 | 
70 | http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html  dataset list
71 | 
72 | 
73 | ## discussion
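74 | ### @eastone01: datasets for imbalanced data classification https://github.com/memect/hao/issues/47
75 | 
76 | <b>Is there currently any artificial two-dimensional toy dataset for the imbalanced data classification task?</b>
77 | 
78 | AixinSG: Undersampling has limited effect overall, as far as I understand.
79 |  
80 | 刘知远THU: For imbalanced data classification, especially when there are very many labeled positives, almost no labeled negatives, and a large amount of unlabeled data, how should this be handled? The problem is very common in relation extraction. For now all we can do is randomly sample from the large pool of unlabeled data and treat the samples as negatives.
81 | 
82 | xierqi: I surveyed this area for a while. 90% of the work is sampling, and the biggest problem is that the evaluation methods do not fit real-world scenarios. I personally recommend Domingos's MetaCost: very practical, you just set the costs based on experience. http://t.cn/RPiexE9
83 | 
84 | eacl_newsmth: In relation extraction, are positives really that plentiful, with no negatives? My impression is that in many cases positives are limited and negatives are plentiful (though of course you could argue that negatives are actually hard to define)...
85 | 
86 | 刘知远THU: Reply to @eacl_newsmth: Take a knowledge graph: it can supply plenty of positives, but negatives have to be generated by randomly replacing an entity in a positive example, which makes it easy to treat examples that are actually correct as negatives.
87 | 
88 | eacl_newsmth: Reply to @刘知远THU: Yes, I guessed you would bring up that example, which is why I added that it depends on how you define negatives, haha. I struggled with this for a long time too, and eventually concluded that positives are still the scarce ones. Besides, can you really guarantee in many cases that the positives are correct?
89 | 
90 | 刘知远THU: Reply to @eacl_newsmth: The positives are basically correct, for example those from Freebase, but the negatives have a big effect on the results. :) The TransH model in an MSRA paper at this year's AAAI proposes a trick for selecting negatives that works remarkably well.
91 | 
92 | eacl_newsmth: Reply to @刘知远THU: Yes, the instances in the KB are indeed correct, but the samples found by searching massive document collections with those instances are not necessarily correct. Judging from current work, many approaches that focus on negatives do improve results somewhat; last year a student at 语言所 exploited properties of the "relation" to select better training samples, which also improved performance. But on this particular question, the reliability of the positives cannot be sidestepped.
93 | 
94 | 刘知远THU: Reply to @eacl_newsmth: Could you tell me the title of the paper you mentioned? What I am focusing on right now is not extracting relations from text but knowledge graph completion, which is somewhat like link prediction on a graph, except that the links to predict are relations of different types.
95 | 
96 | eacl_newsmth: Reply to @刘知远THU: http://t.cn/RPX75A3 Yes, I watched a talk by a young guy from your group, and it felt closely related to Sebastian's earlier work; maybe it was just how he presented it? When are you back in Beijing? We could have a proper discussion.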
97 | 
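The negative-generation step that 刘知远THU describes in the thread above (randomly replace an entity in a positive triple) can be sketched as follows. The triples, entity names, and the `corrupt_triples` helper are hypothetical, and this is plain uniform corruption with a known-positive filter, not the selection trick from the TransH paper.

```python
import random


def corrupt_triples(positives, entities, per_positive=1, seed=0, max_tries=100):
    """Generate negatives by replacing the head or the tail of each positive triple."""
    rng = random.Random(seed)
    known = set(positives)
    negatives = []
    for head, relation, tail in positives:
        made = 0
        for _ in range(max_tries):
            if made == per_positive:
                break
            replacement = rng.choice(entities)
            if rng.random() < 0.5:
                candidate = (replacement, relation, tail)   # corrupt the head
            else:
                candidate = (head, relation, replacement)   # corrupt the tail
            # Filter: a corrupted triple that is itself a known positive is not a negative.
            if candidate not in known:
                negatives.append(candidate)
                made += 1
    return negatives


# Hypothetical toy knowledge graph.
positive_triples = [
    ("Beijing", "capital_of", "China"),
    ("Paris", "capital_of", "France"),
    ("Beijing", "located_in", "China"),
]
entity_list = ["Beijing", "Paris", "China", "France"]
print(corrupt_triples(positive_triples, entity_list))
```

The filter only removes triples already present in the graph. A corrupted triple that is true but missing from the graph still ends up labeled negative, which is exactly the risk raised in the discussion.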
98 | 


--------------------------------------------------------------------------------
/awesome/influential-user-social-network.md:
--------------------------------------------------------------------------------
 1 | # Influential User Identification in Online Social Networks
 2 | 
 3 | contributors:  @唐小sin @善良的右行
 4 | 
 5 | discussion: https://github.com/memect/hao/issues/89
 6 | 
 7 | keywords:
 8 |  opinion leader (意见领袖),
 9 |  user influence,
10 |  influential spreaders,
11 |  influential user,
12 |  twitter ,
13 | 
14 | # Highlights from the Weibo discussion
15 | 
16 | 善良的右行：@好东西传送门 这几篇论文略旧……当然引用率是不用说的……貌似问题本质是重要节点挖掘……菜鸟冒泡一下……不知说的对不对…… (今天 14:45)
17 | 
18 | 好东西传送门：发现重要节点一直是社交网络研究的重要问题, 研究热点大约在2007~2010社交媒体蓬勃发展的时候, 2014年已经有influential user identification的综述了.鉴于这类研究的算法并不困难，但数据量较大且较难获得，研究前沿已经逐渐从学术界转移到工业界/创业应用。http://t.cn/RPQfWRW (52分钟前)
19 | 
20 | 唐小sin：的确是这样，现在social influence这块需要一个很好的问题去解，感觉就是做得太多很难入手。
21 | 
22 | 
23 | 
24 | 唐小sin：任何influence的文章都可以哪来读读，而至于意见领袖不妨看看twitterrank (今天 15:13)
25 | 
26 | 好东西传送门：回复@唐小sin: 这篇文章很不错哦， 还对比了TunkRank， Topic-sensitive PageRank (TSPR) (44分钟前)
27 | 
28 | 
29 | 善良的右行：@好东西传送门 惭愧，我也是菜鸟，当然很乐意共享：Identification of influentialspreaders in complex networks；Leaders in Social Networks, the Delicious Case； Absence of influential spreaders in rumor dynamics，都是牛人牛文……
30 | 
31 | 
32 | @好东西传送门: 回复@善良的右行: 这几个推荐文章都很好呀,第一篇引用率都快400了. 要不是了解领域,谁能想到这个关键词呢, influential spreaders . 意共享：Identification of influentialspreaders in complex networks；Leaders in Social Networks, the De
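
The discussion above mentions Topic-sensitive PageRank as a baseline for ranking influential users. The core idea can be sketched as PageRank with a non-uniform teleport vector. This is a minimal sketch under stated assumptions: the follower graph, the user names, and the function name `personalized_pagerank` are hypothetical, and the code is the generic personalized PageRank power iteration, not the TwitterRank algorithm from the paper.

```python
def personalized_pagerank(follows, teleport, damping=0.85, iterations=50):
    """Power iteration for PageRank with a (possibly non-uniform) teleport vector.

    follows:  dict mapping each user to the list of users they follow.
    teleport: dict mapping each user to a topic-relevance weight; weights sum to 1.
    """
    users = list(follows)
    rank = {u: 1.0 / len(users) for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) * teleport[u] for u in users}
        for u in users:
            targets = follows[u]
            if targets:
                # A user passes their score evenly to everyone they follow.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # A user who follows nobody returns their score via teleport.
                for v in users:
                    new_rank[v] += damping * rank[u] * teleport[v]
        rank = new_rank
    return rank

# Hypothetical four-user graph: an edge u -> v means "u follows v".
follows = {
    "alice": ["carol"],
    "bob": ["carol", "dave"],
    "carol": ["dave"],
    "dave": [],
}
uniform = {u: 0.25 for u in follows}
scores = personalized_pagerank(follows, uniform)
top_user = max(scores, key=scores.get)
```

With a uniform teleport vector this reduces to ordinary PageRank; putting more teleport weight on users who post about a given topic biases the ranking toward that topic.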
33 | 
34 | 
35 | 
36 | 
37 | # readings
38 | 
39 | ## industry
40 | http://mashable.com/2014/02/25/socialrank-brands/  SocialRank Tool Helps Brands Find Most Valuable Followers (2014)
41 | 
42 | http://www.smallbusinesssem.com/find-interesting-influential-twitter-users/3974/  Quick Way to Find Interesting & Influential Twitter Users (2011)
43 | 
44 | ## readings
45 | 
46 | ### influential user/spreader identification/ranking
47 | http://link.springer.com/chapter/10.1007/978-3-319-01778-5_37 Survey of Influential User Identification Techniques in Online Social Networks (2014) Advances in Intelligent Systems and Computing
48 | 
49 | http://dl.acm.org/citation.cfm?id=1835935 Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. 2010. Community-based greedy algorithm for mining top-K influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '10)
50 | 
51 | http://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1503&context=sis_research  Twitterrank: Finding Topic-Sensitive Influential Twitterers 2010
52 | recommended by @唐小sin
53 | 
54 | http://www.anderson.ucla.edu/faculty/anand.bodapati/Determining-Influential-Users.pdf Determining Influential Users in Internet Social Networks
55 | 
56 | http://polymer.bu.edu/hes/articles/kghlmsm10.pdf  Identification of influential spreaders in complex networks
57 | recommended by @善良的右行
58 | 
59 | http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0021202  Lü L, Zhang Y-C, Yeung CH, Zhou T (2011) Leaders in Social Networks, the Delicious Case. PLoS ONE 6(6)
60 | recommended by @善良的右行
61 | 
62 | http://arxiv.org/pdf/1112.2239.pdf  Absence of influential spreaders in rumor dynamics
63 | recommended by @善良的右行
64 | 
65 | ### measure influence
66 | 
67 | http://blog.datalicious.com/awesome-new-research-measuring-twitter-user-influence-from-meeyoung-cha-max-planck-institute/  Awesome new research: Measuring twitter user influence from Meeyoung Cha, Max Planck Institute (2010)  read the original paper below
68 | 
69 | http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1538%20Amit%20Goyal%2C%20Francesco%20Bonchi%2C%20Laks%20V.%20S.%20Lakshmanan%3A%20Approximation%20Analysis%20of%20Influence%20Spread%20in%20Social%20Networks%20CoRR%20abs/1826  Measuring User Influence in Twitter: The Million Follower Fallacy
70 | 
71 | http://dl.acm.org/citation.cfm?id=2480726   
72 | Mario Cataldi, Nupur Mittal, and Marie-Aude Aufaure. 2013. Estimating domain-based user influence in social networks. In Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC '13). 
73 | 
74 | http://www.cse.ust.hk/~qnature/pdf/globecom13.pdf Analyzing the Influential People in Sina Weibo
75 | Dataset (2013)
76 | 
77 | http://dl.acm.org/citation.cfm?id=1935845  
78 | Eytan Bakshy, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. Everyone's an influencer: quantifying influence on twitter. In Proceedings of the fourth ACM international conference on Web search and data mining (WSDM '11)
79 | 
80 | 
81 | 
82 | ## related
83 | http://en.wikipedia.org/wiki/Opinion_leadership
84 | 
85 | http://dl.acm.org/citation.cfm?id=2503797 
86 | Adrien Guille, Hakim Hacid, Cecile Favre, and Djamel A. Zighed. 2013. Information diffusion in online social networks: a survey. SIGMOD Rec. 42, 2 (July 2013), 17-28. 
87 | 
88 | http://dl.acm.org/citation.cfm?id=2601412 Charu Aggarwal and Karthik Subbian. 2014. Evolutionary Network Analysis: A Survey. ACM Comput. Surv. 47, 1, Article 10 (May 2014), 36 pages. 
89 | 
90 | 
91 | 


--------------------------------------------------------------------------------
/awesome/learn-big-data.md:
--------------------------------------------------------------------------------
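 1 | # Big Data Applications and Technology: A Collection of Introductory Resources
 2 | 
 3 | Big data is a very broad concept that spans statistics, data science, machine learning, data mining, distributed databases, distributed computing, cloud storage, information visualization, and many other fields.
 4 | For a more detailed list of fields, see [Awesome Big Data](https://github.com/onurakpolat/awesome-bigdata) on GitHub.
 5 | 
 6 | Individuals and small or medium-sized businesses learning big data can first study some big data application cases, then practice on the data and business they already have (whatever its size).
 7 | Note that adopting big data technology blindly easily wastes study time and can also bring substantial unnecessary operating costs.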
 8 | 
 9 | 
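10 | ## Big data applications: what counts as big data
11 | 
12 | As a product manager, you should understand the basic concepts and characteristics of big data and then find where they touch your own business processes. Also read plenty of big data case studies. Since applications at that scale are likely to appear only in Fortune 500 companies, small and medium-sized businesses should learn from them flexibly rather than copy the technical framework.
13 | 
14 | http://www.planet-data.eu/sites/default/files/presentations/Big_Data_Tutorial_part4.pdf These big data lecture slides (2012, 41 pages) bring together many analytical charts about big data and list a number of key technology use cases.
15 | 
16 | http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/ This article summarizes the core concepts learned at the Goldman Sachs cloud computing conference.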
17 | ![](http://hortonworks.com/wp-content/uploads/2012/05/bigdata_diagram.png)
18 | 
19 | 
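20 | ## Big data technology: a simplified progression plan
21 | 
22 | To become a data scientist, you can usually take relevant online courses, such as those on Coursera and 小象学院.
23 | Here we design a simplified progression plan for beginners at small and medium-sized businesses whose starting point is Excel.
24 | 
25 | Level 0: Excel spreadsheets -- simple data analysis and charts
26 | 
27 | Level 1: relational databases and SQL, e.g. Access and MySQL -- use database queries to aggregate large numbers of business records
28 | 
29 | Level 2: a basic programming language, e.g. Python/R or Java -- automate the data processing workflow with programs
30 | 
31 | Level 3: access the database from programs, e.g. ORM, ODBC, JDBC -- automate data processing further
32 | 
33 | Level 4: get to know one NoSQL database, e.g. redis, mongodb, neo4j, elasticsearch -- pick whichever suits the business need; a traditional relational database may well be fast enough.
34 | 
35 | Level 5: learn some data analysis basics (including machine learning/data mining), such as linear regression, polynomial fitting, logistic regression, k-nearest neighbors (KNN) classification, decision trees, and Naive Bayes. Python, R, and Java all have ready-made implementations.
36 | 
37 | Level 6: if you need enormous compute or storage resources, learn a cloud computing platform, such as Amazon EC2 and S3, Google Compute Engine, or Microsoft Azure
38 | 
39 | Level 7: if you must process enormous amounts of data, learn the principles of distributed computing with Hadoop and MapReduce, then use a ready-made implementation such as Amazon Elastic MapReduce (Amazon EMR)
40 | 
41 | Level 8: if you must run data analysis over enormous amounts of data, learn Spark, Mahout, or any SQL-on-Hadoop system.
42 | 
43 | At this point, congratulations: you can hold forth in any "big data group".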
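
As a rough illustration of levels 1 to 3 above (SQL aggregation driven from a program), here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample records are hypothetical; SQLite stands in for Access or MySQL only because it needs no server.

```python
import sqlite3

# Hypothetical order records; in practice these would come from the business system.
orders = [
    ("2014-10-01", "north", 120.0),
    ("2014-10-01", "south", 80.0),
    ("2014-10-02", "north", 200.0),
    ("2014-10-02", "south", 50.0),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE orders (day TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# Level 1: aggregate with SQL. Levels 2-3: run the query from a program.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```

The same pattern carries over to MySQL or an ORM: only the connection line and the driver change, while the query and the surrounding automation stay the same.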
44 | 
45 | 
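46 | ## Introductory reference books for dummies
47 | 
48 | (English) Big Data Glossary: an introductory guide to big data that mainly covers processing techniques and tools, including NoSQL databases, the various MapReduce systems, storage, servers, data cleaning tools, NLP libraries and toolkits, machine learning toolkits, data visualization toolkits, cleaning public data, serialization guidance, and more. A little old (2011), but highly recommended. A free PDF is available:
49 | http://download.bigbata.com/ebook/oreilly/books/Big_Data_Glossary.pdf
50 | 
51 | (English) Big Data For Dummies. A free PDF is available: http://it-ebooks.info/book/2082/
52 | 
53 | "Big Data Era: From Introduction to Full Understanding" (大数据时代从入门到全面理解) http://book.douban.com/review/6131027/ is suitable for learning some basic big data concepts. However, the author's views are somewhat one-sided: there are many attention-grabbing anecdotes, but they are not tied closely enough to the technical side.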
54 | 
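55 | ## Learning resources for data scientists
56 | 
57 | http://www.douban.com/note/247983915/ Assorted resources for data scientists
58 | 
59 | http://www.aboutyun.com/thread-7569-1-1.html Getting started with big data: an introduction to various big data technologies
60 | 
61 | https://class.coursera.org/datasci-001  Open course on Coursera: Introduction to Data Science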
62 | 
63 | 
64 | ## Case study resources
65 | 
66 | http://www.ibm.com/big-data/us/en/big-data-and-analytics/case-studies.html Some big data analytics case studies from IBM
67 | 
68 | http://www.sas.com/resources/asset/Big-Data-in-Big-Companies.pdf Big data case studies from SAS
69 | 
70 | http://www.teradata.com/big-data/use-cases/ Big data case studies from Teradata
71 | 
72 | 
73 | 
74 | 


--------------------------------------------------------------------------------
/awesome/machine-learning-guide.md:
--------------------------------------------------------------------------------
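  1 | # An Incomplete Collection of Introductory Machine Learning Resources
  2 | Version 2014-10-14, edited and compiled by 好东西传送门; original link http://ml.memect.com/article/machine-learning-guide.html
  3 | 
  4 | Thanks to contributor: tang_Kaka_back@Sina Weibo
  5 | 
  6 | Additions and corrections are welcome; when reposting, please keep the original author and the original link. This article is a special collection from [机器学习日报 (Machine Learning Daily)](http://ml.memect.com). To subscribe, send an email to hao@memect.com with the subject "订阅机器学习日报" (subscribe to Machine Learning Daily).
  7 | 
  8 | [Basic concepts (基本概念)](#基本概念) ｜ [Getting started (入门攻略)](#入门攻略) ｜ [Courses (课程资源)](#课程资源) ｜ [Forums and sites (论坛网站)](#论坛网站)  ｜ [Odds and ends (东拉西扯)](#东拉西扯)  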
  9 | 
 10 | 
 11 | 
 12 | ## 基本概念
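 13 | [Machine learning](http://zh.wikipedia.org/zh/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0) "Machine learning is a multidisciplinary field that has risen over the past 20-plus years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other disciplines. Machine learning theory is mainly about designing and analyzing algorithms that let computers 'learn' automatically. **Machine learning algorithms are a class of algorithms that automatically analyze data to obtain regularities and use those regularities to make predictions on unseen data.** Because learning algorithms involve a great deal of statistical theory, machine learning is especially closely tied to statistical inference and is also called statistical learning theory. In terms of algorithm design, machine learning theory focuses on learning algorithms that are realizable and effective." -- from Wikipedia
 14 | 
 15 | [How do you explain Machine Learning and Data Mining to non Computer Science people? @quora](http://www.quora.com/How-do-you-explain-Machine-Learning-and-Data-Mining-to-non-Computer-Science-people) by Pararth Shah; Chinese version [如何向小白介绍何谓机器学习和数据挖掘？买回芒果他就懂了 @36kr](http://www.36kr.com/p/200601.html) -- This bears out the definition above: "machine learning discovers statistical regularities from observations and then uses them to predict". When a cartload of fruit is all mixed together, supervised learning can, from a few apple samples you provide, pick out all the apples from among the pears and mangoes; unsupervised learning can, from the known features and without any samples, automatically sort similar fruit into piles (perhaps red fruit and yellow fruit, perhaps big apples and small apples, ...); association rule learning helps you find rule-based regularities, for example that small green apples are all a bit sour.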
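
The fruit analogy above can be made concrete with a small sketch contrasting supervised and unsupervised learning on the same data. Everything here is an illustrative assumption: the fruit measurements are made up, the supervised part is a nearest-centroid classifier, and the unsupervised part is a bare-bones k-means with k=2. Association rule learning is not covered, and the features are left unscaled for brevity.

```python
def distance2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest(point, centers):
    """Return the key of the center closest to `point`."""
    return min(centers, key=lambda k: distance2(point, centers[k]))

# Hypothetical fruit measurements: (weight in grams, redness from 0 to 1).
labeled = {
    "apple": [(150, 0.9), (170, 0.8)],
    "mango": [(300, 0.3), (320, 0.2)],
}
unlabeled = [(160, 0.85), (310, 0.25), (155, 0.7), (290, 0.35)]

# Supervised: learn one centroid per labeled class, then predict labels.
class_centers = {label: centroid(pts) for label, pts in labeled.items()}
predicted = [nearest(p, class_centers) for p in unlabeled]

# Unsupervised: k-means with k=2 groups the same fruit without any labels.
centers = {0: unlabeled[0], 1: unlabeled[1]}
for _ in range(10):
    groups = {0: [], 1: []}
    for p in unlabeled:
        groups[nearest(p, centers)].append(p)
    centers = {k: centroid(g) if g else centers[k] for k, g in groups.items()}
clusters = [nearest(p, centers) for p in unlabeled]
```

The supervised result carries names ("apple", "mango") because the samples supplied them; the unsupervised result only says which fruit belong together, and it is up to the reader to interpret the piles.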
 16 | 
 17 | 
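 18 | Below we try to sketch the scope of machine learning from the micro to the macro level: one concrete algorithm, finer subdivisions of the field, practical application scenarios, and relationships with other fields.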
 19 | 
 20 | <img src="http://www.nltk.org/images/supervised-classification.png"/>
 21 | 
 22 | Figure 1: A machine learning example: the NLTK supervised learning workflow (source: http://www.nltk.org/book/ch06.html)
 23 | 
 24 | <img src="http://work.caltech.edu/images1/map.png"/>
 25 | 
 26 | Figure 2: Machine learning overview map by Yaser Abu-Mostafa (Caltech) (source: http://work.caltech.edu/library/181.html)
 27 | 
 28 | 
 29 | <img src="http://4.bp.blogspot.com/-o0vLxYf6YZ4/UQVO9K2jxDI/AAAAAAAACt8/Z5w0bSgqkxw/s640/machine_learning.png"/>
 30 | 
 31 | Figure 3: Machine learning in practice: choosing a machine learning algorithm in Python scikit-learn, by Nishant Chandra (source: http://n-chandra.blogspot.com/2013/01/picking-machine-learning-algorithm.html)
 32 | 
 33 | 
 34 | <img src="http://nirvacana.com/thoughts/wp-content/uploads/2013/07/RoadToDataScientist1-1024x831.png"/>
 35 | 
 36 | Figure 4: Machine learning and other disciplines: the data science metro map by Swami Chandrasekaran (source: http://nirvacana.com/thoughts/becoming-a-data-scientist/)
 37 | 
 38 | 
 39 | ## 入门攻略
 40 | 
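 41 | Roughly three kinds: beginner accounts, hands-on notes, and expert guides
 42 | 
 43 | * [A Study Guide for Machine Learning Beginners (机器学习入门者学习指南) @Guokr](http://www.guokr.com/post/512037/) (2013) by [白马](http://www.guokr.com/group/i/0373595356/)  -- [beginner account] the first-hand experience of a graduate-student beginner
 44 | 
 45 | * [Anyone here doing machine learning? Could you describe how you got started (有没有做机器学习的哥们？能否介绍一下是如何起步的) @ourcoders](http://ourcoders.com/thread/show/2837/) -- [beginner account] first-hand experience of graduate-student beginners; see especially the advice from [reyoung](http://ourcoders.com/user/show/25895/reyoung/)
 46 | 
 47 | * [tornadomeet machine learning notes](http://www.cnblogs.com/tornadomeet/tag/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/) (2013) -- [hands-on notes] a top student's study notes; see how a peer masters machine learning step by step
 48 | 
 49 | * [Machine Learning Roadmap: Your Self-Study Guide to Machine Learning](https://machinelearningmastery.com/machine-learning-roadmap-your-self-study-guide-to-machine-learning/) (2014) Jason Brownlee -- [expert guide] It is in English but very easy to read. It covers Beginner, Novice, Intermediate, and Advanced readers.
 50 |  * [A Tour of Machine Learning Algorithms](http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/)  (2013) This article on categorizing machine learning algorithms is also very good
 51 |  * [Best Machine Learning Resources for Getting Started](http://machinelearningmastery.com/best-machine-learning-resources-for-getting-started/) (2013) A Chinese translation is available: [机器学习的最佳入门学习资源 @伯乐在线](http://blog.jobbole.com/56256/), translated by [programmer_lin](http://www.jobbole.com/members/linwenhui/)
 52 | 
 53 | 
 54 | * A few suggestions from the editor
 55 |   * You need both a mathematical foundation and programming practice
 56 |   * Do not be afraid of English material: most of what you do not understand is technical terminology, and whether you write papers or read documentation later, it will mostly be in English
 57 |   * (A small ad) Subscribe to 机器学习日报 (Machine Learning Daily) to follow the field's latest material.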
 58 | 
 59 | 
 60 | ### More guides
 61 | 
 62 | * [How to get started with machine learning (机器学习该怎么入门) @Zhihu](http://www.zhihu.com/question/20691338) (2014) 
 63 | * [What's the easiest way to learn machine learning @quora](http://www.quora.com/Whats-the-easiest-way-to-learn-machine-learning) (2013)
 64 | * [What is the best way to study machine learning @quora](http://www.quora.com/What-is-the-best-way-to-study-machine-learning)  (2012)
 65 | * [Is there any roadmap for learning Machine Learning (ML) and its related courses at CMU](http://www.quora.com/Is-there-any-roadmap-for-learning-Machine-Learning-ML-and-its-related-courses-at-CMU) (2014)
 66 | 
 67 | ## 课程资源
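 68 | The courses by Tom Mitchell and Andrew Ng are both well suited to beginners
 69 | 
 70 | ### Introductory courses
 71 | 
 72 | #### 2011 Tom Mitchell (CMU) Machine Learning
 73 |  [Original English videos and slide PDFs](http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml)  His book "Machine Learning" is used as the textbook in many courses and has a Chinese edition.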
 74 | * Decision Trees
 75 | * Probability and Estimation 
 76 | * Naive Bayes 
 77 | * Logistic Regression 
 78 | * Linear Regression 
 79 | * Practical Issues: Feature selection, Overfitting ...
 80 | * Graphical models: Bayes networks, EM, Mixture of Gaussians clustering ...
 81 | * Computational Learning Theory: PAC Learning, Mistake bounds ...
 82 | * Semi-Supervised Learning
 83 | * Hidden Markov Models
 84 | * Neural Networks
 85 | * Learning Representations: PCA, Deep belief networks, ICA, CCA ...
 86 | * Kernel Methods and SVM
 87 | * Active Learning 
 88 | * Reinforcement Learning
 89 | The above is a selection of lecture titles
 90 | 
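 91 | #### 2014 Andrew Ng (Stanford) Machine Learning
 92 |  [Original English videos](https://www.coursera.org/course/ml) ｜ [Guokr discussion](http://mooc.guokr.com/course/16/Machine-Learning/) This course is designed for self-study; it is free and offers a course certificate. "The teacher explains profound things in simple terms, so there is no need to worry much about the math. The assignments are also well suited to beginners: the program skeleton is already designed and there is an assignment guide, so you just fill in the required parts by following the guide." (see 白马's beginner guide) "I recommend enrolling, following the lectures, and doing the exercises and the final exam. (If you only watch and never do, you learn nothing.)" (see reyoung's advice)  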
 93 | 
 94 |  1.  Introduction (Week 1)
 95 |  2. Linear Regression with One Variable (Week 1)
 96 |  3. Linear Algebra Review (Week 1, Optional)
 97 |  4. Linear Regression with Multiple Variables (Week 2)
 98 |  5. Octave Tutorial (Week 2)
 99 |  6. Logistic Regression (Week 3)
100 |  7. Regularization (Week 3)
101 |  8. Neural Networks: Representation (Week 4)
102 |  9. Neural Networks: Learning (Week 5)
103 |  10. Advice for Applying Machine Learning (Week 6)
104 |  11. Machine Learning System Design (Week 6)
105 |  12. Support Vector Machines (Week 7)
106 |  13. Clustering (Week 8)
107 |  14. Dimensionality Reduction (Week 8)
108 |  15. Anomaly Detection (Week 9)
109 |  16. Recommender Systems (Week 9)
110 |  17. Large Scale Machine Learning (Week 10)
111 |  18. Application Example: Photo OCR
112 |  19. Conclusion
113 | 
114 | ### Advanced courses
115 | 
116 | **2013 Yaser Abu-Mostafa (Caltech) Learning from Data**  -- more suitable for advanced study
117 | [Course videos and slide PDFs @Caltech](http://work.caltech.edu/lectures.html)
118 | 
119 |  1. The Learning Problem
120 |  2. Is Learning Feasible?
121 |  3. The Linear Model I
122 |  4. Error and Noise
123 |  5. Training versus Testing
124 |  6. Theory of Generalization
125 |  7. The VC Dimension
126 |  8. Bias-Variance Tradeoff
127 |  9. The Linear Model II
128 |  10. Neural Networks
129 |  11. Overfitting
130 |  12. Regularization
131 |  13. Validation
132 |  14. Support Vector Machines
133 |  15. Kernel Methods
134 |  16. Radial Basis Functions
135 |  17. Three Learning Principles
136 |  18. Epilogue
137 | 
138 | **2014 林軒田 (National Taiwan University) 機器學習基石 (Machine Learning Foundations)**  -- more suitable for advanced study; taught in Chinese
139 | [Course home page](https://www.coursera.org/course/ntumlone)
140 | 
141 | When Can Machines Learn? [何時可以使用機器學習]
142 | -- The Learning Problem [機器學習問題]
143 | -- Learning to Answer Yes/No [二元分類]
144 | -- Types of Learning [各式機器學習問題]
145 | -- Feasibility of Learning [機器學習的可行性]
146 | 
147 | Why Can Machines Learn? [為什麼機器可以學習]
148 | -- Training versus Testing [訓練與測試]
149 | -- Theory of Generalization [舉一反三的一般化理論]
150 | -- The VC Dimension [VC 維度]
151 | -- Noise and Error [雜訊一錯誤]
152 | 
153 | How Can Machines Learn? [機器可以怎麼樣學習]
154 | -- Linear Regression [線性迴歸]
155 | -- Linear `Soft' Classification [軟性的線性分類]
156 | -- Linear Classification beyond Yes/No [二元分類以外的分類問題]
157 | -- Nonlinear Transformation [非線性轉換]
158 | 
159 | How Can Machines Learn Better? [機器可以怎麼樣學得更好]
160 | -- Hazard of Overfitting [過度訓練的危險]
161 | -- Preventing Overfitting I: Regularization [避免過度訓練一：控制調適]
162 | -- Preventing Overfitting II: Validation [避免過度訓練二：自我檢測]
163 | -- Three Learning Principles [三個機器學習的重要原則]
164 | 
165 | 
166 | 
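167 | ### More options
168 | 
169 | **2008 Andrew Ng CS229 Machine Learning** -- These videos are a few years old, and the lecturer has become a big name over the last couple of years. The basic methods have not changed much, though, and the downloadable lecture-note PDFs are a plus.
170 | [Videos with Chinese subtitles @NetEase Open Courses](http://v.163.com/special/opencourse/machinelearning.html)  |  [English videos @youtube](https://www.youtube.com/playlist?list=PLA89DCFA6ADACE599)  |
171 | [Lecture notes PDF @Stanford](http://cs229.stanford.edu/materials.html)
172 | 
173 |  Lecture 1. Motivation and applications of machine learning
174 |  Lecture 2. Supervised learning applications; gradient descent
175 |  Lecture 3. The concepts of underfitting and overfitting
176 |  Lecture 4. Newton's method
177 |  Lecture 5. Generative learning algorithms
178 |  Lecture 6. Naive Bayes
179 |  Lecture 7. The optimal margin classifier
180 |  Lecture 8. Sequential minimal optimization (SMO)
181 |  Lecture 9. Empirical risk minimization
182 |  Lecture 10. Feature selection
183 |  Lecture 11. Bayesian statistics and regularization
184 |  Lecture 12. The k-means algorithm
185 |  Lecture 13. Mixture of Gaussians
186 |  Lecture 14. Principal component analysis
187 |  Lecture 15. Singular value decomposition
188 |  Lecture 16. Markov decision processes
189 |  Lecture 17. Discretization and the curse of dimensionality
190 |  Lecture 18. Linear quadratic regulation
191 |  Lecture 19. Differential dynamic programming
192 |  Lecture 20. Policy search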
193 | 
194 | 
195 | **2012 余凯 (Baidu) and 张潼 (Rutgers) Machine Learning open course** -- more suitable for advanced study
196 | [Course home page @Baidu Wenku](http://wenku.baidu.com/course/view/49e8b8f67c1cfad6195fa705)  ｜ [Slide PDFs @Dragon Star Program](http://bigeye.au.tsinghua.edu.cn/DragonStar2012/download.html)
197 | 
198 |  Lecture 1: Introduction to ML and review of linear algebra, probability, statistics (kai)
199 |  Lecture 2: linear model (tong)
200 |  Lecture 3: overfitting and regularization (tong)
201 |  Lecture 4: linear classification (kai)
202 |  Lecture 5: basis expansion and kernel methods (kai)
203 |  Lecture 6: model selection and evaluation (kai)
204 |  Lecture 7: model combination (tong)
205 |  Lecture 8: boosting and bagging (tong)
206 |  Lecture 9: overview of learning theory (tong)
207 |  Lecture 10: optimization in machine learning (tong)
208 |  Lecture 11: online learning (tong)
209 |  Lecture 12: sparsity models (tong)
210 |  Lecture 13: introduction to graphical models (kai)
211 |  Lecture 14: structured learning (kai)
212 |  Lecture 15: feature learning and deep learning (kai)
213 |  Lecture 16: transfer learning and semi supervised learning (kai)
214 |  Lecture 17: matrix factorization and recommendations (kai)
215 |  Lecture 18: learning on images (kai)
216 |  Lecture 19: learning on the web (tong)
217 | 
218 | 
219 | 
220 | 
221 | ## 论坛网站
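222 | ### Chinese
223 | http://www.52ml.net/ 我爱机器学习 (I Love Machine Learning)
224 | 
225 | http://www.mitbbs.com/bbsdoc/DataSciences.html MITBBS - Computers and Networks - Data Sciences board
226 | 
227 | http://www.guokr.com/group/262/ Guokr > Machine Learning group
228 | 
229 | http://cos.name/cn/forum/22  统计之都 » World of Statistics » Data Mining and Machine Learning
230 | 
231 | http://bbs.byr.cn/#!board/ML_DM  北邮人论坛 (BYR forum) >> Academics and Technology >> Machine Learning and Data Mining
232 | 
233 | 
234 | ### English
235 | https://github.com/josephmisiti/awesome-machine-learning  A comprehensive list of machine learning resources
236 | 
237 | http://work.caltech.edu/library/ Caltech machine learning video library, one video per topic
238 | 
239 | http://www.kdnuggets.com/ A well-known data mining site
240 | 
241 | http://www.datasciencecentral.com/  Data Science Central website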
242 | 
243 | 
244 | ## 东拉西扯
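245 | Some good material that you may not understand before getting started; it pays off once you have made some progress.
246 | 
247 | [The difference between machine learning and data mining](http://en.wikipedia.org/wiki/Machine_learning#Machine_learning_and_data_mining)
248 | * Machine learning focuses on prediction, based on known properties learned from the training data
249 | * Data mining focuses on discovering previously unknown properties in the data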
250 | 
251 | [Dan Levin, What is the difference between statistics, machine learning, AI and data mining?](http://www.quora.com/What-are-some-good-machine-learning-jokes)
252 | * If there are up to 3 variables, it is statistics.
253 | * If the problem is NP-complete, it is machine learning.
254 | * If the problem is PSPACE-complete, it is AI.
255 | * If you don't know what is PSPACE-complete, it is data mining.
256 | 
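257 | A few big-picture overviews of the machine learning field; see the [original article](http://machinelearningmastery.com/best-machine-learning-resources-for-getting-started/) 
258 | * [The Discipline of Machine Learning](http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf) Tom Mitchell wrote this for the university president back when he was setting up the Machine Learning Department at CMU.
259 | * [A Few Useful Things to Know about Machine Learning](http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf) Words of wisdom from Professor Pedro Domingos. Many concepts may still be unclear when you start out, so be sure to read it again after finishing an open course. Here is the Chinese translation by 刘知远: [机器学习那些事 PDF](http://www.valleytalk.org/wp-content/uploads/2012/11/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E9%82%A3%E4%BA%9B%E4%BA%8B.pdf)
260 | 
261 | A few good books. Many experts have already made book recommendations, so we will not say much here and just give the links
262 | * [Machine Learning in Action](http://manning.com/pharrington/) Peter Harrington; Chinese edition [机器学习实战 @Douban](http://book.douban.com/subject/24703171/)  -- "This book makes you realize that the classification algorithms hyped as almost magical are remarkably simple to implement; that those seemingly profound mathematical theories can have their essence stated in one sentence; and that every complicated thing starts from a very simple idea." From a review by [Kord @Douban](http://book.douban.com/review/6249619/)
263 | * Dr. 李航's book [统计学习方法 (Statistical Learning Methods) @Douban](http://book.douban.com/subject/10590856/)  -- First of all, this is a good book. "If I knew nothing at all, a traditional, substance-heavy textbook like this would probably make me dislike machine learning (a personal opinion). But as a reference book it is a very good one: it is fairly authoritative, and it is concise, speaking through formulas and logic without much popular explanation. It is far more concise than books such as PRML and has its own charm and market."  From a review by [chentingpc @Douban](http://book.douban.com/review/5540889/)
264 | * [Classic machine learning books (机器学习经典书籍) @算法组](http://suanfazu.com/discussion/109/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E7%BB%8F%E5%85%B8%E4%B9%A6%E7%B1%8D/p1) by [算法组](http://www.weibo.com/suanfazu)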
265 | 


--------------------------------------------------------------------------------
/awesome/machine-learning-reading.md:
--------------------------------------------------------------------------------
 1 | 
 2 | 
 3 | ## readings recommended by Michael Jordan
 4 | 
 5 | source: http://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckdqzph
 6 | 
 7 | "I now tend to add some books that dig still further into foundational topics. In particular, I recommend A. Tsybakov's book "Introduction to Nonparametric Estimation" as a very readable source for the tools for obtaining lower bounds on estimators, and Y. Nesterov's very readable "Introductory Lectures on Convex Optimization" as a way to start to understand lower bounds in optimization. I also recommend A. van der Vaart's "Asymptotic Statistics", a book that we often teach from at Berkeley, as a book that shows how many ideas in inference (M estimation---which includes maximum likelihood and empirical risk minimization---the bootstrap, semiparametrics, etc) repose on top of empirical process theory. I'd also include B. Efron's "Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction", as a thought-provoking book"
 8 | 
 9 | 
10 | http://www.amazon.com/Introduction-Nonparametric-Estimation-Springer-Statistics/dp/1441927093
11 | Introduction to Nonparametric Estimation
12 | 
13 | <img height=200 src="http://ecx.images-amazon.com/images/I/31vP%2BXbyAuL.jpg"/>
14 | 
15 | http://www.amazon.com/Introductory-Lectures-Convex-Optimization-Applied/dp/1402075537
16 | Introductory Lectures on Convex Optimization
17 | 
18 | <img height=200 src="http://ecx.images-amazon.com/images/I/41L6K%2BAyoGL.jpg"/>
19 | 
20 | 
21 | http://www.amazon.com/Asymptotic-Statistics-Statistical-Probabilistic-Mathematics/dp/0521784506
22 | Asymptotic Statistics
23 | 
24 | <img height=200 src="http://ecx.images-amazon.com/images/I/710vE3Y5KjL.jpg"/>
25 | 
26 | http://www.amazon.com/Large-Scale-Inference-Estimation-Prediction-Mathematical/dp/110761967X  
27 | Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction
28 | 
29 | <img height=200 src="http://ecx.images-amazon.com/images/I/419DXlKMiDL.jpg"/>
30 | 
31 | 


--------------------------------------------------------------------------------
/awesome/manifold-learning.md:
--------------------------------------------------------------------------------
 1 | Discussion and progress: issue 26 https://github.com/memect/hao/issues/26
 2 | 
 3 | ## Introduction
 4 | 
 5 | http://blog.sina.com.cn/s/blog_eccca60e0101h1d6.html @cmdyz Manifold Learning (流形学习)
 6 | 
 7 | http://blog.pluskid.org/?p=533 A brief discussion of manifold learning (浅谈流形学习)
 8 | 
 9 | http://blog.csdn.net/chl033/article/details/6107042 A survey of manifold learning (流形学习综述)
10 | 
11 | http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ Neural Networks, Manifolds, and Topology
12 | 
13 | ## Tutorial
14 | 
15 | http://www.cad.zju.edu.cn/reports/%C1%F7%D0%CE%D1%A7%CF%B0.pdf 何晓飞, Manifold Learning (流形学习)
16 | 
17 | https://www.cs.cmu.edu/~efros/courses/AP06/presentations/ThompsonDimensionalityReduction.pdf
18 | 
19 | http://mlsp2012.conwiz.dk/fileadmin/lectures/mlsp2012_raich.pdf MLSP2012 Tutorial: Manifold Learning: Modeling and Algorithms
20 | 
21 | ## Additional Tutorials
22 | 
23 | http://www2.imm.dtu.dk/projects/manifold/Syllabus.html Summer School on Manifold Learning in Image and Signal Analysis
24 | 
25 | ## Implementation
26 | 
27 | http://scikit-learn.org/stable/modules/manifold.html
28 | 
29 | Who else follows this topic: @王斌_ICTIR @丕子  
30 | 
31 | 


--------------------------------------------------------------------------------
/awesome/mlss.md:
--------------------------------------------------------------------------------
 1 | # MLSS Machine Learning Summer Schools
 2 |  (forked from http://www.mlss.cc/) adding more links to the list
 3 | 
 4 | ## highlights
 5 | * Highly recommended: MLSS 2009 UK, all slides [download as ZIP, 51 MB](http://mlg.eng.cam.ac.uk/mlss09/mlss_slides.zip), recommended by @bigiceberg: "the 2009 UK MLSS is the most classic of them"
 6 | 
 7 | ## Future (8)
 8 | * MLSS Spain (Fernando Perez-Cruz), late spring 2016 (tentative)
 9 | * MLSS London (tentative)
10 | * MLSS Tübingen, summer 2017 (tentative)
11 | * MLSS Africa (very tentative)
12 | * MLSS Kyoto (Marco Cuturi, Masashi Sugiyama, Akihiro Yamamoto), August 31 - September 11 (tentative), 2015
13 | * MLSS Tübingen (Michael Hirsch, Philipp Hennig, Bernhard Schölkopf), July 13-24, 2015
14 | * MLSS Sydney (Edwin Bonilla, Yang Wang, Bob Williamson), 16 - 25 February, 2015 http://www.nicta.com.au/research/machine_learning/mlss2015
15 | * MLSS Austin (Peter Stone, Pradeep Ravikumar), January 7-16, 2015 http://www.cs.utexas.edu/mlss/
16 | 
17 | ## Past (25)
18 | * MLSS China, Beijing (Stephen Gould, Hang Li, Zhi-Hua Zhou), June 15-21, 2014, colocated with ICML http://lamda.nju.edu.cn/conf/mlss2014/
19 | * MLSS Pittsburgh (Alex Smola & Zico Kolter), July 6-18, 2014 http://mlss2014.com/
20 | * MLSS Iceland (Sami Kaski), April 26 - May 4, 2014 (colocated with AISTATS) http://mlss2014.hiit.fi/
21 | * MLSS Tübingen, Germany, 26 August - 07 September 2013 http://mlss.tuebingen.mpg.de
22 | * MLSS Kyoto, August 27 - September 7, 2012 http://www.i.kyoto-u.ac.jp/mlss12/
23 | * MLSS Santa Cruz, July 9-20, 2012 http://mlss.soe.ucsc.edu/home
24 | * MLSS La Palma, Canary Islands, April 11-19, 2012 (followed by AISTATS) http://mlss2012.tsc.uc3m.es/
25 | * MLSS France, September 4 - 17, 2011 http://mlss11.bordeaux.inria.fr/
26 | * MLSS @Purdue, June 13 and June 24, 2011 http://learning.stat.purdue.edu/wiki/mlss/start
27 | * MLSS Singapore, June 13 - 17, 2011 http://bigbird.comp.nus.edu.sg/pmwiki/farm/mlss/
28 | * MLSS Canberra, Australia, September 27 - October 6, 2010 http://canberra10.mlss.cc
29 | * MLSS Sardinia, May 6 - May 12, 2010 http://www.sardegnaricerche.it/index.php?xsl=370&s=139254&v=2&c=3841 [video lecture](http://videolectures.net/mlss2010_sardinia/)
30 | * MLSS Cambridge, UK, August 29 - September 10, 2009 http://mlg.eng.cam.ac.uk/mlss09
31 | * MLSS Canberra, Australia, January 26 - February 6, 2009 http://ssll.cecs.anu.edu.au/
32 | * MLSS Isle de Re, France, September 1-15, 2008 [archive](https://web.archive.org/web/20080329172541/http://mlss08.futurs.inria.fr/) [announcement](http://eventseer.net/e/7178/) 
33 | * MLSS Kioloa, Australia, March 3 - 14, 2008 http://kioloa08.mlss.cc
34 | * MLSS Tübingen, Germany, August 20 - August 31, 2007 http://videolectures.net/mlss07_tuebingen/
35 | * MLSS Taipei, Taiwan, July 24 - August 2, 2006 http://www.iis.sinica.edu.tw/MLSS2006/
36 | * MLSS Canberra, Australia, February 6-17, 2006 http://canberra06.mlss.cc/
37 | * MLSS Chicago, USA, May 16-27, 2005 [archive](https://web.archive.org/web/20080314055344/http://chicago05.mlss.cc/) [announcement](http://linguistlist.org/LL/fyi/fyi-details.cfm?submissionid=49210)
38 | * MLSS Canberra, Australia, January 23 - February 5, 2005  [archive](https://web.archive.org/web/20060105025204/http://canberra05.mlss.cc/)
39 | * MLSS Berder, France, September 12-25, 2004 [archive](https://web.archive.org/web/20080406175615/http://www.kyb.tuebingen.mpg.de/mlss04/)
40 | * MLSS Tübingen, Germany, August 4-16, 2003  [archive](https://web.archive.org/web/20080409113424/http://www.kyb.tuebingen.mpg.de/mlss04/mlss03/)
41 | * MLSS Canberra, Australia, February 2-14, 2003 [archive](https://web.archive.org/web/20030607005801/http://mlg.anu.edu.au/summer2003/)
42 | * MLSS Canberra, Australia, February 11-22, 2002 [archive](https://web.archive.org/web/20030607063738/http://mlg.anu.edu.au/summer2002/)
43 | 


--------------------------------------------------------------------------------
/awesome/multiclass-boosting.md:
--------------------------------------------------------------------------------
 1 | # Awesome Multi-class Boosting Resources
 2 | 
 3 | abstract: classic papers, slides and overviews, plus Github code. 
 4 | 
 5 | ![Multi-class boosting](http://emma.memect.com/t/e7c2d6935a3a0e92486bee03cca3797954f8833ecb60ca4348b6fa32dba345f7)
 6 | 
 7 | (image source http://www.svcl.ucsd.edu/projects/)
 8 | 
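 9 | abstract (translated from Chinese): Q: @图像视觉研究 Can you recommend classic material on multi-class boosting? A: We found several classic papers, a few slide decks, videos, and toolkits. The schools involved include MIT, UCSD, Stanford, and UMich. Software includes C++ and Python (scikit-learn) implementations, plus several open-source projects on GitHub.  [resource cards](http://bigdata.memect.com/?tag=MultiClassBoosting)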
10 | 
11 | https://github.com/memect/hao/blob/master/awesome/multiclass-boosting.md
12 | 
13 | # overview
14 | http://www.svcl.ucsd.edu/projects/mcboost/
15 | 
16 | http://classes.soe.ucsc.edu/cmps242/Fall09/proj/Mario_Rodriguez_Multiclass_Boosting_talk.pdf  Multi-class boosting (slides), Mario Rodriguez, 2009
17 | 
18 | http://cmp.felk.cvut.cz/~sochmj1/adaboost_talk.pdf presentation summarizing AdaBoost 
19 | 
20 | 
21 | # people
22 | 
23 | http://dept.stat.lsa.umich.edu/~jizhu/  check his contribution on SAMME
24 | 
25 | 
26 | # video lectures
27 | 
28 | https://www.youtube.com/watch?v=L6BlpGnCYVg  "A Theory of Multiclass Boosting", Rob Schapire, Partha Niyogi Memorial Conference: Computer Science
29 | 
30 | http://techtalks.tv/talks/multiclass-boosting-with-hinge-loss-based-on-output-coding/54338/ Multiclass Boosting with Hinge Loss based on Output Coding, Tianshi Gao; Daphne Koller, ICML 2011
31 | 
32 | # classical paper
33 | http://web.mit.edu/torralba/www/cvpr2004.pdf Sharing features: efficient boosting procedures for multiclass object detection, Antonio Torralba Kevin P. Murphy William T. Freeman, CVPR 2004
34 | 
35 | 
36 | http://dept.stat.lsa.umich.edu/~jizhu/pubs/Zhu-SII09.pdf  Multi-class AdaBoost, Ji Zhu, Hui Zou, Saharon Rosset and Trevor Hastie, Statistics and Its Interface, 2009
37 | 
38 | http://www.cs.princeton.edu/~imukherj/nips10.pdf  A Theory of Multiclass Boosting, Indraneel Mukherjee, Robert E. Schapire, NIPS 2010
39 | 
40 | http://papers.nips.cc/paper/4450-multiclass-boosting-theory-and-algorithms.pdf Multiclass Boosting: Theory and Algorithms, Mohammad J. Saberian, Nuno Vasconcelos, NIPS, 2011 
41 | 
42 | 
43 | # tools
44 | http://www.multiboost.org/ a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on AdaBoost.MH but also implements popular cascade classifiers and FilterBoost along with a batch of common multi-class base learners (stumps, trees, products, Haar filters).
45 | 
46 | http://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_multiclass.html
47 | 
48 | https://github.com/cshen/fast-multiboost-cw
49 | 
50 | https://github.com/pengsun/AOSOLogitBoost
51 | 
52 | https://github.com/circlingthesun/omclboost
53 | 


--------------------------------------------------------------------------------
/awesome/multitask-learning.md:
--------------------------------------------------------------------------------
  1 | # MultiTask Learning 资源合集
  2 | 
  3 | contributors: 唐小sin 王威廉 黄厝海滨 李航博士 李沐M Copper_PKU 复旦李斌 eeyangc 李晗littlefool 李亚超NLP lby9
  4 |    
  5 | discussion: https://github.com/memect/hao/issues/93
  6 | 
  7 | keywords
  8 |   multi-task learning
  9 | 
 10 | ## 微博讨论
 11 | 问: @唐小sin 有没有multi-task learning的相关学习资料呢？
 12 | 答: 维基百科上有不少经典文献。AAAI和ICML都有论文(北大/清华)。找到今年Honglak Lee (U Michigan 教授)的短教程。Lan Žagar 博士论文(2014) Ranking by Multitask Learning. 问答追踪: #93 求补充
 13 | http://www.weibo.com/5220650532/BiZl47k80?ref=
 14 | 
 15 | 唐小sin：补充下吧，刚自己也找了一遍classic paper里面的Caruana的博士论文就是Multitask Learning，他是Tom Mitchell的学生。
 16 | 
 17 | 
 18 | 
 19 | 王威廉：今年SIGKDD最佳博士论文颁给了CMU计算机系金光熹同学的论文 Reconstruction and Applications of Collective Storylines from Web Photo Collections http://t.cn/RPNmgEw 还有一个优胜奖也由CMU的multitask learning论文（Mladen Kolar，现芝大教授）获得。
 20 | http://weibo.com/1657470871/BhG9eDbcm
 21 | 
 22 | 黄厝海滨：可以说说他的导师啊，因为这两者的导师都是Eric Xing (8月10日 22:42)
 23 | 
 24 | 
 25 | 
 26 | 
 27 | 李航博士 ：#WSDM2014# Best paper award: Amr Ahmed, Abhimanyu Das, Alex Smola, Hierarchical multitask learning: scalable algorithms and an application to conversion optimization in display advertising
 28 | http://weibo.com/2060750830/AyJKFeZmQ
 29 | 
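 30 | 李沐M: Congratulations to my colleagues and my advisor. This paper was rejected once, after which they buckled down and thoroughly reworked the writing. Then came the happy ending. My advisor winked and asked, "Do you get it?" I asked, "Get what?" "First, writing matters. Second, my writing is terrible, so you can't rely on me too much..." (Feb 28, 03:53)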
 31 | 
 32 | 
 33 | Copper_PKU: Six NLP tasks from Ronan Collobert, Jason Weston. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML 2008. Easy-reading version: http://t.cn/8FOioh1
 34 | http://weibo.com/1758509357/AwFYMa0ot
 35 | 
 36 | 
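 37 | 复旦李斌: What the rightmost repost describes should be the feature hashing method used in vowpal wabbit. Ping Li mentions it in every paper, Smola applied it to multitask learning at ICML-09, and I have also used it on graphs to control feature dimensionality. //@鲁东东胖: Is there a more concrete description? //@夏粉_百度: At that Ad workshop, Yahoo presented another dimensionality reduction method, based on hashing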
 38 | http://weibo.com/2303649634/A83kaktRT
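The feature hashing idea mentioned in the post above can be sketched in a few lines. This is a minimal illustration, not the vowpal wabbit implementation: it assumes the common multitask variant in which every token is hashed twice, once as a shared feature and once tagged with its task, so that all tasks share one fixed-size vector. The function names, the `task::token` key format, and the use of MD5 as the hash are assumptions made for this example.

```python
import hashlib


def _bucket_and_sign(key, n_buckets):
    """Map a string key to (bucket index, +1/-1 sign) using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % n_buckets
    sign = 1 if digest[8] % 2 == 0 else -1
    return bucket, sign


def hash_features(tokens, task, n_buckets=32):
    """Hash each token twice: as a shared feature and as a task-specific one."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        # The plain key is shared by all tasks; the prefixed key (a
        # hypothetical format) lets each task learn its own deviation.
        for key in (tok, task + "::" + tok):
            bucket, sign = _bucket_and_sign(key, n_buckets)
            vec[bucket] += sign
    return vec


if __name__ == "__main__":
    print(hash_features(["cheap", "pills"], task="user_a"))
```

The vector length stays at `n_buckets` no matter how many tasks or distinct tokens appear, which is the dimensionality control the post refers to. The random sign is there so that colliding features tend to cancel rather than accumulate.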
 39 | 
 40 | 
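 41 | eeyangc: From a biological perspective, you can call it shared genetic effects; from a machine learning perspective, you can view it as multitask learning; from a statistical perspective, you can call it random effects and hierarchical structures. As the poem goes: a ridge when seen from the front, a peak when seen from the side, different from every distance and height.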
 42 | http://weibo.com/2107700352/A4CuScVmV
 43 | 
 44 | 
 45 | 
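 46 | 李晗littlefool: The deep learning idea of generating redundant features is a good one. Boosted decision trees and kernel SVMs are useful today but have their limits. I stand by my view: unsupervised feature engineering based on deep learning, combined with existing nonlinear models, complemented by online learning for real-time feature extraction and model updates, and drawing on multitask and transfer learning to extend the available information and broaden the range of applicable problems.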
 47 | http://weibo.com/1489962750/zp3daxlC6
 48 | 
 49 | 李亚超NLP：A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning http://t.cn/aued1i
 50 | http://weibo.com/1732906091/zjTJ94IaH
 51 | 
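 52 | lby9: 1) An interesting application of deep learning to text is the paper "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning". In essence, the low-level features at the bottom of the network are shared, and the upper-level tasks (POS, chunking, NER, etc.) share part of those low-level features. The network structure is shown in the figure.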
 53 | http://weibo.com/1873273890/zhvrbkS8c
 54 | 
 55 | 
 56 | ## overview and survey
 57 | http://en.wikipedia.org/wiki/Multi-task_learning
 58 | 
 59 | https://sites.google.com/site/deeplearningcvpr2014/DL-Multimodal_multitask_learning.pdf Multimodal learning and multitask learning (2014)
 60 | 
 61 | http://www.siam.org/meetings/sdm12/zhou_chen_ye.pdf Multi-Task Learning: Theory, Algorithms, and Applications  (2012, SDM tutorial)
 62 | 
 63 | http://jcse.kiise.org/files/JCSE-V5N3-09.pdf  A Survey of Transfer and Multitask Learning in Bioinformatics (2009, JCSE)
 64 | 
 65 | http://www.cse.wustl.edu/~kilian/research/multitasklearning/multitasklearning.html  
 66 | Multitask Learning / Domain Adaptation related publications, maintained by Prof. Kilian Q. Weinberger
 67 | 
 68 | ## classic paper
 69 | http://www.eecs.berkeley.edu/~russell/classes/cs294/f05/papers/caruana-1997.pdf  Caruana, R. (1997). Multitask learning: A knowledge-based source of inductive bias. Machine Learning
 70 | 
 71 | http://www.thespermwhale.com/jaseweston/papers/unified_nlp.pdf
 72 | Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning (ICML '08)
 73 | * Recommended jointly by Copper_PKU, 李亚超NLP, lby9
 74 | 
 75 | ## current
 76 | http://research.microsoft.com/pubs/210041/wsdm2014-multitask.pdf 
 77 | Amr Ahmed, Abhimanyu Das, Alex Smola, Hierarchical multitask learning: scalable algorithms and an application to conversion optimization in display advertising 
 78 | * 李航博士 ：#WSDM2014# Best paper award
 79 | 
 80 | http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/download/8486/8820 Encoding Tree Sparsity in Multi-Task Learning: A Probabilistic Framework (2014) AAAI
 81 | 
 82 | http://machinelearning.wustl.edu/mlpapers/paper_files/icml2014c2_lic14.pdf Bayesian Max-margin Multi-Task Learning with Data Augmentation , (2014) ICML
 83 | 
 84 | http://link.springer.com/chapter/10.1007/978-3-642-37331-2_1  Beyond Dataset Bias: Multi-task Unaligned Shared Knowledge Transfer (2013)
 85 | 
 86 | ## thesis
 87 | http://repository.cmu.edu/dissertations/229/ 
 88 | Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems
 89 | (2013) Mladen Kolar, PhD Thesis
 90 | * Recommended by 王威廉; KDD 2014 dissertation award honorable mention http://www.kdd.org/blog/2014-doctoral-dissertation-award
 91 | 
 92 | http://eprints.fri.uni-lj.si/2486/  
 93 | Lan Žagar (2014) Ranking by Multitask Learning. PhD thesis.
 94 | 
 95 | http://gogoshen.org/ml/Research%20Paper%20Library/caruana97multitask2.pdf  
 96 | Caruana, (1997) Multitask Learning, PhD Thesis
 97 | * 唐小sin: Let me add something. I just searched as well: Caruana's PhD thesis, listed under classic paper, is itself titled Multitask Learning. He was a student of Tom Mitchell.
 98 | 
 99 | ## related
100 | http://burrsettles.com/pub/settles.activelearning.pdf  Active Learning Literature Survey, Burr Settles (2010)  1000+ citations
101 | 
102 | http://bigdata.memect.com/?s=multitask
103 | 
104 | 
105 | 


--------------------------------------------------------------------------------
/awesome/nlp.md:
--------------------------------------------------------------------------------
  1 | # Commonly used NLP resources
  2 | 
  3 | ## resource portal
  4 | http://nlp.hivefire.com/ NLP News
  5 | 
  6 | https://nlppeople.com/ NLP Jobs
  7 | 
  8 | http://www.cs.rochester.edu/~tetreaul/conferences.html Computational Linguistics / NLP Conferences
  9 | 
 10 | http://www.ldc.upenn.edu/ LDC: The Linguistic Data Consortium
 11 | 
 12 | http://www.clt.gu.se/wiki/nlp-resources NLP Resources
 13 | 
 14 | http://www.aaai.org/AITopics/html/natlang.html AAAI Topics on NLP
 15 | 
 16 | http://www-nlp.stanford.edu/links/statnlp.html Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources
 17 | 
 18 | http://wordnet.princeton.edu/ WordNet
 19 | 
 20 | http://www.keenage.com/ HowNet (知网)
 21 | 
 22 | http://www.corpus4u.org/ Corpus Linguistics Online (语料库语言学在线)
 23 | 
 24 | 
 25 | http://trec.nist.gov/ TREC
 26 | * The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the TIPSTER Text program.
 27 | 
 28 | ## tutorial 
 29 | http://nlp.cs.berkeley.edu/tutorials/variational-tutorial-slides.pdf  Variational Inference in Structured NLP Models, Presented at NAACL 2012 with David Burkett.
 30 | 
 31 | http://pages.cs.wisc.edu/~jerryzhu/pub/ZhuCCFADL46.pdf Tutorial on Statistical Machine Learning for NLP 2013
 32 | 
 33 | 
 34 | ## courses
 35 | http://www.stanford.edu/class/cs224n/  CS 224N / Ling 284  —  Natural Language Processing
 36 | 
 37 | http://www.cs.berkeley.edu/~klein/cs288/sp10/ CS 288: Statistical Natural Language Processing, Spring 2010
 38 | 
 39 | 
 40 | http://demo.clab.cs.cmu.edu/fa2013-11711/index.php/Main_Page Algorithms for NLP: Basic Information (Fall 2013) 
 41 | 
 42 | http://www.cs.colorado.edu/~martin/csci5832/lectures_and_readings.html Natural Language Processing, CSCI 5832 FALL 2013
 43 | 
 44 | http://www.cs.columbia.edu/~cs4705/  COMS W4705: Natural Language Processing 2013
 45 | 
 46 | http://www1.cs.columbia.edu/~julia/courses/CS4705/syllabus10.htm COMS 4705: Natural Language Processing, Fall 2010
 47 | 
 48 | http://www1.cs.columbia.edu/~julia/courses/CS4706/syllabus12.htm CS4706: Spoken Language Processing, Spring 2012
 49 | 
 50 | http://www.cs.cornell.edu/courses/cs4740/2014sp/  CS 4740/5740 - Introduction to Natural Language Processing, Spring 2014
 51 | 
 52 | http://l2r.cs.uiuc.edu/~danr/Teaching/CS546-13/  Machine Learning and Natural Language Spring 2013
 53 | 
 54 | http://www.cs.jhu.edu/~jason/465/ Natural Language Processing Course # 600.465 — Fall 2013
 55 | 
 56 | http://web.stanford.edu/class/cs224s/ CS 224S/LINGUIST 285 Spoken Language Processing
 57 | 
 58 | http://www.umiacs.umd.edu/~resnik/ling773_sp2014/ Ling773/CMSC773/INST728C, Spring 2014 Computational Linguistics II
 59 | 
 60 | http://cs.nyu.edu/courses/spring13/CSCI-GA.2590-001/index.html
 61 | 
 62 | http://www.cis.upenn.edu/~cis530/  CIS 530 Fall 2013 Computational Linguistics
 63 | 
 64 | http://pages.cs.wisc.edu/~jerryzhu/cs769.html CS 769: Advanced Natural Language Processing Spring 2010
 65 | 
 66 | http://pages.cs.wisc.edu/~bsnyder/cs769.html  
 67 | 
 68 | 
 69 | 
 70 | ## group 
 71 | http://nlp.stanford.edu/ Stanford NLP group
 72 | 
 73 | http://nlp.cs.berkeley.edu/ Berkeley NLP group
 74 | 
 75 | http://www.lti.cs.cmu.edu/ CMU Language Technologies Institute
 76 | 
 77 | http://nlp.ict.ac.cn/index_zh.php  Natural Language Processing Research Group, Institute of Computing Technology, Chinese Academy of Sciences
 78 | 
 79 | http://www.sogou.com/labs/  Sogou Labs
 80 | 
 81 | http://linguistics.georgetown.edu/ Department of Linguistics, Georgetown  University
 82 | 
 83 | http://ir.hit.edu.cn/  Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology (HIT)
 84 | 
 85 | http://www.childrenshospital.org/research-and-innovation/research-labs/natural-language-processing-lab
 86 | 
 87 | https://wiki.umiacs.umd.edu/clip/index.php/Main_Page
 88 | 
 89 | http://nlp.cs.nyu.edu/ 
 90 | 
 91 | http://nlp.cis.upenn.edu/
 92 | 
 93 | http://www.eng.utah.edu/~cs5340/ 
 94 | 
 95 | 
 96 | ## Textbook
 97 | http://www.cs.colorado.edu/~martin/slp2.html  SPEECH and LANGUAGE PROCESSING 2nd edition  2009
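 98 | * 浔雨: The authority of "自然语言处理综论" (the Chinese edition of this book) goes without saying. The translators are 冯志伟 and 孙乐. When I read it I did not yet know who 冯志伟 was, but it read very well, and a translation this fluent would not be possible without years of accumulated expertise in the field. The book is well regarded both in China and abroad. It covers what both schools of NLP (the linguistic and the statistical) care about, and loses some focus as a result. I personally lean toward the statistical part, so one needs to understand statistics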
 99 | 
100 | http://cognet.mit.edu/library/books/view?isbn=0262133601 Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press. Cambridge, MA: May 1999. 
101 | * http://www.csd.uwo.ca/~olga/Courses//Winter2010//CS4442_9542b/Books/StatNatLangProc/
102 | 
103 | 
104 | ## people
105 | http://nlp.stanford.edu/~manning/
106 | 
107 | http://www.umiacs.umd.edu/~hal/ 
108 | 
109 | http://mimno.infosci.cornell.edu/ David Mimno 
110 | * maintainer of MALLET
111 | 
112 | 
113 | http://www.cs.berkeley.edu/~klein/  Dan Klein
114 | 
115 | http://cs.brown.edu/people/ec/home.html Eugene Charniak
116 | 
117 | http://www.cs.colorado.edu/~martin/
118 | 
119 | http://www.cs.columbia.edu/~mcollins/
120 | 
121 | http://www1.cs.columbia.edu/~julia/
122 | 
123 | http://www.cs.cornell.edu/home/cardie/
124 | 
125 | http://www.eecs.harvard.edu/shieber/
126 | * computational Linguistics
127 | 
128 | http://l2r.cs.uiuc.edu/~danr/ 
129 | 
130 | http://www.cs.jhu.edu/~jason/
131 | 
132 | http://www.stanford.edu/~jurafsky/
133 | 
134 | http://www.umiacs.umd.edu/~resnik/
135 | 
136 | http://cs.nyu.edu/grishman/
137 | 
138 | http://homes.cs.washington.edu/~taskar/
139 | 
140 | http://www.cis.upenn.edu/~nenkova/
141 | 
142 | http://www.cs.utah.edu/~riloff/
143 | 
144 | http://pages.cs.wisc.edu/~jerryzhu/
145 | 
146 | http://pages.cs.wisc.edu/~bsnyder/
147 | 
148 | http://www.cs.cmu.edu/~nasmith/
149 | 
150 | http://www.cs.cmu.edu/~alavie/
151 | 
152 | 
153 | 
154 | # Tools
155 | ## NLP Toolbox
156 | http://gate.ac.uk GATE 
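157 | * 孔牧: You can add components to it, following its requirements, to carry out your own NLP tasks. My project team tried using it. Although it supports component development, it is still not flexible enough, so we built our own pipeline.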
158 | 
159 | http://nltk.org Natural Language Toolkit(NLTK) 
160 | 
161 | http://mallet.cs.umass.edu MALLET  MAchine Learning for LanguagE Toolkit
162 | 
163 | 
164 | http://opennlp.apache.org/ OpenNLP 
165 | 
166 | http://alias-i.com/lingpipe/ LingPipe is a tool kit for processing text using computational linguistics. 
167 | 
168 | https://textblob.readthedocs.org/en/dev/ TextBlob: Simplified Text Processing (python)
169 | 
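170 | https://github.com/HIT-SCIR/ltp The Language Technology Platform (LTP) is a complete Chinese language processing system developed over ten years by the HIT Research Center for Social Computing and Information Retrieval.
171 | * http://www.ltp-cloud.com/ LTP-Cloud (语言技术平台云)
172 | * 孔牧: This is a fairly complete pipeline. Quality aside, it provides word segmentation, semantic annotation, dependency parsing, and entity recognition. It does produce wrong results at times, but I could not find anything better.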
173 | 
174 | https://github.com/xpqiu/fnlp/  Chinese natural language processing toolkit
175 | * 邱锡鹏: recommending our own FudanNLP 
176 | 
177 | 
178 | ## English Stemmer
179 | http://snowball.tartarus.org/ Snowball
180 | 
181 | ## English POS Tagger
182 | http://nlp.stanford.edu/software/tagger.shtml Stanford POS Tagger 
183 | 
184 | http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/ TreeTagger
185 | 
186 | http://www.coli.uni-saarland.de/~thorsten/tnt/ TnT 
187 | 
188 | ## Parser
189 | http://nlp.stanford.edu/software/lex-parser.shtml Stanford Parser 
190 | 
191 | http://nlp.cs.berkeley.edu/software.shtml Berkeley Parser 
192 | 
193 | https://github.com/BLLIP/bllip-parser BLLIP Parser (copyright Mark Johnson, Eugene Charniak, 24th November 2005 --- August 2006)
194 | 
195 | ## English Keyphrase Extractor
196 | http://www.nzdl.org/Kea/index_old.html KEA keyphrase extraction
197 | 
198 | ## English Named Entity Recognizer
199 | http://nlp.stanford.edu/software/CRF-NER.shtml Stanford NER 
200 | 
201 | ## Chinese Word Segmentation
202 | http://nlp.stanford.edu/software/segmenter.shtml Stanford Word Segmenter 
203 | 
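204 | https://github.com/fxsjy/jieba jieba Chinese word segmentation
205 | 
206 | http://ictclas.org/  ICTCLAS, the Chinese Academy of Sciences word segmenter
207 | * 孔牧: A fairly authoritative segmenter. I believe you will end up choosing it as the segmentation tool for your project. It has plenty of problems of its own, but I could not find a better open source project.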
208 |   
209 | http://msdn.microsoft.com/zh-cn/library/jj163981.aspx
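210 | * 孔牧: This one is not open source, of course, but its segmentation is very accurate. The unfortunate part is that it performs segmentation and entity recognition in a single step, and the segmentation (in the tool it provides) does not come with part-of-speech tagging.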
211 | 
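212 | https://github.com/ansjsun/ansj_seg ansj word segmentation: a true Java implementation of ICT. Both segmentation quality and speed exceed the open source version of ICT. Chinese word segmentation, person name recognition, part-of-speech tagging, user-defined dictionaries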
213 | 
214 | ## speech recognition
215 | http://cmusphinx.sourceforge.net/ CMU Sphinx
216 | 
217 | 
218 | ## Topic Modeling Tools
219 | http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm Matlab Topic Modeling Toolbox 1.4
220 | 
221 | http://gibbslda.sourceforge.net/ GibbsLDA++ 
222 | 
223 | http://code.google.com/p/glda/ GLDA GPU-accelerated Latent Dirichlet allocation training 
224 | 
225 | ## Search Engines
226 | http://lucene.apache.org/ Lucene
227 | 
228 | 
229 | # classic papers
230 | 
231 | 
232 | ## Chinese Word Segmentation
233 | http://zhangkaixu.github.io/bibpage/cws.html bibliography compiled by 张开旭
234 | 
235 | ## Information Extraction
236 | (2008) Sunita Sarawagi. Information extraction. Foundations and Trends in Databases.
237 | 
238 | ## Language Model
239 | (2000) Rosenfeld, R. Two decades of statistical language modeling: where do we go from here?. Proc. IEEE.
240 | (2009) Chengxiang Zhai. Statistical Language Models For information Retrieval. Lecture Notes.
241 | http://www.cs.cmu.edu/~roni/papers/survey-slm-IEEE-PROC-0004.pdf Two decades of Statistical Language Models 
242 | 
243 | ## Parsing
244 | (2009) Sandra Kubler, Ryan McDonald, Joakim Nivre. Dependency Parsing. Synthesis Lectures on Human Language Technologies.
245 | 
246 | ## Sentiment Analysis and Opinion Mining
247 | (2008) Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval .
248 | 
249 | ## Word Sense Disambiguation
250 | (2009) Navigli, R. Word sense disambiguation: A survey. ACM Computing Surveys.
251 | 
252 | ## Topic Models
253 | http://mimno.infosci.cornell.edu/topics.html Topic modeling bibliography 
254 | 
255 | 
256 | Parsing (syntactic structure analysis; heavy on linguistics, so it can be rather dry)
257 | 
258 |     Klein & Manning: "Accurate Unlexicalized Parsing"
259 |     Klein & Manning: "Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency" (a revolutionary use of unsupervised learning to build a parser)
260 |     Nivre "Deterministic Dependency Parsing of English Text" (shows that deterministic parsing actually works quite well)
261 |     McDonald et al. "Non-Projective Dependency Parsing using Spanning-Tree Algorithms" (the other main method of dependency parsing, MST parsing)
262 | 
263 | 
264 | Machine Translation (can be skipped if you do not work on machine translation, although translation models are also applied in other areas)
265 | 
266 |     Knight "A statistical MT tutorial workbook" (easy to understand, use instead of the original Brown paper)
267 |     Och "The Alignment-Template Approach to Statistical Machine Translation" (foundations of phrase based systems)
268 |     Wu "Inversion Transduction Grammars and the Bilingual Parsing of Parallel Corpora" (arguably the first realistic method for biparsing, which is used in many systems)
269 |     Chiang "Hierarchical Phrase-Based Translation" (significantly improves accuracy by allowing for gappy phrases)
270 | 
271 | 
272 | Language Modeling
273 | 
274 |     Goodman "A bit of progress in language modeling" (a survey that describes just about everything related to n-gram language models, including smoothing and clustering)
275 |     Teh "A Bayesian interpretation of Interpolated Kneser-Ney" (shows how to get state-of-the art accuracy in a Bayesian framework, opening the path for other applications)
276 | 
277 | 
278 | Machine Learning for NLP
279 | 
280 |     Sutton & McCallum "An introduction to conditional random fields for relational learning" (CRFs are extremely useful in NLP, and many ready-made tools implement them. This is a fairly simple paper explaining CRFs, although it is still quite mathematical)
281 |     Knight "Bayesian Inference with Tears" (explains the general idea of bayesian techniques quite well)
282 |     Berg-Kirkpatrick et al. "Painless Unsupervised Learning with Features" (this is from this year and thus a bit of a gamble, but this has the potential to bring the power of discriminative methods to unsupervised learning)
283 | 
284 | Information Extraction
285 | 
286 |     Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora. COLING 1992. (The very first paper for all the bootstrapping methods for NLP. It is a hypothetical work in the sense that it doesn't give experimental results, but it influenced its followers a lot.)
287 |     Collins and Singer. Unsupervised Models for Named Entity Classification. EMNLP 1999. (It applies several variants of co-training-like IE methods to the NER task and gives the motivation for doing so. Students can learn from this work the logic of writing a good research paper in NLP.)
288 | 
289 | Computational Semantics
290 | 
291 |     Gildea and Jurafsky. Automatic Labeling of Semantic Roles. Computational Linguistics 2002. (It opened up the trends in NLP for semantic role labeling, followed by several CoNLL shared tasks dedicated for SRL. It shows how linguistics and engineering can collaborate with each other. It has a shorter version in ACL 2000.)
292 |     Pantel and Lin. Discovering Word Senses from Text. KDD 2002. (Supervised WSD was explored a lot in the early 2000s thanks to the Senseval workshop, but few systems actually benefit from WSD because manually crafted sense mappings are hard to obtain. These days we see a lot of evidence that unsupervised clustering improves NLP tasks such as NER, parsing, SRL, etc.)
293 |     
294 | # Reference
295 | 1. http://www.newsmth.net/nForum/#!article/NLP/43   zibuyu (得之我幸失之我命), Commonly used NLP resources (NLP常用信息资源), 水木社区 (Wed Mar 14 23:56:43 2007)
296 | 
297 | 2. http://www.newsmth.net/nForum/#!article/NLP/3849   zibuyu (得之我幸失之我命), Commonly used open source/free NLP tools (NLP常用开源/免费工具), 水木社区 (Wed Mar 14 23:56:43 2007)
298 | 
299 | 3. http://www.newsmth.net/nForum/#!article/NLP/5461   zibuyu (得之我幸失之我命), Classic surveys in NLP (NLP领域经典综述), 水木社区  (Tue Feb 24 11:13:53 2009)
300 | 
301 | 4. http://www.zhihu.com/question/19929473   "What open source NLP projects and toolkits are commonly used today?" 孔牧, 邱锡鹏, 裴飞, 贺一帆, 武博文
302 | 
303 | 
304 | 5. http://www.zhihu.com/question/19895141 "What is the fastest way to get started with NLP?"
305 | 
306 | 
307 | 
308 | 
309 | 


--------------------------------------------------------------------------------
/awesome/ocr-tools.md:
--------------------------------------------------------------------------------
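 1 | 极客杨's OCR toolbox: Tesseract is currently the most widely used free, open source OCR tool (backed by Google). Commercial products include ABBYY FineReader and Adobe; products from China include Wintone (文通) and Hanwang (汉王). A current hot topic is porting OCR to smartphones to open up new input channels: iOS has Tesseract-based implementations, and Android has the Qualcomm Vuforia API.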
 2 | 
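 3 | Recognition quality still depends mainly on parameter tuning, in two respects: different languages need different initial settings, and colored or gradient backgrounds greatly reduce recognition accuracy, so images should first be converted to black-and-white or grayscale (OpenCV is worth trying). Two articles are recommended: one is an introduction to Tesseract (2007), and the other reports the problems Tesseract runs into with color images.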
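The grayscale and black-and-white conversion recommended above can be sketched without any imaging library. The example below is an illustration only: it works on a nested list of `(r, g, b)` tuples rather than a real image file, uses the standard luminance weights for the grayscale step, and picks the binarization threshold with Otsu's method, which is one common choice and not something the source prescribes. All function names are made up for this sketch; in practice a library such as OpenCV would do the same work on real images.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_image]


def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of those pixel values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty, variance is undefined
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]


if __name__ == "__main__":
    # Dark "text" pixels on a tinted, slightly uneven background.
    image = [[(20, 20, 20), (200, 220, 180), (210, 230, 190)],
             [(200, 220, 180), (20, 20, 20), (210, 230, 190)]]
    gray = to_grayscale(image)
    print(binarize(gray, otsu_threshold(gray)))
```

A global threshold like this works when the background is roughly uniform. For strong gradients, a locally adaptive threshold is usually the better fit.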
 4 | 
 5 | Resource card stream: http://hao.memect.com/?tag=ocr-tools
 6 | 
 7 | 
 8 | [![Tesseract](http://img.memect.com/05jtNcF8k5Kgc3Euvqf5rfZCinM=/400x0/t/1b014ddddc07c435ce3775f3ba85199e706d69456870b9fcb24b5d8ce8c684da)](http://hao.memect.com/?tag=ocr-tools)
 9 | 
10 | # Top Reading - Market Survey
11 | https://tesseract-ocr.googlecode.com/files/TesseractOSCON.pdf  Tesseract features and key issues (2007)
12 | 
13 | http://www.assistivetechnology.vcu.edu/files/2013/09/pxc3882784.pdf Optical Character Recognition by Open Source OCR 
14 | Tool Tesseract: A Case Study (2012)
15 | 
16 | http://lifehacker.com/5624781/five-best-text-recognition-tools 
17 | 
18 | http://www.zhihu.com/question/19593313  
19 | 
20 | http://www.perfectgeeks.com/list/top-best-free-ocr-software/13
21 | 
22 | http://lib.psnc.pl/Content/358/PSNC_Tesseract-FineReader-report.pdf  Report on the comparison of Tesseract and 
23 | ABBYY FineReader OCR engines (2012)
24 | 
25 | 
26 | # best OCR tools
27 | https://code.google.com/p/tesseract-ocr/  the most widely used open source OCR software, under the Apache 2.0 license. It has been improved extensively by Google
28 | 
29 | http://finereader.abbyy.com/   one of the best commercial products
30 | 
31 | http://www.wintone.com.cn/en/  one of the best commercial products for Chinese
32 | 
33 | # Tesseract in action and Q/A
34 | http://benschmidt.org/dighist13/?page_id=129
35 | 
36 | http://stackoverflow.com/questions/13511102/ios-tesseract-ocr-image-preperation
37 | 
38 | http://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy?rq=1
39 | 
40 | http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
41 | 
42 | 
43 | 
44 | # Tesseract related applications
45 | https://github.com/gali8/Tesseract-OCR-iOS
46 | 
47 | https://github.com/rmtheis/android-ocr
48 | 
49 | https://github.com/rmtheis/tess-two
50 | 
51 | 
52 | # misc
53 | 
54 | https://developer.vuforia.com/resources/sample-apps/text-recognition
55 | https://www.youtube.com/watch?v=KLqFQ2u52iU
56 | 
57 | http://blog.ayoungprogrammer.com/2013/01/equation-ocr-part-1-using-contours-to.html
58 | 


--------------------------------------------------------------------------------
/awesome/opendata-gbif.md:
--------------------------------------------------------------------------------
 1 | http://www.gbif.org/mendeley/usecases  research papers
 2 | 
 3 | http://www.gbif.org/newsroom/uses  showcases using aggregated data
 4 | 
 5 | http://imsgbif.gbif.org/CMS_ORC/?doc_id=2613&download=1  2014 overview
 6 | 
 7 | http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0066559  research paper on grey squirrel
 8 | 
 9 | http://www.gbif.org/  homepage
10 | 
11 | 


--------------------------------------------------------------------------------
/awesome/outlier-text-mining.md:
--------------------------------------------------------------------------------
 1 | # Outlier Detection in Text Mining
 2 | 
 3 | contributor: 郭惠礼 , 许扬逸Dijkstra , phunter_lau , ai_东沂 
 4 | 
 5 | card list: http://bigdata.memect.com/?tag=outlierdetectionandtextmining
 6 | 
 7 | https://github.com/memect/hao/blob/master/awesome/outlier-text-mining.md
 8 | 
 9 | keywords:
10 |   outlier detection,
11 |   anomaly detection,
12 |   text mining
13 | 
14 | ## Outlier detection survey 
15 | http://info.mapr.com/resources_anewlook_anomalydetection_ty.html.html?aliId=7992403 
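16 | @郭惠礼: Just finished reading a book, "Practical Machine Learning: A New Look at Anomaly Detection" http://t.cn/RPJX4YT, a free book on practical machine learning. It is built mainly around anomaly detection and the t-digest algorithm and does not go very deep. It is fairly simple and suits beginners who are new to ML.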
17 | 
18 | http://arxiv.org/abs/1009.6119 A Comprehensive Survey of Data Mining-based Fraud Detection Research, Clifton Phua, Vincent Lee, Kate Smith, Ross Gayler 2010
19 | 
20 | http://www.kdnuggets.com/2014/05/book-outlier-detection-temporal-data.html Outlier Detection for Temporal Data (Book)
21 | 
22 | http://en.wikipedia.org/wiki/Anomaly_detection
23 | 
24 | http://www.siam.org/meetings/sdm10/tutorial3.pdf Outlier Detection Techniques - SIAM
25 | 
26 | http://www.slideshare.net/HouwLiong/chapter-12-outlier
27 | 
28 | 
29 | ## Outlier/anomaly detection in Text mining
30 | 
31 | http://nlp.shef.ac.uk/Completed_PhD_Projects/guthrie.pdf David Guthrie, Unsupervised Detection of Anomalous Text
32 | PhD thesis from the University of Sheffield, UK
33 | 
34 | http://link.springer.com/chapter/10.1007%2F978-1-4614-6396-2_7 Chapter 7 of Aggarwal's book Outlier Analysis: Outlier Detection in Categorical, Text and Mixed Attribute Data
35 | 
36 | 
37 | http://www.amazon.com/Survey-Text-Mining-Clustering-Classification/dp/1848000456  Survey of Text Mining: Clustering, Classification, and Retrieval, Second Edition, Michael W. Berry and Malu Castellanos, Editors 2007 (see Part IV, Anomaly Detection) https://perso.uclouvain.be/vincent.blondel/publications/08-textmining.pdf
38 | 
39 | http://www.mdpi.com/1999-4893/5/4/469 Contextual Anomaly Detection in Text Data 2012
40 | 
41 | 
42 | ## Text mining (focus on topic models)
43 | http://www.itee.uq.edu.au/dke/filething/get/855/text-mining-ChengXiangZhai.pdf  Statistical Methods for Mining Big Text Data, ChengXiang Zhai 2014
44 | 
45 | http://cs.gmu.edu/~carlotta/publications/AlsumaitL_onlineLDA.pdf On-Line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking, Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi
46 | 
47 | 
48 | ## Related comments
49 | 
50 | 
51 | @phunter_lau: The skill used there is just outlier detection, followed by joins across the tables where the outliers sit and a brute-force search. This is a common approach.
52 | http://weibo.com/1770891687/B5Gs7xdqQ
53 | 
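54 | phunter_lau: Note that the passage below is neither a standard method nor well grounded in theory, and it should not mislead anyone:
55 |  <i>"phunter_lau: That works too, and for disconnected cases you can randomly add connections, for example 'you did sneak a look at that girl', and then continuing the analysis gives unexpected results"</i>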
56 | 
57 | 
58 | 
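59 | 许扬逸Dijkstra: It is also worth trying for antispam and multidimensional outlier detection
60 | @计兮 [Naive Bayes in financial data mining] by @数说工作室网站: This article introduces the naive Bayes model as used in financial data mining, for your reference. Original link → http://t.cn/RPzhx7S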
61 | http://www.weibo.com/1642083541/Be9vDxvyw
62 | 
63 | 
64 | ai_东沂: Let me add some material I found earlier: a PhD thesis from the University of Sheffield, UK, http://nlp.shef.ac.uk/Completed_PhD_Projects/guthrie.pdf
65 | Chapter 7 of Aggarwal's book Outlier Analysis, Outlier Detection in Categorical, Text and Mixed Attribute Data, http://link.springer.com/chapter/10.1007%2F978-1-4614-6396-2_7
66 | 


--------------------------------------------------------------------------------
/awesome/phonetic_algorithm.md:
--------------------------------------------------------------------------------
 1 | # Phonetic similarity algorithms and code
 2 | Discussion: https://github.com/memect/hao/issues/164
 3 | 
 4 | 
 5 | ## Concepts
 6 | [Phonetic algorithm](http://en.wikipedia.org/wiki/Phonetic_algorithm) A phonetic algorithm is an algorithm for indexing of words by their pronunciation.
 7 | 
 8 | Related keywords:
 9 |   phonetic similarity (语音相似度)
10 |   acoustic similarity/confusability (声音相似度)
11 | 
12 | 
13 | ## Algorithms and open source code
14 | 
15 | ![](https://cloud.githubusercontent.com/assets/8302062/4227773/54b9e7f8-394c-11e4-9c5b-95fe817dee05.png)
16 | 
17 | algorithms
18 | * Soundex
19 |   * Daitch–Mokotoff Soundex
20 |   * Kölner Phonetik
21 | * Metaphone 
22 |   * Double Metaphone
23 | * New York State Identification and Intelligence System
24 | * Match Rating Approach (MRA)
25 | * Caverphone
26 | 
27 | open source code
28 | * https://github.com/elasticsearch/elasticsearch-analysis-phonetic/ -- java
29 | * https://github.com/maros/Text-Phonetic -- perl
30 | * https://github.com/dotcypress/phonetics -- go
31 | * https://github.com/lukelex/soundcord -- ruby
32 | * https://github.com/Simmetrics/simmetrics -- java
33 | * https://github.com/oubiwann/metaphone - https://pypi.python.org/pypi/Metaphone/0.4 --python
34 | * https://bitbucket.org/yougov/fuzzy - https://pypi.python.org/pypi/Fuzzy/1.0 --python
35 | * https://github.com/sunlightlabs/jellyfish - https://pypi.python.org/pypi/jellyfish/0.3.2 -- python
36 | 
37 | source: Wikipedia, GitHub
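To make the idea of indexing words by pronunciation concrete, here is a minimal sketch of American Soundex, the first algorithm in the list above. It follows the commonly described rules: keep the first letter, encode the remaining consonants as digits, collapse adjacent letters with the same code, let vowels separate repeated codes while `h` and `w` do not, and pad to four characters. It is written for illustration and is not taken from any of the listed libraries, which may differ in edge cases.

```python
# Consonant groups and their Soundex digits; vowels, h, w, y have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex code of an ASCII word."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code counts again
    return result.ljust(4, "0")


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
        print(name, soundex(name))
```

Names that sound alike, such as Robert and Rupert, map to the same code, so the code can serve as a lookup key for fuzzy name matching.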
38 | 
39 | ## Related papers
40 | 
41 | http://saffron.insight-centre.org/acl/topic/phonetic_similarity/ list of related papers
42 | 
43 | https://homes.cs.washington.edu/~bhixon/papers/phonemic_similarity_metrics_Interspeech_2011.pdf Phonemic Similarity Metrics to Compare Pronunciation Methods (2011)
44 | 
45 | http://webdocs.cs.ualberta.ca/~kondrak/papers/lingdist.pdf Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification (2006)
46 | 
47 | http://webdocs.cs.ualberta.ca/~kondrak/papers/chum.pdf Phonetic alignment and similarity (2003)
48 | 
49 | http://www.aclweb.org/anthology/C69-5701 The Measurement of Phonetic Similarity (1967)
50 | 
51 | http://www.aclweb.org/anthology/P/P06/P06-1125.pdf A Phonetic-Based Approach to Chinese Chat Text Normalization (a method for Chinese)
53 | 
54 | 


--------------------------------------------------------------------------------
/awesome/piecewise-linear-regression.md:
--------------------------------------------------------------------------------
 1 | # Piecewise linear models: resources and software (introductory)
 2 | 
 3 | * contributors: @视觉动物晴木明川 @heavenfireray @禅系一之花
 4 | * keywords:  piecewise linear model (分段线性模型), Piecewise linear regression, Segmented linear regression
 5 | * license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
 6 | * cardbox: http://bigdata.memect.com/?tag=piecewiselinearregression
 7 | * discussion: https://github.com/memect/hao/issues/70
 8 | 
 9 | https://github.com/memect/hao/blob/master/awesome/piecewise-linear-regression.md
10 | 
11 | 
12 | ## Tutorials
13 | 
14 | https://onlinecourses.science.psu.edu/stat501/node/77 Piecewise linear regression models
15 | 
16 | http://www.fs.fed.us/rm/pubs/rmrs_gtr189.pdf A Tutorial on the Piecewise Regression Approach Applied to Bedload Transport Data, Sandra E. Ryan, Laurie S. Porth
17 | * @禅系一之花: I like this guide
18 | 
19 | http://www.ee.ucla.edu/ee236a/lectures/pwl.pdf UCLA, (2013) Lecture 2 Piecewise-linear optimization
20 | * Added a more theory-oriented set of tutorial slides from UCLA
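As a small illustration of what the tutorials above cover, the sketch below fits a two-segment piecewise linear model with one unknown breakpoint. It is a simplification chosen for brevity: it tries every split of the sorted data and fits an independent least-squares line on each side, so the two segments are not forced to meet at the breakpoint, unlike the continuous (hinge) formulation used in some of the linked material. The function names and the `min_points` parameter are assumptions for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_two_segments(xs, ys, min_points=2):
    """Try every split of the x-sorted data; keep the lowest total SSE.

    Returns (total_sse, first_x_of_right_segment, (a1, b1), (a2, b2)),
    or None when there are too few points for two segments.
    """
    pts = sorted(zip(xs, ys))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:k], ys[:k])
        right = fit_line(xs[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, xs[k], left[:2], right[:2])
    return best


if __name__ == "__main__":
    x = list(range(10))
    y = [v if v < 5 else 14 - 2 * v for v in x]  # slope 1, then slope -2
    print(fit_two_segments(x, y))
```

The exhaustive search costs one pair of line fits per candidate split, which is fine for small data sets. For several unknown breakpoints, the dedicated packages listed in the software section are the more practical route.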
21 | 
22 | ## Statistical software (all of these support this feature)
23 | 
24 | http://people.ucalgary.ca/~aniknafs/index_files/TR%2094%202011.pdf RapidMiner (has a free edition and a sizable user base)
25 | 
26 | http://mathematica.stackexchange.com/questions/45745/fitting-piecewise-functions Mathematica
27 | 
28 |   *  http://forums.wolfram.com/student-support/topics/22308 "piecewise linear fit"
29 |   *  "Mathematica Navigator: Mathematics, Statistics and Graphics" page 516
30 |   *  http://dsp.stackexchange.com/questions/1227/fit-piecewise-linear-data
31 |   *  http://coen.boisestate.edu/bknowlton/files/2011/12/Mathematica-Tutorial-Megan-Frary.pdf Mathematica Tutorial
32 | 
33 | http://mobiusfunction.wordpress.com/2012/06/26/piece-wise-linear-regression-from-two-dimensional-data-multiple-break-points/ matlab
34 | 
35 | http://stats.stackexchange.com/questions/18468/how-to-do-piecewise-linear-regression-with-multiple-unknown-knots 
36 | matlab
37 | 
38 | http://www.ats.ucla.edu/stat/sas/faq/nlin_optimal_knots.htm SAS
39 | 
40 | http://climateecology.wordpress.com/2012/08/19/r-for-ecologists-putting-together-a-piecewise-regression/ R
41 | * "Piecewise or segmented regression for when your data has two different linear patterns. Again, comments here are good" source: https://twitter.com/statsforbios/status/378163948740026368
42 | 
43 | https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/linear_model.rst python
44 | 
45 | 
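46 | ## User comments
47 | @视觉动物晴木明川: Piecewise linearity is the essence of many nonlinear methods! //@机器学习那些事儿: I had assumed the piecewise idea in the MLR you mentioned last time was based on LR-based AdaBoost; it looks like I need to study your paper carefully //@heavenfireray: Here is a fun one: on the diamond-shaped data I used earlier to demonstrate piecewise linearity, LR-based AdaBoost cannot get above 55% accuracy (a piecewise linear model reaches over 99%)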
48 | http://weibo.com/1718403260/ADrUnChqt
49 | 
50 | http://arxiv.org/abs/1401.6413 Online Piecewise Linear Regression via Infinite Depth Context Trees N. Denizcan Vanli, Muhammed O. Sayin, Suleyman S. Kozat
51 | 
52 | 
53 | ## Related reading
54 | http://www.eccf.ukim.edu.mk/ArticleContents/JCEBI/03%20Miodrag%20Lovric,%20Marina%20Milanovic%20and%20Milan%20Stamenkovic.pdf time series analysis
55 | 
56 | 
57 | 


--------------------------------------------------------------------------------
/awesome/query-intent.md:
--------------------------------------------------------------------------------
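 1 | http://www.cnblogs.com/yangxudong/p/3750358.html Query intent analysis: a complete machine learning workflow (study notes on the scikit-learn library)
 2 | 
 3 | http://www.tao-sou.com/740.html 15 types of Taobao search queries (500 query samples) and a comparison with Baidu and Google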
 4 | 
 5 | http://searchnewscentral.com/20110531166/Technical/query-classification-understanding-user-intent.html Query classification; understanding user intent
 6 | 
 7 | http://dl.acm.org/citation.cfm?id=1351372 Determining the informational, navigational, and transactional intent of Web queries
 8 | 
 9 | http://dl.acm.org/citation.cfm?id=1507510 Survey and evaluation of query intent detection methods
10 | 
11 | http://www.slideshare.net/daniel.gayo/survey-and-evaluation-of-query-intent-detection-methods
12 | 
13 | http://www.ijarce.com/downloads/may-2014/IJARCE-13201416.pdf Survey and Analysis for User Intention Refined Internet Image Search
14 | 
15 | http://gesterling.wordpress.com/2010/03/03/local-queries-vs-local-intent/  Local Queries vs. ‘Local Intent’
16 | 


--------------------------------------------------------------------------------
/awesome/question-answer.md:
--------------------------------------------------------------------------------
 1 | # Question answering systems: collected resources
 2 | 
 3 | 
 4 | ## Intelligent personal assistants
 5 | * [Amazon Evi](http://www.evi.com/) (launched in 2012) "best selling mobile app that can answer questions about local knowledge"
 6 |   * formerly [True Knowledge](http://en.wikipedia.org/wiki/Evi_(software)) (launched in 2007), "a natural language question answering system", acquired by Amazon in 2012
 7 | * [Google Now](http://www.google.com/landing/now/) (launched in 2012) "an intelligent personal assistant developed by Google"
 8 | * [Apple Siri](https://www.apple.com/ios/siri/) (launched in 2011) "an intelligent personal assistant and knowledge navigator which works as an application for Apple Inc.'s iOS." 
 9 |   * Siri iOS app (by Siri Inc.) (launched in 2009), founded in 2007, acquired by Apple in 2010
10 | * [Microsoft Cortana](http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana) "an intelligent personal assistant on Windows Phone 8.1"
11 | * [Samsung S Voice](http://www.samsung.com/global/galaxys3/svoice.html) (launched in 2012) "an intelligent personal assistant and knowledge navigator which is only available as a built-in application for the Samsung Galaxy"
12 | 
13 | 
14 | * [Jelly](http://en.wikipedia.org/wiki/Jelly_%28app%29) "an app (currently available on iOS and Android) that serves as a Q&A platform, created by a company of the same name led by Biz Stone, one of Twitter's co-founders. " ," it encourages people to use photos to ask questions"
15 | * [Viv](http://viv.ai/) (launching in 2014) "a global platform that enables developers to plug into and create an intelligent, conversational interface to anything." 
16 | * [出门问问](http://chumenwenwen.com/) 
17 | 
18 | * [Project CALO](http://en.wikipedia.org/wiki/CALO) (2003-2008) funded by the Defense Advanced Research Projects Agency (DARPA) under its Personalized Assistant that Learns (PAL) program
19 | * [Vlingo](http://en.wikipedia.org/wiki/Vlingo) acquired by Nuance in December 2011
20 | * [Voice Mate](http://en.wikipedia.org/wiki/Voice_Mate)  LG
21 | 
22 | 
23 | ## Automatic question answering systems:
24 | * [IBM Watson](http://www.ibm.com/smarterplanet/us/en/ibmwatson/) (launched in 2013)
25 |   * [IBM DeepQA (watson)](https://www.research.ibm.com/deepqa/deepqa.shtml) (launched in 2011) "A first stop along the way is the Jeopardy! Challenge..."
26 | * [Wolfram alpha](http://www.wolframalpha.com/) "which was released on May 15, 2009"
27 | * [Project Aristo](http://www.allenai.org/TemplateGeneric.aspx?contentId=8) current project at Allen Institute for Artificial Intelligence (AI2) 
28 |   * [Project Halo](http://www.allenai.org/TemplateGeneric.aspx?contentId=9) past project 
29 | 
30 | 
31 | ## Chatbots and the Turing test:
32 | * [Xiao Q (小Q, Tencent chatbot)](http://qrobot.qq.com/) "QQ Robot is the collective name for the AI chatbots that Tencent has released over time" (2013)
33 | * [Microsoft XiaoIce (微软小冰)](http://www.msxiaoice.com/v2/DesktopLanding) "Microsoft XiaoIce is a leading cross-platform AI bot" (2014)
34 | * [Eugene Goostman](http://en.wikipedia.org/wiki/Eugene_Goostman) "portrayed as a 13-year-old Ukrainian boy" (2001-)
35 | * [Cleverbot](http://en.wikipedia.org/wiki/Cleverbot) "a web application that uses an artificial intelligence algorithm to have conversations with humans"
36 | * [ELIZA](http://en.wikipedia.org/wiki/ELIZA) "a computer program and an early example of primitive natural language processing" (1966)
37 | 
38 | ## Human-powered Q&A sites:
39 | * ask.com
40 | * https://answers.yahoo.com/
41 | * http://answers.com 
42 |   * http://wiki.answers.com/
43 | * stackoverflow
44 | * reddit
45 | * quora
46 | * Formspring, a Q&A-based social network
47 | * Zhihu (知乎)
48 | * Baidu Zhidao (百度知道)
49 | * Baidu Wei Wenda (百度微问答)
50 | * http://segmentfault.com/
51 | * Tianya (天涯) http://wenda.tianya.cn/
52 | For more, see Wikipedia: http://en.wikipedia.org/wiki/List_of_question-and-answer_websites 
53 | 
54 | 


--------------------------------------------------------------------------------
/awesome/rdb-rdf.md:
--------------------------------------------------------------------------------
 1 | # Relational Databases to RDF (RDB2RDF)
 2 | Summary: [Classic collection] How to map relational database data to the Semantic Web's RDF representation and support the SPARQL query language.
 3 | 
 4 | editor(s): [吴伟](https://github.com/wwumit), [好东西传送门](https://github.com/haoawesome)
 5 | 
 6 | 
 7 | # Overview
 8 | http://www.csee.umbc.edu/courses/graduate/691/spring14/01/notes/20_rdbs/20r2r.pdf Short story - RDB and RDF 1, Tim Finin's class notes - [CMSC 491/691 Special Topics: A Web of Data]( http://www.csee.umbc.edu/courses/graduate/691/spring14/01/)
 9 | 
10 | http://www.slideshare.net/juansequeda/rdb2-rdf-tutorial-iswc2013 Long story - the Relational Databases to RDF (RDB2RDF) Tutorial at the 2013 International Semantic Web Conference (ISWC2013)
11 | 
12 | http://www.w3.org/2001/sw/wiki/RDB2RDF
13 | 
14 | 
15 | # W3C Recommendations
16 | 
17 | http://www.w3.org/TR/r2rml/  R2RML, W3C Recommendation 2012
18 | 
19 | http://www.w3.org/TR/rdb-direct-mapping/   Direct Mapping, W3C Recommendation 2012
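To make the idea of a direct mapping concrete, here is a minimal sketch that turns table rows into triples, loosely following the IRI shapes used in the W3C Direct Mapping examples (`<People/ID=7>` for a row, `<People#fname>` for a column). Everything else is an assumption for illustration: `direct_map` is a hypothetical function, only single-column primary keys are handled, all values are emitted as plain string literals, and foreign-key reference triples are omitted.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(base, table, primary_key, rows):
    """Return (subject, predicate, object) triples for the rows of one table.

    Simplified sketch: no datatypes, no foreign keys, single-column key only.
    """
    triples = []
    table_iri = f"<{base}{quote(table)}>"
    for row in rows:
        key_value = quote(str(row[primary_key]))
        subject = f"<{base}{quote(table)}/{quote(primary_key)}={key_value}>"
        # Each row is typed with the IRI of its table.
        triples.append((subject, f"<{RDF_TYPE}>", table_iri))
        for column, value in row.items():
            if value is None:
                continue  # NULL columns produce no triple
            predicate = f"<{base}{quote(table)}#{quote(column)}>"
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, predicate, f'"{escaped}"'))
    return triples


if __name__ == "__main__":
    rows = [{"ID": 7, "fname": "Bob", "addr": None}]
    for s, p, o in direct_map("http://example.com/base/", "People", "ID", rows):
        print(s, p, o, ".")
```

R2RML covers the cases this sketch leaves out by letting the mapping author choose IRIs, datatypes, and joins explicitly.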
20 | 
21 | # Tools
22 | 
23 | ## Academic Research
24 | 
25 | https://github.com/nkons/r2rml-parser
26 | 
27 | https://github.com/antidot/db2triples
28 | 
29 | http://d2rq.org/
30 | 
31 | http://www.capsenta.com/
32 | 
33 | http://www.dblab.ntua.gr/~bikakis/SPARQL-RW.html
34 | 
35 | http://www.dblab.ntua.gr/~bikakis/SPARQL2XQuery.html
36 | 
37 | ## Commercial Tools
38 | http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtR2RML  OpenLink Virtuoso
39 | 
40 | http://docs.oracle.com/database/121/RDFRM/sem_relational_views.htm  Oracle database
41 | 
42 | 


--------------------------------------------------------------------------------
/awesome/recurrent-neural-networks.md:
--------------------------------------------------------------------------------
 1 | contributors: @ICT_朱亚东 @维尔茨 
 2 | 
 3 | card list:  http://bigdata.memect.com/?tag=rnn
 4 | 
 5 | https://github.com/memect/hao/blob/master/awesome/recurrent-neural-networks.md
 6 | 
 7 | ## Learning resources
 8 | http://en.wikipedia.org/wiki/Recurrent_neural_network Background
 9 | 
10 | http://minds.jacobs-university.de/sites/default/files/uploads/papers/ESNTutorialRev.pdf (recommended by @ICT_朱亚东, short tutorial) H. Jaeger (2002): Tutorial on training recurrent neural networks, covering BPPT,RTRL, EKF and the "echo state network" approach. GMD Report 159, German National Research Center for Information Technology, 2002 (48 pp.)
11 | 
12 | http://www.cs.toronto.edu/~graves/preprint.pdf (endorsed by @维尔茨, textbook) Supervised Sequence Labelling with Recurrent Neural Networks. Textbook, Studies in Computational Intelligence, Springer, 2012. 
13 | 
14 | http://www.idsia.ch/~juergen/rnn.html (resource list) over 60 RNN papers by Jürgen Schmidhuber's group at IDSIA
15 | 
16 | ## Experts
17 | 
18 | http://www.cs.toronto.edu/~graves/ Alex Graves
19 | 
20 | http://www.idsia.ch/~juergen/ Jürgen Schmidhuber
21 | 
22 | http://research.microsoft.com/en-us/projects/rnn/ Microsoft RNN group
23 | 
24 | 
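25 | ## Related discussion
26 | ### @维尔茨 RNN label sequence:  https://github.com/memect/hao/issues/41 
27 | Q: @维尔茨 Are there any papers on recurrent neural networks for segmented sequence labeling? I want to use an RNN to label the sequence itself rather than the sequence members.
28 | 
29 | A: Alex Graves at the University of Toronto has a monograph on this problem, based on recurrent neural networks (RNNs). @ICT_朱亚东 recommends Herbert Jaeger's short tutorial (40-plus pages). Professor Jürgen Schmidhuber has collected more than 60 related papers, and Microsoft Research uses RNNs for natural language processing.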
30 | 


--------------------------------------------------------------------------------
/awesome/reverse-proxy-load-balancer.md:
--------------------------------------------------------------------------------
 1 | # Solutions for speeding up web page response: DNS A records, reverse proxies, and load balancing
 2 | 
 3 | contributors @mahak, BUPTGuo , 情非得已小屋, 新世界_玉兔 , 52cs
 4 | 
 5 | discussion: https://github.com/memect/hao/issues/48
 6 | 
 7 | keywords:
 8 |  DNS A-Record, 
 9 |  load balancer,
10 |  reverse proxy,
11 | 
12 | 
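13 | ## Solutions
14 | http://webmasters.stackexchange.com/questions/10927/using-multiple-a-records-for-my-domain-do-web-browsers-ever-try-more-than-one The simplest approach is a DNS setting: configure multiple "A" records under one domain name, i.e. map one domain name to several IP addresses; the name server and the browser then jointly settle on one of those IPs to connect to.
15 | 
16 | http://yijiu.blog.51cto.com/433846/1408443 Nginx-based reverse proxying and load balancing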
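The multiple-A-record approach can be pictured with a toy name server that rotates its answer on every query. This is an in-memory sketch only: `RoundRobinZone` is a hypothetical class, the addresses are documentation-range placeholders, and real resolvers and browsers may reorder or cache answers.

```python
from collections import deque


class RoundRobinZone:
    """Toy name server: several A records per name, rotated on every query."""

    def __init__(self):
        self._records = {}

    def add_a_record(self, name, ip):
        self._records.setdefault(name, deque()).append(ip)

    def resolve(self, name):
        ips = self._records[name]
        answer = list(ips)
        ips.rotate(-1)  # the next query sees a different first address
        return answer


if __name__ == "__main__":
    zone = RoundRobinZone()
    for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
        zone.add_a_record("example.com", ip)
    for _ in range(4):
        # A simple client connects to the first address in the answer.
        print(zone.resolve("example.com")[0])
```

Because the rotation happens per query, clients that each take the first address are spread roughly evenly over the servers, with no health checking at all.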
17 | 
18 | http://fournines.wordpress.com/2011/12/02/improving-page-speed-cdn-vs-squid-varnish-nginx/ Improving page speed: CDN vs Squid/Varnish/nginx/mod_proxy
19 | 
20 | http://en.wikipedia.org/wiki/Reverse_proxy
21 | 
22 | 
23 | http://en.wikipedia.org/wiki/Load_balancing_%28computing%29#Load_balancer_features
24 | 
25 | 
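26 | ## Discussion
27 | <b>@52cs: It seems a domain name can only be bound to one IP, so how can all these servers be found through the domain name?</b>
28 | 
29 | mahak: A domain's A records can hold multiple IPs served in rotation (round robin). Once a request reaches one of those IPs, that IP can be a load-balancing device; the balancing policy can be tuned to the application, for example whether to keep session persistence.
30 | 
31 | BUPTGuo: Load balancing? (Aug 3, 17:17)
32 | 
33 | 好东西传送门: [Help wanted] Everyone is welcome to answer here: http://t.cn/RPi5Prc A quiet aside: it is probably done through a load balancer or reverse proxy //@龙星计划: Could someone explain the basics? (Aug 3, 17:51)
34 | 
35 | 情非得已小屋: Load balancing + reverse proxy (Aug 3, 19:24)
36 | 
37 | 新世界_玉兔: DNS provides load balancing (Aug 4, 16:05)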
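The policy choice mentioned in the discussion, plain rotation versus session persistence, can be sketched as follows. `LoadBalancer` is a hypothetical class for illustration; hashing the client IP is just one common way to get sticky sessions, and real balancers often use cookies or connection tables instead.

```python
import hashlib
from itertools import cycle


class LoadBalancer:
    """Pick a backend per request: round robin, or sticky by client IP."""

    def __init__(self, backends, sticky=False):
        self._backends = list(backends)
        self._next = cycle(self._backends)
        self._sticky = sticky

    def pick(self, client_ip):
        if self._sticky:
            # Same client IP always hashes to the same backend,
            # so server-side session state stays reachable.
            digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(self._backends)
            return self._backends[index]
        # Stateless services can simply rotate through the backends.
        return next(self._next)


if __name__ == "__main__":
    rr = LoadBalancer(["app1", "app2", "app3"])
    print([rr.pick("10.0.0.1") for _ in range(4)])
    sticky = LoadBalancer(["app1", "app2", "app3"], sticky=True)
    print(sticky.pick("10.0.0.1"), sticky.pick("10.0.0.1"))
```

The trade-off is that sticky hashing keeps sessions intact but can load backends unevenly, and the mapping shifts whenever the backend list changes.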
38 | 
39 | 
40 | 


--------------------------------------------------------------------------------
/awesome/semanticweb-dl:
--------------------------------------------------------------------------------
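 1 |  徐涵W3China   2014-11-21 08:00
 2 | "Dr. Huang Zhisheng on the Semantic Web and Web 3.0": many years on, this interview from five years ago still largely holds up. @好东西传送门 http://t.cn/RzA6G69
 3 | 好东西传送门 reposted on 2014-11-21 10:10
 4 | Now that knowledge graphs are widely known, revisiting this interview is well worth it.
 5 | Gary南京 reposted on 2014-11-22 06:57
 6 | When talking about the Semantic Web, it is best not to overstate the role of OWL and description logic: an ontology is not the same thing as description logic, realizing the Semantic Web does not necessarily require description logic, and much of description logic is useless on the Web.
 7 | 任远AI reposted on 2014-11-22 07:04
 8 | As a logical exploration OWL is still very valuable: it gives an (approximate) upper bound on expressive power under which computability, completeness, and correctness are all guaranteed.
 9 | 徐涵W3China reposted on 2014-11-22 08:05
10 | A logic expert on description logic!
11 | 昊奋 reposted on 2014-11-22 08:48
12 | At any time there are technologies that fit the current trend and deserve heavy promotion. At least for these few years, any technology built on description logic and OWL knowledge representation can only stay within research, though it may well regain attention in a few years. "Deep semantic" might be a good name, and it should follow the same path as the move from shallow learning to deep learning.
13 | Gary南京 reposted on 2014-11-22 10:22
14 | OWL and description logic are just one of many reasoning and representation methods and are really not that important. They were hot in recent years only because academia hyped them and published many papers of little use; their practical value is poor, and whatever lacks practical value gets abandoned. That is why description logic has cooled off over the past two years: reasoning divorced from practice will not have much influence.
15 | 昊奋 reposted on 2014-11-22 10:37
16 | For a logic and reasoning expert like Professor Qi to offer such reflection and conclusions is all the more commendable, and far more convincing than people like me, who do not work on logic, holding forth about it.
17 | Gary南京 reposted on 2014-11-22 10:48
18 | Haha, @昊奋 also understands reasoning quite deeply. I say all this because I do not consider myself a description logic person. I will not confine myself to any one school; anything interesting is worth doing. In fact, what is truly useful today is still the early work: production rules and semantic networks.
19 | 昊奋 reposted on 2014-11-22 11:05
20 | Reply to @Gary南京: This open-minded spirit deserves praise.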
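21 | 任远AI reposted on 2014-11-22 12:04
22 | I agree there should be no divide between schools. But the practical value of research results is hard to predict. Semantic networks did not have much influence in their early development either, and only in the past two years have they begun to matter in industrial applications such as knowledge graphs. So the prospects of logical methods are still very hard to call.
23 | 任远AI reposted on 2014-11-22 12:09
24 | If Professor Qi could elaborate on the core reasons for description logic's poor practicality, and on the features that practice-oriented reasoning technology should have, that would be a very valuable topic!
25 | 昊奋 reposted on 2014-11-22 12:37
26 | A good suggestion, though hard to answer well; a clear answer would have a big impact on both industry and academia.
27 | Gary南京 reposted on 2014-11-22 13:21
28 | A full analysis is hard, and I am still thinking it through. But early this year, at a workshop in Huddersfield that gathered some of the leading people in OWL reasoning, the discussion of whether OWL is useful in large companies showed that very few companies actually use it. The really useful parts of OWL are just DL-Lite and EL, and even their influence is limited. My remark that OWL has poor practicality is based on that discussion.
29 | Gary南京 reposted on 2014-11-22 13:26
30 | Also note that logic people always think their own work is very useful, yet logic today only has power when it rests on a knowledge base, and how many knowledge bases can description logic really work with? Unless the knowledge acquisition bottleneck is broken, logic is merely armchair theorizing. This is the main reason KR lags behind ML and NLP, and very few people in the KR community truly realize it.
31 | 任远AI reposted on 2014-11-22 16:23
32 | Reply to @Gary南京: Personally I think description logic research started from a very ambitious goal: to find a maximal decidable "union" of all kinds of conceptual models and so solve the integration of heterogeneous knowledge. But the engineering and cognitive cost of realizing a union is too high, and what can be done today is really only the "intersection" of the models. That is also why lighter DLs are used more often; complex DLs are usable only in very specific domains.
33 | Gary南京 reposted on 2014-11-22 16:46
34 | Reply to @任远AI: Whether description logic is useful does not really need debating, because it certainly is. But on the Web knowledge representation is diverse and description logic is only one form of it, so there is no need to overstate it. That is my view. Before 2008 it was overhyped, which created a bubble that has now more or less burst.
35 | Gary南京 reposted on 2014-11-22 16:52
36 | If you look at what description logic people are working on now, you will find that the so-called OWL 2 does not have much influence; most people work on DL-Lite, EL, and OWL 2 RL, which is actually the right thing, because simpler is very often more practical. I have always felt OWL 2 is of little use; it is what researchers dreamed up, and applications do not necessarily look like that. Only things rooted in real life have vitality.
37 | 任远AI reposted on 2014-11-22 16:55
38 | KR and knowledge acquisition ought to depend on each other. But knowledge acquisition is bottlenecked and KR people cannot wait, so they can only imagine scenarios to do research on. In the future Linked Data and WikiData may give KR a more solid foundation.
39 | Gary南京 reposted on 2014-11-22 16:59
40 | On the surface KR people could not wait, but in essence KR people do not think about problems in an application-driven way; they only think from the theoretical side and easily drift away from practice. To do KR well you need to understand applications rather than theorize on paper. People in the KR community are now too rigid in their thinking, clinging to their own small patch without innovating, and in the end many groups will die out.
41 | 任远AI reposted on 2014-11-22 17:02
42 | I agree with this. When Ian and Franz worked on DL in the early days, they based it on ontologies such as Galen and SNOMED and stayed close to practice. It is just that logic people are naturally fond of intricate, complex things and of exploring theoretical possibilities. You could call it a gene of the KR field...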
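43 | 任远AI reposted on 2014-11-22 17:15
44 | I think the main issue is that the logic circle and the engineering mindset are a poor match. In ML or NLP you can say you improved a classic model for a specific application and raised accuracy by n%. A paper like that is very hard to publish in KR; you must argue that your improvement is not ad hoc, that it generalizes, and that it is optimal in some sense. That pushes you onto the theoretical path.
45 | 任远AI reposted on 2014-11-22 17:29
46 | Reply to @Gary南京: Hahaha, I feel the same. There is nothing wrong with doing theory, proofs, or complex things. Theory for theory's sake, proof for proof's sake, and complexity for complexity's sake are what is unnecessary. Sometimes you see papers with framework after framework and theorem after theorem, proving a pile of arcane things that get your blood pumping, and in the end only a tiny bit is actually usable. I will not say who [doge][doge][doge]
47 | 昊奋 reposted on 2014-11-22 17:54
48 | KR only addresses knowledge representation and knowledge models; problems such as knowledge acquisition remain. So to succeed, the field must be open, embrace other fields, target concrete problems, and build things in a down-to-earth way. ML and NLP also won people over by building things.
49 | 昊奋 reposted on 2014-11-22 17:55
50 | That already makes it quite obvious who it is [grin]
51 | 昊奋 reposted on 2014-11-22 18:01
52 | Generally you need to decide whether you are doing ontology editing, ontology population, or ontology learning. For editing, you can use Protégé or various wiki-based ontology editors. For population, ontology-based learning algorithms such as NELL can be used, and they generate instances. For the last case, MPI's PATTY and similar work are a useful reference; these can learn new ontology patterns.
53 | Gary南京 reposted on 2014-11-22 18:03
54 | Reply to @昊奋: Yes, ML and NLP also have plenty of filler papers that are basically of little use; they only became hot because applications support them.
55 | 任远AI reposted on 2014-11-22 18:14
56 | Manual ontology editing is hard to scale; the large ontologies all took more than ten years of effort. Perhaps in the future ontologies will have to be generated by methods such as automatic translation.
57 | 昊奋 reposted on 2014-11-22 18:17
58 | So when editing an ontology you need search or other means to find existing related ontologies and reuse them.
59 | 昊奋 reposted on 2014-11-22 18:32
60 | Reply to @anklebreaker11: When building a domain ontology, first check whether a related ontology already exists or whether you can extract a subset from a general-purpose ontology or knowledge base. After that, use NELL-like methods to further expand the instance knowledge.
61 | 昊奋 reposted on 2014-11-22 18:48
62 | Reply to @anklebreaker11: The medical domain is rather complex, but you can start with LODD (Linked Open Drug Data) and the ontologies used in linked life science, such as SNOMED CT. Also, many ontologies include Chinese labels. Of course, if traditional Chinese medicine is involved, you may need to rely more on Chinese-language material, especially classical medical texts, for open extraction.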
63 |  
64 | ​
65 | 


--------------------------------------------------------------------------------
/awesome/sparse-representation-cv.md:
--------------------------------------------------------------------------------
 1 | http://www.eecs.berkeley.edu/~yang/software/l1benchmark/index.html sparse Optimization
 2 | 
 3 | http://perception.csl.illinois.edu/matrix-rank/  Low-Rank Representation
 4 | 
 5 | http://www.eecs.berkeley.edu/%7Eyang/courses/ECCV2012/ECCV12-lecture1.pdf Introduction to Sparse Representation and Low-Rank Representation
 6 | 
 7 | http://www.eecs.berkeley.edu/%7Eyang/courses/ECCV2012/ECCV12-lecture2.pdf Variations of Sparse Optimization and Their Numerical Implementation
 8 | 
 9 | http://www.eecs.berkeley.edu/%7Eyang/courses/ECCV2012/ECCV12-lecture3.pdf Finding and Harnessing Low-Dimensional Structure of High-Dimensional Data
10 | 
11 | http://www.eecs.berkeley.edu/~yang/ Allen Y. Yang
12 | 
13 | http://www.columbia.edu/~jw2966/ John Wright
14 | 
15 | http://yima.csl.illinois.edu/ Yi Ma
16 | 


--------------------------------------------------------------------------------
/awesome/speech-recognition.md:
--------------------------------------------------------------------------------
  1 | # Deep learning research in speech recognition, and common speech processing resources
  2 | 
  3 | keywords: 
  4 |   speech processing,
  5 |   speech recognition,
  6 |   speaker recognition,
  7 |   deep learning
  8 |   
  9 | card lists:
 10 | * http://hao.memect.com/?tag=speechRecognition
 11 | * http://bigdata.memect.com/?tag=speech+deeplearning 
 12 |   
 13 | https://github.com/memect/hao/blob/master/awesome/speech-recognition.md
 14 | 
 15 | ## deep learning and speech recognition 
 16 | ### Microsoft
 17 | 
 18 | http://research.microsoft.com/en-us/people/deng/  
 19 | Li Deng (IEEE M'89;SM'92;F'04) received the Ph.D. degree from the University of Wisconsin-Madison. He was an assistant professor (1989-1992), tenured associate professor (1992-1996), and tenured Full Professor (1996-1999) at the University of Waterloo, Ontario, Canada. In 1999, he joined Microsoft Research, Redmond, WA, where he is currently Principal Researcher and Research Manager of the Deep Learning Technology Center. Since 2000, he has also been an Affiliate Full Professor and graduate committee member at the University of Washington, Seattle, teaching graduate course of Computer Speech Processing and serving on Ph.D. thesis committees. Prior to joining Microsoft, he also worked or/and taught at Massachusetts Institute of Technology, ATR Interpreting Telecom. Research Lab. (Kyoto, Japan), and HKUST. He has been granted over 60 US or international patents in acoustics/audio, speech/language technology, and machine learning. He received numerous awards/honors bestowed by IEEE, ISCA, ASA, Microsoft, and other organizations.
 20 | 
 21 | http://research.microsoft.com/pubs/217165/ICASSP_DeepTextLearning_v07.pdf Deep Learning for Natural Language Processing and Related Applications, Microsoft
 22 | 
 23 | http://www.cs.toronto.edu/~ndjaitly/techrep.pdf Application of Pretrained Deep Neural Networks to Large Vocabulary Conversational Speech Recognition (2012) interspeech 
 24 | (work at Google)
 25 | 
 26 | http://research.microsoft.com/pubs/189008/tasl-deng-2244083-x_2.pdf Li Deng, Xiao Li,  Machine Learning Paradigms for Speech Recognition: An Overview
 27 | 
 28 | ### Google
 29 | 
 30 | http://www.cs.toronto.edu/~hinton/
 31 | Geoffrey Everest Hinton FRS (born 6 December 1947) is a British-born computer scientist and psychologist, most noted for his work on artificial neural networks. He is now partly working for Google. He is the co-inventor of the backpropagation and contrastive divergence training algorithms and is an important figure in the deep learning movement.
 32 | 
 33 | http://research.google.com/pubs/VincentVanhoucke.html
 34 | Vincent Vanhoucke is a Research Scientist at Google. He is a technical lead and manager in Google's deep learning infrastructure team. Prior to that, he led the speech recognition quality effort for Google Search by Voice. He holds a Ph.D. in Electrical Engineering from Stanford University and a Diplôme d'Ingénieur from the Ecole Centrale Paris. 
 35 | 
 36 | http://psych.stanford.edu/~jlm/pdfs/Hinton12IEEE_SignalProcessingMagazine.pdf  Deep Neural Networks for Acoustic Modeling in Speech Recognition (2012) IEEE Signal Processing Magazine
 37 | 
 38 | http://research.google.com/pubs/SpeechProcessing.html  Google Speech processing
 39 | 
 40 | ### other research groups
 41 | http://mi.eng.cam.ac.uk/Main/Speech/   Cambridge University
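 42 | * Reply to @黄浩XJU: Thanks for the correction. Cambridge's work is comprehensive; currently http://t.cn/RP8YGTX Phil Woodland has a Chinese student, Zhang Chao, doing deep learning research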
 43 | 
 44 | 
 45 | http://www.speech.cs.cmu.edu/  CMU
 46 | 
 47 | http://www.speech.sri.com/ SRI
 48 | 
 49 | http://www.clsp.jhu.edu/people/  Center for Language and Speech Processing at Johns Hopkins University
 50 | 
 51 | 
 52 | ## speech processing resources
 53 | 
 54 | 
 55 | 
 56 | ### tools and open source tools
 57 | http://en.wikipedia.org/wiki/List_of_speech_recognition_software
 58 | * Quite a few programs leverage the Google speech API to provide online speech-to-text on mobile devices.
 59 | 
 60 | 
 61 | http://www.signalprocessingsociety.org/technical-committees/list/sl-tc/spl-nl/2013-05/ALIZE/ ALIZE 3.0 - Open-source platform for speaker recognition
 62 | 
 63 | https://github.com/taf2/speech2text
 64 | 
 65 | 
 66 | http://kaldi.sourceforge.net/about.html Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers. 
 67 | * @黄浩XJU: Do mention Kaldi by Daniel Povey (http://www.danielpovey.com/); it is a very good tool  
 68 | 
 69 | 
 70 | ### products
 71 | 
 72 | http://www.consumersearch.com/voice-recognition-software/review
 73 | 
 74 | http://www.nuance.com/dragon/index.htm
 75 | 
 76 | http://en.wikipedia.org/wiki/Windows_Speech_Recognition
 77 | 
 78 | http://download.cnet.com/windows/voice-recognition-software/?tag=bc
 79 | 
 80 | http://www.labnol.org/internet/dictation-for-google-chrome/24719/
 81 | 
 82 | 
 83 | ### exploration
 84 | http://en.wikipedia.org/wiki/Speech_recognition
 85 | 
 86 | http://www.technologyreview.com/news/427793/where-speech-recognition-is-going/  Where Speech Recognition Is Going
 87 | 
 88 | http://technav.ieee.org/tag/1597/speaker-recognition 48 resources related to Speaker Recognition
 89 | 
 90 | http://www.emory.edu/BUSINESS/speech/SpeechRecCase.pdf  nuance white paper, business use cases
 91 | 
 92 | 
 93 | ###  conferences
 94 | Popular speech recognition conferences held each year or two include SpeechTEK and SpeechTEK Europe, ICASSP, Interspeech/Eurospeech, and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important journals include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication.
 95 | 
 96 | http://www.interspeech2014.org/public.php?page=tutorial.html  tutorial of interspeech 2014
 97 | 
 98 | http://www.icassp2014.org/tutorials.html  icassp 2014
 99 | 
100 | http://www.speechtek.com/2014/ SpeechTek
101 | 
102 | http://www.asru2013.org/  ASRU
103 | 
104 | http://www.iscslp2014.org/public.php?page=keynote.html  ISCSLP@INTERSPEECH 2014 - The 9th International Symposium on Chinese Spoken Language Processing
105 | 
106 | 
107 | ## discussion
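108 | <b> @血色又残阳 asks: Looking for speech processing materials. Requirements:
109 | 1. Papers should ideally come with runnable code;
110 | 2. What are the latest or mainstream techniques in academia and industry;
111 | 3. Whether any are combined with deep learning;
112 | 4. Ideally also papers and code on speaker identification.
113 | </b>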
114 | https://github.com/memect/hao/issues/50
115 | 
116 | 
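117 | <b> yongsun: Is there any open-source or free English speech recognition software or project? I plan to translate some ice hockey instructional videos and want to use the recognition output to help with the listen-and-translate work.</b>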
118 | https://github.com/memect/hao/issues/53
119 | 


--------------------------------------------------------------------------------
/awesome/stanford-cs224w.md:
--------------------------------------------------------------------------------
 1 | http://web.stanford.edu/class/cs224w/
 2 | 
 3 | 
 4 | # class notes
 5 | http://web.stanford.edu/class/cs224w/slides/01-intro.pdf
 6 | 
 7 | http://web.stanford.edu/class/cs224w/slides/02-gnp.pdf
 8 | 
 9 | http://web.stanford.edu/class/cs224w/slides/03-smallworld.pdf
10 | 
11 | http://web.stanford.edu/class/cs224w/slides/04-navigation.pdf
12 | 
13 | http://web.stanford.edu/class/cs224w/slides/05-evals.pdf
14 | 
15 | http://web.stanford.edu/class/cs224w/slides/06-signed.pdf
16 | 
17 | http://web.stanford.edu/class/cs224w/slides/07-cascading.pdf
18 | 
19 | http://web.stanford.edu/class/cs224w/slides/08-cascades.pdf
20 | 
21 | http://web.stanford.edu/class/cs224w/slides/09-influence.pdf
22 | 
23 | http://web.stanford.edu/class/cs224w/slides/10-outbreak.pdf
24 | 
25 | http://web.stanford.edu/class/cs224w/slides/11-powerlaws.pdf
26 | 
27 | http://web.stanford.edu/class/cs224w/slides/12-evolution.pdf
28 | 
29 | http://web.stanford.edu/class/cs224w/slides/13-pagerank.pdf
30 | 
31 | http://web.stanford.edu/class/cs224w/slides/14-kronecker.pdf
32 | 
33 | http://web.stanford.edu/class/cs224w/slides/15-weakties.pdf
34 | 
35 | http://web.stanford.edu/class/cs224w/slides/16-spectral.pdf
36 | 
37 | http://web.stanford.edu/class/cs224w/slides/17-overlapping.pdf
38 | 
39 | http://web.stanford.edu/class/cs224w/slides/19-memes.pdf
40 | 
41 | http://web.stanford.edu/class/cs224w/slides/20-review.pdf
42 | 


--------------------------------------------------------------------------------
/awesome/test-recent.md:
--------------------------------------------------------------------------------
 1 | 
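 2 | 2014-09-26 Q: Is there any up-to-date material on the history, current state, and outlook of artificial intelligence? A: For an overview of the AI field there is a fun chart, "AI Landscape" (a poster bundled with AI Magazine in 2008), which pairs well with a timeline demo of major events in AI history, "Companion Timeline of Artificial Intelligence History" http://t.cn/RhTXnDF [ [Weibo](http://www.weibo.com/5220650532/BoJAcrUuy) ]
 3 | 
 4 | 2014-09-26 Nice, this should be the second edition. @Vamei The first edition from 2013 also has some fun pictures http://t.cn/zYtMBGK //@西瓜大丸子汤: Recommending to @好东西传送门 //@Vamei: The original author is here to claim it [ [Weibo](http://www.weibo.com/5220650532/BoHdz858S) ]
 5 | 
 6 | > 2014-09-25 @Linux中国: #A brief history of the Python language# Python is a language I like: concise, elegant, and easy to use. A couple of days ago I was enthusiastically pitching Python's merits to a friend. OK, I admit Python is good, but why is it called Python? Er, it seems to be the name of a TV show. And is the Guido you mentioned American? He moved from Google to Dropb…http://t.cn/RhYgiGm [ [Weibo](http://www.weibo.com/1772191555/BoG25tiMh) ]
 7 | 
 8 | 2014-09-25 This Scrum guide is a classic; anyone confused about Scrum should take a look. Also recommending the article "The 2013 Scrum Guide changes" http://t.cn/RhjdQ1W 1. Artefact Transparency strengthened 2. Sprint Planning 3. Definition of Ready 4. Time boxes relaxed for most meetings 5. Daily Scrum purpose clarified [ [Weibo](http://www.weibo.com/5220650532/BoBCqkL9Z) ]
 9 | 
10 | > 2014-09-25 @朱少民: As Scrum adoption grew explosively, all sorts of Scrum variants appeared, and many companies have forgotten Scrum's values and principles. In response, Scrum Alliance, scrum.org, and others jointly published a guidance document for Scrum: http://t.cn/Rhjrrbs [ [Weibo](http://www.weibo.com/1652927771/BoByZyCjh) ]
11 | 
12 | 2014-09-25 Q: Looking for computational neuroscience materials? A: 1. A resource portal (scholars, papers, and courses all in one place), "Computational Neuroscience on the Web" http://t.cn/RhjQAgV 2. Summer schools (5 sessions, 2010 to 2014) http://t.cn/RhjQAgc 3. Also the University of Washington open course "Computational Neuroscience". Thanks to @苏梦Neuro-Gatsby @课程图谱 @要有光LTBL for the recommendations [ [Weibo](http://www.weibo.com/5220650532/BoAQg5kj6) ]
13 | 
14 | 2014-09-25 [An incomplete roundup of computer vision datasets] http://t.cn/Rhj0T9K Classic popular datasets: ImageNet, Flickr, MNIST. Dataset directories: YACVID (200+), ComputerVisionOnline (100+), CVpapers (100+), CVOnline (100+), UIUC, UCSD, NICTA... Thanks to @丕子 @邹宇华 @李岩ICT人脸识别 @网路冷眼 @王威廉 @金连文 @数据堂 zhubenfulovepoem for the recommendations [ [Weibo](http://www.weibo.com/5220650532/BoAbfmDPA) ]
15 | 
16 | 2014-09-24 What do colleagues in data mining think? And those in meteorology? //@复旦陈硕frank: Repost [ [Weibo](http://www.weibo.com/5220650532/Bot0Cl2BQ) ]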
17 | 
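18 | > 2014-09-24 @中国社会科学院金融评论: An article in the latest issue of the Journal of Economic Literature http://t.cn/RhlbJno reviews the recent literature that uses high-frequency panel data to study the economic effects of weather (as opposed to "climate", which earlier low-frequency data captured). As an outsider to this field, I find that this interesting survey, besides helping one find all sorts of IVs, is also quite instructive on certain specific facts and techniques. [ [Weibo](http://www.weibo.com/3205772127/BosQWsyNb) ]
19 | 
20 | 2014-09-24 You can watch the lecture recordings. The course can serve two goals: first, to understand the challenging problems in computational biology, look for better computational methods, and apply cutting-edge machine learning (I am curious about applications of deep learning); second, to understand which computational methods can be used, especially off-the-shelf machine learning tools, and apply them to frontier problems in biology and medicine. Bioinformatics, Health informatics //@医学统计 [ [Weibo](http://www.weibo.com/5220650532/Bosjr9NpC) ]
21 | 
22 | > 2014-09-24 @好东西传送门: One chart explaining the practical applications of algorithms in bioinformatics (from "An Introduction to Bioinformatics Algorithms"). Also recommending the Coursera open course Bioinformatics Algorithms (starting this October) by the book's author Pavel Pevzner (UCSD professor, ACM Fellow) http://t.cn/RhWs4Cp YouTube lecture videos http://t.cn/RhWs4CO Requires a fairly strong background in mathematics and algorithms [ [Weibo](http://www.weibo.com/5220650532/BorSV49Fo) ]
23 | 
24 | 2014-09-24 One chart explaining the practical applications of algorithms in bioinformatics (from "An Introduction to Bioinformatics Algorithms"). Also recommending the Coursera open course Bioinformatics Algorithms (starting this October) by the book's author Pavel Pevzner (UCSD professor, ACM Fellow) http://t.cn/RhWs4Cp YouTube lecture videos http://t.cn/RhWs4CO Requires a fairly strong background in mathematics and algorithms [ [Weibo](http://www.weibo.com/5220650532/BorSV49Fo) ]
25 | 
26 | 2014-09-24 Recommending "Resources I commonly use (ongoing...)" compiled by @tornadomeet http://t.cn/zO1YaAE #deep learning#, #machine learning#, #data mining#, #computer vision#, optimization, mathematics, Linux, leading people in the field, courses ... ;-) This person's blog can be filed under #study notes of a top student# [ [Weibo](http://www.weibo.com/5220650532/BortzCrYs) ]
27 | 
28 | 2014-09-24 Reply to @尘绳聋-SYSU: Adding @tornadomeet's original post, "Machine learning & data mining notes_16 (a brief rundown of the ideas behind machine learning algorithms common in interviews)" http://t.cn/zRoZPzP He has now written 25 notes! //@尘绳聋-SYSU: The link from 数盟 does not credit the original author: @tornadomeet [ [Weibo](http://www.weibo.com/5220650532/Borpttofb) ]
29 | 
30 | > 2014-09-24 @陈利人: Great article! A brief rundown of the ideas behind machine learning algorithms common in interviews http://t.cn/RhWuNHg [ [Weibo](http://www.weibo.com/1915548291/Bor6t48ji) ]
31 | 
32 | 2014-09-24 Thanks! Attaching the MLSS 2009 home page http://t.cn/zl1sHfi and all MLSS 2009 slides as one download, 51 MB ZIP http://t.cn/RhWBmXr //@bigiceberg: Mark. The 2009 UK MLSS is the most classic of them. [ [Weibo](http://www.weibo.com/5220650532/Borng7Ukv) ]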
33 | 
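34 | > 2014-09-24 @好东西传送门: The complete Machine Learning Summer School (MLSS) collection (2002-): MLSS brings together renowned teachers in machine learning, offers foundational tutorials, showcases progress in the field, and provides free lecture notes for download; it is a good place to learn about the field's frontier. The collection lists 26 past schools and 8 upcoming ones, spread roughly across Europe, the US, Australia, and Asia. Original link: www.mlss.cc We made a GitHub version that fills in the missing links http://t.cn/RhWRlBo [ [Weibo](http://www.weibo.com/5220650532/BoqHnj2qe) ]
35 | 
36 | 2014-09-24 //@AixinSG: We have done research on hashtag diffusion http://t.cn/RhWmsw8 and Google Scholar now lists some related citing papers http://t.cn/RhWmswE Relatively speaking, diffusion is easier to study than source tracing; source tracing is very hard to validate [ [Weibo](http://www.weibo.com/5220650532/Bor4eu5sU) ]
37 | 
38 | > 2014-09-24 @好东西传送门: Q: I am working on topic-based source tracing in social networks, finding the originating users; any papers? A: Found 5 papers http://t.cn/RhW6Suk Especially recommended is the mind map of the survey of information diffusion in online social networks by Guille et al. (SIGMOD Record 2013), which covers three challenges and their solutions: detecting interesting topics, modeling the diffusion process, and identifying highly influential nodes. There are also several studies of source-tracing algorithms and a good related article in Science [ [Weibo](http://www.weibo.com/5220650532/BoqRO7Mzg) ]
39 | 
40 | 2014-09-24 Q: I am working on topic-based source tracing in social networks, finding the originating users; any papers? A: Found 5 papers http://t.cn/RhW6Suk Especially recommended is the mind map of the survey of information diffusion in online social networks by Guille et al. (SIGMOD Record 2013), which covers three challenges and their solutions: detecting interesting topics, modeling the diffusion process, and identifying highly influential nodes. There are also several studies of source-tracing algorithms and a good related article in Science [ [Weibo](http://www.weibo.com/5220650532/BoqRO7Mzg) ]
41 | 
42 | 2014-09-24 The complete Machine Learning Summer School (MLSS) collection (2002-): MLSS brings together renowned teachers in machine learning, offers foundational tutorials, showcases progress in the field, and provides free lecture notes for download; it is a good place to learn about the field's frontier. The collection lists 26 past schools and 8 upcoming ones, spread roughly across Europe, the US, Australia, and Asia. Original link: www.mlss.cc We made a GitHub version that fills in the missing links http://t.cn/RhWRlBo [ [Weibo](http://www.weibo.com/5220650532/BoqHnj2qe) ]
43 | 
44 | 2014-09-23 [Resource collection] http://t.cn/RhOz6bQ Sentiment analysis: two classic surveys as PDF downloads: A Survey of Opinion Mining and Sentiment Analysis (2012) by Bing Liu; Opinion mining and sentiment analysis (2008) by Bo Pang, Lillian Lee. Also attached are papers by Richard Socher et al. on deep learning for sentiment analysis. Additions welcome [ [Weibo](http://www.weibo.com/5220650532/Bohx6Ahic) ]
45 | 
46 | 2014-09-23 Reply to @禅系一之花: Thanks for the tip. "A Simple Guide to the Fourier Transform" (《傅立叶变换的简易指南》) http://t.cn/8srbg2x Translator: Taurelasse //@禅系一之花: There is a translated version on Yeeyan //@好东西传送门: Thanks to the poster on the right for sharing An Interactive Guide To The Fourier Transform //@赶路人林文: http://t.cn/zjN3lQ6 This article on the Fourier transform is, of those I have seen [ [Weibo](http://www.weibo.com/5220650532/Boh4Y1Doi) ]
47 | 
48 | > 2014-09-19 @好东西传送门: Q: @ShawnLeesr Please find some amazingly good introductory material on 1. signal processing 2. the Fourier transform 3. the wavelet transform. A: Collected resources http://t.cn/RhKNdKs Recommending the Fourier analysis tutorial "傅里叶分析之掐死教程" by @Heinrich_DMU. For going further there is the Stanford Fourier transform course (Brad Osgood) http://t.cn/RhKNdKF and the MIT wavelet analysis course (Gilbert Strang) http://t.cn/RhKNd9v Corrections and additions welcome [ [Weibo](http://www.weibo.com/5220650532/BnHcFiekf) ]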
49 | 
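50 | 2014-09-23 //@AllAboutStorage: A short introduction to Freebase (whose goal is to structure the internet). Its parent company was acquired by Google in 2010, and its technology was presumably used in the Google Knowledge Graph project. Those interested can also look at Google's graph database Cayley http://t.cn/RvHuYpL Its introduction states plainly that Cayley is inspired by the graph database behind Google's Knowledge Graph and Freebase. [ [Weibo](http://www.weibo.com/5220650532/Boh3LyNLP) ]
51 | 
52 | > 2014-09-23 @好东西传送门: @low_accepted asks: Looking for the Freebase Wikipedia Extraction (WEX) dataset (66GB, TSV format). A: AWS has a 66GB version as EBS snap-1781757e, which can be attached on EC2 with no download needed. I just tested it and it works. WEX converts the templates, infoboxes, tables of contents, etc. of English Wikipedia into XML http://t.cn/Rh0kIXp More Freebase resources http://t.cn/Rh0kIX0 [ [Weibo](http://www.weibo.com/5220650532/Bogtpf4Jr) ]
53 | 
54 | 2014-09-23 Q: @神经明亮的人 Looking for a Perl tutorial? A: Resource collection http://t.cn/RhOvrpN Learning Perl by Randal Schwartz (the "Llama book") is the generally acknowledged introductory text, accessible and short; the English edition is recommended. Even shorter is Learn Perl in about 2 hours 30 minutes. For more, see the tutorial collections at perlmonks.org and perl-tutorial.org. To go further, read the "Camel book", Programming Perl. Additions welcome [ [Weibo](http://www.weibo.com/5220650532/Boh22i7QV) ]
55 | 
56 | 2014-09-23 @low_accepted asks: Looking for the Freebase Wikipedia Extraction (WEX) dataset (66GB, TSV format). A: AWS has a 66GB version as EBS snap-1781757e, which can be attached on EC2 with no download needed. I just tested it and it works. WEX converts the templates, infoboxes, tables of contents, etc. of English Wikipedia into XML http://t.cn/Rh0kIXp More Freebase resources http://t.cn/Rh0kIX0 [ [Weibo](http://www.weibo.com/5220650532/Bogtpf4Jr) ]
57 | 
58 | 2014-09-23 Thanks to the poster on the right for sharing An Interactive Guide To The Fourier Transform //@赶路人林文: http://t.cn/zjN3lQ6 This article on the Fourier transform is the best I have seen, and incredibly vivid. It is especially suitable for humanities students: I have not touched physics in eight years or math in five, and even I understood it. When I have time I will definitely translate it into Chinese. [ [Weibo](http://www.weibo.com/5220650532/BofcOk20k) ]
59 | 
60 | > 2014-09-19 @好东西传送门: Q: @ShawnLeesr Please find some amazingly good introductory material on 1. signal processing 2. the Fourier transform 3. the wavelet transform. A: Collected resources http://t.cn/RhKNdKs Recommending the Fourier analysis tutorial "傅里叶分析之掐死教程" by @Heinrich_DMU. For going further there is the Stanford Fourier transform course (Brad Osgood) http://t.cn/RhKNdKF and the MIT wavelet analysis course (Gilbert Strang) http://t.cn/RhKNd9v Corrections and additions welcome [ [Weibo](http://www.weibo.com/5220650532/BnHcFiekf) ]
61 | 
62 | 2014-09-22 Yar, Yac, and Yaf were all open-sourced directly on GitHub by @Laruence http://t.cn/zWiKwkj , and he also worked on Zend Optimizer http://t.cn/Rh0h8RZ [ [Weibo](http://www.weibo.com/5220650532/BoaTCoZbG) ]
63 | 
64 | > 2014-09-22 @Laruence: Time to write a summary report again. This is the architecture diagram of the LNMP stack that Weibo has arrived at over the past two years..... That is about all there is to it; most of it is open source, and you are welcome to borrow from it. [ [Weibo](http://www.weibo.com/1170999921/BoaKMhnJp) ]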
65 | 
66 | 2014-09-22 Q: Recent papers or materials on augmented reality, especially on PTAM? A: Resource roundup http://t.cn/Rh0v03Y PTAM is an important advance in Simultaneous Localization and Mapping (SLAM, a research direction in robot vision); the concept was proposed in the ISMAR 2007 best paper. CVPR 2014 has a set of short tutorials touching on related research. Additions and corrections welcome [ [Weibo](http://www.weibo.com/5220650532/BoaJeg31R) ]
67 | 
68 | 2014-09-22 Passing along a good thing: #natural language processing# paper "Distributed Representations of Sentences and Documents" Quoc V. Le, Tomas Mikolov, ICML 2014, link http://t.cn/RhpdQqv PV = Paragraph Vector [ [Weibo](http://www.weibo.com/5220650532/BoabnoAha) ]
69 | 
70 | > 2014-09-22 @ustczen: PV-DM, the sentence vectorization algorithm described in "Distributed Representations of Sentences and Documents", already has a gensim-based Python implementation on GitHub: http://t.cn/RPDxH82 Someone on the word2vec forum tried it for sentiment classification on the IMDB dataset; the results are not as strong as the paper claims, but the implementation is worth a look. @好东西传送门 [ [Weibo](http://www.weibo.com/2872565912/Bo9xyfdib) ]
71 | 
72 | 2014-09-22 Thanks to @hnlyjzh for mirroring! Large Scale Visual Recognition Challenge videos, downloadable without a proxy [ [Weibo](http://www.weibo.com/5220650532/Bo6SLASYp) ]
73 | 
74 | > 2014-09-21 @hnlyjzh: The ILSVRC2014 videos are here http://t.cn/RhNBfX6 @好东西传送门 [ [Weibo](http://www.weibo.com/1244843177/Bo3i6cufT) ]
75 | 
76 | 2014-09-21 Passing it along again //@ICT秦磊: Re-uploaded GoogLeNet to Youku. http://t.cn/RhN58TY 好东西传送门: Reposting to help; these are on YouTube, let's see whether some expert can help bring them back to China [ [Weibo](http://www.weibo.com/5220650532/Bo0laE8yh) ]
77 | 
78 | > 2014-09-20 @贾旭kul_visics: @好东西传送门 ILSVRC2014 videos http://t.cn/RhCTDKX [ [Weibo](http://www.weibo.com/3195545915/BnUjy7FgT) ]
79 | 
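80 | 2014-09-21 Reposting theory: one chart that concisely summarizes the relationships among probability distributions, a highly valuable reference for both machine learning and statistics. Also adding a copyright-free chart from Wikipedia http://t.cn/zjyvP9q with detailed explanations of each distribution [ [Weibo](http://www.weibo.com/5220650532/BnYMPiRcz) ]
81 | 
82 | > 2014-09-21 @_散沙_民工智能_: The basics of the basics, the first thing all sorts of big data scientists overlook. Good night http://t.cn/z8AJfHW [ [Weibo](http://www.weibo.com/1438548745/BnWtujF4q) ]
83 | 
84 | 2014-09-20 Reposting to help; these are on YouTube, let's see whether some expert can help bring them back to China [ [Weibo](http://www.weibo.com/5220650532/BnVt2ffR0) ]
85 | 
86 | > 2014-09-20 @贾旭kul_visics: @好东西传送门 ILSVRC2014 videos http://t.cn/RhCTDKX [ [Weibo](http://www.weibo.com/3195545915/BnUjy7FgT) ]
87 | 
88 | 2014-09-20 Q: @情非得已小屋 Could you recommend some surveys on recommender systems? A: Q&A 207 http://t.cn/RhCt7lc Strongly recommending the KDD 2014 tutorial notes "the recommender problem revisited": part one is Xavier Amatriain's survey (135 pages; the 2014 Machine Learning Summer School version has 248 pages), part two is "Context Aware Recommendation" (64 pages). Thanks to @小飞鱼_露 @明风Andy for the recommendations [ [Weibo](http://www.weibo.com/5220650532/BnRHSq1xl) ]
89 | 
90 | 2014-09-20 Q: @水月小和尚 Looking for materials on privacy protection. A: http://t.cn/Rh9egwV Privacy protection is an important problem in the big data era. First, a recommended 2010 survey on privacy-preserving data publishing, which covers attack models, privacy models, and anonymization algorithms in data publishing (see the attached figure). Section 1.3 also lists some surveys on achieving privacy protection in "data mining, data querying, and statistical data publishing". Additions and corrections welcome [ [Weibo](http://www.weibo.com/5220650532/BnPOcry6i) ]
91 | 
92 | 2014-09-20 The Q&A and recommended resources added over the past week have all been compiled on GitHub http://t.cn/Rh9NSVm There are 360 topics so far. To find a previously recommended resource, just search the page with Ctrl+F. BTW, if you want to subscribe to the weekly updates, send me your email address by private message [ [Weibo](http://www.weibo.com/5220650532/BnMt3bdgh) ]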
93 | 


--------------------------------------------------------------------------------