├── README.md ├── homework ├── homework1.pdf └── homework2.pdf ├── note ├── intro_nn.pdf ├── lec1.pdf ├── lec2.pdf ├── lec3.pdf ├── lec4.pdf ├── lec5.pdf ├── lec6.pdf ├── lec7.1.pdf ├── lec7.2.pdf ├── lec7.pdf ├── lec9.1.pdf ├── lec9.2.pdf ├── lec9.3.pdf ├── overview.pdf └── recent_progresses.pdf ├── paper_list.md ├── pre_schedule.txt └── template ├── scribe.sty └── template.tex /README.md: -------------------------------------------------------------------------------- 1 | 2 | ## Mathematical Theory of Neural Network Models 3 | 4 | ### Announcements 5 | - **7/26**: Lectures 7 and 9 are out. 6 | - **7/25**: The paper review report is due on 8/2, 12 pm. 7 | - 7/19: The [schedule](pre_schedule.txt) of presentations is out. 8 | - 7/18: A draft of Lecture [4](note/lec4.pdf) is out. 9 | - 7/17: Drafts of Lectures [3](note/lec3.pdf), [5](note/lec5.pdf) and [6](note/lec6.pdf) are out. 10 | - 7/12: A draft of [Lecture 2](note/lec2.pdf) is out. 11 | - 7/12: Some references on random feature models, Barron spaces and the regularization theory of two-layer nets have been added. 12 | - 7/9: A draft of [Lecture 1](note/lec1.pdf) is out. 13 | - 7/9: [Homework 2](homework/homework2.pdf) is out. It is due on Tuesday, 7/16, 12 pm. 14 | - 7/6: [Homework 1](homework/homework1.pdf) is out. It is due on Friday, 7/12, 12 pm. 15 | 16 | 17 | ### Administrative information 18 | 19 | - **Instructors:** 20 | - [Weinan E](https://web.math.princeton.edu/~weinan/) 21 | - [Lei Wu](https://scholar.google.com/citations?user=CMweeYcAAAAJ&hl=en), leiwu@princeton.edu 22 | - Chao Ma, chaom@princeton.edu 23 | 24 | - **Time:** Tue: 2:00-5:00 pm; Thu: 2:00-5:00 pm; Fri: 3:00-5:00 pm. 25 | 26 | - **Location:** Room 515, [Teaching Building 2](https://maps.baidu.com/poi/%E5%8C%97%E4%BA%AC%E5%A4%A7%E5%AD%A6(%E7%87%95%E5%9B%AD%E6%A0%A1%E5%8C%BA)%E7%AC%AC%E4%BA%8C%E6%95%99%E5%AD%A6%E6%A5%BC(%E6%9D%8E%E5%85%86%E5%9F%BA%E6%A5%BC)/@12948834.869857343,4837581.844142513,19.6z?uid=82548a63754afc91735e80e4&primaryUid=10472254985355704340&ugc_type=3&ugc_ver=1&device_ratio=1&compat=1&querytype=detailConInfo&da_src=shareurl) 27 | 28 | 29 | 30 | 31 | ### Course Content 32 | **Description:** 33 | 34 | This course introduces the basic models for supervised learning, including the kernel method, two-layer neural networks and residual networks. We then present a unified approach to analyzing these models. 35 | 36 | 37 | **Topics:** 38 | 39 | - Supervised learning, generalization/approximation/estimation errors, a priori/a posteriori estimates 40 | - Kernel method, two-layer neural network, residual network 41 | - Reproducing kernel Hilbert space, Barron space, compositional function space 42 | - Rademacher complexity, margin, gradient descent, implicit regularization 43 | 44 | **Prerequisites:** 45 | 46 | - A solid background in linear algebra, real analysis and probability/measure theory 47 | - Basic knowledge of (convex) optimization and statistics 48 | 49 | 50 | ### Grading 51 | **Coursework:** 52 | - **Homework** (45%) 53 | - **Paper review** (45%): You are asked to choose a paper from this [paper list](paper_list.md) and write a review. The review should not only summarize the paper but also identify the novelty and limitations of its results. A good paper review at least attempts to answer the following four questions: 54 | - What is the main result of the paper? 55 | - Why is the result important and significant compared with other papers? 56 | - What are the limitations of the result? 57 | - What potential research directions are inspired by the paper? 
58 | 59 | You are required to give a presentation (15%) and submit a report of 3 pages (30%). 60 | 61 | - **Scribe notes** (10%): You are asked to scribe one lecture note in LaTeX. The scribe notes can be done in pairs. Please use this [template](template/). 62 | 63 | **Collaboration policy:** We encourage you to form study groups and discuss the coursework. However, you must write up all coursework from scratch on your own, without referring to anyone else's notes. 64 | 65 | 66 | 67 | ### Texts and References 68 | - [Peter Bartlett's course: Statistical Learning Theory](https://www.stat.berkeley.edu/~bartlett/courses/2014spring-cs281bstat241b/) 69 | - [MIT's course: Statistical Learning Theory](http://www.mit.edu/~9.520/fall18/) 70 | - [Mohri's book: Foundations of Machine Learning](https://cs.nyu.edu/~mohri/mlbook/) 71 | - [Shai Shalev-Shwartz's book: Understanding Machine Learning: From Theory to Algorithms](https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/copy.html) 72 | 73 | --- 74 | ### Schedule (subject to change) 75 | 76 | #### Week 1 77 | - Tue 7/2: Introduction to supervised learning methods 78 | - [Lecture 1](note/lec1.pdf) 79 | - [Random Features for Large-Scale Kernel Machines](https://papers.nips.cc/paper/3182-random-features-for-large-scale-kernel-machines) 80 | - Thu 7/4: Overview of the mathematical theory of neural network models 81 | - [Lecture 2](note/lec2.pdf) 82 | - [Slides of Prof. E](note/overview.pdf) 83 | - [A priori estimates](https://en.wikipedia.org/wiki/A_priori_estimate) 84 | - Fri 7/5: Rademacher complexity, covering numbers, metric entropy and uniform bounds 85 | - [Lecture 3](note/lec3.pdf) 86 | - [Concentration inequalities](https://www.stat.berkeley.edu/~mjwain/stat210b/Chap2_TailBounds_Jan22_2015.pdf) 87 | 88 | #### Week 2 89 | - Reproducing kernel Hilbert space and the random feature model 90 | - [Lecture 4](note/lec4.pdf) 91 | - [What is an RKHS?](http://www.stats.ox.ac.uk/~sejdinov/teaching/atml14/Theory_2014.pdf) 92 | - [Uniform Approximation of Functions with Random Bases](https://people.eecs.berkeley.edu/~brecht/papers/08.Rah.Rec.Allerton.pdf) 93 | - Error estimates for the random feature model with explicit and implicit regularization 94 | - [Lecture 5](note/lec5.pdf) 95 | - The analysis of implicit regularization for the random feature model can be found in this [paper](https://arxiv.org/abs/1904.04326) 96 | - [Learning with SGD and Random Features](https://arxiv.org/abs/1807.06343) 97 | - [Optimal Rates for the Regularized Least-Squares Algorithm](https://link.springer.com/article/10.1007/s10208-006-0196-8) 98 | - Barron space and the regularization theory of two-layer neural networks 99 | - [Lecture 6](note/lec6.pdf) 100 | - Properties of the Barron space can be found in Section 2 of this [paper](https://arxiv.org/abs/1906.08039) 101 | - A priori estimates for regularized two-layer neural networks can be found in this [paper](https://arxiv.org/abs/1810.06397) 102 | - [The must-read classic paper of Andrew Barron](http://www.stat.yale.edu/~arb4/publications_files/UniversalApproximationBoundsForSuperpositionsOfASigmoidalFunction.pdf) (This is the first paper that provides an approximation rate without the curse of dimensionality; a paraphrased statement of the rate is sketched below.) 
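For orientation, here is a paraphrased sketch of the dimension-independent rate referred to in the last item above; the constants, the exact norm and the precise hypotheses are as in Barron's paper, and the notation ($C_f$ for the spectral norm, $\sigma$ for a sigmoidal activation, $\mu$ for a probability measure on a bounded domain) is only illustrative.

```latex
% Paraphrase of Barron's approximation rate (see the paper for exact constants/hypotheses).
% If f has a Fourier representation with finite spectral norm C_f, then two-layer
% sigmoidal networks with m units approximate f in L^2 at rate m^{-1/2},
% with no explicit dependence on the input dimension d.
\[
  C_f \;=\; \int_{\mathbb{R}^d} \|\omega\|\,\bigl|\hat f(\omega)\bigr|\,\mathrm{d}\omega \;<\; \infty
  \quad\Longrightarrow\quad
  \inf_{f_m} \;\|f - f_m\|_{L^2(\mu)}^2 \;\lesssim\; \frac{C_f^2}{m},
  \qquad
  f_m(x) \;=\; \sum_{k=1}^{m} a_k\,\sigma\bigl(w_k^\top x + b_k\bigr).
\]
```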
103 | 104 | #### Week 3 105 | - Implicit regularization for two-layer neural networks 106 | - Lecture [7](note/lec7.pdf), [7.1](note/lec7.1.pdf), [7.2](note/lec7.2.pdf) 107 | - The main material can be found in this [paper](https://arxiv.org/pdf/1904.04326v1.pdf) 108 | - A priori estimates for regularized deep residual networks 109 | - [A Priori Estimates of the Population Risk for Residual Networks](https://arxiv.org/abs/1903.02154) 110 | - The F-principle and its application in deep learning (Guest speakers: Zhiqin Xu, Yaoyu Zhang, Tao Luo) 111 | - An introduction to the F-principle: [Lecture 9.1](note/lec9.1.pdf) 112 | - Application of the F-principle to learning two-layer neural networks: [Lecture 9.2](note/lec9.2.pdf) 113 | - General theory of the F-principle: [Lecture 9.3](note/lec9.3.pdf) 114 | 115 | #### Week 4 116 | - Compositional function spaces for deep residual networks 117 | - The mathematical theory of compositional function spaces can be found in Section 3 of this [paper](https://arxiv.org/abs/1906.08039) 118 | - Overview of recent progress in theoretical deep learning 119 | - [Slides](note/recent_progresses.pdf) 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | -------------------------------------------------------------------------------- /homework/homework1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/homework/homework1.pdf -------------------------------------------------------------------------------- /homework/homework2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/homework/homework2.pdf -------------------------------------------------------------------------------- /note/intro_nn.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/intro_nn.pdf -------------------------------------------------------------------------------- /note/lec1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec1.pdf -------------------------------------------------------------------------------- /note/lec2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec2.pdf -------------------------------------------------------------------------------- /note/lec3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec3.pdf -------------------------------------------------------------------------------- /note/lec4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec4.pdf -------------------------------------------------------------------------------- /note/lec5.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec5.pdf -------------------------------------------------------------------------------- /note/lec6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec6.pdf -------------------------------------------------------------------------------- /note/lec7.1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec7.1.pdf -------------------------------------------------------------------------------- /note/lec7.2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec7.2.pdf -------------------------------------------------------------------------------- /note/lec7.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec7.pdf -------------------------------------------------------------------------------- /note/lec9.1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec9.1.pdf -------------------------------------------------------------------------------- /note/lec9.2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec9.2.pdf -------------------------------------------------------------------------------- /note/lec9.3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/lec9.3.pdf -------------------------------------------------------------------------------- /note/overview.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/overview.pdf -------------------------------------------------------------------------------- /note/recent_progresses.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/leiwu0/course.math_theory_nn/6a635d692305b383f719afbd285e74b0ce72db76/note/recent_progresses.pdf -------------------------------------------------------------------------------- /paper_list.md: -------------------------------------------------------------------------------- 1 | ### Approximation and estimation theory 2 | - [x] [Optimal Approximation with Sparsely Connected Deep Neural Networks](https://arxiv.org/abs/1705.01714) 3 | - [x] [On the Expressive Power of Deep Polynomial Neural Networks](https://arxiv.org/pdf/1905.12207.pdf) 4 | - [x] [Deep Network Approximation Characterized by Number of Neurons](https://arxiv.org/pdf/1906.05497v1.pdf) 5 | - [ ] [The phase diagram of approximation rates for deep neural networks](https://arxiv.org/pdf/1906.09477v1.pdf) 6 | - [ ] [Adaptivity of deep ReLU network 
for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality](https://openreview.net/forum?id=H1ebTsActm) 7 | - [x] [Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks](https://arxiv.org/abs/1903.10047) 8 | - [ ] [Optimal Approximation of Piecewise Smooth Functions Using Deep ReLU Neural Networks](https://arxiv.org/pdf/1709.05289v4.pdf) 9 | - [ ] [Approximation Spaces of Deep Neural Networks](https://arxiv.org/pdf/1905.01208v2.pdf) 10 | - [x] [On the Power and Limitations of Random Features for Understanding Neural Networks](https://arxiv.org/abs/1904.00687) 11 | - [x] [Error bounds for deep ReLU networks using the Kolmogorov--Arnold superposition theorem](https://arxiv.org/abs/1906.11945) 12 | - [x] [Benign Overfitting in Linear Regression](https://arxiv.org/abs/1906.11300) 13 | 14 | ### Implicit regularization 15 | - [x] [Risk and Parameter Convergence of Logistic Regression](https://arxiv.org/abs/1803.07300) 16 | - [x] [Implicit Regularization in Deep Matrix Factorization](https://arxiv.org/pdf/1905.13655.pdf) 17 | - [x] [Gradient Descent Maximizes the Margin of Homogeneous Neural Networks](https://arxiv.org/pdf/1906.05890.pdf) 18 | - [x] [Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets](https://arxiv.org/abs/1906.06247v1) 19 | - [x] [Gradient Dynamics of Shallow Univariate ReLU Networks](https://arxiv.org/abs/1906.07842) 20 | 21 | ### PDE 22 | - [ ] [Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations](https://arxiv.org/abs/1809.03062) 23 | - [x] [A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations](https://arxiv.org/abs/1901.10854) 24 | -------------------------------------------------------------------------------- /pre_schedule.txt: -------------------------------------------------------------------------------- 1 | Schedule of presentations 2 | 3 | 4 | ============================= 5 | Thursday (7/25) 6 | ============================= 7 | 14:00 - 14:20 On the Power and Limitations of Random Features for Understanding Neural Networks 8 | 14:20 - 14:40 On the Expressive Power of Deep Polynomial Neural Networks 9 | 14:40 - 15:00 Benign Overfitting in Linear Regression 10 | 11 | 15:10 - 15:30 Risk and Parameter Convergence of Logistic Regression 12 | 15:30 - 15:50 Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks 13 | 14 | 16:00 - 16:20 Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets 15 | 16:20 - 16:40 Gradient Dynamics of Shallow Univariate ReLU Networks 16 | 17 | 18 | ============================= 19 | Friday (7/26) 20 | ============================= 21 | 15:10 - 15:30 Gradient Descent Maximizes the Margin of Homogeneous Neural Networks 22 | 15:30 - 15:50 Deep Network Approximation Characterized by Number of Neurons 23 | 15:50 - 16:10 Optimal Approximation with Sparsely Connected Deep Neural Networks 24 | 25 | 16:20 - 16:40 Error bounds for deep ReLU networks using the Kolmogorov--Arnold superposition theorem 26 | 16:40 - 17:00 A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations 27 | 17:00 - 17:20 Implicit Regularization in Deep Matrix Factorization 28 | 
-------------------------------------------------------------------------------- /template/scribe.sty: -------------------------------------------------------------------------------- 1 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 2 | % Scribe notes style file 3 | % 4 | % This file should be called scribe.sty 5 | % 6 | % Your main LaTeX file should look like this: 7 | % 8 | % \documentclass[12pt]{article} 9 | % \usepackage{scribe} 10 | % 11 | % \Scribe{YOUR NAME} 12 | % \Lecturer{Anupam Gupta OR Ryan O'Donnell} 13 | % \LectureNumber{N} 14 | % \LectureDate{DATE} 15 | % \LectureTitle{A TITLE FOR THE LECTURE} 16 | % 17 | % \begin{document} 18 | % \MakeScribeTop 19 | % 20 | % \section{SECTION NAME} 21 | % 22 | % NOTES GO HERE 23 | % 24 | % \section{ANOTHER SECTION NAME} 25 | % 26 | % MORE NOTES GO HERE 27 | % 28 | % etc. 29 | % 30 | % \bibliographystyle{abbrv} % if you need a bibliography 31 | % \bibliography{mybib} % assuming yours is named mybib.bib 32 | % 33 | % \end{document} 34 | % 35 | % 36 | % A .bib file is a text file containing a sequence like... 37 | % 38 | % @article{ADR82, 39 | % author = "Alain Aspect and Jean Dalibard and G{\'e}rard Roger", 40 | % title = "Experimental Test of {B}ell's Inequalities Using Time-Varying Analyzers", 41 | % journal = "Phys.\ Rev.\ Lett.", 42 | % volume = 49, 43 | % number = 25, 44 | % pages = "1804--1807", 45 | % year = 1982 46 | % } 47 | % 48 | % @inproceedings{Fei91, 49 | % author = "Uriel Feige", 50 | % title = "On the success probability of the two provers in one round proof systems", 51 | % booktitle = "Proc.\ 6th Symp.\ on Structure in Complexity Theory (CCC)", 52 | % pages = "116--123", 53 | % year = 1991 54 | % } 55 | % 56 | % 57 | % 58 | % 59 | % 60 | % 61 | % For your LaTeX files, there are some macros you may want to use below... 
62 | 63 | 64 | \oddsidemargin 0in \evensidemargin 0in \marginparwidth 40pt 65 | \marginparsep 10pt \topmargin 0pt \headsep 0in \headheight 0in 66 | \textheight 8.5in \textwidth 6.5in \brokenpenalty=10000 67 | 68 | \usepackage{amssymb} 69 | \usepackage{amsfonts} 70 | \usepackage{amsmath} 71 | \usepackage{amsthm} 72 | \usepackage{latexsym} 73 | \usepackage{epsfig} 74 | \usepackage{bm} 75 | \usepackage{xspace} 76 | \usepackage{times} 77 | \usepackage[utf8x]{inputenc} 78 | \usepackage[T1]{fontenc} 79 | \usepackage{listings} 80 | \usepackage{color} 81 | 82 | \definecolor{codegreen}{rgb}{0.3,0.6,0.4} 83 | \definecolor{codegray}{rgb}{0.5,0.5,0.5} 84 | \definecolor{codepurple}{rgb}{0.58,0,0.82} 85 | \definecolor{backcolour}{rgb}{0.95,0.95,0.92} 86 | 87 | \lstdefinestyle{mystyle}{ 88 | backgroundcolor=\color{backcolour}, 89 | commentstyle=\color{codegreen}, 90 | keywordstyle=\color{magenta}, 91 | numberstyle=\tiny\color{codegray}, 92 | stringstyle=\color{codepurple}, 93 | basicstyle=\footnotesize, 94 | breakatwhitespace=false, 95 | breaklines=true, 96 | captionpos=b, 97 | keepspaces=true, 98 | numbers=left, 99 | numbersep=5pt, 100 | showspaces=false, 101 | showstringspaces=false, 102 | showtabs=false, 103 | tabsize=2 104 | } 105 | 106 | %% 107 | %% Julia definition (c) 2014 Jubobs 108 | %% 109 | \lstdefinelanguage{Julia}% 110 | {morekeywords={abstract,break,case,catch,const,continue,do,else,elseif,% 111 | end,export,false,for,function,immutable,import,importall,if,in,% 112 | macro,module,otherwise,quote,return,switch,true,try,type,typealias,% 113 | using,while},% 114 | sensitive=true,% 115 | alsoother={$},% 116 | morecomment=[l]\#,% 117 | morecomment=[n]{\#=}{=\#},% 118 | morestring=[s]{"}{"},% 119 | morestring=[m]{'}{'},% 120 | }[keywords,comments,strings]% 121 | 122 | \lstset{% 123 | language = Julia, 124 | basicstyle = \ttfamily, 125 | keywordstyle = \bfseries\color{blue}, 126 | stringstyle = \color{magenta}, 127 | commentstyle = \color{ForestGreen}, 128 | showstringspaces = false, 129 | } 130 | 131 | 132 | \newtheorem{theorem}{Theorem}[section] 133 | \newtheorem{lemma}[theorem]{Lemma} 134 | \newtheorem{claim}[theorem]{Claim} 135 | \newtheorem{proposition}[theorem]{Proposition} 136 | \newtheorem{corollary}[theorem]{Corollary} 137 | \newtheorem{fact}[theorem]{Fact} 138 | \newtheorem{example}[theorem]{Example} 139 | \newtheorem{notation}[theorem]{Notation} 140 | \newtheorem{observation}[theorem]{Observation} 141 | \newtheorem{conjecture}[theorem]{Conjecture} 142 | 143 | \theoremstyle{definition} 144 | \newtheorem{definition}[theorem]{Definition} 145 | 146 | \theoremstyle{remark} 147 | \newtheorem{remark}[theorem]{Remark} 148 | 149 | % Setting the theorem style back to plain in case theorems are defined in the main file 150 | \theoremstyle{plain} 151 | 152 | 153 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 154 | % Useful macros 155 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 156 | 157 | % for temporarily chunks of text 158 | \newcommand{\ignore}[1]{} 159 | 160 | % Probability/expectation operators. The ones ending in x should be used if you want 161 | % subscripts that go directly *below* the operator (in math mode); no x means the subscripts 162 | % go below and to the right. NB: \P is remapped below for the complexity class P. 
163 | \renewcommand{\Pr}{{\bf Pr}} 164 | \newcommand{\Prx}{\mathop{\bf Pr\/}} 165 | \newcommand{\E}{{\bf E}} 166 | \newcommand{\Ex}{\mathop{\bf E\/}} 167 | \newcommand{\Var}{{\bf Var}} 168 | \newcommand{\Varx}{\mathop{\bf Var\/}} 169 | \newcommand{\Cov}{{\bf Cov}} 170 | \newcommand{\Covx}{\mathop{\bf Cov\/}} 171 | 172 | % shortcuts for symbol names that are too long to type 173 | \newcommand{\eps}{\epsilon} 174 | \newcommand{\lam}{\lambda} 175 | \renewcommand{\l}{\ell} 176 | \newcommand{\la}{\langle} 177 | \newcommand{\ra}{\rangle} 178 | \newcommand{\wh}{\widehat} 179 | \newcommand{\wt}{\widetilde} 180 | 181 | % "blackboard-fonted" letters for the reals, naturals etc. 182 | \newcommand{\R}{\mathbb R} 183 | \newcommand{\N}{\mathbb N} 184 | \newcommand{\Z}{\mathbb Z} 185 | \newcommand{\F}{\mathbb F} 186 | \newcommand{\Q}{\mathbb Q} 187 | \newcommand{\C}{\mathbb C} 188 | 189 | % operators that should be typeset in Roman font 190 | \newcommand{\poly}{\mathrm{poly}} 191 | \newcommand{\polylog}{\mathrm{polylog}} 192 | \newcommand{\sgn}{\mathrm{sgn}} 193 | \newcommand{\avg}{\mathop{\mathrm{avg}}} 194 | \newcommand{\val}{{\mathrm{val}}} 195 | 196 | % complexity classes 197 | \renewcommand{\P}{\mathrm{P}} 198 | \newcommand{\NP}{\mathrm{NP}} 199 | \newcommand{\BPP}{\mathrm{BPP}} 200 | \newcommand{\DTIME}{\mathrm{DTIME}} 201 | \newcommand{\ZPTIME}{\mathrm{ZPTIME}} 202 | \newcommand{\BPTIME}{\mathrm{BPTIME}} 203 | \newcommand{\NTIME}{\mathrm{NTIME}} 204 | 205 | % values associated to optimization algorithm instances 206 | \newcommand{\Opt}{{\mathsf{Opt}}} 207 | \newcommand{\Alg}{{\mathsf{Alg}}} 208 | \newcommand{\Lp}{{\mathsf{Lp}}} 209 | \newcommand{\Sdp}{{\mathsf{Sdp}}} 210 | \newcommand{\Exp}{{\mathsf{Exp}}} 211 | 212 | % if you think the sum and product signs are too big in your math mode; x convention 213 | % as in the probability operators 214 | \newcommand{\littlesum}{{\textstyle \sum}} 215 | \newcommand{\littlesumx}{\mathop{{\textstyle \sum}}} 216 | \newcommand{\littleprod}{{\textstyle \prod}} 217 | \newcommand{\littleprodx}{\mathop{{\textstyle \prod}}} 218 | 219 | % horizontal line across the page 220 | \newcommand{\horz}{ 221 | \vspace{-.4in} 222 | \begin{center} 223 | \begin{tabular}{p{\textwidth}}\\ 224 | \hline 225 | \end{tabular} 226 | \end{center} 227 | } 228 | 229 | % calligraphic letters 230 | \newcommand{\calA}{{\cal A}} 231 | \newcommand{\calB}{{\cal B}} 232 | \newcommand{\calC}{{\cal C}} 233 | \newcommand{\calD}{{\cal D}} 234 | \newcommand{\calE}{{\cal E}} 235 | \newcommand{\calF}{{\cal F}} 236 | \newcommand{\calG}{{\cal G}} 237 | \newcommand{\calH}{{\cal H}} 238 | \newcommand{\calI}{{\cal I}} 239 | \newcommand{\calJ}{{\cal J}} 240 | \newcommand{\calK}{{\cal K}} 241 | \newcommand{\calL}{{\cal L}} 242 | \newcommand{\calM}{{\cal M}} 243 | \newcommand{\calN}{{\cal N}} 244 | \newcommand{\calO}{{\cal O}} 245 | \newcommand{\calP}{{\cal P}} 246 | \newcommand{\calQ}{{\cal Q}} 247 | \newcommand{\calR}{{\cal R}} 248 | \newcommand{\calS}{{\cal S}} 249 | \newcommand{\calT}{{\cal T}} 250 | \newcommand{\calU}{{\cal U}} 251 | \newcommand{\calV}{{\cal V}} 252 | \newcommand{\calW}{{\cal W}} 253 | \newcommand{\calX}{{\cal X}} 254 | \newcommand{\calY}{{\cal Y}} 255 | \newcommand{\calZ}{{\cal Z}} 256 | 257 | % bold letters (useful for random variables) 258 | \renewcommand{\a}{{\boldsymbol a}} 259 | \renewcommand{\b}{{\boldsymbol b}} 260 | \renewcommand{\c}{{\boldsymbol c}} 261 | \renewcommand{\d}{{\boldsymbol d}} 262 | \newcommand{\e}{{\boldsymbol e}} 263 | \newcommand{\f}{{\boldsymbol f}} 264 | 
\newcommand{\g}{{\boldsymbol g}} 265 | \newcommand{\h}{{\boldsymbol h}} 266 | \renewcommand{\i}{{\boldsymbol i}} 267 | \renewcommand{\j}{{\boldsymbol j}} 268 | \renewcommand{\k}{{\boldsymbol k}} 269 | \newcommand{\m}{{\boldsymbol m}} 270 | \newcommand{\n}{{\boldsymbol n}} 271 | \renewcommand{\o}{{\boldsymbol o}} 272 | \newcommand{\p}{{\boldsymbol p}} 273 | \newcommand{\q}{{\boldsymbol q}} 274 | \renewcommand{\r}{{\boldsymbol r}} 275 | \newcommand{\s}{{\boldsymbol s}} 276 | \renewcommand{\t}{{\boldsymbol t}} 277 | \renewcommand{\u}{{\boldsymbol u}} 278 | \renewcommand{\v}{{\boldsymbol v}} 279 | \newcommand{\w}{{\boldsymbol w}} 280 | \newcommand{\x}{{\boldsymbol x}} 281 | \newcommand{\y}{{\boldsymbol y}} 282 | \newcommand{\z}{{\boldsymbol z}} 283 | \newcommand{\A}{{\boldsymbol A}} 284 | \newcommand{\B}{{\boldsymbol B}} 285 | \newcommand{\D}{{\boldsymbol D}} 286 | \newcommand{\G}{{\boldsymbol G}} 287 | \renewcommand{\H}{{\boldsymbol H}} 288 | \newcommand{\I}{{\boldsymbol I}} 289 | \newcommand{\J}{{\boldsymbol J}} 290 | \newcommand{\K}{{\boldsymbol K}} 291 | \renewcommand{\L}{{\boldsymbol L}} 292 | \newcommand{\M}{{\boldsymbol M}} 293 | \renewcommand{\O}{{\boldsymbol O}} 294 | \renewcommand{\S}{{\boldsymbol S}} 295 | \newcommand{\T}{{\boldsymbol T}} 296 | \newcommand{\U}{{\boldsymbol U}} 297 | \newcommand{\V}{{\boldsymbol V}} 298 | \newcommand{\W}{{\boldsymbol W}} 299 | \newcommand{\X}{{\boldsymbol X}} 300 | \newcommand{\Y}{{\boldsymbol Y}} 301 | 302 | 303 | 304 | % useful for Fourier analysis 305 | \newcommand{\bits}{\{-1,1\}} 306 | \newcommand{\bitsn}{\{-1,1\}^n} 307 | \newcommand{\bn}{\bitsn} 308 | \newcommand{\isafunc}{{: \bitsn \rightarrow \bits}} 309 | \newcommand{\fisafunc}{{f : \bitsn \rightarrow \bits}} 310 | 311 | % if you want 312 | \newcommand{\half}{{\textstyle \frac12}} 313 | 314 | \newcommand{\myfig}[4]{\begin{figure}[h] \begin{center} \includegraphics[width=#1\textwidth]{#2} \caption{#3} \label{#4} \end{center} \end{figure}} 315 | 316 | 317 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 318 | % Feel free to ignore the rest of this file 319 | 320 | 321 | 322 | \def\ScribeStr{??} 323 | \def\LecStr{??} 324 | \def\LecNum{??} 325 | \def\LecTitle{??} 326 | \def\LecDate{??} 327 | \newcommand{\Scribe}[1]{\def\ScribeStr{Scribe: #1}} 328 | \newcommand{\Scribes}[1]{\def\ScribeStr{Scribes: #1}} 329 | \newcommand{\Lecturer}[1]{\def\LecStr{Lecturer: #1}} 330 | \newcommand{\Lecturers}[1]{\def\LecStr{Lecturers: #1}} 331 | \newcommand{\LectureNumber}[1]{\def\LecNum{#1}} 332 | \newcommand{\LectureDate}[1]{\def\LecDate{#1}} 333 | \newcommand{\LectureTitle}[1]{\def\LecTitle{#1}} 334 | 335 | \newdimen\headerwidth 336 | 337 | \newcommand{\MakeScribeTop}{ 338 | \noindent 339 | \begin{center} 340 | \framebox{ 341 | \vbox{ 342 | \headerwidth=\textwidth 343 | \advance\headerwidth by -0.22in 344 | \hbox to \headerwidth {\hfill Mathematical Theory of Neural Network Models} 345 | \vspace{4mm} 346 | \hbox to \headerwidth {{\Large \hfill Lecture \LecNum: {\LecTitle} \hfill}} 347 | \vspace{2mm} 348 | \hbox to \headerwidth {\hfill \LecDate \hfill} 349 | \vspace{2mm} 350 | \hbox to \headerwidth {{\it \LecStr \hfill \ScribeStr}} 351 | } 352 | } 353 | \end{center} 354 | \vspace*{4mm}} 355 | -------------------------------------------------------------------------------- /template/template.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[english]{babel} 3 | \usepackage[utf8x]{inputenc} 4 | 
\usepackage[T1]{fontenc} 5 | \usepackage{scribe} 6 | \usepackage{listings} 7 | 8 | \Scribe{Shuhai Zhao, Yilei Han} 9 | \Lecturer{Lei Wu} 10 | \LectureNumber{1} 11 | \LectureDate{July 2} 12 | \LectureTitle{An Introduction to Supervised Learning} 13 | 14 | \lstset{style=mystyle} 15 | 16 | \begin{document} 17 | \MakeScribeTop 18 | 19 | \section{Supervised Learning} 20 | Some basic terminology: 21 | \begin{itemize} 22 | \item \textit{Features}: The set of attributes, often represented as a vector, associated with an example. 23 | \item \textit{Hypothesis space}: A set $\mathcal{F}$ of functions mapping features to the set of labels $\mathcal{Y}$. 24 | \item \textit{Loss function}: A function $l$ that measures the difference, or loss, between a predicted label and a true label: $l:\mathcal{Y}\times\mathcal{Y}\to \mathbb{R}_{+}$, for example, $l(y,y')=(y-y')^{2}$. 25 | \end{itemize} 26 | 27 | 28 | \bibliographystyle{abbrv} % if you need a bibliography 29 | \bibliography{mybib} % assuming yours is named mybib.bib 30 | 31 | \end{document} --------------------------------------------------------------------------------
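As a supplement to the terminology defined in template.tex above, here is a minimal LaTeX sketch of the empirical risk minimization setup those definitions feed into; the symbols $S$, $\widehat{\mathcal{R}}_n$ and $\hat{f}_n$ are illustrative notation, not taken from the course notes.

```latex
% Minimal sketch (illustrative notation): given a training set S = {(x_i, y_i)}_{i=1}^{n},
% a hypothesis space \mathcal{F} and a loss l, the learner minimizes the empirical risk.
\[
  \widehat{\mathcal{R}}_n(f) \;=\; \frac{1}{n}\sum_{i=1}^{n} l\bigl(f(x_i),\, y_i\bigr),
  \qquad
  \hat{f}_n \;\in\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \widehat{\mathcal{R}}_n(f).
\]
% With the squared loss l(y, y') = (y - y')^2 from the template, this is least squares over \mathcal{F}.
```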