# Self-Study NLP Roadmap

This roadmap combines topics and references from **CMU’s 11-711 Advanced NLP** and **UMass Amherst’s CS 685**. It’s designed for a structured, step-by-step self-study over about **12–14 weeks** (or longer, depending on your schedule). Each “week” is flexible and can span 1–2 weeks of part-time study.

---

## Mind Map / Outline

Below is a simplified *mind map* illustrating how topics connect. It starts from the fundamentals of NLP and branches out to advanced areas like large language models (LLMs), retrieval-augmented generation (RAG), and interpretability.

<details>
<summary>Click to expand the Mind Map</summary>

```
Week 1: Intro & NLP Fundamentals
    │
    ▼
Weeks 2–4 (in parallel):
    • Week 2: Word Representations & Text Classification
    • Week 3: Language Modeling
    • Week 4: Sequence Modeling (RNNs, LSTMs)
    │
    ▼
Week 5: Transformers & Attention
    │
    ▼
Weeks 6–8 (in parallel):
    • Week 6: Text Generation & Prompting Basics
    • Week 7: Instruction Tuning & Efficient Fine-Tuning
    • Week 8: Experimental Design & Human Annotation
    │
    ▼
Weeks 9–10 (in parallel):
    • Week 9: Retrieval & RAG (Retrieval-Augmented Generation)   [builds on Week 6]
    • Week 10: Distillation, Quantization & RL from Human Feedback   [builds on Weeks 7–8]
    │
    ▼
Week 11: Debugging & Interpretation (probing, mechanistic interpretability)
    │
    ▼
Week 12: Advanced LLMs, Agents & Long Contexts (LLaMA, GPT-4, Toolformer, ReAct, ...)
    │
    ▼
Week 13: Complex Reasoning & Linguistics (chain-of-thought, abductive reasoning, ...)
    │
    ▼
Week 14: Multilingual NLP & Wrap-Up (mBERT, XLM-R, zero-shot cross-lingual)
```

</details>

---

## Table of Contents

1. [Week 1: Introduction & NLP Fundamentals](#week-1-introduction--nlp-fundamentals)
2. [Week 2: Word Representations & Text Classification](#week-2-word-representations--text-classification)
3. [Week 3: Language Modeling](#week-3-language-modeling)
4. [Week 4: Sequence Modeling (RNNs, LSTMs, GRUs)](#week-4-sequence-modeling-rnns-lstms-grus)
5. [Week 5: Transformers & Attention Mechanisms](#week-5-transformers--attention-mechanisms)
6. [Week 6: Text Generation Algorithms & Prompting Basics](#week-6-text-generation-algorithms--prompting-basics)
7. [Week 7: Instruction Tuning & Efficient Fine-Tuning Methods](#week-7-instruction-tuning--efficient-fine-tuning-methods)
8. [Week 8: Experimental Design & Human Annotation](#week-8-experimental-design--human-annotation)
9. [Week 9: Retrieval & Retrieval-Augmented Generation (RAG)](#week-9-retrieval--retrieval-augmented-generation-rag)
10. [Week 10: Distillation, Quantization & RL from Human Feedback](#week-10-distillation-quantization--rl-from-human-feedback)
11. [Week 11: Debugging & Interpretation](#week-11-debugging--interpretation)
12. [Week 12: Advanced LLMs, Agents & Long Contexts](#week-12-advanced-llms-agents--long-contexts)
13. [Week 13: Complex Reasoning & Linguistics](#week-13-complex-reasoning--linguistics)
14. [Week 14: Multilingual NLP & Wrap-Up](#week-14-multilingual-nlp--wrap-up)
15. [Additional Tips & Final Notes](#additional-tips--final-notes)

---

## Week 1: Introduction & NLP Fundamentals

**Core Topics**
- Overview of Natural Language Processing (NLP)
- Rule-based, statistical, and neural approaches
- Introductory tasks: classification, tagging, QA, generation

**Suggested References**
- **CMU 11-711, Lecture 1 (Introduction)**
  - *Intro Slides*
  - “Examining Power and Agency in Film” – Sap et al. (2017)
- **UMass CS 685, Week 1**
  - Basic LM intros (Jurafsky & Martin, Sections 3.1–3.5 and 7)

**Practical Exercise**
- Install a DL framework (e.g., PyTorch). Implement a simple **rule-based** text classifier vs. a **logistic regression** classifier on a small dataset.

---

## Week 2: Word Representations & Text Classification

**Core Topics**
- Bag-of-words (BoW) and subword models (BPE, SentencePiece)
- Continuous word embeddings (word2vec, GloVe)
- Visualizing embeddings (t-SNE, PCA)

**Suggested References**
- **CMU 11-711, Lecture 2**
  - Sennrich et al. (2015) – subword NMT (BPE)
  - Kudo & Richardson (2018) – SentencePiece
- **UMass CS 685, Week 2**
  - Bengio et al. (2003) – foundational neural LM
  - Karpathy’s blog post on backprop basics

**Practical Exercise**
- Train a **CNN or LSTM**-based text classifier using SentencePiece tokenization. Compare with a **bag-of-words** approach.

---

## Week 3: Language Modeling

**Core Topics**
- N-gram language models
- Neural LMs (feed-forward vs. RNN-based)
- Perplexity, smoothing, log-linear models

**Suggested References**
- **CMU 11-711, Lecture 3**
  - Chen & Goodman (1998) – smoothing techniques
  - KenLM toolkit
- **UMass CS 685**
  - Jurafsky & Martin, Sections 3.1–3.5 & 7

**Practical Exercise**
- Implement a **count-based n-gram** LM and compute its perplexity, then build a **simple feed-forward** LM and compare (see the sketch below).
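
To make the Week 3 exercise concrete, here is a minimal sketch of a count-based bigram LM with add-k smoothing and a perplexity computation. It uses only the Python standard library; the toy corpus, the value of `k`, and the function names are illustrative choices rather than anything prescribed by either course.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences (lists of tokens)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * vocab_size)

def perplexity(sentences, unigrams, bigrams, k=1.0):
    """Perplexity = exp(-average log-probability per predicted token)."""
    vocab_size = len(unigrams)
    log_prob, n_predictions = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for w_prev, w in zip(padded[:-1], padded[1:]):
            log_prob += math.log(bigram_prob(w_prev, w, unigrams, bigrams, vocab_size, k))
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)

if __name__ == "__main__":
    corpus = [s.split() for s in ["the cat sat on the mat", "the dog sat on the rug"]]
    unigrams, bigrams = train_bigram_lm(corpus)
    print("Perplexity on the training corpus:", perplexity(corpus, unigrams, bigrams))
```

Training a small feed-forward LM on the same tokenized corpus and comparing held-out perplexity is the natural follow-up.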

---

## Week 4: Sequence Modeling (RNNs, LSTMs, GRUs)

**Core Topics**
- Recurrent Neural Networks (RNNs)
- Vanishing/exploding gradients
- LSTM and GRU architectures

**Suggested References**
- **CMU 11-711, Lecture 4**
  - Elman (1990) – “Finding Structure in Time”
  - Hochreiter & Schmidhuber (1997) – LSTM
- **UMass CS 685, Week 3**
  - Pascanu et al. (2013) – vanishing gradients in RNNs

**Practical Exercise**
- Build an **LSTM language model**. Compare its performance to a feed-forward LM on a text corpus.

---

## Week 5: Transformers & Attention Mechanisms

**Core Topics**
- Attention (Bahdanau, Luong, Vaswani)
- Self-attention, multi-head attention, positional encodings
- Encoder–decoder vs. decoder-only Transformers

**Suggested References**
- **CMU 11-711, Lecture 5**
  - Bahdanau et al. (2015) – alignment-based attention
  - Vaswani et al. (2017) – “Attention Is All You Need”
- **UMass CS 685, Weeks 3–4**
  - Jay Alammar’s “The Illustrated Transformer” blog post

**Practical Exercise**
- Implement a **Transformer encoder** for text classification. Compare speed/accuracy to an LSTM approach.

---

## Week 6: Text Generation Algorithms & Prompting Basics

**Core Topics**
- Decoding: greedy, beam, top-k, nucleus sampling
- Intro to prompting for generation
- (Optional) Minimum Bayes Risk decoding

**Suggested References**
- **CMU 11-711, Lecture 6**
  - Holtzman et al. (2020) – nucleus sampling
  - Kool et al. (2019) – stochastic beam search
- **UMass CS 685, Week 7**
  - Krishna et al. (2022) – RankGen

**Practical Exercise**
- Implement **nucleus sampling** in a Transformer LM. Experiment with different prompts to observe changes in output.

---

## Week 7: Instruction Tuning & Efficient Fine-Tuning Methods

**Core Topics**
- Few-shot prompting vs. full fine-tuning
- Parameter-efficient tuning (LoRA, adapters, prompt tuning)
- Models like T5, BERT, FLAN

**Suggested References**
- **CMU 11-711, Lectures 7–8**
  - Brown et al. (2020) – GPT-3 & in-context learning
  - Wei et al. (2021, 2022) – FLAN / instruction tuning
- **UMass CS 685, Week 5**
  - Sennrich et al. (2016) – subword units
  - Hu et al. (2021) – LoRA
  - Lester et al. (2021) – prompt tuning

**Practical Exercise**
- Fine-tune a **T5** or **BERT** model with **LoRA** for a QA task. Compare with standard fine-tuning.

---

## Week 8: Experimental Design & Human Annotation

**Core Topics**
- Designing NLP experiments
- Human annotation best practices
- Data collection & inter-annotator agreement

**Suggested References**
- **CMU 11-711, Lecture 9**
  - Bender & Friedman (2018) – data statements for NLP
  - Lones (2021) – “How to avoid machine learning pitfalls”

**Practical Exercise**
- Collect a **sentiment dataset**, label it with 2 or more annotators, and compute **Cohen’s Kappa** or **Krippendorff’s Alpha** (see the sketch below).
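
As a concrete starting point for the Week 8 exercise, here is a minimal sketch of Cohen’s Kappa for two annotators, in plain Python. The toy label lists are made up for illustration; in practice you might cross-check your implementation against `sklearn.metrics.cohen_kappa_score`.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b) and labels_a, "need two equal-length, non-empty label lists"
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - chance) / (1 - chance)

if __name__ == "__main__":
    # Hypothetical sentiment labels from two annotators over the same 8 items.
    annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
    annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]
    print(f"Cohen's Kappa: {cohens_kappa(annotator_1, annotator_2):.3f}")
```

Krippendorff’s Alpha handles more than two annotators and missing labels, so it is the better choice once your annotation setup grows beyond this simple case.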

---

## Week 9: Retrieval & Retrieval-Augmented Generation (RAG)

**Core Topics**
- Information retrieval (BM25, DPR)
- Retrieval-augmented LMs (REALM, RAG)
- Dense vs. sparse retrieval, long context

**Suggested References**
- **CMU 11-711, Lecture 10**
  - Chen et al. (2017) – DrQA
  - Karpukhin et al. (2020) – Dense Passage Retrieval
  - Lewis et al. (2020) – RAG
- **UMass CS 685, Week 8**
  - Guu et al. (2020) – REALM
  - Schick et al. (2023) – Toolformer

**Practical Exercise**
- Implement a **retrieval-augmented QA** system. Compare **BM25** vs. **DPR** for document retrieval.

---

## Week 10: Distillation, Quantization & RL from Human Feedback

**Core Topics**
- Model compression (pruning, quantization, distillation)
- RL from human feedback (RLHF, RLAIF, DPO)
- Impact on performance, inference cost

**Suggested References**
- **CMU 11-711, Lecture 11**
  - Sanh et al. (2019) – DistilBERT
  - Dettmers et al. (2023) – QLoRA
  - Frankle & Carbin (2019) – Lottery Ticket Hypothesis
- **UMass CS 685, Week 6**
  - Ouyang et al. (2022) – RLHF
  - Lee et al. (2023) – RLAIF
  - Rafailov et al. (2023) – Direct Preference Optimization

**Practical Exercise**
- Distill a **larger Transformer** into a smaller one, or set up a **mini RLHF** pipeline with a preference dataset.

---

## Week 11: Debugging & Interpretation

**Core Topics**
- Model debugging strategies
- Probing classifiers (edge probing)
- Mechanistic interpretability (circuits, induction heads)
- Model editing (ROME)

**Suggested References**
- **CMU 11-711, Lecture 12**
  - Tenney et al. (2019) – edge probing
  - Elhage et al. (2021) – Transformer circuits
  - Meng et al. (2022) – ROME
- **UMass CS 685, Week 10**
  - Olsson et al. (2022) – induction heads
  - Hernandez et al. (2023) – knowledge representations

**Practical Exercise**
- Perform **edge probing** on a BERT model to analyze linguistic features.
- Use **ROME** to edit a factual statement in GPT-style LMs.

---

## Week 12: Advanced LLMs, Agents & Long Contexts

**Core Topics**
- Modern LLMs (LLaMA, GPT-4, Claude, Mistral)
- Long-context solutions (Transformer-XL, RoPE, FlashAttention)
- Language agents & tool use (Toolformer, ReAct)

**Suggested References**
- **CMU 11-711, Lectures 15–16**
  - Touvron et al. (2023) – LLaMA
  - Yao et al. (2023) – ReAct
  - Schick et al. (2023) – Toolformer
- **UMass CS 685**
  - Su et al. (2021) – RoPE
  - Dao et al. (2022) – FlashAttention

**Practical Exercise**
- Experiment with a **decoder-only LLM** (e.g., LLaMA) on a longer context.
- Integrate “tool use” (e.g., a calculator or database lookup), as in the sketch below.
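
For the tool-use part of the Week 12 exercise, here is a minimal sketch of one simple pattern: prompt the model to emit `CALC[...]` markers wherever arithmetic is needed, then evaluate those markers and substitute the results into the draft answer. The marker syntax, the helper names, and the canned `generate` stub are all illustrative; they stand in for a real decoder-only LLM call (e.g., via Hugging Face `transformers`) and are not taken from Toolformer or ReAct.

```python
import re

CALC_PATTERN = re.compile(r"CALC\[([0-9+\-*/. ()]+)\]")

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned draft so the example is runnable."""
    return "The warehouse holds CALC[37 * 12] boxes in total."

def run_calculator_tool(text: str) -> str:
    """Replace each CALC[...] marker with the evaluated arithmetic result."""
    def _evaluate(match):
        # The regex restricts eval() to digits and arithmetic symbols;
        # a real agent should use a proper expression parser instead.
        return str(eval(match.group(1)))
    return CALC_PATTERN.sub(_evaluate, text)

def answer_with_tools(question: str) -> str:
    prompt = (
        "You may write CALC[expression] whenever arithmetic is needed.\n"
        f"Question: {question}\nAnswer: "
    )
    draft = generate(prompt)           # model drafts an answer containing tool calls
    return run_calculator_tool(draft)  # tool results are substituted into the draft

if __name__ == "__main__":
    print(answer_with_tools("How many boxes are in 37 crates of 12?"))
```

A fuller agent loop would feed the tool output back into the model and let it continue generating, which is closer to how ReAct-style agents interleave reasoning and tool calls.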

---

## Week 13: Complex Reasoning & Linguistics

**Core Topics**
- Chain-of-thought prompting
- Abductive reasoning, logic-based inference
- Linguistic structure in neural models
- Compositional generalization (COGS, SCAN)

**Suggested References**
- **CMU 11-711, Lectures 21–22**
  - Wei et al. (2022) – chain-of-thought prompting
  - Kojima et al. (2022) – zero-shot chain-of-thought (“Let’s think step by step”)
  - Harris (1954) – distributional structure
  - Kim & Linzen (2020) – COGS
- **UMass CS 685**
  - Various alignment & reasoning references

**Practical Exercise**
- Try **chain-of-thought** prompts on multi-step reasoning tasks.
- Evaluate compositional generalization on a synthetic dataset (COGS).

---

## Week 14: Multilingual NLP & Wrap-Up

**Core Topics**
- Multilingual embeddings (mBERT, XLM-R)
- Zero-shot/few-shot cross-lingual transfer
- Summarizing your entire NLP pipeline

**Suggested References**
- **CMU 11-711, Lecture 23**
  - Johnson et al. (2016) – Google’s multilingual NMT
  - Wu & Dredze (2019) – “Beto, Bentz, Becas”
  - NLLB Team (2022) – “No Language Left Behind”
- **UMass CS 685**
  - Apply earlier methods in a multilingual setting

**Practical Exercise**
- Fine-tune an **mBERT** model on a classification task in one language, then test it zero-shot in another (a sketch appears at the end of this README).

---

## Additional Tips & Final Notes

1. **Time Commitment**
   - Each “week” can be **1–2 weeks** of part-time study. Expect **3–5 months** for thorough coverage.

2. **Project Building**
   - Combine modules into a **final project**, e.g., a retrieval-augmented, instruction-tuned mini-LLM tested on compositional tasks.

3. **Tooling**
   - Frameworks: **PyTorch** or **TensorFlow**
   - Retrieval libraries: **FAISS**, **Chroma**, **Lucene**
   - Model hub: **Hugging Face Transformers**

4. **Community & Discussion**
   - Check out **NLP Slack/Discord** groups and **Reddit r/MachineLearning**.
   - Present mini-projects for feedback.

5. **Math Foundations**
   - Reinforce **backprop, linear algebra, and probability** as needed, especially for RNNs and Transformers.

6. **Flexibility**
   - Reorder or skip modules to suit your interests (e.g., focus on generation, interpretation, or multilingual NLP).

---

**Good luck with your NLP self-study!** By working through each week’s topics, reading the key papers, watching the relevant lectures, and coding the exercises, you’ll build a robust understanding of modern NLP and large language models.
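
As a closing illustration of the Week 14 exercise, here is a minimal zero-shot cross-lingual inference sketch. It assumes the Hugging Face `transformers` library and a multilingual checkpoint that you have already fine-tuned for sentiment classification; the `your-finetuned-mbert-sentiment` path, the label order, and the example sentences are placeholders rather than artifacts from either course.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder path: a bert-base-multilingual-cased checkpoint fine-tuned on
# English sentiment data (e.g., with the standard Trainer API).
MODEL_DIR = "your-finetuned-mbert-sentiment"
LABELS = ["negative", "positive"]  # assumed to match the label order used in fine-tuning

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def classify(text: str) -> str:
    """Classify a single sentence with the fine-tuned multilingual model."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Fine-tuned on English only, then evaluated zero-shot on other languages.
for sentence in [
    "This movie was absolutely wonderful.",      # English (seen during fine-tuning)
    "Der Film war leider ziemlich langweilig.",  # German (zero-shot)
    "Me encantó esta película.",                 # Spanish (zero-shot)
]:
    print(sentence, "->", classify(sentence))
```

Measuring accuracy on a labeled test set in the unseen language, rather than eyeballing a few sentences, is the meaningful way to quantify the cross-lingual transfer gap.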