# Self-Study NLP Roadmap

This roadmap combines topics and references from **CMU’s 11-711 Advanced NLP** and **UMass Amherst’s CS 685**. It’s designed for a structured, step-by-step self-study over about **12–14 weeks** (or longer, depending on your schedule). Each “week” is flexible and can span 1–2 weeks of part-time study.

---

## Mind Map / Outline

Below is a simplified *mind map* illustrating how topics connect. It starts from the fundamentals of NLP and branches out to advanced areas like large language models (LLMs), retrieval-augmented generation (RAG), and interpretability.

<details>
<summary>Click to expand the Mind Map</summary>

```
Week 1: Intro & NLP Fundamentals
    │
    ▼
Weeks 2–4 (in parallel):
    • Week 2: Word Representations & Text Classification
    • Week 3: Language Modeling
    • Week 4: Sequence Modeling (RNNs, LSTMs)
    │
    ▼
Week 5: Transformers & Attention
    │
    ▼
Weeks 6–8 (in parallel):
    • Week 6: Text Generation & Prompting Basics
    • Week 7: Instruction Tuning & Efficient Fine-Tuning
    • Week 8: Experimental Design & Human Annotation
    │
    ▼
Weeks 9–10 (in parallel):
    • Week 9: Retrieval & RAG (Retrieval-Augmented Generation)   [builds on Week 6]
    • Week 10: Distillation, Quantization & RL from Human Feedback   [builds on Weeks 7–8]
    │
    ▼
Week 11: Debugging & Interpretation (probing, mechanistic interpretability)
    │
    ▼
Week 12: Advanced LLMs, Agents & Long Contexts (LLaMA, GPT-4, Toolformer, ReAct, ...)
    │
    ▼
Week 13: Complex Reasoning & Linguistics (chain-of-thought, abductive reasoning, ...)
    │
    ▼
Week 14: Multilingual NLP & Wrap-Up (mBERT, XLM-R, zero-shot cross-lingual)
```

</details>

---

## Table of Contents

1. [Week 1: Introduction & NLP Fundamentals](#week-1-introduction--nlp-fundamentals)
2. [Week 2: Word Representations & Text Classification](#week-2-word-representations--text-classification)
3. [Week 3: Language Modeling](#week-3-language-modeling)
4. [Week 4: Sequence Modeling (RNNs, LSTMs, GRUs)](#week-4-sequence-modeling-rnns-lstms-grus)
5. [Week 5: Transformers & Attention Mechanisms](#week-5-transformers--attention-mechanisms)
6. [Week 6: Text Generation Algorithms & Prompting Basics](#week-6-text-generation-algorithms--prompting-basics)
7. [Week 7: Instruction Tuning & Efficient Fine-Tuning Methods](#week-7-instruction-tuning--efficient-fine-tuning-methods)
8. [Week 8: Experimental Design & Human Annotation](#week-8-experimental-design--human-annotation)
9. [Week 9: Retrieval & Retrieval-Augmented Generation (RAG)](#week-9-retrieval--retrieval-augmented-generation-rag)
10. [Week 10: Distillation, Quantization & RL from Human Feedback](#week-10-distillation-quantization--rl-from-human-feedback)
11. [Week 11: Debugging & Interpretation](#week-11-debugging--interpretation)
12. [Week 12: Advanced LLMs, Agents & Long Contexts](#week-12-advanced-llms-agents--long-contexts)
13. [Week 13: Complex Reasoning & Linguistics](#week-13-complex-reasoning--linguistics)
14. [Week 14: Multilingual NLP & Wrap-Up](#week-14-multilingual-nlp--wrap-up)
15. [Additional Tips & Final Notes](#additional-tips--final-notes)

---

## Week 1: Introduction & NLP Fundamentals

**Core Topics**
- Overview of Natural Language Processing (NLP)
- Rule-based, statistical, and neural approaches
- Introductory tasks: classification, tagging, QA, generation

**Suggested References**
- **CMU 11-711, Lecture 1 (Introduction)**
  - *Intro Slides*
  - “Examining Power and Agency in Film” – Sap et al. (2017)
- **UMass CS 685, Week 1**
  - Basic LM intros (Jurafsky & Martin, Sections 3.1–3.5 and 7)

**Practical Exercise**
- Install a DL framework (e.g., PyTorch). Implement a simple **rule-based** text classifier vs. a **logistic regression** classifier on a small dataset.

---

## Week 2: Word Representations & Text Classification

**Core Topics**
- Bag-of-words (BoW) and subword models (BPE, SentencePiece)
- Continuous word embeddings (word2vec, GloVe)
- Visualizing embeddings (t-SNE, PCA)

**Suggested References**
- **CMU 11-711, Lecture 2**
  - Sennrich et al. (2015) – subword NMT (BPE)
  - Kudo & Richardson (2018) – SentencePiece
- **UMass CS 685, Week 2**
  - Bengio et al. (2003) – foundational neural LM
  - Karpathy’s blog post on backprop basics

**Practical Exercise**
- Train a **CNN or LSTM**-based text classifier using SentencePiece tokenization. Compare with a **bag-of-words** approach.

---

## Week 3: Language Modeling

**Core Topics**
- N-gram language models
- Neural LMs (feed-forward vs. RNN-based)
- Perplexity, smoothing, log-linear models

**Suggested References**
- **CMU 11-711, Lecture 3**
  - Chen & Goodman (1998) – smoothing techniques
  - KenLM toolkit
- **UMass CS 685**
  - Jurafsky & Martin, Sections 3.1–3.5 & 7

**Practical Exercise**
- Implement a **count-based n-gram** LM and compute its perplexity, then build a **simple feed-forward** LM and compare (see the sketch below).
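
To make the Week 3 exercise concrete, here is a minimal sketch of a count-based bigram LM with add-k smoothing and a perplexity computation. It uses only the Python standard library; the toy corpus, the value of `k`, and the function names are illustrative choices rather than anything prescribed by either course.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences (lists of tokens)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * vocab_size)

def perplexity(sentences, unigrams, bigrams, k=1.0):
    """Perplexity = exp(-average log-probability per predicted token)."""
    vocab_size = len(unigrams)
    log_prob, n_predictions = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for w_prev, w in zip(padded[:-1], padded[1:]):
            log_prob += math.log(bigram_prob(w_prev, w, unigrams, bigrams, vocab_size, k))
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)

if __name__ == "__main__":
    corpus = [s.split() for s in ["the cat sat on the mat", "the dog sat on the rug"]]
    unigrams, bigrams = train_bigram_lm(corpus)
    print("Perplexity on the training corpus:", perplexity(corpus, unigrams, bigrams))
```

Training a small feed-forward LM on the same tokenized corpus and comparing held-out perplexity is the natural follow-up.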

---

## Week 4: Sequence Modeling (RNNs, LSTMs, GRUs)

**Core Topics**
- Recurrent Neural Networks (RNNs)
- Vanishing/exploding gradients
- LSTM and GRU architectures

**Suggested References**
- **CMU 11-711, Lecture 4**
  - Elman (1990) – “Finding Structure in Time”
  - Hochreiter & Schmidhuber (1997) – LSTM
- **UMass CS 685, Week 3**
  - Pascanu et al. (2013) – vanishing gradients in RNNs

**Practical Exercise**
- Build an **LSTM language model**. Compare its performance to a feed-forward LM on a text corpus.

---

## Week 5: Transformers & Attention Mechanisms

**Core Topics**
- Attention (Bahdanau, Luong, Vaswani)
- Self-attention, multi-head attention, positional encodings
- Encoder–decoder vs. decoder-only Transformers

**Suggested References**
- **CMU 11-711, Lecture 5**
  - Bahdanau et al. (2015) – alignment-based attention
  - Vaswani et al. (2017) – “Attention Is All You Need”
- **UMass CS 685, Weeks 3–4**
  - Jay Alammar’s “The Illustrated Transformer” blog post

**Practical Exercise**
- Implement a **Transformer encoder** for text classification. Compare speed/accuracy to an LSTM approach.

---

## Week 6: Text Generation Algorithms & Prompting Basics

**Core Topics**
- Decoding: greedy, beam, top-k, nucleus sampling
- Intro to prompting for generation
- (Optional) Minimum Bayes Risk decoding

**Suggested References**
- **CMU 11-711, Lecture 6**
  - Holtzman et al. (2020) – nucleus sampling
  - Kool et al. (2019) – stochastic beam search
- **UMass CS 685, Week 7**
  - Krishna et al. (2022) – RankGen

**Practical Exercise**
- Implement **nucleus sampling** in a Transformer LM. Experiment with different prompts to observe changes in output.

---

## Week 7: Instruction Tuning & Efficient Fine-Tuning Methods

**Core Topics**
- Few-shot prompting vs. full fine-tuning
- Parameter-efficient tuning (LoRA, adapters, prompt tuning)
- Models like T5, BERT, FLAN

**Suggested References**
- **CMU 11-711, Lectures 7–8**
  - Brown et al. (2020) – GPT-3 & in-context learning
  - Wei et al. (2021, 2022) – FLAN / instruction tuning
- **UMass CS 685, Week 5**
  - Sennrich et al. (2016) – subword units
  - Hu et al. (2021) – LoRA
  - Lester et al. (2021) – prompt tuning

**Practical Exercise**
- Fine-tune a **T5** or **BERT** model with **LoRA** for a QA task. Compare with standard fine-tuning.

---

## Week 8: Experimental Design & Human Annotation

**Core Topics**
- Designing NLP experiments
- Human annotation best practices
- Data collection & inter-annotator agreement

**Suggested References**
- **CMU 11-711, Lecture 9**
  - Bender & Friedman (2018) – data statements for NLP
  - Lones (2021) – “How to avoid machine learning pitfalls”

**Practical Exercise**
- Collect a **sentiment dataset**, label it with 2 or more annotators, and compute **Cohen’s Kappa** or **Krippendorff’s Alpha** (see the sketch below).
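
As a concrete starting point for the Week 8 exercise, here is a minimal sketch of Cohen’s Kappa for two annotators, in plain Python. The toy label lists are made up for illustration; in practice you might cross-check your implementation against `sklearn.metrics.cohen_kappa_score`.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b) and labels_a, "need two equal-length, non-empty label lists"
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - chance) / (1 - chance)

if __name__ == "__main__":
    # Hypothetical sentiment labels from two annotators over the same 8 items.
    annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
    annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]
    print(f"Cohen's Kappa: {cohens_kappa(annotator_1, annotator_2):.3f}")
```

Krippendorff’s Alpha handles more than two annotators and missing labels, so it is the better choice once your annotation setup grows beyond this simple case.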

---

## Week 9: Retrieval & Retrieval-Augmented Generation (RAG)

**Core Topics**
- Information retrieval (BM25, DPR)
- Retrieval-augmented LMs (REALM, RAG)
- Dense vs. sparse retrieval, long context

**Suggested References**
- **CMU 11-711, Lecture 10**
  - Chen et al. (2017) – DrQA
  - Karpukhin et al. (2020) – Dense Passage Retrieval
  - Lewis et al. (2020) – RAG
- **UMass CS 685, Week 8**
  - Guu et al. (2020) – REALM
  - Schick et al. (2023) – Toolformer

**Practical Exercise**
- Implement a **retrieval-augmented QA** system. Compare **BM25** vs. **DPR** for document retrieval.

---

## Week 10: Distillation, Quantization & RL from Human Feedback

**Core Topics**
- Model compression (pruning, quantization, distillation)
- RL from human feedback (RLHF, RLAIF, DPO)
- Impact on performance, inference cost

**Suggested References**
- **CMU 11-711, Lecture 11**
  - Sanh et al. (2019) – DistilBERT
  - Dettmers et al. (2023) – QLoRA
  - Frankle & Carbin (2019) – Lottery Ticket Hypothesis
- **UMass CS 685, Week 6**
  - Ouyang et al. (2022) – RLHF
  - Lee et al. (2023) – RLAIF
  - Rafailov et al. (2023) – Direct Preference Optimization

**Practical Exercise**
- Distill a **larger Transformer** into a smaller one, or set up a **mini RLHF** pipeline with a preference dataset.

---

## Week 11: Debugging & Interpretation

**Core Topics**
- Model debugging strategies
- Probing classifiers (edge probing)
- Mechanistic interpretability (circuits, induction heads)
- Model editing (ROME)

**Suggested References**
- **CMU 11-711, Lecture 12**
  - Tenney et al. (2019) – edge probing
  - Elhage et al. (2021) – Transformer circuits
  - Meng et al. (2022) – ROME
- **UMass CS 685, Week 10**
  - Olsson et al. (2022) – induction heads
  - Hernandez et al. (2023) – knowledge representations

**Practical Exercise**
- Perform **edge probing** on a BERT model to analyze linguistic features.
- Use **ROME** to edit a factual statement in GPT-style LMs.

---

## Week 12: Advanced LLMs, Agents & Long Contexts

**Core Topics**
- Modern LLMs (LLaMA, GPT-4, Claude, Mistral)
- Long-context solutions (Transformer-XL, RoPE, FlashAttention)
- Language agents & tool use (Toolformer, ReAct)

**Suggested References**
- **CMU 11-711, Lectures 15–16**
  - Touvron et al. (2023) – LLaMA
  - Yao et al. (2023) – ReAct
  - Schick et al. (2023) – Toolformer
- **UMass CS 685**
  - Su et al. (2021) – RoPE
  - Dao et al. (2022) – FlashAttention

**Practical Exercise**
- Experiment with a **decoder-only LLM** (e.g., LLaMA) on a longer context.
- Integrate “tool use” (e.g., a calculator or database lookup), as in the sketch below.
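
For the tool-use part of the Week 12 exercise, here is a minimal sketch of one simple pattern: prompt the model to emit `CALC[...]` markers wherever arithmetic is needed, then evaluate those markers and substitute the results into the draft answer. The marker syntax, the helper names, and the canned `generate` stub are all illustrative; they stand in for a real decoder-only LLM call (e.g., via Hugging Face `transformers`) and are not taken from Toolformer or ReAct.

```python
import re

CALC_PATTERN = re.compile(r"CALC\[([0-9+\-*/. ()]+)\]")

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned draft so the example is runnable."""
    return "The warehouse holds CALC[37 * 12] boxes in total."

def run_calculator_tool(text: str) -> str:
    """Replace each CALC[...] marker with the evaluated arithmetic result."""
    def _evaluate(match):
        # The regex restricts eval() to digits and arithmetic symbols;
        # a real agent should use a proper expression parser instead.
        return str(eval(match.group(1)))
    return CALC_PATTERN.sub(_evaluate, text)

def answer_with_tools(question: str) -> str:
    prompt = (
        "You may write CALC[expression] whenever arithmetic is needed.\n"
        f"Question: {question}\nAnswer: "
    )
    draft = generate(prompt)           # model drafts an answer containing tool calls
    return run_calculator_tool(draft)  # tool results are substituted into the draft

if __name__ == "__main__":
    print(answer_with_tools("How many boxes are in 37 crates of 12?"))
```

A fuller agent loop would feed the tool output back into the model and let it continue generating, which is closer to how ReAct-style agents interleave reasoning and tool calls.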

---

## Week 13: Complex Reasoning & Linguistics

**Core Topics**
- Chain-of-thought prompting
- Abductive reasoning, logic-based inference
- Linguistic structure in neural models
- Compositional generalization (COGS, SCAN)

**Suggested References**
- **CMU 11-711, Lectures 21–22**
  - Wei et al. (2022) – chain-of-thought prompting
  - Kojima et al. (2022) – zero-shot chain-of-thought (“Let’s think step by step”)
  - Harris (1954) – distributional structure
  - Kim & Linzen (2020) – COGS
- **UMass CS 685**
  - Various alignment & reasoning references

**Practical Exercise**
- Try **chain-of-thought** prompts on multi-step reasoning tasks.
- Evaluate compositional generalization on a synthetic dataset (COGS).

---

## Week 14: Multilingual NLP & Wrap-Up

**Core Topics**
- Multilingual embeddings (mBERT, XLM-R)
- Zero-shot/few-shot cross-lingual transfer
- Summarizing your entire NLP pipeline

**Suggested References**
- **CMU 11-711, Lecture 23**
  - Johnson et al. (2016) – Google’s multilingual NMT
  - Wu & Dredze (2019) – “Beto, Bentz, Becas”
  - NLLB Team (2022) – “No Language Left Behind”
- **UMass CS 685**
  - Apply earlier methods in a multilingual setting

**Practical Exercise**
- Fine-tune an **mBERT** model on a classification task in one language, then test it zero-shot in another (a sketch appears at the end of this README).

---

## Additional Tips & Final Notes

1. **Time Commitment**
   - Each “week” can be **1–2 weeks** of part-time study. Expect **3–5 months** for thorough coverage.

2. **Project Building**
   - Combine modules into a **final project**, e.g., a retrieval-augmented, instruction-tuned mini-LLM tested on compositional tasks.

3. **Tooling**
   - Frameworks: **PyTorch** or **TensorFlow**
   - Retrieval libraries: **FAISS**, **Chroma**, **Lucene**
   - Model hub: **Hugging Face Transformers**

4. **Community & Discussion**
   - Check out **NLP Slack/Discord** groups and **Reddit r/MachineLearning**.
   - Present mini-projects for feedback.

5. **Math Foundations**
   - Reinforce **backprop, linear algebra, and probability** as needed, especially for RNNs and Transformers.

6. **Flexibility**
   - Reorder or skip modules to suit your interests (e.g., focus on generation, interpretation, or multilingual NLP).

---

**Good luck with your NLP self-study!** By working through each week’s topics, reading the key papers, watching the relevant lectures, and coding the exercises, you’ll build a robust understanding of modern NLP and large language models.
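
As a closing illustration of the Week 14 exercise, here is a minimal zero-shot cross-lingual inference sketch. It assumes the Hugging Face `transformers` library and a multilingual checkpoint that you have already fine-tuned for sentiment classification; the `your-finetuned-mbert-sentiment` path, the label order, and the example sentences are placeholders rather than artifacts from either course.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder path: a bert-base-multilingual-cased checkpoint fine-tuned on
# English sentiment data (e.g., with the standard Trainer API).
MODEL_DIR = "your-finetuned-mbert-sentiment"
LABELS = ["negative", "positive"]  # assumed to match the label order used in fine-tuning

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def classify(text: str) -> str:
    """Classify a single sentence with the fine-tuned multilingual model."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Fine-tuned on English only, then evaluated zero-shot on other languages.
for sentence in [
    "This movie was absolutely wonderful.",      # English (seen during fine-tuning)
    "Der Film war leider ziemlich langweilig.",  # German (zero-shot)
    "Me encantó esta película.",                 # Spanish (zero-shot)
]:
    print(sentence, "->", classify(sentence))
```

Measuring accuracy on a labeled test set in the unseen language, rather than eyeballing a few sentences, is the meaningful way to quantify the cross-lingual transfer gap.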