├── _config.yml
├── index.html
└── README.md

/_config.yml:
--------------------------------------------------------------------------------
theme: jekyll-theme-minimal

--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
Activity 1: Basic HTML Bio

Your Name


Write a short paragraph about yourself or some placeholder text.


A second short paragraph about yourself or some placeholder text.

| Books | Movies | Games |
| --- | --- | --- |
| The Hobbit | Hot Fuzz | Dark Souls |
| The Name of the Wind | The Avengers | The Last of Us |
| The Girl With All the Gifts | The Matrix | Dragon Age: Origins |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# machine-learning-for-nlp-guide
Guide for engineers interested in NLP machine learning

## Path
1. Understand possibilities and form business applications
    1. Everyone [AI for Everyone](https://www.coursera.org/learn/ai-for-everyone)

1. Either level up through:
    1. __Gaining theoretical foundation of Deep Learning for NLP__
        1. Stanford Course Materials http://web.stanford.edu/class/cs224n/
        1. Natural Language Processing with Deep Learning https://www.youtube.com/watch?v=8rXD5-xhemo&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z
        1. Stanford CS224U: Natural Language Understanding https://www.youtube.com/watch?v=tZ_Jrc_nRJY&list=PLoROMvodv4rObpMCir6rNNUlFAn56Js20
    1. __Getting "Practical" Knowledge of Deep Learning for NLP__
        1. [3Blue1Brown Neural Networks](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)
        1. [Rasa Whiteboard YouTube](https://www.youtube.com/watch?v=mWvnlVw_LiY&list=PL75e0qA87dlG-za8eLI6t0_Pbxafk-cxb&index=5)
        1. [Rasa Whiteboard GitHub](https://www.youtube.com/redirect?redir_token=JoSXMpXu79Zsu0ao_9CQMdS4Jr18MTU4OTEzMjUzOEAxNTg5MDQ2MTM4&q=https%3A%2F%2Fgithub.com%2FRasaHQ%2Falgorithm-whiteboard-resources&v=mWvnlVw_LiY&event=video_description)

1. Learn how to do Deep Learning
    1. [Nuts and Bolts of Applying Deep Learning](https://www.youtube.com/watch?v=F1ka6a13S9I)
    1. "Everyday" Engineers [Fast.ai](https://www.fast.ai/)
    1. Research Engineers [Deep Learning AI](https://www.deeplearning.ai/deep-learning-specialization/)

1. Learn about all the stuff "they don't teach"
    1. Learn Production-Level Deep Learning: https://fullstackdeeplearning.com/
        1. Resources: https://github.com/full-stack-deep-learning/fsdl-text-recognizer-project
    1. Base Models to Use
        1. [spaCy](https://spacy.io/) for general NLP tasks
        1. [HuggingFace Transformers](https://github.com/huggingface/transformers)

1. Profit

## State of the Art Methods
* [NLP Progress](https://github.com/sebastianruder/NLP-progress)
* [GLUE](https://gluebenchmark.com/leaderboard)
* [Papers with Code](https://paperswithcode.com/sota)

## Resources
* Syntactic Search over Wikipedia: https://spike.wikipedia.apps.allenai.org/search/wikipedia
* Odinson: Rapidly query a natural language knowledge base https://github.com/lum-ai/odinson
* CheckList: Behavioral Testing NLP https://github.com/marcotcr/checklist
* Data project checklist https://www.fast.ai/2020/01/07/data-questionnaire
* BERT, ELMo, & GPT-2: How Contextual are Contextualized Word Representations? http://ai.stanford.edu/blog/contextual/
* BERT commit log https://amitness.com/2020/05/git-log-of-bert/
* Full Stack Deep Learning GitHub repo: https://github.com/full-stack-deep-learning/fsdl-text-recognizer-project
* Expand Labeled Data Using Unlabeled Data
    * Blog: https://ai.googleblog.com/2019/03/harnessing-organizational-knowledge-for.html
    * Detailed Article: https://towardsdatascience.com/a-look-into-snorkel-drybell-8e9e781dc250
* Explain Predictions
    * Python Library: https://github.com/jphall663/awesome-machine-learning-interpretability
* Deploy models to production
    * Tutorial: https://hackernoon.com/enterprise-af-solution-for-text-classification-using-bert-9fe2b7234c46
* Learn how to implement new models
    * Deep Learning from the Foundations: https://www.fast.ai/2019/06/28/course-p2v3/
* More Learning Resources:
    * [The Best Artificial Intelligence, Machine Learning and Data Science Resources*](https://www.notion.so/b3b97fa097b747698e87fd3badc657cf)
    * [nlp-library curated list of papers](https://github.com/mihail911/nlp-library)
* Machine Learning System Best Practice and Design:
    * The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction: https://ai.google/research/pubs/pub46555
    * Machine Learning: The High Interest Credit Card of Technical Debt: https://ai.google/research/pubs/pub43146
* [An Interactive Visualization to Explore NLP Papers](https://saifmohammad.com/WebPages/nlpscholar-demo-basic.html)
* [How Big Should My Language Model Be?](https://huggingface.co/calculator/)
* [Accelerate your NLP pipelines using Hugging Face Transformers and ONNX Runtime](https://opendatascience.com/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime/)


## Tools
* https://prodi.gy/buy
    * Text and image annotation
* https://github.com/chakki-works/doccano
    * Open source text annotation tool
* https://www.media.mit.edu/projects/dive/overview/
    * DIVE is a web-based data exploration system that lets non-technical users create stories from their data without writing code. DIVE combines semantic data ingestion, recommendation-based visualization and analysis, and dynamic story sharing into a unified workflow.


## Infrastructure
* Seldon
    * https://www.youtube.com/watch?time_continue=2&v=cDtzu4WBzWA
    * https://github.com/kubeflow/example-seldon
    * https://docs.seldon.io/projects/seldon-core/en/latest/examples/nvidia_mnist.html
* Kubeflow
    * https://www.kubeflow.org/docs/started/getting-started/
* TFX
    * https://www.tensorflow.org/tfx
* ![Compare TFX and Kubeflow](https://imgur.com/IuH3T04.png)

## Research Interest
* Text Atlas
    * Feature Visualization https://distill.pub/2017/feature-visualization/
    * Activation Atlas https://distill.pub/2019/activation-atlas/

## Newsletters to Follow
* NLP News http://newsletter.ruder.io
* The Batch https://www.deeplearning.ai/thebatch/

## Podcasts to Listen To
* NLP Highlights https://soundcloud.com/nlp-highlights

## Blogs to Follow
* Google Data Analytics https://cloud.google.com/blog/products/data-analytics/
* AWS Big Data Blog https://aws.amazon.com/blogs/big-data/
* fast.ai http://www.fast.ai/
* FastML http://fastml.com/
* The Unofficial Google Data Science Blog http://www.unofficialgoogledatascience.com/
* DeepMind https://deepmind.com/blog/
* The Official Google Blog https://www.blog.google/
* Distill https://distill.pub
* DataCamp Community https://www.datacamp.com/community
* AI Applications https://vaultanalytics.com/marketinganalytics
* Google AI Blog http://ai.googleblog.com/
* Google Developers Blog http://developers.googleblog.com/
* the morning paper https://blog.acolyer.org
* Machine Learning @ Berkeley https://medium.com/@ml.at.berkeley?source=rss-a34a9c1d8009------2
* All - naacl.org http://naacl-org.github.com
* Facebook Research https://research.fb.com
* OpenAI https://blog.openai.com
* Y Combinator http://www.ycombinator.com
* The Berkeley Artificial Intelligence Research Blog http://bair.berkeley.edu/blog/
* No Free Hunch http://blog.kaggle.com
* Off the convex path http://offconvex.github.io/

## Datasets
* A unified platform for sharing, training and evaluating dialogue models across many tasks. https://parl.ai/

You can also follow me on Twitter: https://twitter.com/LeoApolonio

--------------------------------------------------------------------------------