├── .gitignore ├── README.md ├── hw1 ├── README.md └── rosenbrock.py ├── hw2 ├── README.md ├── classifier.py └── run_classifier.sh ├── hw3 └── README.md ├── hw4 ├── README.md └── resnet.py ├── hw5 ├── README.md ├── cmc1.png ├── cmc2.png ├── mike1.png ├── mike2.png ├── mike3.png ├── mike4.png └── mike5.png ├── hw6 ├── README.md ├── img │ └── xkcd-training.png ├── names.py ├── names.tar ├── names.tar.gz ├── names.zip └── names │ ├── Arabic.txt │ ├── Chinese.txt │ ├── Czech.txt │ ├── Dutch.txt │ ├── English.txt │ ├── French.txt │ ├── German.txt │ ├── Greek.txt │ ├── Irish.txt │ ├── Italian.txt │ ├── Japanese.txt │ ├── Korean.txt │ ├── Polish.txt │ ├── Portuguese.txt │ ├── Russian.txt │ ├── Scottish.txt │ ├── Spanish.txt │ └── Vietnamese.txt ├── hw7 ├── README.md └── names.py ├── img └── layers.png ├── lecture_notes ├── ad.py ├── ad2.py └── einsum.py └── project ├── README.md ├── img ├── line0000.char.png ├── line0000.word.png ├── line0001.char.png ├── line0001.word.png ├── line0002.char.png ├── line0002.word.png ├── line0003.char.png └── line0003.word.png ├── names.py ├── names_embedding.py ├── names_transformers.py └── transformers_tutorial.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.swo 3 | __pycache__ 4 | notes 5 | data 6 | names 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CSCI181: Deep Learning 2 | 3 | ## About the Instructor 4 | 5 | ||| 6 | |-|-| 7 | | Name | Mike Izbicki (call me Mike) | 8 | | Email | mizbicki@cmc.edu | 9 | | Office | Adams 216 | 10 | | Office Hours | Monday 9:00-10:00AM, Tuesday/Thursday 2:30-3:30PM, or by appointment ([see my schedule](https://izbicki.me/schedule.html));
if my door is open, feel free to come in | 11 | | Webpage | [izbicki.me](https://izbicki.me) | 12 | | Research | Machine Learning (see [izbicki.me/research.html](https://izbicki.me/research.html) for some past projects) | 13 | | Fun Facts | grew up in San Clemente, 7 years in the navy, phd/postdoc at UC Riverside, taught in [DPRK](https://pust.co) | 14 | 15 | ## About the Course 16 | 17 | This is a course on **deep learning** (not big data). 18 | 19 |

20 | 21 |

22 | 23 | **Course Objectives:** 24 | 25 | Learning objectives: 26 | 27 | 1. Write basic PyTorch applications 28 | 1. Understand the "classic" deep network architectures 29 | 1. Use existing models in a "reasonable" way 30 | 1. Understand the limitations of deep learning 31 | 1. Read research papers published in deep learning 32 | 1. Understand what graduate school in machine learning is like 33 | 1. (Joke) [Understand that Schmidhuber invented machine learning](https://www.reddit.com/r/MachineLearning/comments/eivtmq/d_nominate_jurgen_schmidhuber_for_the_2020_turing/) 34 | 35 | My personal goal: 36 | 37 | 1. Find students to conduct research with me 38 | 39 | **Expected Background:** 40 | 41 | Necessary: 42 | 43 | 1. Basic Python programming 44 | 1. Linear algebra 45 | 1. Calc III 46 | 1. Statistics 47 | 48 | Good to have: 49 | 50 | 1. Machine learning / data mining 51 | 1. Lots of math 52 | 1. Familiarity with Unix and github 53 | 54 | **Resources:** 55 | 56 | Textbook: 57 | 58 | 1. [The Deep Learning Book](http://www.deeplearningbook.org/), by Ian Goodfellow and Yoshua Bengio and Aaron Courville; I will assume that you already know all of Part I of this book (basically the equivalent of a data mining/machine learning course) 59 | 1. Various papers/webpages as listed below 60 | 61 | Deep learning examples: 62 | 63 | 1. Images / Video 64 | 1. [Deoldify](https://github.com/jantic/DeOldify) 65 | 1. [style transfer](https://genekogan.com/works/style-transfer/) 66 | 1. [more style transfer](https://github.com/lengstrom/fast-style-transfer) 67 | 1. [dance coreography](https://experiments.withgoogle.com/billtjonesai) 68 | 1. [StyleGAN](https://github.com/NVlabs/stylegan) 69 | 1. [DeepPrivacy](https://github.com/hukkelas/DeepPrivacy) 70 | 1. https://thispersondoesnotexist.com/ 71 | 1. https://thiscatdoesnotexist.com/ 72 | 1. [Deep fakes](https://www.creativebloq.com/features/deepfake-examples) 73 | 1. 
[In Event of Moon Disaster](https://www.wbur.org/news/2019/11/22/mit-nixon-deep-fake) 74 | 75 | 1. Text 76 | 1. [Image captioning](https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/) 77 | 1. [AI Dungeon](https://www.aidungeon.io/) 78 | 1. https://www.thisstorydoesnotexist.com/ 79 | 1. https://translate.google.com 80 | 81 | 1. Games 82 | 1. [AlphaGo](https://deepmind.com/research/case-studies/alphago-the-story-so-far) 83 | 1. [Dota 2](https://openai.com/projects/five/) 84 | 1. [StarCraft 2](https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning) 85 | 1. [MarioCart](https://www.youtube.com/watch?v=Ipi40cb_RsI) 86 | 1. [Mario](https://www.youtube.com/watch?v=qv6UVOQ0F44) 87 | 88 | 1. Other 89 | 1. [iSketchNFill](https://github.com/arnabgho/iSketchNFill) 90 | 1. [scrying-pen](https://experiments.withgoogle.com/scrying-pen) 91 | 1. [Tacotron](https://google.github.io/tacotron/publications/speaker_adaptation/) 92 | 93 | The good: 94 | 95 | 1. [most influential research in 2019 is deep learning papers](https://www.altmetric.com/top100/2019/) 96 | 1. [/r/machinelearning](https://reddit.com/r/machinelearning) 97 | 1. [recent open source AI programs](https://www.reddit.com/r/MachineLearning/comments/egyp7w/d_what_is_your_favorite_opensource_project_of/) 98 | 1. [The state of jobs in deep learning](https://www.reddit.com/r/MachineLearning/comments/egt6dp/d_are_decent_machine_learning_graduates_having_a/) 99 | 1. [The decade in review](https://leogao.dev/2019/12/31/The-Decade-of-Deep-Learning/) 100 | 101 | The bad: 102 | 103 | 1. [Machine learning reproducibility crisis](https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/) 104 | 1. [Logistic regression vs deep learning in aftershock prediction](https://www.reddit.com/r/MachineLearning/comments/dcy2ar/r_one_neuron_versus_deep_learning_in_aftershock/) 105 | 1. 
[Pictures of black people](https://news.ycombinator.com/item?id=21147916) 106 | 1. [NLP Clever Hans BERT](https://thegradient.pub/nlps-clever-hans-moment-has-arrived/) 107 | 1. [Ex-Baidu researcher denies cheating at machine learning competition](https://www.enterpriseai.news/2015/06/12/baidu-fires-deep-images-ren-wu/) 108 | 109 | Computing resources: 110 | 111 | 1. [Google Colab](https://colab.research.google.com/notebooks/welcome.ipynb) provides 12 hours of free GPUs in a Jupyter notebook 112 | 1. [Kaggle](https://forums.fast.ai/t/kaggle-kernels-now-support-gpu-for-free/16217) provides 30 hours of free GPU 113 | 1. I have a 40CPU/8GPU machine that you can access for the course 114 | 1. I have another 4CPU/1GPU machine that needs someone to set it up 115 | 116 | Videos: 117 | 118 | 1. [3blue1brown](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi) 119 | 1. [2 minute papers](https://www.youtube.com/user/keeroyz) 120 | 1. [arxiv insights](https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg) 121 | 122 | 126 | 127 | ## Schedule 128 | 129 | | Week | Date | Topic | 130 | | ---- | ------------ | -------------------------------------- | 131 | | 1 | Tues, 21 Jan | Intro: Examples of Deep Learning | 132 | | 1 | Thur, 23 Jan | Automatic differentiation

[pytorch tutorial part 1](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html)
[pytorch tutorial part 2](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)
[automatic differentiation tutorial](https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation)
[einstein summation tutorial](https://rockt.github.io/2018/04/30/einsum)
[NeurIPS paper](http://papers.nips.cc/paper/8092-automatic-differentiation-in-ml-where-we-are-and-where-we-should-be-going)
[JMLR paper](http://www.jmlr.org/papers/v18/17-468.html)
[pytorch: forward mode ad](https://github.com/pytorch/pytorch/issues/10223)
[tensorflow: forward mode ad](https://github.com/pytorch/pytorch/issues/10223)

| 133 | | 2 | Tues, 28 Jan | Machine Learning Basics (Deep Learning Book Part 1, especially chapters 5.2-5.4) | 134 | | 2 | Thur, 30 Jan | Optimization

[why momentum really works](https://distill.pub/2017/momentum/)
[Leon Bottou's SGD paper](https://datajobs.com/data-science-repo/Stochastic-Gradient-Descent-[Leon-Bottou].pdf)
[pytorch loss functions](https://pytorch.org/docs/stable/nn.html#crossentropyloss)
[reflections on random kitchen sinks](http://www.argmin.net/2017/12/05/kitchen-sinks/)
[Ali Rahimi's NIPS/NeurIPS 2017 keynote](https://www.youtube.com/watch?v=Qi1Yry33TQE)
[OpenAI switches to PyTorch](https://openai.com/blog/openai-pytorch/)

| 135 | | 3 | Tues, 04 Feb | Image: CNNs

[Stanford lecture slides](http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture05.pdf)

| 136 | | 3 | Thur, 06 Feb | Image: CNNs II

[An intuitive explanation of CNNs](https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/) (compare with [eigenfaces](https://towardsdatascience.com/eigenfaces-recovering-humans-from-ghosts-17606c328184))
[The history of neural networks](https://dataconomy.com/2017/04/history-neural-networks/)

[Summer Research](https://www.cmc.edu/summer-research/program-overview) | 137 | | 4 | Tues, 11 Feb | Regularization | 138 | | 4 | Thur, 13 Feb | Image: ResNet

[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
[An Overview of ResNet and its Variants](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035)
[CVPR2016 Video](https://www.youtube.com/watch?v=C6tLw-rPQ2o)

More links:

[Schmidhuber on ResNet I](http://people.idsia.ch/~juergen/microsoft-wins-imagenet-through-feedforward-LSTM-without-gates.html)
[Schmidhuber on ResNet II](http://people.idsia.ch/~juergen/highway-networks.html)
[Baidu scandal at ILSVRC15](https://web.archive.org/web/20150602165531/http://www.image-net.org/challenges/LSVRC/announcement-June-2-2015)
[What I learned from competing against a convnet on ImageNet](http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/)
[MSCOCO](http://cocodataset.org)

| 139 | | 5 | Tues, 18 Feb | ResNet continued | 140 | | 5 | Thur, 20 Feb | ResNet continued

[DenseNet](https://arxiv.org/abs/1608.06993)
Visualizing the Loss Landscape of Neural Nets ([OpenReview](https://openreview.net/forum?id=HkmaTz-0W), [NIPS/NeurIPS](http://papers.nips.cc/paper/7875-visualizing-the-loss-landscape-of-neural-nets))

| 141 | | 6 | Tues, 25 Feb | YOLO

[YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767)
[YOLO9000: Better, Faster, Stronger](https://arxiv.org/abs/1612.08242) and [reviews](https://pjreddie.com/publications/yolo9000/)
[You Only Look Once: Unified Real-Time Object Detection](http://arxiv.org/abs/1506.02640) and [reviews](https://pjreddie.com/publications/yolo/)
[YOLO video example](https://www.youtube.com/watch?v=MPU2HistivI)
[YOLO video presentation](https://www.youtube.com/watch?v=NM6lrxy0bxs&feature=youtu.be)
[Joseph Redmon's CV](https://pjreddie.com/static/Redmon%20Resume.pdf)
[Ethical concerns and YOLO](https://medium.com/syncedreview/yolo-creator-says-he-stopped-cv-research-due-to-ethical-concerns-b55a291ebb29)

The MvMF loss for geolocation:

[ECML-PKDD paper](https://izbicki.me/public/papers/ecmlpkdd2019-image-geolocation.pdf)

| 142 | | 6 | Thur, 27 Feb | Text: Basic text models

bag of words
[tf-idf](http://www.tfidf.com/)
[n-grams](https://en.wikipedia.org/wiki/N-gram)
[zipf's law](https://en.wikipedia.org/wiki/Zipf%27s_law)
[hashing trick](https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087)

Python text processing libraries:

[spacy](https://spacy.io/)
[neuralcoref](https://github.com/huggingface/neuralcoref)
[NLTK](https://www.nltk.org/)
[TextBlob](https://textblob.readthedocs.io/en/dev/)
[textstat](https://pypi.org/project/textstat/)

| 143 | | 7 | Tues, 03 Mar | Text: CNNs

[character-level convolutional networks for text classification](http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classifica)
[very deep convolutional networks for text classification](https://arxiv.org/abs/1606.01781) (uses resnets internally)

Text: RNNs

[RNN vs GRU vs LSTM](https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57)

| 144 | | 7 | Thur, 05 Mar | Text: Lab exercise | 145 | | 8 | Tues, 10 Mar | Text: Seq2seq

[the unreasonable effectiveness of RNNs](https://karpathy.github.io/2015/05/21/rnn-effectiveness/)
[what is temperature?](https://cs.stackexchange.com/questions/79241/what-is-temperature-in-lstm-and-neural-networks-generally)
[sampling strategies in pictures](https://medium.com/machine-learning-at-petiteprogrammer/sampling-strategies-for-recurrent-neural-networks-9aea02a6616f)
[automatic image captioning](https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/)

| 146 | | 8 | Thur, 12 Mar | Text: Attention | 147 | | 9 | Tues, 17 Mar | **NO CLASS:** Spring Break | 148 | | 9 | Thur, 19 Mar | **NO CLASS:** Spring Break | 149 | | 10 | Tues, 24 Mar | Text: Transformers (paper, [blog post](http://jalammar.github.io/illustrated-transformer/)) | 150 | | 10 | Thur, 26 Mar | TBD | 151 | | 11 | Tues, 31 Mar | TBD | 152 | | 11 | Thur, 02 Apr | TBD | 153 | | 12 | Tues, 07 Apr | TBD | 154 | | 12 | Thur, 09 Apr | TBD | 155 | | 13 | Tues, 14 Apr | TBD | 156 | | 13 | Thur, 16 Apr | TBD | 157 | | 14 | Tues, 21 Apr | TBD | 158 | | 14 | Thur, 23 Apr | TBD | 159 | | 15 | Tues, 28 Apr | TBD | 160 | | 15 | Thur, 30 Apr | Project Presentations | 161 | | 16 | Tues, 05 May | Project Presentations | 162 | | 16 | Thur, 07 May | **NO CLASS:** Reading Day | 163 | 164 | 167 | 168 | 172 | 173 | 174 | 175 | 176 | ### Assignments 177 | 178 | | Week | Weight | Topic | 179 | | ---- | ------ | ------------------------------- | 180 | | 2 | 10 | Rosenbrock Function | 181 | | 3 | 10 | Crossentropy Loss | 182 | | 4 | 10 | CNN | 183 | | 6 | 10 | Image Transfer Learning | 184 | | 7 | 10 | RNN | 185 | | 10 | 10 | Text Transfer Learning | 186 | | -- | 10 | Reading | 187 | | 15 | 30 | Project | 188 | 189 | There are no exams in this course. 190 | 191 | **Late Work Policy:** 192 | 193 | You lose 10% on the assignment for each day late. 194 | If you have extenuating circumstances, contact me in advance of the due date and I may extend the due date for you. 195 | 196 | **Collaboration Policy:** 197 | 198 | You are encouraged to work together with other students on all assignments and use any online resources. 199 | Learning the course material is your responsibility, 200 | and so do whatever collaboration will help you learn the material. 201 | 202 | 246 | 247 | 258 | 259 | ## Accommodations for Disabilities 260 | 261 | I want you to succeed and I'll make every effort to ensure that you can. 262 | If you need any accommodations, please ask. 
263 | 264 | If you have already established accommodations with Disability Services at CMC, please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course. You can start this conversation by forwarding me your accommodation letter. If you have not yet established accommodations through Disability Services, but have a temporary health condition or permanent disability (conditions include but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health), you are encouraged to contact Assistant Dean for Disability Services & Academic Success, Kari Rood, at disabilityservices@cmc.edu to ask questions and/or begin the process. General information and the Request for Accommodations form can be found at the CMC DOS Disability Service’s website. Please note that arrangements must be made with advance notice in order to access the reasonable accommodations. You are able to request accommodations from CMC Disability Services at any point in the semester. Be mindful that this process may take some time to complete and accommodations are not retroactive. It is important to Claremont McKenna College to create inclusive and accessible learning environments consistent with federal and state law. If you are not a CMC student, please connect with the Disability Services Coordinator on your campus regarding a similar process. 265 | 266 | -------------------------------------------------------------------------------- /hw1/README.md: -------------------------------------------------------------------------------- 1 | # Intro to pytorch 2 | 3 | **Due:** Tuesday, 28 Jan at midnight 4 | 5 | ## Tasks 6 | 7 | You are required to complete the following tasks: 8 | 9 | 1. Install pytorch 10 | 1. Modify the `rosenbrock.py` file so that it calculates the minimum of the rosenbrock function 11 | 1. 
Upload your completed code to Sakai 12 | 13 | ### Optional 14 | 15 | You are not required to complete the following tasks, 16 | however they are good exercises to get you familiar with pytorch. 17 | 18 | 1. Modify the `rosenbrock` function so that instead of taking two scalar variables as input, it takes a 2-dimensional vector as input. 19 | You will also have to update the optimization code to handle the new type of input. 20 | 21 | 1. Extend the `rosenbrock` function to arbitrary dimensions using wikipedia's high-dimensional rosenbrock function: https://en.wikipedia.org/wiki/Rosenbrock_function 22 | 23 | ## Submission 24 | 25 | Upload your `rosenbrock.py` file to sakai 26 | -------------------------------------------------------------------------------- /hw1/rosenbrock.py: -------------------------------------------------------------------------------- 1 | ''' 2 | The rosenbrock test function is a common "banana-shaped" function to test how well optimization routines work. 3 | See: https://en.wikipedia.org/wiki/Rosenbrock_function 4 | ''' 5 | import torch 6 | 7 | def rosenbrock(x,y): 8 | a = 2 9 | b = 4 10 | return (a-x)**2 + b*(y-x**2)**2 11 | 12 | def rosenbrock_mod(x): 13 | a = 2 14 | b = 4 15 | return (a-x[0])**2 + b*(x[1]-x[0]**2)**2 16 | 17 | # add your code here 18 | alpha = 0.01 19 | x = torch.tensor([0.0,0.0],requires_grad=True) 20 | for i in range(5000): 21 | print('i=',i,'x=',x) 22 | z = rosenbrock_mod(x) 23 | z.backward() 24 | x = x - alpha * x.grad 25 | x = torch.tensor(x,requires_grad=True) 26 | -------------------------------------------------------------------------------- /hw2/README.md: -------------------------------------------------------------------------------- 1 | # Intro to pytorch 2 | 3 | **Due:** Tuesday, 4 Feb at midnight 4 | 5 | ## Required tasks 6 | 7 | Modify the `classifier.py` file so that: 8 | 9 | 1. Adjust the training procedure so that the test set is evaluated at the end of every epoch. 10 | 11 | 1. 
Implement three new models: the factorized linear model, the random feature model, and the 1 hidden layer neural network model. 12 | You should add a command line argument `--model` which takes one of four options 13 | (`linear`, `factorized_linear`, `random_feature`, and `nn`) 14 | and another argument `--size` which takes an integer argument and controls the number of random features or the size of the hidden layer, depending on the model. 15 | This will require modifying the `define the model` and the `optimization` sections of the homework file. 16 | 17 | 1. Add a command line option to use the MNIST dataset instead of CIFAR10 for training and testing. 18 | (This will require changing code in both the `load dataset` and the `define the model` sections of code.) 19 | Torchvision has many other datasets as well (see https://pytorch.org/docs/stable/torchvision/datasets.html), and you can add these datasets too. 20 | 21 | You should experiment with different values of `--alpha`, `--epochs`, `--batch_size`, `--model`, and `--size` to see how they affect your training time and the resulting accuracy of your models. 22 | Try to find the combination that results in the best training accuracy. 23 | 24 | ## Recommended tasks 25 | 26 | You are not required to complete the following tasks, 27 | however they are good exercises to get you familiar with pytorch. 28 | 29 | 1. Currently, the print statement of the inner loop of the optimization prints the loss of a single batch of data. 30 | Because this is only a single batch of data, the loss value is highly noisy, and it is difficult to tell if the model is converging. 31 | The [exponential moving average](https://en.wikipedia.org/wiki/Moving_average) is a good way to smooth these values, 32 | and machine learning practitioners typically use this technique to smooth the training loss and measure convergence. 33 | Implement this technique in your `classifier.py` file. 34 | 35 | 1. Make the optimization use SGD with momentum. 
36 | Add a command line flag that controls the strength of the momentum, 37 | and experiment to find a good momentum value. 38 | (Beta = 0.9 is often used.) 39 | 40 | 1. Add a "deep" neural network as one of the possible classifiers that has more than 1 hidden layer. 41 | Make the number of layers and the size of each layer a parameter on the command line. 42 | 43 | ## Submission 44 | 45 | Upload your `classifier.py` file to sakai 46 | 47 | -------------------------------------------------------------------------------- /hw2/classifier.py: -------------------------------------------------------------------------------- 1 | #!/bin/python3 2 | ''' 4 | Here are some results of running this code: 5 | 8 | > python3 classifier.py --dataset=mnist --model=linear 9 | test set accuracy = 0.9126833333333333 10 | > python3 classifier.py --dataset=mnist --model=factorized_linear 11 | test set accuracy = 0.8846833333333334 12 | > python3 classifier.py --dataset=mnist --model=neural_network --size=256 13 | test set accuracy = 0.92685 14 | > python3 classifier.py --dataset=mnist --model=kitchen_sink --size=256 16 | test set accuracy = 0.8658333333333333 20 | ''' 21 | 22 | # process command line args 23 | import argparse 24 | parser = argparse.ArgumentParser() 26 | 27 | parser_model = parser.add_argument_group('model options') 28 | parser_model.add_argument('--model',choices=['linear','factorized_linear','kitchen_sink','neural_network'],default='linear') 29 | parser_model.add_argument('--size',type=int,default=32) 30 | 31 | parser_data = parser.add_argument_group('data options') 32 | parser_data.add_argument('--dataset',choices=['mnist','cifar10']) 33 | 34 | parser_opt = parser.add_argument_group('optimization options') 35 | parser_opt.add_argument('--seed',type=int) 36 | parser_opt.add_argument('--batch_size',type=int,default=16) 37 | parser_opt.add_argument('--alpha',type=float,default=0.01) 38 | parser_opt.add_argument('--epochs',type=int,default=10) 39 | 40 | parser_debug = parser.add_argument_group('debug options') 41 | parser_debug.add_argument('--show_image',action='store_true') 42 | parser_debug.add_argument('--print_step',type=int,default=1000) 43 | parser_debug.add_argument('--ema_alpha',type=float,default=0.99) 44 | parser_debug.add_argument('--eval_each_epoch',action='store_true') 45 | 58 | args = parser.parse_args() 59 | 60 | # imports 61 | import datetime 62 | import torch 63 | import torch.nn as nn 64 | import torchvision 65 | import torchvision.transforms as transforms 66 | 68 | # make deterministic (use the seed given on the command line) 69 | if args.seed is not None: 70 | torch.manual_seed(args.seed) 71 | torch.backends.cudnn.deterministic = True 72 | torch.backends.cudnn.benchmark = False 78 | 79 | # load dataset 80 | if args.dataset=='cifar10': 81 | transform = transforms.Compose([ 
82 | transforms.ToTensor(), 83 | transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) 84 | ], 85 | ) 86 | trainset = torchvision.datasets.CIFAR10( 87 | root = './data', 88 | train = True, 89 | download = True, 90 | transform = transform, 91 | ) 92 | testset = torchvision.datasets.CIFAR10( 93 | root = './data', 95 | train = False, 99 | download = True, 100 | transform = transform, 101 | ) 102 | else: 103 | transform = transforms.Compose([ 104 | transforms.ToTensor(), 105 | transforms.Normalize(( 0.5,), ( 0.5,)) 106 | ], 107 | ) 108 | trainset = torchvision.datasets.MNIST( 109 | root = './data', 110 | train = True, 111 | download = True, 112 | transform = transform, 113 | ) 114 | testset = torchvision.datasets.MNIST( 115 | root = './data', 117 | train = False, 121 | download = True, 122 | transform = transform, 123 | ) 124 | 125 | trainloader = torch.utils.data.DataLoader( 126 | trainset, 127 | batch_size = args.batch_size, 128 | shuffle = True, 129 | ) 130 | testloader = torch.utils.data.DataLoader( 131 | testset, 132 | batch_size = args.batch_size, 133 | shuffle = True, 134 | ) 135 | 136 | # display images 137 | if args.show_image: 138 | import matplotlib.pyplot as plt 139 | import numpy as np 140 | 141 | def imshow(img): 142 | img = img / 2 + 0.5 # unnormalize 143 | npimg = img.numpy() 144 | plt.imshow(np.transpose(npimg, (1, 2, 0))) 145 | plt.show() 146 | 147 | # get some random training images 148 | dataiter = iter(trainloader) 149 | images, labels = next(dataiter) 150 | 151 | # show images 152 | imshow(torchvision.utils.make_grid(images)) 153 | 154 | # exit 155 | import sys 156 | sys.exit(0) 157 | 158 | # define the model 159 | images, labels = next(iter(trainloader)) 160 | shape_input = images.shape[1:] 161 | shape_output = torch.Size([10]) 162 | h 
= torch.Size([args.size]) 163 | 165 | w = torch.tensor(torch.randn(shape_input+shape_output),requires_grad=True) 166 | u = torch.tensor(torch.randn(shape_input+h),requires_grad=True) 167 | v = torch.tensor(torch.randn(h+shape_output),requires_grad=True) 168 | 169 | # typically hard code the order of tensors 170 | # typically not hard code the actual values of the dimension (shape) 171 | 172 | def linear(x): 173 | #return torch.einsum('bijk,ijkl -> bl',x,w) 174 | #print('x.shape=',x.shape) # 16,1,28,28 = bijk 175 | #print('w.shape=',w.shape) # 1,28,28,10 = ijkl 176 | out = torch.einsum('bijk,ijkl -> bl',x,w) 177 | #print('out.shape=',out.shape) # 16,10 = bl 178 | return out 179 | 180 | def factorized_linear(x): 181 | return torch.einsum('bijk,ijkh,hl -> bl',x,u,v) 182 | 183 | def neural_network(x): 184 | net = torch.einsum('bijk,ijkh -> bh',x,u) 185 | net = torch.relu(net) 186 | #relu = torch.nn.ReLU() 187 | #net = relu(net) 188 | #net = torch.max(torch.zeros(net.shape),net) 189 | net = torch.einsum('bh,hl -> bl',net,v) 190 | return net 191 | 211 | kitchen_sink = neural_network 212 | 213 | f = eval(args.model) 214 | 215 | # eval on test set 216 | def eval_test_set(): 217 | correct = 0.0 218 | total = 0.0 219 | with torch.no_grad(): 220 | for data in testloader: 221 | 
images, labels = data 222 | outputs = f(images) 223 | _, predicted = torch.max(outputs.data, 1) 224 | total += labels.size(0) 225 | correct += (predicted == labels).sum().item() 226 | print('test set accuracy = ', correct/total) 227 | 228 | # optimize 229 | criterion = nn.CrossEntropyLoss() 230 | loss = float('inf') 231 | loss_ave = loss 232 | for epoch in range(args.epochs): 233 | for i, data in enumerate(trainloader, 0): 234 | if i%args.print_step==0: 235 | print( 236 | datetime.datetime.now(), 237 | 'epoch=',epoch, 238 | 'i=',i, 239 | 'loss_ave=',loss_ave 240 | ) 241 | images, labels = data 244 | outputs = f(images) 245 | loss = criterion(outputs,labels) 246 | if loss_ave == float('inf'): 247 | loss_ave = loss.item() 248 | else: 249 | loss_ave = args.ema_alpha * loss_ave + (1 - args.ema_alpha) * loss.item() 250 | loss.backward() 251 | if args.model=='linear': 252 | w = w - args.alpha * w.grad 253 | w = torch.tensor(w,requires_grad=True) 254 | else: 256 | #print('|u.grad|=',torch.norm(u.grad)) 259 | if args.model!='kitchen_sink': 260 | u = u - args.alpha * u.grad 261 | u = torch.tensor(u,requires_grad=True) 262 | v = v - args.alpha * v.grad 263 | v = torch.tensor(v,requires_grad=True) 265 | 266 | if args.eval_each_epoch: 267 | eval_test_set() 268 | 269 | if not args.eval_each_epoch: 270 | eval_test_set() 280 | -------------------------------------------------------------------------------- /hw2/run_classifier.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | mkdir -p output 4 | 5 | python3 -u classifier.py --dataset=mnist --model=linear > output/linear 6 | python3 -u classifier.py 
--dataset=mnist --model=factorized_linear > output/factorized_linear 7 | python3 -u classifier.py --dataset=mnist --model=neural_network --size=256 > output/neural_network 8 | python3 -u classifier.py --dataset=mnist --model=kitchen_sink --size=256 > output/kitchen_sink 9 | 10 | for i in 16 32 64 128 256 512 1024; do 11 | python3 -u classifier.py --dataset=mnist --model=neural_network --size=$i --epoch=25 > output/neural_network.$i 12 | done 13 | -------------------------------------------------------------------------------- /hw3/README.md: -------------------------------------------------------------------------------- 1 | # CNNs 2 | 3 | **Due:** Thursday, 13 February at midnight 4 | 5 | ## Tasks 6 | 7 | You are required to complete the following tasks: 8 | 9 | 1. Use PyTorch's neural networks tutorial [part 1](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html) and [part 2](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) to create a python script that trains a small CNN on the CIFAR10 dataset 10 | 1. Modify the code to work on the Fashion-MNIST dataset as well 11 | 1. Modify the data input pipeline to include data augmentation using the `torchsample` library. See [this github repo](https://github.com/jiangqy/Data-Augmentation-Pytorch) for example code. 12 | 13 | ### Optional 14 | 15 | You are not required to complete the following tasks, 16 | however they are good exercises to get you familiar with pytorch. 17 | 18 | 1. Structure your python file so that it uses argparse to store all hyperparameters. 19 | 20 | 1. Write a shell script that experiments with different hyperparameter combinations. 21 | 22 | 1. Make one of the optional hyperparameters the use of the Adam optimizer instead of SGD. 23 | See [this pytorch tutorial](https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_optim.html) for an example. 
24 | 25 | ## Submission 26 | 27 | Upload your python file to sakai 28 | 29 | -------------------------------------------------------------------------------- /hw4/README.md: -------------------------------------------------------------------------------- 1 | # ResNets 2 | 3 | **Due:** Tuesday, ~~25 February~~ 27 February at midnight 4 | 5 | ## Tasks 6 | 7 | You are required to complete the following tasks: 8 | 9 | 1. Extend your code from hw3 to implement a 20-layer resnet model (you do not need to use batch norm layers) 10 | 11 | ### Optional 12 | 13 | You are not required to complete the following tasks, 14 | however they are good exercises to get you familiar with pytorch. 15 | 16 | 1. Use the batch normalization layer 17 | 1. Reproduce the main result from the resnet paper by: 18 | 1. implement the 20 layer plain network, 56 layer resnet, and 56 layer plain network 19 | 1. verify that training error for the 56 layer plain model is worse than for the 20 layer plain model 20 | 1. verify that training error for the 56 layer resnet is better than the 20 layer resnet (and the 20/56 layer plain models) 21 | 1. 
Implement the dense blocks from the ``Densely Connected Convolutional Networks'' paper 22 | 23 | ## Submission 24 | 25 | Upload your python file to sakai 26 | 27 | -------------------------------------------------------------------------------- /hw4/resnet.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | # process command line args 4 | parser = argparse.ArgumentParser() 5 | 6 | parser_model = parser.add_argument_group('model options') 7 | parser_model.add_argument('--connections',choices=['plain','resnet'],default='resnet') 8 | parser_model.add_argument('--size',type=int,default=20) 9 | 10 | parser_opt = parser.add_argument_group('optimization options') 11 | parser_opt.add_argument('--batch_size',type=int,default=16) 12 | parser_opt.add_argument('--learning_rate',type=float,default=0.01) 13 | parser_opt.add_argument('--epochs',type=int,default=10) 14 | parser_opt.add_argument('--warm_start',type=str,default=None) 15 | 16 | parser_data = parser.add_argument_group('data options') 17 | parser_data.add_argument('--dataset',choices=['mnist','cifar10']) 18 | 19 | parser_debug = parser.add_argument_group('debug options') 20 | parser_debug.add_argument('--show_image',action='store_true') 21 | parser_debug.add_argument('--print_delay',type=int,default=60) 22 | parser_debug.add_argument('--log_dir',type=str) 23 | parser_debug.add_argument('--eval',action='store_true') 24 | 25 | args = parser.parse_args() 26 | 27 | # load libraries 28 | import datetime 29 | import os 30 | import sys 31 | import time 32 | 33 | import torch 34 | import torch.nn as nn 35 | from torch.utils.tensorboard import SummaryWriter 36 | import torchvision 37 | import torchvision.transforms as transforms 38 | import matplotlib.pyplot as plt 39 | import numpy as np 40 | 41 | # load data 42 | if args.dataset=='cifar10': 43 | image_shape=[3,32,32] 44 | 45 | transform = transforms.Compose( 46 | [ transforms.RandomHorizontalFlip() 47 | , 
transforms.ToTensor() 48 | , transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) 49 | ]) 50 | 51 | trainset = torchvision.datasets.CIFAR10( 52 | root='./data', 53 | train=True, 54 | download=True, 55 | transform=transform 56 | ) 57 | trainloader = torch.utils.data.DataLoader( 58 | trainset, 59 | batch_size=args.batch_size, 60 | shuffle=True, 61 | num_workers=2 62 | ) 63 | 64 | testset = torchvision.datasets.CIFAR10( 65 | root='./data', 66 | train=False, 67 | download=True, 68 | transform=transform 69 | ) 70 | testloader = torch.utils.data.DataLoader( 71 | testset, 72 | batch_size=args.batch_size, 73 | shuffle=True, 74 | num_workers=2 75 | ) 76 | 77 | if args.dataset=='mnist': 78 | image_shape=[1,28,28] 79 | 80 | transform = transforms.Compose( 81 | [ transforms.RandomHorizontalFlip() 82 | , transforms.ToTensor() 83 | , transforms.Normalize((0.5,), (0.5,)) 84 | ]) 85 | 86 | trainset = torchvision.datasets.MNIST( 87 | root='./data', 88 | train=True, 89 | download=True, 90 | transform=transform 91 | ) 92 | trainloader = torch.utils.data.DataLoader( 93 | trainset, 94 | batch_size=args.batch_size, 95 | shuffle=True, 96 | num_workers=2 97 | ) 98 | 99 | testset = torchvision.datasets.MNIST( 100 | root='./data', 101 | train=False, 102 | download=True, 103 | transform=transform 104 | ) 105 | testloader = torch.utils.data.DataLoader( 106 | testset, 107 | batch_size=args.batch_size, 108 | shuffle=True, 109 | num_workers=2 110 | ) 111 | 112 | # show image 113 | if args.show_image: 114 | def imshow(img): 115 | img = img / 2 + 0.5 # unnormalize 116 | npimg = img.numpy() 117 | plt.imshow(np.transpose(npimg, (1, 2, 0))) 118 | plt.show() 119 | dataiter = iter(trainloader) 120 | images, labels = next(dataiter) 121 | imshow(torchvision.utils.make_grid(images)) 122 | 123 | # define the model 124 | def conv3x3(channels_in, channels_out): 125 | """3x3 convolution with padding""" 126 | return nn.Conv2d( 127 | channels_in, 128 | channels_out, 129 | kernel_size=3, 130 | stride=1, 131 | 
padding=1, 132 | groups=1, 133 | bias=False, 134 | dilation=1 135 | ) 136 | 137 | class ResnetBlock(nn.Module): 138 | def __init__( 139 | self, 140 | channels, 141 | use_bn = True, 142 | ): 143 | super(ResnetBlock, self).__init__() 144 | norm_layer = torch.nn.BatchNorm2d 145 | self.use_bn = use_bn 146 | self.conv1 = conv3x3(channels, channels) 147 | if self.use_bn: 148 | self.bn1 = norm_layer(channels) 149 | self.relu = nn.ReLU(inplace=True) 150 | self.conv2 = conv3x3(channels, channels) 151 | if self.use_bn: 152 | self.bn2 = norm_layer(channels) 153 | # the stride is fixed at 1 and there is no downsampling, 154 | # so the identity shortcut in forward() always matches the output shape 155 | 156 | def forward(self, x): 157 | identity = x 158 | out = self.conv1(x) 159 | if self.use_bn: 160 | out = self.bn1(out) 161 | out = self.relu(out) 162 | out = self.conv2(out) 163 | if self.use_bn: 164 | out = self.bn2(out) 165 | out += identity 166 | out = self.relu(out) 167 | return out 168 | 169 | import functools 170 | image_size = functools.reduce(lambda x, y: x * y, image_shape, 1) 171 | 172 | class Net(nn.Module): 173 | def __init__(self): 174 | super(Net, self).__init__() 175 | self.fc = torch.nn.Linear(image_size,10) 176 | pass 177 | 178 | def forward(self, x): 179 | out = x.view(-1,image_size) # -1 also handles a smaller final batch 180 | out = self.fc(out) 181 | return out 182 | 183 | net = Net() 184 | 185 | # load pretrained model 186 | if args.warm_start is not None: 187 | print('warm starting model from',args.warm_start) 188 | model_dict = torch.load(os.path.join(args.warm_start,'model')) 189 | net.load_state_dict(model_dict['model_state_dict']) 190 | 191 | # create save dir 192 | log_dir = args.log_dir 193 | if log_dir is None: 194 | log_dir = 'log/'+str(datetime.datetime.now()) 195 | 196 | try: 197 | os.makedirs(log_dir) # also creates the parent 'log' directory if needed 198 | except FileExistsError: 199 | print('cannot create log dir,',log_dir,'already exists') 200 | sys.exit(1) 201 | 202 | writer = SummaryWriter(log_dir=log_dir) 203 | 204 | # train the model 205 | criterion = nn.CrossEntropyLoss() 206 | 
optimizer = torch.optim.SGD(net.parameters(), lr=args.learning_rate, momentum=0.9) 207 | net.train() 208 | 209 | total_iter = 0 210 | last_print = 0 211 | 212 | steps = 0 213 | for epoch in range(args.epochs): 214 | for i, data in enumerate(trainloader): 215 | steps += 1 216 | inputs, labels = data 217 | optimizer.zero_grad() 218 | outputs = net(inputs) 219 | loss = criterion(outputs, labels) 220 | loss.backward() 221 | optimizer.step() 222 | 223 | # accuracy 224 | prediction = torch.argmax(outputs,dim=1) 225 | accuracy = (prediction==labels).float().mean() 226 | 227 | # tensorboard 228 | writer.add_scalar('train/loss', loss.item(), steps) 229 | writer.add_scalar('train/accuracy', accuracy.item(), steps) 230 | 231 | # print statistics 232 | total_iter += 1 233 | if time.time() - last_print > args.print_delay: 234 | print(datetime.datetime.now(),'epoch = ',epoch,'steps=',steps,'batch/sec=',total_iter/args.print_delay) 235 | total_iter = 0 236 | last_print = time.time() 237 | 238 | torch.save({ 239 | 'epoch':epoch, 240 | 'model_state_dict': net.state_dict(), 241 | 'optimizer_state_dict': optimizer.state_dict(), 242 | 'loss':loss 243 | }, os.path.join(log_dir,'model')) 244 | 245 | 246 | # test set 247 | if args.eval: 248 | print('evaluating model') 249 | net.eval() 250 | 251 | loss_total = 0 252 | accuracy_total = 0 253 | for i, data in enumerate(testloader): 254 | inputs, labels = data 255 | outputs = net(inputs) 256 | loss = criterion(outputs, labels) 257 | 258 | # accuracy 259 | prediction = torch.argmax(outputs,dim=1) 260 | accuracy = (prediction==labels).float().mean() 261 | 262 | # update variables 263 | loss_total += loss.item() 264 | accuracy_total += accuracy.item() 265 | 266 | print('loss=',loss_total/(i+1)) # i is zero-based, so i+1 batches were seen 267 | print('accuracy=',accuracy_total/(i+1)) 268 | 269 | -------------------------------------------------------------------------------- /hw5/README.md: -------------------------------------------------------------------------------- 1 | # YOLOv3 lol 2 | 3 | 
**Due:** Tuesday, 3 March at midnight 4 | 5 | **Learning Objective:** 6 | 7 | 1. use other people's implementations of pytorch models 8 | 1. gain familiarity with YOLO 9 | 10 | ## Tasks 11 | 12 | You are required to complete the following tasks: 13 | 14 | 1. [This github repo](https://github.com/eriklindernoren/PyTorch-YOLOv3) contains a pytorch implementation of YOLOv3. 15 | Follow the directions to apply the YOLO object detection model to several of your own images. 16 | 1. Adjust the code so that it outputs images at their original resolution by changing the call to `plt.subplots(1)` to read 17 | ``` 18 | plt.subplots(1,figsize=(img.shape[1]/96, img.shape[0]/96), dpi=96) 19 | ``` 20 | 1. Upload your favorite images to sakai. 21 | 22 | Example images I created are: 23 | 24 |

25 |

26 |

27 |

28 |

29 |

 30 | 31 | ### Optional 32 | 33 | You are not required to complete the following tasks, 34 | however they are good exercises to get you familiar with pytorch. 35 | 36 | 1. The code resizes all images before passing them to the YOLO model using the size specified by the `--img_size` parameter. 37 | Experiment with different values of this parameter to see the effects on the outputs. 38 | You should notice that smaller images result in fewer objects detected, 39 | but larger images require more computation. 40 | 1. Other parameters such as `--conf_thres` and `--nms_thres` adjust the model's results by trading off between object classification accuracy and localization accuracy. 41 | Try adjusting these parameters to get better results on your images. 42 | 43 | ## Submission 44 | 45 | Upload your images to sakai. 46 | You do not need to upload any code. 47 | 48 | -------------------------------------------------------------------------------- /hw5/cmc1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/cmc1.png -------------------------------------------------------------------------------- /hw5/cmc2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/cmc2.png -------------------------------------------------------------------------------- /hw5/mike1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike1.png -------------------------------------------------------------------------------- /hw5/mike2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike2.png -------------------------------------------------------------------------------- /hw5/mike3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike3.png -------------------------------------------------------------------------------- /hw5/mike4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike4.png -------------------------------------------------------------------------------- /hw5/mike5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike5.png -------------------------------------------------------------------------------- /hw6/README.md: -------------------------------------------------------------------------------- 1 | # Name Classifier 2 | 3 | In this assignment, you will create a model that can predict the nationality of surnames. 4 | 5 | **Due:** ~~Thursday 12 March~~ Sunday 29 March at midnight 6 | 7 | **Learning Objective:** 8 | 9 | 1. gain familiarity with character-level text models (RNN / CNN) 10 | 1. effectively use tensorboard to monitor the training of models and select hyperparameters 11 | 12 | ## Tasks 13 | 14 | Complete the following required tasks. 15 | 16 | 1. 
**Download the starter code and data:** 17 | On the command line, run: 18 | ``` 19 | $ wget https://github.com/mikeizbicki/cmc-csci181/blob/master/hw6/names.py 20 | $ wget https://github.com/mikeizbicki/cmc-csci181/blob/master/hw6/names.tar.gz 21 | $ tar -xf names.tar.gz 22 | ``` 23 | You should always manually inspect the data values before performing any coding or model training. 24 | In this case, get a list of the data files by running 25 | ``` 26 | $ ls names 27 | ``` 28 | Notice that there is one file for each of several different nationalities. 29 | Inspect the values of some of these files. 30 | ``` 31 | $ head names/English.txt 32 | $ head names/Spanish.txt 33 | $ head names/Korean.txt 34 | ``` 35 | Notice that the names have been romanized for all languages, 36 | but the names are not entirely ASCII. 37 | We will revisit this fact later. 38 | 39 | Also notice that each line contains a unique name. 40 | To get the total number of lines in each file (and therefore the total number of examples for each class), run 41 | ``` 42 | $ wc -l names/* 43 | ``` 44 | Observe based on this output that the training data is not equally balanced between classes. 45 | 46 | We will not be dividing this data up into a train/test split. 47 | In this case, that is not needed, because our data is essentially exhaustive of all possible names. 48 | (For example, the 94 Korean surnames account for >99% of all Korean last names.) 49 | Our primary goal is not to generalize to unseen names, 50 | but rather to have an efficient "compressed" representation of all names. 51 | This will let us create a function for assigning the nationality to a name without having to explicitly store and search all 20,000 names. 52 | (As a side benefit, this function will generalize to typos and other unseen data, but we're not going to explicitly evaluate its ability to do this.) 53 | 54 | Compressing a training set without a test set is actually a common setting in deep learning. 
55 | The [Hutter Prize](http://prize.hutter1.net/) will award $500,000 to the first people to efficiently compress all human knowledge (i.e. wikipedia), 56 | and Google has a [similar competition](https://www.androidpolice.com/2018/01/12/compress-google-issues-machine-learning-challenge-build-better-jpeg/) for improving jpeg image compression. 57 | Many AGI researchers believe that the problem of creating an optimal data compression scheme is isomorphic to creating artificial intelligence. 58 | 59 | 1. **Different learning rates:** 60 | At the command prompt, execute the following line: 61 | ``` 62 | $ python3 names.py --train --learning_rate=1e-1 63 | ``` 64 | In a separate command prompt, launch tensorboard with the line: 65 | ``` 66 | $ tensorboard --logdir=log 67 | ``` 68 | You should observe that the loss function is diverging. 69 | Experiment with different learning rates to find the optimal value (i.e. the largest value that causes the loss to converge to zero). 70 | 71 | **NOTE:** 72 | In order to easily interpret the tensorboard plots, you may have to increase the smoothing parameter to a value very close to 1. 73 | I used a value of 0.99. 74 | 75 | **Question 1:** 76 | What is the optimal learning rate you found? 77 | 78 | 1. **Gradient clipping:** 79 | Tensorboard is recording three values: the training accuracy, the training loss, and the norm of the gradient. 80 | Notice that as training progresses, the norm of the gradient increases. 81 | This is called the *exploding gradient problem*. 82 | 83 | The standard solution to the exploding gradient problem is *gradient clipping*. 84 | In gradient clipping, we first measure the L2 norm of the gradient; 85 | then, if it is larger than some threshold value, we shrink the gradient so that it points in the same direction but has norm equal to the threshold. 86 | 87 | To add support for gradient clipping to your code, 88 | paste the following lines just before the call to `optimizer.step`. 
89 | ``` 90 | if args.gradient_clipping: 91 | torch.nn.utils.clip_grad_norm_(model.parameters(),1.0) 92 | ``` 93 | 94 | Rerun the model training code, but now with the `--gradient_clipping` flag to enable gradient clipping. 95 | Once again, experiment to find the optimal value for the learning rate. 96 | 97 | **Question 2:** 98 | What is the new optimal learning rate you found with gradient clipping enabled? 99 | 100 | **Question 3:** 101 | Which set of hyperparameters is converging faster? 102 | 103 | At this point, hopefully this XKCD comic is starting to make sense: 104 |

105 | 106 |

 107 | 108 | 1. **Optimization method:** 109 | [Adam](https://arxiv.org/abs/1412.6980) is a popular alternative to SGD for optimizing models that was invented in 2014. 110 | The basic idea behind Adam is that not all parameters will need to take the same step size on each iteration, 111 | and so we should somehow learn the optimal step size for each parameter independently. 112 | It almost always converges much faster than SGD in practice, but sometimes has worse generalization error. 113 | Because of the fast convergence, Adam is widely used in practice. 114 | The original paper has over [38k citations on google scholar](https://scholar.google.com/scholar?cluster=16194105527543080940). 115 | (I think it's the second most cited paper of all time after the resnet paper, but I'm not 100% sure how to check this.) 116 | Unfortunately, however, [a 2018 paper](https://openreview.net/forum?id=ryQu7f-RZ) found a fatal flaw in the proof of convergence of the Adam paper, and showed that Adam is guaranteed not to converge even on some simple convex problems. 117 | Despite this flaw, Adam remains widely popular, is the optimizer of choice for thousands of pytorch users, and has thousands of citations already this year. 118 | In the deep learning world, people simply don't care about proofs yet. 119 | 120 | To add support for the Adam optimizer to the code, 121 | paste the following lines below the code for the SGD optimizer. 122 | ``` 123 | if args.optimizer == 'adam': 124 | optimizer = torch.optim.Adam( 125 | model.parameters(), 126 | lr=args.learning_rate, 127 | weight_decay=args.weight_decay 128 | ) 129 | ``` 130 | 131 | Use the `--optimizer=adam` flag to train a model using Adam instead of SGD. 132 | Like SGD, Adam takes a learning rate hyperparameter, 133 | and you should experiment with different values to find the optimal value. 134 | 135 | **Question 4:** 136 | What is the optimal learning rate for Adam? 
137 | 138 | The [`torch.optim`](https://pytorch.org/docs/stable/optim.html) module contains many other optimizers that you can use. 139 | Select one of these additional optimizers to include in your code, 140 | and make the appropriate adjustments in the arguments list and training loop. 141 | 142 | **Question 5:** 143 | Which combination of optimizer/hyperparameters is converging faster? 144 | 145 | 1. **Different types of RNNs:** 146 | There are three different types of RNNs in common use. 147 | So far, your model has been using "vanilla" RNNs, 148 | which is what we discussed in class. 149 | Two other types are *gated recurrent units* (GRUs) and *long short-term memories* (LSTMs). 150 | GRUs and LSTMs have more complicated activation functions that try to better capture long-term dependencies within the input text. 151 | Visit [this webpage](https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57) to see a picture representation of each of the RNN units. 152 | 153 | Understanding in detail the differences between the types of RNNs is not important. 154 | What is important is that the inputs and outputs of vanilla RNNs, GRUs, and LSTMs are all the same. 155 | This means that you can easily switch between the types of recurrent network by simply calling the appropriate torch library function. 156 | (The functions are `torch.nn.RNN`, `torch.nn.GRU`, and `torch.nn.LSTM`.) 157 | 158 | Adjust the `Model` class so that it uses either RNNs, GRUs, or LSTMs depending on the value of the `--model` input parameter. 159 | 160 | **Question 6:** 161 | Once again, experiment with different combinations of hyperparameters using these new layers. 162 | What gives the best results? 163 | 164 | 1. **RNN size:** 165 | There are two hyperparameters that control the "size" of an RNN. 
166 | The advantages of larger sizes are: better training accuracy and improved ability to capture long-term dependencies within text. 167 | The disadvantages are: longer training time and worse generalization error. 168 | 169 | Adjust the `Model` class so that the calls to `torch.nn.RNN`, `torch.nn.GRU`, and `torch.nn.LSTM` use the `--hidden_layer_size` and `--num_layers` command line flags to determine the size and number of the hidden layers (respectively). 170 | 171 | **Question 7:** 172 | Experiment with different model sizes to find a good balance between speed and accuracy. 173 | 174 | 1. **Change batch size:** 175 | Currently, the training procedure uses a fixed batch size with only a single training example. 176 | There are two key functions for generating batch tensors from strings. 177 | 178 | The first is `unicode_to_ascii`, which converts a Unicode string into a Latin-only alphabet representation. 179 | Run this command on the strings `Izbicki`, `Ízbìçkï` and `이즈비키` to see how they get processed and the limitations of our current system. 180 | 181 | The second is `str_to_tensor`, which converts an input string into a 3rd order tensor. 182 | Notice that the first dimension is for the length of the string, and the second dimension is for the batch size. 183 | This is a standard pytorch convention. 184 | 185 | Modify the `str_to_tensor` function so that it takes a list of b strings as input instead of only a single string. 186 | The input strings are unlikely to be all of the same length, 187 | but the output tensor must have the same length for each string. 188 | To solve this problem, the first dimension of the tensor will have the largest size of all of the input strings; 189 | then, the remaining input strings will have their slices padded with all zeros to fill the space. 
190 | To help the model understand when it reaches the end of a name, the special `$` character is used to symbolize the end of a name, and this should be inserted at the end of each string, before the zero padding. 191 | 192 | Next, you will need to modify the data sampling step to sample `args.batch_size` data points on each iteration. 193 | This is the part of the code after the comment `# get random training example`. 194 | 195 | **Question 8:** 196 | Experiment with a larger batch size. 197 | This should make training your model a bit faster because the matrix multiplications in your CPU will have better cache coherency. 198 | (A single step of batch size 10 should take only about 5x longer than a step of batch size 1.) 199 | A larger batch size will also reduce the variance of each step of SGD/Adam, and so a larger step size can be used. 200 | As a rule of thumb, increasing the batch size by a factor of `a` will let you increase the learning rate by a factor of `a` as well. 201 | 202 | With a batch size of 16, what is the new optimal learning rate? 203 | 204 | 1. **Add CNN support:** 205 | CNNs typically have a linear layer on top of them, 206 | and this linear layer requires that all inputs have a fixed length. 207 | 208 | 1. Modify the `str_to_tensor` function so that if the `--input_length` parameter is specified, 209 | then the tensor is padded/truncated so that the first dimension has size `args.input_length`. 210 | 211 | 1. Modify the `Model` class so that if the `--model=cnn` parameter is specified, 212 | then a cnn is used (the `torch.nn.Conv1d` function). 213 | Your implementation should use a width 3 filter, 214 | `--hidden_layer_size` as the number of channels, 215 | and `--num_layers` cnn layers. 216 | 217 | **Question 9:** 218 | Experiment with different hyperparameters to find the best combination for the CNN model. 219 | How does the CNN model compare to the RNN models? 220 | 221 | 1. 
**Long-run model training:** 222 | Once you have a set of model hyperparameters that you like, 223 | then increase `--samples` to 100000 to train a more accurate model. 224 | (Depending on your specific hyperparameters, you may need to use an even larger number of samples to get the model to converge.) 225 | Then, use the `--warm_start` parameter to reload this model, 226 | and train for another 100000 samples (but this time with a learning rate lowered by a factor of 10). 227 | Repeat this procedure one more time. 228 | 229 | The whole procedure should take 10-30 minutes depending on the speed of your computer and the complexity of your model. 230 | This would be a good point to have an office chair jousting duel. 231 | 

232 | 233 |

234 | (Comic modified from https://xkcd.com/303/) 235 | 236 | 1. **Inference:** 237 | You can use the `--infer` parameter combined with `--warm_start` to use the model for inference (sometimes called model *deployment*). 238 | In this mode, `names.py` passes each line in stdin to the model and outputs the class predictions. 239 | 240 | You should modify the inference code so that instead of outputting a single prediction, 241 | it outputs the model's top 3 predictions along with the probability associated with each prediction. 242 | 243 | To get more than the top 1 prediction, you will have to change how the `topk` function is called. 244 | 245 | To convert the `output` tensor into probabilities, you will have to apply the `torch.nn.Softmax` function. 246 | 247 | ## Submission 248 | 249 | Upload your code to sakai. 250 | Hand in a hard copy of your completed answers. 251 | -------------------------------------------------------------------------------- /hw6/img/xkcd-training.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw6/img/xkcd-training.png -------------------------------------------------------------------------------- /hw6/names.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # process command line args 4 | import argparse 5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@') 6 | 7 | parser_model = parser.add_argument_group('model options') 8 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm'],default='rnn') 9 | parser_model.add_argument('--hidden_layer_size',type=int,default=128) 10 | parser_model.add_argument('--num_layers',type=int,default=1) 11 | 12 | parser_opt = parser.add_argument_group('optimization options') 13 | parser_opt.add_argument('--batch_size',type=int,default=128) 14 | 
parser_opt.add_argument('--learning_rate',type=float,default=1e-1) 15 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd') 16 | parser_opt.add_argument('--gradient_clipping',action='store_true') 17 | parser_opt.add_argument('--momentum',type=float,default=0.9) 18 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4) 19 | parser_opt.add_argument('--samples',type=int,default=10000) 20 | parser_opt.add_argument('--input_length',type=int) 21 | parser_opt.add_argument('--warm_start') 22 | 23 | parser_data = parser.add_argument_group('data options') 24 | parser_data.add_argument('--data',default='names') 25 | 26 | parser_debug = parser.add_argument_group('debug options') 27 | parser_debug.add_argument('--print_delay',type=int,default=5) 28 | parser_debug.add_argument('--log_dir',type=str) 29 | parser_debug.add_argument('--save_every',type=int,default=1000) 30 | parser_debug.add_argument('--infer',action='store_true') 31 | parser_debug.add_argument('--train',action='store_true') 32 | 33 | args = parser.parse_args() 34 | 35 | # load args from file if warm starting 36 | if args.warm_start is not None: 37 | import sys 38 | import os 39 | args_orig = args 40 | args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:]) 41 | args.train = args_orig.train 42 | 43 | # load modules 44 | import datetime 45 | import glob 46 | import os 47 | import math 48 | import random 49 | import string 50 | import sys 51 | import time 52 | import unicodedata 53 | 54 | import torch 55 | import torch.nn as nn 56 | from torch.utils.tensorboard import SummaryWriter 57 | 58 | # import the training data 59 | vocabulary = string.ascii_letters + " .,;'$" 60 | 61 | def unicode_to_ascii(s): 62 | ''' 63 | Removes diacritics from unicode characters. 
64 | See: https://stackoverflow.com/a/518232/2809427 65 | ''' 66 | return ''.join( 67 | c for c in unicodedata.normalize('NFD', s) 68 | if unicodedata.category(c) != 'Mn' 69 | and c in vocabulary 70 | ) 71 | 72 | # Build the category_lines dictionary, a list of names per language 73 | category_lines = {} 74 | all_categories = [] 75 | 76 | for filename in glob.glob(os.path.join(args.data,'*.txt')): 77 | category = os.path.splitext(os.path.basename(filename))[0] 78 | all_categories.append(category) 79 | lines = open(filename, encoding='utf-8').read().strip().split('\n') 80 | lines = [unicode_to_ascii(line) for line in lines] 81 | category_lines[category] = lines 82 | 83 | n_categories = len(all_categories) 84 | 85 | def str_to_tensor(s): 86 | ''' 87 | converts aa string into a = 0.8 (only for the model without the `--conditional_model` flag) 40 | 41 | 1. Use tensorboard.dev to upload your tensorboard training runs to the cloud. 42 | You must create two separate tensorboard.dev webpages, 43 | one for each of the two sets of models trained above. 44 | Each of the tensorboard.dev pages should not have unrelated training runs appear in the plots. 45 | 46 | Here are the examples that I created in the videos: 47 | 1. unconditional model: https://tensorboard.dev/experiment/AMAd6axxQEuP2P20PIDKpQ/#scalars&_smoothingWeight=0.99 48 | 1. conditional model: https://tensorboard.dev/experiment/x99kAW5cQQ2NMgwU0lOdDQ/#scalars&_smoothingWeight=0.9 49 | 52 | 53 | ### Optional tasks 54 | 55 | 1. Modify the `CNNModel` class so that it also predicts the next character. 56 | 57 | In the video lectures, we only discussed how to modify the `RNNModel` class to predict the next character. 58 | The `CNNModel` class can also be used for predicting the next character in the same way. 59 | (And in fact, using `--model=cnn` currently results in a crash because of our changes to the training code.) 
60 | 61 | *Bonus question:* 62 | Using a `CNNModel` to predict the next character is special case of using a *markov chain* to predict the next character of text. 63 | Why? 64 | 65 | 1. In the video lectures, I mention that you can use rejection sampling to sample from the conditional distributions when the model is an unconditional model. 66 | Implement this algorithm. 67 | 68 | 1. Implement beam search in your generation code. 69 | 70 | ## Submission 71 | 72 | 1. Submit the links to your tensorboard.dev pages on sakai. 73 | -------------------------------------------------------------------------------- /hw7/names.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # process command line args 4 | import argparse 5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@') 6 | 7 | parser_control = parser.add_argument_group('control options') 8 | parser_control.add_argument('--infer',action='store_true') 9 | parser_control.add_argument('--train',action='store_true') 10 | 11 | parser_data = parser.add_argument_group('data options') 12 | parser_data.add_argument('--data',default='names') 13 | 14 | parser_model = parser.add_argument_group('model options') 15 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm'],default='rnn') 16 | parser_model.add_argument('--hidden_layer_size',type=int,default=128) 17 | parser_model.add_argument('--num_layers',type=int,default=1) 18 | 19 | parser_opt = parser.add_argument_group('optimization options') 20 | parser_opt.add_argument('--batch_size',type=int,default=1) 21 | parser_opt.add_argument('--learning_rate',type=float,default=1e-1) 22 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd') 23 | parser_opt.add_argument('--gradient_clipping',action='store_true') 24 | parser_opt.add_argument('--momentum',type=float,default=0.9) 25 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4) 26 | 
parser_opt.add_argument('--samples',type=int,default=10000) 27 | parser_opt.add_argument('--input_length',type=int,default=20) 28 | parser_opt.add_argument('--warm_start') 29 | 30 | parser_debug = parser.add_argument_group('debug options') 31 | parser_debug.add_argument('--print_delay',type=int,default=5) 32 | parser_debug.add_argument('--log_dir',type=str) 33 | parser_debug.add_argument('--save_every',type=int,default=1000) 34 | 35 | args = parser.parse_args() 36 | 37 | if args.model=='cnn' and args.input_length is None: 38 | raise ValueError('if --model=cnn, then you must specify --input_length') 39 | 40 | # load args from file if warm starting 41 | if args.warm_start is not None: 42 | import sys 43 | import os 44 | args_orig = args 45 | args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:]) 46 | args.train = args_orig.train 47 | 48 | # supress warnings 49 | import warnings 50 | warnings.simplefilter(action='ignore', category=FutureWarning) 51 | 52 | # load modules 53 | import datetime 54 | import glob 55 | import os 56 | import math 57 | import random 58 | import string 59 | import sys 60 | import time 61 | import unicodedata 62 | 63 | import torch 64 | import torch.nn as nn 65 | from torch.utils.tensorboard import SummaryWriter 66 | 67 | # import the training data 68 | vocabulary = string.ascii_letters + " .,;'$" 69 | 70 | def unicode_to_ascii(s): 71 | ''' 72 | Removes diacritics from unicode characters. 
73 | See: https://stackoverflow.com/a/518232/2809427 74 | ''' 75 | return ''.join( 76 | c for c in unicodedata.normalize('NFD', s) 77 | if unicodedata.category(c) != 'Mn' 78 | and c in vocabulary 79 | ) 80 | 81 | # Build the category_lines dictionary, a list of names per language 82 | category_lines = {} 83 | all_categories = [] 84 | for filename in glob.glob(os.path.join(args.data,'*.txt')): 85 | category = os.path.splitext(os.path.basename(filename))[0] 86 | all_categories.append(category) 87 | lines = open(filename, encoding='utf-8').read().strip().split('\n') 88 | lines = [unicode_to_ascii(line) for line in lines] 89 | category_lines[category] = lines 90 | 91 | def str_to_tensor(ss,input_length=None): 92 | ''' 93 | Converts a list of strings into a tensor of shape . 94 | This is used to convert text into a form suitable for input into a RNN/CNN. 95 | ''' 96 | max_length = max([len(s) for s in ss]) + 1 97 | if input_length: 98 | max_length = input_length 99 | tensor = torch.zeros(max_length, len(ss), len(vocabulary)) 100 | for j,s in enumerate(ss): 101 | s+='$' 102 | for i, letter in enumerate(s): 103 | if ibvl',x) 137 | out = self.cnn(out) 138 | out = self.relu(out) 139 | for cnn in self.cnns: 140 | out = cnn(out) 141 | out = self.relu(out) 142 | out = out.view(args.batch_size,args.hidden_layer_size*args.input_length) 143 | out = self.fc(out) 144 | return out 145 | 146 | # load the model 147 | if args.model=='cnn': 148 | model = CNNModel() 149 | else: 150 | model = RNNModel() 151 | 152 | if args.warm_start: 153 | print('warm starting model from',args.warm_start) 154 | model_dict = torch.load(os.path.join(args.warm_start,'model')) 155 | model.load_state_dict(model_dict['model_state_dict']) 156 | 157 | # training 158 | if args.train: 159 | 160 | # create log_dir 161 | log_dir = args.log_dir 162 | if log_dir is None: 163 | log_dir = 'log/'+( 164 | 'model='+args.model+ 165 | '_lr='+str(args.learning_rate)+ 166 | '_optim='+args.optimizer+ 167 | 
'_clip='+str(args.gradient_clipping)+ 168 | '_'+str(datetime.datetime.now()) 169 | ) 170 | try: 171 | os.makedirs(log_dir) 172 | with open(os.path.join(log_dir,'args'), 'w') as f: 173 | f.write('\n'.join(sys.argv[1:])) 174 | except FileExistsError: 175 | print('cannot create log dir,',log_dir,'already exists') 176 | sys.exit(1) 177 | writer = SummaryWriter(log_dir=log_dir) 178 | 179 | # prepare model for training 180 | criterion = nn.CrossEntropyLoss() 181 | if args.optimizer == 'sgd': 182 | optimizer = torch.optim.SGD( 183 | model.parameters(), 184 | lr=args.learning_rate, 185 | momentum=args.momentum, 186 | weight_decay=args.weight_decay 187 | ) 188 | if args.optimizer == 'adam': 189 | optimizer = torch.optim.Adam( 190 | model.parameters(), 191 | lr=args.learning_rate, 192 | weight_decay=args.weight_decay 193 | ) 194 | model.train() 195 | 196 | # training loop 197 | start_time = time.time() 198 | for step in range(1, args.samples + 1): 199 | 200 | # get random training example 201 | categories = [] 202 | lines = [] 203 | for i in range(args.batch_size): 204 | category = random.choice(all_categories) 205 | line = random.choice(category_lines[category]) 206 | categories.append(all_categories.index(category)) 207 | lines.append(line) 208 | category_tensor = torch.tensor(categories, dtype=torch.long) 209 | line_tensor = str_to_tensor(lines,args.input_length) 210 | 211 | # perform training step 212 | output = model(line_tensor) 213 | loss = criterion(output, category_tensor) 214 | loss.backward() 215 | grad_norm = sum([ torch.norm(p.grad)**2 for p in model.parameters()])**(1/2) 216 | if args.gradient_clipping: 217 | torch.nn.utils.clip_grad_norm_(model.parameters(),1.0) 218 | optimizer.step() 219 | 220 | # get category from output 221 | top_n, top_i = output.topk(1) 222 | guess_i = top_i[-1].item() 223 | category_i = category_tensor[-1] 224 | guess = all_categories[guess_i] 225 | category = all_categories[category_i] 226 | accuracies = torch.where( 227 | 
top_i[:,0]==category_tensor, 228 | torch.ones([args.batch_size]), 229 | torch.zeros([args.batch_size]) 230 | ) 231 | accuracy = torch.mean(accuracies).item() 232 | 233 | # tensorboard 234 | writer.add_scalar('train/loss', loss.item(), step) 235 | writer.add_scalar('train/accuracy', accuracy, step) 236 | writer.add_scalar('train/grad_norm', grad_norm.item(), step) 237 | 238 | # print status update 239 | if step % 100 == 0: 240 | correct = '✓' if guess == category else '✗ (%s)' % category 241 | print('%d %d%% (%.2f sec) %.4f %s / %s %s' % ( 242 | step, 243 | step / args.samples * 100, 244 | time.time()-start_time, 245 | loss, 246 | line, 247 | guess, 248 | correct 249 | )) 250 | 251 | # save model 252 | if step%args.save_every == 0 or step==args.samples: 253 | print('saving model checkpoint') 254 | torch.save({ 255 | 'step':step, 256 | 'model_state_dict': model.state_dict(), 257 | 'optimizer_state_dict': optimizer.state_dict(), 258 | 'loss':loss 259 | }, os.path.join(log_dir,'model')) 260 | 261 | 262 | # infer 263 | model.eval() 264 | softmax = torch.nn.Softmax(dim=1) 265 | if args.infer: 266 | for line in sys.stdin: 267 | line = line.strip() 268 | line_tensor = str_to_tensor([line],args.input_length) 269 | output = model(line_tensor) 270 | probs = softmax(output) 271 | top_n, top_i = probs.topk(3) 272 | guess_0 = all_categories[top_i[0,0].item()] 273 | guess_1 = all_categories[top_i[0,1].item()] 274 | guess_2 = all_categories[top_i[0,2].item()] 275 | print( 276 | 'name=',line, 277 | 'guess0=%s (%0.2f)'%(guess_0,probs[0,top_i[0,0].item()],), 278 | 'guess1=%s (%0.2f)'%(guess_1,probs[0,top_i[0,1].item()],), 279 | 'guess2=%s (%0.2f)'%(guess_2,probs[0,top_i[0,2].item()],), 280 | ) 281 | 282 | -------------------------------------------------------------------------------- /img/layers.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/img/layers.png -------------------------------------------------------------------------------- /lecture_notes/ad.py: -------------------------------------------------------------------------------- 1 | print('hello world') 2 | 3 | import torch 4 | 5 | # 0 order tensors = numbers 6 | x = torch.tensor(0.0) 7 | y = torch.tensor(2.0) 8 | z = x + y 9 | print('z=',z.item()) 10 | 11 | # 1st order tensors = vectors 12 | x = torch.tensor([1,2,3]) 13 | 14 | # 2nd order tensor = matrix 15 | m = torch.tensor( 16 | [[1,2,3] 17 | ,[4,5,6]]) 18 | 19 | #m2 = torch.tensor( 20 | # [[1,2,3,4] 21 | # ,[4,5,6]]) 22 | 23 | # 3rd order tensors = cubes 24 | c = torch.tensor([[[3],[3]]]) 25 | 26 | # two new features of torch 27 | # 1. works on GPUs 28 | # 2. supports automatic diff. 29 | # tensorflow: 30 | # 1. also has TPU 31 | # 2. better deployment deveops 32 | -------------------------------------------------------------------------------- /lecture_notes/ad2.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | def f(x): 4 | return x**2 + 4*x + 2 5 | 6 | def df(x): 7 | return 2*x + 4 8 | 9 | # minimum at x=-2 10 | # analytic formula 11 | # closed-form formula 12 | # for the minimum of f 13 | 14 | x = torch.tensor( 15 | 0.0, 16 | requires_grad=True 17 | ) 18 | y = torch.tensor( 19 | 1.0, 20 | requires_grad=True 21 | ) 22 | 23 | 24 | # d/dx f(x) == d/dx z 25 | z = f(x) 26 | z.backward() # computes the derivative 27 | 28 | x.grad # this is df(x) 29 | 30 | print('f(x)=',f(x)) 31 | #print('df(x)=',df(x)) 32 | print('x.grad=',x.grad) 33 | print('y.grad=',y.grad) 34 | 35 | # gradient descent 36 | x0 = torch.tensor(7.0,requires_grad=True) 37 | z0 = f(x0) 38 | z0.backward() 39 | 40 | alpha = 0.1 # step size, learning rate 41 | x1 = x0 - alpha * x0.grad # key formula 42 | x1 = torch.tensor(x1,requires_grad=True) 43 | z1 = f(x1) 
44 | z1.backward() 45 | 46 | x2 = x1 - alpha * x1.grad 47 | 48 | print('x0=',x0) 49 | print('x1=',x1) 50 | print('x2=',x2) 51 | 52 | # loop version of gradient descent 53 | x = torch.tensor(7.0,requires_grad=True) 54 | for i in range(50): 55 | print('i=',i,'x=',x) 56 | z = f(x) 57 | z.backward() 58 | x = x - alpha * x.grad 59 | x = torch.tensor(x,requires_grad=True) 60 | 61 | 62 | # higher order tensors 63 | x = torch.tensor([[[[[7.0,5.6]]]]]) 64 | 65 | x = torch.ones(3,4,5) 66 | # 3rd order = R^m*n*o 67 | # m = 3, n=4, o=5 68 | print('x=',x) 69 | 70 | #x = torch.zeros(3,4,5) 71 | #x = torch.empty(3,4,5) 72 | print('x=',x) 73 | 74 | z = f(x) 75 | print('z=',z) 76 | z.backward() 77 | x.grad 78 | -------------------------------------------------------------------------------- /lecture_notes/einsum.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | x=torch.ones(2,3) 4 | 5 | print('x=',x) 6 | 7 | print('sum=',torch.einsum('im->',[x])) 8 | print('trans=',torch.einsum('ij->ji',[x])) 9 | 10 | y = torch.ones(2) 11 | print('l2=',torch.einsum('i,i->',[y,y])) 12 | 13 | print('complex=',torch.einsum( 14 | 'ij, ij, ij -> ij', 15 | [x,x,x] 16 | )) 17 | -------------------------------------------------------------------------------- /project/README.md: -------------------------------------------------------------------------------- 1 | # Project: analyzing news articles about the coronavirus 2 | 3 | **Overview:** 4 | This is the final project for [CMC's CS181: Deep Learning](https://github.com/mikeizbicki/cmc-csci181) course. 5 | The project will guide you through the process of using state of the art deep learning techniques to analyze the news coverage of the coronavirus. 6 | The dataset you will analyze contains 2 million news articles written in 20 languages and published in 50,000 venues around the world. 
7 | Despite this large dataset size, 8 | the project has been designed to be completed on only modest hardware, 9 | and specifically does not require access to a GPU. 10 | 11 | **Scientific goals:** 12 | We will try to answer the following questions: 13 | 14 | 1. What is the bias of different news sources? (geographic, topical, ideological, etc.) 15 | 16 | 1. How has coverage of the coronavirus changed over time? 17 | 18 | 1. Can we detect/generate "fake news" stories about the coronavirus? 19 | Wikipedia has a [big list of fake news stories related to coronavirus](https://en.wikipedia.org/wiki/Misinformation_related_to_the_2019%E2%80%9320_coronavirus_pandemic). 20 | 21 | 34 | 35 | 39 | 40 | **Learning objectives:** 41 | 42 | 1. Have a cool project in your portfolio to talk about in job interviews 43 | 44 | 1. Understand the following deep learning concepts 45 | 1. explainability 46 | 1. attention (compared with RNNs and CNNs) 47 | 1. transfer learning / fine tuning 48 | 1. embeddings 49 | 50 | 1. Apply deep learning techniques to a real world dataset 51 | 1. understand data cleaning techniques 52 | 1. understand the importance of Unicode in both English and foreign language text 53 | 1. learn how to use "natural supervision" to generate labels for unlabeled data 54 | 1. learn how to approach a problem that no one knows the answers to 55 | 56 | 1. Understand the research process 57 | 58 | **Related projects:** 59 | 60 | There's been lots of machine learning research applied to the coronavirus ([see here for a really big list](https://towardsdatascience.com/machine-learning-methods-to-aid-in-coronavirus-response-70df8bfc7861)). 61 | The closest related research to this project is: 62 | 63 | 1. Kaggle is hosting [a competition to analyze academic articles about coronavirus](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge). 
64 | The dataset includes 47,000 scientific articles,
65 | which is too many for doctors to read,
66 | and the goal is to extract the most relevant information from these articles.
67 | 1. Stanford hosted [a virtual conference on COVID-19 and AI](https://hai.stanford.edu/events/covid-19-and-ai-virtual-conference/agenda); the most relevant presentation was by [Renée DiResta](https://hai.stanford.edu/people/ren-e-diresta) on [Misinformation & Disinformation in state media about COVID-19](https://www.youtube.com/watch?v=z4105Exe23Q&t=1hr58m10s).
68 |
69 | ## Part 0: The data
70 |
71 | You can download version 0 of the dataset at:
72 |
73 | 1. training set: https://izbicki.me/public/cs/cs181/coronavirus-headlines-train.jsonl.gz
74 | 1. test set: https://izbicki.me/public/cs/cs181/coronavirus-headlines-test.jsonl.gz
75 |
76 | You should download the training set and place it in a directory called `coronavirus-headlines` with the following commands:
77 | ```
78 | $ mkdir coronavirus-headlines
79 | $ cd coronavirus-headlines
80 | $ wget https://izbicki.me/public/cs/cs181/coronavirus-headlines-train.jsonl.gz
81 | ```
82 |
83 | The dataset is stored in the [JSON Lines](http://jsonlines.org/) format,
84 | where every line represents a single news article and has the following keys:
85 |
86 | | key | semantics |
87 | | ------------- | --------- |
88 | | `url` | The url of the article |
89 | | `hostname` | The hostname field of the url (e.g. `www.huffpost.com` or `www.breitbart.com`) |
90 | | `title` | The title of the article as extracted by [newspaper3k](https://newspaper.readthedocs.io/en/latest/). This is typically the `

` tag or the `` tag of the webpage. No preprocessing has been done on the titles, and so many titles contain weird unicode values that need to be normalized. | 91 | | `pub_time` | The date of publication as extracted by [newspaper3k](https://newspaper.readthedocs.io/en/latest/). These dates should be taken with a heavy grain of salt. Many of these dates are clearly wrong, for example there are dates in the 2030s and dates from thousands of years in the past. Furthermore, some domains like `sputniknews.com` use the European date convention of YYYY-MM-DD, but these are interpreted by newspaper3k as YYYY-DD-MM dates, and so have their day and month flipped. My guess is that for dates that might be relevant to the coronavirus (roughly 2019-11-01 to 2020-04-01), somewhere between 50%-80% of the dates are correct. | 92 | | `lang` | The language ISO-2 code determined by applying [langid.py](https://github.com/saffsd/langid.py) to the body of the article. For popular languages like English and Chinese, I think these labels are fairly accurate. But for other languages I'm less confident. For example, there are many articles about coronavirus labeled as Latin, and my suspicion is that most of these articles are actually written in another romance language like Spanish or Italian. | 93 | 94 | This isn't every article ever written about the coronavirus, 95 | but it's a large fraction of them. 96 | The list has been filtered to include only English language articles, 97 | and articles whose title contains one of the strings `coronavirus`, `corona virus`, `covid` or `ncov`. 98 | 99 | ## Part 1: explainable machine learning 100 | 101 | due date: Thursday, 23 April 102 | 103 | ### What's already been done 104 | Our first goal when analyzing this dataset is to predict the hostname that published an article given just the title. 105 | We will see that this gives us a simple (but crude) way to measure how similar two different hostnames are. 
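
As a rough sketch (not part of the provided `names.py`), the records described in Part 0 could be read and grouped by hostname as shown below.
The helper names `mentions_coronavirus` and `parse_headlines` are made up for illustration, the sample records and their urls are invented, and matching the filter strings case-insensitively is an assumption about how the dataset was filtered.

```python
import json
from collections import Counter

# substrings used to filter the dataset (taken from the description in Part 0)
KEYWORDS = ['coronavirus', 'corona virus', 'covid', 'ncov']

def mentions_coronavirus(title):
    '''
    Returns True if the title contains one of the filter strings.
    Lowercasing the title first is an assumption.
    '''
    title = title.lower()
    return any(keyword in title for keyword in KEYWORDS)

def parse_headlines(lines):
    '''
    Converts an iterable of JSON Lines strings into a list of (hostname, title) pairs,
    skipping blank lines and titles that fail the keyword filter.
    '''
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if mentions_coronavirus(record['title']):
            pairs.append((record['hostname'], record['title']))
    return pairs

# invented sample records;
# the real file could be opened with gzip.open(path, 'rt', encoding='utf-8')
# and passed to parse_headlines in the same way
sample_lines = [
    json.dumps({
        'url': 'https://example.com/a',
        'hostname': 'www.news.com.au',
        'title': 'Sick Qantas passenger at Melbourne Airport sparks coronavirus fears',
        'lang': 'en',
    }),
    json.dumps({
        'url': 'https://example.com/b',
        'hostname': 'www.breitbart.com',
        'title': 'Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria',
        'lang': 'en',
    }),
    json.dumps({
        'url': 'https://example.com/c',
        'hostname': 'example.com',
        'title': 'Local bakery wins award',
        'lang': 'en',
    }),
]

pairs = parse_headlines(sample_lines)
hostname_counts = Counter(hostname for hostname, title in pairs)
print(hostname_counts.most_common())
```

The (hostname, title) pairs are exactly the (label, input) pairs that the hostname prediction task needs, so a loader along these lines is a reasonable place to start exploring the data.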
106 | 107 | I trained this model with the following command: 108 | ``` 109 | $ python3 names.py \ 110 | --train \ 111 | --data=coronavirus-headlines/coronavirus-headlines-train.jsonl.gz \ 112 | --data_format=headlines \ 113 | --model=gru \ 114 | --hidden_layer_size=512 \ 115 | --num_layers=8 \ 116 | --resnet \ 117 | --dropout=0.2 \ 118 | --optimizer=adam \ 119 | --learning_rate=1e-3 \ 120 | --gradient_clipping \ 121 | --batch_size=128 122 | ``` 123 | You do not have to run this command yourself, as it will take a long time. 124 | (I let it run for 2 days on my GPU system.) 125 | I am providing the command just so that you can see the particular hyperparameters used to train the model. 126 | 127 | You should download the pretrained model by running the commands 128 | ``` 129 | $ wget https://izbicki.me/public/cs/cs181/gru_512x8.tar.gz 130 | $ mkdir models 131 | $ mv gru_512x8.tar.gz models 132 | $ tar -xzf models/gru_512x8.tar.gz 133 | ``` 134 | 135 | The last step you need to do before running the code is create a directory for the output explanations to be saved into: 136 | ``` 137 | $ mkdir explain_outputs 138 | ``` 139 | 140 | We can now run inference on our model with the command 141 | ``` 142 | $ python3 names.py --infer --warm_start=models/gru_512x8 143 | ``` 144 | which will accept text from stdin and run the inference algorithm on it. 145 | 146 | Examples: 147 | 148 | 1. 
Geographic similarity:
149 | The Australian website `www.news.com.au` ran the story titled
150 | ```
151 | Sick Qantas passenger at Melbourne Airport sparks coronavirus fears
152 | ```
153 | We can run our inference algorithm on this title using the command
154 | ```
155 | $ python3 names.py --infer --warm_start=models/gru_512x8 <<< "Sick Qantas passenger at Melbourne Airport sparks coronavirus fears"
156 | ```
157 | The top 5 predictions are:
158 | ```
159 | 0 www.news.com.au (0.24)
160 | 1 www.abc.net.au (0.04)
161 | 2 www.dailymail.co.uk (0.04)
162 | 3 au.news.yahoo.com (0.03)
163 | 4 news.yahoo.com (0.03)
164 | ```
165 | Most of these are other Australian newspapers.
166 |
167 | 1. Topical similarity:
168 | The website `virological.org` is a discussion forum where doctors and biologists post their analysis of different viruses.
169 | Understandably, they have recently been posting detailed analyses of the coronavirus,
170 | and one such post was titled
171 | ```
172 | nCoV's relationship to bat coronaviruses & recombination signals (no snakes) - no evidence the 2019-nCoV lineage is recombinant
173 | ```
174 | We can run our inference algorithm using the command
175 | ```
176 | $ python3 names.py --infer --warm_start=models/gru_512x8 <<< "nCoV's relationship to bat coronaviruses & recombination signals (no snakes) - no evidence the 2019-nCoV lineage is recombinant"
177 | ```
178 | The top 5 predictions are:
179 | ```
180 | 0 virological.org (0.46)
181 | 1 www.businessinsider.com (0.04)
182 | 2 www.insider.com (0.02)
183 | 3 contagiontracker.com (0.02)
184 | 4 www.en24.news (0.01)
185 | ```
186 | This suggests that these websites all publish relatively more academic articles about the coronavirus than other news websites.
187 | Notice that more relevant sites such as `medarxiv.org` (a website for publishing academic medical papers that contains several analyses of the coronavirus) do not appear in this list even though they are very similar to `virological.org`.
188 |
189 | 1. Politics similarity:
190 | `breitbart.com` is a conservative news source that is well known for supporting President Trump.
191 | They published the following article:
192 | ```
193 | Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria
194 | ```
195 | We can run our inference algorithm using the command
196 | ```
197 | $ python3 names.py --infer --warm_start=models/gru_512x8 <<< "Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria"
198 | ```
199 | The top 5 predictions are:
200 | ```
201 | 0 ussanews.com (0.04)
202 | 1 www.infowars.com (0.04)
203 | 2 crossman66.wordpress.com (0.04)
204 | 3 fromthetrenchesworldreport.com (0.03)
205 | 4 jonsnewplace.wordpress.com (0.02)
206 | ```
207 | In this case, Breitbart does not appear as one of the top predictions.
208 | But all the other sources listed share a similar conservative perspective.
209 |
210 |
241 |
242 | The Problem:
243 | We have a (crude) way to measure the similarity of two hostnames,
244 | but we can't explain why two hostnames get measured similarly.
245 | Your goal in this assignment is to find these explanations using the "sliding window algorithm".
246 |
247 | ### The sliding window algorithm
248 |
249 | The sliding window algorithm is a folklore technique for explaining the result of any machine learning algorithm.
250 | There are more sophisticated algorithms (such as [LIME](https://github.com/marcotcr/lime) and [SHAP](https://github.com/slundberg/shap)),
251 | but these are significantly more difficult to implement and interpret.
252 | They do both have nice libraries, however, which give pretty visualizations.
253 |
254 | Basic idea.
255 | If we have an input sentence
256 | ```
257 | Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria
258 | ```
259 | and we want to know how important the word `Trump` is for our final classification,
260 | we can:
261 | (1) remove the word `Trump`,
262 | (2) rerun the classification on the modified sentence,
263 | and (3) compare the results of the model on the modified and unmodified sentences.
264 | If the results are similar, then the word `Trump` is not important;
265 | if the results are different, then the word `Trump` is important.
266 |
267 | How to remove a word?
268 | There are a surprising number of ways to remove a word from a sentence:
269 |
270 | 1. Create a new sentence by concatenating everything to the left and right of the removed word.
271 | Thus, our example sentence would become
272 | ```
273 | Pollak: Coronavirus Panic Partly Driven by Anti- Hysteria
274 | ```
275 | This method is easy to implement,
276 | but can result in grammatically incorrect sentences.
277 | Since our model is only trained on grammatically correct sentences,
278 | there is no reason to expect it to perform well on malformed sentences,
279 | and it is likely to output a large difference for every word in the sentence.
280 |
281 | 2. Replace the word with another word.
282 | In our example sentence, we might replace the word `Trump` with the word `Biden` to get
283 | ```
284 | Pollak: Coronavirus Panic Partly Driven by Anti-Biden Hysteria
285 | ```
286 | This sentence is now grammatically correct,
287 | and so we could expect our model to do reasonably well on it.
288 | But how well it does would depend on our choice of replacement word,
289 | and so we would need to do many replacements and take an average to get a good estimate of the word's importance.
290 |
291 | 3. Insert "blank" inputs where the selected word should be.
292 | Recall that the inputs to our neural network are one-hot encoded letters.
293 | Therefore, every letter has a vector associated with it that has exactly one `1`.
294 | If we replace the `1` with a `0`, then the model will still know that there is a word in this location (because the size of the input tensor doesn't change),
295 | but the model is getting no information about what that word is.
296 |
297 | Comparing model outputs.
298 | There are many ways to calculate the similarity between two model outputs,
299 | but the simplest is using the Euclidean distance between the model's output probabilities,
300 | and that's what you should use in this assignment.
301 |
302 | Example outputs.
303 | In the following images, each word is colored with the Euclidean distance calculated above.
304 | Darker green values indicate that the word is more important.
305 |
306 |
307 | <img src=img/line0000.word.png width=800px>
308 |
309 |
310 |
311 | <img src=img/line0001.word.png width=800px>
312 |
313 |
314 |
315 | <img src=img/line0002.word.png width=800px>
316 |
317 |
318 | Pseudocode.
319 | The following pseudocode summarizes the sliding window explanation algorithm.
320 | ```
321 | construct input_tensor from input_sentence
322 | let probs = softmax(model(input_tensor))
323 | for each word in the input sentence:
324 |     construct input_tensor' by setting the columns associated with word to 0
325 |     let probs' = softmax(model(input_tensor'))
326 |     weight[word] = |probs-probs'|
327 | ```
328 |
329 | Character level explanations.
330 | By repeating the above procedure with individual characters (rather than words),
331 | we can generate character-level explanations of our text.
332 |
333 | The resulting explanations look like:
334 |
335 |
336 | <img src=img/line0000.char.png width=800px>
337 |
338 |
339 |
340 | <img src=img/line0001.char.png width=800px>
341 |
342 |
343 |
344 | <img src=img/line0002.char.png width=800px>
345 |
346 |
347 |
351 |
352 | ### Tasks for you to complete
353 |
354 | 1.
Follow the instructions above to download the pretrained model weights
355 | 1. Implement the `explain` function in the `names.py` source code
356 | 1. Reproduce the example explanations above to ensure that your code is working correctly
357 | 1. Select 3 titles from the training data and generate word/char level explanations for these titles
358 |
359 | ### Submission
360 |
361 | Upload your explanation images and source code to sakai.
362 |
363 | ## Part 2: the attention mechanism and fine tuning
364 |
365 | This part of the assignment is based on a new dataset [corona.multilang100.jsonl.gz](https://izbicki.me/public/cs/cs181/corona.multilang100.jsonl.gz),
366 | which is in the same format as the previous dataset.
367 | You should download this file and place it in your project folder.
368 |
369 | Unlike the previous dataset, this dataset is multilingual.
370 | It contains news headlines written in:
371 | 1. English,
372 | 1. Spanish,
373 | 1. Portuguese,
374 | 1. Italian,
375 | 1. French,
376 | 1. German,
377 | 1. Russian,
378 | 1. Chinese,
379 | 1. Korean,
380 | 1. and Japanese.
381 |
382 | For each of these 10 languages, 10 prominent news sources were selected to be included in the dataset.
383 | There are therefore 100 classes.
384 | The overall size of the dataset is 106767 headlines, and so there are about 1000 headlines per news source.
385 |
386 | The character level model we used before will not work in this multilingual setting because many of these languages use different vocabularies.
387 | We will instead use the BERT transformer model and the [transformers](https://huggingface.co/transformers/) python library.
388 |
389 | ### Tasks for you to complete
390 |
391 | 1. Add support for training a multilingual BERT model to the `names.py` file.
392 | 1. Train the BERT model so that you get at least the following training accuracies:
393 |     1. accuracy@1 >= 0.3
394 |     1. accuracy@20 >= 0.9
395 |
396 | These are very conservative numbers.
397 | You can view my [tensorboard.dev log](https://tensorboard.dev/experiment/2WJbkgdyTlGvh6Gk4mu0PQ/#scalars&_smoothingWeight=0.99) to see what type of performance levels are possible.
398 |
399 | You will have to experiment with different hyperparameter combinations in order to get good results.
400 | You do not have to do any warmstarts to get these results.
401 | I encourage you to try warmstarts since you can get much better accuracies,
402 | but the computational expense may be too much for some of your computers,
403 | and so I am not requiring it.
404 |
405 | 1. Generate a tensorboard.dev plot showing your model training progress
406 |
407 | ### Submission
408 |
409 | Upload the link to your tensorboard.dev output on sakai.
410 |
411 | ### Optional task
412 |
413 | Extend the explanation code from part 1 so that it works on the BERT model as well.
414 |
415 | ## Part 3: embeddings
416 |
417 | Recall that in the previous part of the project, we created the following BERT model:
418 |
419 | ```
420 | class BertFineTuning(torch.nn.Module):
421 |     def __init__(self):
422 |         super().__init__()
423 |         self.fc_class = torch.nn.Linear(768,num_classes)
424 |
425 |     def forward(self,x):
426 |         last_layer,embedding = bert(x)
427 |         embedding = torch.mean(last_layer,dim=1)
428 |         out = self.fc_class(embedding)
429 |         return out
430 | ```
431 |
432 | The linear layer `self.fc_class` is just a matrix that is `768 x num_classes`.
433 | In other words, each class has a 768 dimensional vector associated with it,
434 | and that vector encodes lots of information about the class.
435 | We call this vector an "embedding" of the class.
436 |
437 | By visualizing the embeddings, we can understand which classes our model thinks are "similar".
438 | Tensorboard has some built-in tools for this visualization using algorithms like PCA and t-SNE.
439 |
440 | ### Tasks for you to complete
441 |
442 | 1. Modify your `names.py` file so that it outputs class embeddings to tensorboard.
443 |
444 | 1.
Load tensorboard and visualize the resulting embeddings. 445 | 446 | ### Submission 447 | 448 | Upload a screenshot of your embeddings and your source code to sakai. 449 | -------------------------------------------------------------------------------- /project/img/line0000.char.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0000.char.png -------------------------------------------------------------------------------- /project/img/line0000.word.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0000.word.png -------------------------------------------------------------------------------- /project/img/line0001.char.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0001.char.png -------------------------------------------------------------------------------- /project/img/line0001.word.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0001.word.png -------------------------------------------------------------------------------- /project/img/line0002.char.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0002.char.png -------------------------------------------------------------------------------- /project/img/line0002.word.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0002.word.png -------------------------------------------------------------------------------- /project/img/line0003.char.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0003.char.png -------------------------------------------------------------------------------- /project/img/line0003.word.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0003.word.png -------------------------------------------------------------------------------- /project/names.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # process command line args 4 | import argparse 5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@') 6 | 7 | parser_control = parser.add_argument_group('control options') 8 | parser_control.add_argument('--infer',action='store_true') 9 | parser_control.add_argument('--train',action='store_true') 10 | parser_control.add_argument('--generate',action='store_true') 11 | 12 | parser_data = parser.add_argument_group('data options') 13 | parser_data.add_argument('--data',default='names') 14 | parser_data.add_argument('--data_format',choices=['names','headlines'],default='names') 15 | parser_data.add_argument('--sample_strategy',choices=['uniform_line','uniform_category'],default='uniform_category') 16 | parser_data.add_argument('--case_insensitive',action='store_true') 17 | parser_data.add_argument('--dropout',type=float,default=0.0) 18 | 19 | parser_model = 
parser.add_argument_group('model options') 20 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm'],default='rnn') 21 | parser_model.add_argument('--resnet',action='store_true') 22 | parser_model.add_argument('--hidden_layer_size',type=int,default=128) 23 | parser_model.add_argument('--num_layers',type=int,default=1) 24 | parser_model.add_argument('--conditional_model',action='store_true') 25 | 26 | parser_opt = parser.add_argument_group('optimization options') 27 | parser_opt.add_argument('--batch_size',type=int,default=1) 28 | parser_opt.add_argument('--learning_rate',type=float,default=1e-1) 29 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd') 30 | parser_opt.add_argument('--gradient_clipping',action='store_true') 31 | parser_opt.add_argument('--momentum',type=float,default=0.9) 32 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4) 33 | parser_opt.add_argument('--samples',type=int,default=10000) 34 | parser_opt.add_argument('--input_length',type=int) 35 | parser_opt.add_argument('--warm_start') 36 | parser_opt.add_argument('--disable_categories',action='store_true') 37 | 38 | parser_infer = parser.add_argument_group('inference options') 39 | parser_infer.add_argument('--infer_path',default='explain_outputs') 40 | 41 | parser_generate = parser.add_argument_group('generate options') 42 | parser_generate.add_argument('--temperature',type=float,default=1.0) 43 | parser_generate.add_argument('--max_sample_length',type=int,default=100) 44 | parser_generate.add_argument('--category',nargs='*') 45 | 46 | parser_debug = parser.add_argument_group('debug options') 47 | parser_debug.add_argument('--device',choices=['auto','cpu','gpu'],default='auto') 48 | parser_debug.add_argument('--print_delay',type=int,default=5) 49 | parser_debug.add_argument('--log_dir_base',type=str,default='log') 50 | parser_debug.add_argument('--log_dir',type=str) 51 | parser_debug.add_argument('--save_every',type=int,default=1000) 52 |
parser_debug.add_argument('--print_every',type=int,default=100) 53 | 54 | args = parser.parse_args() 55 | 56 | if args.model=='cnn' and args.input_length is None: 57 | raise ValueError('if --model=cnn, then you must specify --input_length') 58 | 59 | # load args from file if warm starting 60 | if args.warm_start is not None: 61 | import sys 62 | import os 63 | args_orig = args 64 | args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:]) 65 | args.train = args_orig.train 66 | 67 | # suppress warnings 68 | import warnings 69 | warnings.simplefilter(action='ignore', category=FutureWarning) 70 | 71 | # load modules 72 | import datetime 73 | import glob 74 | import os 75 | import math 76 | import random 77 | import string 78 | import sys 79 | import time 80 | import unicodedata 81 | from unidecode import unidecode 82 | 83 | import torch 84 | import torch.nn as nn 85 | from torch.utils.tensorboard import SummaryWriter 86 | 87 | # set device to cpu/gpu 88 | if args.device=='gpu' or (torch.cuda.is_available() and args.device=='auto'): 89 | device = torch.device('cuda') 90 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 91 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 92 | else: 93 | device = torch.device('cpu') 94 | print('device=',device) 95 | 96 | # import the training data 97 | BOL = '\x00' 98 | EOL = '\x01' 99 | OOV = '\x02' 100 | if args.case_insensitive: 101 | vocabulary = string.ascii_lowercase 102 | else: 103 | vocabulary = string.ascii_letters 104 | vocabulary += " .,;'" + '1234567890:-/#$%' + OOV + BOL + EOL 105 | print('len(vocabulary)=',len(vocabulary)) 106 | 107 | def unicode_to_ascii(s): 108 | ''' 109 | Removes diacritics from unicode characters.
110 | See: https://stackoverflow.com/a/518232/2809427 111 | ''' 112 | return ''.join( 113 | c for c in unicodedata.normalize('NFD', s) 114 | if unicodedata.category(c) != 'Mn' 115 | and c in vocabulary 116 | ) 117 | 118 | def format_line(line): 119 | line = unidecode(line) 120 | if args.case_insensitive: 121 | line = line.lower() 122 | return line 123 | 124 | # Build the category_lines dictionary, a list of names per language 125 | if args.data_format == 'names': 126 | category_lines = {} 127 | all_categories = [] 128 | #for filename in glob.glob(os.path.join(args.data,'*.txt')): 129 | for filename in glob.glob(os.path.join(args.data,'*')): 130 | print('filename=',filename) 131 | category = os.path.splitext(os.path.basename(filename))[0] 132 | all_categories.append(category) 133 | lines = open(filename, encoding='utf-8').read().strip().split('\n') 134 | lines = [format_line(line) for line in lines] 135 | category_lines[category] = lines 136 | 137 | elif args.data_format == 'headlines': 138 | import gzip 139 | import json 140 | from collections import defaultdict,Counter 141 | 142 | # load data points 143 | category_lines = defaultdict(lambda: []) 144 | lines_category = [] 145 | categories_counter = Counter() 146 | with gzip.open(args.data,'rt') as f: 147 | for line in f: 148 | article = json.loads(line) 149 | hostname = article['hostname'] 150 | categories_counter[hostname] += 1 151 | day = article['day'].split()[0] 152 | title = article['title'] 153 | title = format_line(title) 154 | category_lines[hostname].append(title) 155 | lines_category.append((title, hostname)) 156 | all_categories = [ hostname for hostname,count in categories_counter.most_common() ] 157 | all_categories = list(all_categories) 158 | 159 | print('len(lines_category)=',len(lines_category)) 160 | print('len(all_categories)=',len(all_categories)) 161 | 162 | def str_to_tensor(ss,input_length=None): 163 | ''' 164 | Converts a list of strings into a tensor of shape <max_length, len(ss),
len(vocabulary)>. 165 | This is used to convert text into a form suitable for input into a RNN/CNN. 166 | ''' 167 | max_length = max([len(s) for s in ss]) + 2 168 | if input_length: 169 | max_length = input_length 170 | tensor = torch.zeros(max_length, len(ss), len(vocabulary)).to(device) 171 | for j,s in enumerate(ss): 172 | s = BOL + s + EOL 173 | for i, letter in enumerate(s): 174 | if i<max_length: 175 | vocabulary_i = vocabulary.find(letter) 176 | if vocabulary_i==-1: 177 | vocabulary_i = vocabulary.find(OOV) 178 | tensor[i,j,vocabulary_i] = 1 179 | return tensor 180 | 181 | # define the model 182 | input_size = len(vocabulary) 183 | if args.conditional_model: 184 | input_size += len(all_categories) 185 | 186 | class RNNModel(nn.Module): 187 | def __init__(self): 188 | super(RNNModel,self).__init__() 189 | if args.model=='rnn': 190 | mk_rnn = nn.RNN 191 | if args.model=='gru': 192 | mk_rnn = nn.GRU 193 | if args.model=='lstm': 194 | mk_rnn = nn.LSTM 195 | self.rnn = mk_rnn( 196 | input_size, 197 | args.hidden_layer_size, 198 | num_layers=args.num_layers, 199 | dropout=args.dropout 200 | ) 201 | self.fc_class = nn.Linear(args.hidden_layer_size,len(all_categories)) 202 | self.dropout = nn.Dropout(args.dropout) 203 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,len(vocabulary)) 204 | 205 | def forward(self, x): 206 | # out is 3rd order: < len(line) x batch size x hidden_layer_size > 207 | out,h_n = self.rnn(x) 208 | out = self.dropout(out) 209 | out_class = self.fc_class(out[out.shape[0]-1,:,:]) 210 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device) 211 | for i in range(out.shape[0]): 212 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:]) 213 | return out_class, out_nextchars 214 | 215 | class ResnetRNNModel(nn.Module): 216 | def __init__(self): 217 | super().__init__() 218 | if args.model=='rnn': 219 | mk_rnn = nn.RNN 220 | if args.model=='gru': 221 | mk_rnn = nn.GRU 222 | if args.model=='lstm': 223 | mk_rnn = nn.LSTM 224 |
rnn_input_size = input_size 225 | self.rnns = [] 226 | for layer in range(args.num_layers): 227 | rnn = mk_rnn( 228 | rnn_input_size, 229 | args.hidden_layer_size, 230 | num_layers=1, 231 | ) 232 | self.add_module('rnn'+str(layer),rnn) 233 | self.rnns.append(rnn) 234 | rnn_input_size = args.hidden_layer_size 235 | self.fc_class = nn.Linear(args.hidden_layer_size,len(all_categories)) 236 | self.dropout = nn.Dropout(args.dropout) 237 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,len(vocabulary)) 238 | 239 | def forward(self, x): 240 | # out is 3rd order: < len(line) x batch size x hidden_layer_size > 241 | out = x 242 | for layer,rnn in enumerate(self.rnns): 243 | out_prev = out 244 | out,_ = rnn(out) 245 | if layer>0 and args.resnet: 246 | out = out + out_prev 247 | out = self.dropout(out) 248 | out_class = self.fc_class(out[out.shape[0]-1,:,:]) 249 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device) 250 | for i in range(out.shape[0]): 251 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:]) 252 | return out_class, out_nextchars 253 | 254 | 255 | class CNNModel(nn.Module): 256 | def __init__(self): 257 | super(CNNModel,self).__init__() 258 | self.relu = nn.ReLU() 259 | self.cnn = \ 260 | nn.Conv1d(input_size,args.hidden_layer_size,3,padding=1) 261 | self.cnns = (args.num_layers-1)*[ 262 | nn.Conv1d(args.hidden_layer_size,args.hidden_layer_size,3,padding=1) 263 | ] 264 | self.dropout = nn.Dropout(args.dropout) 265 | self.fc_class = nn.Linear(args.hidden_layer_size*args.input_length,len(all_categories)) 266 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,input_size) 267 | 268 | def forward(self,x): 269 | out = torch.einsum('lbv->bvl',x) 270 | out = self.cnn(out) 271 | out = self.relu(out) 272 | for cnn in self.cnns: 273 | out = cnn(out) 274 | out = self.relu(out) 275 | out = self.dropout(out) 276 | out_class = out.view(args.batch_size,args.hidden_layer_size*args.input_length) 277 | out_class = self.fc_class(out_class) 278 |
out = torch.einsum('ijk->kij',out) 279 | out_nextchars = torch.zeros([out.shape[0],out.shape[1],input_size]) 280 | for i in range(out.shape[0]): 281 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:]) 282 | return out_class, out_nextchars 283 | 284 | 285 | # load the model 286 | if args.model=='cnn': 287 | model = CNNModel() 288 | else: 289 | if args.resnet: 290 | model = ResnetRNNModel() 291 | else: 292 | model = RNNModel() 293 | model.to(device) 294 | 295 | if args.warm_start: 296 | print('warm starting model from',args.warm_start) 297 | model_dict = torch.load(os.path.join(args.warm_start,'model'), map_location=device) 298 | model.load_state_dict(model_dict['model_state_dict']) 299 | 300 | # training 301 | if args.train: 302 | 303 | # create log_dir 304 | log_dir = args.log_dir 305 | if log_dir is None: 306 | log_dir = os.path.join(args.log_dir_base,( 307 | 'model='+args.model+ 308 | '_hidden='+str(args.hidden_layer_size)+ 309 | '_layers='+str(args.num_layers)+ 310 | '_cond='+str(args.conditional_model)+ 311 | '_resnet='+str(args.resnet)+ 312 | '_lr='+str(args.learning_rate)+ 313 | '_optim='+args.optimizer+ 314 | '_clip='+str(args.gradient_clipping)+ 315 | '_'+str(datetime.datetime.now()) 316 | )) 317 | try: 318 | os.makedirs(log_dir) 319 | with open(os.path.join(log_dir,'args'), 'w') as f: 320 | f.write('\n'.join(sys.argv[1:])) 321 | except FileExistsError: 322 | print('cannot create log dir,',log_dir,'already exists') 323 | sys.exit(1) 324 | writer = SummaryWriter(log_dir=log_dir) 325 | 326 | # prepare model for training 327 | criterion = nn.CrossEntropyLoss() 328 | if args.optimizer == 'sgd': 329 | optimizer = torch.optim.SGD( 330 | model.parameters(), 331 | lr=args.learning_rate, 332 | momentum=args.momentum, 333 | weight_decay=args.weight_decay 334 | ) 335 | if args.optimizer == 'adam': 336 | optimizer = torch.optim.Adam( 337 | model.parameters(), 338 | lr=args.learning_rate, 339 | weight_decay=args.weight_decay 340 | ) 341 | model.train() 342 | 343 | # 
training loop 344 | start_time = time.time() 345 | for step in range(1, args.samples + 1): 346 | 347 | # get random training example 348 | categories = [] 349 | lines = [] 350 | for i in range(args.batch_size): 351 | if args.sample_strategy == 'uniform_category': 352 | category = random.choice(all_categories) 353 | line = random.choice(category_lines[category]) 354 | elif args.sample_strategy == 'uniform_line': 355 | line, category = random.choice(lines_category) 356 | 357 | categories.append(all_categories.index(category)) 358 | lines.append(line) 359 | category_tensor = torch.tensor(categories, dtype=torch.long).to(device) 360 | line_tensor = str_to_tensor(lines,args.input_length) 361 | 362 | if args.conditional_model: 363 | category_onehot = torch.nn.functional.one_hot(category_tensor, len(all_categories)).float() 364 | category_onehot = torch.unsqueeze(category_onehot,0) 365 | category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0) 366 | input_tensor = torch.cat([line_tensor,category_onehot],dim=2) 367 | else: 368 | input_tensor = line_tensor 369 | 370 | input_tensor = input_tensor.to(device) 371 | category_tensor = category_tensor.to(device) 372 | 373 | # perform training step 374 | output_class,output_nextchars = model(input_tensor) 375 | loss_class = criterion(output_class, category_tensor) 376 | loss_nextchars_perchar = torch.zeros(output_nextchars.shape[0]).to(device) 377 | for i in range(output_nextchars.shape[0]-1): 378 | _, nextchar_i = line_tensor[i+1,:].topk(1) 379 | nextchar_i = nextchar_i.view([-1]) 380 | loss_nextchars_perchar[i] = criterion(output_nextchars[i,:], nextchar_i) 381 | loss_nextchars = torch.mean(loss_nextchars_perchar) 382 | 383 | if args.conditional_model or args.disable_categories: 384 | loss = loss_nextchars 385 | else: 386 | loss = loss_class + loss_nextchars 387 | loss.backward() 388 | grad_norm = sum([ torch.norm(p.grad)**2 for p in model.parameters() if p.grad is not None])**(1/2) 389 | if args.gradient_clipping:
390 | torch.nn.utils.clip_grad_norm_(model.parameters(),1.0) 391 | optimizer.step() 392 | 393 | # log optimization information 394 | writer.add_scalar('train/loss_class', loss_class.item(), step) 395 | writer.add_scalar('train/loss_nextchars', loss_nextchars.item(), step) 396 | writer.add_scalar('train/loss', loss.item(), step) 397 | writer.add_scalar('train/grad_norm', grad_norm.item(), step) 398 | 399 | # get accuracy@k 400 | ks = [1, 5, 10, 20] 401 | k = max(ks) 402 | top_n, top_i = output_class.topk(k) 403 | category_tensor_k = torch.cat(k*[torch.unsqueeze(category_tensor,dim=1)],dim=1) 404 | accuracies = torch.where( 405 | top_i[:,:]==category_tensor_k, 406 | torch.ones([args.batch_size,k]).to(device), 407 | torch.zeros([args.batch_size,k]).to(device) 408 | ) 409 | for k in ks: 410 | accuracies_k,_ = torch.max(accuracies[:,:k], dim=1) 411 | accuracy_k = torch.mean(accuracies_k).item() 412 | writer.add_scalar('accuracy/@'+str(k), accuracy_k, step) 413 | 414 | # print status update 415 | if step % args.print_every == 0: 416 | 417 | # get category from output 418 | top_n, top_i = output_class.topk(1) 419 | guess_i = top_i[-1].item() 420 | category_i = category_tensor[-1] 421 | guess = all_categories[guess_i] 422 | category = all_categories[category_i] 423 | 424 | # print results 425 | correct = '✓' if guess == category else '✗ (%s)' % category 426 | print('%d %d%% (%.2f sec) %.4f %s / %s %s' % ( 427 | step, 428 | step / args.samples * 100, 429 | time.time()-start_time, 430 | loss, 431 | line, 432 | guess, 433 | correct 434 | )) 435 | 436 | # save model 437 | if step%args.save_every == 0 or step==args.samples: 438 | print('saving model checkpoint') 439 | torch.save({ 440 | 'step':step, 441 | 'model_state_dict': model.state_dict(), 442 | 'optimizer_state_dict': optimizer.state_dict(), 443 | 'loss':loss 444 | }, os.path.join(log_dir,'model')) 445 | 446 | # infer 447 | def infer(line): 448 | line = line.strip() 449 | if args.case_insensitive: 450 | line = line.lower()
451 | line_tensor = str_to_tensor([line],args.input_length) 452 | output_class,output_nextchars = model(line_tensor) 453 | probs = softmax(output_class) 454 | k=20 455 | top_n, top_i = probs.topk(k) 456 | print('line=',line) 457 | for i in range(k): 458 | guess = all_categories[top_i[0,i].item()] 459 | print(' ',i,guess, '(%0.2f)'%top_n[0,i].item()) 460 | if args.infer_path is not None: 461 | i = 0 462 | while os.path.exists(os.path.join(args.infer_path,"line%s.char.png" % str(i).zfill(4))): 463 | i += 1 464 | path_base = os.path.join(args.infer_path,'line'+str(i).zfill(4)) 465 | print('path_base=',path_base) 466 | explain(line, path_base+'.char.png', 'char') 467 | explain(line, path_base+'.word.png', 'word') 468 | 469 | def explain(line,filename,explain_type): 470 | scores = torch.zeros([len(line)]) 471 | scores[0]=5 472 | scores[1]=4 473 | scores[2]=3 474 | scores[3]=2 475 | scores[4]=1 476 | line2img(line,scores,filename) 477 | 478 | 479 | def line2img( 480 | line, 481 | scores, 482 | filename, 483 | maxwidth=40, 484 | img_width=800 485 | ): 486 | ''' 487 | Outputs an image containing text with green/red background highlights to indicate the importance of words in the text. 488 | 489 | Arguments: 490 | line (str): the text that should be printed 491 | scores (Tensor): a vector of size len(line), where each index contains the "weight" of the corresponding letter in the line string; positive values will be colored green, and negative values red. 
492 | filename (str): the name of the output file 493 | ''' 494 | import matplotlib 495 | import matplotlib.colors as colors 496 | matplotlib.use('Agg') 497 | import matplotlib.pyplot as plt 498 | import numpy as np 499 | import math 500 | 501 | im_height=1+len(line)//maxwidth 502 | im=np.zeros([maxwidth,im_height]) 503 | for i in range(scores.shape[0]): 504 | im[i%maxwidth,im_height-i//maxwidth-1] = scores[i] 505 | 506 | cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red","white","green"]) 507 | scores_max=torch.max(scores) 508 | norm=plt.Normalize(-scores_max,scores_max) 509 | 510 | dpi=96 511 | fig, ax = plt.subplots(figsize=(img_width/dpi, 300/dpi), dpi=dpi) 512 | ax.get_xaxis().set_visible(False) 513 | ax.get_yaxis().set_visible(False) 514 | ax.spines['left'].set_visible(False) 515 | ax.spines['bottom'].set_visible(False) 516 | ax.spines['right'].set_visible(False) 517 | ax.spines['top'].set_visible(False) 518 | ax.set_xlim(-0.5,-0.5+maxwidth) 519 | ax.set_ylim(-0.5, 0.5+i//maxwidth) 520 | ax.imshow(im.transpose(),cmap=cmap,norm=norm) 521 | for i,c in enumerate(line): 522 | ax.text(i%maxwidth-0.25,im_height-i//maxwidth-0.25-1,c,fontsize=12) 523 | plt.tight_layout() 524 | plt.savefig(filename,bbox_inches='tight') 525 | 526 | 527 | 528 | model.eval() 529 | softmax = torch.nn.Softmax(dim=1) 530 | if args.infer: 531 | for line in sys.stdin: 532 | infer(line) 533 | 534 | if args.generate: 535 | import random 536 | line = '' 537 | for i in range(args.max_sample_length): 538 | line_tensor = str_to_tensor([line],args.input_length) 539 | if args.conditional_model: 540 | category_onehot = torch.zeros([line_tensor.shape[1], len(all_categories)]).to(device) 541 | for category in args.category: 542 | category_i = all_categories.index(category) 543 | category_onehot[0, category_i] = 1 544 | category_onehot = torch.unsqueeze(category_onehot,0) 545 | category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0) 546 | input_tensor =
torch.cat([line_tensor,category_onehot],dim=2) 547 | else: 548 | input_tensor = line_tensor 549 | _,output_nextchars = model(input_tensor) 550 | # 3rd order tensor < len(line) x batch_size x len(vocabulary) > 551 | probs = softmax(args.temperature*output_nextchars[i,:,:]) 552 | dist = torch.distributions.categorical.Categorical(probs) 553 | nextchar_i = dist.sample() 554 | nextchar = vocabulary[nextchar_i] 555 | if nextchar == EOL: 556 | break 557 | if nextchar == OOV: 558 | nextchar='~' 559 | line += nextchar 560 | if args.conditional_model: 561 | print('name=',line) 562 | else: 563 | infer(line) 564 | 565 | -------------------------------------------------------------------------------- /project/names_transformers.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # process command line args 4 | import argparse 5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@') 6 | 7 | parser_control = parser.add_argument_group('control options') 8 | parser_control.add_argument('--infer',action='store_true') 9 | parser_control.add_argument('--train',action='store_true') 10 | parser_control.add_argument('--generate',action='store_true') 11 | 12 | parser_data = parser.add_argument_group('data options') 13 | parser_data.add_argument('--data',default='names') 14 | parser_data.add_argument('--data_format',choices=['names','headlines'],default='names') 15 | parser_data.add_argument('--sample_strategy',choices=['uniform_line','uniform_category'],default='uniform_category') 16 | parser_data.add_argument('--case_insensitive',action='store_true') 17 | parser_data.add_argument('--dropout',type=float,default=0.0) 18 | 19 | parser_model = parser.add_argument_group('model options') 20 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm','bert'],default='rnn') 21 | parser_model.add_argument('--resnet',action='store_true') 22 | parser_model.add_argument('--hidden_layer_size',type=int,default=128)
23 | parser_model.add_argument('--num_layers',type=int,default=1) 24 | parser_model.add_argument('--conditional_model',action='store_true') 25 | 26 | parser_opt = parser.add_argument_group('optimization options') 27 | parser_opt.add_argument('--batch_size',type=int,default=1) 28 | parser_opt.add_argument('--learning_rate',type=float,default=1e-1) 29 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd') 30 | parser_opt.add_argument('--gradient_clipping',action='store_true') 31 | parser_opt.add_argument('--momentum',type=float,default=0.9) 32 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4) 33 | parser_opt.add_argument('--samples',type=int,default=10000) 34 | parser_opt.add_argument('--input_length',type=int) 35 | parser_opt.add_argument('--warm_start') 36 | parser_opt.add_argument('--disable_categories',action='store_true') 37 | 38 | parser_infer = parser.add_argument_group('inference options') 39 | parser_infer.add_argument('--infer_path',default='explain_outputs') 40 | 41 | parser_generate = parser.add_argument_group('generate options') 42 | parser_generate.add_argument('--temperature',type=float,default=1.0) 43 | parser_generate.add_argument('--max_sample_length',type=int,default=100) 44 | parser_generate.add_argument('--category',nargs='*') 45 | 46 | parser_debug = parser.add_argument_group('debug options') 47 | parser_debug.add_argument('--device',choices=['auto','cpu','gpu'],default='auto') 48 | parser_debug.add_argument('--print_delay',type=int,default=5) 49 | parser_debug.add_argument('--log_dir_base',type=str,default='log') 50 | parser_debug.add_argument('--log_dir',type=str) 51 | parser_debug.add_argument('--save_every',type=int,default=1000) 52 | parser_debug.add_argument('--print_every',type=int,default=100) 53 | 54 | args = parser.parse_args() 55 | 56 | if args.model=='cnn' and args.input_length is None: 57 | raise ValueError('if --model=cnn, then you must specify --input_length') 58 | 59 | # load args from file
if warm starting 60 | if args.warm_start is not None: 61 | import sys 62 | import os 63 | args_orig = args 64 | args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:]) 65 | args.train = args_orig.train 66 | 67 | # suppress warnings 68 | import warnings 69 | warnings.simplefilter(action='ignore', category=FutureWarning) 70 | 71 | # load modules 72 | import datetime 73 | import glob 74 | import os 75 | import math 76 | import random 77 | import string 78 | import sys 79 | import time 80 | import unicodedata 81 | from unidecode import unidecode 82 | 83 | import torch 84 | import torch.nn as nn 85 | from torch.utils.tensorboard import SummaryWriter 86 | import transformers 87 | 88 | # set device to cpu/gpu 89 | if args.device=='gpu' or (torch.cuda.is_available() and args.device=='auto'): 90 | device = torch.device('cuda') 91 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 92 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 93 | else: 94 | device = torch.device('cpu') 95 | print('device=',device) 96 | 97 | # import the training data 98 | BOL = '\x00' 99 | EOL = '\x01' 100 | OOV = '\x02' 101 | if args.case_insensitive: 102 | vocabulary = string.ascii_lowercase 103 | else: 104 | vocabulary = string.ascii_letters 105 | vocabulary += " .,;'" + '1234567890:-/#$%' + OOV + BOL + EOL 106 | print('len(vocabulary)=',len(vocabulary)) 107 | 108 | def unicode_to_ascii(s): 109 | ''' 110 | Removes diacritics from unicode characters.
111 | See: https://stackoverflow.com/a/518232/2809427 112 | ''' 113 | return ''.join( 114 | c for c in unicodedata.normalize('NFD', s) 115 | if unicodedata.category(c) != 'Mn' 116 | and c in vocabulary 117 | ) 118 | 119 | def format_line(line): 120 | line = unidecode(line) 121 | if args.case_insensitive: 122 | line = line.lower() 123 | return line 124 | 125 | # Build the category_lines dictionary, a list of names per language 126 | if args.data_format == 'names': 127 | category_lines = {} 128 | all_categories = [] 129 | #for filename in glob.glob(os.path.join(args.data,'*.txt')): 130 | for filename in glob.glob(os.path.join(args.data,'*')): 131 | print('filename=',filename) 132 | category = os.path.splitext(os.path.basename(filename))[0] 133 | all_categories.append(category) 134 | lines = open(filename, encoding='utf-8').read().strip().split('\n') 135 | lines = [format_line(line) for line in lines] 136 | category_lines[category] = lines 137 | 138 | elif args.data_format == 'headlines': 139 | import gzip 140 | import json 141 | from collections import defaultdict,Counter 142 | 143 | # load data points 144 | category_lines = defaultdict(lambda: []) 145 | lines_category = [] 146 | categories_counter = Counter() 147 | with gzip.open(args.data,'rt') as f: 148 | for line in f: 149 | article = json.loads(line) 150 | hostname = article['hostname'] 151 | categories_counter[hostname] += 1 152 | #day = article['day'].split()[0] 153 | title = article['title'] 154 | title = format_line(title) 155 | category_lines[hostname].append(title) 156 | lines_category.append((title, hostname)) 157 | all_categories = [ hostname for hostname,count in categories_counter.most_common() ] 158 | all_categories = list(all_categories) 159 | 160 | #print('len(lines_category)=',len(lines_category)) 161 | print('len(all_categories)=',len(all_categories)) 162 | 163 | def str_to_tensor(ss,input_length=None): 164 | ''' 165 | Converts a list of strings into a tensor of shape <max_length, len(ss),
len(vocabulary)>. 166 | This is used to convert text into a form suitable for input into a RNN/CNN. 167 | ''' 168 | max_length = max([len(s) for s in ss]) + 2 169 | if input_length: 170 | max_length = input_length 171 | tensor = torch.zeros(max_length, len(ss), len(vocabulary)).to(device) 172 | for j,s in enumerate(ss): 173 | s = BOL + s + EOL 174 | for i, letter in enumerate(s): 175 | if i<max_length: 176 | vocabulary_i = vocabulary.find(letter) 177 | if vocabulary_i==-1: 178 | vocabulary_i = vocabulary.find(OOV) 179 | tensor[i,j,vocabulary_i] = 1 180 | return tensor 181 | 182 | def str_to_tensor_bert(lines): 183 | max_length = 64 184 | encodings = [] 185 | for line in lines: 186 | encoding = tokenizer.encode_plus( 187 | line, 188 | #add_special_tokens = True, 189 | max_length = max_length, 190 | pad_to_max_length = True, 191 | return_attention_mask = True, 192 | return_tensors = 'pt', 193 | ) 194 | encodings.append(encoding) 195 | input_ids = torch.cat([ encoding['input_ids'] for encoding in encodings ],dim=0) 196 | attention_mask = torch.cat([ encoding['attention_mask'] for encoding in encodings ],dim=0) 197 | return input_ids,attention_mask 198 | 199 | # define the model 200 | input_size = len(vocabulary) 201 | if args.conditional_model: 202 | input_size += len(all_categories) 203 | 204 | class RNNModel(nn.Module): 205 | def __init__(self): 206 | super(RNNModel,self).__init__() 207 | if args.model=='rnn': 208 | mk_rnn = nn.RNN 209 | if args.model=='gru': 210 | mk_rnn = nn.GRU 211 | if args.model=='lstm': 212 | mk_rnn = nn.LSTM 213 | self.rnn = mk_rnn( 214 | input_size, 215 | args.hidden_layer_size, 216 | num_layers=args.num_layers, 217 | dropout=args.dropout 218 | ) 219 | self.fc_class = nn.Linear(args.hidden_layer_size,len(all_categories)) 220 | self.dropout = nn.Dropout(args.dropout) 221 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,len(vocabulary)) 222 | 223 | def forward(self, x): 224 | # out is 3rd order: < len(line) x batch size x hidden_layer_size > 225
| out,h_n = self.rnn(x) 226 | out = self.dropout(out) 227 | out_class = self.fc_class(out[out.shape[0]-1,:,:]) 228 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device) 229 | for i in range(out.shape[0]): 230 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:]) 231 | return out_class, out_nextchars 232 | 233 | class ResnetRNNModel(nn.Module): 234 | def __init__(self): 235 | super().__init__() 236 | if args.model=='rnn': 237 | mk_rnn = nn.RNN 238 | if args.model=='gru': 239 | mk_rnn = nn.GRU 240 | if args.model=='lstm': 241 | mk_rnn = nn.LSTM 242 | rnn_input_size = input_size 243 | self.rnns = [] 244 | for layer in range(args.num_layers): 245 | rnn = mk_rnn( 246 | rnn_input_size, 247 | args.hidden_layer_size, 248 | num_layers=1, 249 | ) 250 | self.add_module('rnn'+str(layer),rnn) 251 | self.rnns.append(rnn) 252 | rnn_input_size = args.hidden_layer_size 253 | self.fc_class = nn.Linear(args.hidden_layer_size,len(all_categories)) 254 | self.dropout = nn.Dropout(args.dropout) 255 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,len(vocabulary)) 256 | 257 | def forward(self, x): 258 | # out is 3rd order: < len(line) x batch size x hidden_layer_size > 259 | out = x 260 | for layer,rnn in enumerate(self.rnns): 261 | out_prev = out 262 | out,_ = rnn(out) 263 | if layer>0 and args.resnet: 264 | out = out + out_prev 265 | out = self.dropout(out) 266 | out_class = self.fc_class(out[out.shape[0]-1,:,:]) 267 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device) 268 | for i in range(out.shape[0]): 269 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:]) 270 | return out_class, out_nextchars 271 | 272 | 273 | class CNNModel(nn.Module): 274 | def __init__(self): 275 | super(CNNModel,self).__init__() 276 | self.relu = nn.ReLU() 277 | self.cnn = \ 278 | nn.Conv1d(input_size,args.hidden_layer_size,3,padding=1) 279 | self.cnns = (args.num_layers-1)*[ 280 | nn.Conv1d(args.hidden_layer_size,args.hidden_layer_size,3,padding=1)
            ]
        self.dropout = nn.Dropout(args.dropout)
        self.fc_class = nn.Linear(args.hidden_layer_size*args.input_length,len(all_categories))
        self.fc_nextchars = nn.Linear(args.hidden_layer_size,input_size)

    def forward(self,x):
        # reorder < length x batch x vocabulary > into < batch x vocabulary x length > for Conv1d
        out = torch.einsum('lbv->bvl',x)
        out = self.cnn(out)
        out = self.relu(out)
        for cnn in self.cnns:
            out = cnn(out)
            out = self.relu(out)
        out = self.dropout(out)
        out_class = out.view(args.batch_size,args.hidden_layer_size*args.input_length)
        out_class = self.fc_class(out_class)
        # reorder back to < length x batch x hidden_layer_size >
        out = torch.einsum('ijk->kij',out)
        out_nextchars = torch.zeros([out.shape[0],out.shape[1],input_size]).to(device)
        for i in range(out.shape[0]):
            out_nextchars[i,:,:] = self.fc_nextchars(out[i,:])
        return out_class, out_nextchars


model_name = 'bert-base-multilingual-uncased'
tokenizer = transformers.BertTokenizer.from_pretrained(model_name)
bert = transformers.BertModel.from_pretrained(model_name)
print('bert.config.vocab_size=',bert.config.vocab_size)
class BertFineTuning(nn.Module):
    def __init__(self):
        super().__init__()
        embedding_size = args.hidden_layer_size
        self.fc_class = nn.Linear(768,len(all_categories))

    def forward(self,x):
        input_ids, attention_mask = x
        last_layer,embedding = bert(input_ids)
        # average the token vectors to get one embedding per line
        embedding = torch.mean(last_layer,dim=1)
        out = self.fc_class(embedding)
        return out, None

# load the model
if args.model=='bert':
    model = BertFineTuning()
elif args.model=='cnn':
    model = CNNModel()
else:
    if args.resnet:
        model = ResnetRNNModel()
    else:
        model = RNNModel()
model.to(device)

if args.warm_start:
    print('warm starting model from',args.warm_start)
    model_dict = torch.load(os.path.join(args.warm_start,'model'))
    model.load_state_dict(model_dict['model_state_dict'])

# training
if args.train:

    # create log_dir
    log_dir = args.log_dir
    if log_dir is None:
        log_dir = os.path.join(args.log_dir_base,(
            'model='+args.model+
            '_hidden='+str(args.hidden_layer_size)+
            '_layers='+str(args.num_layers)+
            '_cond='+str(args.conditional_model)+
            '_resnet='+str(args.resnet)+
            '_lr='+str(args.learning_rate)+
            '_optim='+args.optimizer+
            '_clip='+str(args.gradient_clipping)+
            '_'+str(datetime.datetime.now())
            ))
    try:
        os.makedirs(log_dir)
        with open(os.path.join(log_dir,'args'), 'w') as f:
            f.write('\n'.join(sys.argv[1:]))
    except FileExistsError:
        print('cannot create log dir,',log_dir,'already exists')
        sys.exit(1)
    writer = SummaryWriter(log_dir=log_dir)

    # prepare model for training
    criterion = nn.CrossEntropyLoss()
    print('model.parameters()=',list(model.parameters()))
    if args.optimizer == 'sgd':
        optimizer = torch.optim.SGD(
            model.parameters(),
            lr=args.learning_rate,
            momentum=args.momentum,
            weight_decay=args.weight_decay
            )
    if args.optimizer == 'adam':
        optimizer = torch.optim.Adam(
            model.parameters(),
            lr=args.learning_rate,
            weight_decay=args.weight_decay
            )
    model.train()

    # training loop
    start_time = time.time()
    for step in range(1, args.samples + 1):

        # get random training example
        categories = []
        lines = []
        for i in range(args.batch_size):
            if args.sample_strategy == 'uniform_category':
                category = random.choice(all_categories)
                line = random.choice(category_lines[category])
            elif args.sample_strategy == 'uniform_line':
                line, category = random.choice(lines_category)

            categories.append(all_categories.index(category))
            lines.append(line)
        category_tensor = torch.tensor(categories, dtype=torch.long).to(device)

        if args.model=='bert':
            input_tensor = str_to_tensor_bert(lines)
        else:
            line_tensor = str_to_tensor(lines,args.input_length)

            if args.conditional_model:
                # append the one-hot category to the input at every time step
                category_onehot = torch.nn.functional.one_hot(category_tensor, len(all_categories)).float()
                category_onehot = torch.unsqueeze(category_onehot,0)
                category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0)
                input_tensor = torch.cat([line_tensor,category_onehot],dim=2)
            else:
                input_tensor = line_tensor

            input_tensor = input_tensor.to(device)
        category_tensor = category_tensor.to(device)

        # perform training step
        # clear gradients left over from the previous step
        optimizer.zero_grad()
        output_class,output_nextchars = model(input_tensor)
        loss_class = criterion(output_class, category_tensor)
        if args.model=='bert':
            loss_nextchars = torch.tensor(0.0)
        else:
            loss_nextchars_perchar = torch.zeros(output_nextchars.shape[0]).to(device)
            for i in range(output_nextchars.shape[0]-1):
                # the target at position i is the character at position i+1
                _, nextchar_i = line_tensor[i+1,:].topk(1)
                nextchar_i = nextchar_i.view([-1])
                loss_nextchars_perchar[i] = criterion(output_nextchars[i,:], nextchar_i)
            loss_nextchars = torch.mean(loss_nextchars_perchar)

        if args.conditional_model or args.disable_categories:
            loss = loss_nextchars
        else:
            loss = loss_class + loss_nextchars
        loss.backward()
        grad_norm = sum([ torch.norm(p.grad)**2 for p in model.parameters() if p.grad is not None])**(1/2)
        if args.gradient_clipping:
            torch.nn.utils.clip_grad_norm_(model.parameters(),1.0)
        optimizer.step()

        # log optimization information
        writer.add_scalar('train/loss_class', loss_class.item(), step)
        writer.add_scalar('train/loss_nextchars', loss_nextchars.item(), step)
        writer.add_scalar('train/loss', loss.item(), step)
        writer.add_scalar('train/grad_norm', grad_norm.item(), step)

        # get accuracy@k
        ks = [1, 5, 10, 20]
        k = max(ks)
        top_n, top_i = output_class.topk(k)
        category_tensor_k = torch.cat(k*[torch.unsqueeze(category_tensor,dim=1)],dim=1)
        accuracies = torch.where(
            top_i[:,:]==category_tensor_k,
            torch.ones([args.batch_size,k]).to(device),
            torch.zeros([args.batch_size,k]).to(device)
            )
        for k in ks:
            # an example counts as correct @k if the true category is among the top k guesses
            accuracies_k,_ = torch.max(accuracies[:,:k], dim=1)
            accuracy_k = torch.mean(accuracies_k).item()
            writer.add_scalar('accuracy/@'+str(k), accuracy_k, step)

        # print status update
        if step % args.print_every == 0:

            # get category from output
            top_n, top_i = output_class.topk(1)
            guess_i = top_i[-1].item()
            category_i = category_tensor[-1]
            guess = all_categories[guess_i]
            category = all_categories[category_i]

            # print results
            correct = '✓' if guess == category else '✗ (%s)' % category
            print('%d %d%% (%.2f sec) %.4f %s / %s %s' % (
                step,
                step / args.samples * 100,
                time.time()-start_time,
                loss,
                line,
                guess,
                correct
                ))

        # save model
        if step%args.save_every == 0 or step==args.samples:
            print('saving model checkpoint')
            torch.save({
                    'step':step,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    'loss':loss
                }, os.path.join(log_dir,'model'))

# infer
def infer(line):
    line = line.strip()
    if args.case_insensitive:
        line = line.lower()
    line_tensor = str_to_tensor([line],args.input_length)
    output_class,output_nextchars = model(line_tensor)
    probs = softmax(output_class)
    k=20
    top_n, top_i = probs.topk(k)
    print('line=',line)
    for i in range(k):
        guess = all_categories[top_i[0,i].item()]
        print(' ',i,guess, '(%0.2f)'%top_n[0,i].item())
    if args.infer_path is not None:
        # find the first unused output file index
        i = 0
        while os.path.exists(os.path.join(args.infer_path,"line%s.char.png" % str(i).zfill(4))):
            i += 1
        path_base = os.path.join(args.infer_path,'line'+str(i).zfill(4))
        print('path_base=',path_base)
        explain(line, path_base+'.char.png', 'char')
        explain(line, path_base+'.word.png', 'word')

def explain(line,filename,explain_type):
    scores = torch.zeros([len(line)])
    scores[0]=5
    scores[1]=4
    scores[2]=3
    scores[3]=2
    scores[4]=1
    line2img(line,scores,filename)


def line2img(
        line,
        scores,
        filename,
        maxwidth=40,
        img_width=800
        ):
    '''
    Outputs an image containing text with green/red background highlights to indicate the importance of words in the text.

    Arguments:
        line (str): the text that should be printed
        scores (Tensor): a vector of size len(line), where each index contains the "weight" of the corresponding letter in the line string; positive values will be colored green, and negative values red.
        filename (str): the name of the output file
    '''
    import matplotlib
    import matplotlib.colors as colors
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    import numpy as np
    import math

    # lay the scores out on a grid with maxwidth characters per row
    im_height=1+len(line)//maxwidth
    im=np.zeros([maxwidth,im_height])
    for i in range(scores.shape[0]):
        im[i%maxwidth,im_height-i//maxwidth-1] = scores[i]

    cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red","white","green"])
    scores_max=torch.max(scores)
    norm=plt.Normalize(-scores_max,scores_max)

    dpi=96
    fig, ax = plt.subplots(figsize=(img_width/dpi, 300/dpi), dpi=dpi)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.set_xlim(-0.5,-0.5+maxwidth)
    ax.set_ylim(-0.5, 0.5+i//maxwidth)
    ax.imshow(im.transpose(),cmap=cmap,norm=norm)
    for i,c in enumerate(line):
        ax.text(i%maxwidth-0.25,im_height-i//maxwidth-0.25-1,c,fontsize=12)
    plt.tight_layout()
    plt.savefig(filename,bbox_inches='tight')



model.eval()
softmax = torch.nn.Softmax(dim=1)
if args.infer:
    for line in sys.stdin:
        infer(line)

if args.generate:
    import random
    line = ''
    for i in range(args.max_sample_length):
        line_tensor = str_to_tensor([line],args.input_length)
        if args.conditional_model:
            category_onehot = torch.zeros([line_tensor.shape[1], len(all_categories)]).to(device)
            for category in args.category:
                category_i = all_categories.index(category)
                category_onehot[0, category_i] = 1
            category_onehot = torch.unsqueeze(category_onehot,0)
            category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0)
            input_tensor = torch.cat([line_tensor,category_onehot],dim=2)
        else:
            input_tensor = line_tensor
        _,output_nextchars = model(input_tensor)
        # 3rd order tensor < len(line) x batch_size x len(vocabulary) >
        # the logits are scaled by args.temperature before the softmax
        probs = softmax(args.temperature*output_nextchars[i,:,:])
        dist = torch.distributions.categorical.Categorical(probs)
        nextchar_i = dist.sample()
        nextchar = vocabulary[nextchar_i]
        if nextchar == EOL:
            break
        if nextchar == OOV:
            nextchar='~'
        line += nextchar
    if args.conditional_model:
        print('name=',line)
    else:
        infer(line)

--------------------------------------------------------------------------------
/project/transformers_tutorial.py:
--------------------------------------------------------------------------------
#!/usr/bin/python3

# these lines prevent lots of warnings from being displayed
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# import the deep learning libraries
import torch
import transformers

# load the model
#checkpoint_name = 'bert-base-uncased'
checkpoint_name = 'bert-base-multilingual-uncased'
tokenizer = transformers.BertTokenizer.from_pretrained(checkpoint_name)
bert = transformers.BertModel.from_pretrained(checkpoint_name)

# sample data
lines = [
    'The coronavirus pandemic has taken over the world.', # English
    'La pandemia de coronavirus se ha apoderado del mundo.', # Spanish
    'La pandemia di coronavirus ha conquistato il mondo.', # Italian
    'Capta est coronavirus pandemic in orbe terrarum.', # Latin
    'đại dịch coronavirus đã chiếm lĩnh thế giới.', # Vietnamese
    'пандемия коронавируса захватила мир.', # Russian
    'سيطر وباء الفيروس التاجي على العالم.', # Arabic
    'מגיפת הנגיף השתלט על העולם.', # Hebrew
    '코로나 바이러스 전염병이 세계를 점령했습니다.', # Korean
    '冠狀病毒大流行已席捲全球。', # Chinese (traditional)
    '冠状病毒大流行已经席卷全球。', # Chinese (simplified)
    'コロナウイルスのパンデミックが世界を席巻しました。', # Japanese
    ]

for line in lines:
    tokens = tokenizer.tokenize(line)
    print("tokens=",tokens)
#crash  # uncomment to stop the script here (an undefined name raises NameError)

# generates token id encodings of the lines, padded to max_length
max_length = 64
encodings = []
#lines = lines[0:1]
for line in lines:
    encoding = tokenizer.encode_plus(
        line,
        #add_special_tokens = True,
        max_length = max_length,
        pad_to_max_length = True,
        #return_attention_mask = True,
        return_tensors = 'pt',
        )
    #print("encoding.keys()=",encoding.keys())
    #print("encoding['input_ids'].shape=",encoding['input_ids'].shape)
    #print("encoding['input_ids']=",encoding['input_ids'])
    encodings.append(encoding)

input_ids = torch.cat([encoding['input_ids'] for encoding in encodings ],dim=0)
#attention_mask = torch.cat([ encoding['attention_mask'] for encoding in encodings ],dim=0)

# run the model several times, printing a timestamp before each run
import datetime
for i in range(10):
    print(datetime.datetime.now())
    last_layer,embedding = bert(input_ids) #, attention_mask)
    print("last_layer.shape=",last_layer.shape)
    print("embedding.shape=",embedding.shape)
#crash  # uncomment to stop the script here (an undefined name raises NameError)


# NOTE: num_classes must be defined before this class is instantiated
class BertFineTuning(torch.nn.Module):
    def __init__(self):
        super().__init__()
        #self.bert = transformers.BertModel.from_pretrained(checkpoint_name)
        self.fc = torch.nn.Linear(768,num_classes)

    def forward(self,x):
        #last_layer,embedding = self.bert(x)
        last_layer,embedding = bert(x)
        embedding = torch.mean(last_layer,dim=1)
        out = self.fc(embedding)
        return out
--------------------------------------------------------------------------------
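The tutorial above pads every line to max_length and then averages last_layer over all positions, so padding positions contribute to the embedding. One possible refinement is to average only over real tokens using the attention mask. The sketch below shows the arithmetic with plain Python lists so that it runs without torch. The function name masked_mean and the list-based inputs are hypothetical and not part of the repository. With tensors, the same idea is usually written as a masked sum divided by the number of unmasked positions.

```python
def masked_mean(last_layer, attention_mask):
    """Average token vectors per sequence, skipping padded positions.

    last_layer:     list of sequences, each a list of token vectors (lists of floats)
    attention_mask: list of sequences of 0/1 flags (1 = real token, 0 = padding)
    """
    pooled = []
    for vectors, mask in zip(last_layer, attention_mask):
        dim = len(vectors[0])
        total = [0.0] * dim
        count = 0
        for vector, keep in zip(vectors, mask):
            if keep:
                count += 1
                for d in range(dim):
                    total[d] += vector[d]
        # guard against a row that contains only padding
        count = max(count, 1)
        pooled.append([t / count for t in total])
    return pooled


# toy batch: one sequence with three positions, the last of which is padding
last_layer = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
attention_mask = [[1, 1, 0]]
print(masked_mean(last_layer, attention_mask))  # prints [[2.0, 3.0]]
```

The padded vector [100.0, 100.0] is ignored, so the result is the mean of the two real token vectors only. An unmasked mean over all three positions would be dominated by the padding vector.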