├── .gitignore
├── README.md
├── hw1
│   ├── README.md
│   └── rosenbrock.py
├── hw2
│   ├── README.md
│   ├── classifier.py
│   └── run_classifier.sh
├── hw3
│   └── README.md
├── hw4
│   ├── README.md
│   └── resnet.py
├── hw5
│   ├── README.md
│   ├── cmc1.png
│   ├── cmc2.png
│   ├── mike1.png
│   ├── mike2.png
│   ├── mike3.png
│   ├── mike4.png
│   └── mike5.png
├── hw6
│   ├── README.md
│   ├── img
│   │   └── xkcd-training.png
│   ├── names.py
│   ├── names.tar
│   ├── names.tar.gz
│   ├── names.zip
│   └── names
│       ├── Arabic.txt
│       ├── Chinese.txt
│       ├── Czech.txt
│       ├── Dutch.txt
│       ├── English.txt
│       ├── French.txt
│       ├── German.txt
│       ├── Greek.txt
│       ├── Irish.txt
│       ├── Italian.txt
│       ├── Japanese.txt
│       ├── Korean.txt
│       ├── Polish.txt
│       ├── Portuguese.txt
│       ├── Russian.txt
│       ├── Scottish.txt
│       ├── Spanish.txt
│       └── Vietnamese.txt
├── hw7
│   ├── README.md
│   └── names.py
├── img
│   └── layers.png
├── lecture_notes
│   ├── ad.py
│   ├── ad2.py
│   └── einsum.py
└── project
    ├── README.md
    ├── img
    │   ├── line0000.char.png
    │   ├── line0000.word.png
    │   ├── line0001.char.png
    │   ├── line0001.word.png
    │   ├── line0002.char.png
    │   ├── line0002.word.png
    │   ├── line0003.char.png
    │   └── line0003.word.png
    ├── names.py
    ├── names_embedding.py
    ├── names_transformers.py
    └── transformers_tutorial.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.swp
2 | *.swo
3 | __pycache__
4 | notes
5 | data
6 | names
7 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CSCI181: Deep Learning
2 |
3 | ## About the Instructor
4 |
5 | |||
6 | |-|-|
7 | | Name | Mike Izbicki (call me Mike) |
8 | | Email | mizbicki@cmc.edu |
9 | | Office | Adams 216 |
10 | | Office Hours | Monday 9:00-10:00AM, Tuesday/Thursday 2:30-3:30PM, or by appointment ([see my schedule](https://izbicki.me/schedule.html)); if my door is open, feel free to come in |
11 | | Webpage | [izbicki.me](https://izbicki.me) |
12 | | Research | Machine Learning (see [izbicki.me/research.html](https://izbicki.me/research.html) for some past projects) |
13 | | Fun Facts | grew up in San Clemente, 7 years in the navy, phd/postdoc at UC Riverside, taught in [DPRK](https://pust.co) |
14 |
15 | ## About the Course
16 |
17 | This is a course on **deep learning** (not big data).
18 |
19 |
20 |
21 |
22 |
23 | **Course Objectives:**
24 |
25 | Learning objectives:
26 |
27 | 1. Write basic PyTorch applications
28 | 1. Understand the "classic" deep network architectures
29 | 1. Use existing models in a "reasonable" way
30 | 1. Understand the limitations of deep learning
31 | 1. Read research papers published in deep learning
32 | 1. Understand what graduate school in machine learning is like
33 | 1. (Joke) [Understand that Schmidhuber invented machine learning](https://www.reddit.com/r/MachineLearning/comments/eivtmq/d_nominate_jurgen_schmidhuber_for_the_2020_turing/)
34 |
35 | My personal goal:
36 |
37 | 1. Find students to conduct research with me
38 |
39 | **Expected Background:**
40 |
41 | Necessary:
42 |
43 | 1. Basic Python programming
44 | 1. Linear algebra
45 | 1. Calc III
46 | 1. Statistics
47 |
48 | Good to have:
49 |
50 | 1. Machine learning / data mining
51 | 1. Lots of math
52 | 1. Familiarity with Unix and github
53 |
54 | **Resources:**
55 |
56 | Textbook:
57 |
58 | 1. [The Deep Learning Book](http://www.deeplearningbook.org/), by Ian Goodfellow and Yoshua Bengio and Aaron Courville; I will assume that you already know all of Part I of this book (basically the equivalent of a data mining/machine learning course)
59 | 1. Various papers/webpages as listed below
60 |
61 | Deep learning examples:
62 |
63 | 1. Images / Video
64 |     1. [Deoldify](https://github.com/jantic/DeOldify)
65 |     1. [style transfer](https://genekogan.com/works/style-transfer/)
66 |     1. [more style transfer](https://github.com/lengstrom/fast-style-transfer)
67 |     1. [dance choreography](https://experiments.withgoogle.com/billtjonesai)
68 |     1. [StyleGAN](https://github.com/NVlabs/stylegan)
69 |     1. [DeepPrivacy](https://github.com/hukkelas/DeepPrivacy)
70 |     1. https://thispersondoesnotexist.com/
71 |     1. https://thiscatdoesnotexist.com/
72 |     1. [Deep fakes](https://www.creativebloq.com/features/deepfake-examples)
73 |     1. [In Event of Moon Disaster](https://www.wbur.org/news/2019/11/22/mit-nixon-deep-fake)
74 |
75 | 1. Text
76 |     1. [Image captioning](https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/)
77 |     1. [AI Dungeon](https://www.aidungeon.io/)
78 |     1. https://www.thisstorydoesnotexist.com/
79 |     1. https://translate.google.com
80 |
81 | 1. Games
82 |     1. [AlphaGo](https://deepmind.com/research/case-studies/alphago-the-story-so-far)
83 |     1. [Dota 2](https://openai.com/projects/five/)
84 |     1. [StarCraft 2](https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning)
85 |     1. [Mario Kart](https://www.youtube.com/watch?v=Ipi40cb_RsI)
86 |     1. [Mario](https://www.youtube.com/watch?v=qv6UVOQ0F44)
87 |
88 | 1. Other
89 |     1. [iSketchNFill](https://github.com/arnabgho/iSketchNFill)
90 |     1. [scrying-pen](https://experiments.withgoogle.com/scrying-pen)
91 |     1. [Tacotron](https://google.github.io/tacotron/publications/speaker_adaptation/)
93 | The good:
94 |
95 | 1. [most influential research in 2019 is deep learning papers](https://www.altmetric.com/top100/2019/)
96 | 1. [/r/machinelearning](https://reddit.com/r/machinelearning)
97 | 1. [recent open source AI programs](https://www.reddit.com/r/MachineLearning/comments/egyp7w/d_what_is_your_favorite_opensource_project_of/)
98 | 1. [The state of jobs in deep learning](https://www.reddit.com/r/MachineLearning/comments/egt6dp/d_are_decent_machine_learning_graduates_having_a/)
99 | 1. [The decade in review](https://leogao.dev/2019/12/31/The-Decade-of-Deep-Learning/)
100 |
101 | The bad:
102 |
103 | 1. [Machine learning reproducibility crisis](https://www.wired.com/story/artificial-intelligence-confronts-reproducibility-crisis/)
104 | 1. [Logistic regression vs deep learning in aftershock prediction](https://www.reddit.com/r/MachineLearning/comments/dcy2ar/r_one_neuron_versus_deep_learning_in_aftershock/)
105 | 1. [Pictures of black people](https://news.ycombinator.com/item?id=21147916)
106 | 1. [NLP Clever Hans BERT](https://thegradient.pub/nlps-clever-hans-moment-has-arrived/)
107 | 1. [Ex-Baidu researcher denies cheating at machine learning competition](https://www.enterpriseai.news/2015/06/12/baidu-fires-deep-images-ren-wu/)
108 |
109 | Computing resources:
110 |
111 | 1. [Google Colab](https://colab.research.google.com/notebooks/welcome.ipynb) provides 12 hours of free GPUs in a Jupyter notebook
112 | 1. [Kaggle](https://forums.fast.ai/t/kaggle-kernels-now-support-gpu-for-free/16217) provides 30 hours of free GPU
113 | 1. I have a 40CPU/8GPU machine that you can access for the course
114 | 1. I have another 4CPU/1GPU machine that needs someone to set it up
115 |
116 | Videos:
117 |
118 | 1. [3blue1brown](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)
119 | 1. [2 minute papers](https://www.youtube.com/user/keeroyz)
120 | 1. [arxiv insights](https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg)
121 |
122 |
126 |
127 | ## Schedule
128 |
129 | | Week | Date | Topic |
130 | | ---- | ------------ | -------------------------------------- |
131 | | 1 | Tues, 21 Jan | Intro: Examples of Deep Learning |
132 | | 1 | Thur, 23 Jan | Automatic differentiation
- [pytorch tutorial part 1](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html)
- [pytorch tutorial part 2](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)
- [automatic differentiation tutorial](https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation)
- [einstein summation tutorial](https://rockt.github.io/2018/04/30/einsum)
- [NeurIPS paper](http://papers.nips.cc/paper/8092-automatic-differentiation-in-ml-where-we-are-and-where-we-should-be-going)
- [JMLR paper](http://www.jmlr.org/papers/v18/17-468.html)
- [pytorch: forward mode ad](https://github.com/pytorch/pytorch/issues/10223)
- [tensorflow: forward mode ad](https://github.com/pytorch/pytorch/issues/10223)
|
133 | | 2 | Tues, 28 Jan | Machine Learning Basics (Deep Learning Book Part 1, especially chapters 5.2-5.4) |
134 | | 2 | Thur, 30 Jan | Optimization
- [why momentum really works](https://distill.pub/2017/momentum/)
- [Leon Bottou's SGD paper](https://datajobs.com/data-science-repo/Stochastic-Gradient-Descent-[Leon-Bottou].pdf)
- [pytorch loss functions](https://pytorch.org/docs/stable/nn.html#crossentropyloss)
- [reflections on random kitchen sinks](http://www.argmin.net/2017/12/05/kitchen-sinks/)
- [Ali Rahimi's NIPS/NeurIPS 2017 keynote](https://www.youtube.com/watch?v=Qi1Yry33TQE)
- [OpenAI switches to PyTorch](https://openai.com/blog/openai-pytorch/)
|
135 | | 3 | Tues, 04 Feb | Image: CNNs
- [Stanford lecture slides](http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture05.pdf)
|
136 | | 3 | Thur, 06 Feb | Image: CNNs II
- [An intuitive explanation of CNNs](https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/) (compare with [eigenfaces](https://towardsdatascience.com/eigenfaces-recovering-humans-from-ghosts-17606c328184))
- [The history of neural networks](https://dataconomy.com/2017/04/history-neural-networks/)
[Summer Research](https://www.cmc.edu/summer-research/program-overview) |
137 | | 4 | Tues, 11 Feb | Regularization |
138 | | 4 | Thur, 13 Feb | Image: ResNet
- [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
- [An Overview of ResNet and its Variants](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035)
- [CVPR2016 Video](https://www.youtube.com/watch?v=C6tLw-rPQ2o)
More links:
- [Schmidhuber on ResNet I](http://people.idsia.ch/~juergen/microsoft-wins-imagenet-through-feedforward-LSTM-without-gates.html)
- [Schmidhuber on ResNet II](http://people.idsia.ch/~juergen/highway-networks.html)
- [Baidu scandal at ILSVRC15](https://web.archive.org/web/20150602165531/http://www.image-net.org/challenges/LSVRC/announcement-June-2-2015)
- [What I learned from competing against a convnet on ImageNet](http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/)
- [MSCOCO](http://cocodataset.org)
|
139 | | 5 | Tues, 18 Feb | ResNet continued |
140 | | 5 | Thur, 20 Feb | ResNet continued
- [DenseNet](https://arxiv.org/abs/1608.06993)
- Visualizing the Loss Landscape of Neural Nets ([OpenReview](https://openreview.net/forum?id=HkmaTz-0W), [NIPS/NeurIPS](http://papers.nips.cc/paper/7875-visualizing-the-loss-landscape-of-neural-nets))
|
141 | | 6 | Tues, 25 Feb | YOLO
- [YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767)
- [YOLO9000: Better, Faster, Stronger](https://arxiv.org/abs/1612.08242) and [reviews](https://pjreddie.com/publications/yolo9000/)
- [You Only Look Once: Unified Real-Time Object Detection](http://arxiv.org/abs/1506.02640) and [reviews](https://pjreddie.com/publications/yolo/)
- [YOLO video example](https://www.youtube.com/watch?v=MPU2HistivI)
- [YOLO video presentation](https://www.youtube.com/watch?v=NM6lrxy0bxs&feature=youtu.be)
- [Joseph Redmon's CV](https://pjreddie.com/static/Redmon%20Resume.pdf)
- [Ethical concerns and YOLO](https://medium.com/syncedreview/yolo-creator-says-he-stopped-cv-research-due-to-ethical-concerns-b55a291ebb29)
The MvMF loss for geolocation:
- [ECML-PKDD paper](https://izbicki.me/public/papers/ecmlpkdd2019-image-geolocation.pdf)
|
142 | | 6 | Thur, 27 Feb | Text: Basic text models
- [bag of words]()
- [tf-idf](http://www.tfidf.com/)
- [n-grams](https://en.wikipedia.org/wiki/N-gram)
- [zipf's law](https://en.wikipedia.org/wiki/Zipf%27s_law)
- [hashing trick](https://booking.ai/dont-be-tricked-by-the-hashing-trick-192a6aae3087)
Python text processing libraries:
- [spacy](https://spacy.io/)
- [neuralcoref](https://github.com/huggingface/neuralcoref)
- [NLTK](https://www.nltk.org/)
- [TextBlob](https://textblob.readthedocs.io/en/dev/)
- [textstat](https://pypi.org/project/textstat/)
|
143 | | 7 | Tues, 03 Mar | Text: CNNs
- [character-level convolutional networks for text classification](http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classifica)
- [very deep convolutional networks for text classification](https://arxiv.org/abs/1606.01781) (uses resnets internally)
Text: RNNs
- [RNN vs GRU vs LSTM](https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57)
|
144 | | 7 | Thur, 05 Mar | Text: Lab exercise |
145 | | 8 | Tues, 10 Mar | Text: Seq2seq
- [the unreasonable effectiveness of RNNs](https://karpathy.github.io/2015/05/21/rnn-effectiveness/)
- [what is temperature?](https://cs.stackexchange.com/questions/79241/what-is-temperature-in-lstm-and-neural-networks-generally)
- [sampling strategies in pictures](https://medium.com/machine-learning-at-petiteprogrammer/sampling-strategies-for-recurrent-neural-networks-9aea02a6616f)
- [automatic image captioning](https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/)
|
146 | | 8 | Thur, 12 Mar | Text: Attention |
147 | | 9 | Tues, 17 Mar | **NO CLASS:** Spring Break |
148 | | 9 | Thur, 19 Mar | **NO CLASS:** Spring Break |
149 | | 10 | Tues, 24 Mar | Text: Transformers (paper, [blog post](http://jalammar.github.io/illustrated-transformer/)) |
150 | | 10 | Thur, 26 Mar | TBD |
151 | | 11 | Tues, 31 Mar | TBD |
152 | | 11 | Thur, 02 Apr | TBD |
153 | | 12 | Tues, 07 Apr | TBD |
154 | | 12 | Thur, 09 Apr | TBD |
155 | | 13 | Tues, 14 Apr | TBD |
156 | | 13 | Thur, 16 Apr | TBD |
157 | | 14 | Tues, 21 Apr | TBD |
158 | | 14 | Thur, 23 Apr | TBD |
159 | | 15 | Tues, 28 Apr | TBD |
160 | | 15 | Thur, 30 Apr | Project Presentations |
161 | | 16 | Tues, 05 May | Project Presentations |
162 | | 16 | Thur, 07 May | **NO CLASS:** Reading Day |
163 |
164 |
167 |
168 |
172 |
173 |
174 |
175 |
176 | ### Assignments
177 |
178 | | Week | Weight | Topic |
179 | | ---- | ------ | ------------------------------- |
180 | | 2 | 10 | Rosenbrock Function |
181 | | 3 | 10 | Crossentropy Loss |
182 | | 4 | 10 | CNN |
183 | | 6 | 10 | Image Transfer Learning |
184 | | 7 | 10 | RNN |
185 | | 10 | 10 | Text Transfer Learning |
186 | | -- | 10 | Reading |
187 | | 15 | 30 | Project |
188 |
189 | There are no exams in this course.
190 |
191 | **Late Work Policy:**
192 |
193 | You lose 10% on the assignment for each day late.
194 | If you have extenuating circumstances, contact me in advance of the due date and I may extend the due date for you.
195 |
196 | **Collaboration Policy:**
197 |
198 | You are encouraged to work together with other students on all assignments and use any online resources.
199 | Learning the course material is your responsibility,
200 | and so do whatever collaboration will help you learn the material.
201 |
202 |
246 |
247 |
258 |
259 | ## Accommodations for Disabilities
260 |
261 | I want you to succeed and I'll make every effort to ensure that you can.
262 | If you need any accommodations, please ask.
263 |
264 | If you have already established accommodations with Disability Services at CMC, please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course. You can start this conversation by forwarding me your accommodation letter. If you have not yet established accommodations through Disability Services, but have a temporary health condition or permanent disability (conditions include but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health), you are encouraged to contact Assistant Dean for Disability Services & Academic Success, Kari Rood, at disabilityservices@cmc.edu to ask questions and/or begin the process. General information and the Request for Accommodations form can be found at the CMC DOS Disability Service’s website. Please note that arrangements must be made with advance notice in order to access the reasonable accommodations. You are able to request accommodations from CMC Disability Services at any point in the semester. Be mindful that this process may take some time to complete and accommodations are not retroactive. It is important to Claremont McKenna College to create inclusive and accessible learning environments consistent with federal and state law. If you are not a CMC student, please connect with the Disability Services Coordinator on your campus regarding a similar process.
265 |
266 |
--------------------------------------------------------------------------------
/hw1/README.md:
--------------------------------------------------------------------------------
1 | # Intro to pytorch
2 |
3 | **Due:** Tuesday, 28 Jan at midnight
4 |
5 | ## Tasks
6 |
7 | You are required to complete the following tasks:
8 |
9 | 1. Install pytorch
10 | 1. Modify the `rosenbrock.py` file so that it calculates the minimum of the Rosenbrock function
11 | 1. Upload your completed code to Sakai
12 |
13 | ### Optional
14 |
15 | You are not required to complete the following tasks,
16 | however they are good exercises to get you familiar with pytorch.
17 |
18 | 1. Modify the `rosenbrock` function so that instead of taking two scalar variables as input, it takes a 2-dimensional vector as input.
19 | You will also have to update the optimization code to handle the new type of input.
20 |
21 | 1. Extend the `rosenbrock` function to arbitrary dimensions using Wikipedia's high-dimensional Rosenbrock function: https://en.wikipedia.org/wiki/Rosenbrock_function
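
As a rough illustration of the last optional task, the sketch below implements the sum form of the high-dimensional Rosenbrock function from the Wikipedia page and minimizes it with plain gradient descent. The name `rosenbrock_nd`, the constants `a=1` and `b=100` (Wikipedia's usual choice, not the `a=2`, `b=4` used in `rosenbrock.py`), the step size, and the iteration count are all assumptions made for this example.

```python
import torch

def rosenbrock_nd(x, a=1.0, b=100.0):
    # sum over i of b*(x[i+1] - x[i]^2)^2 + (a - x[i])^2
    # (hypothetical helper; constants follow Wikipedia's convention)
    return torch.sum(b * (x[1:] - x[:-1]**2)**2 + (a - x[:-1])**2)

# plain gradient descent on a 4-dimensional input
alpha = 0.001
x = torch.zeros(4, requires_grad=True)
for i in range(2000):
    z = rosenbrock_nd(x)
    z.backward()
    with torch.no_grad():
        x -= alpha * x.grad   # update the leaf tensor in place
    x.grad.zero_()            # gradients accumulate, so reset them each step

print('x=', x, 'f(x)=', rosenbrock_nd(x).item())
```

With `a=1` the function is zero when every coordinate equals 1, which gives a simple check on the implementation.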
22 |
23 | ## Submission
24 |
25 | Upload your `rosenbrock.py` file to sakai
26 |
--------------------------------------------------------------------------------
/hw1/rosenbrock.py:
--------------------------------------------------------------------------------
'''
The Rosenbrock test function is a common "banana-shaped" function used to test how well optimization routines work.
See: https://en.wikipedia.org/wiki/Rosenbrock_function
'''
import torch

def rosenbrock(x,y):
    a = 2
    b = 4
    return (a-x)**2 + b*(y-x**2)**2

def rosenbrock_mod(x):
    # same function, but takes a 2-dimensional vector as input
    a = 2
    b = 4
    return (a-x[0])**2 + b*(x[1]-x[0]**2)**2

# gradient descent on the vector version of the function
alpha = 0.01
x = torch.tensor([0.0,0.0],requires_grad=True)
for i in range(5000):
    print('i=',i,'x=',x)
    z = rosenbrock_mod(x)
    z.backward()
    # update x in place so that it stays a leaf tensor
    with torch.no_grad():
        x -= alpha * x.grad
    # gradients accumulate across calls to backward, so reset them
    x.grad.zero_()
--------------------------------------------------------------------------------
/hw2/README.md:
--------------------------------------------------------------------------------
1 | # Intro to pytorch
2 |
3 | **Due:** Tuesday, 4 Feb at midnight
4 |
5 | ## Required tasks
6 |
7 | Modify the `classifier.py` file as follows:
8 |
9 | 1. Adjust the training procedure so that the test set is evaluated at the end of every epoch.
10 |
11 | 1. Implement three new models: the factorized linear model, the random feature model, and the 1 hidden layer neural network model.
12 | You should add a command line argument `--model` which takes one of four options
13 | (`linear`, `factorized_linear`, `random_feature`, and `nn`)
14 | and another argument `--size` which takes an integer argument and controls the number of random features or the size of the hidden layer, depending on the model.
15 | This will require modifying the `define the model` and the `optimization` sections of the homework file.
16 |
17 | 1. Add a command line option to use the MNIST dataset instead of CIFAR10 for training and testing.
18 | (This will require changing code in both the `load dataset` and the `define the model` sections of code.)
19 | Torchvision has many other datasets as well (see https://pytorch.org/docs/stable/torchvision/datasets.html), and you can add these datasets too.
20 |
21 | You should experiment with different values of `--alpha`, `--epochs`, `--batch_size`, `--model`, and `--size` to see how they affect your training time and the resulting accuracy of your models.
22 | Try to find the combination that results in the best training accuracy.
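
To make the difference between the random feature model and the 1 hidden layer neural network concrete, here is a minimal sketch. It assumes that the random feature model keeps its first layer fixed at its random initialization and trains only the output layer, which matches how `classifier.py` in this repository skips the update of `u` for its `kitchen_sink` model. The shapes and variable names are hypothetical, and random tensors stand in for a real dataset.

```python
import torch

torch.manual_seed(0)
batch, d_in, size, d_out = 16, 784, 32, 10   # hypothetical sizes (flattened 28x28 input)

u = torch.randn(d_in, size)                       # random projection, never trained
v = torch.randn(size, d_out, requires_grad=True)  # trained output layer

def random_feature(x):
    # same forward pass as a 1 hidden layer network; only v receives gradients
    return torch.relu(x @ u) @ v

x = torch.randn(batch, d_in)                 # stand-in for a batch of images
labels = torch.randint(0, d_out, (batch,))   # stand-in for the labels
loss = torch.nn.functional.cross_entropy(random_feature(x), labels)
loss.backward()

print('v.grad.shape=', v.grad.shape)
print('u.grad=', u.grad)
```

Setting `requires_grad=True` on `u` as well, and updating it in the optimization loop, would turn this sketch into the 1 hidden layer neural network.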
23 |
24 | ## Recommended tasks
25 |
26 | You are not required to complete the following tasks,
27 | however they are good exercises to get you familiar with pytorch.
28 |
29 | 1. Currently, the print statement of the inner loop of the optimization prints the loss of a single batch of data.
30 | Because this is only a single batch of data, the loss value is highly noisy, and it is difficult to tell if the model is converging.
31 | The [exponential moving average](https://en.wikipedia.org/wiki/Moving_average) is a good way to smooth these values,
32 | and machine learning practitioners typically use this technique to smooth the training loss and measure convergence.
33 | Implement this technique in your `classifier.py` file.
34 |
35 | 1. Make the optimization use SGD with momentum.
36 | Add a command line flag that controls the strength of the momentum,
37 | and experiment to find a good momentum value.
38 | (Beta = 0.9 is often used.)
39 |
40 | 1. Add a "deep" neural network as one of the possible classifiers that has more than 1 hidden layer.
41 | Make the number of layers and the size of each layer a parameter on the command line.
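
The exponential moving average from the first recommended task can be sketched in a few lines of plain Python. The function name `ema_smooth` and the parameter name `beta` are assumptions for this example; the first value seeds the average so that the early estimates are not biased toward zero.

```python
def ema_smooth(values, beta=0.99):
    # hypothetical helper: returns the running exponential moving average
    smoothed = []
    avg = None
    for value in values:
        if avg is None:
            avg = value                           # seed with the first observation
        else:
            avg = beta * avg + (1 - beta) * value
        smoothed.append(avg)
    return smoothed

noisy_losses = [2.0, 0.5, 1.8, 0.4, 1.6, 0.3]   # made-up per-batch losses
print(ema_smooth(noisy_losses, beta=0.5))
```

Values of `beta` closer to 1 give a smoother curve that reacts more slowly to changes in the loss.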
42 |
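
For the momentum task, one common formulation keeps a velocity tensor per parameter and folds each new gradient into it. The sketch below assumes that convention (velocity = beta * velocity + gradient, then parameter -= alpha * velocity); the helper name `sgd_momentum_step` and the toy quadratic loss are hypothetical and only serve to show the update.

```python
import torch

def sgd_momentum_step(params, velocities, alpha, beta):
    # hypothetical helper: one SGD-with-momentum update, done in place
    with torch.no_grad():
        for p, vel in zip(params, velocities):
            vel.mul_(beta).add_(p.grad)   # vel = beta*vel + grad
            p -= alpha * vel
            p.grad.zero_()                # reset the accumulated gradient

w = torch.tensor([5.0, -3.0], requires_grad=True)
velocities = [torch.zeros_like(w)]
for step in range(200):
    loss = (w**2).sum()                   # toy loss with its minimum at the origin
    loss.backward()
    sgd_momentum_step([w], velocities, alpha=0.01, beta=0.9)

print('w=', w)
```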
43 | ## Submission
44 |
45 | Upload your `classifier.py` file to sakai
46 |
47 |
--------------------------------------------------------------------------------
/hw2/classifier.py:
--------------------------------------------------------------------------------
#!/bin/python3
'''
Here are some results of running this code:

> python3 classifier.py --dataset=mnist --model=linear
test set accuracy = 0.9126833333333333
> python3 classifier.py --dataset=mnist --model=factorized_linear
test set accuracy = 0.8846833333333334
> python3 classifier.py --dataset=mnist --model=neural_network --size=256
test set accuracy = 0.92685
> python3 classifier.py --dataset=mnist --model=kitchen_sink --size=256
test set accuracy = 0.8658333333333333
'''

# process command line args
import argparse
parser = argparse.ArgumentParser()

parser_model = parser.add_argument_group('model options')
parser_model.add_argument('--model',choices=['linear','factorized_linear','kitchen_sink','neural_network'],default='linear')
parser_model.add_argument('--size',type=int,default=32)

parser_data = parser.add_argument_group('data options')
parser_data.add_argument('--dataset',choices=['mnist','cifar10'])

parser_opt = parser.add_argument_group('optimization options')
parser_opt.add_argument('--seed',type=int)
parser_opt.add_argument('--batch_size',type=int,default=16)
parser_opt.add_argument('--alpha',type=float,default=0.01)
parser_opt.add_argument('--epochs',type=int,default=10)

parser_debug = parser.add_argument_group('debug options')
parser_debug.add_argument('--show_image',action='store_true')
parser_debug.add_argument('--print_step',type=int,default=1000)
parser_debug.add_argument('--ema_alpha',type=float,default=0.99)
parser_debug.add_argument('--eval_each_epoch',action='store_true')

args = parser.parse_args()

# imports
import datetime
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# make deterministic
if args.seed is not None:
    torch.manual_seed(args.seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# use the GPU if available, else the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

# load dataset
if args.dataset=='cifar10':
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ],
    )
    trainset = torchvision.datasets.CIFAR10(
        root = './data',
        train = True,
        download = True,
        transform = transform,
    )
    testset = torchvision.datasets.CIFAR10(
        root = './data',
        train = False,
        download = True,
        transform = transform,
    )
else:
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(( 0.5,), ( 0.5,))
        ],
    )
    trainset = torchvision.datasets.MNIST(
        root = './data',
        train = True,
        download = True,
        transform = transform,
    )
    testset = torchvision.datasets.MNIST(
        root = './data',
        train = False,
        download = True,
        transform = transform,
    )

trainloader = torch.utils.data.DataLoader(
    trainset,
    batch_size = args.batch_size,
    shuffle = True,
)
testloader = torch.utils.data.DataLoader(
    testset,
    batch_size = args.batch_size,
    shuffle = True,
)

# display images
if args.show_image:
    import matplotlib.pyplot as plt
    import numpy as np

    def imshow(img):
        img = img / 2 + 0.5 # unnormalize
        npimg = img.numpy()
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
        plt.show()

    # get some random training images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)

    # show images
    imshow(torchvision.utils.make_grid(images))

    # exit
    import sys
    sys.exit(0)

# define the model
images, labels = next(iter(trainloader))
shape_input = images.shape[1:]
shape_output = torch.Size([10])
h = torch.Size([args.size])

w = torch.randn(shape_input+shape_output,requires_grad=True,device=device)
u = torch.randn(shape_input+h,requires_grad=True,device=device)
v = torch.randn(h+shape_output,requires_grad=True,device=device)

# typically hard code the order of tensors
# typically not hard code the actual values of the dimension (shape)

def linear(x):
    # for MNIST with the default batch size:
    # x.shape = 16,1,28,28 = bijk
    # w.shape = 1,28,28,10 = ijkl
    # output shape = 16,10 = bl
    return torch.einsum('bijk,ijkl -> bl',x,w)

def factorized_linear(x):
    return torch.einsum('bijk,ijkh,hl -> bl',x,u,v)

def neural_network(x):
    net = torch.einsum('bijk,ijkh -> bh',x,u)
    net = torch.relu(net)
    net = torch.einsum('bh,hl -> bl',net,v)
    return net

# the kitchen sink model shares the neural network's forward pass,
# but the optimization below never updates u
kitchen_sink = neural_network

models = {
    'linear': linear,
    'factorized_linear': factorized_linear,
    'neural_network': neural_network,
    'kitchen_sink': kitchen_sink,
}
f = models[args.model]

# the tensors that gradient descent updates for each model
if args.model=='linear':
    params = [w]
elif args.model=='kitchen_sink':
    params = [v]
else:
    params = [u,v]

# eval on test set
def eval_test_set():
    correct = 0.0
    total = 0.0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images = images.to(device)
            labels = labels.to(device)
            outputs = f(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('test set accuracy = ', correct/total)

# optimize
criterion = nn.CrossEntropyLoss()
loss_ave = float('inf')
for epoch in range(args.epochs):
    for i, data in enumerate(trainloader, 0):
        if i%args.print_step==0:
            print(
                datetime.datetime.now(),
                'epoch=',epoch,
                'i=',i,
                'loss_ave=',loss_ave
                )
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = f(images)
        loss = criterion(outputs,labels)

        # exponential moving average of the (noisy) per-batch loss
        if loss_ave == float('inf'):
            loss_ave = loss.item()
        else:
            loss_ave = args.ema_alpha * loss_ave + (1 - args.ema_alpha) * loss.item()

        loss.backward()

        # gradient descent step, done in place so the tensors stay leaves
        with torch.no_grad():
            for p in params:
                p -= args.alpha * p.grad
        # gradients accumulate across calls to backward, so reset them
        for p in (w,u,v):
            if p.grad is not None:
                p.grad.zero_()

    if args.eval_each_epoch:
        eval_test_set()

if not args.eval_each_epoch:
    eval_test_set()
--------------------------------------------------------------------------------
/hw2/run_classifier.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 |
3 | mkdir -p output
4 |
5 | python3 -u classifier.py --dataset=mnist --model=linear > output/linear
6 | python3 -u classifier.py --dataset=mnist --model=factorized_linear > output/factorized_linear
7 | python3 -u classifier.py --dataset=mnist --model=neural_network --size=256 > output/neural_network
8 | python3 -u classifier.py --dataset=mnist --model=kitchen_sink --size=256 > output/kitchen_sink
9 |
10 | for i in 16 32 64 128 256 512 1024; do
11 |     python3 -u classifier.py --dataset=mnist --model=neural_network --size=$i --epochs=25 > output/neural_network.$i
12 | done
13 |
--------------------------------------------------------------------------------
/hw3/README.md:
--------------------------------------------------------------------------------
1 | # CNNs
2 |
3 | **Due:** Thursday, 13 February at midnight
4 |
5 | ## Tasks
6 |
7 | You are required to complete the following tasks:
8 |
9 | 1. Use PyTorch's neural networks tutorial [part 1](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html) and [part 2](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) to create a python script that trains a small CNN on the CIFAR10 dataset
10 | 1. Modify the code to work on the Fashion-MNIST dataset as well
11 | 1. Modify the data input pipeline to include data augmentation using the `torchsample` library. See [this github repo](https://github.com/jiangqy/Data-Augmentation-Pytorch) for example code.
12 |
13 | ### Optional
14 |
15 | You are not required to complete the following tasks,
16 | however they are good exercises to get you familiar with pytorch.
17 |
18 | 1. Structure your python file so that it uses argparse to store all hyperparameters.
19 |
20 | 1. Write a shell script that experiments with different hyperparameter combinations.
21 |
22 | 1. Make one of the optional hyperparameters the use of the Adam optimizer instead of SGD.
23 | See [this pytorch tutorial](https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_optim.html) for an example.
24 |
25 | ## Submission
26 |
27 | Upload your python file to sakai.
28 |
29 |
--------------------------------------------------------------------------------
/hw4/README.md:
--------------------------------------------------------------------------------
1 | # ResNets
2 |
3 | **Due:** Tuesday, ~~25 February~~ 27 February at midnight
4 |
5 | ## Tasks
6 |
7 | You are required to complete the following tasks:
8 |
9 | 1. Extend your code from hw3 to implement a 20-layer resnet model (you do not need to use batch norm layers)
10 |
11 | ### Optional
12 |
13 | You are not required to complete the following tasks,
14 | however they are good exercises to get you familiar with pytorch.
15 |
16 | 1. Use the batch normalization layer
17 | 1. Reproduce the main result from the resnet paper by:
18 |     1. implementing the 20 layer plain network, 56 layer resnet, and 56 layer plain network
19 |     1. verifying that training error for the 56 layer plain model is worse than for the 20 layer plain model
20 |     1. verifying that training error for the 56 layer resnet is better than for the 20 layer resnet (and the 20/56 layer plain models)
21 | 1. Implement the dense blocks from the "Densely Connected Convolutional Networks" paper
22 |
23 | ## Submission
24 |
25 | Upload your python file to sakai.
26 |
27 |
--------------------------------------------------------------------------------
/hw4/resnet.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | # process command line args
4 | parser = argparse.ArgumentParser()
5 |
6 | parser_model = parser.add_argument_group('model options')
7 | parser_model.add_argument('--connections',choices=['plain','resnet'],default='resnet')
8 | parser_model.add_argument('--size',type=int,default=20)
9 |
10 | parser_opt = parser.add_argument_group('optimization options')
11 | parser_opt.add_argument('--batch_size',type=int,default=16)
12 | parser_opt.add_argument('--learning_rate',type=float,default=0.01)
13 | parser_opt.add_argument('--epochs',type=int,default=10)
14 | parser_opt.add_argument('--warm_start',type=str,default=None)
15 |
16 | parser_data = parser.add_argument_group('data options')
17 | parser_data.add_argument('--dataset',choices=['mnist','cifar10'],required=True)
18 |
19 | parser_debug = parser.add_argument_group('debug options')
20 | parser_debug.add_argument('--show_image',action='store_true')
21 | parser_debug.add_argument('--print_delay',type=int,default=60)
22 | parser_debug.add_argument('--log_dir',type=str)
23 | parser_debug.add_argument('--eval',action='store_true')
24 |
25 | args = parser.parse_args()
26 |
27 | # load libraries
28 | import datetime
29 | import os
30 | import sys
31 | import time
32 |
33 | import torch
34 | import torch.nn as nn
35 | from torch.utils.tensorboard import SummaryWriter
36 | import torchvision
37 | import torchvision.transforms as transforms
38 | import matplotlib.pyplot as plt
39 | import numpy as np
40 |
41 | # load data
42 | if args.dataset=='cifar10':
43 |     image_shape=[3,32,32]
44 |
45 |     transform = transforms.Compose(
46 |         [ transforms.RandomHorizontalFlip()
47 |         , transforms.ToTensor()
48 |         , transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
49 |         ])
50 |
51 |     trainset = torchvision.datasets.CIFAR10(
52 |         root='./data',
53 |         train=True,
54 |         download=True,
55 |         transform=transform
56 |     )
57 |     trainloader = torch.utils.data.DataLoader(
58 |         trainset,
59 |         batch_size=args.batch_size,
60 |         shuffle=True,
61 |         num_workers=2
62 |     )
63 |
64 |     testset = torchvision.datasets.CIFAR10(
65 |         root='./data',
66 |         train=False,
67 |         download=True,
68 |         transform=transform
69 |     )
70 |     testloader = torch.utils.data.DataLoader(
71 |         testset,
72 |         batch_size=args.batch_size,
73 |         shuffle=True,
74 |         num_workers=2
75 |     )
76 |
77 | if args.dataset=='mnist':
78 |     image_shape=[1,28,28]
79 |
80 |     transform = transforms.Compose(
81 |         [ transforms.RandomHorizontalFlip()
82 |         , transforms.ToTensor()
83 |         , transforms.Normalize((0.5,), (0.5,))
84 |         ])
85 |
86 |     trainset = torchvision.datasets.MNIST(
87 |         root='./data',
88 |         train=True,
89 |         download=True,
90 |         transform=transform
91 |     )
92 |     trainloader = torch.utils.data.DataLoader(
93 |         trainset,
94 |         batch_size=args.batch_size,
95 |         shuffle=True,
96 |         num_workers=2
97 |     )
98 |
99 |     testset = torchvision.datasets.MNIST(
100 |         root='./data',
101 |         train=False,
102 |         download=True,
103 |         transform=transform
104 |     )
105 |     testloader = torch.utils.data.DataLoader(
106 |         testset,
107 |         batch_size=args.batch_size,
108 |         shuffle=True,
109 |         num_workers=2
110 |     )
111 |
112 | # show image
113 | if args.show_image:
114 |     def imshow(img):
115 |         img = img / 2 + 0.5 # unnormalize
116 |         npimg = img.numpy()
117 |         plt.imshow(np.transpose(npimg, (1, 2, 0)))
118 |         plt.show()
119 |     dataiter = iter(trainloader)
120 |     images, labels = next(dataiter) # the builtin next() works on any iterator
121 |     imshow(torchvision.utils.make_grid(images))
122 |
123 | # define the model
124 | def conv3x3(channels_in, channels_out):
125 |     """3x3 convolution with padding"""
126 |     return nn.Conv2d(
127 |         channels_in,
128 |         channels_out,
129 |         kernel_size=3,
130 |         stride=1,
131 |         padding=1,
132 |         groups=1,
133 |         bias=False,
134 |         dilation=1
135 |     )
136 |
137 | class ResnetBlock(nn.Module):
138 |     def __init__(
139 |         self,
140 |         channels,
141 |         use_bn = True,
142 |         ):
143 |         super(ResnetBlock, self).__init__()
144 |         norm_layer = torch.nn.BatchNorm2d
145 |         self.use_bn = use_bn
146 |         self.conv1 = conv3x3(channels, channels)
147 |         if self.use_bn:
148 |             self.bn1 = norm_layer(channels)
149 |         self.relu = nn.ReLU(inplace=True)
150 |         self.conv2 = conv3x3(channels, channels)
151 |         if self.use_bn:
152 |             self.bn2 = norm_layer(channels)
153 |         self.downsample = None # no downsampling: the block preserves the input shape
154 |         self.stride = 1 # conv3x3 always uses stride 1
155 |
156 |     def forward(self, x):
157 |         identity = x
158 |         out = self.conv1(x)
159 |         if self.use_bn:
160 |             out = self.bn1(out)
161 |         out = self.relu(out)
162 |         out = self.conv2(out)
163 |         if self.use_bn:
164 |             out = self.bn2(out)
165 |         out += identity
166 |         out = self.relu(out)
167 |         return out
168 |
169 | import functools
170 | image_size = functools.reduce(lambda x, y: x * y, image_shape, 1)
171 |
172 | class Net(nn.Module):
173 |     def __init__(self):
174 |         super(Net, self).__init__()
175 |         self.fc = torch.nn.Linear(image_size,10)
176 |         pass
177 |
178 |     def forward(self, x):
179 |         out = x.view(x.shape[0],image_size) # use the actual batch size; the last batch may be smaller
180 |         out = self.fc(out)
181 |         return out
182 |
183 | net = Net()
184 |
185 | # load pretrained model
186 | if args.warm_start is not None:
187 |     print('warm starting model from',args.warm_start)
188 |     model_dict = torch.load(os.path.join(args.warm_start,'model'))
189 |     net.load_state_dict(model_dict['model_state_dict'])
190 |
191 | # create save dir
192 | log_dir = args.log_dir
193 | if log_dir is None:
194 |     log_dir = 'log/'+str(datetime.datetime.now())
195 |
196 | try:
197 |     os.makedirs(log_dir) # also creates the parent 'log' directory if it is missing
198 | except FileExistsError:
199 |     print('cannot create log dir,',log_dir,'already exists')
200 |     sys.exit(1)
201 |
202 | writer = SummaryWriter(log_dir=log_dir)
203 |
204 | # train the model
205 | criterion = nn.CrossEntropyLoss()
206 | optimizer = torch.optim.SGD(net.parameters(), lr=args.learning_rate, momentum=0.9)
207 | net.train()
208 |
209 | total_iter = 0
210 | last_print = 0
211 |
212 | steps = 0
213 | for epoch in range(args.epochs):
214 |     for i, data in enumerate(trainloader):
215 |         steps += 1
216 |         inputs, labels = data
217 |         optimizer.zero_grad()
218 |         outputs = net(inputs)
219 |         loss = criterion(outputs, labels)
220 |         loss.backward()
221 |         optimizer.step()
222 |
223 |         # accuracy
224 |         prediction = torch.argmax(outputs,dim=1)
225 |         accuracy = (prediction==labels).float().mean()
226 |
227 |         # tensorboard
228 |         writer.add_scalar('train/loss', loss.item(), steps)
229 |         writer.add_scalar('train/accuracy', accuracy.item(), steps)
230 |
231 |         # print statistics
232 |         total_iter += 1
233 |         if time.time() - last_print > args.print_delay:
234 |             print(datetime.datetime.now(),'epoch = ',epoch,'steps=',steps,'batch/sec=',total_iter/args.print_delay)
235 |             total_iter = 0
236 |             last_print = time.time()
237 |
238 | torch.save({
239 | 'epoch':epoch,
240 | 'model_state_dict': net.state_dict(),
241 | 'optimizer_state_dict': optimizer.state_dict(),
242 | 'loss':loss
243 | }, os.path.join(log_dir,'model'))
244 |
245 |
246 | # test set
247 | if args.eval:
248 |     print('evaluating model')
249 |     net.eval()
250 |
251 |     loss_total = 0
252 |     accuracy_total = 0
253 |     for i, data in enumerate(testloader):
254 |         inputs, labels = data
255 |         outputs = net(inputs)
256 |         loss = criterion(outputs, labels)
257 |
258 |         # accuracy
259 |         prediction = torch.argmax(outputs,dim=1)
260 |         accuracy = (prediction==labels).float().mean()
261 |
262 |         # update variables
263 |         loss_total += loss.item()
264 |         accuracy_total += accuracy.item()
265 |
266 |     # average over the number of batches (the last index i is one less than that)
267 |     print('loss=',loss_total/len(testloader))
268 |     print('accuracy=',accuracy_total/len(testloader))
268 |
269 |
--------------------------------------------------------------------------------
/hw5/README.md:
--------------------------------------------------------------------------------
1 | # YOLOv3 lol
2 |
3 | **Due:** Tuesday, 3 March at midnight
4 |
5 | **Learning Objective:**
6 |
7 | 1. use other people's implementations of pytorch models
8 | 1. gain familiarity with YOLO
9 |
10 | ## Tasks
11 |
12 | You are required to complete the following tasks:
13 |
14 | 1. [This github repo](https://github.com/eriklindernoren/PyTorch-YOLOv3) contains a pytorch implementation of YOLOv3.
15 | Follow the directions to apply the YOLO object detection model to several of your own images.
16 | 1. Adjust the code so that it outputs images at their original resolution by changing the call to `plt.subplots(1)` to read
17 | ```
18 | plt.subplots(1,figsize=(img.shape[1]/96, img.shape[0]/96), dpi=96)
19 | ```
20 | 1. Upload your favorite images to sakai.
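The `figsize`/`dpi` arithmetic in the step above works because matplotlib sizes figures in inches, so the pixel size is inches times dpi. Here is a minimal sketch of that calculation. The helper name `figsize_for_pixels` is made up for illustration (it is not part of the YOLOv3 repo), and `shape` is assumed to follow the `(height, width, channels)` layout of `img.shape`.

```python
def figsize_for_pixels(shape, dpi=96):
    """Return (width_in, height_in) so that a figure rendered at `dpi`
    has the same pixel dimensions as an image with the given shape."""
    height_px, width_px = shape[0], shape[1]
    return (width_px / dpi, height_px / dpi)

# a 1920x1080 image stored as (height, width, channels)
width_in, height_in = figsize_for_pixels((1080, 1920, 3))
print(width_in, height_in)            # size in inches
print(width_in * 96, height_in * 96)  # back to pixels
```

Whether the saved file matches the original resolution exactly also depends on how the figure is saved (for example any padding or bounding box settings), which this sketch does not cover.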
21 |
22 | Example images I created are:
23 |
24 | 
25 | 
26 | 
27 | 
28 | 
29 | 
30 |
31 | ### Optional
32 |
33 | You are not required to complete the following tasks,
34 | however they are good exercises to get you familiar with pytorch.
35 |
36 | 1. The code resizes all images before passing them to the YOLO model using the size specified by the `--img_size` parameter.
37 | Experiment with different values of this parameter to see the effects on the outputs.
38 | You should notice that smaller images result in fewer objects detected,
39 | but larger images require more computation.
40 | 1. Other parameters such as `--conf_thres` and `--nms_thres` adjust the model's results by trading off between object classification accuracy and localization accuracy.
41 | Try adjusting these parameters to get better results on your images.
42 |
43 | ## Submission
44 |
45 | Upload your images to sakai.
46 | You do not need to upload any code.
47 |
48 |
--------------------------------------------------------------------------------
/hw5/cmc1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/cmc1.png
--------------------------------------------------------------------------------
/hw5/cmc2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/cmc2.png
--------------------------------------------------------------------------------
/hw5/mike1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike1.png
--------------------------------------------------------------------------------
/hw5/mike2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike2.png
--------------------------------------------------------------------------------
/hw5/mike3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike3.png
--------------------------------------------------------------------------------
/hw5/mike4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike4.png
--------------------------------------------------------------------------------
/hw5/mike5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw5/mike5.png
--------------------------------------------------------------------------------
/hw6/README.md:
--------------------------------------------------------------------------------
1 | # Name Classifier
2 |
3 | In this assignment, you will create a model that can predict the nationality of surnames.
4 |
5 | **Due:** ~~Thursday 12 March~~ Sunday 29 March at midnight
6 |
7 | **Learning Objective:**
8 |
9 | 1. gain familiarity with character-level text models (RNN / CNN)
10 | 1. effectively use tensorboard to monitor the training of models and select hyperparameters
11 |
12 | ## Tasks
13 |
14 | Complete the following required tasks.
15 |
16 | 1. **Download the starter code and data:**
17 | On the command line, run:
18 | ```
19 | $ wget https://github.com/mikeizbicki/cmc-csci181/raw/master/hw6/names.py
20 | $ wget https://github.com/mikeizbicki/cmc-csci181/raw/master/hw6/names.tar.gz
21 | $ tar -xf names.tar.gz
22 | ```
23 | You should always manually inspect the data values before performing any coding or model training.
24 | In this case, get a list of the data files by running
25 | ```
26 | $ ls names
27 | ```
28 | Notice that there is a file for several different nationalities.
29 | Inspect the values of some of these files.
30 | ```
31 | $ head names/English.txt
32 | $ head names/Spanish.txt
33 | $ head names/Korean.txt
34 | ```
35 | Notice that the names have been romanized for all languages,
36 | but the names are not entirely ASCII.
37 | We will revisit this fact later.
38 |
39 | Also notice that each line contains a unique name.
40 | To get the total number of lines in each file (and therefore the total number of examples for each class), run
41 | ```
42 | $ wc -l names/*
43 | ```
44 | Observe based on this output that the training data is not equally balanced between classes.
45 |
46 | We will not be dividing this data up into a train/test split.
47 | In this case, that is not needed, because our data is essentially exhaustive of all possible names.
48 | (For example, the 94 Korean surnames account for >99% of all Korean last names.)
49 | Our primary goal is not to generalize to unseen names,
50 | but rather to have an efficient "compressed" representation of all names.
51 | This will let us create a function for assigning the nationality to a name without having to explicitly store and search all 20,000 names.
52 | (As a side benefit, this function will generalize to typos and other unseen data, but we're not going to explicitly evaluate its ability to do this.)
53 |
54 | Compressing a training set without a test set is actually a common setting in deep learning.
55 | The [Hutter Prize](http://prize.hutter1.net/) will award $500,000 to the first people to efficiently compress all human knowledge (i.e. wikipedia),
56 | and Google has a [similar competition](https://www.androidpolice.com/2018/01/12/compress-google-issues-machine-learning-challenge-build-better-jpeg/) for improving jpeg image compression.
57 | Many AGI researchers believe that the problem of creating an optimal data compression scheme is isomorphic to creating artificial intelligence.
58 |
59 | 1. **Different learning rates:**
60 | At the command prompt, execute the following line:
61 | ```
62 | $ python3 names.py --train --learning_rate=1e-1
63 | ```
64 | In a separate command prompt, launch tensorboard with the line:
65 | ```
66 | $ tensorboard --logdir=log
67 | ```
68 | You should observe that the loss function is diverging.
69 | Experiment with different learning rates to find the optimal value (i.e. the largest value that causes the loss to converge to zero).
70 |
71 | **NOTE:**
72 | In order to easily interpret the tensorboard plots, you may have to increase the smoothing parameter to a value very close to 1.
73 | I used a value of 0.99.
74 |
75 | **Question 1:**
76 | What is the optimal learning rate you found?
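To see why a learning rate that is too large makes the loss diverge, here is a toy sketch that has nothing to do with `names.py`. It runs plain gradient descent on the made-up function `f(x) = x**2`, where each step multiplies `x` by `(1 - 2*learning_rate)`. The function name `gradient_descent` and the specific learning rates are assumptions chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent; return the final loss."""
    x = x0
    for _ in range(steps):
        grad = 2 * x                  # derivative of x**2
        x = x - learning_rate * grad  # standard gradient descent update
    return x ** 2

for lr in [1.5, 0.4, 0.01]:
    print('learning_rate =', lr, 'final loss =', gradient_descent(lr))
```

With `1.5` the loss blows up, with `0.4` it shrinks quickly, and with `0.01` it shrinks slowly. The RNN loss is far more complicated than this toy, so the actual thresholds must be found experimentally in tensorboard.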
77 |
78 | 1. **Gradient clipping:**
79 | Tensorboard is recording three values: the training accuracy, the training loss, and the norm of the gradient.
80 | Notice that as training progresses, the norm of the gradient increases.
81 | This is called the *exploding gradient problem*.
82 |
83 | The standard solution to the exploding gradient problem is *gradient clipping*.
84 | In gradient clipping, we first measure the L2 norm of the gradient;
85 | then, if it is larger than some threshold value, we shrink the gradient so that it points in the same direction but has norm equal to the threshold.
86 |
87 | To add support for gradient clipping to your code,
88 | paste the following lines just before the call to `optimizer.step`.
89 | ```
90 | if args.gradient_clipping:
91 | torch.nn.utils.clip_grad_norm_(model.parameters(),1.0)
92 | ```
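The clipping rule described above can be sketched on a plain list of numbers. This is only an illustration of the idea, not the code inside pytorch: the helper `clip_by_norm` is a made-up name, and the real `clip_grad_norm_` works on the gradients of all model parameters treated together as one long vector.

```python
import math

def clip_by_norm(gradient, threshold=1.0):
    """Rescale `gradient` (a list of floats) so its L2 norm is at most `threshold`."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= threshold:
        return list(gradient)     # small gradients pass through unchanged
    scale = threshold / norm      # same direction, norm equal to the threshold
    return [g * scale for g in gradient]

print(clip_by_norm([3.0, 4.0]))   # norm 5, so it gets shrunk to norm 1
print(clip_by_norm([0.3, 0.4]))   # norm 0.5, so it is left alone
```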
93 |
94 | Rerun the model training code, but now with the `--gradient_clipping` flag to enable gradient clipping.
95 | Once again, experiment to find the optimal value for the learning rate.
96 |
97 | **Question 2:**
98 | What is the new optimal learning rate you found with gradient clipping enabled?
99 |
100 | **Question 3:**
101 | Which set of hyperparameters is converging faster?
102 |
103 | At this point, hopefully this XKCD comic is starting to make sense:
104 |
105 |
106 |
107 |
108 | 1. **Optimization method:**
109 | [Adam](https://arxiv.org/abs/1412.6980) is a popular alternative to SGD for optimizing models that was invented in 2014.
110 | The basic idea behind Adam is that not all parameters will need to take the same step size on each iteration,
111 | and so we should somehow learn the optimal step size for each parameter independently.
112 | It almost always converges much faster than SGD in practice, but sometimes has worse generalization error.
113 | Because of the fast convergence, Adam is widely used in practice.
114 | The original paper has over [38k citations on google scholar](https://scholar.google.com/scholar?cluster=16194105527543080940).
115 | (I think it's the second most cited paper of all time after the resnet paper, but I'm not 100% sure how to check this.)
116 | Unfortunately, however, [a 2018 paper](https://openreview.net/forum?id=ryQu7f-RZ) found a fatal flaw in the proof of convergence of the Adam paper, and showed that Adam is guaranteed not to converge even on some simple convex problems.
117 | Despite this flaw, Adam remains widely popular, is the optimizer of choice for thousands of pytorch users, and has thousands of citations already this year.
118 | In the deep learning world, people simply don't care about proofs yet.
119 |
120 | To add support for the Adam optimizer to the code,
121 | paste the following lines below the code for the SGD optimizer.
122 | ```
123 | if args.optimizer == 'adam':
124 | optimizer = torch.optim.Adam(
125 | model.parameters(),
126 | lr=args.learning_rate,
127 | weight_decay=args.weight_decay
128 | )
129 | ```
130 |
131 | Use the `--optimizer=adam` flag to train a model using Adam instead of SGD.
132 | Like SGD, Adam takes a learning rate hyperparameter,
133 | and you should experiment with different values to find the optimal value.
134 |
135 | **Question 4:**
136 | What is the optimal learning rate for Adam?
137 |
138 | The [`torch.optim`](https://pytorch.org/docs/stable/optim.html) module contains many other optimizers that you can use.
139 | Select one of these additional optimizers to include in your code,
140 | and make the appropriate adjustments in the arguments list and training loop.
141 |
142 | **Question 5:**
143 | Which combination of optimizer/hyperparameters is converging faster?
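The idea that Adam gives each parameter its own step size can be sketched with the update rule from the Adam paper written out on plain Python lists. This is a simplified illustration and not the pytorch implementation: `adam_step` is a made-up name, weight decay is left out, and the default values are the ones documented for `torch.optim.Adam`.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on plain Python lists; returns the new (params, m, v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # running mean of the gradient
        v_i = beta2 * v_i + (1 - beta2) * g * g    # running mean of the squared gradient
        m_hat = m_i / (1 - beta1 ** t)             # bias correction (t counts steps from 1)
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# two parameters whose gradients differ by a factor of 1000
params, m, v = adam_step([0.0, 0.0], [0.1, 100.0], [0.0, 0.0], [0.0, 0.0], t=1)
print(params)
```

Even though the two gradients differ by a factor of 1000, both parameters move by roughly `lr` on the first step, because each gradient is divided by its own running magnitude. Plain SGD would instead move the second parameter 1000 times further.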
144 |
145 | 1. **Different types of RNNs:**
146 | There are three different types of RNNs in common use.
147 | So far, your model has been using "vanilla" RNNs,
148 | which is what we discussed in class.
149 | Two other types are *gated recurrent units* (GRUs) and *long short term memories* (LSTMs).
150 | GRUs and LSTMs have more complicated activation functions that try to better capture long-term dependencies within the input text.
151 | Visit [this webpage](https://medium.com/@saurabh.rathor092/simple-rnn-vs-gru-vs-lstm-difference-lies-in-more-flexible-control-5f33e07b1e57) to see a picture representation of each of the RNN units.
152 |
153 | Understanding in detail the differences between the types of RNNs is not important.
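154 | What is important is that the inputs and outputs of vanilla RNNs, GRUs, and LSTMs are nearly the same (the one difference is that `torch.nn.LSTM` returns its hidden state as a pair of tensors rather than a single tensor).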
155 | This means that you can easily switch between the type of recurrent network by simply calling the appropriate torch library function.
156 | (The functions are `torch.nn.RNN`, `torch.nn.GRU`, and `torch.nn.LSTM`.)
157 |
158 | Adjust the `Model` class so that it uses either RNNs, GRUs, or LSTMs depending on the value of the `--model` input parameter.
159 |
160 | **Question 6:**
161 | Once again, experiment with different combinations of hyperparameters using these new layers.
162 | What gives the best results?
163 |
164 | 1. **RNN size:**
165 | There are two hyperparameters that control the "size" of an RNN.
166 | The advantages of larger sizes are: better training accuracy and improved ability to capture long-term dependencies within text.
167 | The disadvantages are: longer training time and worse generalization error.
168 |
169 | Adjust the `Model` class so that the calls to `torch.nn.RNN`, `torch.nn.GRU`, and `torch.nn.LSTM` use the `--hidden_layer_size` and `--num_layers` command line flags to determine the size and number of the hidden layers (respectively).
170 |
171 | **Question 7:**
172 | Experiment with different model sizes to find a good balance between speed and accuracy.
173 |
174 | 1. **Change batch size:**
175 | Currently, the training procedure uses a fixed batch size with only a single training example.
176 | There are two key functions for generating batch tensors from strings.
177 |
178 | The first is `unicode_to_ascii`, which converts a Unicode string into a Latin-only alphabet representation.
179 | Run this command on the strings `Izbicki`, `Ízbìçkï` and `이즈비키` to see how they get processed and the limitations of our current system.
180 |
181 | The second is `str_to_tensor`, which converts an input string into a 3rd order tensor.
182 | Notice that the first dimension is for the length of the string, and the second dimension is for the batch size.
183 | This is a standard pytorch convention.
184 |
185 | Modify the `str_to_tensor` function so that it takes a list of b strings as input instead of only a single string.
186 | The input strings are unlikely to be all of the same length,
187 | but the output tensor must have the same length for each string.
188 | To solve this problem, the first dimension of the tensor will have the largest size of all of the input strings;
189 | then, the remaining input strings will have their slices padded with all zeros to fill the space.
190 | To help the model understand when it reaches the end of a name, the special `$` character is used to symbolize the end of a name, and this should be inserted at the end of each string, before the zero padding.
191 |
192 | Next, you will need to modify the data sampling step to sample `args.batch_size` data points on each iteration.
193 | This is the part of the code after the comment `# get random training example`.
194 |
195 | **Question 8:**
196 | Experiment with a larger batch size.
197 | This should make training your model a bit faster because the matrix multiplications in your CPU will have better cache locality.
198 | (A single step of batch size 10 should take only about 5x longer than a step of batch size 1.)
199 | A larger batch size will also reduce the variance of each step of SGD/Adam, and so a larger step size can be used.
200 | As a rule of thumb, increasing the batch size by a factor of `a` will let you increase the learning rate by a factor of `a` as well.
201 |
202 | With a batch size of 16, what is the new optimal learning rate?
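The padding scheme described in this step (append `$`, then fill with zeros up to the longest name) can be sketched without pytorch using nested lists. This is only an illustration of the layout, under the assumption that each character is stored as a one-hot row over the vocabulary. The function name `strs_to_nested_list` is made up, and the result would still need to be converted with `torch.tensor` before being passed to a model.

```python
import string

vocabulary = string.ascii_letters + " .,;'$"   # same alphabet as names.py

def strs_to_nested_list(names):
    """Encode a list of names as a [length, batch, vocabulary] nested list of
    one-hot rows, appending '$' to each name and zero-padding short names."""
    names = [name + '$' for name in names]
    max_length = max(len(name) for name in names)
    # start from all zeros so that positions past the end of a name stay padded
    batch = [[[0.0] * len(vocabulary) for _ in names] for _ in range(max_length)]
    for b, name in enumerate(names):
        for position, letter in enumerate(name):
            batch[position][b][vocabulary.index(letter)] = 1.0
    return batch

batch = strs_to_nested_list(['Kim', 'Izbicki'])
print(len(batch), len(batch[0]), len(batch[0][0]))   # length, batch size, vocabulary size
```

Note that the first dimension is the string length and the second is the batch size, matching the pytorch convention mentioned above.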
203 |
204 | 1. **Add CNN support:**
205 | CNNs typically have a linear layer on top of them,
206 | and this linear layer requires that all inputs have a fixed length.
207 |
208 | 1. Modify the `str_to_tensor` function so that if the `--input_length` parameter is specified,
209 | then the tensor is padded/truncated so that the first dimension has size `args.input_length`.
210 |
211 | 1. Modify the `Model` class so that if the `--model=cnn` parameter is specified,
212 | then a cnn is used (the `torch.nn.Conv1d` function).
213 | Your implementation should use a width 3 filter,
214 | `--hidden_layer_size` as the number of channels,
215 | and `--num_layers` cnn layers.
216 |
217 | **Question 9:**
218 | Experiment with different hyperparameters to find the best combination for the CNN model.
219 | How does the CNN model compare to the RNN models?
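The pad/truncate behavior needed for `--input_length` can be sketched on a plain list of per-character rows. The helper `fix_length` and its `row_size` argument are made-up names for illustration; in `names.py` the same logic would be applied to the first dimension of the tensor.

```python
def fix_length(rows, input_length, row_size):
    """Return exactly `input_length` rows: truncate long inputs and append
    all-zero rows of width `row_size` to short ones."""
    rows = rows[:input_length]
    padding = [[0.0] * row_size for _ in range(input_length - len(rows))]
    return rows + padding

short = [[1.0, 0.0], [0.0, 1.0]]   # a 2-character input with a 2-symbol vocabulary
long = [[1.0, 0.0]] * 7            # a 7-character input
print(len(fix_length(short, 5, 2)), len(fix_length(long, 5, 2)))
```

Because every input now has the same length, the number of inputs to the final linear layer is fixed regardless of how long the original name was.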
220 |
221 | 1. **Longrun model training:**
222 | Once you have a set of model hyperparameters that you like,
223 | then increase `--samples` to 100000 to train a more accurate model.
224 | (Depending on your specific hyperparameters, you may need to use an even larger number of samples to get the model to converge.)
225 | Then, use the `--warm_start` parameter to reload this model,
226 | and train for another 100000 samples (but this time with a learning rate lowered by a factor of 10).
227 | Repeat this procedure one more time.
228 |
229 | The whole procedure should take 10-30 minutes depending on the speed of your computer and the complexity of your model.
230 | This would be a good point to have an office chair jousting duel.
231 |
232 | ![xkcd-training](img/xkcd-training.png)
233 |
234 | (Comic modified from https://xkcd.com/303/)
235 |
236 | 1. **Inference:**
237 | You can use the `--infer` parameter combined with `--warm_start` to use the model for inference (sometimes called model *deployment*).
238 | In this mode, `names.py` passes each line in stdin to the model and outputs the class predictions.
239 |
240 | You should modify the inference code so that instead of outputting a single prediction,
241 | it outputs the model's top 3 predictions along with the probability associated with each prediction.
242 |
243 | To get more than the top 1 prediction, you will have to change how the `topk` function is called.
244 |
245 | To convert the `output` tensor into probabilities, you will have to apply the `torch.nn.Softmax` function.
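The top-3 output described in this step can be sketched in plain Python. This is not the `names.py` inference code: the `labels` and `scores` below are made-up values standing in for the class names and the model's `output`, and the helpers `softmax` and `top_k` are illustrative stand-ins for `torch.nn.Softmax` and `topk`.

```python
import math

def softmax(scores):
    """Convert raw model outputs into probabilities that sum to 1."""
    largest = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probabilities = softmax(scores)
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

labels = ['Czech', 'English', 'Korean', 'Polish']   # made-up subset of the classes
scores = [1.0, 0.5, -2.0, 3.0]                      # made-up model outputs
for label, probability in top_k(scores, labels):
    print(label, round(probability, 3))
```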
246 |
247 | ## Submission
248 |
249 | Upload your code to sakai.
250 | Hand in a hard copy of your completed answers.
251 |
--------------------------------------------------------------------------------
/hw6/img/xkcd-training.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/hw6/img/xkcd-training.png
--------------------------------------------------------------------------------
/hw6/names.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # process command line args
4 | import argparse
5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
6 |
7 | parser_model = parser.add_argument_group('model options')
8 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm'],default='rnn')
9 | parser_model.add_argument('--hidden_layer_size',type=int,default=128)
10 | parser_model.add_argument('--num_layers',type=int,default=1)
11 |
12 | parser_opt = parser.add_argument_group('optimization options')
13 | parser_opt.add_argument('--batch_size',type=int,default=128)
14 | parser_opt.add_argument('--learning_rate',type=float,default=1e-1)
15 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd')
16 | parser_opt.add_argument('--gradient_clipping',action='store_true')
17 | parser_opt.add_argument('--momentum',type=float,default=0.9)
18 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4)
19 | parser_opt.add_argument('--samples',type=int,default=10000)
20 | parser_opt.add_argument('--input_length',type=int)
21 | parser_opt.add_argument('--warm_start')
22 |
23 | parser_data = parser.add_argument_group('data options')
24 | parser_data.add_argument('--data',default='names')
25 |
26 | parser_debug = parser.add_argument_group('debug options')
27 | parser_debug.add_argument('--print_delay',type=int,default=5)
28 | parser_debug.add_argument('--log_dir',type=str)
29 | parser_debug.add_argument('--save_every',type=int,default=1000)
30 | parser_debug.add_argument('--infer',action='store_true')
31 | parser_debug.add_argument('--train',action='store_true')
32 |
33 | args = parser.parse_args()
34 |
35 | # load args from file if warm starting
36 | if args.warm_start is not None:
37 |     import sys
38 |     import os
39 |     args_orig = args
40 |     args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:])
41 |     args.train = args_orig.train
42 |
43 | # load modules
44 | import datetime
45 | import glob
46 | import os
47 | import math
48 | import random
49 | import string
50 | import sys
51 | import time
52 | import unicodedata
53 |
54 | import torch
55 | import torch.nn as nn
56 | from torch.utils.tensorboard import SummaryWriter
57 |
58 | # import the training data
59 | vocabulary = string.ascii_letters + " .,;'$"
60 |
61 | def unicode_to_ascii(s):
62 |     '''
63 |     Removes diacritics from unicode characters.
64 |     See: https://stackoverflow.com/a/518232/2809427
65 |     '''
66 |     return ''.join(
67 |         c for c in unicodedata.normalize('NFD', s)
68 |         if unicodedata.category(c) != 'Mn'
69 |         and c in vocabulary
70 |     )
71 |
72 | # Build the category_lines dictionary, a list of names per language
73 | category_lines = {}
74 | all_categories = []
75 |
76 | for filename in glob.glob(os.path.join(args.data,'*.txt')):
77 |     category = os.path.splitext(os.path.basename(filename))[0]
78 |     all_categories.append(category)
79 |     lines = open(filename, encoding='utf-8').read().strip().split('\n')
80 |     lines = [unicode_to_ascii(line) for line in lines]
81 |     category_lines[category] = lines
82 |
83 | n_categories = len(all_categories)
84 |
85 | def str_to_tensor(s):
86 | '''
87 | converts aa string into a = 0.8 (only for the model without the `--conditional_model` flag)
40 |
41 | 1. Use tensorboard.dev to upload your tensorboard training runs to the cloud.
42 | You must create two separate tensorboard.dev webpages,
43 | one for each of the two sets of models trained above.
44 | Each of the tensorboard.dev pages should not have unrelated training runs appear in the plots.
45 |
46 | Here are the examples that I created in the videos:
47 | 1. unconditional model: https://tensorboard.dev/experiment/AMAd6axxQEuP2P20PIDKpQ/#scalars&_smoothingWeight=0.99
48 | 1. conditional model: https://tensorboard.dev/experiment/x99kAW5cQQ2NMgwU0lOdDQ/#scalars&_smoothingWeight=0.9
49 |
52 |
53 | ### Optional tasks
54 |
55 | 1. Modify the `CNNModel` class so that it also predicts the next character.
56 |
57 | In the video lectures, we only discussed how to modify the `RNNModel` class to predict the next character.
58 | The `CNNModel` class can also be used for predicting the next character in the same way.
59 | (And in fact, using `--model=cnn` currently results in a crash because of our changes to the training code.)
60 |
61 | *Bonus question:*
62 | Using a `CNNModel` to predict the next character is a special case of using a *Markov chain* to predict the next character of text.
63 | Why?
64 |
65 | 1. In the video lectures, I mention that you can use rejection sampling to sample from the conditional distributions when the model is an unconditional model.
66 | Implement this algorithm.
67 |
68 | 1. Implement beam search in your generation code.
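
The rejection sampling idea from the optional tasks above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method from the video lectures: `sample_joint` is a hypothetical callable that draws one `(name, category)` pair from the unconditional model, and the toy data stands in for a trained network.

```python
import random

def rejection_sample(sample_joint, target_category, max_tries=10000):
    '''
    Draw a name from p(name | category) using only samples from p(name, category).
    `sample_joint` is a hypothetical callable returning one (name, category) pair.
    '''
    for _ in range(max_tries):
        name, category = sample_joint()
        if category == target_category:
            return name  # accept: the sample has the requested category
        # otherwise reject the sample and draw again
    raise RuntimeError('no sample accepted; the target category may be too rare')

# Toy stand-in for the unconditional model (made-up data, for illustration only).
toy_data = [('Rossi', 'Italian'), ('Bianchi', 'Italian'), ('Smith', 'English')]
rng = random.Random(0)

def toy_sample_joint():
    return rng.choice(toy_data)

name = rejection_sample(toy_sample_joint, 'Italian')
print('sampled name =', name)
```

The expected number of draws is one over the probability of the target category, so rare categories make this approach slow.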
69 |
70 | ## Submission
71 |
72 | 1. Submit the links to your tensorboard.dev pages on sakai.
73 |
--------------------------------------------------------------------------------
/hw7/names.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # process command line args
4 | import argparse
5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
6 |
7 | parser_control = parser.add_argument_group('control options')
8 | parser_control.add_argument('--infer',action='store_true')
9 | parser_control.add_argument('--train',action='store_true')
10 |
11 | parser_data = parser.add_argument_group('data options')
12 | parser_data.add_argument('--data',default='names')
13 |
14 | parser_model = parser.add_argument_group('model options')
15 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm'],default='rnn')
16 | parser_model.add_argument('--hidden_layer_size',type=int,default=128)
17 | parser_model.add_argument('--num_layers',type=int,default=1)
18 |
19 | parser_opt = parser.add_argument_group('optimization options')
20 | parser_opt.add_argument('--batch_size',type=int,default=1)
21 | parser_opt.add_argument('--learning_rate',type=float,default=1e-1)
22 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd')
23 | parser_opt.add_argument('--gradient_clipping',action='store_true')
24 | parser_opt.add_argument('--momentum',type=float,default=0.9)
25 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4)
26 | parser_opt.add_argument('--samples',type=int,default=10000)
27 | parser_opt.add_argument('--input_length',type=int,default=20)
28 | parser_opt.add_argument('--warm_start')
29 |
30 | parser_debug = parser.add_argument_group('debug options')
31 | parser_debug.add_argument('--print_delay',type=int,default=5)
32 | parser_debug.add_argument('--log_dir',type=str)
33 | parser_debug.add_argument('--save_every',type=int,default=1000)
34 |
35 | args = parser.parse_args()
36 |
37 | if args.model=='cnn' and args.input_length is None:
38 |     raise ValueError('if --model=cnn, then you must specify --input_length')
39 |
40 | # load args from file if warm starting
41 | if args.warm_start is not None:
42 |     import sys
43 |     import os
44 |     args_orig = args
45 |     args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:])
46 |     args.train = args_orig.train
47 |
48 | # suppress warnings
49 | import warnings
50 | warnings.simplefilter(action='ignore', category=FutureWarning)
51 |
52 | # load modules
53 | import datetime
54 | import glob
55 | import os
56 | import math
57 | import random
58 | import string
59 | import sys
60 | import time
61 | import unicodedata
62 |
63 | import torch
64 | import torch.nn as nn
65 | from torch.utils.tensorboard import SummaryWriter
66 |
67 | # import the training data
68 | vocabulary = string.ascii_letters + " .,;'$"
69 |
70 | def unicode_to_ascii(s):
71 |     '''
72 |     Removes diacritics from unicode characters.
73 |     See: https://stackoverflow.com/a/518232/2809427
74 |     '''
75 |     return ''.join(
76 |         c for c in unicodedata.normalize('NFD', s)
77 |         if unicodedata.category(c) != 'Mn'
78 |         and c in vocabulary
79 |     )
80 |
81 | # Build the category_lines dictionary, a list of names per language
82 | category_lines = {}
83 | all_categories = []
84 | for filename in glob.glob(os.path.join(args.data,'*.txt')):
85 |     category = os.path.splitext(os.path.basename(filename))[0]
86 |     all_categories.append(category)
87 |     lines = open(filename, encoding='utf-8').read().strip().split('\n')
88 |     lines = [unicode_to_ascii(line) for line in lines]
89 |     category_lines[category] = lines
90 |
91 | def str_to_tensor(ss,input_length=None):
92 |     '''
93 |     Converts a list of strings into a tensor of shape (max_length, len(ss), len(vocabulary)).
94 |     This is used to convert text into a form suitable for input into a RNN/CNN.
95 |     '''
96 |     max_length = max([len(s) for s in ss]) + 1
97 |     if input_length:
98 |         max_length = input_length
99 |     tensor = torch.zeros(max_length, len(ss), len(vocabulary))
100 |     for j,s in enumerate(ss):
101 |         s+='$'
102 |         for i, letter in enumerate(s):
103 | if ibvl',x)
137 |         out = self.cnn(out)
138 |         out = self.relu(out)
139 |         for cnn in self.cnns:
140 |             out = cnn(out)
141 |             out = self.relu(out)
142 |         out = out.view(args.batch_size,args.hidden_layer_size*args.input_length)
143 |         out = self.fc(out)
144 |         return out
145 |
146 | # load the model
147 | if args.model=='cnn':
148 |     model = CNNModel()
149 | else:
150 |     model = RNNModel()
151 |
152 | if args.warm_start:
153 |     print('warm starting model from',args.warm_start)
154 |     model_dict = torch.load(os.path.join(args.warm_start,'model'))
155 |     model.load_state_dict(model_dict['model_state_dict'])
156 |
157 | # training
158 | if args.train:
159 |
160 |     # create log_dir
161 |     log_dir = args.log_dir
162 |     if log_dir is None:
163 |         log_dir = 'log/'+(
164 |             'model='+args.model+
165 |             '_lr='+str(args.learning_rate)+
166 |             '_optim='+args.optimizer+
167 |             '_clip='+str(args.gradient_clipping)+
168 |             '_'+str(datetime.datetime.now())
169 |             )
170 |     try:
171 |         os.makedirs(log_dir)
172 |         with open(os.path.join(log_dir,'args'), 'w') as f:
173 |             f.write('\n'.join(sys.argv[1:]))
174 |     except FileExistsError:
175 |         print('cannot create log dir,',log_dir,'already exists')
176 |         sys.exit(1)
177 |     writer = SummaryWriter(log_dir=log_dir)
178 |
179 |     # prepare model for training
180 |     criterion = nn.CrossEntropyLoss()
181 |     if args.optimizer == 'sgd':
182 |         optimizer = torch.optim.SGD(
183 |             model.parameters(),
184 |             lr=args.learning_rate,
185 |             momentum=args.momentum,
186 |             weight_decay=args.weight_decay
187 |             )
188 |     if args.optimizer == 'adam':
189 |         optimizer = torch.optim.Adam(
190 |             model.parameters(),
191 |             lr=args.learning_rate,
192 |             weight_decay=args.weight_decay
193 |             )
194 |     model.train()
195 |
196 |     # training loop
197 |     start_time = time.time()
198 |     for step in range(1, args.samples + 1):
199 |
200 |         # get random training example
201 |         categories = []
202 |         lines = []
203 |         for i in range(args.batch_size):
204 |             category = random.choice(all_categories)
205 |             line = random.choice(category_lines[category])
206 |             categories.append(all_categories.index(category))
207 |             lines.append(line)
208 |         category_tensor = torch.tensor(categories, dtype=torch.long)
209 |         line_tensor = str_to_tensor(lines,args.input_length)
210 |         # perform training step (first clear the gradients left over from the previous step)
211 |         optimizer.zero_grad()
212 |         output = model(line_tensor)
213 |         loss = criterion(output, category_tensor)
214 |         loss.backward()
215 |         grad_norm = sum([ torch.norm(p.grad)**2 for p in model.parameters()])**(1/2)
216 |         if args.gradient_clipping:
217 |             torch.nn.utils.clip_grad_norm_(model.parameters(),1.0)
218 |         optimizer.step()
219 |
220 |         # get category from output
221 |         top_n, top_i = output.topk(1)
222 |         guess_i = top_i[-1].item()
223 |         category_i = category_tensor[-1]
224 |         guess = all_categories[guess_i]
225 |         category = all_categories[category_i]
226 |         accuracies = torch.where(
227 |             top_i[:,0]==category_tensor,
228 |             torch.ones([args.batch_size]),
229 |             torch.zeros([args.batch_size])
230 |             )
231 |         accuracy = torch.mean(accuracies).item()
232 |
233 |         # tensorboard
234 |         writer.add_scalar('train/loss', loss.item(), step)
235 |         writer.add_scalar('train/accuracy', accuracy, step)
236 |         writer.add_scalar('train/grad_norm', grad_norm.item(), step)
237 |
238 |         # print status update
239 |         if step % 100 == 0:
240 |             correct = '✓' if guess == category else '✗ (%s)' % category
241 |             print('%d %d%% (%.2f sec) %.4f %s / %s %s' % (
242 |                 step,
243 |                 step / args.samples * 100,
244 |                 time.time()-start_time,
245 |                 loss,
246 |                 line,
247 |                 guess,
248 |                 correct
249 |                 ))
250 |
251 |         # save model
252 |         if step%args.save_every == 0 or step==args.samples:
253 |             print('saving model checkpoint')
254 |             torch.save({
255 |                 'step':step,
256 |                 'model_state_dict': model.state_dict(),
257 |                 'optimizer_state_dict': optimizer.state_dict(),
258 |                 'loss':loss
259 |                 }, os.path.join(log_dir,'model'))
260 |
261 |
262 | # infer
263 | model.eval()
264 | softmax = torch.nn.Softmax(dim=1)
265 | if args.infer:
266 |     for line in sys.stdin:
267 |         line = line.strip()
268 |         line_tensor = str_to_tensor([line],args.input_length)
269 |         output = model(line_tensor)
270 |         probs = softmax(output)
271 |         top_n, top_i = probs.topk(3)
272 |         guess_0 = all_categories[top_i[0,0].item()]
273 |         guess_1 = all_categories[top_i[0,1].item()]
274 |         guess_2 = all_categories[top_i[0,2].item()]
275 |         print(
276 |             'name=',line,
277 |             'guess0=%s (%0.2f)'%(guess_0,probs[0,top_i[0,0].item()],),
278 |             'guess1=%s (%0.2f)'%(guess_1,probs[0,top_i[0,1].item()],),
279 |             'guess2=%s (%0.2f)'%(guess_2,probs[0,top_i[0,2].item()],),
280 |             )
281 |
282 |
--------------------------------------------------------------------------------
/img/layers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/img/layers.png
--------------------------------------------------------------------------------
/lecture_notes/ad.py:
--------------------------------------------------------------------------------
1 | print('hello world')
2 |
3 | import torch
4 |
5 | # 0 order tensors = numbers
6 | x = torch.tensor(0.0)
7 | y = torch.tensor(2.0)
8 | z = x + y
9 | print('z=',z.item())
10 |
11 | # 1st order tensors = vectors
12 | x = torch.tensor([1,2,3])
13 |
14 | # 2nd order tensor = matrix
15 | m = torch.tensor(
16 | [[1,2,3]
17 | ,[4,5,6]])
18 |
19 | #m2 = torch.tensor(
20 | # [[1,2,3,4]
21 | # ,[4,5,6]])
22 |
23 | # 3rd order tensors = cubes
24 | c = torch.tensor([[[3],[3]]])
25 |
26 | # two new features of torch
27 | # 1. works on GPUs
28 | # 2. supports automatic diff.
29 | # tensorflow:
30 | # 1. also has TPU
31 | # 2. better deployment devops
32 |
--------------------------------------------------------------------------------
/lecture_notes/ad2.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | def f(x):
4 |     return x**2 + 4*x + 2
5 |
6 | def df(x):
7 |     return 2*x + 4
8 |
9 | # minimum at x=-2
10 | # analytic formula
11 | # closed-form formula
12 | # for the minimum of f
13 |
14 | x = torch.tensor(
15 | 0.0,
16 | requires_grad=True
17 | )
18 | y = torch.tensor(
19 | 1.0,
20 | requires_grad=True
21 | )
22 |
23 |
24 | # d/dx f(x) == d/dx z
25 | z = f(x)
26 | z.backward() # computes the derivative
27 |
28 | x.grad # this is df(x)
29 |
30 | print('f(x)=',f(x))
31 | #print('df(x)=',df(x))
32 | print('x.grad=',x.grad)
33 | print('y.grad=',y.grad)
34 |
35 | # gradient descent
36 | x0 = torch.tensor(7.0,requires_grad=True)
37 | z0 = f(x0)
38 | z0.backward()
39 |
40 | alpha = 0.1 # step size, learning rate
41 | x1 = x0 - alpha * x0.grad # key formula
42 | x1 = torch.tensor(x1,requires_grad=True)
43 | z1 = f(x1)
44 | z1.backward()
45 |
46 | x2 = x1 - alpha * x1.grad
47 |
48 | print('x0=',x0)
49 | print('x1=',x1)
50 | print('x2=',x2)
51 |
52 | # loop version of gradient descent
53 | x = torch.tensor(7.0,requires_grad=True)
54 | for i in range(50):
55 |     print('i=',i,'x=',x)
56 |     z = f(x)
57 |     z.backward()
58 |     x = x - alpha * x.grad
59 |     x = torch.tensor(x,requires_grad=True)
60 |
61 |
62 | # higher order tensors
63 | x = torch.tensor([[[[[7.0,5.6]]]]])
64 |
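65 | x = torch.ones(3,4,5,requires_grad=True) # requires_grad is needed for backward() to work
66 | # 3rd order = R^m*n*o
67 | # m = 3, n=4, o=5
68 | print('x=',x)
69 |
70 | #x = torch.zeros(3,4,5)
71 | #x = torch.empty(3,4,5)
72 | print('x=',x)
73 |
74 | z = f(x)
75 | print('z=',z)
76 | z.sum().backward() # backward() needs a scalar, so sum the entries of z first
77 | x.grad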
78 |
--------------------------------------------------------------------------------
/lecture_notes/einsum.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | x=torch.ones(2,3)
4 |
5 | print('x=',x)
6 |
7 | print('sum=',torch.einsum('im->',[x]))
8 | print('trans=',torch.einsum('ij->ji',[x]))
9 |
10 | y = torch.ones(2)
11 | print('l2=',torch.einsum('i,i->',[y,y]))
12 |
13 | print('complex=',torch.einsum(
14 | 'ij, ij, ij -> ij',
15 | [x,x,x]
16 | ))
17 |
--------------------------------------------------------------------------------
/project/README.md:
--------------------------------------------------------------------------------
1 | # Project: analyzing news articles about the coronavirus
2 |
3 | **Overview:**
4 | This is the final project for [CMC's CS181: Deep Learning](https://github.com/mikeizbicki/cmc-csci181) course.
5 | The project will guide you through the process of using state of the art deep learning techniques to analyze the news coverage of the coronavirus.
6 | The dataset you will analyze contains 2 million news articles written in 20 languages and published in 50,000 venues around the world.
7 | Despite this large dataset size,
8 | the project has been designed to be completed on only modest hardware,
9 | and specifically does not require access to a GPU.
10 |
11 | **Scientific goals:**
12 | We will try to answer the following questions:
13 |
14 | 1. What is the bias of different news sources? (geographic, topical, ideological, etc.)
15 |
16 | 1. How has coverage of the coronavirus changed over time?
17 |
18 | 1. Can we detect/generate "fake news" stories about the coronavirus?
19 | Wikipedia has a [big list of fake news stories related to coronavirus](https://en.wikipedia.org/wiki/Misinformation_related_to_the_2019%E2%80%9320_coronavirus_pandemic).
20 |
21 |
34 |
35 |
39 |
40 | **Learning objectives:**
41 |
42 | 1. Have a cool project in your portfolio to talk about in job interviews
43 |
44 | 1. Understand the following deep learning concepts
45 | 1. explainability
46 | 1. attention (compared with RNNs and CNNs)
47 | 1. transfer learning / fine tuning
48 | 1. embeddings
49 |
50 | 1. Apply deep learning techniques to a real world dataset
51 | 1. understand data cleaning techniques
52 | 1. understand the importance of Unicode in both English and foreign language text
53 | 1. learn how to use "natural supervision" to generate labels for unlabeled data
54 | 1. learn how to approach a problem that no one knows the answers to
55 |
56 | 1. Understand the research process
57 |
58 | **Related projects:**
59 |
60 | There's been lots of machine learning research applied to the coronavirus ([see here for a really big list](https://towardsdatascience.com/machine-learning-methods-to-aid-in-coronavirus-response-70df8bfc7861)).
61 | The closest related research to this project is:
62 |
63 | 1. Kaggle is hosting [a competition to analyze academic articles about coronavirus](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge).
64 | The dataset includes 47,000 scientific articles,
65 | which is too many for doctors to read,
66 | and the goal is to extract the most relevant information about these articles.
67 | 1. Stanford hosted [a virtual conference on COVID-19 and AI](https://hai.stanford.edu/events/covid-19-and-ai-virtual-conference/agenda). The most relevant presentation was by [Renée DiResta](https://hai.stanford.edu/people/ren-e-diresta) on [Misinformation & Disinformation in state media about COVID-19](https://www.youtube.com/watch?v=z4105Exe23Q&t=1hr58m10s).
68 |
69 | ## Part 0: The data
70 |
71 | You can download version 0 of the dataset at:
72 |
73 | 1. training set: https://izbicki.me/public/cs/cs181/coronavirus-headlines-train.jsonl.gz
74 | 1. test set: https://izbicki.me/public/cs/cs181/coronavirus-headlines-test.jsonl.gz
75 |
76 | You should download the training set and place it in a directory called `coronavirus-headlines` with the following commands:
77 | ```
78 | $ mkdir coronavirus-headlines
79 | $ cd coronavirus-headlines
80 | $ wget https://izbicki.me/public/cs/cs181/coronavirus-headlines-train.jsonl.gz
81 | ```
82 |
83 | The dataset is stored in the [JSON Lines](http://jsonlines.org/) format,
84 | where every line represents a single news article and has the following keys:
85 |
86 | | key | semantics |
87 | | ------------- | --------- |
88 | | `url` | The url of the article |
89 | | `hostname` | The hostname field of the url (e.g. `www.huffpost.com` or `www.breitbart.com`) |
90 | | `title` | The title of the article as extracted by [newspaper3k](https://newspaper.readthedocs.io/en/latest/). This is typically the `` tag or the `` tag of the webpage. No preprocessing has been done on the titles, and so many titles contain weird unicode values that need to be normalized. |
91 | | `pub_time` | The date of publication as extracted by [newspaper3k](https://newspaper.readthedocs.io/en/latest/). These dates should be taken with a heavy grain of salt. Many of these dates are clearly wrong, for example there are dates in the 2030s and dates from thousands of years in the past. Furthermore, some domains like `sputniknews.com` use the European date convention of YYYY-MM-DD, but these are interpreted by newspaper3k as YYYY-DD-MM dates, and so have their day and month flipped. My guess is that for dates that might be relevant to the coronavirus (roughly 2019-11-01 to 2020-04-01), somewhere between 50%-80% of the dates are correct. |
92 | | `lang` | The language ISO-2 code determined by applying [langid.py](https://github.com/saffsd/langid.py) to the body of the article. For popular languages like English and Chinese, I think these labels are fairly accurate. But for other languages I'm less confident. For example, there are many articles about coronavirus labeled as Latin, and my suspicion is that most of these articles are actually written in another romance language like Spanish or Italian. |
93 |
94 | This isn't every article ever written about the coronavirus,
95 | but it's a large fraction of them.
96 | The list has been filtered to include only English language articles,
97 | and articles whose title contains one of the strings `coronavirus`, `corona virus`, `covid` or `ncov`.
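
The file format described above can be read with the Python standard library alone. The following is a minimal sketch, not part of the provided `names.py`; the helper names `read_headlines` and `has_plausible_date` are made up for illustration, and the date check assumes that `pub_time` is stored as an ISO 8601 string beginning with `YYYY-MM-DD`, which you should verify against the actual data.

```python
import gzip
import json

def read_headlines(path):
    '''
    Yield one dict per non-empty line of a gzipped JSON Lines file.
    '''
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def has_plausible_date(article, start='2019-11-01', end='2020-04-01'):
    '''
    Keep only articles whose pub_time falls in the window relevant to the coronavirus.
    ASSUMPTION: pub_time is an ISO 8601 string, so its first 10 characters are YYYY-MM-DD
    and string comparison orders the dates correctly.
    '''
    pub_time = article.get('pub_time')
    if not isinstance(pub_time, str) or len(pub_time) < 10:
        return False
    return start <= pub_time[:10] <= end
```

A typical use would be `titles = [a['title'] for a in read_headlines(path) if has_plausible_date(a)]`, where `path` points at the downloaded training file.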
98 |
99 | ## Part 1: explainable machine learning
100 |
101 | **due date:** Thursday, 23 April
102 |
103 | ### What's already been done
104 | Our first goal when analyzing this dataset is to predict the hostname that published an article given just the title.
105 | We will see that this gives us a simple (but crude) way to measure how similar two different hostnames are.
106 |
107 | I trained this model with the following command:
108 | ```
109 | $ python3 names.py \
110 | --train \
111 | --data=coronavirus-headlines/coronavirus-headlines-train.jsonl.gz \
112 | --data_format=headlines \
113 | --model=gru \
114 | --hidden_layer_size=512 \
115 | --num_layers=8 \
116 | --resnet \
117 | --dropout=0.2 \
118 | --optimizer=adam \
119 | --learning_rate=1e-3 \
120 | --gradient_clipping \
121 | --batch_size=128
122 | ```
123 | You do not have to run this command yourself, as it will take a long time.
124 | (I let it run for 2 days on my GPU system.)
125 | I am providing the command just so that you can see the particular hyperparameters used to train the model.
126 |
127 | You should download the pretrained model by running the commands
128 | ```
129 | $ wget https://izbicki.me/public/cs/cs181/gru_512x8.tar.gz
130 | $ mkdir models
131 | $ mv gru_512x8.tar.gz models
132 | $ tar -xzf models/gru_512x8.tar.gz
133 | ```
134 |
135 | The last step you need to do before running the code is create a directory for the output explanations to be saved into:
136 | ```
137 | $ mkdir explain_outputs
138 | ```
139 |
140 | We can now run inference on our model with the command
141 | ```
142 | $ python3 names.py --infer --warm_start=models/gru_512x8
143 | ```
144 | which will accept text from stdin and run the inference algorithm on it.
145 |
146 | Examples:
147 |
148 | 1. Geographic similarity:
149 | The Australian website `www.news.com.au` ran the story titled
150 | ```
151 | Sick Qantas passenger at Melbourne Airport sparks coronavirus fears
152 | ```
153 | We can run our inference algorithm on this title using the command
154 | ```
155 | $ python3 names.py --infer --warm_start=models/gru_512x8 <<< "Sick Qantas passenger at Melbourne Airport sparks coronavirus fears"
156 | ```
157 | The top 5 predictions are:
158 | ```
159 | 0 www.news.com.au (0.24)
160 | 1 www.abc.net.au (0.04)
161 | 2 www.dailymail.co.uk (0.04)
162 | 3 au.news.yahoo.com (0.03)
163 | 4 news.yahoo.com (0.03)
164 | ```
165 | Most of these are other Australian newspapers.
166 |
167 | 1. Topical similarity:
168 | The website `virological.org` is a discussion forum where doctors and biologists post their analysis of different viruses.
169 | Understandably, they have recently been posting detailed analyses of the coronavirus,
170 | and one such post was titled
171 | ```
172 | nCoV's relationship to bat coronaviruses & recombination signals (no snakes) - no evidence the 2019-nCoV lineage is recombinant
173 | ```
174 | We can run our inference algorithm using the command
175 | ```
176 | $ python3 names.py --infer --warm_start=models/gru_512x8 <<< "nCoV's relationship to bat coronaviruses & recombination signals (no snakes) - no evidence the 2019-nCoV lineage is recombinant"
177 | ```
178 | The top 5 predictions are:
179 | ```
180 | 0 virological.org (0.46)
181 | 1 www.businessinsider.com (0.04)
182 | 2 www.insider.com (0.02)
183 | 3 contagiontracker.com (0.02)
184 | 4 www.en24.news (0.01)
185 | ```
186 | This suggests that these websites all publish relatively more academic articles about the coronavirus than other news websites.
187 | Notice that more relevant sites such as `medarxiv.org` (a website for publishing academic medical papers that contains several analyses of the coronavirus) do not appear in this list even though they are very similar to `virological.org`.
188 |
189 | 1. Politics similarity:
190 | `breitbart.com` is a conservative news source that is well known for supporting President Trump.
191 | They published the following article:
192 | ```
193 | Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria
194 | ```
195 | We can run our inference algorithm using the command
196 | ```
197 | $ python3 names.py --infer --warm_start=models/gru_512x8 <<< "Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria"
198 | ```
199 | The top 5 predictions are:
200 | ```
201 | 0 ussanews.com (0.04)
202 | 1 www.infowars.com (0.04)
203 | 2 crossman66.wordpress.com (0.04)
204 | 3 fromthetrenchesworldreport.com (0.03)
205 | 4 jonsnewplace.wordpress.com (0.02)
206 | ```
207 | In this case, Breitbart does not appear as one of the top predictions.
208 | But all the other sources listed share a similar conservative perspective.
209 |
210 |
241 |
242 | **The Problem:**
243 | We have a (crude) way to measure the similarity of two hostnames,
244 | but we can't explain *why* two hostnames get measured similarly.
245 | Your goal in this assignment is to find these explanations using the "sliding window algorithm".
246 |
247 | ### The sliding window algorithm
248 |
249 | The sliding window algorithm is a folklore technique for explaining the result of any machine learning algorithm.
250 | There are more sophisticated algorithms (such as [LIME](https://github.com/marcotcr/lime) and [SHAP](https://github.com/slundberg/shap)),
251 | but these are significantly more difficult to implement and interpret.
252 | They do both have nice libraries, however, which give pretty visualizations.
253 |
254 | **Basic idea.**
255 | If we have an input sentence
256 | ```
257 | Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria
258 | ```
259 | and we want to know how important the word `Trump` is for our final classification,
260 | we can:
261 | (1) remove the word `Trump`,
262 | (2) rerun the classification on the modified sentence,
263 | and (3) compare the results of the model on the modified and unmodified sentences.
264 | If the results are similar, then the word `Trump` is not important;
265 | if the results are different, then the word `Trump` is important.
266 |
267 | **How to remove a word?**
268 | There are a surprising number of ways to remove a word from a sentence:
269 |
270 | 1. Create a new sentence by concatenating everything to the left and right of the removed word.
271 | Thus, our example sentence would become
272 | ```
273 | Pollak: Coronavirus Panic Partly Driven by Anti- Hysteria
274 | ```
275 | This method is easy to implement,
276 | but can result in grammatically incorrect sentences.
277 | Since our model is only trained on grammatically correct sentences,
278 | there is no reason to expect it to perform well on malformed sentences,
279 | and it is likely to output a large difference for every word in the sentence.
280 |
281 | 2. Replace the word with another word.
282 | In our example sentence, we might replace the word `Trump` with the word `Biden` to get
283 | ```
284 | Pollak: Coronavirus Panic Partly Driven by Anti-Biden Hysteria
285 | ```
286 | This sentence is now grammatically correct,
287 | and so we could expect our model to do reasonably well on it.
288 | But how well it does would depend on our choice of replacement word,
289 | and so we would need to do many replacements and take an average to get a good estimate of the word's importance.
290 |
291 | 3. Insert "blank" inputs where the selected word should be.
292 | Recall that the inputs to our neural network are one-hot encoded letters.
293 | Therefore, every letter has a vector associated with it that has exactly one `1`.
294 | If we replace the `1` with a `0`, then the model will still know that there is a word in this location (because the size of the input tensor doesn't change),
295 | but the model is getting no information about what that word is.
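
The three removal strategies above can be compared side by side on the example sentence. This is a rough sketch using plain Python lists in place of torch tensors; the `vocabulary` here is `string.printable`, chosen only so that every character of the headline has a one-hot position, and is not necessarily the vocabulary used by the pretrained model.

```python
import string

# ASSUMPTION: an illustrative vocabulary, not the one used by the pretrained model.
vocabulary = string.printable

sentence = 'Pollak: Coronavirus Panic Partly Driven by Anti-Trump Hysteria'
word = 'Trump'
start = sentence.index(word)
end = start + len(word)

# Method 1: concatenate everything to the left and right of the word.
deleted = sentence[:start] + sentence[end:]

# Method 2: replace the word with another word.
replaced = sentence[:start] + 'Biden' + sentence[end:]

# Method 3: keep the input length, but zero out the one-hot rows of the word.
def one_hot(s):
    return [[1.0 if c == v else 0.0 for v in vocabulary] for c in s]

blanked = one_hot(sentence)
for i in range(start, end):
    blanked[i] = [0.0] * len(vocabulary)

print(deleted)
print(replaced)
print('rows:', len(blanked), 'blank rows:', sum(1 for row in blanked if sum(row) == 0))
```

Only the third method leaves the input the same length as the original, which is why the model can still tell that something occupied those positions.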
296 |
297 | **Comparing model outputs.**
298 | There are many ways to calculate the similarity between two model outputs,
299 | but the simplest is the Euclidean distance between the model's output probabilities,
300 | and that's what you should use in this assignment.
301 |
302 | **Example outputs.**
303 | In the following images, each word is colored with the Euclidean distance calculated above.
304 | Darker green values indicate that the word is more important.
305 |
306 |
307 |
308 |
309 |
310 |
311 |
312 |
313 |
314 |
315 |
316 |
317 |
318 | **Pseudocode.**
319 | The following pseudocode summarizes the sliding window explanation algorithm.
320 | ```
321 | construct input_tensor from input_sentence
322 | let probs = softmax(model(input_tensor))
323 | for each word in the input sentence:
324 |     construct input_tensor' by setting the columns associated with word to 0
325 |     let probs' = softmax(model(input_tensor'))
326 |     weight[word] = ||probs - probs'||_2
327 | ```
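
The pseudocode above translates fairly directly into Python. The sketch below is a pure-Python illustration and not the `explain` function you are asked to write: it uses nested lists instead of torch tensors, and `toy_model` is a made-up scoring function that stands in for the pretrained GRU so that the example runs on its own.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(sentence, vocabulary):
    return [[1.0 if c == v else 0.0 for v in vocabulary] for c in sentence]

def word_spans(sentence):
    '''Return (start, end) character offsets of each space-separated word.'''
    spans, start = [], None
    for i, c in enumerate(sentence):
        if c != ' ' and start is None:
            start = i
        elif c == ' ' and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(sentence)))
    return spans

def explain(sentence, model, vocabulary):
    '''Return a list of (word, weight) pairs using the sliding window idea.'''
    x = one_hot(sentence, vocabulary)
    probs = softmax(model(x))
    weights = []
    for start, end in word_spans(sentence):
        # blank out the rows of this word; the input length stays the same
        x_blank = [[0.0] * len(row) if start <= i < end else row
                   for i, row in enumerate(x)]
        probs_blank = softmax(model(x_blank))
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(probs, probs_blank)))
        weights.append((sentence[start:end], dist))
    return weights

# Hypothetical stand-in for a trained model: class 0 counts 'a's, class 1 counts 'b's.
toy_vocabulary = 'abc '
def toy_model(x):
    return [sum(row[0] for row in x), sum(row[1] for row in x)]

weights = explain('aa bb cc', toy_model, toy_vocabulary)
print(weights)
```

In the toy example the word `cc` gets weight 0 because the stand-in model ignores the letter `c`, while `aa` and `bb` get positive weights. Passing single-character spans instead of word spans gives the character-level variant.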
328 |
329 | **Character level explanations.**
330 | By repeating the above procedure with individual characters (rather than words),
331 | we can generate character-level explanations of our text.
332 |
333 | The resulting explanations look like:
334 |
335 |
336 |
337 |
338 |
339 |
340 |
341 |
342 |
343 |
344 |
345 |
346 |
347 |
351 |
352 | ### Tasks for you to complete
353 |
354 | 1. Follow the instructions above to download the pretrained model weights
355 | 1. Implement the `explain` function in the `names.py` source code
356 | 1. Reproduce the example explanations above to ensure that your code is working correctly
357 | 1. Select 3 titles from the training data and generate word/char level explanations for these titles
358 |
359 | ### Submission
360 |
361 | Upload your explanation images and source code to sakai.
362 |
363 | ## Part 2: the attention mechanism and fine tuning
364 |
365 | This part of the assignment is based on a new dataset [corona.multilang100.jsonl.gz](https://izbicki.me/public/cs/cs181/corona.multilang100.jsonl.gz),
366 | which is in the same format as the previous dataset.
367 | You should download this file and place it in your project folder.
368 |
369 | Unlike the previous dataset, this dataset is multilingual.
370 | It contains news headlines written in:
371 | 1. English,
372 | 1. Spanish,
373 | 1. Portuguese,
374 | 1. Italian,
375 | 1. French,
376 | 1. German,
377 | 1. Russian,
378 | 1. Chinese,
379 | 1. Korean,
380 | 1. and Japanese.
381 |
382 | For each of these 10 languages, 10 prominent news sources were selected to be included in the dataset.
383 | There are therefore 100 classes.
384 | The overall size of the dataset is 106767 headlines, and so there are about 1000 headlines per news source.
385 |
386 | The character level model we used before will not work in this multilingual setting because many of these languages use different vocabularies.
387 | We will instead use the BERT transformer model and the [transformers](https://huggingface.co/transformers/) python library.
388 |
389 | ### Tasks for you to complete
390 |
391 | 1. Add support for training a multilingual BERT model to the `names.py` file.
392 | 1. Train the BERT model so that you get at least the following training accuracies:
393 | 1. accuracy@1 >= 0.3
394 | 1. accuracy@20 >= 0.9
395 |
396 | These are very conservative numbers.
397 | You can view my [tensorboard.dev log](https://tensorboard.dev/experiment/2WJbkgdyTlGvh6Gk4mu0PQ/#scalars&_smoothingWeight=0.99) to see what type of performance levels are possible.
398 |
399 | You will have to experiment with different hyperparameter combinations in order to get good results.
400 | You do not have to do any warmstarts to get these results.
401 | I encourage you to try warmstarts since you can get much better accuracies,
402 | but the computational expense may be too much for some of your computers,
403 | and so I am not requiring it.
404 |
405 | 1. Generate a tensorboard.dev plot showing your model training progress
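
The accuracy@k thresholds in the task list above count a prediction as correct when the true class is among the k highest-scoring classes. Here is a small pure-Python sketch of that metric; the function name `accuracy_at_k` and the toy scores are made up for illustration, and in `names.py` you would more likely compute this on tensors with `topk`.

```python
def accuracy_at_k(scores, labels, k):
    '''
    Fraction of examples whose true label is among the k highest-scoring classes.
    scores: list of per-class score lists, one per example
    labels: list of true class indices
    '''
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)

# Toy scores for 3 examples and 3 classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

for k in (1, 2, 3):
    print('accuracy@%d = %.3f' % (k, accuracy_at_k(scores, labels, k)))
```

Accuracy@k can never decrease as k grows, which is why the accuracy@20 threshold is so much higher than the accuracy@1 threshold.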
406 |
407 | ### Submission
408 |
409 | Upload the link to your tensorboard.dev output on sakai.
410 |
411 | ### Optional task
412 |
413 | Extend the explanation code from part 1 so that it works on the BERT model as well.
414 |
415 | ## Part 3: embeddings
416 |
417 | Recall that in the previous part of the project, we created the following BERT model:
418 |
419 | ```
420 | class BertFineTuning(torch.nn.Module):
421 |     def __init__(self):
422 |         super().__init__()
423 |         self.fc_class = torch.nn.Linear(768,num_classes)
424 |
425 |     def forward(self,x):
426 |         last_layer,embedding = bert(x)
427 |         embedding = torch.mean(last_layer,dim=1)
428 |         out = self.fc_class(embedding)
429 |         return out
430 | ```
431 |
432 | The linear layer `self.fc_class` is just a matrix that maps 768 inputs to `num_classes` outputs (PyTorch stores its weight with shape `num_classes x 768`, one row per class).
433 | In other words, each class has a 768 dimensional vector associated with it,
434 | and that vector encodes lots of information about the class.
435 | We call this vector an "embedding" of the class.
436 |
437 | By visualizing the embeddings, we can understand which classes our model thinks are "similar".
438 | Tensorboard has some built-in tools for this visualization using algorithms like PCA and t-SNE.
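
Besides visual inspection, similarity between class embeddings can also be checked numerically. The sketch below ranks classes by cosine similarity using plain Python lists; the class names and 3-dimensional vectors are made up for illustration, and in practice the vectors would be the 768-dimensional rows of the `fc_class` weight matrix.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(embeddings, query, top=3):
    '''
    embeddings: dict mapping class name -> embedding vector
    Returns the `top` class names most similar to `query`, best first.
    '''
    scored = [
        (cosine_similarity(embeddings[query], vec), name)
        for name, vec in embeddings.items()
        if name != query
    ]
    return [name for _, name in sorted(scored, reverse=True)[:top]]

# Hypothetical class embeddings (made-up names and numbers).
toy_embeddings = {
    'site_a': [1.0, 0.0, 0.0],
    'site_b': [0.9, 0.1, 0.0],
    'site_c': [0.0, 1.0, 0.0],
}

print(most_similar(toy_embeddings, 'site_a', top=2))
```

Cosine similarity ignores the length of each vector and compares only direction, which is one common choice for embeddings; Euclidean distance is another.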
439 |
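One simple way to check which classes the model treats as "similar", before opening any visualization tool, is to compare class embeddings by cosine similarity. The sketch below uses plain Python and made-up 4-dimensional vectors; in the real model each class vector would be one row of `self.fc_class.weight` (PyTorch stores the weight of a `Linear(768, num_classes)` layer with shape `[num_classes, 768]`). The helper names and the numbers are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical 4-dimensional class embeddings (the real ones have 768 dimensions)
class_embeddings = {
    'English':  [0.9, 0.1, 0.0, 0.2],
    'Scottish': [0.8, 0.2, 0.1, 0.3],
    'Japanese': [-0.1, 0.9, 0.7, 0.0],
}

def most_similar(name, embeddings):
    """Return the other class whose embedding has the highest cosine similarity."""
    others = [c for c in embeddings if c != name]
    return max(others, key=lambda c: cosine_similarity(embeddings[name], embeddings[c]))

nearest = most_similar('English', class_embeddings)
print(nearest)
```

Projections such as PCA and t-SNE aim to place classes with similar embeddings near each other in two or three dimensions, so a nearest-neighbor check like this can serve as a sanity check on what the plot shows.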
440 | ### Tasks for you to complete
441 |
442 | 1. Modify your `names.py` file so that it outputs class embeddings to tensorboard.
443 |
444 | 1. Load tensorboard and visualize the resulting embeddings.
445 |
446 | ### Submission
447 |
448 | Upload a screenshot of your embeddings and your source code to sakai.
449 |
--------------------------------------------------------------------------------
/project/img/line0000.char.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0000.char.png
--------------------------------------------------------------------------------
/project/img/line0000.word.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0000.word.png
--------------------------------------------------------------------------------
/project/img/line0001.char.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0001.char.png
--------------------------------------------------------------------------------
/project/img/line0001.word.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0001.word.png
--------------------------------------------------------------------------------
/project/img/line0002.char.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0002.char.png
--------------------------------------------------------------------------------
/project/img/line0002.word.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0002.word.png
--------------------------------------------------------------------------------
/project/img/line0003.char.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0003.char.png
--------------------------------------------------------------------------------
/project/img/line0003.word.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mikeizbicki/cmc-csci181-deeplearning/dc922f971e1a4e8d966cd9d9d73e5d295afbd3b6/project/img/line0003.word.png
--------------------------------------------------------------------------------
/project/names.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # process command line args
4 | import argparse
5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
6 |
7 | parser_control = parser.add_argument_group('control options')
8 | parser_control.add_argument('--infer',action='store_true')
9 | parser_control.add_argument('--train',action='store_true')
10 | parser_control.add_argument('--generate',action='store_true')
11 |
12 | parser_data = parser.add_argument_group('data options')
13 | parser_data.add_argument('--data',default='names')
14 | parser_data.add_argument('--data_format',choices=['names','headlines'],default='names')
15 | parser_data.add_argument('--sample_strategy',choices=['uniform_line','uniform_category'],default='uniform_category')
16 | parser_data.add_argument('--case_insensitive',action='store_true')
17 | parser_data.add_argument('--dropout',type=float,default=0.0)
18 |
19 | parser_model = parser.add_argument_group('model options')
20 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm'],default='rnn')
21 | parser_model.add_argument('--resnet',action='store_true')
22 | parser_model.add_argument('--hidden_layer_size',type=int,default=128)
23 | parser_model.add_argument('--num_layers',type=int,default=1)
24 | parser_model.add_argument('--conditional_model',action='store_true')
25 |
26 | parser_opt = parser.add_argument_group('optimization options')
27 | parser_opt.add_argument('--batch_size',type=int,default=1)
28 | parser_opt.add_argument('--learning_rate',type=float,default=1e-1)
29 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd')
30 | parser_opt.add_argument('--gradient_clipping',action='store_true')
31 | parser_opt.add_argument('--momentum',type=float,default=0.9)
32 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4)
33 | parser_opt.add_argument('--samples',type=int,default=10000)
34 | parser_opt.add_argument('--input_length',type=int)
35 | parser_opt.add_argument('--warm_start')
36 | parser_opt.add_argument('--disable_categories',action='store_true')
37 |
38 | parser_infer = parser.add_argument_group('inference options')
39 | parser_infer.add_argument('--infer_path',default='explain_outputs')
40 |
41 | parser_generate = parser.add_argument_group('generate options')
42 | parser_generate.add_argument('--temperature',type=float,default=1.0)
43 | parser_generate.add_argument('--max_sample_length',type=int,default=100)
44 | parser_generate.add_argument('--category',nargs='*')
45 |
46 | parser_debug = parser.add_argument_group('debug options')
47 | parser_debug.add_argument('--device',choices=['auto','cpu','gpu'],default='auto')
48 | parser_debug.add_argument('--print_delay',type=int,default=5)
49 | parser_debug.add_argument('--log_dir_base',type=str,default='log')
50 | parser_debug.add_argument('--log_dir',type=str)
51 | parser_debug.add_argument('--save_every',type=int,default=1000)
52 | parser_debug.add_argument('--print_every',type=int,default=100)
53 |
54 | args = parser.parse_args()
55 |
56 | if args.model=='cnn' and args.input_length is None:
57 | raise ValueError('if --model=cnn, then you must specify --input_length')
58 |
59 | # load args from file if warm starting
60 | if args.warm_start is not None:
61 | import sys
62 | import os
63 | args_orig = args
64 | args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:])
65 | args.train = args_orig.train
66 |
67 | # suppress warnings
68 | import warnings
69 | warnings.simplefilter(action='ignore', category=FutureWarning)
70 |
71 | # load modules
72 | import datetime
73 | import glob
74 | import os
75 | import math
76 | import random
77 | import string
78 | import sys
79 | import time
80 | import unicodedata
81 | from unidecode import unidecode
82 |
83 | import torch
84 | import torch.nn as nn
85 | from torch.utils.tensorboard import SummaryWriter
86 |
87 | # set device to cpu/gpu
88 | if args.device=='gpu' or (torch.cuda.is_available() and args.device=='auto'):
89 | device = torch.device('cuda')
90 | torch.set_default_tensor_type('torch.cuda.FloatTensor')
91 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
92 | else:
93 | device = torch.device('cpu')
94 | print('device=',device)
95 |
96 | # import the training data
97 | BOL = '\x00'
98 | EOL = '\x01'
99 | OOV = '\x02'
100 | if args.case_insensitive:
101 | vocabulary = string.ascii_lowercase
102 | else:
103 | vocabulary = string.ascii_letters
104 | vocabulary += " .,;'" + '1234567890:-/#$%' + OOV + BOL + EOL
105 | print('len(vocabulary)=',len(vocabulary))
106 |
107 | def unicode_to_ascii(s):
108 | '''
109 | Removes diacritics from unicode characters.
110 | See: https://stackoverflow.com/a/518232/2809427
111 | '''
112 | return ''.join(
113 | c for c in unicodedata.normalize('NFD', s)
114 | if unicodedata.category(c) != 'Mn'
115 | and c in vocabulary
116 | )
117 |
118 | def format_line(line):
119 | line = unidecode(line)
120 | if args.case_insensitive:
121 | line = line.lower()
122 | return line
123 |
124 | # Build the category_lines dictionary, a list of names per language
125 | if args.data_format == 'names':
126 | category_lines = {}
127 | all_categories = []
128 | #for filename in glob.glob(os.path.join(args.data,'*.txt')):
129 | for filename in glob.glob(os.path.join(args.data,'*')):
130 | print('filename=',filename)
131 | category = os.path.splitext(os.path.basename(filename))[0]
132 | all_categories.append(category)
133 | lines = open(filename, encoding='utf-8').read().strip().split('\n')
134 | lines = [format_line(line) for line in lines]
135 | category_lines[category] = lines
136 |
137 | elif args.data_format == 'headlines':
138 | import gzip
139 | import json
140 | from collections import defaultdict,Counter
141 |
142 | # load data points
143 | category_lines = defaultdict(lambda: [])
144 | lines_category = []
145 | categories_counter = Counter()
146 | with gzip.open(args.data,'rt') as f:
147 | for line in f:
148 | article = json.loads(line)
149 | hostname = article['hostname']
150 | categories_counter[hostname] += 1
151 | day = article['day'].split()[0]
152 | title = article['title']
153 | title = format_line(title)
154 | category_lines[hostname].append(title)
155 | lines_category.append((title, hostname))
156 | all_categories = [ hostname for hostname,count in categories_counter.most_common() ]
157 | all_categories = list(all_categories)
158 |
159 | print('len(lines_category)=',len(lines_category))
160 | print('len(all_categories)=',len(all_categories))
161 |
162 | def str_to_tensor(ss,input_length=None):
163 | '''
164 | Converts a list of strings into a tensor of shape < max_length x len(ss) x len(vocabulary) >.
165 | This is used to convert text into a form suitable for input into a RNN/CNN.
166 | '''
167 | max_length = max([len(s) for s in ss]) + 2
168 | if input_length:
169 | max_length = input_length
170 | tensor = torch.zeros(max_length, len(ss), len(vocabulary)).to(device)
171 | for j,s in enumerate(ss):
172 | s = BOL + s + EOL
173 | for i, letter in enumerate(s):
174 | if i
207 | out,h_n = self.rnn(x)
208 | out = self.dropout(out)
209 | out_class = self.fc_class(out[out.shape[0]-1,:,:])
210 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device)
211 | for i in range(out.shape[0]):
212 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:])
213 | return out_class, out_nextchars
214 |
215 | class ResnetRNNModel(nn.Module):
216 | def __init__(self):
217 | super().__init__()
218 | if args.model=='rnn':
219 | mk_rnn = nn.RNN
220 | if args.model=='gru':
221 | mk_rnn = nn.GRU
222 | if args.model=='lstm':
223 | mk_rnn = nn.LSTM
224 | rnn_input_size = input_size
225 | self.rnns = []
226 | for layer in range(args.num_layers):
227 | rnn = mk_rnn(
228 | rnn_input_size,
229 | args.hidden_layer_size,
230 | num_layers=1,
231 | )
232 | self.add_module('rnn'+str(layer),rnn)
233 | self.rnns.append(rnn)
234 | rnn_input_size = args.hidden_layer_size
235 | self.fc_class = nn.Linear(args.hidden_layer_size,len(all_categories))
236 | self.dropout = nn.Dropout(args.dropout)
237 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,len(vocabulary))
238 |
239 | def forward(self, x):
240 | # out is 3rd order: < len(line) x batch size x hidden_layer_size >
241 | out = x
242 | for layer,rnn in enumerate(self.rnns):
243 | out_prev = out
244 | out,_ = rnn(out)
245 | if layer>0 and args.resnet:
246 | out = out + out_prev
247 | out = self.dropout(out)
248 | out_class = self.fc_class(out[out.shape[0]-1,:,:])
249 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device)
250 | for i in range(out.shape[0]):
251 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:])
252 | return out_class, out_nextchars
253 |
254 |
255 | class CNNModel(nn.Module):
256 | def __init__(self):
257 | super(CNNModel,self).__init__()
258 | self.relu = nn.ReLU()
259 | self.cnn = \
260 | nn.Conv1d(input_size,args.hidden_layer_size,3,padding=1)
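261 | self.cnns = nn.ModuleList([  # ModuleList registers each layer so it is trained and saved; each layer gets its own weights
262 | nn.Conv1d(args.hidden_layer_size,args.hidden_layer_size,3,padding=1)
263 | for _ in range(args.num_layers-1) ])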
264 | self.dropout = nn.Dropout(args.dropout)
265 | self.fc_class = nn.Linear(args.hidden_layer_size*args.input_length,len(all_categories))
266 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,input_size)
267 |
268 | def forward(self,x):
269 | out = torch.einsum('lbv->bvl',x)
270 | out = self.cnn(out)
271 | out = self.relu(out)
272 | for cnn in self.cnns:
273 | out = cnn(out)
274 | out = self.relu(out)
275 | out = self.dropout(out)
276 | out_class = out.view(out.shape[0],args.hidden_layer_size*args.input_length)
277 | out_class = self.fc_class(out_class)
278 | out = torch.einsum('ijk->kij',out)
279 | out_nextchars = torch.zeros([out.shape[0],out.shape[1],input_size])
280 | for i in range(out.shape[0]):
281 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:])
282 | return out_class, out_nextchars
283 |
284 |
285 | # load the model
286 | if args.model=='cnn':
287 | model = CNNModel()
288 | else:
289 | if args.resnet:
290 | model = ResnetRNNModel()
291 | else:
292 | model = RNNModel()
293 | model.to(device)
294 |
295 | if args.warm_start:
296 | print('warm starting model from',args.warm_start)
297 | model_dict = torch.load(os.path.join(args.warm_start,'model'), map_location=device)
298 | model.load_state_dict(model_dict['model_state_dict'])
299 |
300 | # training
301 | if args.train:
302 |
303 | # create log_dir
304 | log_dir = args.log_dir
305 | if log_dir is None:
306 | log_dir = os.path.join(args.log_dir_base,(
307 | 'model='+args.model+
308 | '_hidden='+str(args.hidden_layer_size)+
309 | '_layers='+str(args.num_layers)+
310 | '_cond='+str(args.conditional_model)+
311 | '_resnet='+str(args.resnet)+
312 | '_lr='+str(args.learning_rate)+
313 | '_optim='+args.optimizer+
314 | '_clip='+str(args.gradient_clipping)+
315 | '_'+str(datetime.datetime.now())
316 | ))
317 | try:
318 | os.makedirs(log_dir)
319 | with open(os.path.join(log_dir,'args'), 'w') as f:
320 | f.write('\n'.join(sys.argv[1:]))
321 | except FileExistsError:
322 | print('cannot create log dir,',log_dir,'already exists')
323 | sys.exit(1)
324 | writer = SummaryWriter(log_dir=log_dir)
325 |
326 | # prepare model for training
327 | criterion = nn.CrossEntropyLoss()
328 | if args.optimizer == 'sgd':
329 | optimizer = torch.optim.SGD(
330 | model.parameters(),
331 | lr=args.learning_rate,
332 | momentum=args.momentum,
333 | weight_decay=args.weight_decay
334 | )
335 | if args.optimizer == 'adam':
336 | optimizer = torch.optim.Adam(
337 | model.parameters(),
338 | lr=args.learning_rate,
339 | weight_decay=args.weight_decay
340 | )
341 | model.train()
342 |
343 | # training loop
344 | start_time = time.time()
345 | for step in range(1, args.samples + 1):
346 |
347 | # get random training example
348 | categories = []
349 | lines = []
350 | for i in range(args.batch_size):
351 | if args.sample_strategy == 'uniform_category':
352 | category = random.choice(all_categories)
353 | line = random.choice(category_lines[category])
354 | elif args.sample_strategy == 'uniform_line':
355 | line, category = random.choice(lines_category)
356 |
357 | categories.append(all_categories.index(category))
358 | lines.append(line)
359 | category_tensor = torch.tensor(categories, dtype=torch.long).to(device)
360 | line_tensor = str_to_tensor(lines,args.input_length)
361 |
362 | if args.conditional_model:
363 | category_onehot = torch.nn.functional.one_hot(category_tensor, len(all_categories)).float()
364 | category_onehot = torch.unsqueeze(category_onehot,0)
365 | category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0)
366 | input_tensor = torch.cat([line_tensor,category_onehot],dim=2)
367 | else:
368 | input_tensor = line_tensor
369 |
370 | input_tensor = input_tensor.to(device)
371 | category_tensor = category_tensor.to(device)
372 |
373 | # perform training step
374 | output_class,output_nextchars = model(input_tensor)
375 | loss_class = criterion(output_class, category_tensor)
376 | loss_nextchars_perchar = torch.zeros(output_nextchars.shape[0]).to(device)
377 | for i in range(output_nextchars.shape[0]-1):
378 | _, nextchar_i = line_tensor[i+1,:].topk(1)
379 | nextchar_i = nextchar_i.view([-1])
380 | loss_nextchars_perchar[i] = criterion(output_nextchars[i,:], nextchar_i)
381 | loss_nextchars = torch.mean(loss_nextchars_perchar)
382 |
383 | if args.conditional_model or args.disable_categories:
384 | loss = loss_nextchars
385 | else:
386 | loss = loss_class + loss_nextchars
387 | loss.backward()
388 | grad_norm = sum([ torch.norm(p.grad)**2 for p in model.parameters() if p.grad is not None])**(1/2)
389 | if args.gradient_clipping:
390 | torch.nn.utils.clip_grad_norm_(model.parameters(),1.0)
391 | optimizer.step()
392 | optimizer.zero_grad()  # reset gradients so they do not accumulate across steps
393 | # log optimization information
394 | writer.add_scalar('train/loss_class', loss_class.item(), step)
395 | writer.add_scalar('train/loss_nextchars', loss_nextchars.item(), step)
396 | writer.add_scalar('train/loss', loss.item(), step)
397 | writer.add_scalar('train/grad_norm', grad_norm.item(), step)
398 |
399 | # get accuracy@k
400 | ks = [1, 5, 10, 20]
401 | k = max(ks)
402 | top_n, top_i = output_class.topk(k)
403 | category_tensor_k = torch.cat(k*[torch.unsqueeze(category_tensor,dim=1)],dim=1)
404 | accuracies = torch.where(
405 | top_i[:,:]==category_tensor_k,
406 | torch.ones([args.batch_size,k]).to(device),
407 | torch.zeros([args.batch_size,k]).to(device)
408 | )
409 | for k in ks:
410 | accuracies_k,_ = torch.max(accuracies[:,:k], dim=1)
411 | accuracy_k = torch.mean(accuracies_k).item()
412 | writer.add_scalar('accuracy/@'+str(k), accuracy_k, step)
413 |
414 | # print status update
415 | if step % args.print_every == 0:
416 |
417 | # get category from output
418 | top_n, top_i = output_class.topk(1)
419 | guess_i = top_i[-1].item()
420 | category_i = category_tensor[-1]
421 | guess = all_categories[guess_i]
422 | category = all_categories[category_i]
423 |
424 | # print results
425 | correct = '✓' if guess == category else '✗ (%s)' % category
426 | print('%d %d%% (%.2f sec) %.4f %s / %s %s' % (
427 | step,
428 | step / args.samples * 100,
429 | time.time()-start_time,
430 | loss,
431 | line,
432 | guess,
433 | correct
434 | ))
435 |
436 | # save model
437 | if step%args.save_every == 0 or step==args.samples:
438 | print('saving model checkpoint')
439 | torch.save({
440 | 'step':step,
441 | 'model_state_dict': model.state_dict(),
442 | 'optimizer_state_dict': optimizer.state_dict(),
443 | 'loss':loss
444 | }, os.path.join(log_dir,'model'))
445 |
446 | # infer
447 | def infer(line):
448 | line = line.strip()
449 | if args.case_insensitive:
450 | line = line.lower()
451 | line_tensor = str_to_tensor([line],args.input_length)
452 | output_class,output_nextchars = model(line_tensor)
453 | probs = softmax(output_class)
454 | k=20
455 | top_n, top_i = probs.topk(k)
456 | print('line=',line)
457 | for i in range(k):
458 | guess = all_categories[top_i[0,i].item()]
459 | print(' ',i,guess, '(%0.2f)'%top_n[0,i].item())
460 | if args.infer_path is not None:
461 | i = 0
462 | while os.path.exists(os.path.join(args.infer_path,"line%s.char.png" % str(i).zfill(4))):
463 | i += 1
464 | path_base = os.path.join(args.infer_path,'line'+str(i).zfill(4))
465 | print('path_base=',path_base)
466 | explain(line, path_base+'.char.png', 'char')
467 | explain(line, path_base+'.word.png', 'word')
468 |
469 | def explain(line,filename,explain_type):
470 | scores = torch.zeros([len(line)])
471 | scores[0]=5
472 | scores[1]=4
473 | scores[2]=3
474 | scores[3]=2
475 | scores[4]=1
476 | line2img(line,scores,filename)
477 |
478 |
479 | def line2img(
480 | line,
481 | scores,
482 | filename,
483 | maxwidth=40,
484 | img_width=800
485 | ):
486 | '''
487 | Outputs an image containing text with green/red background highlights to indicate the importance of words in the text.
488 |
489 | Arguments:
490 | line (str): the text that should be printed
491 | scores (Tensor): a vector of size len(line), where each index contains the "weight" of the corresponding letter in the line string; positive values will be colored green, and negative values red.
492 | filename (str): the name of the output file
493 | '''
494 | import matplotlib
495 | import matplotlib.colors as colors
496 | matplotlib.use('Agg')
497 | import matplotlib.pyplot as plt
498 | import numpy as np
499 | import math
500 |
501 | im_height=1+len(line)//maxwidth
502 | im=np.zeros([maxwidth,im_height])
503 | for i in range(scores.shape[0]):
504 | im[i%maxwidth,im_height-i//maxwidth-1] = scores[i]
505 |
506 | cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red","white","green"])
507 | scores_max=torch.max(scores)
508 | norm=plt.Normalize(-scores_max,scores_max)
509 |
510 | dpi=96
511 | fig, ax = plt.subplots(figsize=(img_width/dpi, 300/dpi), dpi=dpi)
512 | ax.get_xaxis().set_visible(False)
513 | ax.get_yaxis().set_visible(False)
514 | ax.spines['left'].set_visible(False)
515 | ax.spines['bottom'].set_visible(False)
516 | ax.spines['right'].set_visible(False)
517 | ax.spines['top'].set_visible(False)
518 | ax.set_xlim(-0.5,-0.5+maxwidth)
519 | ax.set_ylim(-0.5, 0.5+i//maxwidth)
520 | ax.imshow(im.transpose(),cmap=cmap,norm=norm)
521 | for i,c in enumerate(line):
522 | ax.text(i%maxwidth-0.25,im_height-i//maxwidth-0.25-1,c,fontsize=12)
523 | plt.tight_layout()
524 | plt.savefig(filename,bbox_inches='tight')
525 |
526 |
527 |
528 | model.eval()
529 | softmax = torch.nn.Softmax(dim=1)
530 | if args.infer:
531 | for line in sys.stdin:
532 | infer(line)
533 |
534 | if args.generate:
535 | import random
536 | line = ''
537 | for i in range(args.max_sample_length):
538 | line_tensor = str_to_tensor([line],args.input_length)
539 | if args.conditional_model:
540 | category_onehot = torch.zeros([line_tensor.shape[1], len(all_categories)]).to(device)
541 | for category in args.category:
542 | category_i = all_categories.index(category)
543 | category_onehot[0, category_i] = 1
544 | category_onehot = torch.unsqueeze(category_onehot,0)
545 | category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0)
546 | input_tensor = torch.cat([line_tensor,category_onehot],dim=2)
547 | else:
548 | input_tensor = line_tensor
549 | _,output_nextchars = model(input_tensor)
550 | # 3rd order tensor < len(line) x batch_size x len(vocabulary) >
551 | probs = softmax(args.temperature*output_nextchars[i,:,:])
552 | dist = torch.distributions.categorical.Categorical(probs)
553 | nextchar_i = dist.sample()
554 | nextchar = vocabulary[nextchar_i]
555 | if nextchar == EOL:
556 | break
557 | if nextchar == OOV:
558 | nextchar='~'
559 | line += nextchar
560 | if args.conditional_model:
561 | print('name=',line)
562 | else:
563 | infer(line)
564 |
565 |
--------------------------------------------------------------------------------
/project/names_transformers.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # process command line args
4 | import argparse
5 | parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
6 |
7 | parser_control = parser.add_argument_group('control options')
8 | parser_control.add_argument('--infer',action='store_true')
9 | parser_control.add_argument('--train',action='store_true')
10 | parser_control.add_argument('--generate',action='store_true')
11 |
12 | parser_data = parser.add_argument_group('data options')
13 | parser_data.add_argument('--data',default='names')
14 | parser_data.add_argument('--data_format',choices=['names','headlines'],default='names')
15 | parser_data.add_argument('--sample_strategy',choices=['uniform_line','uniform_category'],default='uniform_category')
16 | parser_data.add_argument('--case_insensitive',action='store_true')
17 | parser_data.add_argument('--dropout',type=float,default=0.0)
18 |
19 | parser_model = parser.add_argument_group('model options')
20 | parser_model.add_argument('--model',choices=['cnn','rnn','gru','lstm','bert'],default='rnn')
21 | parser_model.add_argument('--resnet',action='store_true')
22 | parser_model.add_argument('--hidden_layer_size',type=int,default=128)
23 | parser_model.add_argument('--num_layers',type=int,default=1)
24 | parser_model.add_argument('--conditional_model',action='store_true')
25 |
26 | parser_opt = parser.add_argument_group('optimization options')
27 | parser_opt.add_argument('--batch_size',type=int,default=1)
28 | parser_opt.add_argument('--learning_rate',type=float,default=1e-1)
29 | parser_opt.add_argument('--optimizer',choices=['sgd','adam'],default='sgd')
30 | parser_opt.add_argument('--gradient_clipping',action='store_true')
31 | parser_opt.add_argument('--momentum',type=float,default=0.9)
32 | parser_opt.add_argument('--weight_decay',type=float,default=1e-4)
33 | parser_opt.add_argument('--samples',type=int,default=10000)
34 | parser_opt.add_argument('--input_length',type=int)
35 | parser_opt.add_argument('--warm_start')
36 | parser_opt.add_argument('--disable_categories',action='store_true')
37 |
38 | parser_infer = parser.add_argument_group('inference options')
39 | parser_infer.add_argument('--infer_path',default='explain_outputs')
40 |
41 | parser_generate = parser.add_argument_group('generate options')
42 | parser_generate.add_argument('--temperature',type=float,default=1.0)
43 | parser_generate.add_argument('--max_sample_length',type=int,default=100)
44 | parser_generate.add_argument('--category',nargs='*')
45 |
46 | parser_debug = parser.add_argument_group('debug options')
47 | parser_debug.add_argument('--device',choices=['auto','cpu','gpu'],default='auto')
48 | parser_debug.add_argument('--print_delay',type=int,default=5)
49 | parser_debug.add_argument('--log_dir_base',type=str,default='log')
50 | parser_debug.add_argument('--log_dir',type=str)
51 | parser_debug.add_argument('--save_every',type=int,default=1000)
52 | parser_debug.add_argument('--print_every',type=int,default=100)
53 |
54 | args = parser.parse_args()
55 |
56 | if args.model=='cnn' and args.input_length is None:
57 | raise ValueError('if --model=cnn, then you must specify --input_length')
58 |
59 | # load args from file if warm starting
60 | if args.warm_start is not None:
61 | import sys
62 | import os
63 | args_orig = args
64 | args = parser.parse_args(['@'+os.path.join(args.warm_start,'args')]+sys.argv[1:])
65 | args.train = args_orig.train
66 |
67 | # suppress warnings
68 | import warnings
69 | warnings.simplefilter(action='ignore', category=FutureWarning)
70 |
71 | # load modules
72 | import datetime
73 | import glob
74 | import os
75 | import math
76 | import random
77 | import string
78 | import sys
79 | import time
80 | import unicodedata
81 | from unidecode import unidecode
82 |
83 | import torch
84 | import torch.nn as nn
85 | from torch.utils.tensorboard import SummaryWriter
86 | import transformers
87 |
88 | # set device to cpu/gpu
89 | if args.device=='gpu' or (torch.cuda.is_available() and args.device=='auto'):
90 | device = torch.device('cuda')
91 | torch.set_default_tensor_type('torch.cuda.FloatTensor')
92 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
93 | else:
94 | device = torch.device('cpu')
95 | print('device=',device)
96 |
97 | # import the training data
98 | BOL = '\x00'
99 | EOL = '\x01'
100 | OOV = '\x02'
101 | if args.case_insensitive:
102 | vocabulary = string.ascii_lowercase
103 | else:
104 | vocabulary = string.ascii_letters
105 | vocabulary += " .,;'" + '1234567890:-/#$%' + OOV + BOL + EOL
106 | print('len(vocabulary)=',len(vocabulary))
107 |
108 | def unicode_to_ascii(s):
109 | '''
110 | Removes diacritics from unicode characters.
111 | See: https://stackoverflow.com/a/518232/2809427
112 | '''
113 | return ''.join(
114 | c for c in unicodedata.normalize('NFD', s)
115 | if unicodedata.category(c) != 'Mn'
116 | and c in vocabulary
117 | )
118 |
119 | def format_line(line):
120 | line = unidecode(line)
121 | if args.case_insensitive:
122 | line = line.lower()
123 | return line
124 |
125 | # Build the category_lines dictionary, a list of names per language
126 | if args.data_format == 'names':
127 | category_lines = {}
128 | all_categories = []
129 | #for filename in glob.glob(os.path.join(args.data,'*.txt')):
130 | for filename in glob.glob(os.path.join(args.data,'*')):
131 | print('filename=',filename)
132 | category = os.path.splitext(os.path.basename(filename))[0]
133 | all_categories.append(category)
134 | lines = open(filename, encoding='utf-8').read().strip().split('\n')
135 | lines = [format_line(line) for line in lines]
136 | category_lines[category] = lines
137 |
138 | elif args.data_format == 'headlines':
139 | import gzip
140 | import json
141 | from collections import defaultdict,Counter
142 |
143 | # load data points
144 | category_lines = defaultdict(lambda: [])
145 | lines_category = []
146 | categories_counter = Counter()
147 | with gzip.open(args.data,'rt') as f:
148 | for line in f:
149 | article = json.loads(line)
150 | hostname = article['hostname']
151 | categories_counter[hostname] += 1
152 | #day = article['day'].split()[0]
153 | title = article['title']
154 | title = format_line(title)
155 | category_lines[hostname].append(title)
156 | lines_category.append((title, hostname))
157 | all_categories = [ hostname for hostname,count in categories_counter.most_common() ]
158 | all_categories = list(all_categories)
159 |
160 | #print('len(lines_category)=',len(lines_category))
161 | print('len(all_categories)=',len(all_categories))
162 |
163 | def str_to_tensor(ss,input_length=None):
164 | '''
165 | Converts a list of strings into a tensor of shape < max_length x len(ss) x len(vocabulary) >.
166 | This is used to convert text into a form suitable for input into a RNN/CNN.
167 | '''
168 | max_length = max([len(s) for s in ss]) + 2
169 | if input_length:
170 | max_length = input_length
171 | tensor = torch.zeros(max_length, len(ss), len(vocabulary)).to(device)
172 | for j,s in enumerate(ss):
173 | s = BOL + s + EOL
174 | for i, letter in enumerate(s):
175 | if i
225 | out,h_n = self.rnn(x)
226 | out = self.dropout(out)
227 | out_class = self.fc_class(out[out.shape[0]-1,:,:])
228 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device)
229 | for i in range(out.shape[0]):
230 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:])
231 | return out_class, out_nextchars
232 |
233 | class ResnetRNNModel(nn.Module):
234 | def __init__(self):
235 | super().__init__()
236 | if args.model=='rnn':
237 | mk_rnn = nn.RNN
238 | if args.model=='gru':
239 | mk_rnn = nn.GRU
240 | if args.model=='lstm':
241 | mk_rnn = nn.LSTM
242 | rnn_input_size = input_size
243 | self.rnns = []
244 | for layer in range(args.num_layers):
245 | rnn = mk_rnn(
246 | rnn_input_size,
247 | args.hidden_layer_size,
248 | num_layers=1,
249 | )
250 | self.add_module('rnn'+str(layer),rnn)
251 | self.rnns.append(rnn)
252 | rnn_input_size = args.hidden_layer_size
253 | self.fc_class = nn.Linear(args.hidden_layer_size,len(all_categories))
254 | self.dropout = nn.Dropout(args.dropout)
255 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,len(vocabulary))
256 |
257 | def forward(self, x):
258 | # out is 3rd order: < len(line) x batch size x hidden_layer_size >
259 | out = x
260 | for layer,rnn in enumerate(self.rnns):
261 | out_prev = out
262 | out,_ = rnn(out)
263 | if layer>0 and args.resnet:
264 | out = out + out_prev
265 | out = self.dropout(out)
266 | out_class = self.fc_class(out[out.shape[0]-1,:,:])
267 | out_nextchars = torch.zeros(out.shape[0] , out.shape[1], len(vocabulary) ).to(device)
268 | for i in range(out.shape[0]):
269 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:,:])
270 | return out_class, out_nextchars
271 |
272 |
273 | class CNNModel(nn.Module):
274 | def __init__(self):
275 | super(CNNModel,self).__init__()
276 | self.relu = nn.ReLU()
277 | self.cnn = \
278 | nn.Conv1d(input_size,args.hidden_layer_size,3,padding=1)
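279 | self.cnns = nn.ModuleList([  # ModuleList registers each layer so it is trained and saved; each layer gets its own weights
280 | nn.Conv1d(args.hidden_layer_size,args.hidden_layer_size,3,padding=1)
281 | for _ in range(args.num_layers-1) ])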
282 | self.dropout = nn.Dropout(args.dropout)
283 | self.fc_class = nn.Linear(args.hidden_layer_size*args.input_length,len(all_categories))
284 | self.fc_nextchars = nn.Linear(args.hidden_layer_size,input_size)
285 |
286 | def forward(self,x):
287 | out = torch.einsum('lbv->bvl',x)
288 | out = self.cnn(out)
289 | out = self.relu(out)
290 | for cnn in self.cnns:
291 | out = cnn(out)
292 | out = self.relu(out)
293 | out = self.dropout(out)
294 | out_class = out.view(out.shape[0],args.hidden_layer_size*args.input_length) # use the actual batch size so inference on a single line also works
295 | out_class = self.fc_class(out_class)
296 | out = torch.einsum('ijk->kij',out)
297 | out_nextchars = torch.zeros([out.shape[0],out.shape[1],input_size]).to(device)
298 | for i in range(out.shape[0]):
299 | out_nextchars[i,:,:] = self.fc_nextchars(out[i,:])
300 | return out_class, out_nextchars
301 |
302 |
303 | model_name = 'bert-base-multilingual-uncased'
304 | tokenizer = transformers.BertTokenizer.from_pretrained(model_name)
305 | bert = transformers.BertModel.from_pretrained(model_name)
306 | print('bert.config.vocab_size=',bert.config.vocab_size)
307 | class BertFineTuning(nn.Module):
308 | def __init__(self):
309 | super().__init__()
310 | embedding_size = args.hidden_layer_size
311 | self.fc_class = nn.Linear(768,len(all_categories))
312 |
313 | def forward(self,x):
314 | input_ids, attention_mask = x
315 | last_layer,embedding = bert(input_ids)
316 | embedding = torch.mean(last_layer,dim=1)
317 | out = self.fc_class(embedding)
318 | return out, None
319 |
320 | # load the model
321 | if args.model=='bert':
322 | model = BertFineTuning()
323 | elif args.model=='cnn':
324 | model = CNNModel()
325 | else:
326 | if args.resnet:
327 | model = ResnetRNNModel()
328 | else:
329 | model = RNNModel()
330 | model.to(device)
331 |
332 | if args.warm_start:
333 | print('warm starting model from',args.warm_start)
334 | model_dict = torch.load(os.path.join(args.warm_start,'model'))
335 | model.load_state_dict(model_dict['model_state_dict'])
336 |
337 | # training
338 | if args.train:
339 |
340 | # create log_dir
341 | log_dir = args.log_dir
342 | if log_dir is None:
343 | log_dir = os.path.join(args.log_dir_base,(
344 | 'model='+args.model+
345 | '_hidden='+str(args.hidden_layer_size)+
346 | '_layers='+str(args.num_layers)+
347 | '_cond='+str(args.conditional_model)+
348 | '_resnet='+str(args.resnet)+
349 | '_lr='+str(args.learning_rate)+
350 | '_optim='+args.optimizer+
351 | '_clip='+str(args.gradient_clipping)+
352 | '_'+str(datetime.datetime.now())
353 | ))
354 | try:
355 | os.makedirs(log_dir)
356 | with open(os.path.join(log_dir,'args'), 'w') as f:
357 | f.write('\n'.join(sys.argv[1:]))
358 | except FileExistsError:
359 | print('cannot create log dir,',log_dir,'already exists')
360 | sys.exit(1)
361 | writer = SummaryWriter(log_dir=log_dir)
362 |
363 | # prepare model for training
364 | criterion = nn.CrossEntropyLoss()
365 | print('model.parameters()=',list(model.parameters()))
366 | if args.optimizer == 'sgd':
367 | optimizer = torch.optim.SGD(
368 | model.parameters(),
369 | lr=args.learning_rate,
370 | momentum=args.momentum,
371 | weight_decay=args.weight_decay
372 | )
373 | if args.optimizer == 'adam':
374 | optimizer = torch.optim.Adam(
375 | model.parameters(),
376 | lr=args.learning_rate,
377 | weight_decay=args.weight_decay
378 | )
379 | model.train()
380 |
381 | # training loop
382 | start_time = time.time()
383 | for step in range(1, args.samples + 1):
384 |
385 | # get random training example
386 | categories = []
387 | lines = []
388 | for i in range(args.batch_size):
389 | if args.sample_strategy == 'uniform_category':
390 | category = random.choice(all_categories)
391 | line = random.choice(category_lines[category])
392 | elif args.sample_strategy == 'uniform_line':
393 | line, category = random.choice(lines_category)
394 |
395 | categories.append(all_categories.index(category))
396 | lines.append(line)
397 | category_tensor = torch.tensor(categories, dtype=torch.long).to(device)
398 |
399 | if args.model=='bert':
400 | input_tensor = str_to_tensor_bert(lines)
401 | else:
402 | line_tensor = str_to_tensor(lines,args.input_length)
403 |
404 | if args.conditional_model:
405 | category_onehot = torch.nn.functional.one_hot(category_tensor, len(all_categories)).float()
406 | category_onehot = torch.unsqueeze(category_onehot,0)
407 | category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0)
408 | input_tensor = torch.cat([line_tensor,category_onehot],dim=2)
409 | else:
410 | input_tensor = line_tensor
411 |
412 | input_tensor = input_tensor.to(device)
413 | category_tensor = category_tensor.to(device)
414 | optimizer.zero_grad() # clear gradients left over from the previous step
415 | # perform training step
416 | output_class,output_nextchars = model(input_tensor)
417 | loss_class = criterion(output_class, category_tensor)
418 | if args.model=='bert':
419 | loss_nextchars = torch.tensor(0.0)
420 | else:
421 | loss_nextchars_perchar = torch.zeros(output_nextchars.shape[0]).to(device)
422 | for i in range(output_nextchars.shape[0]-1):
423 | _, nextchar_i = line_tensor[i+1,:].topk(1)
424 | nextchar_i = nextchar_i.view([-1])
425 | loss_nextchars_perchar[i] = criterion(output_nextchars[i,:], nextchar_i)
426 | loss_nextchars = torch.mean(loss_nextchars_perchar)
427 |
428 | if args.conditional_model or args.disable_categories:
429 | loss = loss_nextchars
430 | else:
431 | loss = loss_class + loss_nextchars
432 | loss.backward()
433 | grad_norm = sum([ torch.norm(p.grad)**2 for p in model.parameters() if p.grad is not None])**(1/2)
434 | if args.gradient_clipping:
435 | torch.nn.utils.clip_grad_norm_(model.parameters(),1.0)
436 | optimizer.step()
437 |
438 | # log optimization information
439 | writer.add_scalar('train/loss_class', loss_class.item(), step)
440 | writer.add_scalar('train/loss_nextchars', loss_nextchars.item(), step)
441 | writer.add_scalar('train/loss', loss.item(), step)
442 | writer.add_scalar('train/grad_norm', grad_norm.item(), step)
443 |
444 | # get accuracy@k
445 | ks = [1, 5, 10, 20]
446 | k = max(ks)
447 | top_n, top_i = output_class.topk(k)
448 | category_tensor_k = torch.cat(k*[torch.unsqueeze(category_tensor,dim=1)],dim=1)
449 | accuracies = torch.where(
450 | top_i[:,:]==category_tensor_k,
451 | torch.ones([args.batch_size,k]).to(device),
452 | torch.zeros([args.batch_size,k]).to(device)
453 | )
454 | for k in ks:
455 | accuracies_k,_ = torch.max(accuracies[:,:k], dim=1)
456 | accuracy_k = torch.mean(accuracies_k).item()
457 | writer.add_scalar('accuracy/@'+str(k), accuracy_k, step)
458 |
459 | # print status update
460 | if step % args.print_every == 0:
461 |
462 | # get category from output
463 | top_n, top_i = output_class.topk(1)
464 | guess_i = top_i[-1].item()
465 | category_i = category_tensor[-1]
466 | guess = all_categories[guess_i]
467 | category = all_categories[category_i]
468 |
469 | # print results
470 | correct = '✓' if guess == category else '✗ (%s)' % category
471 | print('%d %d%% (%.2f sec) %.4f %s / %s %s' % (
472 | step,
473 | step / args.samples * 100,
474 | time.time()-start_time,
475 | loss,
476 | line,
477 | guess,
478 | correct
479 | ))
480 |
481 | # save model
482 | if step%args.save_every == 0 or step==args.samples:
483 | print('saving model checkpoint')
484 | torch.save({
485 | 'step':step,
486 | 'model_state_dict': model.state_dict(),
487 | 'optimizer_state_dict': optimizer.state_dict(),
488 | 'loss':loss.item()
489 | }, os.path.join(log_dir,'model'))
490 |
491 | # infer
492 | def infer(line):
493 | line = line.strip()
494 | if args.case_insensitive:
495 | line = line.lower()
496 | line_tensor = str_to_tensor([line],args.input_length)
497 | output_class,output_nextchars = model(line_tensor)
498 | probs = softmax(output_class)
499 | k=20
500 | top_n, top_i = probs.topk(k)
501 | print('line=',line)
502 | for i in range(k):
503 | guess = all_categories[top_i[0,i].item()]
504 | print(' ',i,guess, '(%0.2f)'%top_n[0,i].item())
505 | if args.infer_path is not None:
506 | i = 0
507 | while os.path.exists(os.path.join(args.infer_path,"line%s.char.png" % str(i).zfill(4))):
508 | i += 1
509 | path_base = os.path.join(args.infer_path,'line'+str(i).zfill(4))
510 | print('path_base=',path_base)
511 | explain(line, path_base+'.char.png', 'char')
512 | explain(line, path_base+'.word.png', 'word')
513 |
514 | def explain(line,filename,explain_type):
515 | scores = torch.zeros([len(line)])
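516 | # placeholder scores (5,4,3,2,1 on the leading characters);
517 | # sized to the line so that inputs shorter than 5 characters do not raise an IndexError
518 | n = min(5, len(line))
519 | scores[:n] = torch.arange(5, 5-n, -1)
520 | # explain_type is currently unused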
521 | line2img(line,scores,filename)
522 |
523 |
524 | def line2img(
525 | line,
526 | scores,
527 | filename,
528 | maxwidth=40,
529 | img_width=800
530 | ):
531 | '''
532 | Outputs an image containing text with green/red background highlights to indicate the importance of words in the text.
533 |
534 | Arguments:
535 | line (str): the text that should be printed
536 | scores (Tensor): a vector of size len(line), where each index contains the "weight" of the corresponding letter in the line string; positive values will be colored green, and negative values red.
537 | filename (str): the name of the output file
538 | '''
539 | import matplotlib
540 | import matplotlib.colors as colors
541 | matplotlib.use('Agg')
542 | import matplotlib.pyplot as plt
543 | import numpy as np
544 | import math
545 |
546 | im_height=1+len(line)//maxwidth
547 | im=np.zeros([maxwidth,im_height])
548 | for i in range(scores.shape[0]):
549 | im[i%maxwidth,im_height-i//maxwidth-1] = scores[i]
550 |
551 | cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red","white","green"])
552 | scores_max=torch.max(torch.abs(scores)).item() # symmetric color range, valid even when every score is negative
553 | norm=plt.Normalize(-scores_max,scores_max)
554 |
555 | dpi=96
556 | fig, ax = plt.subplots(figsize=(img_width/dpi, 300/dpi), dpi=dpi)
557 | ax.get_xaxis().set_visible(False)
558 | ax.get_yaxis().set_visible(False)
559 | ax.spines['left'].set_visible(False)
560 | ax.spines['bottom'].set_visible(False)
561 | ax.spines['right'].set_visible(False)
562 | ax.spines['top'].set_visible(False)
563 | ax.set_xlim(-0.5,-0.5+maxwidth)
564 | ax.set_ylim(-0.5, 0.5+i//maxwidth)
565 | ax.imshow(im.transpose(),cmap=cmap,norm=norm)
566 | for i,c in enumerate(line):
567 | ax.text(i%maxwidth-0.25,im_height-i//maxwidth-0.25-1,c,fontsize=12)
568 | plt.tight_layout()
569 | plt.savefig(filename,bbox_inches='tight')
570 |
571 |
572 |
573 | model.eval()
574 | softmax = torch.nn.Softmax(dim=1)
575 | if args.infer:
576 | for line in sys.stdin:
577 | infer(line)
578 |
579 | if args.generate:
580 | import random
581 | line = ''
582 | for i in range(args.max_sample_length):
583 | line_tensor = str_to_tensor([line],args.input_length)
584 | if args.conditional_model:
585 | category_onehot = torch.zeros([line_tensor.shape[1], len(all_categories)]).to(device)
586 | for category in args.category:
587 | category_i = all_categories.index(category)
588 | category_onehot[0, category_i] = 1
589 | category_onehot = torch.unsqueeze(category_onehot,0)
590 | category_onehot = torch.cat(line_tensor.shape[0]*[category_onehot],dim=0)
591 | input_tensor = torch.cat([line_tensor,category_onehot],dim=2)
592 | else:
593 | input_tensor = line_tensor
594 | _,output_nextchars = model(input_tensor)
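595 | # 3rd order tensor < len(line) x batch_size x len(vocabulary) >; args.temperature multiplies the logits, so it acts as an inverse temperature (larger values give less random samples)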
596 | probs = softmax(args.temperature*output_nextchars[i,:,:])
597 | dist = torch.distributions.categorical.Categorical(probs)
598 | nextchar_i = dist.sample()
599 | nextchar = vocabulary[nextchar_i]
600 | if nextchar == EOL:
601 | break
602 | if nextchar == OOV:
603 | nextchar='~'
604 | line += nextchar
605 | if args.conditional_model:
606 | print('name=',line)
607 | else:
608 | infer(line)
609 |
610 |
611 |
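The generation loop above scales the next-character logits by `args.temperature` before the softmax and then samples from the resulting distribution. A minimal sketch of that effect in plain Python (no PyTorch, so it runs anywhere) is given below. The names `softmax`, `sample_index`, and `inverse_temperature` are hypothetical helpers chosen for illustration, and the logits are made-up toy values rather than model output.

```python
import math
import random

def softmax(logits, inverse_temperature=1.0):
    """Softmax of logits multiplied by inverse_temperature (hypothetical helper)."""
    scaled = [inverse_temperature * x for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Draw one index according to the categorical distribution probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]                     # toy next-character scores (assumed values)
flat = softmax(logits, 0.1)                  # small multiplier: close to uniform
sharp = softmax(logits, 10.0)                # large multiplier: close to argmax

rng = random.Random(0)                       # seeded only to make runs repeatable
idx = sample_index(sharp, rng)
print('flat =', [round(p, 3) for p in flat])
print('sharp=', [round(p, 3) for p in sharp])
print('sampled index =', idx)
```

With a multiplier well below 1 the three characters are almost equally likely, while a multiplier well above 1 concentrates nearly all probability on the highest-scoring character.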
--------------------------------------------------------------------------------
/project/transformers_tutorial.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python3
2 |
3 | # these lines prevent lots of warnings from being displayed
4 | import warnings
5 | warnings.simplefilter(action='ignore', category=FutureWarning)
6 | import os
7 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
8 |
9 | # import the deep learning libraries
10 | import torch
11 | import transformers
12 |
13 | # load the model
14 | #checkpoint_name = 'bert-base-uncased'
15 | checkpoint_name = 'bert-base-multilingual-uncased'
16 | tokenizer = transformers.BertTokenizer.from_pretrained(checkpoint_name)
17 | bert = transformers.BertModel.from_pretrained(checkpoint_name)
18 |
19 | # sample data
20 | lines = [
21 | 'The coronavirus pandemic has taken over the world.', # English
22 | 'La pandemia de coronavirus se ha apoderado del mundo.', # Spanish
23 | 'La pandemia di coronavirus ha conquistato il mondo.', # Italian
24 | 'Capta est coronavirus pandemic in orbe terrarum.', # Latin
25 | 'đại dịch coronavirus đã chiếm lĩnh thế giới.', # Vietnamese
26 | 'пандемия коронавируса захватила мир.', # Russian
27 | 'سيطر وباء الفيروس التاجي على العالم.', # Arabic
28 | 'מגיפת הנגיף השתלט על העולם.', # Hebrew
29 | '코로나 바이러스 전염병이 세계를 점령했습니다.', # Korean
30 | '冠狀病毒大流行已席捲全球。', # Chinese (traditional)
31 | '冠状病毒大流行已经席卷全球。', # Chinese (simplified)
32 | 'コロナウイルスのパンデミックが世界を席巻しました。', # Japanese
33 | ]
34 |
35 | for line in lines:
36 | tokens = tokenizer.tokenize(line)
37 | print("tokens=",tokens)
38 | raise SystemExit # deliberate early stop
39 |
40 | # converts each line into a padded tensor of token ids
41 | max_length = 64
42 | encodings = []
43 | #lines = lines[0:1]
44 | for line in lines:
45 | encoding = tokenizer.encode_plus(
46 | line,
47 | #add_special_tokens = True,
48 | max_length = max_length,
49 | pad_to_max_length = True,
50 | #return_attention_mask = True,
51 | return_tensors = 'pt',
52 | )
53 | #print("encoding.keys()=",encoding.keys())
54 | #print("encoding['input_ids'].shape=",encoding['input_ids'].shape)
55 | #print("encoding['input_ids']=",encoding['input_ids'])
56 | encodings.append(encoding)
57 |
58 | input_ids = torch.cat([encoding['input_ids'] for encoding in encodings ],dim=0)
59 | #attention_mask = torch.cat([ encoding['attention_mask'] for encoding in encodings ],dim=0)
60 |
61 | import datetime
62 | for i in range(10):
63 | print(datetime.datetime.now())
64 | last_layer,embedding = bert(input_ids) #, attention_mask)
65 | print("last_layer.shape=",last_layer.shape)
66 | print("embedding.shape=",embedding.shape)
67 | raise SystemExit # deliberate early stop
68 |
69 |
70 | class BertFineTuning(torch.nn.Module):
71 | def __init__(self):
72 | super().__init__()
73 | #self.bert = transformers.BertModel.from_pretrained(checkpoint_name)
74 | self.fc = torch.nn.Linear(768,num_classes) # NOTE: num_classes is not defined in this file
75 |
76 | def forward(self,x):
77 | #last_layer,embedding = self.bert(x)
78 | last_layer,embedding = bert(x)
79 | embedding = torch.mean(last_layer,dim=1)
80 | out = self.fc(embedding)
81 | return out
82 |
--------------------------------------------------------------------------------
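The classifier head above pools BERT's last layer with `torch.mean(last_layer,dim=1)`. When lines are padded to `max_length`, that average also includes the padding positions. A minimal sketch of a mask-aware mean is shown below, assuming an attention mask that holds 1 for real tokens and 0 for padding. `masked_mean` is a hypothetical helper name, and the small tensor stands in for real BERT output so the example runs without downloading a checkpoint.

```python
import torch

def masked_mean(last_layer, attention_mask):
    """Average token vectors over real tokens only (hypothetical helper).

    last_layer:     < batch_size x num_tokens x hidden_size >
    attention_mask: < batch_size x num_tokens >, 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_layer.dtype)  # < batch x tokens x 1 >
    summed = (last_layer * mask).sum(dim=1)                   # padding contributes zero
    counts = mask.sum(dim=1).clamp(min=1.0)                   # avoid division by zero
    return summed / counts

# toy stand-in for BERT output: one line, three positions, the last one is padding
last_layer = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])

pooled = masked_mean(last_layer, attention_mask)
print(pooled)  # tensor([[2., 3.]])
```

A plain `torch.mean(last_layer, dim=1)` on the same toy input would be dominated by the padding vector, which is the effect the mask removes.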