├── .DS_Store
├── .gitignore
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── code
├── .DS_Store
├── .ipynb_checkpoints
│ ├── a_dogscats_setup_data-checkpoint.ipynb
│ └── a_setup_data-checkpoint.ipynb
├── README.md
├── a_dogscats_setup_data.ipynb
├── split_data.sh
├── subset_data.md
├── tree.md
└── untar_weights_file.md
├── courses
├── .DS_Store
├── README.md
├── cla
│ └── README.md
├── mes_projets
│ ├── 0_login.md
│ ├── 1_scp_data_aws.md
│ ├── 2_split_data.md
│ ├── README.md
│ ├── a_dogscats_setup_data.ipynb
│ ├── comments.md
│ ├── get_data.md
│ ├── test.md
│ └── to_do.md
├── ml1
│ ├── lesson_01.md
│ ├── lesson_02.md
│ ├── lesson_03.md
│ ├── lesson_04.md
│ └── lesson_05.md
├── nlp
│ ├── README.md
│ ├── videos_01_to_05.md
│ └── videos_06_to_10.md
├── udacity_pytorch
│ ├── README.md
│ ├── images
│ │ ├── .keep
│ │ └── cnn_formulas.png
│ ├── notes.md
│ ├── orientation.md
│ ├── pytorch_1_nanodegree.md
│ ├── pytorch_2_.md
│ ├── pytorch_3.md
│ ├── pytorch_4_cnn.md
│ ├── pytorch_5_style_transfer.md
│ ├── pytorch_6_rnn.md
│ └── softmax.ipynb
├── v2-dl1
│ ├── .DS_Store
│ ├── README.md
│ ├── lesson_1a_course_intro.md
│ ├── lesson_1b_cnn_tools.md
│ ├── lesson_2_resnet34_resnext50.md
│ ├── lesson_3_x.md
│ ├── lesson_4_x.md
│ ├── lesson_5_x.md
│ ├── lesson_6_x.md
│ └── lesson_7_x.md
├── v2-dl2
│ ├── README.md
│ ├── lesson_08.md
│ ├── lesson_09.md
│ ├── lesson_10_1.md
│ ├── lesson_10_2.md
│ ├── lesson_11_1.md
│ └── lesson_11_2.md
├── v3-dl1
│ ├── .DS_Store
│ ├── .keep
│ ├── README.md
│ ├── gcp_0_setup_notes.md
│ ├── gcp_1_logging_in.md
│ ├── images
│ │ ├── .keep
│ │ ├── camel.jpeg
│ │ ├── camels_class.png
│ │ ├── camels_confusion.png
│ │ ├── elephant1.png
│ │ ├── elephant_cm.png
│ │ ├── elephant_predict.png
│ │ ├── gcp1.png
│ │ ├── horse.jpeg
│ │ ├── horses_txt.png
│ │ ├── nyc_group.jpeg
│ │ ├── rs_camel.jpg
│ │ ├── soumith.jpg
│ │ └── south_africa.png
│ ├── kaggle_fruits.md
│ ├── lesson_1_lecture.md
│ ├── lesson_1_rs_camels_horses.md
│ ├── lesson_2_1_lecture.md
│ ├── lesson_2_2_lecture.md
│ └── lesson_3_lecture.md
├── v3-dl2
│ ├── README.md
│ └── lecture_8.md
└── v4-dl1
│ ├── README.md
│ ├── doc_Jupyter_01.md
│ ├── doc_Jupyter_02_reference.md
│ ├── image
│ ├── .keep
│ └── transforms.png
│ ├── lesson_01.md
│ ├── lesson_02.md
│ ├── lesson_03.md
│ ├── lesson_04.md
│ ├── lesson_05_ethics.md
│ ├── lesson_06.md
│ ├── lesson_07.md
│ ├── lesson_08_NLP.md
│ └── paperspace.md
├── fastai_dl_course_v1.md
├── fastai_dl_course_v2.md
├── fastai_dl_course_v3.md
├── fastai_dl_terms.md
├── fastai_ml_course.md
├── googlefc30e18b4a9edaa2.html
├── helpful_linux_commands.md
├── images
├── chrome_curlwget.png
├── dl_libraries.png
├── image_downloader.png
├── lesson_08
│ ├── lesson08_lr_find.png
│ ├── lesson8_bbox.png
│ ├── lesson8_dl_box.png
│ ├── lesson8_embeddings.png
│ ├── lesson8_learning.png
│ ├── lesson8_learning2.png
│ ├── lesson8_lr_find2.png
│ ├── lesson8_matplotlib.png
│ ├── lesson8_md.png
│ ├── lesson8_motivation.png
│ ├── lesson8_nb_pascal.png
│ ├── lesson8_obj_det.png
│ ├── lesson8_opps.png
│ ├── lesson8_paper.png
│ ├── lesson8_part1_2.png
│ ├── lesson8_part2.png
│ ├── lesson8_stage1.png
│ ├── lesson8_step1.png
│ ├── lesson8_transfer_learning.png
│ ├── lesson8_visualize.png
│ └── lesson8_x.png
├── lesson_09
│ ├── .keep
│ ├── lesson9_archit.png
│ ├── lesson9_bbox.png
│ ├── lesson9_data_loader.png
│ ├── lesson9_know_these1.png
│ └── lesson9_know_these2.png
├── lesson_11
│ ├── .keep
│ ├── lesson_11_charloop.png
│ ├── lesson_11_nt.png
│ ├── lesson_11_rnn.png
│ ├── lesson_11_rnn2.png
│ ├── lesson_11_rnn_stacked.png
│ └── lesson_11_rnn_stacked2.png
├── ncm_gephi.jpg
├── paperspace.png
├── paperspace_fastai.png
├── paperspace_jupyter.png
├── pretrained_networks.png
├── softmax.png
├── tmux_start.png
├── tmux_summary.png
└── triple_backticks.png
├── notes
├── competitions.md
├── deep_learning_libraries.md
├── imagenet.md
├── loss_functions.md
├── nlp_data.md
└── nlp_terms.md
├── resources.md
├── takeaways.md
├── tips_faq_beginners.md
├── tips_prereqs.md
├── tips_troubleshooting.md
└── tools
├── README.md
├── aws_ami_gpu_setup.md
├── check_links.py
├── copy_files_local_to_cloud.md
├── create_keypair.md
├── crestle_run.md
├── download_data_browser_curlwget.md
├── download_data_curl.md
├── download_data_kaggle_cli.md
├── getting_image_data.md
├── jupyter_notebook.md
├── paperspace.md
├── setup_personal_dl_box.md
├── symlinks.md
├── temp
├── .keep
└── index.html
├── tmux.md
└── unix_linux.md
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/.DS_Store
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | tmp*
2 | tags
3 | data
4 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
6 |
7 | ## Our Standards
8 |
9 | Examples of behavior that contributes to creating a positive environment include:
10 |
11 | * Using welcoming and inclusive language
12 | * Being respectful of differing viewpoints and experiences
13 | * Gracefully accepting constructive criticism
14 | * Focusing on what is best for the community
15 | * Showing empathy towards other community members
16 |
17 | Examples of unacceptable behavior by participants include:
18 |
19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances
20 | * Trolling, insulting/derogatory comments, and personal or political attacks
21 | * Public or private harassment
22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission
23 | * Other conduct which could reasonably be considered inappropriate in a professional setting
24 |
25 | ## Our Responsibilities
26 |
27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
28 |
29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
30 |
31 | ## Scope
32 |
33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
34 |
35 | ## Enforcement
36 |
37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
38 |
39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
40 |
41 | ## Attribution
42 |
43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
44 |
45 | [homepage]: http://contributor-covenant.org
46 | [version]: http://contributor-covenant.org/version/1/4/
47 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Reshama Shaikh
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # [fast.ai](http://www.fast.ai)
2 | - latest course (v3): https://course.fast.ai
3 | - fastai on **GitHub:** [fastai](https://github.com/fastai/fastai)
4 | - Data
5 | - [Torrents](http://academictorrents.com/browse.php?search=fastai&page=0)
6 | - [some fastai files](http://files.fast.ai) (files, models, data)
7 |
8 | ## About Me
9 | * [My Blog](https://reshamas.github.io) (Reshama Shaikh)
10 | * Twitter: [@reshamas](https://twitter.com/reshamas)
11 |
12 |
13 | ## Projects
14 | - [Deploying Deep Learning Models On Web And Mobile](https://reshamas.github.io/deploying-deep-learning-models-on-web-and-mobile/) with [Nidhin Pattaniyil](https://www.linkedin.com/in/nidhinpattaniyil/)
15 |
16 | ---
17 |
18 | ## Courses (my outlines)
19 |
20 | ### Deep Learning
21 | - [Version 4](fastai_dl_course_v4.md) (Spring 2020)
22 | - [Version 3](fastai_dl_course_v3.md) (Fall 2018 to Spring 2019)
23 | - [Version 2](fastai_dl_course_v2.md) (Fall 2017 to Spring 2018)
24 | - [Version 1](fastai_dl_course_v1.md) (Fall 2016 to Spring 2017)
25 |
26 | ### Machine Learning
27 | - [Fall 2017](fastai_ml_course.md)
28 |
29 | ---
30 |
31 | ## Helpful Resources
32 | * [Directory of fastai and DL terms](fastai_dl_terms.md)
33 | * [Solving the Most Common Errors](tips_troubleshooting.md)
34 | * [Fastai FAQs for Beginners](tips_faq_beginners.md)
35 | * [30+ Best Practices](http://forums.fast.ai/t/30-best-practices/12344)
36 | * [Resources](resources.md) (Blogs Written by fastai Fellows / Research Papers, etc)
37 | * [Fastai Blog Posts](http://www.fast.ai/topics/) (by Rachel Thomas & Jeremy Howard)
38 | - podcast with [Jeremy Howard on fastai_v1](https://twimlai.com/twiml-talk-186-the-fastai-v1-deep-learning-framework-with-jeremy-howard/) :red_circle:
39 | - podcast with [Rachel Thomas](https://twimlai.com/twiml-talk-138-practical-deep-learning-with-rachel-thomas/)
40 | - [Jeremy's PyTorch Tutorial](https://github.com/fastai/fastai_old/blob/master/dev_nb/001a_nn_basics.ipynb)
41 |
42 | ## [Technical Tools](tools/)
43 | * [tmux on AWS](tools/tmux.md)
44 | * [Download data using Kaggle CLI](tools/download_data_kaggle_cli.md)
45 | * [Download data using Chrome and wget](tools/download_data_browser_curlwget.md)
46 | * [Jupyter Notebook Commands & Shortcuts](tools/jupyter_notebook.md)
47 | * [How to Create a Keypair](tools/create_keypair.md)
48 | * [Copy Files from Local PC to Cloud PC](tools/copy_files_local_to_cloud.md)
49 |
50 |
51 | ## Other Resources
52 | - [Publish notebooks as Github gists with a single button click!](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/gist_it/readme.html)
53 | - [Tips for building large image datasets](https://forums.fast.ai/t/tips-for-building-large-image-datasets/26688)
54 |
55 |
56 |
57 |
58 |
59 |
--------------------------------------------------------------------------------
/code/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/code/.DS_Store
--------------------------------------------------------------------------------
/code/README.md:
--------------------------------------------------------------------------------
1 | # Helpful Code
2 |
3 | ### Utilities
4 | - https://github.com/prairie-guy/ai_utilities
5 |
6 | A set of scripts useful for deep learning and AI work, originally written for use with `fast.ai` lectures and libraries.
7 | - Set up `train` and `valid` directories for use in deep learning models.
8 | - Download any number of images from Google image search.
9 | - Use `file` to determine the type of each picture, then keep only pictures of a specified type.
10 |
11 |
12 | ### get size (shape) of image file
13 | ```python
14 | img.shape
15 | ```
16 | ```bash
17 | (198, 179, 3)
18 | ```
19 |
20 | ### Validation Set Creator
21 | * https://github.com/Renga411/dl1.fastai/blob/master/Validation-set-creator.ipynb
22 |
--------------------------------------------------------------------------------
/code/split_data.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | # Split the camels/horses dataset into train/valid folders,
3 | # and also build a smaller sample split.
4 | # (check image counts with: ls camels | wc -l)
5 | # 0. shuffle the data
6 | # 1. split the full data into train/valid
7 | # 2. take a subset and split that into train/valid
8 |
9 | sample_t=${1:-50}
10 | sample_v=${2:-20}
11 | data_f="camelshorses"
12 | sample_f="camelshorses_sample"
13 | category_1="camels"
14 | category_2="horses"
15 |
16 | n_t=65
17 | n_v=35
18 |
19 | ns_t=40
20 | ns_v=20
21 | outfile="shuffled.txt"
22 |
23 | # create the folders
24 | mkdir -p data/$data_f/{train,valid}/{$category_1,$category_2}
25 | mkdir -p data/$sample_f/{train,valid}/{$category_1,$category_2}
26 |
27 | # shuffle the data for LABEL #1,
28 | # then copy into train and valid folders
29 | # (shuffle the unsplit source folder: train/ is still empty at this point)
30 | start_dir="data/$data_f/$category_1/*"
31 | shuf -e $start_dir > /tmp/shuffled.txt
32 |
33 | echo "print contents of shuffled.txt"
34 | echo " "
35 | cat /tmp/shuffled.txt
36 |
43 |
44 | echo "copying images to train/valid for full dataset"
45 |
46 | head -n $n_t /tmp/shuffled.txt | xargs -i cp {} data/$data_f/train/$category_1
47 | tail -n $n_v /tmp/shuffled.txt | xargs -i cp {} data/$data_f/valid/$category_1
48 |
49 | echo "copying images to train/valid for subset of data"
50 | head -n $ns_t /tmp/shuffled.txt | xargs -i cp {} data/$sample_f/train/$category_1
51 | tail -n $ns_v /tmp/shuffled.txt | xargs -i cp {} data/$sample_f/valid/$category_1
52 |
53 |
54 | # shuffle the data for LABEL #2
55 | # copy into train and valid folders
56 | #shuf -e data/$data_f/train/$category_2/* > /tmp/shuffled2.txt
57 |
58 | #head /tmp/shuffled2.txt -n $n_t | xargs -i cp {} data/$data_f/train/$category_2
59 | #tail /tmp/shuffled2.txt -n $n_v | xargs -i cp {} data/$data_f/valid/$category_2
60 |
61 | #head /tmp/shuffled2.txt -n $ns_t | xargs -i cp {} data/$sample_f/train/$category_2
62 | #tail /tmp/shuffled2.txt -n $ns_v | xargs -i cp {} data/$sample_f/valid/$category_2
63 |
64 |
65 |
66 |
67 | #shuf -n $n_t -e data/$data_f/$category_1/* | xargs -i cp {} data/$sample_f/train/$category_1
68 | #shuf -n $n_t -e data/$data_f/$category_2/* | xargs -i cp {} data/$sample_f/train/$category_2
69 | #shuf -n $n_v -e data/$data_f/$category_1/* | xargs -i cp {} data/$sample_f/valid/$category_1
70 | #shuf -n $n_v -e data/$data_f/$category_2/* | xargs -i cp {} data/$sample_f/valid/$category_2
71 |
72 |
73 |
74 | #mkdir -p data/$sample_f/{train,valid}/{$category_1,$category_2}
75 | #shuf -n 200 -e data/dogscats/train/cats | xargs -i cp {} data/dogscats_sample/train/cats
76 |
77 | #shuf -n $sample_t -e data/$data_f/train/$category_1/* | xargs -i cp {} data/$sample_f/train/$category_1
78 | #shuf -n $sample_t -e data/$data_f/train/$category_2/* | xargs -i cp {} data/$sample_f/train/$category_2
79 | #shuf -n $sample_v -e data/$data_f/valid/$category_1/* | xargs -i cp {} data/$sample_f/valid/$category_1
80 | #shuf -n $sample_v -e data/$data_f/valid/$category_2/* | xargs -i cp {} data/$sample_f/valid/$category_2
81 |
--------------------------------------------------------------------------------
/code/subset_data.md:
--------------------------------------------------------------------------------
1 | # Subset data using `shuf`
2 |
3 | From the directory of your notebook (from where you have the data folder available) run the following:
4 | ```bash
5 | mkdir -p data/dogscats_sample/{valid,train}/{cats,dogs}
6 |
7 | shuf -n 200 -e data/dogscats/train/cats/* | xargs -i cp {} data/dogscats_sample/train/cats
8 | shuf -n 200 -e data/dogscats/train/dogs/* | xargs -i cp {} data/dogscats_sample/train/dogs
9 | shuf -n 100 -e data/dogscats/valid/cats/* | xargs -i cp {} data/dogscats_sample/valid/cats
10 | shuf -n 100 -e data/dogscats/valid/dogs/* | xargs -i cp {} data/dogscats_sample/valid/dogs
11 | ```
12 |
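A minimal, self-contained sketch of the same sampling pattern, run on throwaway files in a temp directory (the folder names and counts here are made up for illustration; assumes GNU `shuf` and `xargs`, where `-i` is shorthand for `-I {}`):

```shell
# demo of the shuf | xargs cp sampling pattern on dummy files
tmp=$(mktemp -d)
mkdir -p "$tmp"/full/cats "$tmp"/sample/cats

# create 10 dummy "images"
for i in $(seq 1 10); do touch "$tmp/full/cats/cat_$i.jpg"; done

# randomly pick 4 of them and copy into the sample folder
shuf -n 4 -e "$tmp"/full/cats/* | xargs -i cp {} "$tmp"/sample/cats

ls "$tmp"/sample/cats | wc -l   # prints 4
```

Because the pattern uses `cp`, the source folder keeps all 10 files; only copies land in the sample folder.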
13 | ```bash
14 | ls camels | wc -l
15 | ```
16 |
17 | ```bash
18 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ cp horses/*.jpeg train/horses/
19 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ cp horses/*.jpeg valid/horses/
20 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ cp camels/*.jpeg train/camels/
21 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ cp camels/*.jpeg valid/camels/
22 | ```
23 | ```bash
24 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls horses | wc -l
25 | 101
26 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls train/horses | wc -l
27 | 101
28 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls valid/horses | wc -l
29 | 101
30 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls camels | wc -l
31 | 101
32 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls train/camels | wc -l
33 | 101
34 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls valid/camels | wc -l
35 | 101
36 | ```
37 |
38 | ```bash
39 | # make the sample data directory
40 | mkdir -p data/camelhorse/{valid,train}/{camel,horse}
41 |
42 | # split original data into train/valid
43 | shuf -n 68 -e data/camelhorse/camels/* | xargs -i cp {} data/camelhorse/train/camel
44 | shuf -n 68 -e data/camelhorse/horses/* | xargs -i cp {} data/camelhorse/train/horse
45 | shuf -n 33 -e data/camelhorse/camels/* | xargs -i cp {} data/camelhorse/valid/camel
46 | shuf -n 33 -e data/camelhorse/horses/* | xargs -i cp {} data/camelhorse/valid/horse
47 | ```
48 |
49 |
50 | ```bash
51 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls ~/data/camelshorses/camels | wc -l
52 | 101
53 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls ~/data/camelshorses/horses | wc -l
54 | 101
55 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls ~/data/camelshorses/train | wc -l
56 | 2
57 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls ~/data/camelshorses/train/camels | wc -l
58 | 0
59 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls ~/data/camelshorses/train/horses | wc -l
60 | 0
61 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls ~/data/camelshorses/valid/camels | wc -l
62 | 0
63 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$ ls ~/data/camelshorses/valid/horses | wc -l
64 | 0
65 | (fastai) ubuntu@ip-172-31-2-59:~/data/camelshorses$
66 | ```
67 |
68 | In your notebook, change the path to `PATH = "data/dogscats_sample/"`.
69 | The command @jeremy shared on Twitter is below (note that it uses `mv`, which is
70 | what you normally want when creating train/valid/test splits):
71 |
72 | ```bash
73 | shuf -n 5000 -e all/*.* | xargs -i mv {} all_val/
74 | ```
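A small sketch of the `mv` variant on throwaway files, showing that `mv` (unlike `cp`) removes the chosen files from the source folder, so train and validation sets stay disjoint (names and counts are illustrative; assumes GNU `shuf` and `xargs`):

```shell
# demo: move a random 5 of 20 dummy files into a validation folder
tmp2=$(mktemp -d)
mkdir -p "$tmp2"/all "$tmp2"/all_val
for i in $(seq 1 20); do touch "$tmp2/all/img_$i.jpg"; done

# mv removes the picked files from all/, so the two sets do not overlap
shuf -n 5 -e "$tmp2"/all/*.* | xargs -i mv {} "$tmp2"/all_val/

ls "$tmp2"/all | wc -l       # prints 15
ls "$tmp2"/all_val | wc -l   # prints 5
```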
--------------------------------------------------------------------------------
/code/tree.md:
--------------------------------------------------------------------------------
1 | # tree
2 | ```bash
3 | sudo apt install tree  # macOS: brew install tree
4 | ```
5 |
6 | ```bash
7 | (fastai) ubuntu@ip-172-31-2-59:~/data$ tree -d
8 | .
9 | ├── camelshorses
10 | │ ├── camels
11 | │ ├── horses
12 | │ ├── train
13 | │ │ ├── camels
14 | │ │ └── horses
15 | │ └── valid
16 | │ ├── camels
17 | │ └── horses
18 | ├── camelshorses_sample
19 | │ ├── train
20 | │ │ ├── camels
21 | │ │ └── horses
22 | │ └── valid
23 | │ ├── camels
24 | │ └── horses
25 | └── dogscats
26 | ├── models
27 | ├── sample
28 | │ ├── models
29 | │ ├── tmp
30 | │ ├── train
31 | │ │ ├── cats
32 | │ │ └── dogs
33 | │ └── valid
34 | │ ├── cats
35 | │ └── dogs
36 | ├── test1
37 | ├── tmp
38 | │ ├── x_act_resnet34_0_224.bc
39 | │ │ ├── data
40 | │ │ └── meta
41 | │ ├── x_act_test_resnet34_0_224.bc
42 | │ │ ├── data
43 | │ │ └── meta
44 | │ └── x_act_val_resnet34_0_224.bc
45 | │ ├── data
46 | │ └── meta
47 | ├── train
48 | │ ├── cats
49 | │ └── dogs
50 | └── valid
51 | ├── cats
52 | └── dogs
53 |
54 | 44 directories
55 | (fastai) ubuntu@ip-172-31-2-59:~/data$
56 | ```
57 |
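If `tree` is not available, `find` gives a rough equivalent of `tree -d`. A small sketch on a throwaway directory (the layout is a made-up subset of the one above):

```shell
# approximate `tree -d` with find: list directories only
tmp3=$(mktemp -d)
mkdir -p "$tmp3"/data/camelshorses/{train,valid}/{camels,horses}

# counts data + camelshorses + 2 split dirs + 4 class dirs
find "$tmp3"/data -type d | wc -l   # prints 8
```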
--------------------------------------------------------------------------------
/code/untar_weights_file.md:
--------------------------------------------------------------------------------
1 | # Expand Weights File
2 |
3 | ```bash
4 | cd /home/paperspace/fastai/courses/dl1/fastai
5 | ```
6 |
7 | ```bash
8 | curl -O http://files.fast.ai/models/weights.tgz
9 | tar zxvf weights.tgz
10 | ```
11 |
--------------------------------------------------------------------------------
/courses/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/.DS_Store
--------------------------------------------------------------------------------
/courses/README.md:
--------------------------------------------------------------------------------
1 | # fastai Courses
2 |
3 |
4 | ## Deep Learning
5 | Note: Best to use the *latest* version.
6 |
7 | #### Version 3
8 | * [Deep Learning 1](v3-dl1/)
9 |
10 | #### Version 2
11 | * [Deep Learning 1](v2-dl1/)
12 | * [Deep Learning 2](v2-dl2/)
13 |
14 | ## Machine Learning
15 | * [Machine Learning 1](ml1/)
16 |
17 | ## Linear Algebra
18 | * [Computational Linear Algebra](cla/)
19 |
--------------------------------------------------------------------------------
/courses/cla/README.md:
--------------------------------------------------------------------------------
1 | # Computational Linear Algebra
2 |
3 | http://www.fast.ai/2017/07/17/num-lin-alg/
4 |
--------------------------------------------------------------------------------
/courses/mes_projets/0_login.md:
--------------------------------------------------------------------------------
1 | # Logging in to AWS
2 |
3 | ## Step 0: Initial Set-up Assumptions
4 | Assuming:
5 | - I have launched a p2 instance
6 | - I have set up my key pair
7 | - I have created an alias in my startup configuration file (normally `~/.bash_profile`; in my case, `~/.zshrc`)
8 |
9 | My alias:
10 | ```bash
11 | alias fastai='ssh -i "id_rsa" ubuntu@ec2-88-888-888-88.compute-1.amazonaws.com -L8888:localhost:8888'
12 | ```
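An equivalent approach (an assumption on my part, not from the course) is a `Host` entry in `~/.ssh/config`; the hostname and key path below are placeholders matching the alias above:

```
Host fastai
    HostName ec2-88-888-888-88.compute-1.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
    LocalForward 8888 localhost:8888
```

With this entry in place, `ssh fastai` behaves like the alias, including the port forward for Jupyter.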
13 |
14 | ## Step 1: AWS Console
15 | - sign in here: https://signin.aws.amazon.com/
16 | - start my `p2.xlarge` instance from before
17 |
18 | ## Step 2: My terminal on my Mac (local computer)
19 |
20 | ### Go to the appropriate directory
21 | ```bash
22 | cd /Users/reshamashaikh/.ssh
23 | ```
24 | ### Login to AWS
25 | Login as the user "ubuntu" rather than the user "root".
26 |
27 | ```bash
28 | fastai
29 | ```
30 |
31 | ### Update Ubuntu: `sudo apt-get update`
32 | ```bash
33 | sudo apt-get update
34 | ```
35 |
36 | ### Update fastai repo: `git pull`
37 | ```bash
38 | cd fastai
39 | ```
40 | ```bash
41 | git pull
42 | ```
43 | >my example
44 | ```bash
45 | (fastai) ubuntu@ip-172-31-2-59:~$ ls
46 | data fastai src
47 | (fastai) ubuntu@ip-172-31-2-59:~$ cd fastai
48 | (fastai) ubuntu@ip-172-31-2-59:~/fastai$ git pull
49 | (fastai) ubuntu@ip-172-31-2-59:~/fastai$
50 | ```
51 | ### Update Anaconda packages: `conda env update`
52 | ```bash
53 | conda env update
54 | ```
55 | >my example
56 | ```bash
57 | (fastai) ubuntu@ip-172-31-2-59:~/fastai$ conda env update
58 | Using Anaconda API: https://api.anaconda.org
59 | Fetching package metadata .................
60 | Solving package specifications: .
61 | #
62 | # To activate this environment, use:
63 | # > source activate fastai
64 | #
65 | # To deactivate an active environment, use:
66 | # > source deactivate
67 | #
68 | (fastai) ubuntu@ip-172-31-2-59:~/fastai$
69 | ```
70 | ### Update Anaconda packages: `conda update --all`
71 |
72 |
73 | ## Step 3: Turn off AWS Instance after completing work!
74 |
75 | ---
76 | ## `~/.bashrc` File
77 | ```bash
78 | nano ~/.bashrc
79 | ```
80 |
81 |
82 | ---
83 | # My Projects
84 |
85 | ## Go to where my projects are
86 | ```bash
87 | cd /home/ubuntu/my_repos/
88 | ```
89 |
90 | ### Project 1
91 | ```bash
92 | /home/ubuntu/my_repos/llis_topicModel
93 | ```
94 |
95 | ### Project 2
96 | ```bash
97 | (fastai) ubuntu@ip-172-31-2-59:~/git_repos/projects$ pwd
98 | /home/ubuntu/git_repos/projects
99 | (fastai) ubuntu@ip-172-31-2-59:~/git_repos/projects$ ls -l
100 | total 12
101 | drwxrwxr-x 2 ubuntu ubuntu 4096 Jan 8 21:07 camels_h
102 | drwxrwxr-x 3 ubuntu ubuntu 4096 Jan 8 00:44 iceberg
103 | -rw-rw-r-- 1 ubuntu ubuntu 23 Jan 7 21:04 README.md
104 | (fastai) ubuntu@ip-172-31-2-59:~/git_repos/projects$
105 | ```
106 |
107 | ## My data
108 | ```bash
109 | (fastai) ubuntu@ip-172-31-2-59:~/data$ pwd
110 | /home/ubuntu/data
111 | (fastai) ubuntu@ip-172-31-2-59:~/data$ ls -alt
112 | total 20
113 | drwxr-xr-x 20 ubuntu ubuntu 4096 Jan 8 21:11 ..
114 | drwxrwxr-x 2 ubuntu ubuntu 4096 Jan 7 20:44 iceberg
115 | drwxrwxr-x 5 ubuntu ubuntu 4096 Jan 7 20:38 .
116 | drwxrwxr-x 8 ubuntu ubuntu 4096 Dec 21 01:53 camelhorse
117 | drwxrwxr-x 8 ubuntu ubuntu 4096 Dec 20 22:19 dogscats
118 | (fastai) ubuntu@ip-172-31-2-59:~/data$
119 | ```
120 |
121 | ## Launch Jupyter Notebook
122 | ```bash
123 | (fastai) ubuntu@ip-172-31-2-59:~$ pwd
124 | /home/ubuntu
125 | (fastai) ubuntu@ip-172-31-2-59:~$ jupyter notebook
126 | ```
127 |
128 |
--------------------------------------------------------------------------------
/courses/mes_projets/1_scp_data_aws.md:
--------------------------------------------------------------------------------
1 |
2 | ```bash
3 | % pwd
4 | /Users/reshamashaikh/ds/data/camelshorses
5 | % ls
6 | total 0
7 | drwxr-xr-x 103 3502 Nov 25 12:19 camels
8 | drwxr-xr-x 104 3536 Nov 25 12:16 horses
9 | % ls camels | wc -l
10 | 102
11 | % ls horses | wc -l
12 | 102
13 | % scp -r . ubuntu@34.198.228.48:~/data/camelhorse
14 | ```
15 |
--------------------------------------------------------------------------------
/courses/mes_projets/2_split_data.md:
--------------------------------------------------------------------------------
1 |
2 | ```bash
3 | # make the sub directories
4 | mkdir -p data/camelhorse/{train,valid}/{camel,horse}
5 |
6 | # split original data into train/valid
7 | shuf -n 68 -e data/camelhorse/camels/* | xargs -i cp {} data/camelhorse/train/camel
8 | shuf -n 68 -e data/camelhorse/horses/* | xargs -i cp {} data/camelhorse/train/horse
9 | shuf -n 33 -e data/camelhorse/camels/* | xargs -i cp {} data/camelhorse/valid/camel
10 | shuf -n 33 -e data/camelhorse/horses/* | xargs -i cp {} data/camelhorse/valid/horse
11 |
12 |
13 | ls ~/data/camelhorse/camels | wc -l
14 | ls ~/data/camelhorse/horses | wc -l
15 | ls ~/data/camelhorse/train/camel | wc -l
16 | ls ~/data/camelhorse/train/horse | wc -l
17 | ls ~/data/camelhorse/valid/camel | wc -l
18 | ls ~/data/camelhorse/valid/horse | wc -l
19 | ```
20 |
--------------------------------------------------------------------------------
/courses/mes_projets/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | https://www.kaggle.com/devm2024/keras-model-for-beginners-0-210-on-lb-eda-r-d
4 |
--------------------------------------------------------------------------------
/courses/mes_projets/comments.md:
--------------------------------------------------------------------------------
1 | https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/kernels?sortBy=votes&group=everyone&pageSize=20&competitionId=8076
2 |
3 | ```
4 | (fastai) ubuntu@ip-172-31-2-59:~/data$ pwd
5 | /home/ubuntu/data
6 | mkdir toxic_comments
7 | ```
8 |
9 | ```bash
10 | kg download -c jigsaw-toxic-comment-classification-challenge
11 | ```
12 |
13 | ```bash
14 | sudo apt install unzip
15 | unzip -q train.zip
16 | unzip -q test.zip
17 | ```
16 |
17 | ```bash
18 | (fastai) ubuntu@ip-172-31-2-59:~/data/toxic_comments$ pwd
19 | /home/ubuntu/data/toxic_comments
20 | (fastai) ubuntu@ip-172-31-2-59:~/data/toxic_comments$ mkdir subm
21 | ```
22 | ```
23 | (fastai) ubuntu@ip-172-31-2-59:~/data/toxic_comments$ wget http://nlp.stanford.edu/data/glove.6B.zip
24 | ```
25 |
26 |
27 |
--------------------------------------------------------------------------------
/courses/mes_projets/get_data.md:
--------------------------------------------------------------------------------
1 | # Iceberg
2 |
3 | ```bash
4 | (fastai) ubuntu@ip-172-31-2-59:~/data$ pwd
5 | /home/ubuntu/data
6 | (fastai) ubuntu@ip-172-31-2-59:~/data$
7 | ```
8 |
9 | https://www.kaggle.com/c/statoil-iceberg-classifier-challenge
10 |
11 |
12 | ```bash
13 | rm ~/.kaggle-cli/browser.pickle
14 | pip install kaggle-cli --upgrade
15 | ```
16 |
17 | ```bash
18 | kg download -u "reshamashaikh" -p "xxx" -c statoil-iceberg-classifier-challenge
19 | ```
20 |
21 | ```bash
22 | sudo apt-get install p7zip-full
23 | 7z e test.json.7z
24 | 7z e train.json.7z
25 | 7z e sample_submission.csv.7z
26 | ```
27 |
28 | ```
29 | (fastai) ubuntu@ip-172-31-2-59:~/data/iceberg$ ls -alt
30 | total 1972980
31 | drwxrwxr-x 2 ubuntu ubuntu 4096 Jan 7 20:44 .
32 | drwxrwxr-x 5 ubuntu ubuntu 4096 Jan 7 20:38 ..
33 | -rw-rw-r-- 1 ubuntu ubuntu 257127394 Jan 7 20:36 test.json.7z
34 | -rw-rw-r-- 1 ubuntu ubuntu 44932785 Jan 7 20:36 train.json.7z
35 | -rw-rw-r-- 1 ubuntu ubuntu 38566 Jan 7 20:36 sample_submission.csv.7z
36 | -rw-rw-r-- 1 ubuntu ubuntu 117951 Oct 23 17:27 sample_submission.csv
37 | -rw-rw-r-- 1 ubuntu ubuntu 1521771850 Oct 23 17:27 test.json
38 | -rw-rw-r-- 1 ubuntu ubuntu 196313674 Oct 23 17:23 train.json
39 | (fastai) ubuntu@ip-172-31-2-59:~/data/iceberg$
40 | ```
41 | ```bash
42 | (fastai) ubuntu@ip-172-31-2-59:~/data/iceberg$ wc -l *
43 | 8425 sample_submission.csv
44 | 151 sample_submission.csv.7z
45 | 0 test.json
46 | 1004794 test.json.7z
47 | 0 train.json
48 | 175531 train.json.7z
49 | 1188901 total
50 | (fastai) ubuntu@ip-172-31-2-59:~/data/iceberg$
51 | ```
52 |
53 | ```bash
54 | (fastai) ubuntu@ip-172-31-2-59:~/fastai/courses/dl1$ jupyter notebook
55 | ```
56 |
57 |
58 |
--------------------------------------------------------------------------------
/courses/mes_projets/test.md:
--------------------------------------------------------------------------------
1 | pie, desserts
2 |
--------------------------------------------------------------------------------
/courses/mes_projets/to_do.md:
--------------------------------------------------------------------------------
1 | # To do later...when I have some time
2 |
3 | ## Reading
4 | * [precompute=True](http://forums.fast.ai/t/precompute-true/7316/55)
5 | * go thru Jeremy's notebook [cifar10](https://github.com/fastai/fastai/blob/master/courses/dl1/cifar10.ipynb) as it should provide you with a good intuition about how best to use:
6 | - lr_finder
7 | - cycle_len
8 | - cycle_mult
9 | - resizing
10 |
11 | ## To Explore Later
12 | [byobu](http://byobu.co) Byobu is a GPLv3 open source text-based window manager and terminal multiplexer.
13 |
14 | ## Reading
15 | [Kaggle Planet Competition: How to land in top 4%](https://medium.com/@irshaduetian/kaggle-planet-competition-how-to-land-in-top-4-a679ff0013ba)
16 |
17 |
--------------------------------------------------------------------------------
/courses/ml1/lesson_01.md:
--------------------------------------------------------------------------------
1 |
2 | # Lesson 1 - random forests
3 |
4 | - Length: 01:18
5 | - Video: https://www.youtube.com/watch?v=CzdWqFTmn0Y&feature=youtu.be
6 | - Notebook: [lesson1-rf.ipynb](https://github.com/fastai/fastai/blob/master/courses/ml1/lesson1-rf.ipynb)
7 |
8 | ---
9 |
10 | ## Getting data using `curl`
11 | https://www.kaggle.com/c/bluebook-for-bulldozers
12 |
13 | - ML should help us understand a dataset, not just make predictions about it.
14 |
15 | In Firefox, go to the website, then open the Developer Tools:
16 | - ctrl + shift + i to bring up the web developer tool
17 | - go to the Network tab
18 | - go to the data row
19 | - right click, Copy as cURL (a unix command that downloads data, like `wget`)
20 | - might want to delete "2.0" in url since it causes problems
21 | - `curl url_link -o bulldozers.zip` `-o` means output, then give suitable file name
22 | - `mkdir bulldozers`
23 | - `mv bulldozers.zip bulldozers/`
24 | - `sudo apt install unzip` or `brew install unzip`
25 | - `unzip bulldozers.zip`
26 |
27 | Python 3.6 format string:
28 | ```python
29 | df_raw = pd.read_csv(f'{PATH}Train.csv', low_memory=False,
30 | parse_dates=["saledate"])
31 | ```
32 | - `f'{PATH}Train.csv'` the `f` tells it to interpolate the "{PATH}"
33 | - `low_memory=False` make it read more of the file to decide what the types are
34 |
35 | ### Example
36 | `name = 'Jeremy'`
37 | `age = 43`
38 | `f'Hello {name.upper()}, you are {age}'`
39 | output:
40 | >Hello JEREMY, you are 43
41 |
42 | ### Random Forest
43 | - universal machine learning technique
44 | - way of predicting something of any kind (dog/cat, price)
45 | - can predict a categorical or continuous variable
46 | - columns can be of any kind (pixel data, zip codes, revenues)
47 | - in general, it doesn't overfit
48 | - easy to stop it from overfitting
49 | - don't need a separate validation set
50 | - has few, if any statistical assumptions
51 | - doesn't assume data is normally distributed
52 | - doesn't assume relationships are linear
53 | - don't need to specify interactions
54 | - requires few pieces of feature engineering (don't have to take log of data)
55 | - it's a great place to start
56 | - if your random forest doesn't work, it's a sign there is something wrong with the data
57 |
58 | Both Curse of Dimensionality & No Free Lunch are largely false.
59 |
60 | #### Curse of Dimensionality - the idea that the more columns you have, the emptier the space becomes; the more dimensions you have, the more the points sit on the edge; in theory, distance between points becomes much less meaningful.
61 | - points **do** still have distance from each other
62 | - in the 90's, theory took over machine learning
63 | - we lost a decade of real practical development with these theories
64 | - in practice, building models on lots and lots of columns works well
65 |
66 | #### No Free Lunch Theorem
67 | - there is no type of model that works well for any kind of dataset
68 | - but, in the real world, that's not true; some techniques **do work**
69 | - ensembles of decision trees works well
70 |
71 | ### sklearn
72 | - RandomForestRegressor is part of `sklearn`, `scikit learn`
73 | - Scikit learn is not the best, but perfectly good at nearly everything; popular library
74 | - the next part of the course (with Yannet) will look at a different kind of decision tree ensemble, called Gradient Boosted Trees; XGBoost is better than the gradient boosting trees in scikit-learn
75 |
76 | `from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier`
77 | - RandomForestRegressor - predicts continuous variables
78 | - RandomForestClassifier - predicts categorical variable
79 |
80 | ## Convert to Pandas Categories
81 | The categorical variables are currently stored as strings, which is inefficient, and doesn't provide the numeric coding required for a random forest. Therefore we call train_cats to convert strings to pandas categories.
82 | This is a fastai library function:
83 | `train_cats(df_raw)`
84 |
85 | ## re-order Pandas categories
86 | ```python
87 | df_raw.UsageBand.cat.categories
88 | Out[9]:
89 | Index(['High', 'Low', 'Medium'], dtype='object')
90 | In [10]:
91 | df_raw.UsageBand.cat.set_categories(['High', 'Medium', 'Low'], ordered=True, inplace=True)
92 | ```
93 | In the background, the codes are 0, 1, 2 for the categories, which is what the random forest uses; -1 is assigned to NA.
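The coding can be seen directly in pandas; a small self-contained example (illustrative values, not the actual `UsageBand` column):

```python
import pandas as pd

s = pd.Series(['High', 'Low', None, 'Medium'])
cat = s.astype('category').cat.set_categories(['High', 'Medium', 'Low'], ordered=True)
# NA gets code -1; the rest follow the category order
print(cat.cat.codes.tolist())  # [0, 2, -1, 1]
```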
94 |
95 | ## get percent of missing values for each column
96 | We're still not quite done - for instance we have lots of missing values, which we can't pass directly to a random forest.
97 | `display_all(df_raw.isnull().sum().sort_index()/len(df_raw))`
98 | ```bash
99 | Backhoe_Mounting 0.803872
100 | Blade_Extension 0.937129
101 | Blade_Type 0.800977
102 | Blade_Width 0.937129
103 | Coupler 0.466620
104 | Coupler_System 0.891660
105 | Differential_Type 0.826959
106 | Drive_System 0.739829
107 | Enclosure 0.000810
108 | ```
109 |
110 | ## save to feather format
111 | - But let's save this file for now, since it's already in a format that can be stored and accessed efficiently.
112 | - saves to disk the same way it appears in RAM
113 | - **feather** is fairly new
114 | - fastest way to save to disk and fastest way to read it back
115 | ```python
116 | os.makedirs('tmp', exist_ok=True)
117 | df_raw.to_feather('tmp/bulldozers-raw')
118 | ```
119 | So, then we don't have to re-run everything from start of notebook.
120 | #### Pre-processing
121 | In the future we can simply read it from this fast format.
122 | ```python
123 | df_raw = pd.read_feather('tmp/bulldozers-raw')
124 | ```
125 | ## Run Random Forest
126 | - serial number (numbers): random forest works fine with these ID numbers that are not really continuous
127 | - random forests are trivially parallelizable
128 | - means it will split up data across CPUs and linearly scale
129 | - `n_jobs=-1` means create a separate job for each CPU that you have
130 |
131 | ### Kaggle Competition
132 | - generally speaking, if you're in the top half of a Kaggle competition, you're doing well
133 | - so, here, with no thinking and using the defaults of the algorithm (random forest), we're in the top quarter of the competition
134 | - random forests are insanely powerful
135 |
136 | ### HW
137 | - take as many Kaggle competitions as you can
138 | - try this process with Pandas set up
139 | -
140 |
141 |
142 |
143 |
144 |
145 |
146 |
147 |
--------------------------------------------------------------------------------
/courses/ml1/lesson_02.md:
--------------------------------------------------------------------------------
1 | # Lesson 2: RF - Part 2
2 |
3 | Length: 01:35
4 | Notebook: [lesson1-rf.ipynb](https://github.com/fastai/fastai/blob/master/courses/ml1/lesson1-rf.ipynb)
5 |
6 | ---
7 |
8 | ## Create a symlink
9 | ```bash
10 | ln -s ../../fastai ./
11 | ```
12 | where `./` is the current directory
13 |
14 |
15 | Evaluation metric is: root mean squared log error (RMSLE)
16 | sqrt( mean( (ln(act) - ln(pred))^2 ) )
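A sketch of the metric in numpy (assumes strictly positive actual and predicted values):

```python
import numpy as np

def rmsle(pred, act):
    # root mean squared log error: sqrt(mean((ln(act) - ln(pred))^2))
    return np.sqrt(np.mean((np.log(act) - np.log(pred)) ** 2))

print(rmsle(np.array([10.0, 100.0]), np.array([10.0, 100.0])))  # 0.0
```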
17 |
18 | ## Data Process
19 | - we need all of our columns to be numbers
20 | - use function `add_datepart` to replace a date variable with all of its date parts
21 | - use function `train_cats` to convert strings to pandas categories (Notice: data type is not `string`, but `category`)
22 | - use function `set_categories` to re-order categories
23 | - use function `proc_df` to replace categories with their numeric codes, handle missing continuous values, and split the dependent variable into a separate variable.
24 | >df, y, nas = proc_df(df_raw, 'SalePrice')
25 | - for continuous variables, missing values were replaced with the median
26 |
27 | ## R^2
28 | - if you get an R^2 that is negative, it means your model is worse than predicting the mean
29 | - R^2 is not necessarily what you're trying to optimize
30 | - R^2: how good is your model vs the naive mean model?
31 |
32 | ## Test and Validation Sets
33 | - Creating a validation set is the most important thing you'll do in machine learning.
34 | - Validation Set (first hold out set): use this to determine what hyperparameters to use
35 | - Testing (second hold out set): I've done modeling, now I'll see how it works
36 |
37 | ## Random Forest code
38 | - `n_estimators` = number of trees
39 | - `n_jobs=-1` --> means create a separate job for each CPU that you have
40 | ```python
41 | m = RandomForestRegressor(n_estimators=20, n_jobs=-1)
42 | ```
43 |
44 | ## Random Forest Scores output
45 | [training RMSE , validation RMSE, training R^2, validation R^2]
46 | ```bash
47 | [0.1026724559118164, 0.33553753413792303, 0.9786895444439101, 0.79893791069374753]
48 | ```
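`print_score` is a helper from the course notebooks; a rough reimplementation on toy data (the helper's exact signature in the fastai library may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rmse(x, y):
    return np.sqrt(((x - y) ** 2).mean())

def print_score(m, X_train, y_train, X_valid, y_valid):
    # [training RMSE, validation RMSE, training R^2, validation R^2(, OOB R^2)]
    res = [rmse(m.predict(X_train), y_train), rmse(m.predict(X_valid), y_valid),
           m.score(X_train, y_train), m.score(X_valid, y_valid)]
    if hasattr(m, 'oob_score_'):
        res.append(m.oob_score_)
    print(res)

# toy demonstration data
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(0, 0.05, 200)
X_train, X_valid, y_train, y_valid = X[:150], X[150:], y[:150], y[150:]

m = RandomForestRegressor(n_estimators=40, n_jobs=-1, oob_score=True, random_state=0)
m.fit(X_train, y_train)
print_score(m, X_train, y_train, X_valid, y_valid)
```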
49 |
50 | ## Bagging
51 | - statistical technique to create a random forest
52 | - Bag of Little Bootstraps, Michael Jordan
53 | - create 5 different models which are not correlated --> they offer different insights
54 | - build 1000 trees on 10 separate data points --> individual trees will not be predictive, but combined they will
55 |
56 | ## Bootstrapping
57 | - pick out n rows with replacement
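In numpy terms, a bootstrap sample is just index draws with replacement:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
# a bootstrap sample: n draws from n rows, with replacement
idx = rng.choice(n, size=n, replace=True)
print(sorted(idx.tolist()))  # typically some row indices repeat, others never appear
```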
58 |
59 | ## Out-of-Bag (OOB) Score
60 | - very useful when we have only a small dataset
61 | ```python
62 | m = RandomForestRegressor(n_estimators=40, n_jobs=-1, oob_score=True)
63 | m.fit(X_train, y_train)
64 | print_score(m)
65 | ```
66 | [training RMSE , validation RMSE, training R^2, validation R^2, OOB R^2]
67 | ```bash
68 | [0.10198464613020647, 0.2714485881623037, 0.9786192457999483, 0.86840992079038759, 0.84831537630038534]
69 | ```
70 |
71 | ## Grid Search
72 | - pass in list of hyperparameters we want to tune and values we want to try
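A sketch with scikit-learn's `GridSearchCV` on toy data (the parameter values tried here are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X[:, 0] + rng.normal(0, 0.1, 200)

params = {'n_estimators': [10, 40], 'min_samples_leaf': [1, 3]}
gs = GridSearchCV(RandomForestRegressor(n_jobs=-1, random_state=0), params, cv=3)
gs.fit(X, y)
print(gs.best_params_)  # the best-scoring combination found
```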
73 |
74 | ## Subsampling
75 | - The basic idea is this: rather than limit the total amount of data that our model can access, let's instead limit it to a different random subset per tree. That way, given enough trees, the model can still see all the data, but for each individual tree it'll be just as fast as if we had cut down our dataset as before.
76 | - no dataset is too big for this technique (ex: 120 million rows for grocery store data of Kaggle competition)
77 | - need to set `oob_score = False` if using subsample approach of `set_rf_samples(20000)`
78 | - to turn it off, do `reset_rf_samples()`
79 | ```python
80 | set_rf_samples(20000)
81 | ```
82 |
83 | ## Important Takeaway / Tip
84 | - very few people in industry or academia do this
85 | - most people run all of their models on all of their data all of the time using their best parameters
86 | - do most of your models on a large enough sample size so your accuracy is reasonable, that takes a small number of seconds to train
87 |
88 | ## Tree Building Parameters
89 | - `min_samples_leaf=1` this is the default
90 | - `min_samples_leaf=3` says stop training the tree further when your leaf node has 3 or fewer samples; the numbers 1, 3, 5, 10, and 25 work well
91 | - `max_features=None` this is the default; then max_features=n_features (default is to use all the features)
92 | - `max_features=0.5` the less correlated your trees are with each other, the better; randomly choose half the features
93 | - `max_features` in practice, good values range from 0.5 to log2 or sqrt
94 |
95 | ## Random Forest
96 | - hard to screw it up
97 | - great for out of box, even without tuning hyperparameters
98 | - tends to work on most datasets most of the time
99 |
100 | ## Looking at categories
101 | - `df_raw.fiProductClassDesc.cat.categories`
102 | - `df_raw.fiProductClassDesc.cat.codes` --> this is what the random forest sees
103 |
104 | ## Homework
105 | - experiment
106 | - draw the trees
107 | - plot the errors
108 | - try different datasets
109 | - write your own R2
110 | - write your own versions of the datasets
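For the "write your own R2" item, a sketch of the definition (1 minus the ratio of the model's squared error to the naive mean model's):

```python
import numpy as np

def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot: compares the model against always predicting the mean
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2(y, y))                     # 1.0 for a perfect model
print(r2(y, np.full(4, y.mean())))  # 0.0 for predicting the mean
```

A model worse than the mean gives a negative value, matching the note above about negative R^2.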
111 |
112 |
113 |
114 |
115 |
116 |
--------------------------------------------------------------------------------
/courses/ml1/lesson_05.md:
--------------------------------------------------------------------------------
1 | # Lesson 5
2 |
3 | - Length: 01:40
4 | - Video: https://www.youtube.com/watch?v=3jl2h9hSRvc&feature=youtu.be
5 | - Notebook: [lesson2-rf_interpretation.ipynb](https://github.com/fastai/fastai/blob/master/courses/ml1/lesson2-rf_interpretation.ipynb)
6 |
7 | ---
8 | ## Review
9 | - What's the difference between Machine Learning and "any other kind of [analysis] work"? In ML, we care about the **generalization error** (in other analysis, we care about how well we map our observations to outcome)
10 | - the most common way to check for **generalization** is to randomly pull some data rows into a **test set** and then check the accuracy of the **training set** with the **test set**
11 | - the problem is: what if it doesn't generalize? you could change hyperparameters, data augmentation, etc., and keep doing this until, after many attempts, it generalizes. But after trying 50 different things, you could get a good result accidentally
12 | - what we generally do is get a second **test set** and call it a **validation set**
13 | - a trick for **random forests** is we don't need a validation set; instead, we use the **oob error/score (out-of-bag)**
14 | - every time we train a tree in RF, there are a bunch of observations that are held out anyway (to get rid of some of the randomness).
15 | - **oob score** gives us something pretty similar to **validation score**, though, on average, it is a little less good
16 | - samples from oob are bootstrap samples
17 |   - with a validation set, we can use the whole forest to make the prediction
18 |   - but with OOB, we cannot use the whole forest; every row uses only a subset of the trees to make its predictions; with fewer trees, we get a less accurate prediction
19 |   - think about it over the week
20 | - Why have a validation set at all when using random forests? If it's a randomly chosen validation dataset, it is not strictly necessary;
21 | - you've got several levels of items we've got to test
22 |   1. oob - when that's working well, go to the next one
23 |   2. validation set
24 |   3. test set
25 |
26 | ### How Kaggle Computes Its Validation Score
27 | - splits the test set into 2 pieces: Public, Private
28 | - they don't tell you which is which
29 | - you submit your predictions to Kaggle
30 | - Kaggle selects a random 30% to tell you your Leaderboard score
31 | - at the end of the competition, that gets thrown away
32 | - then they use the other 70% to calculate your "real score"
33 | - making sure that you're not using the feedback from the Leaderboard to figure out some set of hyperparameters that do well but don't generalize
34 | - this is why it is good practice to use Kaggle; at the end of a competition, you may drop 100 places in a competition
35 | - good to practice on Kaggle than at a company where there are millions of dollars on the line
36 |
37 | ### Q: case of not using a random sample for validation
38 | - Q: When might I not be able to use a random set for validation?
39 | - cases: in the case of temporal data, unbalanced data
40 | - Tyler: we expect observations close in time to be related to each other. If we destroy the order, ...
41 | - JH: important to remember, when you build a model, think that we are going to use the model at some time in the future
42 | - when you build a model, you always have a systematic error, that the model will be used at a later time, at which time the world will be different than it is now; there is a lag from when time model is built to time when it is used; even when building the model, data is much older; a lot of the time, _that matters_
43 | - if we're predicting who will buy toilet paper in NJ, and it takes us 2 weeks to put model in production, and we used data based on past 2 years, then by that time, things may look very different
44 | - particularly, our validation set (if we randomly sampled from a 4-yr period), then the vast majority of that data is over a year old, and it may be that the toilet paper buying habits of folks in NJ may have dramatically shifted
45 | - maybe there is a terrible recession and they can't afford high quality paper
46 | - maybe paper making industry has gone thru the roof and they're buying more paper because it's cheaper
47 | - so, the world changes, if you use a random sample for your validation set, then you are actually checking: how good are you at predicting things that are totally obsolete now? how good are you at predicting things that happened 4 years ago? That's _not_ interesting.
48 | - What we want to do in practice, any time there is some temporal piece, instead say (assuming we've ordered it by time), make the tail end of the data the **validation set**
49 | - example: last 10% of data is the test set
50 | - the 10% of the data prior to the test set is the validation set
51 | - we then build a model that still works on stuff that is later in time than what the model was built on; that it generalizes into the future
52 | - Q: how do you get the validation set to be good?
53 | - `20:00` if it looks good on the **oob** then it means we are not overfitting in the statistical sense; it's working well on a random sample; but if it then looks bad on the validation set, you somehow failed to predict the future; you predicted the past
54 | - Suraj idea: maybe we should train a recent period only; downside, we're using less data, create a less-rich model
55 | - most machine learning functions have ability to provide a weight to each row of data
56 | - for example for RF, instead of bootstrapping, could have a weight on each row and randomly pick that row with some probability, so the most recent rows have a higher probability of being selected; that can work very well; it's something you have to try, and if you don't have a validation set that represents the future (compared to what you're training on), then you have no way of knowing how your techniques are working
57 | - `21:15` you make a compromise between amount of data vs recency of data?
58 | - JH: what Jeremy tends to do when he has temporal data, which is probably most of the time: once he gets something working well on the validation set, he wouldn't just go and use that model on the test set, because the test set is much further in the future than what he trained on; instead he would replicate building the model, this time combining the train and validation sets, and retrain. At that point, you've got no way to test against a validation set, so you have to make sure you have a reproducible script or notebook that does exactly the same steps in exactly the same ways, because if you get something wrong you're going to find out on the test set that you've got a problem
59 | - `22:10` so what JH does in practice is: he needs to know whether his validation set is truly representative of
60 | -
61 |
62 |
63 |
64 |
--------------------------------------------------------------------------------
/courses/nlp/README.md:
--------------------------------------------------------------------------------
1 | # fastai: [NLP Course](https://www.fast.ai/2019/07/08/fastai-nlp/)
2 | - [YouTube Playlist](https://www.youtube.com/playlist?list=PLtmWHNX-gukKocXQOkQjuVxglSDYWsSh9)
3 | - [Jupyter notebooks on GitHub](https://github.com/fastai/course-nlp)
4 |
5 | ## Videos
6 |
7 | 1. [What is NLP?](https://www.youtube.com/watch?v=cce8ntxP_XI) (0:23) (done 03-Mar-2020)
8 | 2. [Topic Modeling with SVD & NMF (NLP video 2)](https://www.youtube.com/watch?v=tG3pUwmGjsc) (1:07) (done 03-Mar-2020)
9 | 3. [Topic Modeling and SVD revisited](https://youtu.be/lRZ4aMaXPBI) (33:06) (done 04-Mar-2020)
10 | 4. [Sentiment Classification of Movie Reviews (using Naive Bayes, Logistic Regression, Ngrams](https://youtu.be/hp2ipC5pW4I) (58:20)
11 | - [notebook](https://github.com/fastai/course-nlp/blob/master/3-logreg-nb-imdb.ipynb) (done 04-Mar-2020)
12 | 5. [Sentiment Classification of Movie Reviews: NB, LR, Ngrams](https://youtu.be/dt7sArnLo1g) (52:00) (done)
13 | 6. [Derivation of Naive Bayes & Numerical Stability](https://youtu.be/z8-Tbrg1-rE) (24:00) (done 11-Mar-2020)
14 | 7. [Revisiting Naive Bayes, and Regex](https://youtu.be/Q1zLqfnEXdw) (38:00) (done 12-Mar-2020)
15 | 8. [Intro to Language Modeling](https://youtu.be/PNNHaQUQqW8) (41:00) (done 12-Mar-2020)
16 | 9. [Transfer learning](https://youtu.be/5gCQvuznKn0) (1:36:00)
17 | - ...
18 | 19.
19 |
20 |
21 |
22 |
--------------------------------------------------------------------------------
/courses/nlp/videos_01_to_05.md:
--------------------------------------------------------------------------------
1 | # NLP: Lessons 1 to 5
2 |
3 | ## Video 2 [Topic Modeling with SVD & NMF (NLP video 2)](https://www.youtube.com/watch?v=tG3pUwmGjsc)
4 | * spacy doesn't offer a stemmer, because it doesn't think it should be used
5 | * Google [sentencepiece](https://github.com/google/sentencepiece)
6 | * performs sub-word tokens
7 | * NMF (non-negative matrix factorization) is not unique, but can be more interpretable
8 |
9 | To check time of a step:
10 | ```python
11 | %time u, s, v = np.linalg.svd(vectors, full_matrices=False)
12 | ```
13 |
14 | ## Video 3
15 |
16 | - stemming: getting roots of words (chops off end, "poor man's lemmatization")
17 | - lemmatization: (fancier)
18 | - lemmatization is more computationally expensive than stemming
19 | - stemming is quicker and easier
20 |
21 | ### Pre-processing
22 | - when you have less data, do this pre-processing
23 | - do you think your model can handle the complexity:
24 | - if you're using neural networks, don't do lemmatization, because that is throwing away information
25 | - if you have a simpler model, can't learn as much complexity, so do this pre-processing
26 |
27 | ### Factorization is analogous to matrix decomposition
28 |
29 | ### What are the nice properties that matrices in an SVD decomposition have?
30 | - A = USV
31 | - U: orthonormal; columns or rows are orthonormal to each other; the columns are orthogonal and pairwise normalized. (dot product of two columns is 0. dot product of column with itself gives us 1)
32 | - S: diagonal matrix; everything off diagonals is 0; capture an idea of importance, singular values, descending order: capture biggest one first, non-negative, scale of U and V is both 1
33 | - V: same properties as U, but transposed: its rows are orthonormal to each other
34 | - NMF: special property in decomposition is **non-negative** AND matrix is **sparse** (sparse means many of the values are zero)
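These properties can be checked numerically with numpy's reduced SVD:

```python
import numpy as np

A = np.random.default_rng(0).random((5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

assert np.allclose(U.T @ U, np.eye(4))             # columns of U are orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(4))           # rows of V^T are orthonormal
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)  # singular values: non-negative, descending
assert np.allclose(U @ np.diag(s) @ Vt, A)         # exact decomposition: A = U S V^T
print("all SVD properties hold")
```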
35 |
36 | ## Linear Algebra
37 | - 3 Blue 1 Brown: Essence of Linear Algebra [playlist on YouTube](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab0)
38 | - video Chapter 3 [Linear transformation and matrices](https://youtu.be/kYB8IZa5AuE)
39 |
40 | ## Reviewing spreadsheet
41 | - first matrix: TF-IDF (term document matrix)
42 | - rows: author_title
43 | - cols: words
44 | - use SVD to decompose TF-IDF matrix into 3 matrices
45 | - U rows: author_title
46 | - U cols: topics1 to x
47 | - S: diagonal matrix (singular values in descending order, most important one at (1, 1) position in matrix)
48 | - S rows: topics
49 | - S cols: topics
50 | - V:
51 | - V rows: topics
52 | - V cols: words
53 |
54 | ## Advantages / Disadvantages of SVD vs NMF
55 | - NMF: non-negative values, can be more interpretable
56 | - SVD: can have negative values for topic
57 | - SVD: is an exact decomposition; can fully represent the input matrices
58 | - NMF: not exact
59 | - NMF: need to set the number of topics you want, that's hyperparameter that you are choosing;
60 | - SVD: traditional SVD: you are getting as many singular values as you have documents, (assuming fewer documents than vocab words)
61 | - SVD: there is an opportunity to look at the singular values and see that when they get small, those topics may be so unimportant that you can ignore them and chop them off. But that also means with SVD, you're doing extra work.
62 | - so it is both giving you more information and extra work
63 | - SVD: on a big matrix is **slow**
64 | - in example on a 10K by 10K matrix, it is very slow
65 | - one way to address this is to use **randomized SVD**
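A sketch using scikit-learn's `randomized_svd`, which computes only the top components instead of the full decomposition (matrix sizes here are arbitrary):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

A = np.random.default_rng(0).random((1000, 500))
# only compute the top 10 components, much cheaper than a full SVD
U, s, Vt = randomized_svd(A, n_components=10, random_state=0)
print(U.shape, s.shape, Vt.shape)  # (1000, 10) (10,) (10, 500)
```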
66 |
67 | ### Full vs Reduced SVD
68 | - Full SVD will have U and V both be square matrices
69 | - that involves making up some columns for U that don't directly depend on the data A
70 | - for S matrix (singular value), also adding some rows of pure zeroes
71 | - in practice, you are usually going to be using reduced SVD, it's quicker to calculate and you are often not needing to use/ turn it into an orthonormal basis
72 |
73 | ---
74 |
75 | ## Video 4 [Sentiment Classification of Movie Reviews (using Naive Bayes, Logistic Regression, Ngrams](https://youtu.be/hp2ipC5pW4I)
76 |
77 | ### Word frequency count
78 | - in Jupyter notebook, type `?? URLs` to pull up documentation
79 | - `itos` = integer to string [is type list]
80 | - `stoi` = string to integer [is type dictionary], dict is good to search by string,
81 | - `movie_reviews.vocab.itos[230:240]` are ordered by **frequency**
82 | - `movie_reviews.vocab.stoi['language']` gives 917
83 | - if you want human-readable, use strings
84 | - if you want something the algorithm can process, use numbers
85 | - it's not a 1-to-1 mapping because several words can have the same index
86 | - we can have a lot of words mapping to "unknown", many things will map to capital letter,
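A toy sketch of how such a frequency-ordered vocab could be built (the `xxunk` unknown token is an assumption modeled on fastai's convention):

```python
from collections import Counter

texts = ['the movie was great', 'the movie was bad']
counts = Counter(w for t in texts for w in t.split())

# itos: list ordered by frequency; stoi: dict for fast lookup by string
itos = ['xxunk'] + [w for w, _ in counts.most_common()]
stoi = {w: i for i, w in enumerate(itos)}

print(itos[:4])          # ['xxunk', 'the', 'movie', 'was']
print(stoi['movie'])     # 2
```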
87 |
88 | ### Creating term document matrix
89 | - a matrix with lots of zeroes is called **sparse**
90 | - you can save a lot of memory by only storing the non-zero values
91 | - opposite of **sparse** matrices are **dense** matrices
92 |
93 | ## Sparse matrix storage formats
94 | - we know most words don't show up in most reviews
95 |
96 | ### coordinate-wise (scipy calls COO)
97 | - store 3 values: row in matrix, col in matrix and the value of that entry
98 | - instead of full matrix size (say 10x10), you only store 3 items for each entry (x_i, y_i, entry)
99 | - rows or columns need not be ordered in any way
100 |
101 | ### compressed sparse row (CSR)
102 | - stores column and entry
103 | - assigns row pointer, and only changes it when it moves to next row
104 | - list of row pointers is shorter than for coordinate-wise storage
105 | - if you are accessing data by row a lot, this makes it easier
106 | - it's not as easy to access columns, and that would require more calculations
107 |
108 | ### compressed sparse column (CSC)
109 | - similar to CSR, but uses column
110 |
111 | There are a lot of different **Sparse Matrix Compression Formats**.
112 | - Coordinate format is the most intuitive
113 |
114 | Advantage of CSR method over Coordinate-wise method:
115 | - the number of operations to perform matrix-vector multiplication is the same in both storage methods ...
116 | - However: the number of **memory accesses** is reduced (by 2 to be exact) in the CSR method
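The formats above can be inspected with scipy (a toy matrix, not a real term-document matrix):

```python
import numpy as np
from scipy import sparse

# tiny matrix with mostly zeros
dense = np.array([[0, 2, 0],
                  [1, 0, 0],
                  [0, 0, 3]])

coo = sparse.coo_matrix(dense)
# coordinate-wise: (row, col, value) triples, in no required order
print(coo.row.tolist(), coo.col.tolist(), coo.data.tolist())  # [0, 1, 2] [1, 0, 2] [2, 1, 3]

csr = coo.tocsr()
# CSR keeps columns and values, plus one row pointer per row
print(csr.indptr.tolist())  # [0, 1, 2, 3]
```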
117 |
118 |
119 | ---
120 |
121 | # Video 5: [Sentiment Classification of Movie Reviews: NB, LR, Ngrams](https://youtu.be/dt7sArnLo1g)
122 | [Notebook](https://github.com/fastai/course-nlp/blob/master/3-logreg-nb-imdb.ipynb)
123 |
124 |
125 |
--------------------------------------------------------------------------------
/courses/nlp/videos_06_to_10.md:
--------------------------------------------------------------------------------
1 | # NLP: Videos 6 to 10
2 |
3 | ## Video 6: [Derivation of Naive Bayes and Numerical Stability](https://youtu.be/z8-Tbrg1-rE)
4 | - use conda to install fastai library
5 | - how computers store numbers
6 |
7 | ### regex
8 | - use `assert` to check test cases
9 | - instead of writing `0 1 2 3 4 5 6 7 8 9` we can write `[0-9]` or `\d`
10 |
11 |
12 | ## Video 7: [Revisiting Naive Bayes, and Regex](https://youtu.be/Q1zLqfnEXdw)
13 | - revisiting Naive Bayes via spreadsheet
14 | -
15 |
16 | ## Video 8: [Intro to Language Modeling](https://youtu.be/PNNHaQUQqW8)
17 | - notebook 5
18 | -
19 |
20 | ## Video 9: [Transfer learning](https://youtu.be/5gCQvuznKn0)
21 |
22 | A **Bloom filter** is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.
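A minimal sketch of the idea (the sizes and the salted-SHA-256 hashing are arbitrary choices for illustration, not a production design):

```python
import hashlib

class BloomFilter:
    """k hash positions over an m-slot bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item):
        # derive k positions by salting the hash with i
        for i in range(self.k):
            h = hashlib.sha256(f'{i}:{item}'.encode()).digest()
            yield int.from_bytes(h[:8], 'big') % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False: definitely not in the set; True: may be in the set
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add('cat')
print('cat' in bf)  # True
print('dog' in bf)  # almost certainly False (false positives are possible, false negatives are not)
```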
23 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/README.md:
--------------------------------------------------------------------------------
1 | # Udacity: PyTorch Scholarship Challenge
2 |
3 | ## Useful Links
4 | - [Slack](https://pytorchfbchallenge.slack.com/messages/CDBRFM534/details/)
5 | - [List of Lessons](https://classroom.udacity.com/nanodegrees/nd188/parts/ef29955b-1133-473a-a46f-c0696c865f97)
6 | - [Udacity home](https://classroom.udacity.com/me)
7 | - [Udacity Program Manager page](https://sites.google.com/udacity.com/pytorch-scholarship-facebook/home?bsft_eid=b79c3be9-39ba-50c5-c5c6-a0855c187059&utm_campaign=sch_600_2018-11-09_ndxxx_pytorch-firstday_na&utm_source=blueshift&utm_medium=email&utm_content=sch_600_2018-11-09_ndxxx_pytorch-firstday_na&bsft_clkid=183339b1-e50a-4fde-b1ce-2c28e575da50&bsft_uid=806e445b-d051-4ad1-b190-fa3b2c617935&bsft_mid=3978c5d6-05bb-4c5a-8977-b6a49db0ac22)
8 | - GitHub [Udacity](https://github.com/udacity/deep-learning-v2-pytorch)
9 |
10 | ## Goals
11 |
12 |
13 | ## Lessons
14 | - [x] Setup: slack, repo
15 | - [x] Lesson 0 Welcome
16 | - [x] Lesson 1 Intro to NN (2 hrs)
17 | - [x] Lesson 2 Talking PyTorch with Soumith (30 min)
18 | - [x] Lesson 3 Intro to PyTorch (2 hrs)
19 | - [x] Lesson 4 CNN (5 hrs)
20 | - [x] Lesson 5 Style Transfer (5 hrs)
21 | - [ ] Lesson 6 RNN (5 hrs)
22 | - [ ] Lesson 7 Sentiment Prediction with RNN (2 hrs)
23 | - [ ] Lesson 8 Deploying PyTorch Models (30 min)
24 | - [ ] Lab Challenge
25 |
26 | ## Course Outline
27 | In this course, you'll learn the basics of deep neural networks and how to build various models using PyTorch. You'll get hands-on experience building state-of-the-art deep learning models.
28 |
29 | ### 1. Introduction to Neural Networks
30 | - Learn the concepts behind deep learning and how we train deep neural networks with backpropagation.
31 |
32 | ### 2. Talking PyTorch with Soumith Chintala
33 | - Cezanne Camacho and Soumith Chintala, the creator of PyTorch, chat about the past, present, and future of PyTorch.
34 |
35 | ### 3. Introduction to PyTorch
36 | - Learn how to build deep neural networks with PyTorch
37 | - Build a state-of-the-art model using a pre-trained network that classifies cat and dog images
38 |
39 | ### 4. Convolutional Neural Networks
40 | - Here you'll learn about convolutional neural networks, powerful architectures for solving computer vision problems.
41 | - Build and train an image classifier from scratch to classify dog breeds.
42 |
43 | ### 5. Style Transfer
44 | - Use a trained network to transfer the style of one image to another image
45 | - Implement the style transfer model from Gatys et al.
46 |
47 | ### 6. Recurrent Neural Networks
48 | - Learn how to use recurrent neural networks to learn from sequences of data such as time series
49 | - Build a recurrent network that learns from text and generates new text one character at a time
50 |
51 | ### 7. Sentiment Prediction with an RNN
52 | - Build and train a recurrent network that can classify the sentiment of movie reviews
53 |
54 | ### 8. Deploying PyTorch Models
55 | - Learn how to use PyTorch's Hybrid Frontend to convert models from Python to C++ for use in production
56 |
57 | ---
58 |
59 | ## References
60 | - [Convolutional Neural Networks (CNNs / ConvNets)](http://cs231n.github.io/convolutional-networks/#conv)
61 | - [Joel Grus - Livecoding Madness - Let's Build a Deep Learning Library](https://www.youtube.com/watch?v=o64FV-ez6Gw)
62 |
63 | ## Recommendations from Udacity
64 | 1. Stanford NLP class (using PyTorch): http://web.stanford.edu/class/cs224n/
65 | 2. UC-Berkeley CV class (also using PyTorch; will be published as open courseware): https://inst.eecs.berkeley.edu/~cs280/sp18/
66 | 3. Colab now supports native PyTorch; give it a try by importing torch, torchvision, etc. and then changing to the GPU backend: https://colab.research.google.com/notebooks/welcome.ipynb#recent=true
67 | 4. EE-559 at EPFL (including slides, code, etc.): https://fleuret.org/ee559/
68 | 5. Ecosystem projects: https://pytorch.org/ecosystem
69 | 6. Newer tutorial by Jeremy Howard and Rachel Thomas - https://pytorch.org/tutorials/beginner/nn_tutorial.html
70 |
71 | *A few folks they follow who are awesome:*
72 | - Chris Manning (Stanford)
73 | - Alyosha Efros (UC-Berkeley)
74 | - Yann LeCun (FB and NYU)
75 | - Smerity (Stephen Merity) https://twitter.com/Smerity?lang=en
76 | - Andrej Karpathy (Tesla)
77 | - Bryan Catanzaro https://twitter.com/ctnzr?lang=en
78 | - Delip Rao https://twitter.com/deliprao?lang=en
79 | - Lisha Li https://twitter.com/lishali88?lang=en
80 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/images/.keep:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/images/cnn_formulas.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/udacity_pytorch/images/cnn_formulas.png
--------------------------------------------------------------------------------
/courses/udacity_pytorch/notes.md:
--------------------------------------------------------------------------------
1 | # PyTorch
2 |
3 | IR = Intermediate Representation
4 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/orientation.md:
--------------------------------------------------------------------------------
1 | # Orientation
2 |
3 | ## Info
4 | - 12-Nov-2018
5 | - YouTube: https://www.youtube.com/watch?v=bHpZfvVQI3g
6 |
7 | ## Contact
8 | - DM in Slack to give feedback (or via AMA)
9 |
10 | ## to do
11 | - download the Udacity app
12 | - Slack: quality over quantity
13 | - step up, leadership
14 |
15 | ## Tips
16 | - add calendar to personal calendar
17 | - if you miss an AMA or slack, it is all archived on PyTorch challenge site.
18 | - attitude of finishing way ahead of schedule, try to get it done in 2-4 weeks
19 | - put in whatever extra time you have available
20 | - stay a part of the community all the way to the end
21 |
22 | ## Phase 1 Resources & Programs
23 | - check out bios of Alumni Volunteers
24 | - team of 3 on staff, but with 10,000 students
25 | - post questions on Slack
26 | - Study groups, regional, community tab on PyTorch challenge site
27 | - .inTandem
28 | - role of: mentor, student, study buddy
29 | - Social Media tweets, scholarship pride, daily hashtags
30 |
31 | ## Questions
32 | - how do we squash a number (say 0.7 + 0.8 = 1.5) into a value between 0 and 1?
33 | - Use the **sigmoid** function
34 | - sigmoid(1.5) ≈ 0.82
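The sigmoid answer above can be checked with a few lines of plain Python, using the standard definition sigmoid(x) = 1 / (1 + e^-x):

```python
import math

def sigmoid(x):
    # squashes any real number into the (0, 1) range
    return 1 / (1 + math.exp(-x))

print(round(sigmoid(0.7 + 0.8), 2))  # → 0.82
```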
35 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/pytorch_1_nanodegree.md:
--------------------------------------------------------------------------------
1 | # Lesson 1: Welcome
2 |
3 | ## Phase 2: Deep Learning Nanodegree Scholarship Opportunity
4 | You have the chance to qualify for a follow-up scholarship for a Nanodegree program. These scholarships will be awarded based on
5 | - your progress and performance in the initial 2 month challenge course
6 | - as well as your **contributions to the student community.**
7 |
8 | So, be sure to:
9 | - cover all the concepts in the course
10 | - complete all exercises along the way
11 | - and help your fellow students by answering their questions in the forums or in Slack
12 |
13 | ## Participation
14 |
15 | - We've created a Slack Workspace especially for this program so that you have the opportunity to interact with one another in a shared community. We encourage you to use Slack:
16 | - to ask questions and
17 | - receive technical help from your classmates and alumni volunteers
18 | - participate in events and **attend AMA (Ask Me Anything)** sessions with the Scholarship Team.
19 |
20 | ## Developing an AI Application
21 | After training and optimizing your model, you'll upload the saved network to one of our workspaces. Your model will receive a score based on its accuracy predicting flower species from a test set. This **score will be used in our decision process for awarding scholarships.**
22 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/pytorch_3.md:
--------------------------------------------------------------------------------
1 | # PyTorch
2 |
3 | - `weights.reshape(a, b)` returns a new tensor with the same data as `weights` and shape `(a, b)`; sometimes it is a view sharing memory with `weights`, and sometimes a clone that copies the data to another part of memory
4 | - `weights.resize_` underscore at end means this method is an in-place operation
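A small sketch of the difference (assuming PyTorch is installed; the tensor values are illustrative):

```python
import torch

weights = torch.arange(6.)       # tensor([0., 1., 2., 3., 4., 5.])

# reshape returns a (2, 3) tensor; it may share memory with `weights`
# or be a copy, depending on the tensor's memory layout
w2 = weights.reshape(2, 3)

# the trailing underscore marks an in-place operation: weights itself changes
weights.resize_(2, 3)
print(weights.shape)  # torch.Size([2, 3])
```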
5 |
6 | ## Universal Function Approximator
7 |
8 |
9 | ## Loss Function (Cost Function)
10 | - it is a measure of our prediction error
11 | - the whole goal is to adjust our network parameters to minimize our loss
12 | - we do this by using a process called **gradient descent**
13 |
14 | ## Gradient
15 | - the gradient is the slope of the loss function with respect to our parameters
16 | - the gradient always points in the direction of the *fastest change*
17 | - so, if we have a mountain, the gradient is always going to point up the mountain
18 | - so, you can imagine our loss function being like this mountain where we have a high loss up here, and we have a low loss down here
19 | - so, we know that we want to get to the minimum of our loss when we minimize our loss, and so we want to go downwards
20 | - and so, basically, the gradient points upwards and so, we just go the opposite direction. So, we go in the direction of the negative gradient
21 | - and then, if we keep following this down, then eventually we get to the bottom of this mountain, the **lowest loss**
22 | - with multi-layered neural networks, we use an algorithm called backpropagation to do this
23 |
24 | ## Backpropagation
25 | - backprop is really just an application of the chain rule of calculus
26 | - So, if you think about it, when we pass in some data, some input into our network, it goes through this forward pass through the network to calculate our loss
27 | - So, we pass in some data, some feature input x
28 | - and then it goes through this linear transformation which depends on our weights and biases.
29 | - And then through some activation function like sigmoid
30 | - through another linear transformation with some more weights and biases
31 | - and then that goes in [last layer], and from that we calculate our loss
32 | - So, if we make a small change in our weights (say in the first layer), it's going to propagate through the network and end up, like results in, a small change in our loss.
33 | - So you can kind of think of this as a chain of changes
34 | - So, with backprop, we actually use these same changes, but we go in the opposite direction
35 | - So, for each of these operations like the loss and the linear transformation (L2), and the sigmoid activation function, there's always going to be some derivative, some gradient between the outputs and the inputs
36 | - And so what we do, is we take each of the gradients for these operations and we pass them backwards through the network.
37 | - At each step, we multiply the incoming gradient with the gradient of the operation itself.
38 | - So, for example, just kind of starting at the end with the loss
39 | - so we pass this gradient through the loss, dl/dL2, so this is the gradient of the loss with respect to the second linear transformation
40 | - and then we pass that backwards again and multiply it by the gradient of this L2 (the linear transformation with respect to the outputs of our activation function); that gives us the gradient for this operation
41 | - And if you multiply this gradient by the gradient coming from the loss, then we get the total gradient for both of these parts
42 | - and this gradient can be passed back to this softmax function
43 | - So, as the general process for backpropagation, we take our gradients, we pass it backwards to the previous operation, multiply it by the gradient there, and then pass that total gradient backwards.
44 | - So, we just keep doing that through each of the operations in our network.
45 |
46 | ## Losses in PyTorch
47 | - PyTorch provides many losses, including the cross-entropy loss
48 | - `criterion = nn.CrossEntropyLoss()`
49 | - Cross-entropy loss is used in classification problems
50 | - So, if we want to use cross-entropy, we just say `criterion = nn.CrossEntropyLoss()` to create an instance of that class
51 | - So, one thing to note, if you look at the documentation for cross-entropy loss, you'll see that it actually wants the scores, like the logits, of our network, as the input to the cross-entropy loss.
52 | - So, you might expect to use this with an output layer such as softmax, which gives us a nice probability distribution. But, for computational reasons, it's generally better to use the logits, which are the inputs to the softmax, as the input to this loss.
53 | - So, the input is expected to be the scores for each class, and not the probabilities themselves.
54 | - So, first I am going to import the necessary modules.
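A minimal sketch of the logits-in, loss-out pattern (the batch size and class count here are just for illustration):

```python
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()    # instantiate the class with parentheses

logits = torch.randn(4, 10)          # raw scores for 4 samples, 10 classes
labels = torch.randint(0, 10, (4,))  # target class indices, not probabilities

# CrossEntropyLoss applies log-softmax internally, so we pass logits directly
loss = criterion(logits, labels)
print(loss.item())
```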
55 |
56 | ## Metrics
57 | - Accuracy
58 | - Precision
59 | - Recall
60 | - Top-5 Error Rate
61 | - `ps.topk(1)` returns the highest value (probability) for a class, along with that class's index
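A quick sketch of `topk` on a tiny made-up probability tensor:

```python
import torch

ps = torch.tensor([[0.1, 0.7, 0.2]])   # class probabilities for one sample
top_p, top_class = ps.topk(1, dim=1)   # returns (values, indices)
print(top_p)       # tensor([[0.7000]])
print(top_class)   # tensor([[1]])
```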
62 |
63 | ## Transfer Learning
64 | - Using a pre-trained network on images not in the training set is called transfer learning.
65 | - Most of the pretrained models require the input to be 224x224 images.
66 |
67 |
68 |
69 |
70 |
71 |
72 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/pytorch_4_cnn.md:
--------------------------------------------------------------------------------
1 | # Convolutional Neural Networks
2 |
3 |
4 | ## Normalization
5 | - helps our networks train better
6 | - for the MNIST data, we divide each pixel value by 255. Our normalized range will be from 0 to 1
7 | - because neural networks rely on gradient calculations, normalizing the pixels helps the gradient calculations stay consistent and not get so large that they slow down or prevent a network from training
8 |
9 |
10 | ## Normalizing image inputs
11 | Data normalization is an important pre-processing step. It ensures that each input (each pixel value, in this case) comes from a standard distribution; that is, the range of pixel values in one input image is the same as the range in another image. This standardization helps our model train and reach a minimum error faster!
12 |
13 | Data normalization is typically done by subtracting the mean (the average of all pixel values) from each pixel, and then dividing the result by the standard deviation of all the pixel values. Sometimes you'll see an approximation here, where we use a mean and standard deviation of 0.5 to center the pixel values. Read more about the [Normalize transformation in PyTorch](https://pytorch.org/docs/stable/torchvision/transforms.html#transforms-on-torch-tensor).
14 |
15 | The distribution of such data should resemble a [Gaussian function](http://mathworld.wolfram.com/GaussianFunction.html) centered at zero. For image inputs we need the pixel numbers to be positive, so we often choose to scale the data in a normalized range [0,1].
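A sketch of the 0.5 approximation with `transforms.Normalize` (the image tensor here is random, standing in for a real `ToTensor` output in [0, 1]):

```python
import torch
from torchvision import transforms

# the common 0.5 approximation: maps pixel values from [0, 1] to [-1, 1]
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

img = torch.rand(3, 28, 28)   # fake image tensor with values in [0, 1]
out = normalize(img)          # computes (x - mean) / std per channel
print(out.min().item(), out.max().item())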
16 |
17 | ## MLP = Multi-layer perceptron
18 |
19 | ### Validation Set: Takeaways
20 | We create a validation set to:
21 | - Measure how well a model generalizes, during training
22 | - Tell us when to stop training a model; when the validation loss stops decreasing (and especially when the validation loss starts increasing and the training loss is still decreasing)
23 |
24 | ## MLP vs CNN
25 | ### MLP
26 | - use only fully connected layers
27 | - only accept vectors as input
28 |
29 | ### CNN
30 | - also use sparsely connected layers
31 | - also accept matrices as input
32 |
33 | ### openCV Library
34 | OpenCV is a computer vision and machine learning software library that includes many common image analysis algorithms that will help us build custom, intelligent computer vision applications. To start with, this includes tools that help us process images and select areas of interest! The library is widely used in academic and industrial applications; from their site, OpenCV includes an impressive list of users:
35 | > “Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda, Toyota that employ the library, there are many startups such as Applied Minds, VideoSurf, and Zeitera, that make extensive use of OpenCV.”
36 |
37 | So, note how we `import cv2` in the next notebook and use it to create and apply image filters!
38 |
39 | ## Define a Convolutional Layer in PyTorch
40 | ```python
41 | self.conv1 = nn.Conv2d(depth_of_input, desired_depth_of_output,
42 | kernel_size, stride = 1, padding = 0)
43 | ```
44 |
45 | - 3 channels of input: R, G, B
46 | - we may want to produce 16 images (or "filters")
47 | - kernel size: 3x3 filter
48 | - stride generally set to 1 (often the default value)
49 | - padding, set it so convolutional layer will have same height and width as previous layer
50 | ```python
51 | self.conv1 = nn.Conv2d(3, 16, 3, stride = 1, padding = 0)
52 | ```
53 |
54 | ### Max Pooling
55 | - max pooling follows every 1 or 2 convolutional layers in the sequence
56 | - To define a max pooling layer, you only need to define the filter size and stride.
57 | ```python
58 | self.maxpool = nn.MaxPool2d(kernel_size, stride)
59 | ```
60 | - most common settings:
61 | ```python
62 | self.maxpool = nn.MaxPool2d(2, 2)
63 | ```
64 |
65 | ### Q
66 | - If you want to define a convolutional layer that is the same x-y size as an input array, what padding should you have for a kernel_size of 7? (You may assume that other parameters are left as their default values.)
67 | - padding=3
68 | - Yes! If you overlay a 7x7 kernel so that its center-pixel is at the right-edge of an image, you will have 3 kernel columns that do not overlay anything! So, that's how big your padding needs to be.
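The answer can be checked against the usual output-size formula, output = (W - K + 2P) / S + 1; the 28x28 input size below is just for illustration:

```python
import torch
from torch import nn

# with kernel_size=7, stride=1, padding=3: (28 - 7 + 2*3) / 1 + 1 = 28
conv = nn.Conv2d(1, 16, kernel_size=7, stride=1, padding=3)
x = torch.randn(1, 1, 28, 28)
print(conv(x).shape)  # torch.Size([1, 16, 28, 28])
```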
69 |
70 | ## Convolutional Layers
71 | - We typically define a convolutional layer in PyTorch using nn.Conv2d, with the following parameters, specified:
72 | - `nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)`
73 | - `in_channels` refers to the depth of an input. For a grayscale image, this depth = 1
74 | - `out_channels` refers to the desired depth of the output, or the number of filtered images you want to get as output
75 | - `kernel_size` is the size of your convolutional kernel (most commonly 3 for a 3x3 kernel)
76 | - `stride` and `padding` have default values, but should be set depending on how large you want your output to be in the spatial dimensions x, y
77 | Read more about Conv2d in the documentation.
78 |
79 | ## Pooling Layers
80 | - Maxpooling layers commonly come after convolutional layers to shrink the x-y dimensions of an input; read more about pooling layers in the PyTorch documentation.
81 |
82 | ## forward
83 | Here, we see the pooling layer being applied in the forward function.
84 | ```python
85 | x = F.relu(self.conv1(x))
86 | x = self.pool(x)
87 | ```
88 |
89 | 
90 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/pytorch_5_style_transfer.md:
--------------------------------------------------------------------------------
1 | # Style Transfer
2 |
3 | ## Gram Matrix
4 | - non-localized information is information that would still be there even if the image was shuffled around in space
5 | - style: prominent colors and textures of an image
6 | - gram matrix: a matrix whose values indicate the similarities between the feature maps in a layer
7 | - dimensions don't depend on the input image
8 | - just one mathematical way of representing shared or prominent styles
9 | - style itself is kind of an abstract idea, but the gram matrix is the most widely used representation in practice
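A common way to compute the gram matrix, matching the Gatys et al. formulation (the feature-map sizes below are illustrative):

```python
import torch

def gram_matrix(tensor):
    # flatten each of the d feature maps, then take correlations between them
    _, d, h, w = tensor.size()
    features = tensor.view(d, h * w)
    return torch.mm(features, features.t())   # (d, d): independent of h, w

fmap = torch.randn(1, 8, 16, 16)    # one image, 8 feature maps of 16x16
gram = gram_matrix(fmap)
print(gram.shape)  # torch.Size([8, 8])
```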
10 |
11 | ## Style Loss
12 | - the smaller the alpha/beta ratio, the more stylistic effect you will see.
13 | - alpha: content weight
14 | - beta: style weight
15 |
16 | ## VGG Features
17 |
18 | ## Lesson 8 Notebook (Exercise)
19 | - https://github.com/udacity/deep-learning-v2-pytorch/blob/master/style-transfer/Style_Transfer_Solution.ipynb
20 |
21 |
--------------------------------------------------------------------------------
/courses/udacity_pytorch/pytorch_6_rnn.md:
--------------------------------------------------------------------------------
1 | # RNN
2 |
3 | ## recurrent neural networks (RNNs)
4 | RNNs are designed specifically to learn from sequences of data by passing the hidden state from one step in the sequence to the next step in the sequence, combined with the input.
5 |
6 | ## long short-term memory (LSTM)
7 | LSTMs are an improvement on RNNs, and are quite useful when our neural network needs to switch between remembering recent things and things from a long time ago.
8 |
9 |
10 | But first, I want to give you some great references to study this further. There are many posts out there about LSTMs, here are a few of my favorites:
11 |
12 | - Chris Olah's LSTM post
13 | - Edwin Chen's LSTM post
14 | - Andrej Karpathy's blog post on RNNs
15 | - Andrej Karpathy's lecture on RNNs and LSTMs from CS231n
16 |
17 | ## Recurrent Layers
18 | Here is the documentation for the main types of recurrent layers in PyTorch. Take a look and read about the three main types: RNN, LSTM, and GRU.
19 |
20 | - The hidden state should have dimensions: (num_layers, batch_size, hidden_dim).
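Those hidden-state dimensions can be checked with a small sketch (layer sizes chosen for illustration):

```python
import torch
from torch import nn

n_layers, batch_size, seq_len, input_dim, hidden_dim = 2, 4, 5, 10, 32

lstm = nn.LSTM(input_dim, hidden_dim, n_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, input_dim)
out, (h, c) = lstm(x)

# hidden and cell states are (num_layers, batch_size, hidden_dim),
# even with batch_first=True
print(h.shape)  # torch.Size([2, 4, 32])
print(c.shape)  # torch.Size([2, 4, 32])
```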
21 |
22 | ---
23 |
24 | ### `__init__` explanation
25 | First I have an **embedding layer**, which should take in the size of our vocabulary (our number of integer tokens) and produce an embedding of `embedding_dim` size. So, as this model trains, this is going to create an embedding lookup table that has as many rows as we have word integers, and as many columns as the embedding dimension.
26 |
27 | Then, I have an **LSTM layer**, which takes in inputs of `embedding_dim` size. So, it's accepting embeddings as inputs, and producing an output and hidden state of a hidden size. I am also specifying a number of layers, and a dropout value, and finally, I’m setting `batch_first` to True because we are using DataLoaders to batch our data like that!
28 |
29 | Then, the LSTM outputs are passed to a dropout layer and then a fully-connected, linear layer that will produce `output_size` number of outputs. And finally, I’ve defined a sigmoid layer to convert the output to a value between 0-1.
30 |
31 | ### Feedforward behavior
32 | Moving on to the `forward` function, which takes in an input `x` and a `hidden state`, I am going to pass an input through these layers in sequence.
33 |
34 | ```python
35 | def forward(self, x, hidden):
36 | """
37 | Perform a forward pass of our model on some input and hidden state.
38 | """
39 | batch_size = x.size(0)
40 |
41 | # embeddings and lstm_out
42 | embeds = self.embedding(x)
43 | lstm_out, hidden = self.lstm(embeds, hidden)
44 |
45 | # stack up lstm outputs
46 | lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
47 |
48 | # dropout and fully-connected layer
49 | out = self.dropout(lstm_out)
50 | out = self.fc(out)
51 |
52 | # sigmoid function
53 | sig_out = self.sig(out)
54 |
55 | # reshape to be batch_size first
56 | sig_out = sig_out.view(batch_size, -1)
57 | sig_out = sig_out[:, -1] # get last batch of labels
58 |
59 | # return last sigmoid output and hidden state
60 | return sig_out, hidden
61 | ```
62 |
63 | ### `forward` explanation
64 | So, first, I'm getting the batch_size of my input x, which I’ll use for shaping my data. Then, I'm passing x through the embedding layer first, to get my embeddings as output
65 |
66 | These embeddings are passed to my lstm layer, alongside a hidden state, and this returns an lstm_output and a new hidden state! Then I'm going to stack up the outputs of my LSTM to pass to my last linear layer.
67 |
68 | Then I keep going, passing the reshaped lstm_output to a dropout layer and my linear layer, which should return a specified number of outputs that I will pass to my sigmoid activation function.
69 |
70 | Now, I want to make sure that I'm returning only the last of these sigmoid outputs for a batch of input data, so I'm going to shape these outputs into a shape that is batch_size first. Then I'm getting the last batch by calling `sig_out[:, -1]`, and that's going to give me the batch of last labels that I want!
71 |
72 | Finally, I am returning that output and the hidden state produced by the LSTM layer.
73 |
74 | ### `init_hidden`
75 | That completes my forward function, and then I have one more: `init_hidden`, which is just the same as you've seen before. The hidden and cell states of an LSTM are a tuple of values, and each of these has size (n_layers, batch_size, hidden_dim). I'm initializing these hidden weights to all zeros, and moving them to a GPU if available.
76 |
77 | ```python
78 | def init_hidden(self, batch_size):
79 | ''' Initializes hidden state '''
80 | # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
81 | # initialized to zero, for hidden state and cell state of LSTM
82 | weight = next(self.parameters()).data
83 |
84 | if (train_on_gpu):
85 | hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
86 | weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
87 | else:
88 | hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
89 | weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
90 |
91 | return hidden
92 | ```
93 |
94 |
95 |
--------------------------------------------------------------------------------
/courses/v2-dl1/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v2-dl1/.DS_Store
--------------------------------------------------------------------------------
/courses/v2-dl1/README.md:
--------------------------------------------------------------------------------
1 | # Deep Learning Homework
2 |
3 | These are the recommended tasks:
4 | - go through the notebooks presented in class, read the extra text that is in the notebook and try changing the hyperparameters to better understand them
5 | - choose a dataset of your own and replicate the notebook
6 | - get started on the Kaggle competitions
7 | - read [blogs](https://github.com/reshamas/fastai_deeplearn_part1/blob/master/resources.md)
8 | - you are expected to watch the videos 2 to 4 times (from Lesson 1 to Lesson 7) to grasp the concepts more fully
9 |
--------------------------------------------------------------------------------
/courses/v2-dl1/lesson_6_x.md:
--------------------------------------------------------------------------------
1 | # Lesson 6
2 |
3 |
4 | [Wiki: Lesson 6](http://forums.fast.ai/t/wiki-lesson-6/8629)
5 |
6 | Notebooks:
7 | * [lesson5-movielens.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb)
8 | * [lesson6-rnn.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-rnn.ipynb)
9 | * [lesson3-rossman.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson3-rossman.ipynb)
10 | * [lesson6-sgd.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-sgd.ipynb)
11 |
12 | ## Blogs to Review
13 |
14 | * [Optimization for Deep Learning Highlights in 2017](http://ruder.io/deep-learning-optimization-2017/index.html) by Sebastian Ruder (researcher, not USF student)
15 | - this blog covers SGD, ADAM, weight decays :red_circle: (read it!)
16 |
17 | * [Deep Learning #4: Why You Need to Start Using Embedding Layers](https://towardsdatascience.com/deep-learning-4-embedding-layers-f9a02d55ac12)
18 |
19 |
20 | ## Papers to Review
21 | * [Entity Embeddings of Categorical Variables](https://www.slideshare.net/sermakarevich/entity-embeddings-of-categorical-variables)
22 |
23 | ## Summary of Course so Far
24 | - our penultimate lesson
25 |
26 |
27 | ### Dimensions
28 | - we can compress high dimensional spaces to a few dimensions, using PCA (Principal Component Analysis)
29 | - PCA is a linear technique
30 | - Rachel's computational linear algebra covers PCA
31 | - PCA similar to SVD (singular value decomposition)
32 | - find 3 linear combinations of the 50 dimensions which capture as much of the variation as possible, but different from each other
33 | ```python
34 | from sklearn.decomposition import PCA
35 | pca = PCA(n_components=3)
36 | movie_pca = pca.fit(movie_emb.T).components_
37 | ```
38 |
39 | ### MAPE (Mean Average Percent Error)
40 | - can give folks at work random forest with embeddings without using neural networks
41 | - you can train a neural net with embeddings; everyone else in organization can chuck that into GBM or random forests or KNN
42 | - can give power of neural nets to everyone in organization without everyone having to do fastai table
43 | - embedding can be in SQL table
44 | - GBM and random forests learn a lot quicker than neural nets do
45 | - visualizing embeddings can be interesting
46 | - first, see things you expect to see
47 | - then, try seeing things that were not expected (some clusterings)
48 | - Q: skipgrams, a type of embedding?
49 | - A: skipgrams for NLP
50 | - say we have an unlabeled dataset, such as Google Books
51 | - the best way, in my opinion, to turn an unlabeled (or unsupervised) problem into a labeled problem is to invent some labels
52 | - what they did in Word2vec is: here's a sentence with 11 words in it: _ _ _ _ _ _ _ _ _ _ _
53 | - let's delete the middle word and replace it with a random word
54 | - example: replace "cat" with "justice"
55 | - sentence: the cute little **CAT** sat on the fuzzy mat ---> **assign label = 1**
56 | - sentence: the cute little **JUSTICE** sat on the fuzzy mat ---> **assign label = 0**
57 | - ! now we have something we can build a machine learning model on
58 | - quick, shallow learning, end up with embeddings with linear characteristics
59 |
60 | ## NLP
61 | - for something more predictive, use neural net
62 | - we need to move past Word2Vec and GLoVe, these linear based methods; these embeddings are way less predictive than with embeddings learned with deep models
63 | - nowadays, **unsupervised learning** is really **fake task labeled learning**
64 | - we need something where the type of relationships it's going to learn are the types we care about.
65 |
66 | ## Fake Tasks
67 | - in computer vision, let's take an image and apply an unusual data augmentation, such as recoloring it too much, and ask the neural net to distinguish augmented from non-augmented images
68 | - use the best fake task you can
69 | - a bad "fake task" is an **auto-encoder**; reconstruct my input using neural net with some activations deleted; most uncreative task, but it works surprisingly well
70 | - we may cover this unsupervised learning in Part 2, if there is interest
71 |
72 | `41:00` back to Rossman notebook:
73 | - https://github.com/fastai/fastai/blob/master/courses/dl1/lesson3-rossman.ipynb
74 | - lot of details of this notebook are covered in the ML course
75 |
76 | #### Shallow Learning vs Deep Learning
77 | - shallow learning means it doesn't have a hidden layer
78 |
79 | ## Recurrent Neural Networks
80 | https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-sgd.ipynb
81 |
82 |
83 | * [lesson6-rnn.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-rnn.ipynb)
84 |
85 |
86 | - Machine Learning course - building stuff up from the foundations
87 | - Deep Learning course - best practices, top down
88 | - Lessons 9, 10, 11 of ML course: create a neural net layer from scratch
89 |
90 |
91 |
--------------------------------------------------------------------------------
/courses/v2-dl1/lesson_7_x.md:
--------------------------------------------------------------------------------
1 | # Lesson 7
2 | live 11-Dec-2017
3 |
4 |
5 | [Wiki: Lesson 7](http://forums.fast.ai/t/lesson-7-wiki-thread/8847/1)
6 |
7 | Notebooks:
8 | * [lesson6-rnn.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-rnn.ipynb)
9 | * [lesson7-cifar10.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson7-cifar10.ipynb)
10 | * [lesson7-CAM.ipynb](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson7-CAM.ipynb)
11 |
12 | ---
13 | ## Other links
14 | - WILD ML RNN Tutorial - http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
15 | - Chris Olah on LSTM http://colah.github.io/posts/2015-08-Understanding-LSTMs/
16 | - More from Olah and others - https://distill.pub/
17 | - [BatchNorm paper](https://arxiv.org/pdf/1502.03167.pdf)
18 | - [Laptop recommendation](https://youtu.be/EKzSiuqiHNg?t=1h1m51s); [Surface Book 2 15 inch](https://www.cnet.com/products/microsoft-surface-book-2/review/)
19 |
20 |
21 | ## Theme of Part 1
22 | - classification and regression with deep learning
23 | - identifying best practices
24 | - here are 3 lines of code for image classification
25 | - first 4 lessons were NLP, structured data, collaborative filtering
26 | - last 3 lessons were above topics in more detail, more detailed code
27 |
28 | ## Theme of Part 2
29 | - generative modeling
30 | - creating a sentence, image captioning, neural translation
31 | - creating an image, style transfer
32 | - moving from best practices to speculative practices
33 | - how to read a paper and implement from scratch
34 | - does not assume a particular math background, but be prepared to dig through notation and convert to code
35 |
36 | ## RNN
37 | - not so different
38 | - they are like a fully connected network
39 |
40 | ## Batch Size
41 | `bs=64` means the data is split into 64 chunks of data.
42 | NOT batches of size 64!
43 |
44 | ## Data Augmentation for NLP
45 | - JH can't talk about that; doesn't know a good way
46 | - JH will do further study on that
47 |
48 | ## CIFAR 10
49 | - well-known dataset in academia: https://www.cs.toronto.edu/~kriz/cifar.html
50 | - small datasets are much more interesting than ImageNet
51 | - often, we're looking at 32x32 pixels (example: lung cancer image)
52 | - often, it's more challenging, and more interesting
53 | - we can run algorithms much more quickly, and it's still challenging
54 | - you can get the data by: `wget http://pjreddie.com/media/files/cifar.tgz` (provided in form we need)
55 | - this is mean, SD per channel; try to replicate on your own
56 | ```python
57 | classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
58 | stats = (np.array([ 0.4914 , 0.48216, 0.44653]), np.array([ 0.24703, 0.24349, 0.26159]))
59 | ```
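One way to replicate those per-channel stats (a sketch with NumPy; it assumes the images are loaded as an array of shape `(n, h, w, 3)` scaled to 0..1, and the random array here is just a stand-in for the real CIFAR images):

```python
import numpy as np

# stand-in batch of images, shape (n, height, width, channels)
imgs = np.random.rand(100, 32, 32, 3)

# mean and SD per channel, computed over all images and all pixels
mean = imgs.mean(axis=(0, 1, 2))
std = imgs.std(axis=(0, 1, 2))
stats = (mean, std)  # same structure as the (mean, SD) tuple above
```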
60 | - Kerem's notebook on how different optimizers work: https://github.com/KeremTurgutlu/deeplearning/blob/master/Exploring%20Optimizers.ipynb
61 | - to improve model, we'll next replace our fully connected model (with 1 hidden layer) with a CNN
62 | - `nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2)`
63 | - `layers[i]` number of features coming in
64 | - `layers[i + 1]` number of features coming out
65 | - `stride=2` is a "stride 2 convolution"
66 | - it has similar effect to `maxpooling`; reduces the size of the layers
67 | - `self.pool = nn.AdaptiveMaxPool2d(1)`
68 | - standard now for state-of-the-art algorithms
69 | - I'm not going to tell you how big an area to pool, I will tell you how big a resolution to create
70 | - starting with 28x28: Do a 14x14 adaptive maxpool; same as 2x2 maxpool with a 14x14 output
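A minimal sketch of the idea (plain PyTorch; the channel counts and `padding=1` are illustrative choices, not the course notebook): stacked stride-2 convolutions halve the grid each time, and `AdaptiveMaxPool2d(1)` reduces whatever grid is left down to 1x1.

```python
import torch
import torch.nn as nn

layers = [3, 20, 40, 80]  # illustrative channel counts
convs = nn.Sequential(*[
    nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2, padding=1)
    for i in range(len(layers) - 1)
])
pool = nn.AdaptiveMaxPool2d(1)  # specify the output resolution, not the pool area

x = torch.randn(2, 3, 32, 32)  # batch of 2 CIFAR-sized images
out = pool(convs(x))           # 32 -> 16 -> 8 -> 4, then pooled to 1x1
print(out.shape)               # torch.Size([2, 80, 1, 1])
```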
71 |
72 | ## BatchNorm (Batch Normalization)
73 | - a couple of years old now
74 | - makes it easier to train deeper networks
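A hedged sketch of where BatchNorm typically sits (generic PyTorch, not the fastai source): after a conv layer, normalizing each channel over the batch before the nonlinearity.

```python
import torch
import torch.nn as nn

# conv -> batchnorm -> relu: a common ordering; channel counts are illustrative
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(16),  # normalizes each of the 16 channels over the batch
    nn.ReLU(),
)

x = torch.randn(8, 3, 32, 32)
out = block(x)
print(out.shape)  # torch.Size([8, 16, 16, 16])
```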
75 |
76 |
77 | ## Getting Ready for Part 2
78 | - assumes you have mastered all techniques introduced in Part 1
79 | - has same level of intensity as Part 1
80 | - people who did well in Part 2 last year watched each of the videos at least 3 times
81 | - make sure you get to the point where you can recreate the notebooks without watching the videos
82 | - try and recreate the notebooks using different datasets
83 | - keep up with the forum; recent papers, advances
84 | - you'll find less of it is mysterious; makes more sense; there will always be stuff you don't understand
85 | - Lessons 1 and 2 of Part 1 may seem trivial
86 | - people who succeed are those who keep working at it
87 | - hope to see you all in March
88 | - see you in the Forum
89 |
--------------------------------------------------------------------------------
/courses/v2-dl2/README.md:
--------------------------------------------------------------------------------
1 | Deep Learning - Part 2
2 |
--------------------------------------------------------------------------------
/courses/v3-dl1/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/.DS_Store
--------------------------------------------------------------------------------
/courses/v3-dl1/.keep:
--------------------------------------------------------------------------------
1 | xxx
2 |
--------------------------------------------------------------------------------
/courses/v3-dl1/README.md:
--------------------------------------------------------------------------------
1 |
2 | ## Course Videos
3 | https://course.fast.ai/videos
4 |
5 |
6 | ## Get to Jupyter Notebook
7 | - Go to localhost (run Jupyter Notebook):
8 | http://localhost:8080/tree
9 |
10 | ## Important Links
11 | - [Google Cloud Platform](http://course-v3.fast.ai/start_gcp.html)
12 | - [GCP: update fastai, conda & packages](http://course-v3.fast.ai/start_gcp.html#step-4-access-fastai-materials-and-update-packages)
13 |
14 |
15 | [PyTorch Forums](https://discuss.pytorch.org)
16 |
17 | ## Model Tuning Advice
18 |
19 | If the learning rate finder shows no graph, the learning rate is too small.
20 |
21 | ### Seed for validation dataset
22 | ```python
23 | np.random.seed(42)
24 | data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4)
25 | ```
26 | This means that every time I run this code, I will get the same validation set.
27 |
28 | ### If errors are too high
29 | #### example of problem
30 | ```bash
31 | Total time: 00:13
32 | epoch train_loss valid_loss error_rate
33 | 1 12.220007 1144188288.000000 0.765957 (00:13)
34 | ```
35 |
36 | #### example of solution
37 | ```python
38 | #learn.fit_one_cycle(6, max_lr=0.5)
39 | #learn.fit_one_cycle(6, max_lr=0.25)
40 | #learn.fit_one_cycle(6, max_lr=0.05)
41 | #learn.fit_one_cycle(6, max_lr=0.025)
42 | #learn.fit_one_cycle(6, max_lr=0.01)
43 | learn.fit_one_cycle(6, max_lr=0.001)
44 | ```
45 |
46 | ### LR finder plot is blank
47 | #### 1.
48 | ```python
49 | learn.recorder.plot()
50 | # if plot is blank
51 | learn.recorder.plot(skip_start=0, skip_end=0)
52 | ```
53 |
54 | #### 2. reduce batch size
55 | - Reduce your batch size in order to increase the number of batches.
56 | ```python
57 | np.random.seed(42)
58 | data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4, bs=16)
59 | ```
60 |
61 | You’re now overfitting. Try 10 epochs, then unfreeze, then 4 epochs.
62 |
63 |
--------------------------------------------------------------------------------
/courses/v3-dl1/gcp_0_setup_notes.md:
--------------------------------------------------------------------------------
1 | # GCP Setup notes
2 |
3 | ## GCP
4 | - https://cloud.google.com
5 | - [Platform: GCP](https://forums.fast.ai/t/platform-gcp/27375) (Discourse topic)
6 | - [Tutorial](http://course-v3.fast.ai/start_gcp.html) to get started.
7 | - [Complete Guide](https://arunoda.me/blog/ideal-way-to-creare-a-fastai-node) - starting with $0.2/hour
8 |
9 |
10 | ## GCP (Google Cloud Platform)
11 | - fastai instructions for GCP:
12 | - http://course-v3.fast.ai/start_gcp.html
13 | - Console:
14 | - https://console.cloud.google.com/compute/instances?project=near-earth-comets-f8c3f&folder&organizationId
15 |
16 |
17 | ## Instance
18 | ```bash
19 | gcloud --version
20 | ```
21 | output:
22 | ```bash
23 | Google Cloud SDK 222.0.0
24 | bq 2.0.36
25 | core 2018.10.19
26 | gsutil 4.34
27 | ```
28 |
29 | ## Create an Instance on GCP
30 | ```bash
31 | % export IMAGE_FAMILY="pytorch-1-0-cu92-experimental"
32 | % export ZONE="us-west2-b"
33 | % export INSTANCE_NAME="my-fastai-instance"
34 | % export INSTANCE_TYPE="n1-highmem-8"
35 | % gcloud compute instances create $INSTANCE_NAME \
36 | --zone=$ZONE \
37 | --image-family=$IMAGE_FAMILY \
38 | --image-project=deeplearning-platform-release \
39 | --maintenance-policy=TERMINATE \
40 | --accelerator='type=nvidia-tesla-p4,count=1' \
41 | --machine-type=$INSTANCE_TYPE \
42 | --boot-disk-size=200GB \
43 | --metadata='install-nvidia-driver=True' \
44 | --preemptible
45 | Created [https://www.googleapis.com/compute/v1/projects/near-earth-comets-f8c3f/zones/us-west2-b/instances/my-fastai-instance].
46 | NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
47 | my-fastai-instance us-west2-b n1-highmem-8 true 10.168.0.2 35.235.122.68 RUNNING
48 | %
49 | ```
50 |
51 | ## Go to GCP Console and see that the instance has been created
52 | - https://console.cloud.google.com/compute/instances?project=near-earth-comets-f8c3f&folder&organizationId
53 | - Note that this will be the page you have to go to later to **STOP YOUR INSTANCE**.
54 |
55 | ## Connect to GCP Instance
56 | - Once this is done, you can connect to your instance from the terminal by typing:
57 | Example:
58 | ```bash
59 | gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080
60 | ```
61 | For me, it is:
62 | ```bash
63 | gcloud compute ssh --zone=$ZONE jupyter@my-fastai-instance -- -L 8080:localhost:8080
64 | ```
65 | ###
66 | My passphrase: fastai
67 |
68 | ###
69 | >my example
70 |
71 | ```bash
72 | % gcloud compute ssh --zone=$ZONE jupyter@my-fastai-instance -- -L 8080:localhost:8080
73 | Updating project ssh metadata...⠧Updated [https://www.googleapis.com/compute/v1/projects/near-earth-comets-f8c3f].
74 | Updating project ssh metadata...done.
75 | Waiting for SSH key to propagate.
76 | Warning: Permanently added 'compute.7610414667562550937' (ECDSA) to the list of known hosts.
77 | Enter passphrase for key '/Users/reshamashaikh/.ssh/google_compute_engine':
78 | Enter passphrase for key '/Users/reshamashaikh/.ssh/google_compute_engine':
79 | ======================================
80 | Welcome to the Google Deep Learning VM
81 | ======================================
82 |
83 | Version: m10
84 | Based on: Debian GNU/Linux 9.5 (stretch) (GNU/Linux 4.9.0-8-amd64 x86_64\n)
85 |
86 | Resources:
87 | * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
88 | * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
89 | * Google Group: https://groups.google.com/forum/#!forum/google-dl-platform
90 |
91 | To reinstall Nvidia driver (if needed) run:
92 | sudo /opt/deeplearning/install-driver.sh
93 | This image uses python 3.6 from the Anaconda. Anaconda is installed to:
94 | /opt/anaconda3/
95 |
96 | If anything need to be installed and used with Jupyter Lab please do it in the following way:
97 | sudo /opt/anaconda3/bin/pip install
98 |
99 | Linux my-fastai-instance 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08) x86_64
100 |
101 | The programs included with the Debian GNU/Linux system are free software;
102 | the exact distribution terms for each program are described in the
103 | individual files in /usr/share/doc/*/copyright.
104 |
105 | Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
106 | permitted by applicable law.
107 | jupyter@my-fastai-instance:~$
108 | ```
109 |
110 | ### Commands to run
111 |
112 | ```bash
113 | ls
114 | python -V
115 | conda --version
116 | pip list
117 | ```
118 |
119 | ### Go to localhost (run Jupyter Notebook)
120 | http://localhost:8080/tree
121 |
122 |
123 |
124 |
--------------------------------------------------------------------------------
/courses/v3-dl1/gcp_1_logging_in.md:
--------------------------------------------------------------------------------
1 | # Logging in to GCP
2 |
3 | ## Step 1: GCP Console
4 | 1. Go to my [GCP console](https://console.cloud.google.com/compute/instances?project=near-earth-comets-f8c3f&folder&organizationId&duration=PT1H)
5 | 2. `Start` my instance, which is called `my-fastai-instance`
6 |
7 | ## Step 2: My Mac Terminal
8 | 0. `gcloud auth login` May need to login in via Google chrome
9 |
10 | 1. Go to my terminal on the Mac, type this:
11 | ```bash
12 | gcloud compute ssh --zone=$ZONE jupyter@my-fastai-instance -- -L 8080:localhost:8080
13 | ```
14 | ```bash
15 | gcloud compute ssh --zone=us-west2-b jupyter@my-fastai-instance -- -L 8080:localhost:8080
16 | ```
17 |
18 | >Enter passphrase for key '/Users/reshamashaikh/.ssh/google_compute_engine':
19 | ```
20 | xxxxx
21 | ```
22 | I will see this:
23 | ```bash
24 | jupyter@my-fastai-instance:~$
25 | ```
26 |
27 | ## Updating
28 | ### Important Links
29 | - [Google Cloud Platform](http://course-v3.fast.ai/start_gcp.html)
30 | - [GCP: update fastai, conda & packages](http://course-v3.fast.ai/start_gcp.html#step-4-access-fastai-materials-and-update-packages)
31 |
32 | ### Updating packages
33 | ```bash
34 | cd course-v3/
35 | git pull
36 | ```
37 | ```bash
38 | jupyter@my-fastai-instance:~/course-v3$ cd ..
39 | jupyter@my-fastai-instance:~$ pwd
40 | /home/jupyter
41 | ```
42 | ```bash
43 | cd tutorials/fastai
44 | git checkout .
45 | git pull
46 | ```
47 |
48 | ## Update fastai library
49 | ```bash
50 | sudo /opt/anaconda3/bin/conda install -c fastai fastai
51 | ```
52 | ```bash
53 | conda install -c fastai fastai
54 | ```
55 |
56 | ### get fastai version ---> in terminal
57 | ```bash
58 | pip list | grep fastai
59 | ```
60 |
61 |
62 |
63 | ---
64 | ```bash
65 | jupyter@my-fastai-instance:~/tutorials/fastai$ pip list | grep fastai
66 | fastai 1.0.12
67 | ```
68 | Fri, 11/12/18
69 | ```bash
70 | fastai 1.0.18
71 | ```
72 | Sat, 12/8/18
73 | ```bash
74 | fastai 1.0.35
75 | ```
76 | Sat, 12/15/18
77 | ```bash
78 | From https://github.com/fastai/fastai
79 | 7d617eda..af59fa03 master -> origin/master
80 | * [new branch] release-1.0.36 -> origin/release-1.0.36
81 | * [new branch] release-1.0.37 -> origin/release-1.0.37
82 | * [new tag] 1.0.37 -> 1.0.37
83 | * [new tag] 1.0.36 -> 1.0.36
84 | * [new tag] 1.0.36.post1 -> 1.0.36.post1
85 | ```
86 |
87 | ### get fastai version ---> in Jupyter notebook
88 | ```python
89 | import torch
90 | print(torch.__version__)
91 | import fastai
92 | print(fastai.__version__)
93 | ```
94 |
95 | ## Step 3: Get to Jupyter Notebook
96 | - Go to localhost (run Jupyter Notebook):
97 | http://localhost:8080/tree
98 |
99 | ## Where am I working?
100 | ```bash
101 | jupyter@my-fastai-instance:~/projects$ pwd
102 | /home/jupyter/projects
103 | ```
104 | http://localhost:8080/tree/projects
105 |
106 |
107 | ## Step 4: Shut down GCP instance in the console
108 | - Go to GCP console
109 |
110 | ---
111 |
112 | - `ImageDataBunch`
113 | - `TextDataBunch`
114 |
--------------------------------------------------------------------------------
/courses/v3-dl1/images/.keep:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/courses/v3-dl1/images/camel.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/camel.jpeg
--------------------------------------------------------------------------------
/courses/v3-dl1/images/camels_class.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/camels_class.png
--------------------------------------------------------------------------------
/courses/v3-dl1/images/camels_confusion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/camels_confusion.png
--------------------------------------------------------------------------------
/courses/v3-dl1/images/elephant1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/elephant1.png
--------------------------------------------------------------------------------
/courses/v3-dl1/images/elephant_cm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/elephant_cm.png
--------------------------------------------------------------------------------
/courses/v3-dl1/images/elephant_predict.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/elephant_predict.png
--------------------------------------------------------------------------------
/courses/v3-dl1/images/gcp1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/gcp1.png
--------------------------------------------------------------------------------
/courses/v3-dl1/images/horse.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/horse.jpeg
--------------------------------------------------------------------------------
/courses/v3-dl1/images/horses_txt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/horses_txt.png
--------------------------------------------------------------------------------
/courses/v3-dl1/images/nyc_group.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/nyc_group.jpeg
--------------------------------------------------------------------------------
/courses/v3-dl1/images/rs_camel.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/rs_camel.jpg
--------------------------------------------------------------------------------
/courses/v3-dl1/images/soumith.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/soumith.jpg
--------------------------------------------------------------------------------
/courses/v3-dl1/images/south_africa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v3-dl1/images/south_africa.png
--------------------------------------------------------------------------------
/courses/v3-dl1/kaggle_fruits.md:
--------------------------------------------------------------------------------
1 | ```bash
2 | 568 ls
3 | 569 pip install kaggle-cli
4 | 570 pip install kaggle
5 | 571 export KAGGLE_USERNAME=reshamashaikh
6 | 572 export KAGGLE_KEY=9896d8968968968968968968962
7 | 573 kaggle datasets download -d moltean/fruits
8 | 574 conda env list
9 | 575 history
10 | 576 pip uninstall kaggle-cli
11 | 577 pip install --upgrade pip
12 | 578 sudo pip install kaggle
13 | 579 history
14 | 580 kaggle datasets download -d moltean/fruits
15 | 581 ls
16 | 582 mkdir inputs
17 | 583 ls
18 | 584 mv inputs/ input/
19 | 585 ls
20 | 586 unzip fruits.zip input/
21 | 587 mkdir input/fruits
22 | 588 unzip fruits.zip input/
23 | 589 ls
24 | 590 unzip fruits.zip -d input/fruits/
25 | 591 ls
26 | 592 history
27 | jupyter@my-fastai-instance:~/kaggle_fruits$
28 | ```
29 |
30 | ```bash
31 | 571 export KAGGLE_USERNAME=reshamashaikh
32 | 572 export KAGGLE_KEY=9896d8968968968968968968962
33 | 578 sudo pip install kaggle
34 | kaggle datasets download -d moltean/fruits
35 | 582 mkdir inputs
36 | 584 mv inputs/ input/
37 | 590 unzip fruits.zip -d input/fruits/
38 | ```
39 |
40 | ```bash
41 | jupyter@my-fastai-instance:~/projects/dl_fastai/nlp/data$ kaggle datasets download -d yelp-dataset/yelp-dataset
42 | jupyter@my-fastai-instance:~/projects/dl_fastai/nlp/data$ ls
43 | yelp-dataset.zip
44 | jupyter@my-fastai-instance:~/projects/dl_fastai/nlp/data$ unzip yelp-dataset.zip
45 | ```
46 |
47 |
48 |
49 |
50 |
--------------------------------------------------------------------------------
/courses/v3-dl1/lesson_1_lecture.md:
--------------------------------------------------------------------------------
1 | # Lesson 1
2 |
3 | - Live Date: 22-Oct-2018
4 | - [Wiki](https://forums.fast.ai/t/lesson-1-class-discussion-and-resources/27332)
5 | - [Video](https://www.youtube.com/watch?v=BWWm4AzsdLk)
6 | - Video duration: 1:40:11
7 | - Notebook:
8 | - [lesson1-pets.ipynb](https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson1-pets.ipynb)
9 | - [fastai library](https://github.com/fastai/fastai)
10 |
11 | ---
12 |
13 | ## Homework: parts completed ✅
14 | - Google Cloud setup
15 | - Get your GPU going
16 | - read lesson1-pets.ipynb notebook
17 | - read [Tips for Building Image dataset](https://forums.fast.ai/t/tips-for-building-large-image-datasets/26688)
18 | - read [Lesson 1 notes](https://forums.fast.ai/t/deep-learning-lesson-1-notes/27748)
19 | - read fastai documentation
20 | - run lesson1-pets.ipynb
21 |
22 | ## Homework: To Do
23 | - get your own image dataset
24 | - Repeat process on your own dataset
25 | - Share on Forums
26 | - repo: fastai_docs
27 | - download repo
28 | - run the code
29 | - experiment
30 | - git/clone, open in Jupyter (GitHub doesn't render notebooks so well)
31 | - Use the first notebook
32 |
33 | ## Lesson 1 Pets
34 | ```bash
35 | RuntimeError: CUDA error: out of memory
36 | ```
37 | Note: reduce batch size and restart kernel
38 |
39 |
40 |
41 | ---
42 |
43 | # Intro
44 | - slightly delayed, waiting for students to get through security
45 | - For in-class students in SF:
46 | - get to know your group of 6
47 |
48 | ## Pete Maker
49 | - intro from PG&E
50 | - USF specific site procedures (earthquake, emergencies, evacuation)
51 |
52 | ## [David Uminsky](https://www.linkedin.com/in/david-uminsky-5153b1a8/)
53 | - Professor of DS at USF
54 | - Diversity Fellows sponsored by: EBay, Facebook
55 | - 3rd iteration of this course (grown from 60-80 students to 280)
56 |
57 | ## Rachel Thomas
58 |
59 | ## Jeremy Howard
60 | - largest group of people joining: Bangalore, India
61 | - US towns
62 | - Lagos
63 |
64 | ## Computer for in-class
65 | 1. AWS Salamander
66 | 2. AWS EC2
67 | 3. Google Cloud Platform (GCP)
68 |
69 | ## Computers for Int'l
70 | 1. Google Cloud Platform
71 | - has fastai image
72 | - $300 credits
73 | 2. AWS EC2 $0.90/hr
74 |
75 | ## GCP
76 | https://cloud.google.com
77 |
78 | ### Advice
79 | Pick one project, do it very well, and make it fantastic.
80 |
81 | - `doc(interp.plot_top_losses)`
82 |   - shows prediction, actual, loss, and the probability of the predicted class
83 | - Don't be afraid to look at the source code.
84 | - if you have lots of classes, don't use the confusion matrix; use `interp.most_confused()` instead
85 | - `unfreeze`: please train the whole model
86 | - if you run out of memory, use a smaller batch size
87 |
88 |
89 |
--------------------------------------------------------------------------------
/courses/v3-dl1/lesson_1_rs_camels_horses.md:
--------------------------------------------------------------------------------
1 | # Camels vs Horses
2 |
3 | ## Important Links
4 | - [Google Cloud Platform](http://course-v3.fast.ai/start_gcp.html)
5 | - [GCP: update fastai, conda & packages](http://course-v3.fast.ai/start_gcp.html#step-4-access-fastai-materials-and-update-packages)
6 |
7 | ---
8 |
9 | # Downloading Images
10 | [Fastai tutorial: downloading images](https://github.com/fastai/course-v3/blob/master/nbs/dl1/download_images.ipynb)
11 | - After this step in the Chrome Javascript console:
12 |
13 | ```javascript
14 | window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
15 | ```
16 | On a Mac, this downloads a file called `download.csv` to the `~/Downloads` folder
17 | - rename the file to match your image class. For me:
18 | 1. camels.csv
19 | 2. horses.csv
20 |
21 | #### Go to my `Downloads` directory
22 | ```bash
23 | pwd
24 | ```
25 | ```
26 | /Users/reshamashaikh/Downloads
27 | ```
28 |
29 | #### List items in directory by modification time (newest last)
30 | ```bash
31 | ls -lrt
32 | ```
33 | ```
34 | -rw-r--r--@ 1 68354 Oct 26 17:01 camels.csv
35 | -rw-r--r--@ 1 85497 Oct 26 17:03 horses.csv
36 | ```
37 |
38 | ## `scp` to GCP
39 | ```bash
40 | gcloud compute scp camels.csv jupyter@my-fastai-instance:~
41 | gcloud compute scp horses.csv jupyter@my-fastai-instance:~
42 | ```
43 |
44 | ## on GCP: move data to `data` directory
45 | ```bash
46 | jupyter@my-fastai-instance:~$ ls
47 | camels.csv course-v3 horses.csv tutorials
48 | jupyter@my-fastai-instance:~$ mv *.csv /home/jupyter/tutorials/data
49 | jupyter@my-fastai-instance:~$ ls
50 | course-v3 tutorials
51 | jupyter@my-fastai-instance:~$ ls tutorials/data
52 | camels.csv horses.csv
53 | jupyter@my-fastai-instance:~$
54 | ```
55 |
56 | ## convert `.csv` files to `.txt` files
57 | ```bash
58 | cat camels.csv | tr ',' '\n' > camels.txt
59 | cat horses.csv | tr ',' '\n' > horses.txt
60 | ```
61 |
62 | ## rename files to match notebook
63 | ```bash
64 | mv camels.txt urls_camels.txt
65 | mv horses.txt urls_horses.txt
66 | ```
67 |
68 | ## Create directory and upload urls file into your server
69 | - Original [notebook](https://github.com/fastai/course-v3/blob/master/nbs/dl1/download_images.ipynb)
70 | ```python
71 | my_path = "/home/jupyter/tutorials/"
72 | ```
73 | ```python
74 | folder = 'camels'
75 | file = 'urls_camels.txt'
76 | ```
77 | ```python
78 | path = Path(my_path+'data/mammals')
79 | dest = path/folder
80 | dest.mkdir(parents=True, exist_ok=True)
81 | ```
82 | do same for "horses"
83 |
84 | ### Move url_name.txt file to appropriate folder
85 | ```bash
86 | mv urls_camels.txt /home/jupyter/tutorials/data/mammals/camels
87 | mv urls_horses.txt /home/jupyter/tutorials/data/mammals/horses
88 | ```
89 |
90 | ## Directory Tree
91 | ```bash
92 | jupyter@my-fastai-instance:~/tutorials/data$ pwd
93 | /home/jupyter/tutorials/data
94 | jupyter@my-fastai-instance:~/tutorials/data$ tree -d
95 | .
96 | └── mammals
97 | ├── camels
98 | └── horses
99 |
100 | 3 directories
101 | jupyter@my-fastai-instance:~/tutorials/data$
102 | ```
103 |
104 | ## let's look at file
105 | ```bash
106 | head urls_camels.txt
107 | ```
108 |
109 | ```bash
110 | jupyter@my-fastai-instance:~/tutorials/data/mammals$ head urls_camels.txt
111 | https://media.buzzle.com/media/images-en/gallery/mammals/camels/1200-close-up-of-camel-nostrils.jpg
112 | http://www.cidrap.umn.edu/sites/default/files/public/styles/ss_media_popup/public/media/article/baby_camel_nursing.jpg?itok=0vwqXyoW
113 | https://www.thenational.ae/image/policy:1.632918:1506081168/image/jpeg.jpg?f=16x9&w=1200&$p$f$w=dfa40e8
114 | https://i.dailymail.co.uk/i/pix/2012/11/24/article-2237967-162CA49A000005DC-153_634x409.jpg
115 | https://samslifeinjeddah.files.wordpress.com/2014/08/jed-camel-2_edited.jpg
116 | https://i.pinimg.com/236x/29/94/04/299404d417dd8b836b4a5c396cb597a6--camel-animal-baby-camel.jpg
117 | https://i.chzbgr.com/full/9056188416/h8763E301/
118 | https://i.dailymail.co.uk/i/pix/2012/11/24/article-2237967-162CA5A0000005DC-2_634x372.jpg
119 | https://secure.i.telegraph.co.uk/multimedia/archive/01676/Camel_Milk_1676595c.jpg
120 | https://upload.wikimedia.org/wikipedia/commons/4/43/07._Camel_Profile%2C_near_Silverton%2C_NSW%2C_07.07.2007.jpg
121 | jupyter@my-fastai-instance:~/tutorials/data/mammals$
122 | ```
123 |
124 |
--------------------------------------------------------------------------------
/courses/v3-dl1/lesson_3_lecture.md:
--------------------------------------------------------------------------------
1 | # Lesson 3: Multi-label, Segmentation, Image Regression & More
2 |
3 | - Live Date: 08-Nov-2018
4 | - Video: https://www.youtube.com/watch?v=VPg2ZlRPiXI
5 | - Wiki: https://forums.fast.ai/t/lesson-3-official-resources-and-updates/29732
6 |
7 | ## Video Player for Lessons
8 | - Zach in SF study group
9 | - http://videos.fast.ai
10 |
11 | ## Intro
12 | - in class discussion thread for Forums, stuff related to lesson, related to people new
13 | - lesson 3, further discussion thread, on advanced sub-category
14 | - Andrew Ng has a Machine Learning course on Coursera
15 | - fastai ML course not a prereq for DL course
16 |
17 | ## Production area on documentation
18 | - [Zeit deployment of app](https://course-v3.fast.ai/deployment_zeit.html)
19 |
20 |
--------------------------------------------------------------------------------
/courses/v3-dl2/README.md:
--------------------------------------------------------------------------------
1 | # Part 2 (version 3: Sprint 2019)
2 |
3 | Forums: [Part 2 Lessons, Links and Updates](https://forums.fast.ai/t/2019-part-2-lessons-links-and-updates/41429)
4 |
5 | ## Lessons
6 | - Lesson 8:
7 | - Lesson 9: How to train your model
8 |
--------------------------------------------------------------------------------
/courses/v3-dl2/lecture_8.md:
--------------------------------------------------------------------------------
1 | # Lesson 8
2 |
3 | ## From foundations: Matrix multiplication; Fully connected network forward and backward passes
4 |
5 | ### Broadcasting
6 | - powerful tool for writing code in Python that runs at C speed
7 | - with PyTorch, it will run at CUDA speed; allows us to get rid of our for-loops
8 | - 'broadcasting' a scalar to a tensor
9 | ```python
10 | t = c.expand_as(m)
11 | t.storage()
12 | t.stride(), t.shape
13 | ```
14 | - tensors that behave as higher rank things than they are actually stored as
15 | - broadcasting functionality gives us C like speed without additional memory overhead
16 | - `unsqueeze` adds an additional dimension
17 | ```python
18 | c.unsqueeze(1)
19 | ```
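A self-contained sketch of the two ideas above (my own toy tensors, not the lecture notebook): `expand_as` makes a vector behave as a rank-2 tensor without copying memory (note the stride of 0), and `unsqueeze(1)` turns it into a column so broadcasting runs down the columns instead.

```python
import torch

m = torch.ones(3, 3)
c = torch.tensor([10., 20., 30.])

# c behaves as rank 2 here, but no copy is made: stride 0 re-reads the same row
t = c.expand_as(m)
print(t.shape)     # torch.Size([3, 3])
print(t.stride())  # (0, 1)

# unsqueeze(1) makes c a column vector, so it broadcasts down the columns
col = c.unsqueeze(1)
print(col.shape)        # torch.Size([3, 1])
print((m + col)[0, 0])  # tensor(11.)
```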
20 |
21 | ### Einstein Summation Notation
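A minimal sketch of the notation (my own example, not from the lecture): in `torch.einsum`, an index that appears in both inputs but not the output is summed over, so `'ik,kj->ij'` is exactly matrix multiplication.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

# 'ik,kj->ij': the repeated index k is summed over -- matrix multiplication
c = torch.einsum('ik,kj->ij', a, b)
print(c.shape)                   # torch.Size([2, 4])
print(torch.allclose(c, a @ b))  # True
```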
22 |
23 |
24 |
--------------------------------------------------------------------------------
/courses/v4-dl1/README.md:
--------------------------------------------------------------------------------
1 | # Deep Learning: fastai version 4
2 |
3 | ## Fastai Forums
4 | - [Official Part 1 (2020) updates and resources thread ](https://forums.fast.ai/t/official-part-1-2020-updates-and-resources-thread/63376)
5 |
6 |
7 | ## Resources
8 | - video: [nbdev tutorial](https://youtu.be/Hrs7iEYmRmg)
9 | - A walk-thru of the basic features of nbdev (http://nbdev.fast.ai/).
10 |
--------------------------------------------------------------------------------
/courses/v4-dl1/doc_Jupyter_01.md:
--------------------------------------------------------------------------------
1 |
2 | ## Referencing Documentation in Jupyter Notebook
3 |
4 | #### inside ( of function; shift + tab
5 | Note: go inside the parentheses of a function and hit Shift+Tab to see options
6 |
7 | 
8 |
9 | #### `?` OR `??` gives interactive python guide (abbreviated output below. [full doc output](doc_01_reference.md))
10 |
11 | ```text
12 | IPython -- An enhanced Interactive Python
13 | =========================================
14 |
15 | IPython offers a fully compatible replacement for the standard Python
16 | interpreter, with convenient shell features, special commands, command
17 | history mechanism and output results caching.
18 |
19 | At your system command line, type 'ipython -h' to see the command line
20 | options available. This document only describes interactive features.
21 |
22 | GETTING HELP
23 | ------------
24 |
25 | Within IPython you have various way to access help:
26 |
27 | ? -> Introduction and overview of IPython's features (this screen).
28 | object? -> Details about 'object'.
29 | object?? -> More detailed, verbose information about 'object'.
30 | %quickref -> Quick reference of all IPython specific syntax and magics.
31 | help -> Access Python's own help system.
32 |
33 | If you are in terminal IPython you can quit this screen by pressing `q`.
34 | ```
35 |
36 | #### `?learn` gives (`learn?` works too)
37 | ```bash
38 | Signature: learn(event_name)
39 | Type: Learner
40 | String form:
41 | File: /opt/conda/envs/fastai/lib/python3.7/site-packages/fastai2/learner.py
42 | Docstring: Group together a `model`, some `dls` and a `loss_func` to handle training
43 | ```
44 |
45 | #### `??learn` gives entire class info (doc abbreviated here); (`learn??` works too)
46 | ```bash
47 | Signature: learn(event_name)
48 | Type: Learner
49 | String form:
50 | File: /opt/conda/envs/fastai/lib/python3.7/site-packages/fastai2/learner.py
51 | Source:
52 | class Learner():
53 | def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
54 | metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
55 | moms=(0.95,0.85,0.95)):
56 | store_attr(self, "dls,model,opt_func,lr,splitter,model_dir,wd,wd_bn_bias,train_bn,metrics,moms")
57 | self.training,self.create_mbar,self.logger,self.opt,self.cbs = False,True,print,None,L()
58 | if loss_func is None:
59 | loss_func = getattr(dls.train_ds, 'loss_func', None)
60 | assert loss_func is not None, "Could not infer loss function from the data, please pass a loss function."
61 | self.loss_func = loss_func
62 | self.path = path if path is not None else getattr(dls, 'path', Path('.'))
63 | self.add_cbs([(cb() if isinstance(cb, type) else cb) for cb in L(defaults.callbacks)+L(cbs)])
64 | self.model.to(self.dls.device)
65 | if hasattr(self.model, 'reset'): self.model.reset()
66 | self.epoch,self.n_epoch,self.loss = 0,1,tensor(0.)
67 |
68 | @property
69 | def metrics(self): return self._metrics
70 | @metrics.setter
71 | def metrics(self,v): self._metrics = L(v).map(mk_metric)
72 | ```
73 |
74 | #### `?learn.predict` gives:
75 | ```bash
76 | Signature: learn.predict(item, rm_type_tfms=None, with_input=False)
77 | Docstring: Return the prediction on `item`, fully decoded, loss function decoded and probabilities
78 | File: /opt/conda/envs/fastai/lib/python3.7/site-packages/fastai2/learner.py
79 | Type: method
80 | ```
81 |
82 | #### `??learn.predict` gives:
83 | ```bash
84 | Signature: learn.predict(item, rm_type_tfms=None, with_input=False)
85 | Docstring: Return the prediction on `item`, fully decoded, loss function decoded and probabilities
86 | Source:
87 | def predict(self, item, rm_type_tfms=None, with_input=False):
88 | dl = self.dls.test_dl([item], rm_type_tfms=rm_type_tfms)
89 | inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
90 | dec = self.dls.decode_batch((*tuplify(inp),*tuplify(dec_preds)))[0]
91 | i = getattr(self.dls, 'n_inp', -1)
92 | dec_inp,dec_targ = map(detuplify, [dec[:i],dec[i:]])
93 | res = dec_targ,dec_preds[0],preds[0]
94 | if with_input: res = (dec_inp,) + res
95 | return res
96 | File: /opt/conda/envs/fastai/lib/python3.7/site-packages/fastai2/learner.py
97 | Type: method
98 | ```
99 |
100 | #### `doc(learn)` gives
101 |
102 | ```text
103 | Learner object at 0x7f5ffb61dfd0>[source]
104 | Learner object at 0x7f5ffb61dfd0>(event_name)
105 |
106 | Group together a model, some dls and a loss_func to handle training
107 | ```
108 | #### `doc(learn)` and getting to source code
109 | Can click on "[source]" after typing `doc(learn)` to bring you to the fastai code in GitHub repo
110 |
111 | - `doc(learn.predict)` gives
112 | ```text
113 | Learner.predict[source]
114 | Learner.predict(item, rm_type_tfms=None, with_input=False)
115 |
116 | Return the prediction on item, fully decoded, loss function decoded and probabilities
117 |
118 | Show in docs
119 | ```
120 |
121 | #### `ImageDataLoaders` + shift + tab
122 | ```text
123 | Init signature: ImageDataLoaders(*loaders, path='.', device=None)
124 | Docstring: Basic wrapper around several `DataLoader`s with factory methods for computer vision problems
125 | File: /opt/conda/envs/fastai/lib/python3.7/site-packages/fastai2/vision/
126 | ```
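As a stdlib footnote: the `?`/`??` output above can be approximated outside IPython with Python's `inspect` module (the `predict` below is a toy stand-in, not fastai's):

```python
import inspect

def predict(item, rm_type_tfms=None, with_input=False):
    """Toy stand-in for Learner.predict, used only to demo introspection."""
    return item

# Roughly what `?predict` shows: the signature and docstring.
print(inspect.signature(predict))   # (item, rm_type_tfms=None, with_input=False)
print(inspect.getdoc(predict))

# `??predict` additionally shows the source, via inspect.getsource(predict)
# (only available when the function was loaded from a file on disk).
```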
127 |
128 |
129 |
130 |
131 |
--------------------------------------------------------------------------------
/courses/v4-dl1/image/.keep:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/courses/v4-dl1/image/transforms.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/courses/v4-dl1/image/transforms.png
--------------------------------------------------------------------------------
/courses/v4-dl1/lesson_01.md:
--------------------------------------------------------------------------------
1 | # Lesson 1
2 | - Live: 17-Mar-2020
3 | - Time: 6:30 to 9pm PST (9:30pm to midnight EST)
4 | - course will be released in July
5 | - supposed to be the official version (now **v4**)
6 | - book: [Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD ](https://www.amazon.com/Deep-Learning-Coders-fastai-PyTorch/dp/1492045527)
7 |
8 | ## Homework
9 | - [Lesson 1 Homework](https://forums.fast.ai/t/did-you-do-the-homework/660340)
10 |
11 | - [x] make sure you can spin up a GPU server
12 | - [x] that you can shut it down when it is finished
13 | - [x] run the code shown in the lecture
14 | - [x] use the documentation, use the `doc` function inside a Jupyter notebook
15 | - [x] do some searching of the fast.ai docs
16 | - [ ] see if you can grab the fast.ai documentation notebooks and try running them: [doc notebooks](https://github.com/fastai/fastai2/tree/master/nbs)
17 | - [ ] read a chapter of the fast.ai book
18 | - [ ] do the questionnaire at the end of the chapter (not everything has been covered yet, answer only the questions that you can)
19 | - [ ] try to get comfortable with running code
20 |
21 | ## Paperspace
22 | - fastai: [Getting Started with Gradient](https://course.fast.ai/start_gradient.html)
23 | - fastai: v4 [Paperspace (free, paid options)](https://forums.fast.ai/t/platform-paperspace-free-paid-options/65515)
24 |
25 | ### My steps on Paperspace
26 | 1. notebook: https://www.paperspace.com/telmjtws3/notebook/prjrrhy56
27 | 2. Open terminal, via Jupyter Notebook
28 | - type `bash` to get a regular terminal (autocomplete, etc)
29 | - `pip install fastai2 fastcore --upgrade`
30 | - `cd course-v4`
31 | - `git pull`
32 |
33 |
34 | ## Logistics
35 | - edited video will be available in 1-2 days
36 | - whatever you ask on the forum, it will eventually be public
37 | - it's not personal if your post gets deleted from forums, it's for the readability of the forums
38 | - 800 most valued members of community taking course
39 | - at 9:40pm EST, there are **441** people watching
40 | - at 9:45pm EST, **465**
41 | - at 10:00pm EST, **483**
42 | - at 11:45pm EST, **434**
43 | - at 12am, **405**
44 |
45 | ## Forums
46 | - can select "none" to remove study group threads
47 | - study group: research shows people who work in a group are much more likely to create powerful, long-term projects
48 | - will set up virtual study groups
49 |
50 | ## COVID-19
51 | - blog: [Covid-19, your community, and you — a data science perspective](https://www.fast.ai/2020/03/09/coronavirus/)
52 | - published: 09-Mar-2020
53 | - 1/2 million people read the blog
54 | - post translated in 15 languages
55 | - OPEN Forum category: [covid-19](https://forums.fast.ai/c/covid-19/52)
56 |
57 | 10:33 break
58 |
59 | ## Getting Started
60 | - AGI: Artificial General Intelligence
61 | - Neural networks: a brief history
62 | - 1986: MIT Press published the book *Parallel Distributed Processing* (PDP)
63 | -
64 |
65 | ## Education at Bat: 7 Principles for Educators
66 | Professor David Perkins uses his childhood baseball experiences:
67 | 1. Play the whole game
68 | 2. Make the game worth playing
69 | 3. Work on the hard parts
70 |
71 | You will be practicing things that are hard. Requires:
72 | - tenacity
73 | - commitment
74 | - will need to work damn hard
75 | - spend less time on theory, and MORE time on running models and with code
76 |
77 | ## Software Stack
78 | - fastai
79 | - PyTorch
80 | - Python
81 |
82 | ## PyTorch
83 | - Tensorflow got bogged down
84 | - PyTorch was easier to use
85 | - in last 12 months, % of papers that use PyTorch at conferences went from 20% to 80%
86 | - industry moves slowly, but will catch up
87 | - PyTorch: very flexible, not designed for beginner-friendliness
88 | - doesn't have higher level libraries
89 | - fastai is the most popular higher level API for PyTorch
90 | - fastai uses a layered API
91 |
92 | ## To do work
93 | - need GPU, Nvidia one
94 | - use one of the platforms that is easily set up
95 | - run it on Linux; it's hard enough to learn deep learning w/o arcane setup problems
96 | - app_jupyter.ipynb: learn about Jupyter notebook
97 | - REPL: Read, Evaluate, Print, Loop
98 |
99 | ## Jupyter Notebook
100 | - shift + enter: to run
101 | - Workflow: select notebook, duplicate it and rename it
102 | - fastbook repository: all text from book
103 | - course-v4 --> this removes all text, leaves just code
104 | - at the end of notebooks, there are Questionnaires
105 | - What do we want you to take away from each notebook?
106 | - What should you know before you move on?
107 | - Do questionnaire before moving on to each chapter
108 | - If you missed something, do go back and read it
109 | - If you're still stuck after a couple of tries, move on to the next chapter; you may understand it better later
110 | - File / Trust Notebook
111 | - `jupyter labextension install @jupyter-widgets/jupyterlab-manager`
112 |
113 | ##
114 | - deep learning is a kind of machine learning
115 | -
116 |
117 | ## Limitations Inherent to Machine Learning
118 | -
119 |
120 | ## Consider how a model interacts with its environment
121 | - PROXY: arrest is a proxy for crime [listen to this again]
122 | -
123 |
124 | ## Homework
125 | 1. spin up a GPU server
126 | 2. run code
127 | 3. search fastai docs
128 | 4. try to get comfortable, know your way around
129 | 5. read chapter of book
130 | 6. go through questionnaire
131 |
132 |
133 |
--------------------------------------------------------------------------------
/courses/v4-dl1/lesson_03.md:
--------------------------------------------------------------------------------
1 | # Lesson 3
2 | - Live: 31-Mar-2020
3 | - Time: 6:30 to 9pm PST (9:30pm to midnight EST)
4 |
5 | - 9:30pm 144 viewing
6 | - 9:45pm 263 viewing
7 | - 10:00pm viewing
8 |
9 | Note: finished watching Apr 16, 2020.
10 |
11 | ## Homework
12 | - [Lesson 3 Homework] ()
13 |
14 | - [ ] read blog: [](https://www.fast.ai/2016/12/29/uses-of-ai/)
15 | - [ ] create your own application
16 |
17 |
18 | ## Notes
19 | - [fastai/fastbook](https://github.com/fastai/fastbook)
20 | - full notebooks that contain text of O'Reilly book
21 | - [fastai/course-v4](https://github.com/fastai/course-v4)
22 | - same notebooks with prose stripped away
23 | - do practice coding here
24 |
25 | ##
26 | - using notebook: https://github.com/fastai/fastbook/blob/master/02_production.ipynb
27 | - look at getting model into production
28 | - `DataBlock` API
29 | ```python
30 | bears = DataBlock(
31 | blocks=(ImageBlock, CategoryBlock),
32 | get_items=get_image_files,
33 | splitter=RandomSplitter(valid_pct=0.3, seed=42),
34 | get_y=parent_label,
35 | item_tfms=Resize(128))
36 | ```
37 |
38 | ## Data Augmentation
39 | - default: it grabs the center of image
40 | - `.new`: creates a new DataBlock object
41 | ```python
42 | bears = bears.new(item_tfms=Resize(128, ResizeMethod.Squish))
43 | dls = bears.dataloaders(path)
44 | dls.valid.show_batch(max_n=4, nrows=1)
45 | ```
46 | - `ResizeMethod.Pad` adds black bars to side, avoids squishing image
47 | - default is `pad_mode='zeros'`; can also use `pad_mode='reflect'`
48 | `bears = bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))`
49 | - `ResizeMethod.Squish` most efficient
50 | - `item_tfms=RandomResizedCrop` most popular one; `min_scale=0.3` pick 30% of pixels of orig image each time
51 | `bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))`
52 |
53 | - Item transforms vs Batch transforms
54 | ```python
55 | bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
56 | dls = bears.dataloaders(path)
57 | dls.train.show_batch(max_n=8, nrows=2, unique=True)
58 | ```
59 | - fastai will avoid doing data augmentation on the validation dataset
60 | - show names of categories:
61 | ```python
62 | learn_inf.dls.vocab
63 | ```
64 | ```bash
65 | (#3) ['black','grizzly','teddy']
66 | ```
67 |
68 | ## Making a GUI; web app for predictions (25:00)
69 | - `!pip install voila`
70 | - can use binder for making it publicly available
71 |
72 | ### *out of domain* data (domain shift)
73 |
74 | ### Python broadcasting
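The heading above has no notes yet; a minimal sketch of the broadcasting idea, using plain NumPy here rather than fastai tensors (the arrays and values are my own toy examples):

```python
import numpy as np

# Broadcasting: arrays of different shapes are combined without explicit loops.
# Dimensions are compatible (right-aligned) when the sizes match or one is 1.
imgs = np.zeros((2, 3))          # e.g. a mini-batch of 2 rows, 3 features
row = np.array([1.0, 2.0, 3.0])  # shape (3,) is broadcast across both rows
shifted = imgs + row             # row is "stretched" to shape (2, 3)
print(shifted.shape)             # (2, 3)

col = np.array([[10.0], [20.0]])  # shape (2, 1) broadcasts across columns
print((imgs + col)[1, 2])         # 20.0
```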
75 |
76 | ## MNIST: baseline + calculating gradient
77 | - notebook: https://github.com/fastai/fastbook/blob/master/04_mnist_basics.ipynb
78 |
79 |
80 |
81 |
--------------------------------------------------------------------------------
/courses/v4-dl1/lesson_04.md:
--------------------------------------------------------------------------------
1 | # Lesson 4
2 | - Live: 14-Apr-2020
3 | - Time: 6:30 to 9pm PST (9:30pm to midnight EST)
4 | - finished watching 21-apr-2020
5 |
6 | ## Homework
7 | - [Lesson 4 Homework] ()
8 | - [ ]
9 | - [ ]
10 |
11 | ## Notes
12 | `pets1.summary(path/"images")` helps with debugging
13 |
--------------------------------------------------------------------------------
/courses/v4-dl1/lesson_05_ethics.md:
--------------------------------------------------------------------------------
1 | # Lesson 5: Ethics for Data Science
2 | - Live: 07-Apr-2020
3 | - Time: 6:30 to 9pm PST (9:30pm to midnight EST)
4 | - Lesson 5 thread: https://forums.fast.ai/t/lesson-5-official-topic/68039
5 |
6 | NOTE: finished watching lecture on 14-Apr-2020
7 |
8 | ## Ethics Course
9 | - full USF ethics course by Rachel Thomas will be released before July 2020
10 |
11 |
12 | ## Case Study
13 | 1. Feedback loop
14 | - data
15 | - recommendation systems: they are determining what user is exposed to, and what content will become popular
16 | - Google promoting damaging conspiracy theories
17 | -
18 | 2. Software to determine poor people's health benefits
19 | - bug in software cut coverage for people with cerebral palsy
20 | - system implemented with no way to identify and address mistakes
21 | 3. Latanya Sweeney
22 |    - Ph.D.
23 |    - when she googled her name, she would see ads for criminal records
24 |    - names associated with African-Americans were disproportionately getting ads for criminal records
25 | - bias in advertising shows up a lot
26 |
27 |
28 | Resources:
29 | - [Georgetown Law: Center for Privacy and Technology](https://forums.fast.ai/t/lesson-5-official-topic/68039)
30 | - How to Fact Check: https://www.notion.so/Check-Please-Starter-Course-ae34d043575e42828dc2964437ea4eed
31 | - Maciej Ceglowski
32 | https://en.wikipedia.org/wiki/Maciej_Ceg%C5%82owski
33 |
34 |
35 |
36 |
37 |
--------------------------------------------------------------------------------
/courses/v4-dl1/lesson_06.md:
--------------------------------------------------------------------------------
1 | # Lesson 6
2 | - Live: 21-Apr-2020
3 | - Time: 6:30 to 9pm PST (9:30pm to midnight EST)
4 | - 10:30pm 185 watching
5 |
6 | ## Homework
7 | - [Lesson 6 Homework] ()
8 | - [ ]
9 | - [ ]
10 |
11 | ## Notes
12 | - [fastai/fastbook](https://github.com/fastai/fastbook)
13 | - full notebooks that contain text of O'Reilly book
14 | - [fastai/course-v4](https://github.com/fastai/course-v4)
15 | - same notebooks with prose stripped away
16 | - do practice coding here
17 |
18 | ## Topics
19 | - pet breeds; multiple classification
20 | - good learning rate finder questions and answers
21 |
22 | ## Computer Vision Problem: Pet Breed
23 |
24 | ### Discriminative Learning Rates
25 | - Notebook: https://github.com/fastai/course-v4/blob/master/nbs/05_pet_breeds.ipynb
26 | - unfreezing and transfer learning
27 | > what we would really like is to have a small learning rate for the early layers and a bigger learning rate for the later layers
31 | - slicing
32 | ```python
33 | learn.fit_one_cycle(6, lr_max=1e-5)
34 | ```
35 | #### our own version of fine-tuning here
36 | ```python
37 | learn = cnn_learner(dls, resnet34, metrics=error_rate)
38 | learn.fit_one_cycle(3, 3e-3)
39 | learn.unfreeze()
40 | learn.fit_one_cycle(12, lr_max=slice(1e-6,1e-4))
41 | ```
42 | #### how do you make it better now?
43 | - 5.4% error on 37 categories is pretty good (for pet breed data)
44 | - can use a deeper architecture
45 | - `Cuda runtime error: out of memory` is out of memory on your GPU
46 | - restart notebook
47 | - can use less precise numbers to save memory
48 | ```python
49 | from fastai2.callback.fp16 import *
50 | learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16()
51 | learn.fine_tune(6, freeze_epochs=3)
52 | ```
53 | - increasing number of layers (or more complex architecture) doesn't always improve the error rate
54 | - requires experimentation
55 | - trick: use small models for as long as possible (to do cleaning and testing); then try bigger models because they will take longer
56 | - "always assume you can do better [with error rate] because you never know"
57 |
58 | ## Multi-label Classification
59 | - notebook: https://github.com/fastai/course-v4/blob/master/nbs/06_multicat.ipynb
60 | - determining multiple labels per image (e.g., contains car, bike, person, etc.)
61 | - dataset: PASCAL
62 | - http://host.robots.ox.ac.uk/pascal/VOC/
63 | - https://gluon-cv.mxnet.io/build/examples_datasets/pascal_voc.html
64 |
65 |
66 | ## Example
67 | ```python
68 | a = list(enumerate(string.ascii_lowercase))
69 | a[0], len(a)
70 | ```
71 | ```bash
72 | ((0, 'a'), 26)
73 | ```
74 |
75 | ## creating: **Datasets**, **Data Block** and **DataLoaders**
76 | - serialization: means saving something
77 | - best to use named functions over lambdas (Python cannot serialize objects created with a lambda)
78 | - one-hot encoding for multiple labels
79 | -
80 | ```python
81 | def splitter(df):
82 | train = df.index[~df['is_valid']].tolist()
83 | valid = df.index[df['is_valid']].tolist()
84 | return train,valid
85 |
86 | dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
87 | splitter=splitter,
88 | get_x=get_x,
89 | get_y=get_y)
90 |
91 | dsets = dblock.datasets(df)
92 | dsets.train[0]
93 | ```
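The serialization point above can be checked directly with the standard library's `pickle` module, independent of fastai (the functions below are toy stand-ins for a `splitter`):

```python
import pickle

def splitter_fn(x):
    # A named function: picklable, so a DataBlock using it can be saved to disk.
    return x * 2

splitter_lambda = lambda x: x * 2  # a lambda: pickle cannot look it up by name

pickle.dumps(splitter_fn)  # works fine

try:
    pickle.dumps(splitter_lambda)
except (pickle.PicklingError, AttributeError) as e:
    print("lambda could not be pickled:", type(e).__name__)
```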
94 | ## path
95 | ```python
96 | Path.BASE_PATH = None
97 | path
98 | ```
99 | ```python
100 | (path/'01').ls()
101 | ```
102 | ### Important to know
103 | 1. create a learner
104 | 2. grab a batch of data
105 | 3. pass it to the model
106 | 4. see the shape; recognize why the shape is
107 | ```python
108 | learn = cnn_learner(dls, resnet18)
109 | ```
110 | ```python
111 | x,y = dls.train.one_batch()
112 | activs = learn.model(x)
113 | activs.shape
114 | ```
115 | >torch.Size([64, 20])
116 |
117 | ### Binary cross entropy
118 |
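The heading above has no notes yet; a minimal hand-rolled sketch of binary cross entropy in plain Python (toy numbers of my own, not fastai's `BCEWithLogitsLossFlat`):

```python
import math

def binary_cross_entropy(pred, target):
    # pred: sigmoid output in (0, 1); target: 0 or 1.
    # For multi-label problems this loss is applied to each label independently.
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

# A confident correct prediction has low loss; a confident wrong one is heavily penalised.
print(round(binary_cross_entropy(0.9, 1), 4))  # 0.1054
print(round(binary_cross_entropy(0.1, 1), 4))  # 2.3026
```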
119 | ## Note
120 | - **Accuracy** only works for single label datasets, like MNIST
121 |
122 | ## Collaborative Filtering Deep Dive
123 | - applications: what kind of other diagnosis, figure out where someone will click next
124 | - anything where you are trying to learn from past behavior
125 | -
126 |
127 |
128 |
--------------------------------------------------------------------------------
/courses/v4-dl1/lesson_07.md:
--------------------------------------------------------------------------------
1 | # Lesson 7
2 |
3 | - Live: 28-Apr-2020
4 | - Time: 6:30 to 9pm PST (9:30pm to midnight EST)
5 | - 9:30pm 101 watching
6 | - 10:30pm 177 watching
7 |
8 | ## Topics
9 | - weight decay, regularization
10 | - embedding
11 | - PyTorch code
12 | - Tabular
13 |
14 | ## Notebook
15 | - Collaborative filtering: https://github.com/fastai/course-v4/blob/5a9fca472f55a8186e62a21111deab119001e0df/nbs/08_collab.ipynb
16 | - Tabular: https://github.com/fastai/course-v4/blob/5a9fca472f55a8186e62a21111deab119001e0df/nbs/09_tabular.ipynb
17 |
18 | ## Regularization
19 | - use for **overfitting**
20 | - **weight decay** is also known as **L2 Regularization**
21 | - in general, big coefficients are going to cause a big swing in the loss
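The weight decay point above can be written out: L2 regularization adds the sum of squared weights to the loss, which pushes big coefficients down (a hand-rolled sketch with toy numbers, not fastai's implementation):

```python
def loss_with_wd(loss, weights, wd):
    # L2 regularization / weight decay: penalise large weights.
    return loss + wd * sum(w ** 2 for w in weights)

base_loss = 1.0
small = [0.1, -0.2, 0.3]
big = [3.0, -4.0, 5.0]

# Big coefficients are penalised far more heavily than small ones.
print(round(loss_with_wd(base_loss, small, wd=0.1), 3))  # 1.014
print(round(loss_with_wd(base_loss, big, wd=0.1), 3))    # 6.0
```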
22 |
23 | ## Embeddings
24 | - index lookup into an array
25 | - computational shortcut to one hot encoding
26 | - cardinality: number of levels of a variable
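The "index lookup" point above can be sketched in plain Python: multiplying a one-hot vector by a weight matrix selects one row, so an embedding just indexes that row directly (toy numbers, not fastai's implementation):

```python
# Embedding matrix: one learned vector per category (cardinality 3, embedding size 2).
emb = [[0.1, 0.2],
       [0.3, 0.4],
       [0.5, 0.6]]

def one_hot_matmul(idx, weights):
    # The slow way: build a one-hot vector and take a dot product per column.
    one_hot = [1.0 if i == idx else 0.0 for i in range(len(weights))]
    return [sum(o * w[j] for o, w in zip(one_hot, weights))
            for j in range(len(weights[0]))]

def embedding_lookup(idx, weights):
    # The computational shortcut: just index the row.
    return weights[idx]

print(one_hot_matmul(1, emb))    # [0.3, 0.4]
print(embedding_lookup(1, emb))  # [0.3, 0.4]
```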
27 |
28 | ## Dataset
29 | - Blue Book for Bulldozers Kaggle Competition
30 |
31 | ## Random Forests: Bagging
32 | - to improve the random forests, use **bagging**
33 | - randomly select subsets of data and train it
34 | - then average the different versions of the models
35 | - advantage to this is that these models have errors which are not correlated to each other
36 |
37 | Here is the procedure that Breiman is proposing:
38 | 1. Randomly choose a subset of the rows of your data (i.e., "bootstrap replicates of your learning set")
39 | 2. Train a model using this subset
40 | 3. Save that model, and then return to step one a few times
41 | 4. This will give you a number of trained models. To make a prediction, predict using all of the models, and then take the average of each of those model's predictions.
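The four steps above can be sketched with the standard library alone, using the mean of a bootstrap sample as a stand-in "model" (toy data; a real random forest would train a decision tree per subset):

```python
import random
import statistics

random.seed(42)
data = [3.0, 5.0, 7.0, 9.0, 11.0]  # toy training targets; the true mean is 7.0

def train_bagged_models(data, n_models=100):
    models = []
    for _ in range(n_models):
        # Step 1: bootstrap replicate (sample rows with replacement).
        subset = random.choices(data, k=len(data))
        # Steps 2-3: "train" a trivial model (its prediction is the subset mean) and save it.
        models.append(statistics.mean(subset))
    return models

# Step 4: predict with every model and average the predictions.
models = train_bagged_models(data)
prediction = statistics.mean(models)
print(prediction)  # the individual models' errors average out near 7.0
```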
42 |
43 | ## BAGGING
44 | It means we can improve the accuracy of nearly any kind of machine learning algorithm by training it multiple times, each time on a different random subset of the data, and averaging their predictions.
45 |
46 | ## Leo Breiman: Random Forest
47 | In 2001, Leo Breiman demonstrated that this approach to building models, when applied to decision tree building algorithms, was particularly powerful. He went further than randomly choosing rows for each model's training: he also randomly selected from a subset of columns when choosing each split in each decision tree. He called this method the random forest.
48 |
49 | ## OOB: out-of-bag
50 | - review: remove each variable and see how it impacts the RMSE
51 |
52 | ## Partial Dependence Plot
53 |
54 | ## Boosting
55 |
--------------------------------------------------------------------------------
/courses/v4-dl1/lesson_08_NLP.md:
--------------------------------------------------------------------------------
1 | # Lesson 8
2 |
3 | - Live: 05-May-2020
4 | - Time: 6:30 to 9pm PST (9:30pm to midnight EST)
5 | - 9:30pm 160 watching
6 |
7 | ## Topics
8 | - NLP
9 |
10 | ## Notebook
11 | - [10_nlp](https://github.com/fastai/fastbook/blob/master/10_nlp.ipynb)
12 | - [12_nlp_dive](https://github.com/fastai/fastbook/blob/master/12_nlp_dive.ipynb)
13 |
14 | ## AR and TAR Regularization
15 |
--------------------------------------------------------------------------------
/courses/v4-dl1/paperspace.md:
--------------------------------------------------------------------------------
1 |
2 | ## Paperspace
3 | - fastai: [Getting Started with Gradient](https://course.fast.ai/start_gradient.html)
4 | - fastai: v4 [Paperspace (free, paid options)](https://forums.fast.ai/t/platform-paperspace-free-paid-options/65515)
5 |
6 | ### My steps on Paperspace
7 | 1. notebook: https://www.paperspace.com/telmjtws3/notebook/prjrrhy56
8 | 2. Open terminal, via Jupyter Notebook
9 | - type `bash` to get a regular terminal (autocomplete, etc)
10 | - `pip install fastai2 fastcore --upgrade`
11 | - `cd course-v4`
12 | - `git pull`
13 |
14 | ### Back to work
15 | 1. Log in: https://www.paperspace.com
16 | 2. To "notebooks" or "workspace": https://www.paperspace.com/console/notebooks
17 | 3. Actions / Start
18 | 4. Actions / Open
19 | 5. New / terminal
20 |
21 | ## updating packages on Paperspace
22 | ```bash
23 | apt-get update
24 | ```
25 | ```bash
26 | apt-get install libsndfile1-dev
27 | ```
28 |
29 | ## unzip files
30 | ```
31 | 10 cd storage
32 | 11 ls
33 | 12 cd fowl_data/
34 | 13 ls
35 | 14 unzip Test.zip
36 | 15 pwd
37 | 16 clear
38 | 17 history
39 | ```
40 | ```bash
41 | root@6c4a45f4bab8:/notebooks/storage/fowl_data# unzip -q Train.zip
42 | ```
43 |
44 |
45 | ## Adding a data folder and data
46 |
47 | 6. use bash shell: `# bash`
48 | 7. going to `storage` folder
49 | ```bash
50 | root@51ae9bcde285:/notebooks/storage# pwd
51 | /notebooks/storage
52 | ```
53 | 8. can `mkdir` here to add datasets
54 | ```bash
55 | # bash
56 | root@51ae9bcde285:/notebooks# ls
57 | course-v4 datasets storage
58 | root@51ae9bcde285:/notebooks# cd storage
59 | root@51ae9bcde285:/notebooks/storage# ls
60 | archive data models
61 | root@51ae9bcde285:/notebooks/storage# mkdir fowl
62 | ```
63 | 9. go to that directory
64 | ```bash
65 | root@51ae9bcde285:/notebooks/storage# cd fowl
66 | root@51ae9bcde285:/notebooks/storage/fowl# ls
67 | root@51ae9bcde285:/notebooks/storage/fowl# pwd
68 | /notebooks/storage/fowl
69 | ```
70 | Tried: `wget` and `curl` but urls were not working
71 | Zindi Fowl competition: https://zindi.africa/competitions/fowl-escapades/data
72 |
73 | 10. Go to Jupyter notebook in Paperspace
74 | - navigate to `storage` folder
75 | - use **upload** to upload files
76 |
77 | ## Data
78 | ```bash
79 | root@3b9d9da72ac6:/notebooks/storage/fowl_data# pwd
80 | /notebooks/storage/fowl_data
81 | root@3b9d9da72ac6:/notebooks/storage/fowl_data# ls -alt
82 | total 2104240
83 | drwxr-xr-x 6 root root 4096 Mar 31 19:48 ..
84 | -rw-r--r-- 1 root root 1407124233 Mar 31 16:26 Train.zip
85 | -rw-r--r-- 1 root root 743620991 Mar 31 16:12 Test.zip
86 | drwxr-xr-x 3 root root 4096 Mar 31 15:12 .
87 | drwxr-xr-x 2 root root 4096 Mar 31 15:12 .ipynb_checkpoints
88 | -rw-r--r-- 1 root root 3815649 Mar 31 15:12 StarterNotebook.ipynb
89 | -rw-r--r-- 1 root root 2391 Mar 31 15:11 authors.csv
90 | -rw-r--r-- 1 root root 80027 Mar 31 15:11 SampleSubmission.csv
91 | -rw-r--r-- 1 root root 48594 Mar 31 15:11 Train.csv
92 | -rw-r--r-- 1 root root 13679 Mar 31 15:11 Test.csv
93 | root@3b9d9da72ac6:/notebooks/storage/fowl_data#
94 | ```
95 | ### rename directories
96 | ```bash
97 | root@6c4a45f4bab8:/notebooks/storage/fowl_data# mv Train/ train/
98 | root@6c4a45f4bab8:/notebooks/storage/fowl_data# mv Test/ test/
99 | root@6c4a45f4bab8:/notebooks/storage/fowl_data#
100 | ```
101 | ```bash
102 | conda install -c conda-forge ffmpeg
103 | ```
104 |
105 | ```bash
106 | apt-get install htop
107 | ```
108 | ```bash
109 | htop
110 | ```
111 |
112 |
113 |
--------------------------------------------------------------------------------
/fastai_dl_course_v1.md:
--------------------------------------------------------------------------------
1 | # [Fastai](http://www.fast.ai) Deep Learning Course: Version 1
2 |
3 | ## Dates of Course (Version 1)
4 | - Deep Learning (Oct 2016 to Apr 2017)
5 | - Part 1: Oct - Dec 2016
6 | - Part 2: Mar - May 2017
7 |
8 | ## Deep Learning Coursework (Version 1)
9 | * [Part 1 v1](http://course17.fast.ai)
10 | * [Part 2 v1](http://course17.fast.ai/part2.html)
11 |
12 |
13 | ## Other
14 | - [fastai v1: Launch Announcement](http://www.fast.ai/2018/10/02/fastai-ai/)
15 | - [fastai_old (on GitHub)](https://github.com/fastai/fastai_old) (old version)
16 |
--------------------------------------------------------------------------------
/fastai_dl_course_v2.md:
--------------------------------------------------------------------------------
1 | # [Fastai](http://www.fast.ai) Deep Learning Course: Version 2
2 |
3 | ## Dates of Course
4 | - Deep Learning Version 2 (Oct 2017 to Apr 2018)
5 | - Part 1: Oct - Dec 2017
6 | - Part 2: Mar - May 2018
7 |
8 | ## Forums
9 | * [Discourse: part1-v2](http://forums.fast.ai/c/part1-v2)
10 | * [Discourse: part1-v2 beginner](http://forums.fast.ai/c/part1v2-beg)
11 | * [Discourse: part2-v2](http://forums.fast.ai/c/part2-v2)
12 |
13 |
14 | ---
15 | ## Deep Learning Coursework (Version 2)
16 | * [Part 1 v2](http://course.fast.ai) (released Jan 2018)
17 | * [Part 2 v2](http://www.fast.ai/2018/05/07/part2-launch/) (released May 2018)
18 |
19 | ### [Deep Learning Part 1](http://forums.fast.ai/t/welcome-to-part-1-v2/5787)
20 | * [Lesson 1 wiki](http://forums.fast.ai/t/wiki-lesson-1/9398) Image Recognition
21 | * [Lesson 2 wiki](http://forums.fast.ai/t/wiki-lesson-2/9399) CNNs
22 | * [Lesson 3 wiki](http://forums.fast.ai/t/wiki-lesson-3/9401) Overfitting
23 | * [Lesson 4 wiki](http://forums.fast.ai/t/wiki-lesson-4/9402) Embeddings
24 | * [Lesson 5 wiki](http://forums.fast.ai/t/wiki-lesson-5/9403) NLP
25 | * [Lesson 6 wiki](http://forums.fast.ai/t/wiki-lesson-6/9404) RNNs
26 | * [Lesson 7 wiki](http://forums.fast.ai/t/wiki-lesson-7/9405) CNN Architecture
27 |
28 | ### [Deep Learning, Part 2](http://www.fast.ai/2018/05/07/part2-launch/)
29 | * [Lesson 8](http://course.fast.ai/lessons/lesson8.html) Object Detection
30 | * [Lesson 9](http://course.fast.ai/lessons/lesson9.html) Single Shot Multibox Detector (SSD)
31 | * [Lesson 10](http://course.fast.ai/lessons/lesson10.html) NLP Classification and Translation
32 | * [Lesson 11](http://course.fast.ai/lessons/lesson11.html) Neural Translation
33 | * [Lesson 12](http://course.fast.ai/lessons/lesson12.html) Generative Adversarial Networks (GANs)
34 | * [Lesson 13](http://course.fast.ai/lessons/lesson13.html) Image Enhancement
35 | * [Lesson 14](http://course.fast.ai/lessons/lesson14.html) Super Resolution; Image Segmentation with UNET
36 |
37 |
38 | ### Deep Learning Lesson Timelines
39 | * http://forums.fast.ai/t/part-1-v2-complete-collection-of-video-timelines/11183
40 |
41 | ---
42 |
43 | ### [Deep Learning 1: My Lesson Notes](courses/dl1-v2/)
44 | * Lesson 1
45 | - [Lesson 1a: Course Intro](courses/dl1-v2/lesson_1a_course_intro.md)
46 | - [Lesson 1b: CNN and resnet Architecture](courses/dl1-v2/lesson_1b_cnn_tools.md)
47 | * [Lesson 2: resnet34, resnext50](courses/dl1-v2/lesson_2_resnet34_resnext50.md) CNNs
48 | * [Lesson 3: CNN Image Intro](courses/dl1-v2/lesson_3_x.md) Overfitting
49 | * [Lesson 4: Embeddings](courses/dl1-v2/lesson_4_x.md) Embeddings
50 | * [Lesson 5: ](courses/dl1-v2/lesson_5_x.md) NLP
51 | * [Lesson 6: ](courses/dl1-v2/lesson_6_x.md) RNNs
52 | * [Lesson 7: ](courses/dl1-v2/lesson_7_x.md) CNN Architecture
53 |
54 | ---
55 | ### [Deep Learning 2: My Lesson Notes](courses/dl2-v2/)
56 | * [Lesson 8](courses/dl2-v2/lesson_08.md) Object Detection
57 | * [Lesson 9](courses/dl2-v2/lesson_09.md) Multi-object Detection
58 | * Lesson 10 NLP Classification and Translation
59 | - [Lesson 10_1](courses/dl2-v2/lesson_10_1.md)
60 | - [Lesson 10_2](courses/dl2-v2/lesson_10_2.md)
61 | * Lesson 11 Neural Translation
62 | - [Lesson 11_1](courses/dl2-v2/lesson_11_1.md)
63 | - [Lesson 11_2](courses/dl2-v2/lesson_11_2.md)
64 | * [Lesson 12] ()
65 | * [Lesson 13] ()
66 | * [Lesson 14] ()
67 |
68 | ---
69 |
70 | ## Platforms for Using fastai (GPU required) v2
71 | [Summary of Cloud GPU Vendors (with billing)](https://github.com/binga/cloud-gpus)
72 | * [Paperspace setup](tools/paperspace.md)
73 | * [AWS AMI GPU Setup](tools/aws_ami_gpu_setup.md)
74 | - [How to setup fastai in an Amazon AWS region without fastai AMI like in Europe](https://medium.com/@pierre_guillou/guide-install-fastai-in-any-aws-region-8f4fe29132e5)
75 | * [Crestle](tools/crestle_run.md)
76 | * [Google Cloud GPU Setup for fastai](https://medium.com/google-cloud/set-up-google-cloud-gpu-for-fast-ai-45a77fa0cb48)
77 | * [Set up personal deep learning box (home computer)](tools/setup_personal_dl_box.md)
78 | * [Microsoft Azure](https://medium.com/@manikantayadunanda/setting-up-deeplearning-machine-and-fast-ai-on-azure-a22eb6bd6429)
79 | * [Running fast.ai notebooks with Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/running-fast-ai-notebooks-with-amazon-sagemaker/)
80 | * Docker
81 | - [Paperspace Docker Container](https://hub.docker.com/r/paperspace/fastai/)
82 | - [Fastai and Docker](https://nji-syd.github.io/2018/03/26/up-and-running-with-fast-ai-and-docker/)
83 | * [manual: bash script for setup](http://files.fast.ai/setup/paperspace)
84 | - the CUDA drivers
85 | - Anaconda (special Python distribution)
86 | - Python libraries
87 | - fastai library
88 | - courses
89 | - data
90 | * Other
91 | - [FloydHub](https://www.floydhub.com)
92 | - https://github.com/YuelongGuo/floydhub.fast.ai
93 | - [Google Colaboratory](https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/welcome.ipynb)
94 | - [Salamander](http://forums.fast.ai/t/setup-on-salamander-cheaper-easier-than-aws/25427)
95 |
96 |
97 |
--------------------------------------------------------------------------------
/fastai_dl_course_v3.md:
--------------------------------------------------------------------------------
1 | # [Fastai](http://www.fast.ai) Deep Learning Course: Version 3
2 |
3 | ## Part 1: Dates of Course
4 | - [Application Announcement](http://forums.fast.ai/t/fast-ai-live-the-new-version-of-the-international-fellowship/22825): CLOSED
5 | - Deep Learning Version 3 (Oct 2018 to Dec 2018)
6 | - Class is at the following time:
7 | - **6:30pm to 9:00pm PST** (Pacific Standard Time)
8 | - 9:30pm to midnight EST
9 | - Class is on the following days:
10 | - Lecture 1: Mon Oct 22
11 | - Lecture 2: Tue Oct 30
12 | - Lecture 3: Thu Nov 8
13 | - Lecture 4: Tue Nov 13
14 | - Lecture 5: Mon Nov 19
15 | - Lecture 6: Tue Nov 27
16 | - Lecture 7: Wed Dec 12
17 |
18 | ## Forums (Discourse)
19 | * [part1-v3](http://forums.fast.ai/c/part1-v3)
20 | * [part1-v3-adv](https://forums.fast.ai/c/part1-v3/part1-v3-adv)
21 |
22 | ---
23 | ## Coursework (fastai website)
24 | * Course release announcement: https://www.fast.ai/2019/01/24/course-v3/
25 | * Videos: [Part 1 v3](https://course.fast.ai/videos) (released 24-Jan-2019)
26 | * Course docs: https://course.fast.ai
27 |
28 | ---
29 |
30 | ### [Deep Learning: My Lesson Notes](courses/v3-dl1/)
31 | * [Lesson 1](courses/v3-dl1/lesson_1_lecture.md) Image Recognition
32 | * [Lesson 2](courses/v3-dl1/lesson_2_1.md)
33 | * [Lesson 3](courses/v3-dl1/lesson_3_1.md)
34 | * [Lesson 4](courses/v3-dl1/lesson_4_1.md)
35 | * [Lesson 5](courses/v3-dl1/lesson_5_1.md)
36 | * [Lesson 6](courses/v3-dl1/lesson_6_1.md)
37 | * [Lesson 7](courses/v3-dl1/lesson_7_1.md)
38 |
39 |
40 |
41 |
--------------------------------------------------------------------------------
/fastai_ml_course.md:
--------------------------------------------------------------------------------
1 | # [Fastai](http://www.fast.ai) Machine Learning [(ML)](http://www.fast.ai/2018/09/26/ml-launch/) Course
2 |
3 | ## Dates of Course
4 | - Machine Learning (Fall 2017)
5 |
6 | ## Forums
7 | - [ML Forum](http://forums.fast.ai/t/another-treat-early-access-to-intro-to-machine-learning-videos/6826)
8 | - [ML video timelines](http://forums.fast.ai/t/another-treat-early-access-to-intro-to-machine-learning-videos/6826/321?u=ericpb)
9 |
10 | ## [Intro to Machine Learning: My Lesson Notes](courses/ml1/)
11 | * [Lesson 1: Random Forests Part 1](courses/ml1/lesson_01.md)
12 | * [Lesson 2: Random Forests Part 2](courses/ml1/lesson_02.md)
13 | * [Lesson 3: Preprocessing Data](courses/ml1/lesson_03.md)
14 | * [Lesson 4: RF Hyperparameters & Feature Importance](courses/ml1/lesson_04.md)
15 | * [Lesson 5](courses/ml1/lesson_05.md) *in progress*
16 | * [Lesson 6](courses/ml1/)
17 | * [Lesson 7](courses/ml1/)
18 | * [Lesson 8](courses/ml1/)
19 | * [Lesson 9](courses/ml1/)
20 | * [Lesson 10](courses/ml1/)
21 | * [Lesson 11](courses/ml1/)
22 | * [Lesson 12](courses/ml1/)
23 |
--------------------------------------------------------------------------------
/googlefc30e18b4a9edaa2.html:
--------------------------------------------------------------------------------
1 | google-site-verification: googlefc30e18b4a9edaa2.html
--------------------------------------------------------------------------------
/helpful_linux_commands.md:
--------------------------------------------------------------------------------
1 | # Helpful Linux Commands
2 |
3 |
4 | ### get list of Jupyter Notebook sessions
5 | ```bash
6 | jupyter notebook list
7 | ```
8 |
9 | ### list CPU and memory usage
10 | ```bash
11 | htop
12 | ```
13 |
14 | ### see GPU usage
15 | ```bash
16 | nvidia-smi
17 | ```
18 | ```bash
19 | nvidia-smi dmon
20 | ```
21 | ```bash
22 | watch -n 1 nvidia-smi
23 | ```
24 |
25 | ### list number of lines in a file
26 | `wc -l file.csv`
27 |
28 |
29 |
30 |
31 |
--------------------------------------------------------------------------------
/images/chrome_curlwget.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/chrome_curlwget.png
--------------------------------------------------------------------------------
/images/dl_libraries.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/dl_libraries.png
--------------------------------------------------------------------------------
/images/image_downloader.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/image_downloader.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson08_lr_find.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson08_lr_find.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_bbox.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_bbox.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_dl_box.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_dl_box.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_embeddings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_embeddings.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_learning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_learning.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_learning2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_learning2.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_lr_find2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_lr_find2.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_matplotlib.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_matplotlib.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_md.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_md.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_motivation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_motivation.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_nb_pascal.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_nb_pascal.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_obj_det.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_obj_det.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_opps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_opps.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_paper.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_paper.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_part1_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_part1_2.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_part2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_part2.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_stage1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_stage1.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_step1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_step1.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_transfer_learning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_transfer_learning.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_visualize.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_visualize.png
--------------------------------------------------------------------------------
/images/lesson_08/lesson8_x.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_08/lesson8_x.png
--------------------------------------------------------------------------------
/images/lesson_09/.keep:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/images/lesson_09/lesson9_archit.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_09/lesson9_archit.png
--------------------------------------------------------------------------------
/images/lesson_09/lesson9_bbox.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_09/lesson9_bbox.png
--------------------------------------------------------------------------------
/images/lesson_09/lesson9_data_loader.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_09/lesson9_data_loader.png
--------------------------------------------------------------------------------
/images/lesson_09/lesson9_know_these1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_09/lesson9_know_these1.png
--------------------------------------------------------------------------------
/images/lesson_09/lesson9_know_these2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_09/lesson9_know_these2.png
--------------------------------------------------------------------------------
/images/lesson_11/.keep:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/images/lesson_11/lesson_11_charloop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_11/lesson_11_charloop.png
--------------------------------------------------------------------------------
/images/lesson_11/lesson_11_nt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_11/lesson_11_nt.png
--------------------------------------------------------------------------------
/images/lesson_11/lesson_11_rnn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_11/lesson_11_rnn.png
--------------------------------------------------------------------------------
/images/lesson_11/lesson_11_rnn2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_11/lesson_11_rnn2.png
--------------------------------------------------------------------------------
/images/lesson_11/lesson_11_rnn_stacked.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_11/lesson_11_rnn_stacked.png
--------------------------------------------------------------------------------
/images/lesson_11/lesson_11_rnn_stacked2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/lesson_11/lesson_11_rnn_stacked2.png
--------------------------------------------------------------------------------
/images/ncm_gephi.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/ncm_gephi.jpg
--------------------------------------------------------------------------------
/images/paperspace.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/paperspace.png
--------------------------------------------------------------------------------
/images/paperspace_fastai.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/paperspace_fastai.png
--------------------------------------------------------------------------------
/images/paperspace_jupyter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/paperspace_jupyter.png
--------------------------------------------------------------------------------
/images/pretrained_networks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/pretrained_networks.png
--------------------------------------------------------------------------------
/images/softmax.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/softmax.png
--------------------------------------------------------------------------------
/images/tmux_start.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/tmux_start.png
--------------------------------------------------------------------------------
/images/tmux_summary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/tmux_summary.png
--------------------------------------------------------------------------------
/images/triple_backticks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reshamas/fastai_deeplearn_part1/21e30ad3a6ec379edfb0feee3920bb170701fe47/images/triple_backticks.png
--------------------------------------------------------------------------------
/notes/competitions.md:
--------------------------------------------------------------------------------
1 | # Competitions
2 |
3 | * [Kaggle](https://www.kaggle.com/competitions)
4 | * [crowdAI](https://www.crowdai.org)
5 | * [Space Apps Challenges](https://2017.spaceappschallenge.org/challenges/)
6 | * [Datahacks](https://datahack.analyticsvidhya.com/contest/practice-problem-age-detection/)
7 | * [Congressional Data Competition](https://www.challenge.gov/list/)
8 | * [GECCO Competitions](http://gecco-2018.sigevo.org/index.html/tiki-index.php?page=Competitions)
9 | * [KD Nuggets listing](https://www.kdnuggets.com/competitions/)
10 | * [International Data Analytics Olympiad](http://idao.world/)
11 | * [SpaceNet Competition: Road Detection and Routing](https://www.iqt.org/cosmiq-works-radiant-solutions-and-nvidia-announce-third-spacenettm-competition-road-detection-and-routing-challenge/#new_tab)
12 |
13 | ## Conference Competitions / Tasks
14 | * [NIPS Competition](https://nips.cc/Conferences/2018/CallForCompetitions)
15 | * [American Statistical Association Data Expo](http://community.amstat.org/stat-computing/data-expo/data-expo-2018)
16 | * [SemEval-2017 Task 9 (NLP)](http://alt.qcri.org/semeval2017/task9/)
17 | * [2016 Shared Task: Challenges in NLP for Clinical Data](https://www.i2b2.org/NLP/RDoCforPsychiatry/PreviousChallenges.php)
18 | * [Noisy User Generated Task (WNUT) Shared Tasks](http://noisy-text.github.io/2017/)
19 | * [SemEval-2017 Task 1 (Semantic Textual Similarity)](http://alt.qcri.org/semeval2017/task1/)
20 |
21 |
22 | ## Students
23 | * [Adobe Analytics Challenge](http://adobeanalyticschallenge.com/) (for university students)
24 | * [ASEAN Data Mining Competition](https://www.youthop.com/competitions/asean-date-science-competition-2018), Asia
25 | * [Data Mining Cup - International Student Competition](https://www.data-mining-cup.com/)
26 |
27 |
28 |
29 |
--------------------------------------------------------------------------------
/notes/deep_learning_libraries.md:
--------------------------------------------------------------------------------
1 | # Deep Learning Libraries
2 |
3 | * TensorFlow (Google)
4 | * Keras (Google); an open-source neural network library written in Python, capable of running on top of TensorFlow, CNTK, Theano, MXNet, or Deeplearning4j
5 | * Caffe (Berkeley)
6 | * Theano (Pascal Lamblin and Yoshua Bengio), library retired in Fall 2017
7 | * PyTorch (Facebook)
8 | * Sonnet (Google)
9 | * MXNet (Amazon)
10 | * Torch (Lua-based)
11 | * Microsoft Cognitive Toolkit (CNTK)
12 | * DLIB (C++ library)
13 | * Caffe2 (Facebook)
14 | * Chainer (Preferred Networks, Japan)
15 | * PaddlePaddle (Baidu, China)
16 | * DeepLearning4J (Skymind in SF); Java, can use with Apache Hadoop and Apache Spark
17 | * Lasagne (lightweight library used to construct and train networks in Theano)
18 |
19 | And more:
20 | * fastai (USF)
21 | * Pyro (Uber)
22 |
23 |
24 | 
25 |
26 |
27 | ---
28 |
29 | ## References
30 | - [Full Ranking List](https://github.com/thedataincubator/data-science-blogs/blob/master/output/DL_libraries_final_Rankings.csv)
31 | - [Ranking Popular Deep Learning Libraries for Data Science](https://blog.thedataincubator.com/2017/10/ranking-popular-deep-learning-libraries-for-data-science/) Oct 2017
32 |
--------------------------------------------------------------------------------
/notes/imagenet.md:
--------------------------------------------------------------------------------
1 | # [ImageNet](http://www.image-net.org)
2 |
3 | First step is to use a pre-trained model.
4 |
5 | ### Pre-trained Model:
6 | - Someone has already downloaded millions of images from the internet
7 | - And built a deep learning model that has learned to recognize the contents of those images
8 | - Nearly always, these pre-trained models are trained on the ImageNet dataset
9 | - ImageNet hosts the most respected annual computer vision competition (past winners include Google and Microsoft)
10 | - 32,000+ categories
11 | - The folks who create these pre-trained networks basically download a large subset of images from ImageNet
12 |
13 | #### Shortcomings of ImageNet Dataset
14 | ImageNet is carefully curated so that each photo has one main item in it
15 |
16 | ### Using ImageNet
17 | - For us, this is a suitable dataset
18 | - Each year, the winners make their source code / weights available
19 |
20 |
21 | ## Architectures: Winners of ImageNet
22 | - **SENet**, 2017 (Squeeze-and-Excitation Networks)
23 | - **ResNet**, 2015 (Microsoft)
24 | - **GoogLeNet**, 2014 (Google), Inception module
25 | - **VGG Net**, 2014 (Oxford Univ group)
26 | - Last of the really powerful simple architectures
27 | - VGG’s simpler approach is not much less accurate than others
28 | - For teaching purposes, it is close to state of the art AND easy to understand
29 | - Excellent for problems that differ from ImageNet (like satellite imagery vs simple photos)
30 | - **ZF Net**, 2013 (Matthew Zeiler and Rob Fergus from NYU)
31 | - **AlexNet**, 2012 (University of Toronto)
32 |
33 | ## Pre-trained Models
34 | - source: https://pytorch.org/docs/stable/torchvision/models.html
35 |
36 | 
37 |
38 |
39 | ## Reference
40 | [The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)](https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html)
41 |
42 |
43 |
44 |
45 |
--------------------------------------------------------------------------------
/notes/loss_functions.md:
--------------------------------------------------------------------------------
1 | # Loss Functions
2 |
3 | ## Cross-entropy Loss
4 |
5 | https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
6 |
8 | Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.
9 |
10 | The graph on the source page (cross-entropy loss vs predicted probability) shows the range of possible loss values given a true observation (isDog = 1). As the predicted probability approaches 1, log loss slowly decreases. As the predicted probability decreases, however, the log loss increases rapidly. Log loss penalizes both types of errors, but especially those predictions that are confident and wrong!
13 |
14 | Cross-entropy and log loss are slightly different depending on context, but in machine learning when calculating error rates between 0 and 1 they resolve to the same thing.
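The definition above is easy to sketch in a few lines of plain Python (a minimal illustration with a hypothetical `binary_cross_entropy` helper, not any library's implementation):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-15):
    """Average log loss over a batch of probability predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# A confident, correct prediction gives a small loss;
# predicting 0.012 when the label is 1 (the example above) gives a large one.
print(binary_cross_entropy([1], [0.9]))    # ≈ 0.105
print(binary_cross_entropy([1], [0.012]))  # ≈ 4.42
```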
15 |
16 |
--------------------------------------------------------------------------------
/notes/nlp_data.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | [The wikitext long term dependency language modeling dataset](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/)
5 |
--------------------------------------------------------------------------------
/notes/nlp_terms.md:
--------------------------------------------------------------------------------
1 | # NLP Terms
2 | ```
3 | POS = part of speech
4 | NP-chunking = noun phrase chunking
5 |
6 | DT = determiner
7 | JJ = adjective
8 | NN = noun
9 | VBD = verb, past tense
10 | ```
11 |
12 | ### BLEU (bilingual evaluation understudy)
13 | is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine's output and that of a human: "the closer a machine translation is to a professional human translation, the better it is" – this is the central idea behind BLEU. BLEU was one of the first metrics to achieve a high correlation with human judgements of quality, and remains one of the most popular automated and inexpensive metrics.
14 |
15 | ### Word Embedding
16 | Word Embedding turns text into numbers.
17 |
18 | #### Types of Word Embedding
19 | 1. Bag of Words - each word in the vocabulary gets its own column in the matrix, which results in sparse matrices
20 | 2. GloVe - counts of co-occurrences
21 | 3. Word2Vec
22 | - These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.
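A toy bag-of-words example (using a made-up two-document corpus) shows why these matrices come out sparse:

```python
from collections import Counter

docs = ["the cat sat", "the dog sat on the mat"]
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc):
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]  # mostly zeros -> sparse

print(vocab)                        # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(bag_of_words("the cat sat"))  # [1, 0, 0, 0, 1, 1]
```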
23 |
24 | ### IE (Information Extraction)
25 | IE turns the unstructured information embedded in texts into structured data.
26 |
27 |
28 | ### IOB (Inside, Outside, Beginning)
29 | ```
30 | The most widespread file representation uses IOB tags:
31 | IOB = Inside-Outside-Beginning
32 | B = beginning (marks beginning of chunk)
33 | I = inside (all subsequent parts of chunk)
34 | O = outside
35 | ```
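A made-up tagged sentence makes the scheme concrete (the PER/LOC entity types are illustrative, not part of the IOB definition):

```python
# "Mr. Smith" is a person chunk, "New York" a location chunk.
tokens = ["Mr.", "Smith", "visited", "New", "York"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]  # B begins a chunk, I continues it, O is outside

for token, tag in zip(tokens, tags):
    print(token, tag)
```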
36 |
37 | ### Named Entity
38 | anything that can be referred to with a proper name
39 |
40 |
41 | ### NER (Named Entity Recognition)
42 | task of detecting and classifying all the proper names mentioned in a text
43 | * Generic NER: finds names of people, places and organizations that are mentioned in ordinary news texts
44 | * practical applications: built to detect everything from names of genes and proteins, to names of college courses
45 |
46 | ### Reference Resolution (Coreference)
47 | occurs when two or more expressions in a text refer to the same person or thing; they have the same referent, e.g. Bill said he would come; the proper noun Bill and the pronoun he refer to the same person, namely to Bill
48 |
49 | ### Relation Detection and Classification
50 | find and classify semantic relations among the entities discovered in a given text
51 |
52 | ### Event Detection and Classification
53 | find and classify the events in which the entities are participating
54 |
55 |
56 | ### GloVe
57 | GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
58 |
59 |
60 |
61 | ### Temporal Expression Detection
62 | * tells us that our sample text contains the temporal expressions *Friday* and *Thursday*
63 | * includes date expressions such as days of the week, months, holidays, as well as relative expressions including phrases like *two days from now* or *next year*.
64 | * includes time: noon, 3pm, etc.
65 |
66 | ### Temporal Analysis
67 | our problem is to map temporal expressions onto specific calendar dates or times of day and then to use those times to situate events in time.
68 |
--------------------------------------------------------------------------------
/resources.md:
--------------------------------------------------------------------------------
1 | # Resources
2 |
3 | ## Lessons
4 | * [Lesson 1 Notes](http://forums.fast.ai/t/deeplearning-lec1notes/7089) Tim Lee [(Tim's GitHub repo)](https://github.com/timdavidlee/learning-deep/tree/master/deeplearning1)
5 | * [Lesson 2: Case Study - A world class image classifier for dogs and cats (err.., anything)](https://medium.com/@apiltamang/case-study-a-world-class-image-classifier-for-dogs-and-cats-err-anything-9cf39ee4690e) Apil Tamang
6 | * [Lesson 2 Notes](http://forums.fast.ai/t/deeplearning-lecnotes2/7515/2) Tim Lee
7 | * [Lesson 3 Notes](http://forums.fast.ai/t/deeplearning-lecnotes3/7866) Tim Lee
8 | * [Lesson 4 Notes](http://forums.fast.ai/t/deeplearning-lec4notes/8146) Tim Lee
9 |
10 | ## Blog Sites by Author
11 | - [Anand Saha](http://teleported.in/)
12 | - [Apil Tamang](https://medium.com/@apiltamang)
13 |
14 |
15 | ## Blogs Written by (or recommended by) fastai Fellows
16 |
17 | ### Resnet
18 | * [Decoding the ResNet architecture](http://teleported.in/posts/decoding-resnet-architecture/) Anand Saha
19 | * [Yet Another ResNet Tutorial (or not)](https://medium.com/@apiltamang/yet-another-resnet-tutorial-or-not-f6dd9515fcd7) Apil Tamang
20 | * [An Overview of ResNet and its Variants](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035) Vincent Fung
21 |
22 | ### Structured Deep Learning
23 | * [Structured Deep Learning](https://towardsdatascience.com/structured-deep-learning-b8ca4138b848) Kerem Turgutlu (Masters' student at USF)
24 |
25 | ### NLP
26 | * [Fine-tuned Language Models for Text Classification](https://arxiv.org/abs/1801.06146) by Jeremy Howard and Sebastian Ruder
27 |
28 | ### PyTorch
29 | * [Transfer Learning using PyTorch — Part 2](https://towardsdatascience.com/transfer-learning-using-pytorch-part-2-9c5b18e15551) Vishnu Subramanian (April 2017)
30 | * [A practitioner's guide to PyTorch](https://medium.com/@radekosmulski/a-practitioners-guide-to-pytorch-1d0f6a238040) by Radek
31 |
32 | ### Learning Rate
33 | * [Improving the way we work with learning rate](https://techburst.io/improving-the-way-we-work-with-learning-rate-5e99554f163b) Vitaly Bushaev
34 | * [Visualizing Learning rate vs Batch size (Neural Nets basics using Fast.ai tools)](https://miguel-data-sc.github.io/2017-11-05-first/) Miguel (Nov 2017)
35 | * [Estimating an Optimal Learning Rate For a Deep Neural Network](https://medium.com/@surmenok/estimating-optimal-learning-rate-for-a-deep-neural-network-ce32f2556ce0) Pavel Surmenok
36 | * [Cyclical Learning Rate Technique](http://teleported.in/posts/cyclic-learning-rate/) Anand Saha
37 | * [Transfer Learning using differential learning rates](https://towardsdatascience.com/transfer-learning-using-differential-learning-rates-638455797f00) Manikanta Yadunanda
38 |
39 |
40 | ### CNN
41 | * [Convolutional Neural Network in 5 minutes](https://medium.com/@init_27/convolutional-neural-network-in-5-minutes-8f867eb9ca39) Sanyam Bhutani
42 | * [CS231n Convolutional Neural Networks for Visual Recognition](http://cs231n.github.io/convolutional-networks/)
43 |
44 | ### Kaggle
45 | * [FastAI Kaggle Starter Kit](https://www.kaggle.com/timolee/fastai-kaggle-starter-kit-lb-0-33) Tim Lee
46 |
47 | ### Jupyter Notebook
48 |
49 | * [Debugging Jupyter notebooks](https://davidhamann.de/2017/04/22/debugging-jupyter-notebooks/)
50 |
51 | ### and More
52 |
53 | * [Do smoother areas of the error surface lead to better generalization? (An experiment inspired by the first lecture of the fast.ai MOOC)](https://medium.com/@radekosmulski/do-smoother-areas-of-the-error-surface-lead-to-better-generalization-b5f93b9edf5b) Radek
54 | * [Contributing to fast.ai](https://medium.com/@wgilliam/86f2c05d72aa) Wayde Gilliam
55 | * [Getting Computers To See Better Than Humans](https://medium.com/@ArjunRajkumar/getting-computers-to-see-better-than-humans-346d96634f73) Arjun Rajkumar
56 | * [Fun with small image data-sets](https://medium.com/@nikhil.b.k_13958/fun-with-small-image-data-sets-8c83d95d0159) Nikhil B
57 | * [Fun with small image data-sets (Part 2)](https://medium.com/@nikhil.b.k_13958/fun-with-small-image-data-sets-part-2-54d683ca8c96) Nikhil B
58 | * [Structured Deep Learning](https://medium.com/@keremturgutlu/structured-deep-learning-b8ca4138b848) Kerem Turgutlu
59 | * [Exploring Stochastic Gradient Descent with Restarts (SGDR)](https://medium.com/38th-street-studios/exploring-stochastic-gradient-descent-with-restarts-sgdr-fa206c38a74e) Mark Hoffman
60 | * [How do We Train Neural Networks?](https://towardsdatascience.com/how-do-we-train-neural-networks-edd985562b73) Vitaly Bushaev
61 |
62 | ### Reference Blogs
63 |
64 | * [Understanding LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) Christopher Olah
65 | * [Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano](http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/) Denny Britz
66 |
67 | ## Research Publications
68 |
69 | * [A systematic study of the class imbalance problem in convolutional neural networks](https://arxiv.org/pdf/1710.05381.pdf)
70 | * [What’s your ML Test Score? A rubric for ML production systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45742.pdf) (NIPS 2016)
73 | * [ADAM: A Method for Stochastic Optimization](https://arxiv.org/pdf/1412.6980.pdf) (ICLR 2015)
74 | * [A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay](https://arxiv.org/abs/1803.09820) Leslie Smith, March 2018
75 | * [Cyclical Learning Rates for Training Neural Networks](https://arxiv.org/abs/1506.01186) (WACV 2017) Leslie Smith
76 | * [Fixing Weight Decay Regularization in Adam](https://arxiv.org/abs/1711.05101) Ilya Loshchilov, Frank Hutter (Submitted on 14 Nov 2017)
77 | * [Learning Distributed Representations of Concepts](http://www.cs.toronto.edu/~hinton/absps/families.pdf) Geoffrey Hinton, 1986
78 | * [Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
79 |
80 | ### Key Research Papers
81 | * [A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay](https://arxiv.org/abs/1803.09820), Leslie N. Smith, 2018
82 | * [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) Kaiming He, ILSVRC 2015 classification task winner
83 | * [Visualizing and Understanding Convolutional Networks](http://www.matthewzeiler.com/wp-content/uploads/2017/07/arxive2013.pdf) Zeiler & Fergus, 2013
84 |
85 |
86 | ## Videos
87 |
88 | * [The wonderful and terrifying implications of computers that can learn](https://www.ted.com/talks/jeremy_howard_the_wonderful_and_terrifying_implications_of_computers_that_can_learn) (Ted Talk by Jeremy Howard 2014)
89 | * [A Visual and Intuitive Understanding of Deep Learning](https://www.youtube.com/embed/Oqm9vsf_hvU?autoplay=1&feature=oembed&wmode=opaque) Otavio Good of Google, AI Conf SF Sep 2017
90 | * [Ian Goodfellow - Numerical Computation for Deep Learning - AI With The Best Oct 14-15, 2017](https://www.youtube.com/watch?v=XlYD8jn1ayE&t=5m40s)
91 | * [Ali Rahimi's talk at NIPS (NIPS 2017 Test-of-Time Award presentation)](https://www.youtube.com/watch?v=Qi1Yry33TQE)
92 |
93 |
94 |
--------------------------------------------------------------------------------
/takeaways.md:
--------------------------------------------------------------------------------
1 | # Takeaways / Tips
2 |
3 | ## Modeling
4 | 1. When training a model, we can "ignore" or not worry as much about **overfitting** as long as the validation error is decreasing.
5 |
6 |
7 | 2. **Image Sizes** are generally 224x224 or 299x299, which are the sizes that ImageNet models are generally trained at. You get the best results if you use the same size as the original training size. Since people don't tend to mention what size was used originally, you can try both with something like dogs v cats and see which works better. More recent models seem to generally use 299.
8 |
9 | 3. **Rare Cases** You can replicate the rare classes to make them more balanced. Never throw away data!
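For illustration, a minimal pure-Python sketch of balancing by replication (the class names and counts here are made up): duplicate random samples of the rare class until it matches the size of the largest class.

```python
import random

# hypothetical labelled dataset with a rare class ("lynx")
samples = [("cat", i) for i in range(100)] + [("lynx", i) for i in range(10)]

# group samples by class label
by_class = {}
for label, x in samples:
    by_class.setdefault(label, []).append((label, x))

target = max(len(v) for v in by_class.values())  # size of the largest class

# replicate (sample with replacement) each smaller class up to the target size
balanced = []
for items in by_class.values():
    extra = [random.choice(items) for _ in range(target - len(items))]
    balanced.extend(items + extra)

counts = {}
for label, _ in balanced:
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {'cat': 100, 'lynx': 100}
```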
10 |
11 | ### Reducing Overfitting
12 | * data augmentation
13 | * pretrained network
14 | * gradually increasing image size
15 | * differential learning rates
16 | * SGDR
17 | * dropouts
18 | * higher resolution images
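As a toy illustration of the first technique, data augmentation manufactures extra training examples from the ones you already have, e.g. a horizontal flip (a pure-Python sketch with a made-up 2x3 "image"):

```python
# a toy 2x3 "image" as nested lists of pixel values
img = [[1, 2, 3],
       [4, 5, 6]]

flipped = [row[::-1] for row in img]  # horizontal flip: reverse each row
augmented = [img, flipped]            # training set now has the original + a flipped copy

print(flipped)  # [[3, 2, 1], [6, 5, 4]]
```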
19 |
20 | # Best Practices
21 |
22 | 1. When opening a notebook in the fastai library, make a copy with the prefix **tmp**. "tmp" files are included in the fastai repo's [.gitignore](https://github.com/fastai/fastai/blob/master/.gitignore)
23 |
24 |
--------------------------------------------------------------------------------
/tips_faq_beginners.md:
--------------------------------------------------------------------------------
1 | # Fastai FAQs for Beginners
2 |
3 | ## Q1: How to ask for help for fastai
4 | - http://forums.fast.ai/t/how-to-ask-for-help/10421
5 |
6 | - Make sure you enclose your code in triple back ticks. Example:
7 |
8 | >use this code - notice the three backticks enclosing the code block:
9 |
10 | 
11 |
12 |
13 | >to render this:
14 |
15 | ```bash
16 | ~/.conda/envs/tf-gpu/lib/python3.6/multiprocessing/popen_fork.py in __init__(self, process_obj)
17 | 18 sys.stderr.flush()
18 | 19 self.returncode = None
19 | ---> 20 self._launch(process_obj)
20 | 21
21 | 22 def duplicate_for_child(self, fd):
22 |
23 | ~/.conda/envs/tf-gpu/lib/python3.6/multiprocessing/popen_fork.py in _launch(self, process_obj)
24 | 65 code = 1
25 | 66 parent_r, child_w = os.pipe()
26 | ---> 67 self.pid = os.fork()
27 | 68 if self.pid == 0:
28 | 69 try:
29 |
30 | OSError: [Errno 12] Cannot allocate memory
31 | ```
32 |
33 |
34 | ---
35 | ## Q2: Where can I put _my_ Jupyter Notebook?
36 |
37 | :red_circle: **NOTE:** Do NOT put your Jupyter Notebook under the `/data/` directory! Here's [the link](http://forums.fast.ai/t/how-to-remove-ipynb-checkpoint/8532/2) for why.
38 |
39 | ### Option 1 (default): under `/courses`
40 | The default location is under the `dl1` folder, wherever you've cloned the repo on your GPU machine.
41 | >my example
42 | ```bash
43 | (fastai) paperspace@psgyqmt1m:~$ ls
44 | anaconda3 data downloads fastai
45 | ```
46 | - Paperspace: `/home/paperspace/fastai/courses/dl1`
47 | - AWS: `/home/ubuntu/fastai/courses/dl1`
48 |
49 | ### Option 2: where you want
50 | If you change the default **location of your notebook**, you'll need to update your `.bashrc` file. Add in the path to where you've cloned the fastai GitHub repo:
51 | - for me, my notebooks are in a "projects" directory: `~/projects`
52 | - my `fastai` repo is cloned at the root level, so it is here: `~/fastai`
53 |
54 | in the file `.bashrc` add this path:
55 | ```
56 | export PYTHONPATH=$PYTHONPATH:~/fastai
57 | ```
58 | **Reminder:** don't forget to run (or `source`) your `.bashrc` file:
59 | 1. add path where fastai repo is to `.bashrc`
60 | 2. save and exit
61 | 3. source it: `source ~/.bashrc`
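To sanity-check that the path took effect, you can inspect `sys.path` from Python (directories listed in `PYTHONPATH` are added to `sys.path` at interpreter startup; `~/fastai` is just the example clone location from above):

```python
import os
import sys

# the hypothetical clone location used in the example above
repo = os.path.expanduser("~/fastai")

# PYTHONPATH entries appear on sys.path, so the repo should show up here
on_path = any(os.path.expanduser(p) == repo for p in sys.path)
print("fastai repo on sys.path:", on_path)
```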
62 |
63 | ### Option 3: using `pip install`
64 | Note that if you installed via `pip install`, you don't need to specify the path (as in Option 2) or put your notebook in the courses folder (as in Option 1).
65 | However, fastai is still being actively updated, so there can be a delay before the latest version is available directly via pip.
66 | You can try installing straight from GitHub:
67 | `pip install https://github.com/fastai/fastai/archive/master.zip`
68 |
69 | ---
70 | ## Q3: What does my directory structure look like?
71 | >my path
72 | ```bash
73 | PATH = "/home/ubuntu/data/dogscats/"
74 | ```
75 |
76 | >looking at my directory structure
77 | ```bash
78 | !tree {PATH} -d
79 | ```
80 | ```bash
81 | /home/ubuntu/data/dogscats/
82 | ├── models
83 | ├── sample
84 | │ ├── models
85 | │ ├── tmp
86 | │ ├── train
87 | │ │ ├── cats
88 | │ │ └── dogs
89 | │ └── valid
90 | │ ├── cats
91 | │ └── dogs
92 | ├── test
93 | ├── train
94 | │ ├── cats
95 | │ └── dogs
96 | └── valid
97 | ├── cats
98 | └── dogs
99 | ```
100 | ### Notes on directories
101 | * `models` directory: created automatically
102 | * `sample` directory: you create this with a small sub-sample, for testing code
103 | * `test` directory: put any test data there if you have it
104 | * `train`/`test` directory: you create these and separate the data using your own data sample
105 | * `tmp` directory: if you have this, it was automatically created after running models
106 | * fastai / keras code automatically picks up the **label** of your categories based on your folders. Hence, in this example, the two labels are: dogs, cats
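A small sketch of how labels can be inferred from folder names (it builds a throwaway `train/` tree in a temp directory, using the cats/dogs names from the example above):

```python
import os
import tempfile

# build a throwaway train/ directory with one sub-folder per class
root = tempfile.mkdtemp()
for label in ("cats", "dogs"):
    os.makedirs(os.path.join(root, "train", label))

# the class labels are simply the sub-directory names under train/
labels = sorted(os.listdir(os.path.join(root, "train")))
print(labels)  # ['cats', 'dogs']
```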
107 |
108 | ### Notes on image file names
109 | * image file names are not important; you can name them whatever you want
110 |
111 |
112 | ### Getting file counts
113 | >looking at file counts
114 | ```bash
115 | # print number of files in each folder
116 |
117 | print("training data: cats")
118 | !ls -l {PATH}train/cats | grep ^[^dt] | wc -l
119 |
120 | print("training data: dogs")
121 | !ls -l {PATH}train/dogs | grep ^[^dt] | wc -l
122 |
123 | print("validation data: cats")
124 | !ls -l {PATH}valid/cats | grep ^[^dt] | wc -l
125 |
126 | print("validation data: dogs")
127 | !ls -l {PATH}valid/dogs | grep ^[^dt] | wc -l
128 |
129 | print("test data")
130 | !ls -l {PATH}test1 | grep ^[^dt] | wc -l
131 | ```
132 | >my output
133 | ```bash
134 | training data: cats
135 | 11501
136 | training data: dogs
137 | 11501
138 | validation data: cats
139 | 1001
140 | validation data: dogs
141 | 1001
142 | test data
143 | 12501
144 | ```
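If you prefer Python to shell, roughly the same counts can be taken with the standard-library `pathlib`. This sketch builds a throwaway directory tree so it is self-contained; in practice you would point it at your own `PATH`:

```python
import pathlib
import tempfile

# throwaway stand-in for {PATH}train/cats etc., with a few dummy files
path = pathlib.Path(tempfile.mkdtemp())
for sub, n in [("train/cats", 3), ("train/dogs", 3), ("valid/cats", 2)]:
    d = path / sub
    d.mkdir(parents=True)
    for i in range(n):
        (d / f"{i}.jpg").touch()

# count the files in each class folder, like the `ls | wc -l` calls above
counts = {str(d.relative_to(path)): sum(1 for f in d.iterdir() if f.is_file())
          for d in sorted(path.glob("*/*"))}
print(counts)
```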
145 | ---
146 | ## Q4: What is a good train/validation/test split?
147 | - can do `80/20` (train/validation)
148 | - if you have or are creating a 'test' split, use for (train/validation/test):
149 | - can do `80/15/5`
150 | - can do `70/20/10`
151 | - can do `60/20/20`
152 |
153 | **Note:** Depending on who the instructor is, they use various naming conventions:
154 | - train/test and then **validation** for holdout data
155 | - train/validation and then **test** for holdout data
156 |
157 | It's important to understand that:
158 | - in the case of train/test, the test set is used to test for **generalization**
159 | - the **holdout data** is a second test set
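A minimal sketch of an 80/15/5 split over a list of filenames (the filenames are made up; shuffle first so the split is random):

```python
import random

files = [f"img_{i}.jpg" for i in range(1000)]  # hypothetical file list
random.shuffle(files)                          # randomize before splitting

n = len(files)
n_train = int(0.80 * n)
n_valid = int(0.15 * n)

train = files[:n_train]
valid = files[n_train:n_train + n_valid]
test  = files[n_train + n_valid:]              # remaining 5%

print(len(train), len(valid), len(test))  # 800 150 50
```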
160 |
161 | ---
162 | ## Q5: How do I copy files or data from my local computer to a cloud machine (Paperspace, AWS, etc)?
163 |
164 | [Instructions on using `scp` command to transfer files from platforms](https://github.com/reshamas/fastai_deeplearn_part1/blob/master/tools/copy_files_local_to_cloud.md)
165 |
166 | ---
167 | ## Q6: Where do I put my sample images?
168 | [testing sample images after the model has been created](http://forums.fast.ai/t/wiki-lesson-1/9398/282)
169 |
--------------------------------------------------------------------------------
/tips_prereqs.md:
--------------------------------------------------------------------------------
1 | # Things to Know Before Running Fastai Library
2 |
3 |
4 | ## Q1: What is train/valid/test?
5 |
6 |
7 | ## Q2: How do I divide up train/valid/test?
8 |
9 |
--------------------------------------------------------------------------------
/tips_troubleshooting.md:
--------------------------------------------------------------------------------
1 | # Solving Errors
2 |
3 | ## Latest version of fastai library
4 | Do a `git pull` of the [fastai library](https://github.com/fastai/fastai). Updates may sort out some errors.
5 | ```bash
6 | git pull
7 | ```
8 | ## Update Anaconda packages
9 | ```bash
10 | conda env update
11 | conda update --all
12 | ```
13 |
14 | ## Delete `tmp` directory and rerun
15 |
16 | ## CUDA out of memory error
17 | - interrupt kernel
18 | - reduce batch size
19 | - **RESTART kernel**!
20 |
21 | ## TTA (Test Time Augmentation)
22 | - [forum post](http://forums.fast.ai/t/lesson-2-dog-breeds-error-on-call-of-accuracy-log-preds-y/11965)
23 | - "TTA used to return the average of the augmentations as a prediction. Now it returns the set so you can do with them as you please."
24 |
25 | #### Error with this code
26 | ```python
27 | log_preds,y = learn.TTA()
28 | probs = np.exp(log_preds)
29 | accuracy(log_preds,y), metrics.log_loss(y, probs)
30 | ```
31 | #### Adjust with this code
32 | ```python
33 | log_preds,y = learn.TTA()
34 | preds = np.mean(np.exp(log_preds),0)
35 | ```
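To see what `np.mean(np.exp(log_preds), 0)` computes, here is a pure-Python sketch with made-up numbers: exponentiate the log-probabilities from each augmented pass, then average across the passes:

```python
import math

# hypothetical log-probabilities for one sample, from 3 augmented passes,
# over 2 classes (rows = TTA passes)
log_preds = [[math.log(0.6), math.log(0.4)],
             [math.log(0.8), math.log(0.2)],
             [math.log(0.7), math.log(0.3)]]

# exp undoes the log, recovering probabilities per pass
probs = [[math.exp(lp) for lp in row] for row in log_preds]

# average over the passes (axis 0), as np.mean(..., 0) does
preds = [sum(col) / len(col) for col in zip(*probs)]

print([round(p, 2) for p in preds])  # [0.7, 0.3]
```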
36 |
37 | ---
38 | ## Empty graph with learning rate finder
39 | - try increasing the batch size
40 |
41 | ---
42 |
43 | # Debugging
44 | Note from Jeremy:
45 | Immediately after you get the error, type `%debug` in a cell to enter the debugger. Then use the standard python debugger commands to follow your code to see what’s happening.
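`%debug` is IPython's post-mortem magic; the plain-Python equivalent is `pdb.post_mortem()`. A sketch (the debugger call is commented out so the snippet runs non-interactively):

```python
import pdb
import sys
import traceback

def buggy(x):
    return 1 / x  # raises ZeroDivisionError when x == 0

try:
    buggy(0)
except ZeroDivisionError:
    exc_type, exc_value, tb = sys.exc_info()
    traceback.print_exception(exc_type, exc_value, tb)
    # pdb.post_mortem(tb)  # uncomment to step through the failed frame
```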
46 |
--------------------------------------------------------------------------------
/tools/README.md:
--------------------------------------------------------------------------------
1 | # Tools for Deep Learning
2 |
3 |
4 | [Create an image dataset from scratch](http://forums.fast.ai/t/create-an-image-dataset-from-scratch/9992)
5 |
--------------------------------------------------------------------------------
/tools/check_links.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | # Objective: run a script to check an *.md file to see that all links are valid
4 |
5 | # EXAMPLE of how to run file:
6 | """
7 | ▶ pwd
8 | /Users/reshamashaikh/ds/my_repos/fastai_deeplearn_part1/tools
9 |
10 | my_repos/fastai_deeplearn_part1/tools
11 | ▶ python check_links.py -v /Users/reshamashaikh/ds/my_repos/fastai_deeplearn_part1/README.md
12 | VALID http://www.fast.ai
13 | VALID http://forums.fast.ai/c/part1-v2
14 | VALID http://forums.fast.ai/c/part1v2-beg
15 | VALID https://github.com/fastai/fastai
16 | VALID tools/aws_ami_gpu_setup.md
17 | VALID tools/tmux.md
18 | VALID resources.md
19 |
20 | my_repos/fastai_deeplearn_part1/tools
21 | ▶ python check_links.py -v /Users/reshamashaikh/ds/my_repos/fastai_deeplearn_part1/tools/tmux.md
22 | VALID #section-a
23 | VALID #section-b
24 | VALID #section-c
25 | VALID #section-d
26 | VALID #section-e
27 | VALID https://hackernoon.com/a-gentle-introduction-to-tmux-8d784c404340
28 | VALID https://alekshnayder.com
29 | VALID http://console.aws.amazon.com/
30 |
31 | """
32 |
33 | # Running Python 3
34 |
35 | __author__ = 'taylanbil'
36 |
37 |
38 | import os
39 | import markdown
40 | from argparse import ArgumentParser
41 |
42 | from bs4 import BeautifulSoup
43 |
44 |
45 | class LinkChecker(object):
46 |
47 | def __init__(self, mdfilename, verbose=False):
48 | """
49 | input: mdfilename has to be the full path!!!
50 | """
51 | self.mdfilename = mdfilename
52 | self.path = os.path.abspath(os.path.dirname(mdfilename))
53 | self.soup = self.get_soup()
54 | self.verbose = verbose
55 |
56 | def validate_link(self, link):
57 | if link.startswith('http'):
58 | return True
59 | elif link.startswith('#'):
60 | return bool(self.soup.find_all('a', {'name': link[1:]}))
61 | elif link.startswith('/'):
62 | return os.path.exists(os.path.join(self.path, link[1:]))
63 | else:
64 | return os.path.exists(os.path.join(self.path, link))
65 |
66 | def get_soup(self):
67 | with open(self.mdfilename, 'r') as f:
68 | md = markdown.markdown(f.read())
69 | soup = BeautifulSoup(md, "lxml")
70 | return soup
71 |
72 | def get_links(self):
73 | for link in self.soup.find_all('a', href=True):
74 | yield link['href']
75 |
76 | def process_link(self, link):
77 | isvalid = 'VALID' if self.validate_link(link) else 'INVALID'
78 | if self.verbose or isvalid == 'INVALID':
79 | print('{isvalid}\t{link}'.format(isvalid=isvalid, link=link))
80 |
81 | def main(self):
82 | for link in self.get_links():
83 | self.process_link(link)
84 |
85 |
86 | def get_namespace():
87 | parser = ArgumentParser()
88 | parser.add_argument(
89 | 'mdfilename', help='''full path to the .md file you would like
90 | to check links in''')
91 | parser.add_argument(
92 | '-v', '--verbose', action='store_true',
93 | help='''verbose flag. if specified, prints all links with
94 | results. Otherwise, prints invalid links only''')
95 | return parser.parse_args()
96 |
97 |
98 | if __name__ == '__main__':
99 | ns = get_namespace()
100 | LC = LinkChecker(ns.mdfilename, verbose=ns.verbose)
101 | LC.main()
102 |
103 | # # a test here
104 | # mdfile = '/Users/reshamashaikh/ds/my_repos/fastai_deeplearn_part1/README.md'
105 | # LC = LinkChecker(mdfile)
106 | # LC.main()
107 |
108 |
--------------------------------------------------------------------------------
/tools/copy_files_local_to_cloud.md:
--------------------------------------------------------------------------------
1 | # Copy Files from Local Computer to Cloud Computer
2 | - copy files from local computer to AWS, Paperspace, Google Cloud, etc
3 | - copy files from cloud computer to local
4 | - copy files from local computer to remote machine
5 |
6 | ## Reference
7 | [fastai Forum thread](http://forums.fast.ai/t/lesson-1-part-1-v2-custom-images/10154/16)
8 | - [Stack Overflow](https://stackoverflow.com/questions/4728752/scp-a-bunch-of-files-via-bash-script-there-must-be-a-better-way)
9 | - [Stack Exchange](https://unix.stackexchange.com/questions/232946/how-to-copy-all-files-from-a-directory-to-a-remote-directory-using-scp)
10 |
11 | ## Definition
12 | `scp` = secure copy
13 |
14 | ### General Syntax
15 | `scp -i "path to .pem file" "file to be copied from local machine" username@amazoninstance:'destination folder to copy file on remote machine'`
16 |
17 | ### Examples
18 | ```bash
19 | scp -r . ubuntu@107.22.140.44:~/data/camelhorse
20 | ```
21 |
22 | ```bash
23 | scp -i "path to .pem file" "file to be copied from local machine" username@amazoninstance:'destination folder to copy file on remote machine'
24 | ```
25 |
26 | ```bash
27 | scp -i .ssh/aws-key-fast-ai.pem \
28 |     ubuntu@ec2-35-165-244-148.us-west2.compute.amazonaws.com:~/nbs/Notebooks/Weights/Predictions/test_preds_rms.dat ~/test_preds_rms.dat
29 | ```
30 |
--------------------------------------------------------------------------------
/tools/create_keypair.md:
--------------------------------------------------------------------------------
1 | # Create a keypair
2 |
3 | ### Step 1: go to the appropriate directory in terminal
4 | * In your Terminal, go to `.ssh` folder under your home directory
5 | (Note: Windows users should have Ubuntu installed.)
6 | >my example
7 | `/Users/reshamashaikh/.ssh`
8 |
9 | **Note:** If you do not have the `.ssh` directory, you can create it (make sure you are in your home directory):
10 | `mkdir .ssh`
11 |
12 | ### Step 2: create `id_rsa` files if needed
13 | **Note:** these `id_rsa` files contain a special password for your computer to be able to log onto AWS.
14 |
15 | If you do not have these two files (`id_rsa` and `id_rsa.pub`), create them by typing:
16 | - `ssh-keygen`
17 | - Hit `Enter` 3 times
18 |
19 | >my example
20 | ```bash
21 | % pwd
22 | /Users/reshamashaikh/.ssh
23 | % ls
24 | % ssh-keygen
25 | Generating public/private rsa key pair.
26 | Enter file in which to save the key (/Users/reshamashaikh/.ssh/id_rsa):
27 | Enter passphrase (empty for no passphrase):
28 | Enter same passphrase again:
29 | Your identification has been saved in /Users/reshamashaikh/.ssh/id_rsa.
30 | Your public key has been saved in /Users/reshamashaikh/.ssh/id_rsa.pub.
31 | The key fingerprint is:
32 | SHA256:jmDJes1qOzDi8KynXLGQ098JMSRnbIyt0w7vSgEsr2E reshamashaikh@RESHAMAs-MacBook-Pro.local
33 | The key's randomart image is:
34 | +---[RSA 2048]----+
35 | | .=+ |
36 | |. .== |
37 | |.o +o |
38 | |..+= oo |
39 | |.E.+X. S |
40 | |+o=o=*oo. |
41 | |++.*o.+o. |
42 | |..*.oo |
43 | |o= o+o |
44 | +----[SHA256]-----+
45 | % ls
46 | total 16
47 | -rw------- 1 1675 Dec 17 12:20 id_rsa
48 | -rw-r--r-- 1 422 Dec 17 12:20 id_rsa.pub
49 | %
50 | ```
51 |
52 | ### Step 3: import key files to AWS
53 | (Note: Extra step for Windows users: you will need to copy these files to your hardrive from Ubuntu.)
54 | In AWS, go to **Key Pairs** in left menu and import `id_rsa.pub`. This step connects your local computer to AWS.
55 | Note for Mac Users: can also `cat id_rsa.pub` in terminal, copy and paste it into AWS for "key contents".
56 |
57 |
58 |
59 |
--------------------------------------------------------------------------------
/tools/crestle_run.md:
--------------------------------------------------------------------------------
1 | # Getting Crestle Working - for Newbies
2 | fastai.ai Part 1 v2
3 | Updated: 05-Nov-2017
4 |
5 | ### Why does my notebook have all these errors when I try running it in Crestle?
6 | Answer: the fastai repo in there has outdated materials
7 |
8 | ### What's the easiest way to fix it?
9 |
10 | a) log into [Crestle](https://www.crestle.com) and `Start Jupyter`
11 | b) Hit `New Terminal`
12 | c) `ls`
13 | d) `cd courses`
14 | e) `ls` (you'll see the fastai course there)
15 |
16 | f) `git pull` (update repo)
17 |
18 | OR, if you run into errors because you have added files to the repository, etc., this is a quick fix:
19 | g) `rm -rf fastai` (delete this old version)
20 | h) `git clone https://github.com/fastai/fastai.git` (clone, get updated course files)
21 |
22 | >my example
23 | ```bash
24 | nbuser@jupyter:~$ ls
25 | README.txt courses examples
26 | nbuser@jupyter:~$ cd courses
27 | nbuser@jupyter:~/courses$ ls
28 | fastai
29 | nbuser@jupyter:~/courses$ rm -rf fastai
30 | nbuser@jupyter:~/courses$ git clone https://github.com/fastai/fastai.git
31 | Cloning into 'fastai'...
32 | remote: Counting objects: 1055, done.
33 | remote: Compressing objects: 100% (19/19), done.
34 | remote: Total 1055 (delta 11), reused 17 (delta 9), pack-reused 1026
35 | Receiving objects: 100% (1055/1055), 64.37 MiB | 40.84 MiB/s, done.
36 | Resolving deltas: 100% (598/598), done.
37 | Checking connectivity... done.
38 | Checking out files: 100% (110/110), done.
39 | nbuser@jupyter:~/courses$
40 | ```
41 | And, now my [Lesson 1 notebook](https://s.users.crestle.com/u-fqnc8t2x12/notebooks/courses/fastai/courses/dl1/lesson1.ipynb) works! :boom:
42 |
43 |
44 |
45 |
46 |
--------------------------------------------------------------------------------
/tools/download_data_browser_curlwget.md:
--------------------------------------------------------------------------------
1 |
2 | # Browser extensions for getting data
3 |
4 | ## [Chrome Extension: CurlWget](https://chrome.google.com/webstore/detail/curlwget/jmocjfidanebdlinpbcdkcmgdifblncg?hl=en)
5 |
6 | Note: This is a second way to download the data from Kaggle. The first way is using `kaggle-cli`.
7 |
8 | ## Kaggle Competition
9 | [Planet: Understanding the Amazon from Space](https://www.kaggle.com/c/planet-understanding-the-amazon-from-space)
10 |
11 | The Planet data is here:
12 | https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
13 | * Start downloading the data, then cancel the download.
14 | * In Chrome browser, top right, click bright yellow icon and copy the text. Mine looks like this:
15 |
16 |
17 | Copy and paste the syntax in your terminal.
18 | >my example
19 | ```bash
20 | wget --header="Host: storage.googleapis.com" --header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8" --header="Accept-Language: en-US,en;q=0.8" "https://storage.googleapis.com/kaggle-competitions-data/kaggle/6322/test-jpg.tar.7z?GoogleAccessId=competitions-data@kaggle-161607.iam.gserviceaccount.com&Expires=1510937384&Signature=5%2Bq%2BWbix63zFgHiDlusQsWDmXpAmCZ43%2BNCyXV9v6m%2BaPjEHloBX%2FFX858hPSZohUXUs3kT9gbE5zEhQ%2FKjYD8ngPGgPwQYP3IOV4Tn3ku2P2%2FQ8vtE%2FFNUmcqs7rOqC8ZUAoX3TZ8OHSoh%2B1R3zYp0mY%2FjDbhJXPVVsZSsnEynbO0Rg9jsXFN0UH2QgKWhGoYou%2B1W2u6UvUsjgNfYnwgzCzeEjmjN1Fp2we7q18EYgbdv3Y%2BMpP%2BDQxz57%2B%2Bn9Cio%2Bn012qy5hDJec9%2F6PSZ2w%2Bvl0JuazRmaOP2K7L9MgH1zhAlO%2FQy37fC9r8XqOtLqMChYBYKXPHO0qSF6Dw%3D%3D" -O "test-jpg.tar.7z" -c
21 | ```
22 | All the cookies and headers needed for the download are saved in the command. This is also useful for downloading other items hidden behind a login.
23 |
24 | ## Data Location
25 | Option 1: `data` directory could be a sub-directory of where your Jupyter Notebook is located.
26 | Option 2: Use symlinks
27 |
28 | ## Firefox browser extension
29 | https://addons.mozilla.org/en-US/firefox/addon/cliget/
30 |
--------------------------------------------------------------------------------
/tools/download_data_curl.md:
--------------------------------------------------------------------------------
1 | # Download Dataset using `curl`
2 |
3 | ## Getting Data
4 |
5 | Sample data: https://www.kaggle.com/c/bluebook-for-bulldozers
6 | - in this example, we're using Firefox as a browser (you can use another browser)
7 | - go to the website where the data is
8 | - go to Developer section
9 | - method 1: open the JavaScript console via the Developer menu
10 | - method 2: press `ctrl` + `shift` + `i` to bring up the web developer tool
11 | - tab to Network
12 | - go to data row
13 | - right click, copy as "curl" (unix command that downloads data, like wget)
14 | - might want to delete "2.0" in url since it causes problems
15 | - `curl url_link -o bulldozers.zip` (`-o` sets the output file, so give it a suitable file name)
16 |
17 | ## Setting up the Data Directory
18 | - `mkdir bulldozers`
19 | - `mv bulldozers.zip bulldozers/`
20 | - `sudo apt install unzip` (or, on Mac: `brew install unzip`)
21 | - `cd bulldozers`
22 | - `unzip bulldozers.zip`
23 |
24 |
25 |
--------------------------------------------------------------------------------
/tools/download_data_kaggle_cli.md:
--------------------------------------------------------------------------------
1 | # Kaggle CLI
2 | (**CLI** = **C**ommand **L**ine **I**nterface)
3 |
4 | ## Resource
5 | [Kaggle CLI Wiki](http://wiki.fast.ai/index.php/Kaggle_CLI)
6 |
7 | ## Installation
8 | Check to see if `kaggle-cli` is installed:
9 | kaggle-cli --version
10 |
11 | Install `kaggle-cli`:
12 | pip install kaggle-cli
13 | or pip3 install kaggle-cli
14 |
15 | May need to **update package** if you run into errors:
16 | pip install kaggle-cli --upgrade
17 | or pip3 install kaggle-cli --upgrade
18 |
19 |
20 | ---
21 |
22 | ## [Kaggle Competition Datasets](https://www.kaggle.com/datasets)
23 | Note 1: You must have a Kaggle user ID and password. If you logged in to Kaggle using Facebook or LinkedIn, you'll have to reset your password, as a password is needed for command-line access to the data.
24 |
25 | Note 2: Pick a competition, and ensure you have **accepted the rules** of that competition. Otherwise, you will not be able to download the data using the CLI.
26 |
27 |
28 |
29 | ### Step 1: Identify the competition I will use
30 | https://www.kaggle.com/c/dogs-vs-cats
31 |
32 | **Note:** the competition name can be found in the url; here it is **dogs-vs-cats**
33 |
34 | ### Step 2: Accept competition rules
35 | https://www.kaggle.com/c/dogs-vs-cats/rules
36 |
37 | ### Step 3: Set up data directory
38 | ls
39 | mkdir data
40 | cd data
41 | >my example
42 | ```bash
43 | ubuntu@ip-10-0-0-13:~$ ls
44 | anaconda2 anaconda3 downloads git nbs temp
45 | ubuntu@ip-10-0-0-13:~$ mkdir data
46 | ubuntu@ip-10-0-0-13:~$ cd data
47 | ```
48 |
49 | ### Step 4a: Download data (try 1)
50 | Syntax:
51 | kg config -g -u 'username' -p 'password' -c 'competition'
52 | kg download
53 |
54 | Note: Here's an example of the warning message I received when I tried to download data before accepting the rules of the competition:
55 | >my example
56 | ```bash
57 | ubuntu@ip-10-0-0-13:~/data$ kg config -g -u 'reshamashaikh' -p 'xxx' -c dogs-vs-cats
58 | ubuntu@ip-10-0-0-13:~/data$ kg download
59 | Starting new HTTPS connection (1): www.kaggle.com
60 | downloading https://www.kaggle.com/c/dogs-vs-cats/download/sampleSubmission.csv
61 |
62 | sampleSubmission.csv N/A% | | ETA: --:--:-- 0.0 s/B
63 |
64 | Warning: download url for file sampleSubmission.csv resolves to an html document rather than a downloadable file.
65 | Is it possible you have not accepted the competition's rules on the kaggle website?
66 | ```
67 |
68 | ### Step 4b: Download data (try 2)
69 | Note 1: I have accepted the competition rules; I will try downloading again
70 | kg config -g -u 'username' -p 'password' -c 'competition'
71 | kg download
72 | >my example
73 | ```bash
74 | ubuntu@ip-10-0-0-13:~/data$ kg config -g -u 'reshamashaikh' -p 'xxx' -c dogs-vs-cats
75 | ubuntu@ip-10-0-0-13:~/data$ kg download
76 | Starting new HTTPS connection (1): www.kaggle.com
77 | downloading https://www.kaggle.com/c/dogs-vs-cats/download/sampleSubmission.csv
78 |
79 | Starting new HTTPS connection (1): storage.googleapis.com
80 | sampleSubmission.csv 100% |##################################################################################################################| Time: 0:00:00 320.2 KiB/s
81 |
82 | downloading https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip
83 |
84 | test1.zip 100% |#############################################################################################################################| Time: 0:00:08 32.5 MiB/s
85 |
86 | downloading https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
87 |
88 | train.zip 100% |#############################################################################################################################| Time: 0:00:17 31.4 MiB/s
89 | ```
90 |
91 | ### Download Kaggle Data (another way)
92 | Note: sometimes setting up the configuration results in an error the next time you try to download another competition. You may want to bypass configuration and directly include your user ID, password and competition name in one command line.
93 |
94 | ```bash
95 | kg download -u 'reshamashaikh' -p 'xxx' -c statoil-iceberg-classifier-challenge
96 | ```
97 |
98 | ### Step 5: Look at data that was downloaded
99 | ls -alt
100 | ```bash
101 | ubuntu@ip-10-0-0-13:~/data$ ls -alt
102 | total 833964
103 | -rw-rw-r-- 1 ubuntu ubuntu 569546721 Nov 4 18:24 train.zip
104 | drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 4 18:24 .
105 | -rw-rw-r-- 1 ubuntu ubuntu 284321224 Nov 4 18:24 test1.zip
106 | -rw-rw-r-- 1 ubuntu ubuntu 88903 Nov 4 18:23 sampleSubmission.csv
107 | drwxr-xr-x 22 ubuntu ubuntu 4096 Nov 4 18:23 ..
108 | ubuntu@ip-10-0-0-13:~/data$
109 | ```
110 |
111 | ### Step 6: Unzip Files
112 | Note 1: You will need to install and use `unzip` to unzip files.
113 |
114 | For Windows users:
115 | 1. First, download Ubuntu from the Microsoft Store
116 |
117 | 2. Open PowerShell as Administrator and run:`Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux`
118 |
119 | 3. Once the download has completed, select "Launch". This will open a console window. Wait for the installation to complete, then you will be prompted to create your Linux user account.
120 |
121 | 4. Create your Linux username and password.
122 |
123 | 5. Go to Control Panel and turn on Developer Mode.
124 |
125 | 6. Run `bash` from command-prompt. After that you can follow same as Linux users guide.
126 |
127 |
128 | For Linux users:
129 | ```bash
130 | sudo apt install unzip
131 | unzip train.zip
132 | unzip -q test1.zip  # -q unzips quietly, suppressing the per-file output
133 | ```
134 |
135 | ```bash
136 | ubuntu@ip-10-0-0-13:~/nbs/data$ ls train/dogs/dog.1.jpg
137 | train/dogs/dog.1.jpg
138 | ubuntu@ip-10-0-0-13:~/nbs/data$ ls -l train/dogs/ | wc -l
139 | 12501
140 | ubuntu@ip-10-0-0-13:~/nbs/data$
141 |
142 |
143 | ubuntu@ip-10-0-0-13:~/nbs/data$ ls -l train/cats/ | wc -l
144 | 12501
145 | ubuntu@ip-10-0-0-13:~/nbs/data$
146 | ubuntu@ip-10-0-0-13:~/nbs/data$ ls test1 | wc -l
147 | 12500
148 | ubuntu@ip-10-0-0-13:~/nbs/data$
149 | ```
150 | Note: `ls -l | wc -l` also counts the `total` header line printed by `ls -l`, so 12501 corresponds to 12500 files.
150 |
151 | ---
152 | ## Kaggle - Submit Results
153 | ```bash
154 | kg submit <predictions-file> -u <username> -p <password> -c <competition> -m "<message>"
155 | ```
156 | > my example
157 | ```bash
158 | /home/ubuntu/data/iceberg/sub
159 | (fastai) ubuntu@ip-172-31-2-59:~/data/iceberg/sub$
160 | ```
161 | ```bash
162 | kg submit resnext50_sz150_zm13.csv -u 'reshamashaikh' -p 'xxx' -c statoil-iceberg-classifier-challenge
163 | ```
164 |
165 | ---
166 | ### Jeremy’s Setup
167 | It is good to copy 100 or so files to the sample directory; that is enough to check that the scripts are working.
168 |
169 | Advice 1: Split part of the TRAIN data into a VALIDATION set
170 | TASK: move 1000 images each of dogs / cats into valid
171 | ```bash
172 | > ls valid/cats/ | wc -l
173 | 1000
174 | > ls valid/dogs/ | wc -l
175 | 1000
176 | ```
177 |
178 | Advice 2: Do all of your work on sample data
179 | ```bash
180 | > ls sample/train
181 |
182 | > ls sample/valid
183 |
184 | > ls sample/train/cats | wc -l
185 | 8
186 | > ls sample/valid/cats | wc -l
187 | 4
188 | ```
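The two tasks above can be sketched as a shell script. This is a hypothetical sketch (not from the lecture): it first builds a toy `train/` layout with empty files so it runs anywhere; on the real Kaggle data, skip the setup loop and run only the `shuf`/`mv` and `head`/`cp` parts.

```bash
# --- Setup: toy train/ layout (skip this on the real data) ---
mkdir -p train/cats train/dogs
for i in $(seq 1 3000); do
  touch "train/cats/cat.$i.jpg" "train/dogs/dog.$i.jpg"
done

# --- Advice 1: move a random 1000 images per class into valid/ ---
mkdir -p valid/cats valid/dogs
for cls in cats dogs; do
  ls "train/$cls" | shuf -n 1000 | while read -r f; do
    mv "train/$cls/$f" "valid/$cls/"
  done
done

# --- Advice 2: copy a few images into sample/ for quick experiments ---
mkdir -p sample/train/cats sample/train/dogs sample/valid/cats sample/valid/dogs
for cls in cats dogs; do
  ls "train/$cls" | head -n 8 | xargs -I{} cp "train/$cls/{}" "sample/train/$cls/"
  ls "valid/$cls" | head -n 4 | xargs -I{} cp "valid/$cls/{}" "sample/valid/$cls/"
done

ls valid/cats | wc -l         # 1000
ls sample/train/cats | wc -l  # 8
```

Moving (rather than copying) into `valid/` matters: a validation image that also stays in `train/` would leak into training and inflate your accuracy.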
187 | ---
188 | ## Kaggle API
189 | Another option is to use the Kaggle API
190 | https://github.com/Kaggle/kaggle-api
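A minimal sketch of the same download/submit workflow using the official `kaggle` CLI (command names from the kaggle-api README). It assumes you have placed an API token in `~/.kaggle/kaggle.json` and accepted the competition rules on the website, so it is not runnable as-is without credentials.

```bash
pip install kaggle

# download the competition data
kaggle competitions download -c dogs-vs-cats

# submit a predictions file
kaggle competitions submit -c dogs-vs-cats -f submission.csv -m "my first submission"
```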
191 |
192 |
--------------------------------------------------------------------------------
/tools/getting_image_data.md:
--------------------------------------------------------------------------------
1 | # Getting Image Data
2 |
3 | ## Data Sources
4 | * [New York Public Library Digital Collections](https://digitalcollections.nypl.org) referred by [Enigma](https://www.enigma.com).
5 | * Google Images: https://images.google.com/
6 | * ImageNet: http://www.image-net.org
7 |
8 | ## Search for image of interest
9 | - Example: Search for images of horses
10 |
11 | ## Downloading Image Data using a Chrome Plug-in
12 | - This extension lets you bulk download images from a website.
13 | - Use the Image Downloader plugin, which downloads images:
14 |   https://chrome.google.com/webstore/detail/image-downloader/cnpniohnfphhjihaiiggeabnkjhpaldj
15 | - There are similar extensions for other browsers such as Firefox, Opera, etc.
16 |
21 | ---
22 | #### My Sample Data
23 | * Horses: https://digitalcollections.nypl.org/search/index?&keywords=horses&sort=score+desc#/?scroll=150
24 | * Camels: https://digitalcollections.nypl.org/search/index?utf8=%E2%9C%93&keywords=camels#/?scroll=180
25 |
--------------------------------------------------------------------------------
/tools/jupyter_notebook.md:
--------------------------------------------------------------------------------
1 | # Jupyter Notebook Commands & Shortcuts
2 |
3 | In [Kaggle 2017 data science survey](https://www.kaggle.com/surveys/2017) of 16K data scientists, Jupyter Notebook came up as 3rd most important self-reported tool for data science.
4 |
5 | ## Notebook Features
6 | * can add text, images, code - all in one place
7 | * can document what we're doing as we go along and code
8 | * can put pictures, videos, html tables, interactive widgets
9 | * great experimentation environment
10 |
11 | ## Help
12 | * `h` shows the list of shortcuts (press Esc first if you are editing a cell)
13 |
14 | ## Notebook Commands / Shortcuts
15 | * Shift + Enter: run cell
16 | * Shift + Tab, pressed once: tells you what parameters to pass
17 | * Shift + Tab, pressed 3 times: gives additional info about the method
18 |
19 | ### Select multiple cells
20 | * Esc, then Shift + :arrow_up: extends the selection to cells above
21 | * Esc, then Shift + :arrow_down: extends the selection to cells below
22 |
23 |
24 | ## Notebook Source Code Access
25 |
26 | ### to look at documentation for code (or a function)
27 | * `?` + function name
28 | * Example: `?ImageClassifierData.from_paths`
29 |
30 | ### to look at source code for a function
31 | * `??` + function name
32 | * Example: `??ImageClassifierData.from_paths`
33 |
34 | ### to find out where a particular function or class comes from
35 | * Type the function name, then Shift + Enter
36 | * Example input: `ImageClassifierData` + Shift + Enter
37 | * Example output: `fastai.dataset.ImageClassifierData`
38 | * Example input: `display` + Shift + Enter
39 | * Example output: ``
40 |
41 |
42 | ### to find out what parameters a function can take (also shows default parameter values)
43 | * Inside the function's parentheses, press Shift + Tab
44 | * `object`, then Tab shows all the options for that object or function
45 |
46 | ## Convert your notebooks to .md
47 | ```bash
48 | jupyter nbconvert --to