├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LA-MCTS-baselines
    ├── Bayesian-Optimization
    │   ├── functions.py
    │   └── run.py
    └── Nevergrad
    │   ├── functions.py
    │   └── run.py
├── LA-MCTS
    ├── README.md
    ├── functions
    │   ├── README.md
    │   ├── functions.py
    │   ├── mujoco_functions.py
    │   └── visualize_policy.py
    ├── images
    │   └── mujoco_experiments.png
    ├── lamcts
    │   ├── Classifier.py
    │   ├── MCTS.py
    │   ├── Node.py
    │   ├── __init__.py
    │   └── utils.py
    ├── requirements.txt
    ├── revision.patch
    ├── run.py
    ├── setup.py
    └── test.py
├── LICENSE
├── LaNAS
    ├── Distributed_LaNAS
    │   ├── README.md
    │   ├── clientX
    │   │   ├── client.py
    │   │   ├── continue_train.py
    │   │   ├── model.py
    │   │   ├── nasnet_set.py
    │   │   ├── operations.py
    │   │   ├── train_client.py
    │   │   └── utils.py
    │   ├── collect_results.py
    │   ├── launch_clients.sh
    │   ├── read_results.py
    │   ├── server
    │   │   ├── Classifier.py
    │   │   ├── MCTS.py
    │   │   ├── Node.py
    │   │   ├── net_training.py
    │   │   └── search_space.zip
    │   └── total_trace.json
    ├── LaNAS_NASBench101
    │   ├── Classifier.py
    │   ├── MCTS.py
    │   ├── Node.py
    │   ├── README.md
    │   ├── extract_end_time.py
    │   ├── net_training.py
    │   └── our_past_results.txt
    ├── LaNet
    │   ├── CIFAR10
    │   │   ├── README.md
    │   │   ├── auto_augment.py
    │   │   ├── model.py
    │   │   ├── nasnet_set.py
    │   │   ├── operations.py
    │   │   ├── test.py
    │   │   ├── train.py
    │   │   └── utils.py
    │   └── README.md
    ├── README.md
    └── one-shot_LaNAS
    │   ├── Evaluate
    │       ├── generator.py
    │       ├── individual_model.py
    │       ├── operations.py
    │       ├── run.sh
    │       ├── super_individual_train.py
    │       ├── translator.py
    │       └── utils.py
    │   ├── LaNAS
    │       ├── Classifier.py
    │       ├── MCTS.py
    │       ├── Node.py
    │       └── mlp_predictor.py
    │   ├── README.md
    │   ├── read_result.py
    │   ├── result.txt
    │   ├── search.py
    │   ├── supernet
    │       ├── generator.py
    │       ├── operations.py
    │       ├── supernet_model.py
    │       ├── supernet_train.py
    │       └── utils.py
    │   └── train.py
└── README.md


/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Code of Conduct
 2 | 
 3 | ## Our Pledge
 4 | 
 5 | In the interest of fostering an open and welcoming environment, we as
 6 | contributors and maintainers pledge to make participation in our project and
 7 | our community a harassment-free experience for everyone, regardless of age, body
 8 | size, disability, ethnicity, sex characteristics, gender identity and expression,
 9 | level of experience, education, socio-economic status, nationality, personal
10 | appearance, race, religion, or sexual identity and orientation.
11 | 
12 | ## Our Standards
13 | 
14 | Examples of behavior that contributes to creating a positive environment
15 | include:
16 | 
17 | * Using welcoming and inclusive language
18 | * Being respectful of differing viewpoints and experiences
19 | * Gracefully accepting constructive criticism
20 | * Focusing on what is best for the community
21 | * Showing empathy towards other community members
22 | 
23 | Examples of unacceptable behavior by participants include:
24 | 
25 | * The use of sexualized language or imagery and unwelcome sexual attention or
26 | advances
27 | * Trolling, insulting/derogatory comments, and personal or political attacks
28 | * Public or private harassment
29 | * Publishing others' private information, such as a physical or electronic
30 | address, without explicit permission
31 | * Other conduct which could reasonably be considered inappropriate in a
32 | professional setting
33 | 
34 | ## Our Responsibilities
35 | 
36 | Project maintainers are responsible for clarifying the standards of acceptable
37 | behavior and are expected to take appropriate and fair corrective action in
38 | response to any instances of unacceptable behavior.
39 | 
40 | Project maintainers have the right and responsibility to remove, edit, or
41 | reject comments, commits, code, wiki edits, issues, and other contributions
42 | that are not aligned to this Code of Conduct, or to ban temporarily or
43 | permanently any contributor for other behaviors that they deem inappropriate,
44 | threatening, offensive, or harmful.
45 | 
46 | ## Scope
47 | 
48 | This Code of Conduct applies within all project spaces, and it also applies when
49 | an individual is representing the project or its community in public spaces.
50 | Examples of representing a project or community include using an official
51 | project e-mail address, posting via an official social media account, or acting
52 | as an appointed representative at an online or offline event. Representation of
53 | a project may be further defined and clarified by project maintainers.
54 | 
55 | ## Enforcement
56 | 
57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
58 | reported by contacting the project team at <opensource-conduct@fb.com>. All
59 | complaints will be reviewed and investigated and will result in a response that
60 | is deemed necessary and appropriate to the circumstances. The project team is
61 | obligated to maintain confidentiality with regard to the reporter of an incident.
62 | Further details of specific enforcement policies may be posted separately.
63 | 
64 | Project maintainers who do not follow or enforce the Code of Conduct in good
65 | faith may face temporary or permanent repercussions as determined by other
66 | members of the project's leadership.
67 | 
68 | ## Attribution
69 | 
70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
72 | 
73 | [homepage]: https://www.contributor-covenant.org
74 | 
75 | For answers to common questions about this code of conduct, see
76 | https://www.contributor-covenant.org/faq
77 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to LaMCTS
 2 | We want to make contributing to this project as easy and transparent as
 3 | possible.
 4 | 
 5 | ## Pull Requests
 6 | We actively welcome your pull requests.
 7 | 
 8 | 1. Fork the repo and create your branch from `master`.
 9 | 2. If you've added code that should be tested, add tests.
10 | 3. If you've changed APIs, update the documentation.
11 | 4. Ensure the test suite passes.
12 | 5. Make sure your code lints.
13 | 6. If you haven't already, complete the Contributor License Agreement ("CLA").
14 | 
15 | ## Contributor License Agreement ("CLA")
16 | In order to accept your pull request, we need you to submit a CLA. You only need
17 | to do this once to work on any of Facebook's open source projects.
18 | 
19 | Complete your CLA here: <https://code.facebook.com/cla>
20 | 
21 | ## Issues
22 | We use GitHub issues to track public bugs. Please ensure your description is
23 | clear and has sufficient instructions to be able to reproduce the issue.
24 | 
25 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
26 | disclosure of security bugs. In those cases, please go through the process
27 | outlined on that page and do not file a public issue.
28 | 
29 | ## License
30 | By contributing to LaMCTS, you agree that your contributions will be licensed
31 | under the LICENSE file in the root directory of this source tree.
32 | 


--------------------------------------------------------------------------------
/LA-MCTS-baselines/Bayesian-Optimization/functions.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import numpy as np
  7 | import gym
  8 | import json
  9 | import os
 10 | 
 11 | 
 12 | class tracker:
 13 |     def __init__(self, foldername):
 14 |         self.counter   = 0
 15 |         self.results   = []
 16 |         self.curt_best = float("inf")
 17 |         self.curt_best_x = None
 18 |         self.foldername = foldername
 19 |         try:
 20 |             os.mkdir(foldername)
 21 |         except OSError:
 22 |             print ("Creation of the directory %s failed" % foldername)
 23 |         else:
 24 |             print ("Successfully created the directory %s " % foldername)
 25 |         
 26 |     def dump_trace(self):
 27 |         trace_path = self.foldername + '/result' + str(len( self.results) )
 28 |         final_results_str = json.dumps(self.results)
 29 |         with open(trace_path, "a") as f:
 30 |             f.write(final_results_str + '\n')
 31 |             
 32 |     def track(self, result, x = None):
 33 |         if result < self.curt_best:
 34 |             self.curt_best = result
 35 |             self.curt_best_x = x
 36 |         print("")
 37 |         print("="*10)
 38 |         print("iteration:", self.counter, "total samples:", len(self.results) )
 39 |         print("="*10)
 40 |         print("current best f(x):", self.curt_best)
 41 |         print("current best x:", np.around(self.curt_best_x, decimals=1))
 42 |         self.results.append(self.curt_best)
 43 |         self.counter += 1
 44 |         if len(self.results) % 100 == 0:
 45 |             self.dump_trace()
 46 |         
 47 | class Levy:
 48 |     def __init__(self, dims=1):
 49 |         self.dims    = dims
 50 |         self.lb      = -10 * np.ones(dims)
 51 |         self.ub      =  10 * np.ones(dims)
 52 |         self.counter = 0
 53 |         print("####dims:", dims)
 54 |         self.tracker = tracker('Levy'+str(dims))
 55 | 
 56 |     def __call__(self, x):
 57 |         x = np.array(x)
 58 |         self.counter += 1
 59 |         assert len(x) == self.dims
 60 |         assert x.ndim == 1
 61 |         assert np.all(x <= self.ub) and np.all(x >= self.lb)
 62 |         w = []
 63 |         for idx in range(0, len(x)):
 64 |             w.append( 1 + (x[idx] - 1) / 4 )
 65 |         w = np.array(w)
 66 |         
 67 |         term1 = ( np.sin( np.pi*w[0] ) )**2;
 68 |         
 69 |         term3 = ( w[-1] - 1 )**2 * ( 1 + ( np.sin( 2 * np.pi * w[-1] ) )**2 );
 70 |         
 71 |         
 72 |         term2 = 0;
 73 |         for idx in range(1, len(w) ):
 74 |             wi  = w[idx]
 75 |             new = (wi-1)**2 * ( 1 + 10 * ( np.sin( np.pi* wi + 1 ) )**2)
 76 |             term2 = term2 + new
 77 |         
 78 |         result = term1 + term2 + term3
 79 | 
 80 |         self.tracker.track( result, x )
 81 | 
 82 |         return result
 83 |     
 84 |         
 85 | class Ackley:
 86 |     def __init__(self, dims=3):
 87 |         self.dims    = dims
 88 |         self.lb      = -5 * np.ones(dims)
 89 |         self.ub      = 10 * np.ones(dims)
 90 |         self.counter = 0
 91 |         self.tracker = tracker('Ackley'+str(dims))
 92 |         
 93 | 
 94 |     def __call__(self, x):
 95 |         x = np.array(x)
 96 |         self.counter += 1
 97 |         assert len(x) == self.dims
 98 |         assert x.ndim == 1
 99 |         # assert np.all(x <= self.ub) and np.all(x >= self.lb)
100 |         w = 1 + (x - 1.0) / 4.0
101 |         result = (-20*np.exp(-0.2 * np.sqrt(np.inner(x,x) / x.size )) -np.exp(np.cos(2*np.pi*x).sum() /x.size) + 20 +np.e )
102 |         
103 |         self.tracker.track( result, x )
104 |         
105 |         return result    
106 |     
107 |     
108 |     
109 |     
110 |     
111 |     
112 |     
113 |     
114 |     
115 |     
116 |     
117 |     
118 |     
119 |     
120 |     
121 | 


--------------------------------------------------------------------------------
/LA-MCTS-baselines/Bayesian-Optimization/run.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | from functions import *
 7 | import argparse
 8 | import os
 9 | from skopt import gp_minimize
 10 |
11 | 
12 | parser = argparse.ArgumentParser(description='Process inputs')
13 | parser.add_argument('--func', help='specify the test function')
14 | parser.add_argument('--dims', type=int, help='specify the problem dimensions')
15 | parser.add_argument('--iterations', type=int, help='specify the iterations to collect in the search')
16 | 
17 | 
18 | args = parser.parse_args()
19 | 
20 | f = None
21 | iteration = 0
22 | if args.func == 'ackley':
23 |     assert args.dims > 0
24 |     f = Ackley(dims =args.dims)
25 | elif args.func == 'levy':
26 |     f = Levy(dims = args.dims)
27 | else:
28 |     print('function not defined')
29 |     os._exit(1)
30 | 
31 | assert args.dims > 0
32 | assert f is not None
33 | assert args.iterations > 0
34 | 
35 | lower = f.lb
36 | upper = f.ub
37 | 
38 | bounds = []
39 | for idx in range(0, len(f.lb) ):
40 |     bounds.append( ( float(f.lb[idx]), float(f.ub[idx])) )
41 | 
42 | res = gp_minimize(f,                          # the function to minimize
43 |                   bounds,                     # the bounds on each dimension of x
44 |                   acq_func="EI",              # the acquisition function
45 |                   n_calls=args.iterations,
 46 |                   acq_optimizer = "sampling", # using sampling to be consistent with our BO implementation
47 |                   n_initial_points=40
48 |                   )


--------------------------------------------------------------------------------
/LA-MCTS-baselines/Nevergrad/functions.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import numpy as np
  7 | import gym
  8 | import json
  9 | import os
 10 | 
 11 | 
 12 | class tracker:
 13 |     def __init__(self, foldername):
 14 |         self.counter   = 0
 15 |         self.results   = []
 16 |         self.curt_best = float("inf")
 17 |         self.curt_best_x = None
 18 |         self.foldername = foldername
 19 |         try:
 20 |             os.mkdir(foldername)
 21 |         except OSError:
 22 |             print ("Creation of the directory %s failed" % foldername)
 23 |         else:
 24 |             print ("Successfully created the directory %s " % foldername)
 25 |         
 26 |     def dump_trace(self):
 27 |         trace_path = self.foldername + '/result' + str(len( self.results) )
 28 |         final_results_str = json.dumps(self.results)
 29 |         with open(trace_path, "a") as f:
 30 |             f.write(final_results_str + '\n')
 31 |             
 32 |     def track(self, result, x = None):
 33 |         if result < self.curt_best:
 34 |             self.curt_best = result
 35 |             self.curt_best_x = x
 36 |         print("")
 37 |         print("="*10)
 38 |         print("iteration:", self.counter, " total samples:", len(self.results))
 39 |         print("="*10)
 40 |         print("current best f(x):", self.curt_best)
 41 |         print("current best x:", np.around(self.curt_best_x, decimals=1))
 42 |         self.results.append(self.curt_best)
 43 |         self.counter += 1
 44 |         if len(self.results) % 100 == 0:
 45 |             self.dump_trace()            
 46 |     
 47 | class Levy:
 48 |     def __init__(self, dims=1):
 49 |         self.dims    = dims
 50 |         self.lb      = -10 * np.ones(dims)
 51 |         self.ub      =  10 * np.ones(dims)
 52 |         self.counter = 0
 53 |         print("####dims:", dims)
 54 |         self.tracker = tracker('Levy'+str(dims))
 55 | 
 56 |     def __call__(self, x):
 57 |         x = np.array(x)
 58 |         self.counter += 1
 59 |         assert len(x) == self.dims
 60 |         assert x.ndim == 1
 61 |         assert np.all(x <= self.ub) and np.all(x >= self.lb)
 62 |         w = []
 63 |         for idx in range(0, len(x)):
 64 |             w.append( 1 + (x[idx] - 1) / 4 )
 65 |         w = np.array(w)
 66 |         
 67 |         term1 = ( np.sin( np.pi*w[0] ) )**2;
 68 |         
 69 |         term3 = ( w[-1] - 1 )**2 * ( 1 + ( np.sin( 2 * np.pi * w[-1] ) )**2 );
 70 |         
 71 |         
 72 |         term2 = 0;
 73 |         for idx in range(1, len(w) ):
 74 |             wi  = w[idx]
 75 |             new = (wi-1)**2 * ( 1 + 10 * ( np.sin( np.pi* wi + 1 ) )**2)
 76 |             term2 = term2 + new
 77 |         
 78 |         result = term1 + term2 + term3
 79 | 
 80 |         self.tracker.track( result, x )
 81 | 
 82 |         return result
 83 |         
 84 | class Ackley:
 85 |     def __init__(self, dims=3):
 86 |         self.dims   = dims
 87 |         self.lb    = -5 * np.ones(dims)
 88 |         self.ub    =  10 * np.ones(dims)
 89 |         self.counter = 0
 90 |         self.tracker = tracker('Ackley'+str(dims))
 91 |         
 92 | 
 93 |     def __call__(self, x):
 94 |         self.counter += 1
 95 |         assert len(x) == self.dims
 96 |         assert x.ndim == 1
 97 |         # assert np.all(x <= self.ub) and np.all(x >= self.lb)
 98 |         w = 1 + (x - 1.0) / 4.0
 99 |         result = (-20*np.exp(-0.2 * np.sqrt(np.inner(x,x) / x.size )) -np.exp(np.cos(2*np.pi*x).sum() /x.size) + 20 +np.e )
100 |         
101 |         self.tracker.track( result, x )
102 |                 
103 |         return result   


--------------------------------------------------------------------------------
/LA-MCTS-baselines/Nevergrad/run.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | from functions import *
 7 | import argparse
 8 | import os
 9 | import nevergrad as ng
10 | 
 11 |
12 | 
13 | 
14 | 
15 | 
16 | parser = argparse.ArgumentParser(description='Process inputs')
17 | parser.add_argument('--func', help='specify the test function')
18 | parser.add_argument('--dims', type=int, help='specify the problem dimensions')
19 | parser.add_argument('--iterations', type=int, help='specify the iterations to collect in the search')
20 | 
21 | 
22 | args = parser.parse_args()
23 | 
24 | f = None
25 | iteration = 0
26 | if args.func == 'ackley':
27 |     assert args.dims > 0
28 |     f = Ackley(dims =args.dims)
29 | elif args.func == 'levy':
30 |     f = Levy(dims = args.dims)
31 | else:
32 |     print('function not defined')
33 |     os._exit(1)
34 | 
35 | assert args.dims > 0
36 | assert f is not None
37 | assert args.iterations > 0
38 | 
39 | 
40 | def from_unit_cube(x, lb, ub):
41 |     """Project from [0, 1]^d to hypercube with bounds lb and ub"""
42 |     assert np.all(lb < ub) and lb.ndim == 1 and ub.ndim == 1 and x.ndim == 2
43 |     xx = x * (ub - lb) + lb
44 |     return np.ravel(xx)
45 |     
46 | init = from_unit_cube( np.random.rand(f.dims).reshape(1,-1), f.lb, f.ub)
47 | 
48 | param = ng.p.Array(init=init ).set_bounds(f.lb, f.ub)
49 | 
50 | optimizer = ng.optimizers.NGOpt(parametrization=param, budget=args.iterations)
51 | 
52 | recommendation = optimizer.minimize(f)


--------------------------------------------------------------------------------
/LA-MCTS/README.md:
--------------------------------------------------------------------------------
  1 | # Latent Action Monte Carlo Tree Search (LA-MCTS)
  2 | LA-MCTS is a meta-algorithm that partitions the search space for black-box optimization. LA-MCTS progressively learns to partition the search space and explores its promising regions, so that solvers such as Bayesian Optimization (BO) can focus on promising subregions, mitigating the over-exploring issue in high-dimensional problems.
  3 | 
  4 | <p align="center">
  5 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LA-MCTS/meta_algorithms.png?raw=true' width="300">
  6 | </p>
  7 | 
  8 | Please reference the following publication when using this package. ArXiv <a href="https://arxiv.org/abs/2007.00708">link</a>.
  9 | 
 10 | ```
 11 | @article{wang2020learning,
 12 |   title={Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search},
 13 |   author={Wang, Linnan and Fonseca, Rodrigo and Tian, Yuandong},
 14 |   journal={NeurIPS},
 15 |   year={2020}
 16 | }
 17 | ```
 18 | 
 19 | ## Run LA-MCTS and baselines on test functions (1-minute tutorial)
 20 | For a 1-minute evaluation of Bayesian Optimization (BO) and an Evolutionary Algorithm (EA) vs. LA-MCTS-boosted BO, please follow the steps below.
 21 | 
 22 | Here we test on <a href="https://www.sfu.ca/~ssurjano/ackley.html">Ackley</a> or <a href="https://www.sfu.ca/~ssurjano/levy.html">Levy</a> in 10 dimensions; please run each command multiple times to compare the average performance.
 23 | 
 24 | - ***Evaluate LA-MCTS boosted Bayesian Optimization***: using GP surrogate, EI acquisition, plus LA-MCTS.
 25 | ```
 26 | cd LA-MCTS
 27 | python run.py --func ackley --dims 10 --iterations 100
 28 | ```
 29 | 
 30 | - ***Evaluate Bayesian Optimization***: using GP surrogate, EI acquisition.
 31 | ```
 32 | pip install scikit-optimize
 33 | cd LA-MCTS-baselines/Bayesian-Optimization
 34 | python run.py --func ackley --dims 10 --iterations 100
 35 | ```
 36 | 
 37 | - ***Evaluate Evolutionary Algorithm***: using NGOpt from Nevergrad.
 38 | ```
 39 | pip install nevergrad
 40 | cd LA-MCTS-baselines/Nevergrad
 41 | python run.py --func ackley --dims 10 --iterations 100
 42 | ```
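For reference, the Ackley objective that the commands above minimize can be sketched in a few lines of plain Python. This mirrors the expression used in the repository's `functions.py`, rewritten without NumPy; the function name `ackley` is only illustrative. The global minimum is at the origin, where the value is 0.

```python
import math

def ackley(x):
    """Ackley function for a list of floats (pure-Python restatement of functions.py)."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d                          # (1/d) * sum(x_i^2)
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / d   # (1/d) * sum(cos(2*pi*x_i))
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e

at_origin = ackley([0.0] * 10)   # the global minimum
elsewhere = ackley([1.0] * 10)   # any other point scores strictly worse
```

Because every optimizer in the tutorial minimizes, a lower "current best f(x)" in the printed trace means a better run.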
 43 | 
 44 | 
 45 | ## How to use LA-MCTS to optimize your own function?
 46 | Please wrap your function into a class defined as follows; functions/functions.py provides a few examples.
 47 | 
 48 | ```
 49 | class myFunc:
 50 |     def __init__(self, dims=1):
 51 |         self.dims    = dims                   # problem dimensions
 52 |         self.lb      = -1 * np.ones(dims)     # lower bound for each dimension (example value)
 53 |         self.ub      =  1 * np.ones(dims)     # upper bound for each dimension (example value)
 54 |         self.tracker = tracker('myFunc')      # defined in functions.py
 55 | 
 56 |     def __call__(self, x):
 57 |         # some sanity check of x
 58 |         result = my_objective(x)              # placeholder: evaluate your own objective here
 59 |         self.tracker.track( result )          # tracker.track in functions.py takes only the result
 60 |         return result
 61 | ```
 62 | 
 63 | After defining your function, e.g. f = myFunc(), minimizing f(x) is as easy as passing f into MCTS.
 64 | ```
 65 | f = myFunc()
 66 | agent = MCTS(lb = f.lb,     # the lower bound of each problem dimension
 67 |              ub = f.ub,     # the upper bound of each problem dimension
 68 |              dims = f.dims, # the number of problem dimensions
 69 |              ninits = 40,   # the number of random samples used for initialization
 70 |              func = f       # function object to be optimized
 71 |              )
 72 | agent.search(iterations = 100)
 73 | ```
 74 | See `run.py` for a complete example.
 75 | 
 76 | 
 77 | ## What can and can't it do?
 78 | In this release, the code only supports optimizing continuous black-box functions.
 79 | 
 80 | ## Tuning LA-MCTS
 81 | 
 82 | ### **Cp**: controlling the amount of exploration, MCTS.py line 27.
 83 | > <b>We set Cp = 0.1 * max of f(x) </b>. 
 84 | 
 85 | > For example, if f(x) measures the test accuracy of a neural network x, the maximum of f(x) is 1, so Cp = 0.1. The best value should still be tuned for each specific problem. A large Cp encourages LA-MCTS to visit bad regions more often (exploration), while a small Cp favors regions that already look good (exploitation). LA-MCTS degenerates to random search if Cp is very large, and to a purely greedy policy, e.g. a regression tree, if Cp = 0. Both are undesirable.
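The effect of Cp can be illustrated with a small sketch. It assumes a standard UCB1-style score in which larger values are better; the exact formula in `MCTS.py` / `Node.py` may differ, and `ucb_score`, `children`, and `pick` are hypothetical names used only for this illustration.

```python
import math

def ucb_score(mean_value, visits, parent_visits, cp):
    """UCB1-style score: mean value plus a Cp-weighted exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited regions are always tried first
    return mean_value + 2.0 * cp * math.sqrt(2.0 * math.log(parent_visits) / visits)

# Two hypothetical child regions, stored as (mean value, visit count).
children = {"good": (0.9, 50), "bad": (0.2, 5)}
parent_visits = 55

def pick(cp):
    """Return the name of the child region with the highest score for this Cp."""
    return max(children, key=lambda name: ucb_score(*children[name], parent_visits, cp))

greedy_choice = pick(0.0)    # Cp = 0: only the mean value matters
explore_choice = pick(1.0)   # larger Cp: the rarely visited region wins
```

With Cp = 0 the score reduces to the mean value, so the well-performing region is always chosen; as Cp grows, the bonus for rarely visited regions dominates and the choice depends less and less on observed performance.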
 86 | 
 87 | ### **Leaf Size**: the splitting threshold (θ), MCTS.py line 38.
 88 | > <b> We set θ ∈ [20, 100] </b> in our experiments.
 89 | 
 90 | > The splitting threshold controls the speed of tree growth. Given the same number of samples, a smaller θ leads to a deeper tree. If the search space Ω is very large, more splits enable LA-MCTS to quickly focus on a small promising region and yield good results. However, if θ is too small, the performance and the boundary estimation of the region become more unreliable.
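The relationship between θ and tree depth can be sketched as follows. This is a simplification that assumes every split divides a node's samples evenly, which the learned boundaries in LA-MCTS need not do; `tree_depth` is a hypothetical helper, not part of the package.

```python
def tree_depth(n_samples, leaf_size):
    """Depth reached when a node is halved while it holds more than leaf_size samples."""
    depth = 0
    while n_samples > leaf_size:
        n_samples = (n_samples + 1) // 2  # size of the larger half after an even split
        depth += 1
    return depth

deep = tree_depth(1000, 20)      # small threshold: many splits
shallow = tree_depth(1000, 100)  # large threshold: fewer splits
```

For the same 1000 samples, θ = 20 produces a deeper tree than θ = 100, so each leaf covers a smaller region but is estimated from fewer samples.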
 91 | 
 92 | ### **SVM kernel**: the type of kernels used by SVM, Classifier.py line 35.
 93 | > <b> kernel can be 'linear', 'poly', 'rbf'</b>
 94 | 
 95 | > In our experiments, the linear kernel is the fastest, but rbf and poly generally produce better results. If you want to draw more than 1000 samples, we suggest using the linear kernel; otherwise use rbf or poly.
 96 | 
 97 | ## MuJoCo tasks and Gym games
 98 | <p align="center">
 99 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LA-MCTS/lunar_landing.gif?raw=true' width="400">
100 | </p>
101 | 
102 | - **Run Lunarlanding**:
103 | 1. Install gym and start running.
104 | ```
105 | pip install gym
106 | python run.py --func lunar --samples 500
107 | ```
108 | Copy the final "current best x" from the output to visualize your policy.
109 | 
110 | 2. Visualize your policy
111 | ```
112 | cd functions
113 | # Replace our policy value with your learned policy from the previous step, i.e. policy = np.array([xx]).
114 | python visualize_policy.py
115 | ```
116 | - **Run MuJoCo**:
117 | <p align="center">
118 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LA-MCTS/swimmer.gif?raw=true' width="400"> &nbsp; &nbsp; &nbsp;
119 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LA-MCTS/hopper.gif?raw=true' width="400">
120 | </p>
121 | 
122 | 1. Set up mujoco-py, see <a href="https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key">here</a>.
123 | 
124 | 2. Download TuRBO from <a href="https://github.com/uber-research/TuRBO">here</a>. Extract and find the turbo folder.
125 | 
126 | 3. Copy revision.patch into the turbo folder, apply it, and move the patched folder next to run.py:
127 | ```
128 | cp revision.patch TuRBO-master/turbo
129 | cd TuRBO-master/turbo
130 | patch -p1 < revision.patch
131 | cd ../..
132 | mv TuRBO-master/turbo ./turbo_1
133 | ```
134 | 4. In Classifier.py, uncomment line 23 and lines 343-368. In MCTS.py, change solver_type on line 47 from bo to turbo.
135 | 
136 | 5. Now it is ready to run:
137 | ```
138 | python run.py --func swimmer --iterations 1000
139 | ```
140 | 
141 | ![Mujoco Experiments](images/mujoco_experiments.png)
142 | 
143 | 
144 | ## Possible Extensions
145 | In MCTS.py line 229, the select step returns a path and a leaf that bound the sampling space. We use ```get_sample_ratio_in_region``` in Classifier.py to acquire samples in the selected partition. Other samplers can also be used.
146 | 
147 | LA-MCTS can be used together with any value function / evaluator / cost model.
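As one illustration of swapping in a different sampler, the sketch below uses plain rejection sampling. It is not the repository's implementation: `sample_in_region` is a hypothetical helper, and the path is modeled as a list of accept/reject predicates standing in for the learned boundaries along the selected path.

```python
import random

def sample_in_region(lb, ub, path, n_samples, max_tries=10000, seed=0):
    """Keep uniform draws from [lb, ub] that every boundary on the path accepts."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_tries):
        if len(kept) == n_samples:
            break
        x = [rng.uniform(lo, hi) for lo, hi in zip(lb, ub)]
        if all(accept(x) for accept in path):  # the point must lie on the chosen side of each split
            kept.append(x)
    return kept

# Hypothetical path with two boundaries that keep the region x0 > 0, x1 < 0.
path = [lambda x: x[0] > 0.0, lambda x: x[1] < 0.0]
samples = sample_in_region([-5.0, -5.0], [10.0, 10.0], path, n_samples=5)
```

Rejection sampling is simple but becomes inefficient once the selected region is a tiny fraction of the bounding box, which is a common reason to prefer a sampler that proposes points near existing samples in the leaf.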
148 | 


--------------------------------------------------------------------------------
/LA-MCTS/functions/README.md:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/LA-MCTS/functions/functions.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import numpy as np
  7 | import gym
  8 | import json
  9 | import os
 10 | 
 11 | import imageio
 12 | 
 13 | 
 14 | class tracker:
 15 |     def __init__(self, foldername):
 16 |         self.counter   = 0
 17 |         self.results   = []
 18 |         self.curt_best = float("inf")
 19 |         self.foldername = foldername
 20 |         try:
 21 |             os.mkdir(foldername)
 22 |         except OSError:
 23 |             print ("Creation of the directory %s failed" % foldername)
 24 |         else:
 25 |             print ("Successfully created the directory %s " % foldername)
 26 |         
 27 |     def dump_trace(self):
 28 |         trace_path = self.foldername + '/result' + str(len( self.results) )
 29 |         final_results_str = json.dumps(self.results)
 30 |         with open(trace_path, "a") as f:
 31 |             f.write(final_results_str + '\n')
 32 |             
 33 |     def track(self, result):
 34 |         if result < self.curt_best:
 35 |             self.curt_best = result
 36 |         self.results.append(self.curt_best)
 37 |         if len(self.results) % 100 == 0:
 38 |             self.dump_trace()
 39 | 
 40 | class Levy:
 41 |     def __init__(self, dims=10):
 42 |         self.dims        = dims
 43 |         self.lb          = -10 * np.ones(dims)
 44 |         self.ub          =  10 * np.ones(dims)
 45 |         self.tracker     = tracker('Levy'+str(dims))
 46 |         
 47 |         #tunable hyper-parameters in LA-MCTS
 48 |         self.Cp          = 10
 49 |         self.leaf_size   = 8
 50 |         self.kernel_type = "poly"
 51 |         self.ninits      = 40
 52 |         self.gamma_type   = "auto"
 53 |         print("initialize levy at dims:", self.dims)
 54 |         
 55 |     def __call__(self, x):
 56 |         assert len(x) == self.dims
 57 |         assert x.ndim == 1
 58 |         assert np.all(x <= self.ub) and np.all(x >= self.lb)
 59 |         
 60 |         w = []
 61 |         for idx in range(0, len(x)):
 62 |             w.append( 1 + (x[idx] - 1) / 4 )
 63 |         w = np.array(w)
 64 |         
 65 |         term1 = ( np.sin( np.pi*w[0] ) )**2;
 66 |         
 67 |         term3 = ( w[-1] - 1 )**2 * ( 1 + ( np.sin( 2 * np.pi * w[-1] ) )**2 );
 68 |         
 69 |         
 70 |         term2 = 0;
 71 |         for idx in range(1, len(w) ):
 72 |             wi  = w[idx]
 73 |             new = (wi-1)**2 * ( 1 + 10 * ( np.sin( np.pi* wi + 1 ) )**2)
 74 |             term2 = term2 + new
 75 |         
 76 |         result = term1 + term2 + term3
 77 |         self.tracker.track( result )
 78 | 
 79 |         return result
 80 | 
 81 | class Ackley:
 82 |     def __init__(self, dims=10):
 83 |         self.dims      = dims
 84 |         self.lb        = -5 * np.ones(dims)
 85 |         self.ub        =  10 * np.ones(dims)
 86 |         self.counter   = 0
 87 |         self.tracker   = tracker('Ackley'+str(dims) )
 88 |         
 89 |         #tunable hyper-parameters in LA-MCTS
 90 |         self.Cp        = 1
 91 |         self.leaf_size = 10
 92 |         self.ninits    = 40
 93 |         self.kernel_type = "rbf"
 94 |         self.gamma_type  = "auto"
 95 |         
 96 |         
 97 |     def __call__(self, x):
 98 |         self.counter += 1
 99 |         assert len(x) == self.dims
100 |         assert x.ndim == 1
101 |         assert np.all(x <= self.ub) and np.all(x >= self.lb)
102 |         result = (-20*np.exp(-0.2 * np.sqrt(np.inner(x,x) / x.size )) -np.exp(np.cos(2*np.pi*x).sum() /x.size) + 20 +np.e )
103 |         self.tracker.track( result )
104 |                 
105 |         return result
106 |         
107 | class Lunarlanding:
108 |     def __init__(self):
109 |         self.dims = 12
110 |         self.lb   = np.zeros(12)
111 |         self.ub   = 2 * np.ones(12)
112 |         self.counter = 0
113 |         self.env = gym.make('LunarLander-v2')
114 |         
115 |         #tunable hyper-parameters in LA-MCTS
116 |         self.Cp          = 50
117 |         self.leaf_size   = 10
118 |         self.kernel_type = "poly"
119 |         self.ninits      = 40
120 |         self.gamma_type  = "scale"
121 |         
122 |         self.render      = False
123 |         
124 |         
125 |     def heuristic_Controller(self, s, w):
126 |         angle_targ = s[0] * w[0] + s[2] * w[1]
127 |         if angle_targ > w[2]:
128 |             angle_targ = w[2]
129 |         if angle_targ < -w[2]:
130 |             angle_targ = -w[2]
131 |         hover_targ = w[3] * np.abs(s[0])
132 | 
133 |         angle_todo = (angle_targ - s[4]) * w[4] - (s[5]) * w[5]
134 |         hover_todo = (hover_targ - s[1]) * w[6] - (s[3]) * w[7]
135 | 
136 |         if s[6] or s[7]:
137 |             angle_todo = w[8]
138 |             hover_todo = -(s[3]) * w[9]
139 | 
140 |         a = 0
141 |         if hover_todo > np.abs(angle_todo) and hover_todo > w[10]:
142 |             a = 2
143 |         elif angle_todo < -w[11]:
144 |             a = 3
145 |         elif angle_todo > +w[11]:
146 |             a = 1
147 |         return a
148 |         
149 |     def __call__(self, x):
150 |         self.counter += 1
151 |         assert len(x) == self.dims
152 |         assert x.ndim == 1
153 |         assert np.all(x <= self.ub) and np.all(x >= self.lb)
154 |     
155 |         total_rewards = []
156 |         for i in range(0, 3): # controls the number of episode/plays per trial
157 |             state = self.env.reset()
158 |             rewards_for_episode = []
159 |             num_steps = 2000
160 |         
161 |             for step in range(num_steps):
162 |                 if self.render:
163 |                     self.env.render()
164 |                 received_action = self.heuristic_Controller(state, x)
165 |                 next_state, reward, done, info = self.env.step(received_action)
166 |                 rewards_for_episode.append( reward )
167 |                 state = next_state
168 |                 if done:
169 |                      break
170 |                         
171 |             rewards_for_episode = np.array(rewards_for_episode)
172 |             total_rewards.append( np.sum(rewards_for_episode) )
173 |         total_rewards = np.array(total_rewards)
174 |         mean_rewards = np.mean( total_rewards )
175 |         
176 |         return mean_rewards*-1
177 | 
178 | 
179 |     
180 |     
181 |     
182 |     
183 |     
184 |     
185 |     
186 |     
187 |     
188 |     
189 |     
190 |     
191 |     
192 |     
193 |     
194 |     
195 |     
196 |     
197 |     
198 |     
199 | 


--------------------------------------------------------------------------------
/LA-MCTS/functions/mujoco_functions.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import numpy as np
  7 | import gym
  8 | import json
  9 | import os
 10 | 
 11 | class Swimmer:
 12 |     
 13 |     def __init__(self):
 14 |         self.policy_shape = (2, 8)
 15 |         self.mean         = 0
 16 |         self.std          = 1
 17 |         self.dims         = 16
 18 |         self.lb           = -1 * np.ones(self.dims)
 19 |         self.ub           =  1 * np.ones(self.dims)
 20 |         self.counter      = 0
 21 |         self.env          = gym.make('Swimmer-v2')
 22 |         self.num_rollouts = 3
 23 |         
 24 |         #tunable hyper-parameters in LA-MCTS
 25 |         self.Cp           = 20
 26 |         self.leaf_size    = 10
 27 |         self.kernel_type  = "poly"
 28 |         self.gamma_type   = "scale"
 29 |         self.ninits       = 40
 30 |         print("===========initialization===========")
 31 |         print("mean:", self.mean)
 32 |         print("std:", self.std)
 33 |         print("dims:", self.dims)
 34 |         print("policy:", self.policy_shape )
 35 |         
 36 |         self.render = False
 37 |         
 38 |     
 39 |     def __call__(self, x):
 40 |         self.counter += 1
 41 |         assert len(x) == self.dims
 42 |         assert x.ndim == 1
 43 |         assert np.all(x <= self.ub) and np.all(x >= self.lb)
 44 |         
 45 |         M = x.reshape(self.policy_shape)
 46 |         
 47 |         returns = []
 48 |         observations = []
 49 |         actions = []
 50 |         
 51 |         
 52 |         for i in range(self.num_rollouts):
 53 |             obs    = self.env.reset()
 54 |             done   = False
 55 |             totalr = 0.
 56 |             steps  = 0
 57 |             
 58 |             while not done:
 59 |                 
 60 |                 action = np.dot(M, (obs - self.mean)/self.std)
 61 |                 observations.append(obs)
 62 |                 actions.append(action)
 63 |                 obs, r, done, _ = self.env.step(action)
 64 |                 totalr += r
 65 |                 steps += 1                
 66 |                 if self.render:
 67 |                     self.env.render()            
 68 |             returns.append(totalr)
 69 |             
 70 |         
 71 |         return np.mean(returns)*-1
 72 |         
 73 | 
 74 | class Hopper:
 75 |     
 76 |     def __init__(self):
 77 |         self.mean    = np.array([1.41599384, -0.05478602, -0.25522216, -0.25404721, 
 78 |                                  0.27525085, 2.60889529,  -0.0085352, 0.0068375, 
 79 |                                  -0.07123674, -0.05044839, -0.45569644])
 80 |         self.std     = np.array([0.19805723, 0.07824488,  0.17120271, 0.32000514, 
 81 |                                  0.62401884, 0.82814161,  1.51915814, 1.17378372, 
 82 |                                  1.87761249, 3.63482761, 5.7164752 ])
 83 |         self.dims    = 33
 84 |         self.lb      = -1 * np.ones(self.dims)
 85 |         self.ub      =  1 * np.ones(self.dims)
 86 |         self.counter = 0
 87 |         self.env     = gym.make('Hopper-v2')
 88 |         self.num_rollouts = 3
 89 |         self.render  = False
 90 |         self.policy_shape = (3, 11)
 91 |         
 92 |         #tunable hyper-parameters in LA-MCTS
 93 |         self.Cp           = 10
 94 |         self.leaf_size    = 100
 95 |         self.kernel_type  = "poly"
 96 |         self.gamma_type   = "auto"
 97 |         self.ninits       = 150
 98 |         
 99 |         print("===========initialization===========")
100 |         print("mean:", self.mean)
101 |         print("std:", self.std)
102 |         print("dims:", self.dims)
103 |         print("policy:", self.policy_shape )
104 |             
105 |     def __call__(self, x):
106 |         self.counter += 1
107 |         assert len(x) == self.dims
108 |         assert x.ndim == 1
109 |         assert np.all(x <= self.ub) and np.all(x >= self.lb)
110 |         
111 |         M = x.reshape(self.policy_shape)
112 |         
113 |         returns = []
114 |         observations = []
115 |         actions = []
116 |         
117 |         for i in range(self.num_rollouts):
118 |             obs    = self.env.reset()
119 |             done   = False
120 |             totalr = 0.
121 |             steps  = 0
122 |             while not done:
123 |                 # M      = self.policy
124 |                 inputs = (obs - self.mean)/self.std
125 |                 action = np.dot(M, inputs)
126 |                 observations.append(obs)
127 |                 actions.append(action)
128 |                 obs, r, done, _ = self.env.step(action)
129 |                 totalr += r
130 |                 steps  += 1
131 |                 if self.render:
132 |                     self.env.render()
133 |             returns.append(totalr)
134 |             
135 |         return np.mean(returns)*-1
136 | 
137 | # ######################################## #
138 | # Visualize the learned policy for Swimmer #
139 | # ######################################## #
140 | # f = Swimmer()
141 | # x = np.array([-0.5343142,-0.46203456, -0.70218485,
142 | #               -0.00929887, 0.4072553, 0.04604763,
143 | #               0.67289615, -0.5894774, 0.79874759,
144 | #               0.84010238, 0.54327755, 0.25715409,
145 | #               0.89032131, -0.56112252, -0.0960243,
146 | #               0.13397496])
147 | # f.render = True
148 | # result = f(x)
149 | # print( result )
150 | 
151 | # f = Hopper()
152 | # x = np.random.rand(f.dims)
153 | # result = f(x)
154 | # print( result )
155 | 
156 | 
157 | 
158 | 


--------------------------------------------------------------------------------
/LA-MCTS/functions/visualize_policy.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | 
 7 | from functions import Lunarlanding
 8 | from mujoco_functions import *
 9 | import numpy as np
10 | 
11 | 
12 | 
13 | 
14 | # policy = np.array([ 0.6829919, 0.4348611, 1.93635682, 1.53007997,
15 | #                1.69574236, 0.66056938, 0.28328839, 1.12798157,
16 | #                0.06496076, 1.71219888, 0.23686494, 0.20135697 ] )
17 | # f = Lunarlanding()
18 | # f.render = True
19 | # result = f(policy)
20 | # print( result )
21 | 
22 | policy = np.array(
23 | [ 0.40721659,  0.64248771,   0.31267019, -0.69240676, -0.00208609, -0.86336196,
24 |  -0.54423801,  0.28333422,  -0.68388651, -0.26167397, -0.58448575,  0.11981415,
25 |  -0.90660989,  0.55700556,  -0.22651554,  0.42790948,  0.15368999,  0.7514032,
26 |  -0.42978046, -0.60632853,  -0.88724493, -0.01787839,  0.74753749, -0.8137155,
27 |   0.41300612,  0.08062934,  -0.25451053, -0.77197475, -0.09003459, -0.76673666,
28 |  -0.30785222,  0.41125726,  -0.11475573]
29 | )
30 | f = Hopper()
31 | f.render = True
32 | result = f(policy)
33 | print(result)
34 | 


--------------------------------------------------------------------------------
/LA-MCTS/images/mujoco_experiments.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/LaMCTS/489bd60886f23b0b76b10aa8602ea6722f334ad6/LA-MCTS/images/mujoco_experiments.png


--------------------------------------------------------------------------------
/LA-MCTS/lamcts/Node.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | from .Classifier import Classifier
  7 | import json
  8 | import numpy as np
  9 | import math
 10 | import operator
 11 | 
 12 | class Node:
 13 |     obj_counter   = 0
 14 |     # If a leaf holds >= SPLIT_THRESH samples, we split it into two new nodes.
 15 |     
 16 |     def __init__(self, parent = None, dims = 0, reset_id = False, kernel_type = "rbf", gamma_type = "auto"):
 17 |         # Note: every node is initialized as a leaf;
 18 |         # only internal nodes are equipped with classifiers to make decisions
 19 |         # if not is_root:
 20 |         #     assert type( parent ) == type( self )
 21 |         self.dims          = dims
 22 |         self.x_bar         = float('inf')
 23 |         self.n             = 0
 24 |         self.uct           = 0
 25 |         self.classifier    = Classifier( [], self.dims, kernel_type, gamma_type )
 26 |             
 27 |         #insert curt into the kids of parent
 28 |         self.parent        = parent        
 29 |         self.kids          = [] # 0:good, 1:bad
 30 |         
 31 |         self.bag               = []
 32 |         self.is_svm_splittable = False 
 33 |         
 34 |         if reset_id:
 35 |             Node.obj_counter = 0
 36 | 
 37 |         self.id            = Node.obj_counter
 38 |                 
 39 |         #data for good and bad kids, respectively
 40 |         Node.obj_counter += 1
 41 |     
 42 |     def update_kids(self, good_kid, bad_kid):
 43 |         assert len(self.kids) == 0
 44 |         self.kids.append( good_kid )
 45 |         self.kids.append( bad_kid )
 46 |         assert self.kids[0].classifier.get_mean() > self.kids[1].classifier.get_mean()
 47 |         
 48 |     def is_good_kid(self):
 49 |         if self.parent is not None:
 50 |             if self.parent.kids[0] == self:
 51 |                 return True
 52 |             else:
 53 |                 return False
 54 |         else:
 55 |             return False
 56 |     
 57 |     def is_leaf(self):
 58 |         if len(self.kids) == 0:
 59 |             return True
 60 |         else:
 61 |             return False 
 62 |             
 63 |     def visit(self):
 64 |         self.n += 1
 65 |         
 66 |     def print_bag(self):
 67 |         sorted_bag = sorted(self.bag, key=operator.itemgetter(1))  # bag is a list of (x, f(x)) pairs, not a dict
 68 |         print("BAG"+"#"*10)
 69 |         for item in sorted_bag:
 70 |             print(item[0],"==>", item[1])            
 71 |         print("BAG"+"#"*10)
 72 |         print('\n')
 73 |         
 74 |     def update_bag(self, samples):
 75 |         assert len(samples) > 0
 76 |         
 77 |         self.bag.clear()
 78 |         self.bag.extend( samples )
 79 |         self.classifier.update_samples( self.bag )
 80 |         if len(self.bag) <= 2:
 81 |             self.is_svm_splittable = False
 82 |         else:
 83 |             self.is_svm_splittable = self.classifier.is_splittable_svm()
 84 |         self.x_bar             = self.classifier.get_mean()
 85 |         self.n                 = len( self.bag )
 86 |         
 87 |     def clear_data(self):
 88 |         self.bag.clear()
 89 |     
 90 |     def get_name(self):
 91 |         # state is a list of jsons
 92 |         return "node" + str(self.id)
 93 |     
 94 |     def pad_str_to_8chars(self, ins, total):
 95 |         if len(ins) <= total:
 96 |             ins += ' '*(total - len(ins) )
 97 |             return ins
 98 |         else:
 99 |             return ins[0:total]
100 |             
101 |     def get_rand_sample_from_bag(self):
102 |         if len( self.bag ) > 0:
103 |             upper_boundary = len(self.bag)
104 |             rand_idx = np.random.randint(0, upper_boundary)
105 |             return self.bag[rand_idx][0]
106 |         else:
107 |             return None
108 |             
109 |     def get_parent_str(self):
110 |         return self.parent.get_name()
111 |             
112 |     def propose_samples_bo(self, num_samples, path, lb, ub, samples):
113 |         proposed_X = self.classifier.propose_samples_bo(num_samples, path, lb, ub, samples)
114 |         return proposed_X
115 |         
116 |     def propose_samples_turbo(self, num_samples, path, func):
117 |         proposed_X, fX = self.classifier.propose_samples_turbo(num_samples, path, func)
118 |         return proposed_X, fX
119 | 
120 |     def propose_samples_rand(self, num_samples):
121 |         assert num_samples > 0
122 |         samples = self.classifier.propose_samples_rand(num_samples)
123 |         return samples
124 |     
125 |     def __str__(self):
126 |         name   = self.get_name()
127 |         name   = self.pad_str_to_8chars(name, 7)
128 |         name  += ( self.pad_str_to_8chars( 'is good:' + str(self.is_good_kid() ), 15 ) )
129 |         name  += ( self.pad_str_to_8chars( 'is leaf:' + str(self.is_leaf() ), 15 ) )
130 |         
131 |         val    = 0
132 |         name  += ( self.pad_str_to_8chars( ' val:{0:.4f}   '.format(round(self.get_xbar(), 3) ), 20 ) )
133 |         name  += ( self.pad_str_to_8chars( ' uct:{0:.4f}   '.format(round(self.get_uct(), 3) ), 20 ) )
134 | 
135 |         name  += self.pad_str_to_8chars( 'sp/n:'+ str(len(self.bag))+"/"+str(self.n), 15 )
136 |         upper_bound = np.around( np.max(self.classifier.X, axis = 0), decimals=2 )
137 |         lower_bound = np.around( np.min(self.classifier.X, axis = 0), decimals=2 )
138 |         boundary    = ''
139 |         for idx in range(0, self.dims):
140 |             boundary += str(lower_bound[idx])+'>'+str(upper_bound[idx])+' '
141 |             
142 |         #name  += ( self.pad_str_to_8chars( 'bound:' + boundary, 60 ) )
143 | 
144 |         parent = '----'
145 |         if self.parent is not None:
146 |             parent = self.parent.get_name()
147 |         parent = self.pad_str_to_8chars(parent, 10)
148 |         
149 |         name += (' parent:' + parent)
150 |         
151 |         kids = ''
152 |         kid  = ''
153 |         for k in self.kids:
154 |             kid   = self.pad_str_to_8chars( k.get_name(), 10 )
155 |             kids += kid
156 |         name  += (' kids:' + kids)
157 |         
158 |         return name
159 |     
160 | 
161 |     def get_uct(self, Cp = 10 ):
162 |         if self.parent is None:
163 |             return float('inf')
164 |         if self.n == 0:
165 |             return float('inf')
166 |         return self.x_bar + 2*Cp*math.sqrt( 2* np.power(self.parent.n, 0.5) / self.n )
167 |     
168 |     def get_xbar(self):
169 |         return self.x_bar
170 | 
171 |     def get_n(self):
172 |         return self.n
173 |         
174 |     def train_and_split(self):
175 |         assert len(self.bag) >= 2
176 |         self.classifier.update_samples( self.bag )
177 |         good_kid_data, bad_kid_data = self.classifier.split_data()
178 |         assert len( good_kid_data ) + len( bad_kid_data ) ==  len( self.bag )
179 |         return good_kid_data, bad_kid_data
180 | 
181 |     def plot_samples_and_boundary(self, func):
182 |         name = self.get_name() + ".pdf"
183 |         self.classifier.plot_samples_and_boundary(func, name)
184 | 
185 |     def sample_arch(self):
186 |         if len(self.bag) == 0:
187 |             return None
188 |         net_str = np.random.choice( list(self.bag.keys() ) )
189 |         del self.bag[net_str]
190 |         return json.loads(net_str )
191 | 
192 | 
193 | 
194 | # print(root)
195 | #
196 | # with open('features.json', 'r') as infile:
197 | #     data=json.loads( infile.read() )
198 | # samples = {}
199 | # for d in data:
200 | #     samples[ json.dumps(d['feature']) ] = d['acc']
201 | # n1 = Node(samples, root)
202 | # print(n1)
203 | #
204 | # n1 =
205 | 
206 | 
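
The `get_uct` method in Node.py scores a non-root node as `x_bar + 2*Cp*sqrt(2*sqrt(parent.n)/n)`. The standalone sketch below reproduces only that formula so the trade-off is easy to inspect. The function name `uct_score` and the sample numbers are illustrative assumptions, not part of the repository.

```python
import math

def uct_score(x_bar, n, parent_n, Cp=10):
    """Same arithmetic as Node.get_uct for a non-root node."""
    if n == 0:
        return float('inf')          # unvisited nodes get an infinite score
    # exploitation term (x_bar) + exploration bonus that shrinks as n grows
    return x_bar + 2 * Cp * math.sqrt(2 * math.sqrt(parent_n) / n)

# Two hypothetical siblings under a parent that holds 100 samples.
good = uct_score(x_bar=-1.0, n=60, parent_n=100, Cp=1)
bad  = uct_score(x_bar=-3.0, n=40, parent_n=100, Cp=1)
print(good, bad)
```

With a small `Cp` the node with the higher mean wins; raising `Cp` increases the weight of the exploration bonus, which favours the less-visited sibling.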


--------------------------------------------------------------------------------
/LA-MCTS/lamcts/__init__.py:
--------------------------------------------------------------------------------
1 | from .MCTS import MCTS


--------------------------------------------------------------------------------
/LA-MCTS/lamcts/utils.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | 
 7 | import numpy as np
 8 | 
 9 | def from_unit_cube(point, lb, ub):
10 |     assert np.all(lb < ub) 
11 |     assert lb.ndim == 1 
12 |     assert ub.ndim == 1 
13 |     assert point.ndim  == 2
14 |     new_point = point * (ub - lb) + lb
15 |     return new_point
16 | 
17 | def latin_hypercube(n, dims):
18 |     points = np.zeros((n, dims))
19 |     centers = (1.0 + 2.0 * np.arange(0.0, n)) 
20 |     centers = centers / float(2 * n)
21 |     for i in range(0, dims):
22 |         points[:, i] = centers[np.random.permutation(n)]
23 | 
24 |     perturbation = np.random.uniform(-1.0, 1.0, (n, dims)) 
25 |     perturbation = perturbation / float(2 * n)
26 |     points += perturbation
27 |     return points
28 | 
29 | 
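
A short usage sketch of the two helpers in utils.py, assuming they are combined to draw initial samples inside a problem's bounding box (MCTS.py is not shown here, so that pairing is an assumption). The helper bodies are copied in condensed form so the block runs on its own; the bounds are the Ackley bounds from functions.py at 3 dimensions.

```python
import numpy as np

def latin_hypercube(n, dims):
    # One jittered point per stratum in every dimension, as in lamcts/utils.py.
    points = np.zeros((n, dims))
    centers = (1.0 + 2.0 * np.arange(0.0, n)) / float(2 * n)
    for i in range(dims):
        points[:, i] = centers[np.random.permutation(n)]
    points += np.random.uniform(-1.0, 1.0, (n, dims)) / float(2 * n)
    return points

def from_unit_cube(point, lb, ub):
    # Affine map from the unit cube [0, 1]^d to the box [lb, ub].
    return point * (ub - lb) + lb

lb, ub = -5 * np.ones(3), 10 * np.ones(3)
U = latin_hypercube(8, 3)            # samples in the unit cube
X = from_unit_cube(U, lb, ub)        # the same samples scaled to [-5, 10]^3
print(X.shape)
```

Each column of `U` places exactly one point in each of the 8 equal-width strata, which is what distinguishes Latin hypercube sampling from plain uniform sampling.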


--------------------------------------------------------------------------------
/LA-MCTS/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | torch
3 | scikit-learn
4 | matplotlib


--------------------------------------------------------------------------------
/LA-MCTS/revision.patch:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/LaMCTS/489bd60886f23b0b76b10aa8602ea6722f334ad6/LA-MCTS/revision.patch


--------------------------------------------------------------------------------
/LA-MCTS/run.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | from functions.functions import *
 7 | from functions.mujoco_functions import *
 8 | from lamcts import MCTS
 9 | import argparse
10 | 
11 | 
12 | 
13 | 
14 | parser = argparse.ArgumentParser(description='Process inputs')
15 | parser.add_argument('--func', help='specify the test function')
16 | parser.add_argument('--dims', type=int, help='specify the problem dimensions')
17 | parser.add_argument('--iterations', type=int, help='specify the iterations to collect in the search')
18 | 
19 | 
20 | args = parser.parse_args()
21 | 
22 | f = None
23 | iteration = 0
24 | if args.func == 'ackley':
25 |     assert args.dims > 0
26 |     f = Ackley(dims =args.dims)
27 | elif args.func == 'levy':
28 |     assert args.dims > 0
29 |     f = Levy(dims = args.dims)
30 | elif args.func == 'lunar': 
31 |     f = Lunarlanding()
32 | elif args.func == 'swimmer':
33 |     f = Swimmer()
34 | elif args.func == 'hopper':
35 |     f = Hopper()
36 | else:
37 |     print('function not defined')
 38 |     raise SystemExit(1)  # avoids relying on `os` leaking in through the star imports
39 | 
40 | assert f is not None
41 | assert args.iterations > 0
42 | 
43 | 
44 | # f = Ackley(dims = 10)
45 | # f = Levy(dims = 10)
46 | # f = Swimmer()
47 | # f = Hopper()
48 | # f = Lunarlanding()
49 | 
50 | agent = MCTS(
 51 |              lb = f.lb,              # the lower bound of each problem dimension
 52 |              ub = f.ub,              # the upper bound of each problem dimension
 53 |              dims = f.dims,          # the number of problem dimensions
 54 |              ninits = f.ninits,      # the number of random samples used for initialization
 55 |              func = f,               # function object to be optimized
 56 |              Cp = f.Cp,              # Cp for MCTS
 57 |              leaf_size = f.leaf_size, # tree leaf size
 58 |              kernel_type = f.kernel_type, # SVM configuration
 59 |              gamma_type = f.gamma_type    # SVM configuration
60 |              )
61 | 
62 | agent.search(iterations = args.iterations)
63 | 
64 | """
65 | FAQ:
66 | 
 67 | 1. How do I retrieve every f(x) evaluated during the search?
 68 | 
 69 | During the optimization, the function creates a folder to store the f(x) trace. The folder
 70 | is named function name + function dimensions, e.g. Ackley10.
 71 | 
 72 | Every 100 samples, the function writes a row to a file named result + total samples, e.g. result100
 73 | holds the f(x) trace of the first 100 samples.
 74 | 
 75 | The last row of each result file contains the f(x) trace from the 1st sample to the current sample.
 76 | Earlier rows come from previous experiments, because each new experiment appends its results
 77 | as a new last row.
 78 | 
 79 | Here is an example of how to interpret a row of the f(x) trace:
 80 | [5, 3.2, 2.1, ..., 1.1]
 81 | The first sampled f(x) is 5, the second sampled f(x) is 3.2, and the last sampled f(x) is 1.1.
 82 | 
 83 | 2. How do I improve the performance?
 84 | Tune Cp and leaf_size, and try replacing the BO sampler with a different one.
85 | 
86 | """
87 | 
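
The FAQ in run.py describes a row of the f(x) trace as the sampled values in sampling order. Because every function in this repository is minimized (rewards are negated), the usual convergence curve is the running minimum of that row. A minimal sketch follows; the trace values are made up for illustration, and parsing the result file itself is left out because its exact on-disk format is not shown here.

```python
import numpy as np

# Hypothetical f(x) values in the order they were sampled.
trace = [5.0, 3.2, 4.1, 2.1, 2.6, 1.1]

# Best value found up to and including each sample.
best_so_far = np.minimum.accumulate(trace)
print(best_so_far)
```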


--------------------------------------------------------------------------------
/LA-MCTS/setup.py:
--------------------------------------------------------------------------------
 1 | import setuptools
 2 | 
 3 | with open("README.md", "r") as fh:
 4 |     long_description = fh.read()
 5 | 
 6 | with open("requirements.txt") as f:
 7 |     required = f.read().splitlines()
 8 | 
 9 | setuptools.setup(
10 |      name = 'LA-MCTS',  
11 |      version = '0.1',
12 |      author = "Linnan Wang",
13 |      author_email = "wangnan318@gmail.com",
14 |      description = "",
15 |      long_description = long_description,
16 |      long_description_content_type = "text/markdown",
17 |      url = "https://github.com/facebookresearch/LaMCTS",
18 |      packages = ["lamcts"],
19 |      install_requires=required,
20 |      include_package_data = True,
21 |      classifiers = [
22 |          "Programming Language :: Python :: 3",
23 |          "Operating System :: OS Independent",
24 |      ]
25 |  )
26 | 


--------------------------------------------------------------------------------
/LA-MCTS/test.py:
--------------------------------------------------------------------------------
1 | from lamcts import MCTS


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/README.md:
--------------------------------------------------------------------------------
 1 | ## Distributed LaNAS
 2 | • <b>accurate evaluations, suited to chasing SoTA results</b>
 3 | 
 4 | • <b>you need a lot of GPUs</b>
 5 | 
 6 | This folder provides a simple distributed framework for searching with LaNAS, with which we achieved SoTA results using 500 GPUs. Distributed LaNAS trains every sampled network from scratch, and we believe techniques such as early prediction would be a valuable improvement to
 7 | the current implementation. Because sending network configurations is cheap, we implemented a simple client-server system to parallelize the search. The figure below depicts the general idea.
 8 | 
 9 | <p align="center">
10 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/distributed_lanas_architecture.png?raw=true' width="500">
11 | </p>
12 | 
13 | ## Starting the server
14 | 
15 | We uniformly sampled a few million networks from the NASNet search space and stored this pre-built search space in the file "search_space". The server loads the file and searches over the networks in it. Feel free to replace this with a random generator and merge it into this branch.
16 | 
17 | Here are the steps to start:
18 | 1. Go to the server folder and unzip search_space.zip.
19 | 2. Run ifconfig to get your IP address.
20 | 3. Change line 212 in MCTS.py:
21 | ```
22 | address = ('XXX.XX.XX.XXX', 8000), # replace the Xs with your IP address, and change to a different port if 8000 does not work.
23 | ```
24 | 4. Start the server with ``` python MCTS.py & ```.
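
The exchange that the address setting enables can be sketched with Python's `multiprocessing.connection`, which client.py already uses. This is a minimal sketch, not the server's actual code: it assumes the server answers each connection with a one-element list holding a network encoding (which is what client.py unpacks), and the encoding `[1, 0, 3, 2]`, the `serve_one` helper, and the loopback address are illustrative.

```python
import threading
from multiprocessing.connection import Listener, Client

AUTHKEY = b'nasnet'                      # the key client.py passes to Client(...)

def serve_one(listener, network):
    # Accept a single client and hand it one network, wrapped in a list.
    with listener.accept() as conn:
        conn.send([network])

# Port 0 lets the OS pick a free port; a real server would bind its IP and port 8000.
listener = Listener(('127.0.0.1', 0), authkey=AUTHKEY)
server = threading.Thread(target=serve_one, args=(listener, [1, 0, 3, 2]))
server.start()

received = None
with Client(listener.address, authkey=AUTHKEY) as conn:
    if conn.poll(2):                     # wait up to 2 seconds, as client.py does
        [received] = conn.recv()

server.join()
listener.close()
print(received)
```

Both sides must use the same `authkey`, otherwise the connection is rejected during the handshake.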
25 | 
26 | ## Starting the clients
27 | Each client folder corresponds to one GPU; you can create as many client folders as you want by copying the folder.
28 | 
29 | Once the server is running, follow these steps to start a client:
30 | 1. Go to the client folder and open client.py.
31 | 2. Change line 20, line 71, and line 109 to <b>the server's IP address</b>.
32 | 3. Assign the client to an unused GPU.
33 | 4. Run python client.py.
34 | 
35 | If you have 500 GPUs, create 500 folders, and repeat the above process 500x. ;)
36 | 
37 | ## Collecting the results
38 | The script collect_results.py collects all the results from the client folders. Once it has created total_trace.json (we also uploaded the total trace collected from our experiments), you can read the results with ``` python read_results.py```. The results are listed from worst to best, i.e. the last row is the best.
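
The collection step can be sketched as follows. This is not the code of collect_results.py or read_results.py, which are not shown here; it only assumes what client.py confirms, namely that each client folder holds an `acc_trace.json` mapping a JSON-encoded network to its accuracy. The function names and the demo folders are hypothetical.

```python
import json
import os
import tempfile

def collect_traces(root):
    """Merge every <root>/<client folder>/acc_trace.json into one dict."""
    total = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, 'acc_trace.json')
        if os.path.isfile(path):
            with open(path) as fp:
                total.update(json.load(fp))
    return total

def rank_backward(trace):
    # Ascending by accuracy, so the last entry is the best network.
    return sorted(trace.items(), key=lambda kv: kv[1])

# Demo with two hypothetical client folders in a temporary directory.
root = tempfile.mkdtemp()
demo = {'client0': {'[1, 2]': 0.91}, 'client1': {'[3, 4]': 0.97, '[5, 6]': 0.88}}
for folder, trace in demo.items():
    os.makedirs(os.path.join(root, folder))
    with open(os.path.join(root, folder, 'acc_trace.json'), 'w') as fp:
        json.dump(trace, fp)

ranked = rank_backward(collect_traces(root))
for net, acc in ranked:
    print(net, acc)
```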
39 | 
40 | Here is a snapshot of the best architectures found in our distributed search.
41 | <p align="center">
42 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/distributed_search_results.png?raw=true' width="600">
43 | </p>
44 | 
45 | The last column is the test accuracy after training each network for 200 epochs. We assume the best network is the one with the best test accuracy.
46 | 
47 | ## Training the top model
48 | You can train the best "searched" network using the training pipeline <a href="../LaNet/CIFAR10">here</a>. 
49 | 
50 | ## Fault Tolerance
51 | Fault tolerance is very important if you use hundreds of GPUs. The current implementation already handles it.
52 | 
53 | On the server side, the pickled current state is dumped at every search iteration to a file named "mcts_agent", from which the search can be resumed. MCTS.py looks for mcts_agent in the current folder, so if your server is preempted, simply run python MCTS.py again.
54 | 
55 | On the client side, the training state is dumped, and training resumes from it if a job was preempted midway. To restart a client, run python client.py. That's it. ;)
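
The dump-and-resume pattern described above can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's implementation: the state is a plain dict standing in for the pickled agent, and the write-to-temporary-file-then-rename step is an added safeguard that the original code is not known to use.

```python
import os
import pickle
import tempfile

def load_or_create(path):
    # Resume from an existing dump, otherwise start from a fresh state.
    if os.path.isfile(path):
        with open(path, 'rb') as infile:
            return pickle.load(infile)
    return {'iteration': 0}              # hypothetical stand-in for the agent state

def dump(state, path):
    # Write to a temporary file first, then rename, so that a preemption
    # in the middle of a write cannot leave a truncated dump behind.
    tmp = path + '.tmp'
    with open(tmp, 'wb') as outfile:
        pickle.dump(state, outfile)
    os.replace(tmp, path)

path = os.path.join(tempfile.mkdtemp(), 'mcts_agent')
state = load_or_create(path)
for _ in range(3):                       # three search iterations
    state['iteration'] += 1
    dump(state, path)                    # dump after every iteration

resumed = load_or_create(path)           # what a restarted process would see
print(resumed)
```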
56 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/clientX/client.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import time
  7 | import json
  8 | import random
  9 | import sys
 10 | import os
 11 | import pickle
 12 | import socket
 13 | import signal
 14 | import numpy as np
 15 | import traceback
 16 | import train_client
 17 | from nasnet_set import *
 18 | from array import array
 19 | from multiprocessing.connection import Client
 20 | 
 21 | 
 22 | class Client_t:
 23 |     
 24 |     def __init__(self):
 25 |         self.addr           = ('100.97.66.131', 8000)
 26 |         self.client_name    = "client"
 27 |         self.total_send     = 0
 28 |         self.total_recv     = 0
 29 |         self.accuracy_trace = {}
 30 |         self.load_acc_trace()
 31 |         signal.signal(signal.SIGUSR1, self.sig_handler)
 32 |         signal.signal(signal.SIGTERM, self.term_handler)
 33 |         # the fields below signal the client status
 34 |         self.received       = False
 35 |         self.network        = []
 36 |         self.acc            = 0
 37 |     
 38 |     def print_client_status(self):
 39 |         print("client->receive status: ", self.received  )
 40 |         print("client->network: ",        self.network )
 41 |         print("client->acc: ",            self.acc )
 42 |         print("client->trace_len:",       len(self.accuracy_trace) )
 43 | 
 44 |         
 45 |     def sig_handler(self, signum, frame):
 46 |         print("caught signal", signum," about to exit, dump client")
 47 |         self.dump_client()
 48 |         if os.path.isfile('client.inst'):
 49 |             print("dump successful")
 50 | 
 51 |     def term_handler(self, signum, frame):
 52 |         self.dump_client()
 53 |         if os.path.isfile('client.inst'):
 54 |              print("dump successful")
 55 |         print("terminated caught", flush=True)
 56 | 
 57 |     def dump_client(self):
 58 |         client_path = 'client.inst'
 59 |         with open(client_path,"wb") as outfile:
 60 |             pickle.dump(self, outfile)
 61 |     
 62 |     def dump_acc_trace(self):
 63 |         with open('acc_trace.json', 'w') as fp:
 64 |             json.dump(self.accuracy_trace, fp)
 65 | 
 66 |     def load_acc_trace(self):
 67 |         if os.path.isfile('acc_trace.json'):
 68 |             with open('acc_trace.json') as fp:
 69 |                 self.accuracy_trace = json.load(fp)
 70 |             print("loading #", len(self.accuracy_trace )," prev trained networks")
 71 | 
 72 |     def train(self):
 73 |         while True:
 74 |             while not self.received:
 75 |                 try:
 76 |                     send_address = ('100.97.66.131', 8000)
 77 |                     conn = Client(send_address, authkey=b'nasnet')
 78 |                     if conn.poll(2):
 79 |                         [ self.network ] = conn.recv()
 80 |                         self.total_recv += 1
 81 |                         conn.close()
 82 |                         self.received = True
 83 |                         self.dump_client()
 84 |                         print("RECEIVE:=>", self.network)
 85 |                         print("RECEIVE:=>", " total_send:", self.total_send, " total_recv:", self.total_recv)
 86 |                         self.print_client_status()
 87 |                 except Exception as e:
 88 |                     print(e)
 89 |                     print(traceback.format_exc())
 90 |                     print("client recv error")
 91 | 
 92 |             if self.received:
 93 |                 print("prepare training the network:", self.network)
 94 |                 network = np.array( self.network, dtype = 'int' )
 95 |                 network = network.tolist()
 96 |                 net     = gen_code_from_list( network, node_num=7 )
 97 |                 net_str = json.dumps( network )
 98 |                 if net_str in self.accuracy_trace:
 99 |                     self.acc = self.accuracy_trace[net_str]
100 |                 else:
101 |                     genotype_net = translator([net, net], max_node=7)
102 |                     print("--"*15)
103 |                     print(genotype_net)
104 |                     print("training the above network")
105 |                     print("--"*15)
106 |                     self.acc = train_client.run(genotype_net, epochs=600, batch_size=200)
107 |                     self.accuracy_trace[net_str] = self.acc
108 |                     self.dump_acc_trace()
109 | 
110 |             #TODO: train the actual network
111 |             #time.sleep(random.randint(2, 5) )
112 |             while self.received:
113 |                 try:
114 |                     recv_address = ('100.97.66.131', 8000)
115 |                     conn = Client(recv_address, authkey=b'nasnet')
116 |                     network_str = json.dumps( np.array(network).tolist() )
117 |                     conn.send([self.client_name, network_str, self.acc])
118 |                     self.total_send += 1
119 |                     print("SEND:=>", self.network, self.acc)
120 |                     self.network  = []
121 |                     self.acc      = 0
122 |                     self.received = False
123 |                     self.dump_client()
124 |                     print("SEND:=>", " total_send:", self.total_send, " total_recv:", self.total_recv)
125 |                     conn.close()
126 |                 except Exception as e:
127 |                     print(e)
128 |                     print(traceback.format_exc())
129 |                     print("client send error, reconnecting")
130 | 
131 | inst_path = 'client.inst'
132 | if os.path.isfile(inst_path):
133 |     with open(inst_path, 'rb') as client_data:
134 |         client = pickle.load( client_data )
135 |         client.print_client_status()
136 |     client.train()
137 | else:
138 |     client = Client_t()
139 |     client.train()
140 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/clientX/continue_train.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import  os
  7 | import  sys
  8 | import  time
  9 | import  glob
 10 | import  numpy as np
 11 | import  torch
 12 | import  utils
 13 | import  logging
 14 | import  argparse
 15 | import  torch.nn as nn
 16 | import  genotypes
 17 | import  torch.utils
 18 | import  torchvision.datasets as dset
 19 | import  torch.backends.cudnn as cudnn
 20 | 
 21 | from    model import NetworkCIFAR as Network
 22 | 
 23 | parser = argparse.ArgumentParser("cifar10")
 24 | parser.add_argument('--data', type=str, default='../data', help='location of the data corpus')
 25 | parser.add_argument('--batch_size', type=int, default=96, help='batch size')
 26 | parser.add_argument('--lr', type=float, default=0.025, help='init learning rate')
 27 | parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
 28 | parser.add_argument('--wd', type=float, default=3e-4, help='weight decay')
 29 | parser.add_argument('--report_freq', type=float, default=50, help='report frequency')
 30 | parser.add_argument('--gpu', type=int, default=0, help='gpu device id')
 31 | parser.add_argument('--epochs', type=int, default=600, help='num of training epochs')
 32 | parser.add_argument('--init_ch', type=int, default=36, help='num of init channels')
 33 | parser.add_argument('--layers', type=int, default=20, help='total number of layers')
 34 | parser.add_argument('--model_path', type=str, default='saved_models', help='path to save the model')
 35 | parser.add_argument('--auxiliary', action='store_true', default=False, help='use auxiliary tower')
 36 | parser.add_argument('--auxiliary_weight', type=float, default=0.4, help='weight for auxiliary loss')
 37 | parser.add_argument('--cutout', action='store_true', default=False, help='use cutout')
 38 | parser.add_argument('--cutout_length', type=int, default=16, help='cutout length')
 39 | parser.add_argument('--drop_path_prob', type=float, default=0.2, help='drop path probability')
 40 | parser.add_argument('--exp_path', type=str, default='exp/cifar10', help='experiment name')
 41 | parser.add_argument('--seed', type=int, default=0, help='random seed')
 42 | parser.add_argument('--arch', type=str, default='DARTS', help='which architecture to use')
 43 | parser.add_argument('--grad_clip', type=float, default=5, help='gradient clipping')
 44 | parser.add_argument('--cur_epoch', type=int, default=0, help='epoch to resume training from')
 45 | parser.add_argument('--save', type=str, default='EXP', help='experiment name')
 46 | 
 47 | args = parser.parse_args()
 48 | 
 49 | args.save = 'eval-{}-{}-{}'.format(args.arch, time.strftime("%Y%m%d-%H%M%S"), args.init_ch)
 50 | utils.create_exp_dir(args.save, scripts_to_save=glob.glob('*.py'))
 51 | 
 52 | log_format = '%(asctime)s %(message)s'
 53 | logging.basicConfig(stream=sys.stdout, level=logging.INFO,
 54 |                     format=log_format, datefmt='%m/%d %I:%M:%S %p')
 55 | fh = logging.FileHandler(os.path.join(args.save, 'log.txt'))
 56 | fh.setFormatter(logging.Formatter(log_format))
 57 | logging.getLogger().addHandler(fh)
 58 | 
 59 | 
 60 | 
 61 | def main():
 62 | 
 63 | 
 64 |     np.random.seed(args.seed)
 65 |     torch.cuda.set_device(args.gpu)
 66 |     cudnn.benchmark = True
 67 |     cudnn.enabled = True
 68 |     torch.manual_seed(args.seed)
 69 |     logging.info('gpu device = %d' % args.gpu)
 70 |     logging.info("args = %s", args)
 71 | 
 72 |     genotype = getattr(genotypes, args.arch)
 73 | 
 74 |     # model = Network(args.init_ch, 10, args.layers, args.auxiliary, genotype).cuda()
 75 |     model = torch.load(os.path.join(args.model_path, 'model.pt'))
 76 | 
 77 |     logging.info("param size = %fMB", utils.count_parameters_in_MB(model))
 78 | 
 79 |     criterion = nn.CrossEntropyLoss().cuda()
 80 |     optimizer = torch.optim.SGD(
 81 |         model.parameters(),
 82 |         args.lr,
 83 |         momentum=args.momentum,
 84 |         weight_decay=args.wd
 85 |     )
 86 | 
 87 |     train_transform, valid_transform = utils._data_transforms_cifar10(args.cutout, args.cutout_length)
 88 |     train_data = dset.CIFAR10(root=args.data, train=True, download=True, transform=train_transform)
 89 |     valid_data = dset.CIFAR10(root=args.data, train=False, download=True, transform=valid_transform)
 90 | 
 91 |     train_queue = torch.utils.data.DataLoader(
 92 |         train_data, batch_size=args.batch_size, shuffle=True, pin_memory=True, num_workers=2)
 93 | 
 94 |     valid_queue = torch.utils.data.DataLoader(
 95 |         valid_data, batch_size=args.batch_size, shuffle=False, pin_memory=True, num_workers=2)
 96 | 
 97 |     scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, float(args.epochs))
 98 | 
 99 |     best_acc = 0.0
100 | 
101 |     for i in range(args.cur_epoch):
102 |         scheduler.step()
103 | 
104 |     for epoch in range(args.cur_epoch, args.epochs):
105 |         scheduler.step()
106 |         logging.info('epoch %d lr %e', epoch, scheduler.get_lr()[0])
107 |         model.drop_path_prob = args.drop_path_prob * epoch / args.epochs
108 | 
109 | 
110 | 
111 |         valid_acc, valid_obj = infer(valid_queue, model, criterion)
112 |         logging.info('valid_acc: %f', valid_acc)
113 | 
114 |         if valid_acc > best_acc:
115 |             best_acc = valid_acc
116 |             print('this model is the best')
117 |             torch.save(model, os.path.join(args.save, 'AlphaX_1.pt'))
118 | 
119 |         torch.save(model, os.path.join(args.save, 'trained.pt'))
120 |         print('current best acc is', best_acc)
121 | 
122 | 
123 |         train_acc, train_obj = train(train_queue, model, criterion, optimizer)
124 |         logging.info('train_acc: %f', train_acc)
125 | 
126 | 
127 | 
128 |         # utils.save(model, os.path.join(args.save, 'trained.pt'))
129 |         print('saved to: trained.pt')
130 | 
131 | 
132 | def train(train_queue, model, criterion, optimizer):
133 | 
134 |     objs = utils.AverageMeter()
135 |     top1 = utils.AverageMeter()
136 |     top5 = utils.AverageMeter()
137 |     model.train()
138 | 
139 |     for step, (x, target) in enumerate(train_queue):
140 |         x = x.cuda()
141 |         target = target.cuda(non_blocking=True)
142 | 
143 |         optimizer.zero_grad()
144 |         logits, logits_aux = model(x)
145 |         loss = criterion(logits, target)
146 |         if args.auxiliary:
147 |             loss_aux = criterion(logits_aux, target)
148 |             loss += args.auxiliary_weight * loss_aux
149 |         loss.backward()
150 |         nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)
151 |         optimizer.step()
152 | 
153 |         prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
154 |         n = x.size(0)
155 |         objs.update(loss.item(), n)
156 |         top1.update(prec1.item(), n)
157 |         top5.update(prec5.item(), n)
158 | 
159 |         if step % args.report_freq == 0:
160 |             logging.info('train %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)
161 | 
162 |     return top1.avg, objs.avg
163 | 
164 | 
165 | def infer(valid_queue, model, criterion):
166 | 
167 |     objs = utils.AverageMeter()
168 |     top1 = utils.AverageMeter()
169 |     top5 = utils.AverageMeter()
170 |     model.eval()
171 | 
172 |     for step, (x, target) in enumerate(valid_queue):
173 |         x = x.cuda()
174 |         target = target.cuda(non_blocking=True)
175 | 
176 |         with torch.no_grad():
177 |             logits, _ = model(x)
178 |             loss = criterion(logits, target)
179 | 
180 |             prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
181 |             n = x.size(0)
182 |             objs.update(loss.item(), n)
183 |             top1.update(prec1.item(), n)
184 |             top5.update(prec5.item(), n)
185 | 
186 |         if step % args.report_freq == 0:
187 |             logging.info('>>Validation: %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)
188 | 
189 |     return top1.avg, objs.avg
190 | 
191 | 
192 | if __name__ == '__main__':
193 |     main()
194 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/clientX/model.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import  torch
  7 | import  torch.nn as nn
  8 | from operations import *
  9 | 
 10 | 
 11 | class Cell(nn.Module):
 12 | 
 13 |     def __init__(self, genotype, C_prev_prev, C_prev, C, reduction, reduction_prev):
 14 | 
 15 |         super(Cell, self).__init__()
 16 | 
 17 |         print(C_prev_prev, C_prev, C)
 18 | 
 19 |         if reduction_prev:
 20 |             self.preprocess0 = FactorizedReduce(C_prev_prev, C)
 21 |         else:
 22 |             self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0)
 23 |         self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0)
 24 | 
 25 |         if reduction:
 26 |             op_names, indices = zip(*genotype.reduce)
 27 |             concat = genotype.reduce_concat
 28 |         else:
 29 |             op_names, indices = zip(*genotype.normal)
 30 |             concat = genotype.normal_concat
 31 |         self._compile(C, op_names, indices, concat, reduction)
 32 | 
 33 |     def _compile(self, C, op_names, indices, concat, reduction):
 34 | 
 35 |         assert len(op_names) == len(indices)
 36 | 
 37 |         self._steps = len(op_names) // 2
 38 |         self._concat = concat
 39 |         self.multiplier = len(concat)
 40 | 
 41 |         self._ops = nn.ModuleList()
 42 |         for name, index in zip(op_names, indices):
 43 |             stride = 2 if reduction and index < 2 else 1
 44 |             op = OPS[name](C, stride, True)
 45 |             self._ops += [op]
 46 |         self._indices = indices
 47 | 
 48 |     def forward(self, s0, s1, drop_prob):
 49 | 
 50 |         s0 = self.preprocess0(s0)
 51 |         s1 = self.preprocess1(s1)
 52 | 
 53 |         states = [s0, s1]
 54 |         for i in range(self._steps):
 55 |             h1 = states[self._indices[2 * i]]
 56 |             h2 = states[self._indices[2 * i + 1]]
 57 |             op1 = self._ops[2 * i]
 58 |             op2 = self._ops[2 * i + 1]
 59 |             h1 = op1(h1)
 60 |             h2 = op2(h2)
 61 | 
 62 |             if self.training and drop_prob > 0.:
 63 |                 if not isinstance(op1, Identity):
 64 |                     h1 = drop_path(h1, drop_prob)
 65 |                 if not isinstance(op2, Identity):
 66 |                     h2 = drop_path(h2, drop_prob)
 67 |             s = h1 + h2
 68 |             states += [s]
 69 |         return torch.cat([states[i] for i in self._concat], dim=1)
 70 | 
 71 | def drop_path(x, drop_prob):
 72 |     if drop_prob > 0.:
 73 |         keep_prob = 1. - drop_prob
 74 | 
 75 |         mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
 76 |         x.div_(keep_prob)
 77 |         try:
 78 |             x.mul_(mask)
 79 |         except RuntimeError:  # x is half precision, so rebuild the mask as a HalfTensor
 80 |             mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
 81 |             x.mul_(mask)
 82 |     return x
 83 | 
 84 | 
 85 | 
 86 | 
 87 | class AuxiliaryHeadCIFAR(nn.Module):
 88 | 
 89 |     def __init__(self, C, num_classes):
 90 |         """assuming input size 8x8"""
 91 |         super(AuxiliaryHeadCIFAR, self).__init__()
 92 | 
 93 |         self.features = nn.Sequential(
 94 |             nn.ReLU(inplace=True),
 95 |             nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False),  # image size = 2 x 2
 96 |             nn.Conv2d(C, 128, 1, bias=False),
 97 |             nn.BatchNorm2d(128),
 98 |             nn.ReLU(inplace=True),
 99 |             nn.Conv2d(128, 768, 2, bias=False),
100 |             nn.BatchNorm2d(768),
101 |             nn.ReLU(inplace=True)
102 |         )
103 |         self.classifier = nn.Linear(768, num_classes)
104 | 
105 |     def forward(self, x):
106 |         x = self.features(x)
107 |         x = self.classifier(x.view(x.size(0), -1))
108 |         return x
109 | 
110 | 
111 | class NetworkCIFAR(nn.Module):
112 | 
113 |     def __init__(self, C, num_classes, layers, auxiliary, genotype):
114 |         super(NetworkCIFAR, self).__init__()
115 | 
116 |         self._layers = layers
117 |         self._auxiliary = auxiliary
118 |         self.drop_path_prob = 0.  # default so forward() works before training code sets it
119 |         stem_multiplier = 3
120 |         C_curr = stem_multiplier * C
121 |         self.stem = nn.Sequential(
122 |             nn.Conv2d(3, C_curr, 3, padding=1, bias=False),
123 |             nn.BatchNorm2d(C_curr)
124 |         )
125 | 
126 |         C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
127 |         self.cells = nn.ModuleList()
128 |         reduction_prev = False
129 |         for i in range(layers):
130 |             if i in [layers // 3, 2 * layers // 3]:
131 |                 C_curr *= 2
132 |                 reduction = True
133 |             else:
134 |                 reduction = False
135 |             cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, reduction_prev)
136 |             reduction_prev = reduction
137 |             self.cells += [cell]
138 |             C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr
139 |             if i == 2 * layers // 3:
140 |                 C_to_auxiliary = C_prev
141 | 
142 |         if auxiliary:
143 |             self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes)
144 |         self.global_pooling = nn.AdaptiveAvgPool2d(1)
145 |         self.classifier = nn.Linear(C_prev, num_classes)
146 | 
147 |     def forward(self, input):
148 |         logits_aux = None
149 |         s0 = s1 = self.stem(input)
150 |         for i, cell in enumerate(self.cells):
151 |             s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
152 |             if i == 2 * self._layers // 3:
153 |                 if self._auxiliary and self.training:
154 |                     logits_aux = self.auxiliary_head(s1)
155 |         out = self.global_pooling(s1)
156 |         logits = self.classifier(out.view(out.size(0), -1))
157 |         return logits, logits_aux
158 | 
159 | 
160 | 
161 | 
162 | 
163 | 
164 | 
165 | 
166 | 
167 | 
168 | 
169 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/clientX/operations.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | # from torch import functional as F
  9 | import torch.nn.functional as F
 10 | 
 11 | OPS = {
 12 |     'avg_pool_3x3': lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False),
 13 |     'max_pool_2x2': lambda C, stride, affine: nn.MaxPool2d(2, stride=stride, padding=0),
 14 |     'max_pool_3x3': lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1),
 15 |     'max_pool_5x5': lambda C, stride, affine: nn.MaxPool2d(5, stride=stride, padding=2),
 16 |     'skip_connect': lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine),
 17 |     'sep_conv_3x3': lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine),
 18 |     'sep_conv_5x5': lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine),
 19 |     'sep_conv_7x7': lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine),
 20 |     'dil_conv_3x3': lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine),
 21 |     'dil_conv_5x5': lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine),
 22 |     'conv_1x1': lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False),
 23 |     'conv_3x3': lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False),
 24 |     'conv_5x5': lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False),
 25 | }
 26 | 
 27 | 
 28 | class ReLUConvBN(nn.Module):
 29 |     def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 30 | 
 31 |         super(ReLUConvBN, self).__init__()
 32 | 
 33 |         self.op = nn.Sequential(
 34 |             nn.ReLU(inplace=False),
 35 |             Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False),
 36 |             nn.BatchNorm2d(C_out, affine=affine)
 37 |         )
 38 | 
 39 |     def forward(self, x):
 40 |         return self.op(x)
 41 | 
 42 | class Conv2d(nn.Conv2d):
 43 | 
 44 |     def __init__(self, in_channels, out_channels, kernel_size, stride=1,
 45 |                  padding=0, dilation=1, groups=1, bias=True):
 46 |         super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride,
 47 |                  padding, dilation, groups, bias)
 48 | 
 49 |     def forward(self, x):
 50 |         weight = self.weight
 51 |         weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2,
 52 |                                   keepdim=True).mean(dim=3, keepdim=True)
 53 |         weight = weight - weight_mean
 54 |         std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5
 55 |         weight = weight / std.expand_as(weight)
 56 |         return F.conv2d(x, weight, self.bias, self.stride,
 57 |                         self.padding, self.dilation, self.groups)
 58 | 
 59 | 
 60 | class SepConv(nn.Module):
 61 | 
 62 |     def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 63 |         super(SepConv, self).__init__()
 64 | 
 65 |         self.op = nn.Sequential(
 66 |             nn.ReLU(inplace=False),
 67 |             nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding,
 68 |                       groups=C_in, bias=False),
 69 |             Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False),
 70 |             nn.BatchNorm2d(C_in, affine=affine),
 71 |             nn.ReLU(inplace=False),
 72 |             nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding,
 73 |                       groups=C_in, bias=False),
 74 |             Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 75 |             nn.BatchNorm2d(C_out, affine=affine),
 76 |         )
 77 | 
 78 |     def forward(self, x):
 79 |         return self.op(x)
 80 | 
 81 | 
 82 | class Identity(nn.Module):
 83 | 
 84 |     def __init__(self):
 85 |         super(Identity, self).__init__()
 86 | 
 87 |     def forward(self, x):
 88 |         return x
 89 | 
 90 | 
 91 | class DilConv(nn.Module):
 92 |     def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True):
 93 |         super(DilConv, self).__init__()
 94 |         self.op = nn.Sequential(
 95 |             nn.ReLU(inplace=False),
 96 |             nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation,
 97 |                       groups=C_in, bias=False),
 98 |             Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 99 |             nn.BatchNorm2d(C_out, affine=affine),
100 |         )
101 | 
102 |     def forward(self, x):
103 |         return self.op(x)
104 | 
105 | 
106 | class FactorizedReduce(nn.Module):
107 | 
108 |     def __init__(self, C_in, C_out, affine=True):
109 | 
110 |         super(FactorizedReduce, self).__init__()
111 | 
112 |         assert C_out % 2 == 0
113 | 
114 |         self.relu = nn.ReLU(inplace=False)
115 |         self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
116 |         self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
117 |         self.bn = nn.BatchNorm2d(C_out, affine=affine)
118 | 
119 |     def forward(self, x):
120 |         x = self.relu(x)
121 | 
122 |         # x: torch.Size([32, 32, 32, 32])
123 |         # conv1: [b, c_out//2, d//2, d//2]
124 |         # conv2: [b, c_out//2, d//2, d//2], computed on x shifted by one pixel
125 |         # out: torch.Size([32, 32, 16, 16])
126 | 
127 |         out = torch.cat([self.conv_1(x), self.conv_2(x[:, :, 1:, 1:])], dim=1)
128 |         out = self.bn(out)
129 |         return out
130 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/clientX/train_client.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import  os
  7 | import  sys
  8 | import  time
  9 | import  glob
 10 | import  numpy as np
 11 | import  torch
 12 | import  utils
 13 | import  logging
 14 | import  argparse
 15 | import  torch.nn as nn
 16 | import  torch.utils
 17 | import  torchvision.datasets as dset
 18 | import  torch.backends.cudnn as cudnn
 19 | import json
 20 | import hashlib
 21 | import  apex
 22 | 
 23 | 
 24 | from model import NetworkCIFAR as Network
 25 | 
 26 | 
 27 | 
 28 | def run(net, init_ch=32, layers=20, auxiliary=True, lr=0.025, momentum=0.9, wd=3e-4, cutout=True, cutout_length=16, data='../data', batch_size=96, epochs=600, drop_path_prob=0.2, auxiliary_weight=0.4):
 29 |     save = '/checkpoint/linnanwang/nasnet/' + hashlib.md5(json.dumps(net).encode()).hexdigest()
 30 |     utils.create_exp_dir(save, scripts_to_save=glob.glob('*.py'))
 31 | 
 32 |     log_format = '%(asctime)s %(message)s'
 33 |     logging.basicConfig(stream=sys.stdout, level=logging.INFO,
 34 |                         format=log_format, datefmt='%m/%d %I:%M:%S %p')
 35 |     fh = logging.FileHandler(os.path.join(save, 'log.txt'))
 36 |     fh.setFormatter(logging.Formatter(log_format))
 37 |     logging.getLogger().addHandler(fh)
 38 | 
 39 | 
 40 |     np.random.seed(0)
 41 |     torch.cuda.set_device(0)
 42 |     cudnn.benchmark = True
 43 |     cudnn.enabled = True
 44 |     torch.manual_seed(0)
 45 |     logging.info('gpu device = %d' % 0)
 46 |     # logging.info("args = %s", args)
 47 | 
 48 | 
 49 |     genotype = net
 50 |     model = Network(init_ch, 10, layers, auxiliary, genotype).cuda()
 51 | 
 52 |     logging.info("param size = %fMB", utils.count_parameters_in_MB(model))
 53 | 
 54 |     criterion = nn.CrossEntropyLoss().cuda()
 55 |     optimizer = torch.optim.SGD(
 56 |         model.parameters(),
 57 |         lr,
 58 |         momentum=momentum,
 59 |         weight_decay=wd
 60 |     )
 61 |     model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O3")
 62 | 
 63 | 
 64 | 
 65 |     train_transform, valid_transform = utils._data_transforms_cifar10(cutout, cutout_length)
 66 |     train_data = dset.CIFAR10(root=data, train=True, download=True, transform=train_transform)
 67 |     valid_data = dset.CIFAR10(root=data, train=False, download=True, transform=valid_transform)
 68 | 
 69 |     train_queue = torch.utils.data.DataLoader(
 70 |         train_data, batch_size=batch_size, shuffle=True, pin_memory=True, num_workers=2)
 71 | 
 72 |     valid_queue = torch.utils.data.DataLoader(
 73 |         valid_data, batch_size=batch_size, shuffle=False, pin_memory=True, num_workers=2)
 74 | 
 75 |     scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, float(epochs))
 76 | 
 77 |     best_acc = 0.0
 78 | 
 79 |     for epoch in range(epochs):
 80 |         scheduler.step()
 81 |         logging.info('epoch %d lr %e', epoch, scheduler.get_lr()[0])
 82 |         model.drop_path_prob = drop_path_prob * epoch / epochs
 83 | 
 84 | 
 85 |         train_acc, train_obj = train(train_queue, model, criterion, optimizer, auxiliary=auxiliary, auxiliary_weight=auxiliary_weight)
 86 |         logging.info('train_acc: %f', train_acc)
 87 | 
 88 |         valid_acc, valid_obj = infer(valid_queue, model, criterion)
 89 |         logging.info('valid_acc: %f', valid_acc)
 90 | 
 91 |         if valid_acc > best_acc and epoch >= 50:
 92 |             print('this model is the best')
 93 |             torch.save(model.state_dict(), os.path.join(save, 'model.pt'))
 94 |         if valid_acc > best_acc:
 95 |             best_acc = valid_acc
 96 |         print('current best acc is', best_acc)
 97 | 
 98 |         if epoch == 100:  # hard cap: stop after 101 epochs even when `epochs` is larger
 99 |             break
100 | 
101 |         # utils.save(model, os.path.join(args.save, 'trained.pt'))
102 |         print('saved to: model.pt')
103 | 
104 |     return best_acc
105 | 
106 | 
107 | def train(train_queue, model, criterion, optimizer, auxiliary=True, auxiliary_weight=0.4, grad_clip=float(5), report_freq=float(50)):
108 | 
109 |     objs = utils.AverageMeter()
110 |     top1 = utils.AverageMeter()
111 |     top5 = utils.AverageMeter()
112 |     model.train()
113 | 
114 |     for step, (x, target) in enumerate(train_queue):
115 |         x = x.cuda()
116 |         target = target.cuda(non_blocking=True)
117 | 
118 |         optimizer.zero_grad()
119 |         logits, logits_aux = model(x)
120 |         loss = criterion(logits, target)
121 |         if auxiliary:
122 |             loss_aux = criterion(logits_aux, target)
123 |             loss += auxiliary_weight * loss_aux
124 |         loss.backward()
125 |         nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
126 |         optimizer.step()
127 | 
128 |         prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
129 |         n = x.size(0)
130 |         objs.update(loss.item(), n)
131 |         top1.update(prec1.item(), n)
132 |         top5.update(prec5.item(), n)
133 | 
134 |         if step % report_freq == 0:
135 |             logging.info('train %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)
136 | 
137 |     return top1.avg, objs.avg
138 | 
139 | 
140 | def infer(valid_queue, model, criterion, report_freq=float(50)):
141 | 
142 |     objs = utils.AverageMeter()
143 |     top1 = utils.AverageMeter()
144 |     top5 = utils.AverageMeter()
145 |     model.eval()
146 | 
147 |     for step, (x, target) in enumerate(valid_queue):
148 |         x = x.cuda()
149 |         target = target.cuda(non_blocking=True)
150 | 
151 |         with torch.no_grad():
152 |             logits, _ = model(x)
153 |             loss = criterion(logits, target)
154 | 
155 |             prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
156 |             n = x.size(0)
157 |             objs.update(loss.item(), n)
158 |             top1.update(prec1.item(), n)
159 |             top5.update(prec5.item(), n)
160 | 
161 |         if step % report_freq == 0:
162 |             logging.info('>>Validation: %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)
163 | 
164 |     return top1.avg, objs.avg
165 | 
166 | 
167 | 
168 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/clientX/utils.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import  os
  7 | import  numpy as np
  8 | import  torch
  9 | import  shutil
 10 | import  torchvision.transforms as transforms
 11 | 
 12 | 
 13 | class AverageMeter:
 14 | 
 15 |     def __init__(self):
 16 |         self.reset()
 17 | 
 18 |     def reset(self):
 19 |         self.avg = 0
 20 |         self.sum = 0
 21 |         self.cnt = 0
 22 | 
 23 |     def update(self, val, n=1):
 24 |         self.sum += val * n
 25 |         self.cnt += n
 26 |         self.avg = self.sum / self.cnt
 27 | 
 28 | 
 29 | def accuracy(output, target, topk=(1,)):
 30 |     """
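 31 |     Compute top-k accuracy, in percent, for each k in topk.
 32 |     :param output: logits, [b, classes]
 33 |     :param target: [b]
 34 |     :param topk: tuple of k values to evaluate
 35 |     :return: list with one accuracy tensor per k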
 36 |     """
 37 |     maxk = max(topk)
 38 |     batch_size = target.size(0)
 39 | 
 40 |     _, pred = output.topk(maxk, 1, True, True)
 41 |     pred = pred.t()
 42 |     correct = pred.eq(target.view(1, -1).expand_as(pred))
 43 | 
 44 |     res = []
 45 |     for k in topk:
 46 |         correct_k = correct[:k].reshape(-1).float().sum(0)  # reshape: correct is non-contiguous after .t()
 47 |         res.append(correct_k.mul_(100.0 / batch_size))
 48 | 
 49 |     return res
 50 | 
 51 | 
 52 | class Cutout:
 53 |     def __init__(self, length):
 54 |         self.length = length
 55 | 
 56 |     def __call__(self, img):
 57 |         h, w = img.size(1), img.size(2)
 58 |         mask = np.ones((h, w), np.float32)
 59 |         y = np.random.randint(h)
 60 |         x = np.random.randint(w)
 61 | 
 62 |         y1 = np.clip(y - self.length // 2, 0, h)
 63 |         y2 = np.clip(y + self.length // 2, 0, h)
 64 |         x1 = np.clip(x - self.length // 2, 0, w)
 65 |         x2 = np.clip(x + self.length // 2, 0, w)
 66 | 
 67 |         mask[y1: y2, x1: x2] = 0.
 68 |         mask = torch.from_numpy(mask)
 69 |         mask = mask.expand_as(img)
 70 |         img *= mask
 71 |         return img
 72 | 
 73 | 
 74 | def _data_transforms_cifar10(cutout, cutout_length):
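 75 |     """
 76 |     :param cutout: whether to append Cutout to the training transforms
 77 |     :param cutout_length: side length of the square Cutout mask
 78 |     :return: (train_transform, valid_transform)
 79 |     """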
 80 |     CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
 81 |     CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
 82 | 
 83 |     train_transform = transforms.Compose([
 84 |         transforms.RandomCrop(32, padding=4),
 85 |         transforms.RandomHorizontalFlip(),
 86 |         transforms.ToTensor(),
 87 |         transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
 88 |     ])
 89 |     if cutout:
 90 |         train_transform.transforms.append(Cutout(cutout_length))
 91 | 
 92 |     valid_transform = transforms.Compose([
 93 |         transforms.ToTensor(),
 94 |         transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
 95 |     ])
 96 |     return train_transform, valid_transform
 97 | 
 98 | 
 99 | def count_parameters_in_MB(model):
100 |     """
101 |     count all parameters excluding auxiliary
102 |     :param model:
103 |     :return:
104 |     """
105 |     return sum(v.numel() for name, v in model.named_parameters() if "auxiliary" not in name) / 1e6
106 | 
107 | 
108 | def save_checkpoint(state, is_best, save):
109 |     filename = os.path.join(save, 'checkpoint.pth.tar')
110 |     torch.save(state, filename)
111 |     if is_best:
112 |         best_filename = os.path.join(save, 'model_best.pth.tar')
113 |         shutil.copyfile(filename, best_filename)
114 | 
115 | 
116 | def save(model, model_path):
117 |     print('saved to model:', model_path)
118 |     torch.save(model.state_dict(), model_path)
119 | 
120 | 
121 | def load(model, model_path):
122 |     print('load from model:', model_path)
123 |     model.load_state_dict(torch.load(model_path))
124 | 
125 | 
126 | def drop_path(x, drop_prob):
127 |     if drop_prob > 0.:
128 |         keep_prob = 1. - drop_prob
129 | 
130 |         mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
131 |         x.div_(keep_prob)
132 |         try:
133 |             x.mul_(mask)
134 |         except RuntimeError:  # dtype mismatch, e.g. x is half precision
135 |             mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
136 |             x.mul_(mask)
137 |     return x
138 | 
139 | 
140 | def create_exp_dir(path, scripts_to_save=None):
141 |     if not os.path.exists(path):
142 |         os.mkdir(path)
143 |     print('Experiment dir : {}'.format(path))
144 | 
145 |     if scripts_to_save is not None:
146 |         if not os.path.exists(os.path.join(path, 'scripts')):
147 |             os.mkdir(os.path.join(path, 'scripts'))
148 |         for script in scripts_to_save:
149 |             dst_file = os.path.join(path, 'scripts', os.path.basename(script))
150 |             shutil.copyfile(script, dst_file)
151 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/collect_results.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | import json
 7 | import os
 8 | import operator
 9 | 
10 | total_trace  = {}
11 | for i in range(1, 800):
12 |     path = 'client' + str(i) + '/' + 'acc_trace.json'
13 |     if os.path.exists(path):
14 |         with open(path, 'r') as json_data:
15 |             data = json.load(json_data)
16 |         for k, v in data.items():
17 |             total_trace[k] = v
18 | 
19 | with open('total_trace.json', 'w') as outfile:
20 |     json.dump(total_trace, outfile)
21 | print("total element:", len(total_trace) )
22 | 
23 | 
24 | 
25 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/launch_clients.sh:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | 
 7 | for (( c=1; c < 600; c++ ))
 8 | do
 9 |    echo "---------------------------------"
10 |    echo $PWD
11 |    cd "client$c" || continue
12 |    echo $PWD
13 |    screen -S client -d -m srun --gres=gpu:1 --time=24:00:00 --cpus-per-task=1 python client.py
14 |    cd ".."
15 |    echo "$PWD"
16 | done
17 | 
18 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/read_results.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | import json
 7 | import os
 8 | import operator
 9 | 
10 | data  = {}
11 | 
12 | with open('total_trace.json') as json_file:
13 |     data = json.load(json_file)
14 | 
15 | 
16 | sorted_trace = {}
17 | sorted_trace = sorted(data.items(), key=operator.itemgetter(1))
18 | for k,v in sorted_trace:
19 |     print(k, v)
20 | 
21 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/server/Classifier.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | from torch.autograd import Variable
  9 | import json
 10 | from torch import optim
 11 | import numpy as np
 12 | 
 13 | # this is the backbone model
 14 | # to split networks at a MCTS state
 15 | class LinearModel(nn.Module):
 16 |     
 17 |     def __init__(self, input_dim, output_dim):
 18 |         super(LinearModel, self).__init__()
 19 |         self.fc1 = nn.Linear(input_dim, output_dim)
 20 |         torch.nn.init.xavier_uniform_( self.fc1.weight )
 21 |     
 22 |     def forward(self, x):
 23 |         y = self.fc1(x)
 24 |         return y
 25 | 
 26 | # the input will be samples!
 27 | class Classifier():
 28 |     def __init__(self, samples, input_dim):
 29 |         self.training_counter = 0
 30 |         assert input_dim >= 1
 31 |         assert type(samples) ==  type({})
 32 |         self.input_dim  = input_dim
 33 |         self.samples    = samples
 34 |         self.model      = LinearModel(input_dim, 1)
 35 |         if torch.cuda.is_available():
 36 |             self.model.cuda()
 37 |         self.l_rate     = 0.00001
 38 |         self.optimiser  = optim.Adam(self.model.parameters(), lr=self.l_rate, betas=(0.9, 0.999), eps=1e-08)
 39 |         self.epochs     = 10000
 40 |         self.boundary   = -1
 41 |         self.nets       = []
 42 |         
 43 |     
 44 |     def reinit(self):
 45 |         # LinearModel has a single layer (fc1), stored on self.model
 46 |         torch.nn.init.xavier_uniform_( self.model.fc1.weight )
 47 |     
 48 |     def update_samples(self, latest_samples):
 49 |         assert type(latest_samples) == type(self.samples)
 50 |         sampled_nets    = []
 51 |         nets_acc        = []
 52 |         for k, v in latest_samples.items():
 53 |             net = json.loads(k)
 54 |             sampled_nets.append( net )
 55 |             nets_acc.append( v )
 56 |         self.nets = torch.from_numpy(np.asarray(sampled_nets, dtype=np.float32).reshape(-1, self.input_dim))
 57 |         self.acc  = torch.from_numpy(np.asarray(nets_acc,     dtype=np.float32).reshape(-1, 1))
 58 |         self.samples = latest_samples
 59 |         if torch.cuda.is_available():
 60 |             self.nets = self.nets.cuda()
 61 |             self.acc  = self.acc.cuda()
 62 | 
 63 |     def train(self):
 64 |         if self.training_counter == 0:
 65 |             self.epochs = 20000
 66 |         else:
 67 |             self.epochs = 3000
 68 |         self.training_counter += 1
 69 |         # in a rare case, one branch has no networks
 70 |         if len(self.nets) == 0:
 71 |             return
 72 |         for epoch in range(self.epochs):
 73 |             epoch += 1
 74 |             nets = self.nets
 75 |             acc  = self.acc
 76 |             #clear grads
 77 |             self.optimiser.zero_grad()
 78 |             #forward to get predicted values
 79 |             outputs = self.model.forward( nets )
 80 |             loss = nn.MSELoss()(outputs, acc)
 81 |             loss.backward()# back props
 82 |             nn.utils.clip_grad_norm_(self.model.parameters(), 5)
 83 |             self.optimiser.step()# update the parameters
 84 | #            if epoch % 1000 == 0:
 85 | #                print('@' + self.__class__.__name__ + ' epoch {}, loss {}'.format(epoch, loss.data))
 86 | 
 87 |     def predict(self, remaining):
 88 |         assert type(remaining) == type({})
 89 |         remaining_archs    = []
 90 |         for k, v in remaining.items():
 91 |             net = json.loads(k)
 92 |             remaining_archs.append( net )
 93 |         remaining_archs = torch.from_numpy(np.asarray(remaining_archs, dtype=np.float32).reshape(-1, self.input_dim))
 94 |         if torch.cuda.is_available():
 95 |             remaining_archs = remaining_archs.cuda()
 96 |         outputs = self.model.forward(remaining_archs)
 97 |         if torch.cuda.is_available():
 98 |             remaining_archs = remaining_archs.cpu()
 99 |             outputs         = outputs.cpu()
100 |         result  = {}
101 |         counter = 0
102 |         for k in range(0, len(remaining_archs) ):
103 |             counter += 1
104 |             arch = remaining_archs[k].detach().numpy()
105 |             arch_str = json.dumps( arch.tolist() )
106 |             result[ arch_str ] = outputs[k].detach().numpy().tolist()[0]
107 |         assert len(result) == len(remaining)
108 |         return result
109 | 
110 |     def split_predictions(self, remaining):
111 |         assert type(remaining) == type({})
112 |         samples_badness  = {}
113 |         samples_goodies  = {}
114 |         if len(remaining) == 0:
115 |             return samples_goodies, samples_badness
116 |         predictions = self.predict(remaining)
117 |         avg_acc          = self.predict_mean()
118 |         self.boundary    = avg_acc
119 |         for k, v in predictions.items():
120 |             if v < avg_acc:
121 |                 samples_badness[k] = v
122 |             else:
123 |                 samples_goodies[k] = v
124 |         assert len(samples_badness) + len(samples_goodies) == len(remaining)
125 |         return  samples_goodies, samples_badness
126 | 
127 | 
128 |     def predict_mean(self):
129 |         if len(self.nets) == 0:
130 |             return 0
131 |         # can we use the actual acc?
132 |         outputs    = self.model.forward(self.nets)
133 |         pred_np    = None
134 |         if torch.cuda.is_available():
135 |             pred_np = outputs.detach().cpu().numpy()
136 |         else:
137 |             pred_np = outputs.detach().numpy()
138 |         return np.mean(pred_np)
139 |     
140 |     def split_data(self):
141 |         samples_badness  = {}
142 |         samples_goodies  = {}
143 |         if len(self.nets) == 0:
144 |             return samples_goodies, samples_badness
145 |         self.train()
146 |         avg_acc          = self.predict_mean()
147 |         self.boundary    = avg_acc
148 |         for k, v in self.samples.items():
149 |             if v < avg_acc:
150 |                 samples_badness[k]  = v
151 |             else:
152 |                 samples_goodies[k] = v
153 |         assert len(samples_badness) + len(samples_goodies) == len( self.samples )
154 |         return  samples_goodies, samples_badness
155 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/server/Node.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | from Classifier import Classifier
  7 | import json
  8 | import numpy as np
  9 | import math
 10 | 
 11 | class Node:
 12 |     obj_counter   = 0
 13 |     # If a leaf holds >= SPLIT_THRESH samples, we split it into two new nodes.
 14 |     
 15 |     def __init__(self, parent = None,  is_good_kid = False, arch_code_len = 0, is_root = False):
 16 |         # Note: every node is initialized as a leaf;
 17 |         # only internal nodes use their classifier to make decisions
 18 |         if not is_root:
 19 |             assert type( parent ) == type( self )
 20 |         self.is_root       = is_root
 21 |         self.ARCH_CODE_LEN = arch_code_len
 22 |         self.x_bar         = float("inf")
 23 |         self.n             = 0
 24 |         self.classifier    = Classifier({}, self.ARCH_CODE_LEN)
 25 |         self.parent        = parent
 26 |         self.is_good_kid   = is_good_kid
 27 |         self.uct           = 0
 28 |         
 29 |         # insert the current node into the parent's kids
 30 |         if parent is not None:
 31 |             self.parent.kids.append(self)
 32 |             if self.parent.is_leaf  == True:
 33 |                 self.parent.is_leaf = False
 34 |             assert len(self.parent.kids) <= 2
 35 |         self.kids          = []
 36 |         self.bag           = { }
 37 |         self.good_kid_data = {}
 38 |         self.bad_kid_data  = {}
 39 | 
 40 |         self.is_leaf       = True
 41 |         self.id            = Node.obj_counter
 42 |         
 43 |         #data for good and bad kids, respectively
 44 |         Node.obj_counter += 1
 45 |     
 46 |     def visit(self):
 47 |         self.n += 1
 48 |     
 49 |     def collect_sample(self, arch, acc):
 50 |         self.bag[json.dumps(arch) ] = acc
 51 |         self.n                      = len( self.bag )
 52 |     
 53 |     def print_bag(self):
 54 |         print("BAG"+"#"*10)
 55 |         for k, v in self.bag.items():
 56 |             print("arch:", k, "acc:", v)
 57 |         print("BAG"+"#"*10)
 58 |         print('\n')
 59 | 
 60 |     
 61 |     def put_in_bag(self, net, acc):
 62 |         assert type(net) == type([])
 63 |         assert type(acc) == type(float(0.1))
 64 |         net_k = json.dumps(net)
 65 |         self.bag[net_k] = acc
 66 |     
 67 |     def get_name(self):
 68 |         # state is a list of jsons
 69 |         return "node" + str(self.id)
 70 |     
 71 |     def pad_str_to_8chars(self, ins):
 72 |         if len(ins) <= 14:
 73 |             ins += ' '*(14 - len(ins) )
 74 |             return ins
 75 |         else:
 76 |             return ins
 77 |     
 78 |     def __str__(self):
 79 |         name   = self.get_name()
 80 |         name   = self.pad_str_to_8chars(name)
 81 |         name  += ( self.pad_str_to_8chars( 'lf:' + str(self.is_leaf)) )
 82 |         
 83 |         val    = 0
 84 |         name  += ( self.pad_str_to_8chars( ' val:{0:.4f}   '.format(round(self.get_xbar(), 4) ) ) )
 85 |         name  += ( self.pad_str_to_8chars( ' uct:{0:.4f}   '.format(round(self.get_uct(), 4) ) ) )
 86 | 
 87 |         name  += self.pad_str_to_8chars( 'n:'+str(self.n) )
 88 |         name  += self.pad_str_to_8chars( 'sp:'+ str(len(self.bag)) )
 89 |         name  += ( self.pad_str_to_8chars( 'g_k:' + str( len(self.good_kid_data) ) ) )
 90 |         name  += ( self.pad_str_to_8chars( 'b_k:' + str( len(self.bad_kid_data ) ) ) )
 91 | 
 92 |         parent = '----'
 93 |         if self.parent is not None:
 94 |             parent = self.parent.get_name()
 95 |         parent = self.pad_str_to_8chars(parent)
 96 |         
 97 |         name += (' parent:' + parent)
 98 |         
 99 |         kids = ''
100 |         kid  = ''
101 |         for k in self.kids:
102 |             kid   = self.pad_str_to_8chars( k.get_name() )
103 |             kids += kid
104 |         name  += (' kids:' + kids)
105 |         
106 |         return name
107 |     
108 | 
109 |     def get_uct(self):
110 |         if self.is_root and self.parent == None:
111 |             return float('inf')
112 |         if self.n == 0:
113 |             return float('inf')
114 |         Cp = 0.5
115 |         return self.x_bar + 2*Cp*math.sqrt( 2* math.log(self.parent.n) / self.n )
116 |     
117 |     def get_xbar(self):
118 |         return self.x_bar
119 | 
120 |     def get_n(self):
121 |         return self.n
122 |     
123 |     def get_parent_str(self):
124 |         return self.parent.get_name()
125 | 
126 |     def train(self):
127 |         if self.parent == None and self.is_root == True:
128 |         # training starts from the bag
129 |             assert len(self.bag) > 0
130 |             self.classifier.update_samples(self.bag )
131 |             self.good_kid_data, self.bad_kid_data = self.classifier.split_data()
132 |         elif self.is_leaf:
133 |             if self.is_good_kid:
134 |                 self.bag = self.parent.good_kid_data
135 |             else:
136 |                 self.bag = self.parent.bad_kid_data
137 |         else:
138 |             if self.is_good_kid:
139 |                 self.bag = self.parent.good_kid_data
140 |                 self.classifier.update_samples(self.parent.good_kid_data )
141 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_data()
142 |             else:
143 |                 self.bag = self.parent.bad_kid_data
144 |                 self.classifier.update_samples(self.parent.bad_kid_data )
145 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_data()
146 |         if len(self.bag) == 0:
147 |            self.x_bar = float('inf')
148 |            self.n     = 0
149 |         else:
150 |            self.x_bar = np.mean( np.array(list(self.bag.values())) )
151 |            self.n     = len( self.bag.values() )
152 | 
153 |     def predict(self):
154 |         if self.parent == None and self.is_root == True and self.is_leaf == False:
155 |             self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.bag)
156 |         elif self.is_leaf:
157 |             if self.is_good_kid:
158 |                 self.bag = self.parent.good_kid_data
159 |             else:
160 |                 self.bag = self.parent.bad_kid_data
161 |         else:
162 |             if self.is_good_kid:
163 |                 self.bag = self.parent.good_kid_data
164 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.good_kid_data)
165 |             else:
166 |                 self.bag = self.parent.bad_kid_data
167 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.bad_kid_data)
168 | 
169 |     def sample_arch(self):
170 |         if len(self.bag) == 0:
171 |             return None
172 |         net_str = np.random.choice( list(self.bag.keys() ) )
173 |         del self.bag[net_str]
174 |         return json.loads(net_str )
175 |     
176 |     def clear_data(self):
177 |         self.bag.clear()
178 |         self.bad_kid_data.clear()
179 |         self.good_kid_data.clear()
180 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/server/net_training.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | import numpy as np
 7 | import time
 8 | import sys
 9 | import copy
10 | from   datetime import datetime
11 | import collections
12 | import json
13 | import operator
14 | import os
15 | 
16 | class Net_Trainer:
17 |     best_trace      = collections.OrderedDict()
18 |     dataset         = collections.OrderedDict()
19 |     training_trace  = collections.OrderedDict()
20 |     best_arch       = None
21 |     best_acc        = 0
22 |     best_accuracy   = 0
23 |     counter         = 0
24 |     
25 |     def __init__(self):
26 |         raw_data = []
27 |         with open('features.json', 'r') as infile:
28 |             raw_data = json.loads( infile.read() )
29 |         for i in raw_data:
30 |             arch = i['feature']
31 |             acc  = i['acc']
32 |             self.dataset[json.dumps(arch) ] = acc
33 |             if acc > self.best_acc:
34 |                 self.best_acc  = acc
35 |                 self.best_arch = json.dumps( arch )
36 |         print("searching target:", self.best_arch," acc:", self.best_acc)
37 |         
38 |         print("trainer loaded:", len(self.dataset)," entries" )
39 |     
40 |     def print_best_traces(self):
41 |         print("%"*20)
42 |         print("=====> best accuracy so far:", self.best_accuracy)
43 |         sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1))
44 |         for item in sorted_best_traces:
45 |             print(item[0],"==>", item[1])
46 |         for item in sorted_best_traces:
47 |             print(item[1])
48 |         print("%"*20)
49 |        
50 |     def train_net(self, network):
51 |         # input is a code of an architecture
52 |         assert type( network ) == type( [] )
53 |         network_str = json.dumps( network )
54 |         assert network_str in self.dataset
55 |         is_found = False
56 |         acc = self.dataset[network_str]
57 |         # we ensure not to repetitatively sample same architectures
58 |         assert network_str not in self.training_trace.keys()
59 |         self.training_trace[network_str] = acc
60 |         self.counter += 1
61 |         if acc > self.best_accuracy:
62 |             print("@@@update best state:", network)
63 |             print("@@@update best acc:", acc)
64 |             print("target str:", self.best_arch)
65 |             self.best_accuracy = acc
66 |             item = [acc, self.counter]
67 |             self.best_trace[network_str] = item
68 |             if network_str == self.best_arch:
69 |                 sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1))
70 |                 final_results = []
71 |                 for item in sorted_best_traces:
72 |                     final_results.append( item[1] )
73 |                 final_results_str = json.dumps(final_results)
74 |                 with open("result.txt", "a") as f:
75 |                     f.write(final_results_str + '\n')
76 |                 print("$$$$$$$$$$$$$$$$$$$CONGRATUGLATIONS$$$$$$$$$$$$$$$$$$$")
77 |                 os._exit(1)
78 | 
79 |         return acc
80 | 


--------------------------------------------------------------------------------
/LaNAS/Distributed_LaNAS/server/search_space.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/LaMCTS/489bd60886f23b0b76b10aa8602ea6722f334ad6/LaNAS/Distributed_LaNAS/server/search_space.zip


--------------------------------------------------------------------------------
/LaNAS/LaNAS_NASBench101/Classifier.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | from torch.autograd import Variable
  9 | import json
 10 | from torch import optim
 11 | import numpy as np
 12 | 
 13 | # this is the backbone model
 14 | # to split networks at a MCTS state
 15 | class LinearModel(nn.Module):
 16 |     
 17 |     def __init__(self, input_dim, output_dim):
 18 |         super(LinearModel, self).__init__()
 19 |         self.fc1 = nn.Linear(input_dim, output_dim)
 20 |         torch.nn.init.xavier_uniform_( self.fc1.weight )
 21 |     
 22 |     def forward(self, x):
 23 |         y = self.fc1(x)
 24 |         #print("=====>X_shape:", x.shape)
 25 |         return y
 26 | 
 27 | # the input will be samples!
 28 | class Classifier():
 29 |     def __init__(self, samples, input_dim):
 30 |         self.training_counter = 0
 31 |         assert input_dim >= 1
 32 |         assert type(samples) ==  type({})
 33 |         self.input_dim  = input_dim
 34 |         self.samples    = samples
 35 |         self.model      = LinearModel(input_dim, 1)
 36 | 
 37 |         if torch.cuda.is_available():
 38 |             self.model.cuda()
 39 |         self.l_rate     = 0.00001
 40 |         self.optimiser  = optim.Adam(self.model.parameters(), lr=self.l_rate, betas=(0.9, 0.999), eps=1e-08)
 41 |         self.epochs     = 1 #TODO:revise to 100
 42 |         self.boundary   = -1
 43 |         self.nets       = []
 44 |         
 45 |     def get_params(self):
 46 |         return self.model.fc1.weight.detach().cpu().numpy(), self.model.fc1.bias.detach().cpu().numpy()
 47 | 
 48 |     def reinit(self):
 49 |         # LinearModel has a single layer (fc1), stored on self.model
 50 |         torch.nn.init.xavier_uniform_( self.model.fc1.weight )
 51 |     
 52 |     def update_samples(self, latest_samples):
 53 |         assert type(latest_samples) == type(self.samples)
 54 |         sampled_nets    = []
 55 |         nets_acc        = []
 56 |         for k, v in latest_samples.items():
 57 |             net = json.loads(k)
 58 |             sampled_nets.append( net )
 59 |             nets_acc.append( v )
 60 |         self.nets = torch.from_numpy(np.asarray(sampled_nets, dtype=np.float32).reshape(-1, self.input_dim))
 61 |         self.acc  = torch.from_numpy(np.asarray(nets_acc,     dtype=np.float32).reshape(-1, 1))
 62 |         self.samples = latest_samples
 63 |         if torch.cuda.is_available():
 64 |             self.nets = self.nets.cuda()
 65 |             self.acc  = self.acc.cuda()
 66 | 
 67 |     def train(self):
 68 |         if self.training_counter == 0:
 69 |             self.epochs = 20000
 70 |         else:
 71 |             self.epochs = 3000
 72 |         self.training_counter += 1
 73 |         # in a rare case, one branch has no networks
 74 |         if len(self.nets) == 0:
 75 |             return
 76 |         for epoch in range(self.epochs):
 77 |             epoch += 1
 78 |             nets = self.nets
 79 |             acc  = self.acc
 80 |             #clear grads
 81 |             self.optimiser.zero_grad()
 82 |             #forward to get predicted values
 83 |             outputs = self.model.forward( nets )
 84 |             loss = nn.MSELoss()(outputs, acc)
 85 |             loss.backward()# back props
 86 |             nn.utils.clip_grad_norm_(self.model.parameters(), 5)
 87 |             self.optimiser.step()# update the parameters
 88 | #            if epoch % 1000 == 0:
 89 | #                print('@' + self.__class__.__name__ + ' epoch {}, loss {}'.format(epoch, loss.data))
 90 | 
 91 |     def predict(self, remaining):
 92 |         assert type(remaining) == type({})
 93 |         remaining_archs    = []
 94 |         for k, v in remaining.items():
 95 |             net = json.loads(k)
 96 |             remaining_archs.append( net )
 97 |         remaining_archs = torch.from_numpy(np.asarray(remaining_archs, dtype=np.float32).reshape(-1, self.input_dim))
 98 |         if torch.cuda.is_available():
 99 |             remaining_archs = remaining_archs.cuda()
100 |         outputs = self.model.forward(remaining_archs)
101 |         if torch.cuda.is_available():
102 |             remaining_archs = remaining_archs.cpu()
103 |             outputs         = outputs.cpu()
104 |         result  = {}
105 |         counter = 0
106 |         for k in range(0, len(remaining_archs) ):
107 |             counter += 1
108 |             arch = remaining_archs[k].detach().numpy()
109 |             arch_str = json.dumps( arch.tolist() )
110 |             result[ arch_str ] = outputs[k].detach().numpy().tolist()[0]
111 |         assert len(result) == len(remaining)
112 |         return result
113 | 
114 |     def split_predictions(self, remaining):
115 |         assert type(remaining) == type({})
116 |         samples_badness  = {}
117 |         samples_goodies  = {}
118 |         if len(remaining) == 0:
119 |             return samples_goodies, samples_badness
120 |         predictions = self.predict(remaining)
121 |         avg_acc          = self.predict_mean()
122 |         self.boundary    = avg_acc
123 |         for k, v in predictions.items():
124 |             if v < avg_acc:
125 |                 samples_badness[k] = v
126 |             else:
127 |                 samples_goodies[k] = v
128 |         assert len(samples_badness) + len(samples_goodies) == len(remaining)
129 |         return  samples_goodies, samples_badness
130 | 
131 | 
132 |     def predict_mean(self):
133 |         if len(self.nets) == 0:
134 |             return 0
135 |         # can we use the actual acc?
136 |         outputs    = self.model.forward(self.nets)
137 |         pred_np    = None
138 |         if torch.cuda.is_available():
139 |             pred_np = outputs.detach().cpu().numpy()
140 |         else:
141 |             pred_np = outputs.detach().numpy()
142 |         return np.mean(pred_np)
143 |     
144 |     def split_data(self):
145 |         samples_badness  = {}
146 |         samples_goodies  = {}
147 |         if len(self.nets) == 0:
148 |             return samples_goodies, samples_badness
149 |         self.train()
150 |         avg_acc          = self.predict_mean()
151 |         self.boundary    = avg_acc
152 |         for k, v in self.samples.items():
153 |             if v < avg_acc:
154 |                 samples_badness[k]  = v
155 |             else:
156 |                 samples_goodies[k] = v
157 |         assert len(samples_badness) + len(samples_goodies) == len( self.samples )
158 |         return  samples_goodies, samples_badness
159 | 


--------------------------------------------------------------------------------
/LaNAS/LaNAS_NASBench101/Node.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | from Classifier import Classifier
  7 | import json
  8 | import numpy as np
  9 | import math
 10 | 
 11 | class Node:
 12 |     obj_counter   = 0
 13 |     # If a leaf holds >= SPLIT_THRESH samples, we split it into two new nodes.
 14 |     
 15 |     def __init__(self, parent = None,  is_good_kid = False, arch_code_len = 0, is_root = False):
 16 |         # Note: every node is initialized as a leaf;
 17 |         # only internal nodes use their classifiers to make decisions.
 18 |         if not is_root:
 19 |             assert type( parent ) == type( self )
 20 |         self.is_root       = is_root
 21 |         self.ARCH_CODE_LEN = arch_code_len
 22 |         self.x_bar         = float("inf")
 23 |         self.n             = 0
 24 |         self.classifier    = Classifier({}, self.ARCH_CODE_LEN)
 25 |         self.parent        = parent
 26 |         self.is_good_kid   = is_good_kid
 27 |         self.uct           = 0
 28 |         self.best_arch     = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 0.0]
 29 |         
 30 |         # insert the current node into the parent's kids
 31 |         if parent is not None:
 32 |             self.parent.kids.append(self)
 33 |             if self.parent.is_leaf  == True:
 34 |                 self.parent.is_leaf = False
 35 |             assert len(self.parent.kids) <= 2
 36 |         self.kids          = []
 37 |         self.bag           = { }
 38 |         self.good_kid_data = {}
 39 |         self.bad_kid_data  = {}
 40 | 
 41 |         self.is_leaf       = True
 42 |         self.id            = Node.obj_counter
 43 |         
 44 |         #data for good and bad kids, respectively
 45 |         Node.obj_counter += 1
 46 |     
 47 |     def visit(self):
 48 |         self.n += 1
 49 |     
 50 |     def collect_sample(self, arch, acc):
 51 |         self.bag[json.dumps(arch) ] = acc
 52 |         self.n                      = len( self.bag )
 53 |     
 54 |     def print_bag(self):
 55 |         print("BAG"+"#"*10)
 56 |         for k, v in self.bag.items():
 57 |             print("arch:", k, "acc:", v)
 58 |         print("BAG"+"#"*10)
 59 |         print('\n')
 60 | 
 61 |     
 62 |     def put_in_bag(self, net, acc):
 63 |         assert type(net) == type([])
 64 |         assert type(acc) == type(float(0.1))
 65 |         net_k = json.dumps(net)
 66 |         self.bag[net_k] = acc
 67 |     
 68 |     def get_name(self):
 69 |         # state is a list of jsons
 70 |         return "node" + str(self.id)
 71 |     
 72 |     def pad_str_to_8chars(self, ins):
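 73 |         if len(ins) <= 14:  # pads to 14 characters despite the method name
 74 |             ins += ' '*(14 - len(ins) )
 75 |         return ins  # always return the string, even when it is already longer than 14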
 76 |     
 77 |     def __str__(self):
 78 |         name   = self.get_name()
 79 |         name   = self.pad_str_to_8chars(name)
 80 |         name  += ( self.pad_str_to_8chars( 'lf:' + str(self.is_leaf)) )
 81 |         
 82 |         val    = 0
 83 |         name  += ( self.pad_str_to_8chars( ' val:{0:.4f}   '.format(round(self.get_xbar(), 4) ) ) )
 84 |         name  += ( self.pad_str_to_8chars( ' uct:{0:.4f}   '.format(round(self.get_uct(), 4) ) ) )
 85 | 
 86 |         name  += self.pad_str_to_8chars( 'n:'+str(self.n) )
 87 |         name  += self.pad_str_to_8chars( 'sp:'+ str(len(self.bag)) )
 88 |         name  += ( self.pad_str_to_8chars( 'g_k:' + str( len(self.good_kid_data) ) ) )
 89 |         name  += ( self.pad_str_to_8chars( 'b_k:' + str( len(self.bad_kid_data ) ) ) )
 90 |         name  += ( self.pad_str_to_8chars( 'best:' + str( json.dumps(self.best_arch) in self.bag ) ) )
 91 | 
 92 | 
 93 |         parent = '----'
 94 |         if self.parent is not None:
 95 |             parent = self.parent.get_name()
 96 |         parent = self.pad_str_to_8chars(parent)
 97 |         
 98 |         name += (' parent:' + parent)
 99 |         
100 |         kids = ''
101 |         kid  = ''
102 |         for k in self.kids:
103 |             kid   = self.pad_str_to_8chars( k.get_name() )
104 |             kids += kid
105 |         name  += (' kids:' + kids)
106 |         
107 |         return name
108 |     
109 | 
110 |     def get_uct(self, Cp = 0.000002):
111 |         if self.is_root and self.parent is None:
112 |             return float('inf')
113 |         if self.n == 0:
114 |             return float('inf')
115 |         return self.x_bar + 2*Cp*math.sqrt( 2* math.log(self.parent.n) / self.n )
116 |     
117 |     def get_xbar(self):
118 |         return self.x_bar
119 | 
120 |     def get_n(self):
121 |         return self.n
122 |     
123 |     def get_parent_str(self):
124 |         return self.parent.get_name()
125 | 
126 |     def train(self):
127 |         if self.parent is None and self.is_root:
128 |             # training starts from the bag
129 |             assert len(self.bag) > 0
130 |             self.classifier.update_samples(self.bag )
131 |             self.good_kid_data, self.bad_kid_data = self.classifier.split_data()
132 |         elif self.is_leaf:
133 |             if self.is_good_kid:
134 |                 self.bag = self.parent.good_kid_data
135 |             else:
136 |                 self.bag = self.parent.bad_kid_data
137 |         else:
138 |             if self.is_good_kid:
139 |                 self.bag = self.parent.good_kid_data
140 |                 self.classifier.update_samples(self.parent.good_kid_data )
141 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_data()
142 |             else:
143 |                 self.bag = self.parent.bad_kid_data
144 |                 self.classifier.update_samples(self.parent.bad_kid_data )
145 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_data()
146 |         if len(self.bag) == 0:
147 |             self.x_bar = float('inf')
148 |             self.n     = 0
149 |         else:
150 |             self.x_bar = np.mean( np.array(list(self.bag.values())) )
151 |             self.n     = len( self.bag.values() )
152 | 
153 |     def predict(self):
154 |         if self.parent is None and self.is_root and not self.is_leaf:
155 |             self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.bag)
156 |         elif self.is_leaf:
157 |             if self.is_good_kid:
158 |                 self.bag = self.parent.good_kid_data
159 |             else:
160 |                 self.bag = self.parent.bad_kid_data
161 |         else:
162 |             if self.is_good_kid:
163 |                 self.bag = self.parent.good_kid_data
164 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.good_kid_data)
165 |             else:
166 |                 self.bag = self.parent.bad_kid_data
167 |                 self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.bad_kid_data)
168 | 
169 |     def sample_arch(self):
170 |         if len(self.bag) == 0:
171 |             return None
172 |         net_str = np.random.choice( list(self.bag.keys() ) )
173 |         del self.bag[net_str]
174 |         return json.loads(net_str )
175 |     
176 |     def clear_data(self):
177 |         self.bag.clear()
178 |         self.bad_kid_data.clear()
179 |         self.good_kid_data.clear()
180 | 
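
The score computed by `get_uct` above is the node's mean accuracy plus an exploration bonus that shrinks as the node is visited more often. A standalone sketch of the same formula follows; the free function `uct` and the example numbers are hypothetical, and only the expression itself is taken from `Node.get_uct`.

```python
import math


def uct(x_bar, n, parent_n, cp=0.000002):
    """UCT score as in Node.get_uct: x_bar + 2*Cp*sqrt(2*ln(parent_n)/n)."""
    if n == 0:
        # Unvisited nodes get infinite priority so they are tried first.
        return float("inf")
    return x_bar + 2 * cp * math.sqrt(2 * math.log(parent_n) / n)


# Two sibling nodes under a parent with 60 samples (made-up numbers).
good_kid = uct(x_bar=0.94, n=50, parent_n=60)
bad_kid = uct(x_bar=0.92, n=10, parent_n=60)
print(good_kid > bad_kid)  # with the tiny default Cp, the mean accuracy dominates

# A much larger Cp makes the less-visited node win instead.
print(uct(0.92, 10, 60, cp=1.0) > uct(0.94, 50, 60, cp=1.0))
```

With the default `Cp = 0.000002` the exploration term is on the order of 1e-6, so node selection is driven almost entirely by `x_bar`; raising `Cp` shifts the balance toward less-visited nodes.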


--------------------------------------------------------------------------------
/LaNAS/LaNAS_NASBench101/README.md:
--------------------------------------------------------------------------------
 1 | ## LaNAS on NASBench-101
 2 | 
 3 | This folder has everything you need to test LaNAS on NASBench-101. Before you start, please download a preprocessed NASBench-101 from <a href="https://github.com/linnanwang/AlphaX-NASBench101">AlphaX</a> (see section <b>Download the dataset</b>).
 4 | ```
 5 | # first place nasbench_dataset in LaNAS/LaNAS_NASBench101
 6 | python MCTS.py
 7 | ```
 8 | The program stops once it finds the global optimum; the search usually takes a few hours to a day. When it finishes, the search trace is appended as the last row of result.txt (the file name used in net_training.py). Here is an example of how to interpret a row.
 9 | 
10 | >[[0.9313568472862244, 1], <b>[0.9326255321502686, 47]</b>, [0.9332265059153239, 51], [0.9342948794364929, 72], [0.9343950351079305, 76], [0.93873530626297, 81], [0.9388020833333334, 224], [0.9388688604036967, 472], [0.9391693472862244, 639], [0.9407051205635071, 740], [0.9420072237650553, 831], [0.9423410892486572, 1545], [0.943175752957662, 3259]]
11 | 
12 | Each pair is [best accuracy so far, number of samples evaluated when it was reached]. The bolded entry means the 47th sample raised the best validation accuracy to 0.9326255321502686, and in this run LaNAS found the best network after 3259 samples. Each new experiment appends a new row to result.txt.
13 | 
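A result row like the example above can be read programmatically. The sketch below is a minimal illustration, assuming each row is a JSON list of `[accuracy, sample_count]` pairs as written by net_training.py; the helper `best_after` is hypothetical, and the row is a shortened copy of the example.

```python
import json

# Shortened copy of the example row: [best accuracy so far, sample count].
row = "[[0.9313568472862244, 1], [0.9326255321502686, 47], [0.943175752957662, 3259]]"


def best_after(trace, budget):
    """Best accuracy known once `budget` samples have been evaluated."""
    best = None
    for acc, n_samples in trace:
        if n_samples <= budget:
            best = acc
    return best


trace = json.loads(row)
samples_to_optimum = trace[-1][1]  # the last pair marks the global optimum

print("samples to optimum:", samples_to_optimum)
print("best after 46 samples:", best_after(trace, 46))
print("best after 47 samples:", best_after(trace, 47))
```
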
14 | We also provide the results of our past runs in <b>our_past_results.txt</b>, which you can use for comparisons; feel free to reproduce the results with this release.
15 | 
16 | ## About NASBench-101
17 | Please check <a href="https://github.com/linnanwang/AlphaX-NASBench101">AlphaX</a> to see our encoding of NASBench.
18 | 
19 | ## About Predictor based Search Methods
20 | 
21 | <b>The simplest way to see why a predictor alone does not work is to try it on the 10-dimensional continuous Ackley function (in functions.py in LA-MCTS). In practice, a search space can contain 10^30 architectures; you CANNOT predict every one, and whatever predictor you use will fail.</b>
22 | 
23 | <b>Why do predictors work well on NASBench?</b> The main issue with predictor-based methods is that they need to predict every architecture in the search space to perform well, and they lack an acquisition function (as in Bayesian Optimization) to trade off exploration against exploitation. NASBench has only 4.2*10^5 networks, all of which can be predicted in a second. We show that a simple MLP can perform well (< 1000 samples) if it predicts on all the architectures in NASBench. Besides, the following figure shows the distribution of network accuracy on NASBench-101, with the y-axis in log scale. It is therefore not surprising that even random search finds a reasonable result, since most networks are pretty good.
24 | 
25 | <p align="center">
26 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/nasbench_distribution.png?raw=true' width="400">
27 | </p>
28 | 
29 | In fact, the purpose of neural predictors, e.g. Graph Neural Network-based predictors, is very similar to that of the surrogate model (e.g. a Gaussian Process) used in Bayesian Optimization. The original NASBench-101 paper chose a set of very good baselines for comparison.
30 | 
31 | 
32 | <b>Why do predictors work in the NASNet or EfficientNet search space?</b> These search spaces are constructed from very strong prior experience, and the final network accuracy can be boosted with the bag of tricks listed <a href="https://github.com/facebookresearch/LaMCTS/tree/master/LaNAS/LaNet">here</a>.
33 | 
34 | In this implementation, we use an MLP to predict samples in order to assign an architecture to a partition. This is an engineering simplification and can be replaced by a hit-and-run sampler, i.e. sampling from a convex polytope. We do replace it with a sampling method in one-shot LaNAS (Fig.6(c) in LaNAS); see also LA-MCTS. Thank you.
35 | 
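
The README mentions a hit-and-run sampler as an alternative to predicting every sample. The sketch below is a generic, textbook-style hit-and-run step over a bounded convex polytope `{x : A x <= b}`; it is not the sampler used in one-shot LaNAS or LA-MCTS, and the function name, the example polytope, and the starting point are all assumptions made for illustration.

```python
import math
import random


def hit_and_run(A, b, x0, n_steps, rng):
    """Random walk inside the bounded polytope {x : A x <= b}, starting at interior x0."""
    x = list(x0)
    dim = len(x)
    for _ in range(n_steps):
        # Draw a direction uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Find the feasible interval [t_lo, t_hi] along the line x + t * d.
        t_lo, t_hi = -math.inf, math.inf
        for row, bound in zip(A, b):
            slope = sum(a * v for a, v in zip(row, d))
            slack = bound - sum(a * v for a, v in zip(row, x))
            if slope > 1e-12:
                t_hi = min(t_hi, slack / slope)
            elif slope < -1e-12:
                t_lo = max(t_lo, slack / slope)
        # Move to a uniformly random point on that chord.
        t = rng.uniform(t_lo, t_hi)
        x = [xi + t * di for xi, di in zip(x, d)]
    return x


# Hypothetical region: the unit square cut by the half-space x + y <= 1.5.
A = [[1, 0], [-1, 0], [0, 1], [0, -1], [1, 1]]
b = [1, 0, 1, 0, 1.5]
point = hit_and_run(A, b, x0=[0.5, 0.5], n_steps=200, rng=random.Random(0))
print(point)
```

Each step only needs the linear constraints that define the region, so a sample is drawn from inside a partition directly instead of being proposed elsewhere and then filtered by a predictor.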


--------------------------------------------------------------------------------
/LaNAS/LaNAS_NASBench101/extract_end_time.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | 
 7 | import json
 8 | import csv
 9 | 
10 | 
11 | 
12 | with open("our_past_results.txt", "r") as f:
13 |     l = f.readlines()
14 | 
15 | 
16 | list_net = []
17 | for i in range(len(l)):
18 |     l[i] = l[i].rstrip('\n')
19 |     list_net.append(json.loads(l[i]))
20 |     #print(json.loads(l[i]))
 21 |     print(str(list_net[-1][-1][1]), end=", ")
22 | print("")
23 | 
24 | 


--------------------------------------------------------------------------------
/LaNAS/LaNAS_NASBench101/net_training.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | import numpy as np
 7 | import time
 8 | import sys
 9 | import copy
10 | from   datetime import datetime
11 | import collections
12 | import json
13 | import operator
14 | import os
15 | 
16 | class Net_Trainer:
17 |     
18 |     def __init__(self):
19 |         self.best_trace      = collections.OrderedDict()
20 |         self.dataset         = collections.OrderedDict()
21 |         self.training_trace  = collections.OrderedDict()
22 |         self.best_arch       = None
23 |         self.best_acc        = 0
24 |         self.best_accuracy   = 0
25 |         self.counter         = 0
26 | 
27 |         raw_data = []
28 |         with open('nasbench_dataset', 'r') as infile:
29 |             raw_data = json.loads( infile.read() )
30 |         for i in raw_data:
31 |             arch = i['feature']
32 |             acc  = i['acc']
33 |             self.dataset[json.dumps(arch) ] = acc
34 |             if acc > self.best_acc:
35 |                 self.best_acc  = acc
36 |                 self.best_arch = json.dumps( arch )
37 |         print("searching target:", self.best_arch," acc:", self.best_acc)
38 |         
39 |         print("trainer loaded:", len(self.dataset)," entries" )
40 |     
41 |     def print_best_traces(self):
42 |         print("%"*20)
43 |         print("=====> best accuracy so far:", self.best_accuracy)
44 |         sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1))
45 |         for item in sorted_best_traces:
46 |             print(item[0],"==>", item[1])
47 |         for item in sorted_best_traces:
48 |             print(item[1])
49 |         print("%"*20)
50 |        
51 |     def train_net(self, network):
52 |         # input is a code of an architecture
53 |         assert type( network ) == type( [] )
54 |         network_str = json.dumps( network )
55 |         assert network_str in self.dataset
56 |         is_found = False
57 |         acc = self.dataset[network_str]
 58 |         # we ensure that the same architecture is never sampled twice
59 |         assert network_str not in self.training_trace.keys()
60 |         self.training_trace[network_str] = acc
61 |         self.counter += 1
62 |         if acc > self.best_accuracy:
63 |             print("@@@update best state:", network)
64 |             print("@@@update best acc:", acc)
65 |             print("target str:", self.best_arch)
66 |             self.best_accuracy = acc
67 |             item = [acc, self.counter]
68 |             self.best_trace[network_str] = item
69 |             if network_str == self.best_arch:
70 |                 sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1))
71 |                 final_results = []
72 |                 for item in sorted_best_traces:
73 |                     final_results.append( item[1] )
74 |                 final_results_str = json.dumps(final_results)
75 |                 with open("result.txt", "a") as f:
76 |                     f.write(final_results_str + '\n')
 77 |                 print("$$$$$$$$$$$$$$$$$$$CONGRATULATIONS$$$$$$$$$$$$$$$$$$$")
 78 |                 os._exit(0)  # the search succeeded, so exit with status 0
79 | 
80 |         return acc
81 | 
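
The bookkeeping in `train_net` above records a `[accuracy, counter]` pair each time a sampled network beats the best accuracy seen so far. A minimal standalone sketch of that logic follows; the function `record_best_trace` and the accuracy sequence are made up for illustration.

```python
def record_best_trace(accuracies):
    """Return [[acc, counter], ...] for every sample that improves the best accuracy."""
    best = 0.0
    trace = []
    for counter, acc in enumerate(accuracies, start=1):
        # Same strict comparison as train_net: ties do not create a new entry.
        if acc > best:
            best = acc
            trace.append([acc, counter])
    return trace


best_trace = record_best_trace([0.90, 0.89, 0.93, 0.93, 0.94])
print(best_trace)
```

Because the accuracies in the trace are strictly increasing, ordering the entries by accuracy, as the `sorted` call on `item[1]` does, also puts them in order of discovery.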


--------------------------------------------------------------------------------
/LaNAS/LaNet/CIFAR10/README.md:
--------------------------------------------------------------------------------
 1 | ## Testing LaNet
 2 | 
 3 | 1. Download the pre-trained checkpoint from <a href="https://drive.google.com/file/d/1bZsEoG-sroVyYR4F_2ozGLA5W50CT84P/view?usp=sharing">here</a>, place it in this folder, and unzip it.
 4 | 
 5 | 2. Run the following command to test.
 6 | ```
 7 | python test.py  --checkpoint  ./lanas_128_99.03 --layers 24 --init_ch 128 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]'
 8 | ```
 9 | 
10 | ```[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]``` is the best network found during the search. The snapshot below shows the top performing architectures (bottom to top) found during the distributed search. You can get the whole trace from <a href="../../Distributed_LaNAS">here</a>.
11 | 
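The `--arch` value is a flat list of integers passed as a string; test.py turns it into a Python list and derives the node count as `len(net) / 4`. The sketch below shows that parsing step under one swapped-in assumption: it uses `ast.literal_eval` in place of the `eval` call in test.py, which accepts only Python literals.

```python
import ast

# The architecture string from the command above.
arch_arg = "[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]"

# literal_eval parses the list without executing arbitrary code.
net = ast.literal_eval(arch_arg)

# test.py computes int(len(net) / 4); integer division gives the same value here.
node_num = len(net) // 4

print(len(net), node_num)
```
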
12 | <p align="center">
13 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/distributed_search_results.png?raw=true' width="600">
14 | </p>
15 | 
16 | 
17 | 
18 | ## Training LaNet
19 | 1. Install cutmix
20 | 
21 | ```pip install git+https://github.com/ildoonet/cutmix```
22 | 
23 | 2. Run training with the following command.
24 | 
25 | ```
26 | mkdir checkpoints
27 | python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 1500
28 | ```
29 | 
30 | - **Training on ImageNet**
31 | 
32 | Please use the training pipeline from <a href="https://github.com/rwightman/pytorch-image-models">Pytorch-Image-Models</a>. The procedure is as follows:
33 | 1. Get the network from train.py, line 121.
34 | 2. Go to Pytorch-Image-Models.
35 | 3. Find pytorch-image-models/blob/master/timm/models/factory.py and replace line 57 as follows:
36 | ``` 
37 | # model = create_fn(**model_args, **kwargs) 
38 | model = our_network  # placeholder for the network obtained from train.py
39 | ```
40 | <b> Our ImageNet pipeline will be released soon, stay tuned. </b>
41 | 
42 | 
43 | 


--------------------------------------------------------------------------------
/LaNAS/LaNet/CIFAR10/model.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | from operations import *
  7 | 
  8 | 
  9 | class Cell(nn.Module):
 10 | 
 11 |     def __init__(self, genotype, C_prev_prev, C_prev, C, reduction, reduction_prev):
 12 | 
 13 |         super(Cell, self).__init__()
 14 | 
 15 |         if reduction_prev:
 16 |             self.preprocess0 = FactorizedReduce(C_prev_prev, C)
 17 |         else:
 18 |             self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0)
 19 |         self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0)
 20 | 
 21 |         if reduction:
 22 |             op_names, indices = zip(*genotype.reduce)
 23 |             concat = genotype.reduce_concat
 24 |         else:
 25 |             op_names, indices = zip(*genotype.normal)
 26 |             concat = genotype.normal_concat
 27 |         self._compile(C, op_names, indices, concat, reduction)
 28 | 
 29 |     def _compile(self, C, op_names, indices, concat, reduction):
 30 | 
 31 |         assert len(op_names) == len(indices)
 32 | 
 33 |         self._steps = len(op_names) // 2
 34 |         self._concat = concat
 35 |         self.multiplier = len(concat)
 36 | 
 37 |         self._ops = nn.ModuleList()
 38 |         for name, index in zip(op_names, indices):
 39 |             stride = 2 if reduction and index < 2 else 1
 40 |             op = OPS[name](C, stride, True)
 41 |             self._ops += [op]
 42 |         self._indices = indices
 43 | 
 44 |     def forward(self, s0, s1, drop_prob):
 45 | 
 46 |         s0 = self.preprocess0(s0)
 47 |         s1 = self.preprocess1(s1)
 48 | 
 49 |         states = [s0, s1]
 50 |         for i in range(self._steps):
 51 |             h1 = states[self._indices[2 * i]]
 52 |             h2 = states[self._indices[2 * i + 1]]
 53 |             op1 = self._ops[2 * i]
 54 |             op2 = self._ops[2 * i + 1]
 55 |             h1 = op1(h1)
 56 |             h2 = op2(h2)
 57 | 
 58 |             if self.training and drop_prob > 0.:
 59 |                 if not isinstance(op1, Identity):
 60 |                     h1 = drop_path(h1, drop_prob)
 61 |                 if not isinstance(op2, Identity):
 62 |                     h2 = drop_path(h2, drop_prob)
 63 |             s = h1 + h2
 64 |             states += [s]
 65 |         return torch.cat([states[i] for i in self._concat], dim=1)
 66 | 
 67 | def drop_path(x, drop_prob):
 68 |     if drop_prob > 0.:
 69 |         keep_prob = 1. - drop_prob
 70 | 
 71 |         mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
 72 |         x.div_(keep_prob)
 73 |         try:
 74 |             x.mul_(mask)
 75 |         except RuntimeError:  # dtype mismatch when x is half precision
 76 |             mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
 77 |             x.mul_(mask)
 78 |     return x
 79 | 
 80 | 
 81 | 
 82 | 
 83 | class AuxiliaryHeadCIFAR(nn.Module):
 84 | 
 85 |     def __init__(self, C, num_classes):
 86 |         """assuming input size 8x8"""
 87 |         super(AuxiliaryHeadCIFAR, self).__init__()
 88 | 
 89 |         self.features = nn.Sequential(
 90 |             nn.ReLU(inplace=True),
 91 |             nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False),  # image size = 2 x 2
 92 |             nn.Conv2d(C, 128, 1, bias=False),
 93 |             nn.BatchNorm2d(128),
 94 |             nn.ReLU(inplace=True),
 95 |             nn.Conv2d(128, 768, 2, bias=False),
 96 |             nn.BatchNorm2d(768),
 97 |             nn.ReLU(inplace=True)
 98 |         )
 99 |         self.classifier = nn.Linear(768, num_classes)
100 | 
101 |     def forward(self, x):
102 |         x = self.features(x)
103 |         x = self.classifier(x.view(x.size(0), -1))
104 |         return x
105 | 
106 | 
107 | class NetworkCIFAR(nn.Module):
108 | 
109 |     def __init__(self, C, num_classes, layers, auxiliary, genotype):
110 |         super(NetworkCIFAR, self).__init__()
111 | 
112 |         self._layers = layers
113 |         self._auxiliary = auxiliary
114 | 
115 |         stem_multiplier = 3
116 |         C_curr = stem_multiplier * C
117 |         self.stem = nn.Sequential(
118 |             nn.Conv2d(3, C_curr, 3, padding=1, bias=False),
119 |             nn.BatchNorm2d(C_curr)
120 |         )
121 | 
122 |         C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
123 |         self.cells = nn.ModuleList()
124 |         reduction_prev = False
125 |         for i in range(layers):
126 |             if i in [layers // 3, 2 * layers // 3]:
127 |                 C_curr *= 2
128 |                 reduction = True
129 |             else:
130 |                 reduction = False
131 |             cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, reduction_prev)
132 |             reduction_prev = reduction
133 |             self.cells += [cell]
134 |             C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr
135 |             if i == 2 * layers // 3:
136 |                 C_to_auxiliary = C_prev
137 | 
138 |         if auxiliary:
139 |             self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes)
140 |         self.global_pooling = nn.AdaptiveAvgPool2d(1)
141 |         self.classifier = nn.Linear(C_prev, num_classes)
142 | 
143 |     def forward(self, input):
144 |         logits_aux = None
145 |         s0 = s1 = self.stem(input)
146 |         for i, cell in enumerate(self.cells):
147 |             if self.training:
148 |                 s0, s1 = s1, cell(s0, s1, self.drop_path_prob)
149 |             else:
150 |                 s0, s1 = s1, cell(s0, s1, 0)
151 |             if i == 2 * self._layers // 3:
152 |                 if self._auxiliary and self.training:
153 |                     logits_aux = self.auxiliary_head(s1)
154 |         out = self.global_pooling(s1)
155 |         logits = self.classifier(out.view(out.size(0), -1))
156 |         return logits, logits_aux
157 | 
158 | 
159 | 
160 | 
161 | 
162 | 
163 | 
164 | 
165 | 
166 | 
167 | 
168 | 
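
The loop in `NetworkCIFAR.__init__` above places reduction cells at layer indices `layers // 3` and `2 * layers // 3` and doubles `C_curr` at each of them. The sketch below reproduces only that bookkeeping in plain Python; `cell_layout` is a hypothetical helper, and the values 24 and 128 are taken from the `--layers 24 --init_ch 128` example command.

```python
def cell_layout(layers, init_channels):
    """List (index, is_reduction, C_curr) per cell, mirroring NetworkCIFAR's loop."""
    reduction_at = {layers // 3, 2 * layers // 3}
    channels = init_channels
    layout = []
    for i in range(layers):
        reduction = i in reduction_at
        if reduction:
            # Reduction cells double the channel width.
            channels *= 2
        layout.append((i, reduction, channels))
    return layout


layout = cell_layout(layers=24, init_channels=128)
reduction_indices = [i for i, is_reduction, _ in layout if is_reduction]
print(reduction_indices)
print(layout[0][2], layout[8][2], layout[23][2])
```

The width each cell actually emits is `cell.multiplier * C_curr`, where the multiplier is the length of the genotype's concat list, so the numbers here are per-branch widths rather than output widths.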


--------------------------------------------------------------------------------
/LaNAS/LaNet/CIFAR10/operations.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | import torch.nn.functional as F
  9 | 
 10 | OPS = {
 11 |     'avg_pool_3x3': lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False),
 12 |     'max_pool_2x2' : lambda C, stride, affine: nn.MaxPool2d(2, stride=stride, padding=0),
 13 |     'max_pool_3x3': lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1),
 14 |     'max_pool_5x5': lambda C, stride, affine: nn.MaxPool2d(5, stride=stride, padding=2),
 15 |     'skip_connect': lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine),
 16 |     'sep_conv_3x3': lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine),
 17 |     'sep_conv_5x5': lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine),
 18 |     'sep_conv_7x7': lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine),
 19 |     'dil_conv_3x3' : lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine),
 20 |     'dil_conv_5x5' : lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine),
 21 |     'conv_1x1' : lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False),
 22 |     'conv_3x3' : lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False),
 23 |     'conv_5x5' : lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False),
 24 | }
 25 | 
 26 | 
 27 | class ReLUConvBN(nn.Module):
 28 |     def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 29 | 
 30 |         super(ReLUConvBN, self).__init__()
 31 | 
 32 |         self.op = nn.Sequential(
 33 |             nn.ReLU(inplace=False),
 34 |             Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False),
 35 |             nn.BatchNorm2d(C_out, affine=affine)
 36 |         )
 37 | 
 38 |     def forward(self, x):
 39 |         return self.op(x)
 40 | 
 41 | class Conv2d(nn.Conv2d):
 42 | 
 43 |     def __init__(self, in_channels, out_channels, kernel_size, stride=1,
 44 |                  padding=0, dilation=1, groups=1, bias=True):
 45 |         super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride,
 46 |                  padding, dilation, groups, bias)
 47 | 
 48 |     def forward(self, x):
 49 |         weight = self.weight
 50 |         weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2,
 51 |                                   keepdim=True).mean(dim=3, keepdim=True)
 52 |         weight = weight - weight_mean
 53 |         std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5
 54 |         weight = weight / std.expand_as(weight)
 55 |         return F.conv2d(x, weight, self.bias, self.stride,
 56 |                         self.padding, self.dilation, self.groups)
 57 | 
 58 | 
 59 | class SepConv(nn.Module):
 60 | 
 61 |     def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 62 |         super(SepConv, self).__init__()
 63 | 
 64 |         self.op = nn.Sequential(
 65 |             nn.ReLU(inplace=False),
 66 |             nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding,
 67 |                       groups=C_in, bias=False),
 68 |             Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False),
 69 |             nn.BatchNorm2d(C_in, affine=affine),
 70 |             nn.ReLU(inplace=False),
 71 |             nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding,
 72 |                       groups=C_in, bias=False),
 73 |             Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 74 |             nn.BatchNorm2d(C_out, affine=affine),
 75 |         )
 76 | 
 77 |     def forward(self, x):
 78 |         return self.op(x)
 79 | 
 80 | 
 81 | class Identity(nn.Module):
 82 | 
 83 |     def __init__(self):
 84 |         super(Identity, self).__init__()
 85 | 
 86 |     def forward(self, x):
 87 |         return x
 88 | 
 89 | 
 90 | class DilConv(nn.Module):
 91 |     def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True):
 92 |         super(DilConv, self).__init__()
 93 |         self.op = nn.Sequential(
 94 |             nn.ReLU(inplace=False),
 95 |             nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation,
 96 |                       groups=C_in, bias=False),
 97 |             Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 98 |             nn.BatchNorm2d(C_out, affine=affine),
 99 |         )
100 | 
101 |     def forward(self, x):
102 |         return self.op(x)
103 | 
104 | 
105 | class FactorizedReduce(nn.Module):
106 | 
107 |     def __init__(self, C_in, C_out, affine=True):
108 | 
109 |         super(FactorizedReduce, self).__init__()
110 | 
111 |         assert C_out % 2 == 0
112 | 
113 |         self.relu = nn.ReLU(inplace=False)
114 |         self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
115 |         self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
116 |         self.bn = nn.BatchNorm2d(C_out, affine=affine)
117 | 
118 |     def forward(self, x):
119 |         x = self.relu(x)
120 | 
121 |         out = torch.cat([self.conv_1(x), self.conv_2(x[:, :, 1:, 1:])], dim=1)
122 |         out = self.bn(out)
123 |         return out
124 | 
125 | 
126 | 
127 | 
128 | 
129 | 
130 | 
131 | 
132 | 
133 | 
134 | 
135 | 


--------------------------------------------------------------------------------
/LaNAS/LaNet/CIFAR10/test.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import sys
  7 | import utils
  8 | import argparse
  9 | import torch.nn as nn
 10 | import torch.utils
 11 | import torchvision.datasets as dset
 12 | import torch.backends.cudnn as cudnn
 13 | from collections import namedtuple
 14 | from model import NetworkCIFAR as Network
 15 | from utils import *
 16 | from torch.utils.data.dataset import Subset
 17 | import logging
 18 | from nasnet_set import *
 19 | 
 20 | 
 21 | 
 22 | parser = argparse.ArgumentParser("cifar10")
 23 | parser.add_argument('--data', type=str, default='../data', help='location of the data corpus')
 24 | parser.add_argument('--batch_size', type=int, default=96, help='batch size')
 25 | parser.add_argument('--lr', type=float, default=0.025, help='init learning rate')
 26 | parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
 27 | parser.add_argument('--wd', type=float, default=3e-4, help='weight decay')
 28 | parser.add_argument('--report_freq', type=float, default=50, help='report frequency')
 29 | parser.add_argument('--gpu', type=int, default=0, help='gpu device id')
 30 | parser.add_argument('--epochs', type=int, default=600, help='num of training epochs')
 31 | parser.add_argument('--layers', type=int, default=24, help='total number of layers')
 32 | parser.add_argument('--init_ch', type=int, default=36, help='num of init channels')
 33 | parser.add_argument('--model_path', type=str, default='saved_models', help='path to save the model')
 34 | parser.add_argument('--auxiliary_weight', type=float, default=0.4, help='weight for auxiliary loss')
 35 | parser.add_argument('--cutout', action='store_true', default=False, help='use cutout')
 36 | parser.add_argument('--cutout_length', type=int, default=16, help='cutout length')
 37 | parser.add_argument('--drop_path_prob', type=float, default=0.2, help='drop path probability')
 38 | parser.add_argument('--seed', type=int, default=0, help='random seed')
 39 | parser.add_argument('--arch', type=str, default='', help='which architecture to use')
 40 | parser.add_argument('--checkpoint', type=str, default='', help='load from checkpoint')
 41 | parser.add_argument('--save', type=str, default='EXP', help='experiment name')
 42 | 
 43 | 
 44 | 
 45 | 
 46 | 
 47 | args = parser.parse_args()
 48 | 
 49 | 
 50 | net = eval(args.arch)
 51 | print(net)
 52 | code = gen_code_from_list(net, node_num=int((len(net) / 4)))
 53 | genotype = translator([code, code], max_node=int((len(net) / 4)))
 54 | print(genotype)
 55 | 
 56 | 
 57 | 
 58 | 
 59 | 
 60 | log_format = '%(asctime)s %(message)s'
 61 | logging.basicConfig(stream=sys.stdout, level=logging.INFO,
 62 |                     format=log_format, datefmt='%m/%d %I:%M:%S %p')
 63 | fh = logging.FileHandler(os.path.join(args.checkpoint, 'log.txt'))
 64 | fh.setFormatter(logging.Formatter(log_format))
 65 | logging.getLogger().addHandler(fh)
 66 | 
 67 | 
 68 | 
 69 | def main():
 70 |     """Load the top-1 checkpoint and report its accuracy on the CIFAR-10 test split."""
 71 |     torch.cuda.set_device(args.gpu)
 72 |     cudnn.benchmark = True
 73 |     cudnn.enabled = True
 74 | 
 75 |     logging.info('gpu device = %d' % args.gpu)
 76 |     logging.info("args = %s", args)
 77 | 
 78 | 
 79 |     model = Network(args.init_ch, 10, args.layers, True, genotype).cuda()
 80 | 
 81 |     logging.info("param size = %fMB", utils.count_parameters_in_MB(model))
 82 | 
 83 |     checkpoint = torch.load(os.path.join(args.checkpoint, 'top1.pt'))
 84 |     model.load_state_dict(checkpoint['model_state_dict'])
 85 |     criterion = nn.CrossEntropyLoss().cuda()
 86 | 
 87 | 
 88 |     CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
 89 |     CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
 90 | 
 91 |     valid_transform = transforms.Compose([
 92 |         transforms.ToTensor(),
 93 |         transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
 94 |     ])
 95 | 
 96 | 
 97 | 
 98 | 
 99 |     valid_queue = torch.utils.data.DataLoader(
100 |             dset.CIFAR10(root=args.data, train=False, transform=valid_transform),
101 |             batch_size=args.batch_size, shuffle=True, num_workers=2, pin_memory=True)
102 | 
103 | 
104 |     valid_acc, valid_obj = infer(valid_queue, model, criterion)
105 |     logging.info('valid_acc: %f', valid_acc)
106 | 
107 | 
108 | 
109 | def infer(valid_queue, model, criterion):
110 |     """Return (top-1 accuracy, average loss) of `model` over `valid_queue`."""
111 |     objs = utils.AverageMeter()
112 |     top1 = utils.AverageMeter()
113 |     top5 = utils.AverageMeter()
114 |     model.eval()
115 | 
116 |     for step, (x, target) in enumerate(valid_queue):
117 |         x = x.cuda()
118 |         target = target.cuda(non_blocking=True)
119 | 
120 |         with torch.no_grad():
121 |             logits, _ = model(x)
122 |             loss = criterion(logits, target)
123 | 
124 |             prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
125 |             n = x.size(0)
126 |             objs.update(loss.item(), n)
127 |             top1.update(prec1.item(), n)
128 |             top5.update(prec5.item(), n)
129 | 
130 | 
131 |         if step % args.report_freq == 0:
132 |             logging.info('>>Validation: %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)
133 | 
134 | 
135 | 
136 |     return top1.avg, objs.avg
137 | 
138 | 
139 | 
140 | if __name__ == '__main__':
141 |     main()
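
`infer` relies on `utils.AverageMeter`, which lives in `utils.py` and is not shown in this file. The sketch below is a hypothetical stand-in (the class name `RunningAverage` and its fields are assumptions, not the repository's code) that illustrates why `update` receives the batch size `n`: the running value has to be a sample-weighted mean, because the last batch is usually smaller than the others.

```python
class RunningAverage:
    """Hypothetical stand-in for utils.AverageMeter: a sample-weighted mean."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, value, n=1):
        # `value` is a per-batch mean, `n` the number of samples in that batch.
        self.sum += value * n
        self.count += n
        self.avg = self.sum / self.count


meter = RunningAverage()
meter.update(90.0, n=96)   # a full batch at 90% accuracy
meter.update(100.0, n=16)  # a smaller final batch at 100% accuracy
print(round(meter.avg, 4))
```

With the default `--batch_size 96` and the 10,000-image CIFAR-10 test split, the final batch holds 16 images, so an unweighted mean over batches would slightly over-weight it.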


--------------------------------------------------------------------------------
/LaNAS/LaNet/README.md:
--------------------------------------------------------------------------------
 1 | The `CIFAR10` folder currently contains the test and training pipelines for the NASNet search space. 
 2 | 
 3 | The code for EfficientNet search space on ImageNet will be released later.
 4 | 
 5 | ## Performance of LaNet
 6 | **CIFAR-10**: 99.03% top-1 using NASNet search space, SoTA result without using ImageNet or transfer learning.
 7 | 
 8 | |     Model      | LaNet      | EfficientNet-B7       | GPIPE                 | PyramidNet      | XNAS           |
 9 | | -------------- | ---------- | ---------             | ----------            | --------------  | -------------- |
10 | | top-1          | 99.03      | 98.9                  | 99.0                  | 98.64           | 98.4           |
11 | | use ImageNet   | X          | <span>&#10003;</span> | <span>&#10003;</span> | X               | X              |
12 | 
13 | 
14 | **ImageNet**: 77.7% top-1@240 MFLOPS, 80.8% top-1@600 MFLOPS using EfficientNet search space, SoTA results on the EfficientNet space.
15 | 
16 | 
17 | |     Model      | LaNet      | OFA       | FBNetV2-F4 | MobileNet-V3    | FBNet-B        |
18 | | -------------- | ---------- | --------- | ---------- | --------------  | -------------- |
19 | | top-1          | 77.7       | 76.9      | 76.0       | 75.2            | 74.1           |
20 | | MFLOPS         | 240        | 230       | 238        | 219             | 295            |
21 | 
22 | |     Model      | LaNet      | OFA       | FBNetV3    | EfficientNet-B1|
23 | | -------------- | ---------- | --------- |  -----------| -------------- |
24 | | top-1          | 80.8       | 80.0      |  79.6       | 79.1           |
25 | | MFLOPS         | 600        | 595       |  544        | 700            |
26 | 
27 | 
28 | **Applying LaNet to detection** (compared to NAS-FCOS, CVPR 2020):
29 | |     Backbone      | Decoder      | FLOPS(G)       | AP    |
30 | | ----------------- | ------------ | -------------- |-------|
31 | |     LaNet         | FPN-FCOS     | 35.22          | 36.5  |
32 | |     MobileNetV2   | FPN-FCOS     | 105.4          | 31.2  |
33 | |     MobileNetV2   | NAS-FCOS     | 39.3           | 32.0  |
34 | 
35 | <b>We will release the ImageNet model, search framework, training pipeline, and their applications on detection, segmentation soon; stay tuned.</b>
36 | 
37 | 
38 | 
39 | ## Bag of Tricks
40 | Here are the training heuristics we used in our project:
41 | 
42 | - ***Data Augmentation*** 
43 | > We use Cutout, CutMix, and RandAugment. Pytorch-Image-Models has a very nice implementation, but keep an eye on SoTA data augmentation techniques.
44 | 
45 | - ***Distillation*** 
46 | 
47 | >a) Distillation is the main source of the performance improvement in recent NAS work on the EfficientNet search space.
48 | It seems that training a student network together with a teacher from scratch can further improve the current SoTA, 
49 | and that it works better than transferring weights from a fancy supernet. 
50 | 
51 | >b) Using a better teacher helps.
52 | 
53 | >c) A better teacher may require larger images than the student; use interpolation to resize each batch to the resolution that the student and the teacher each expect.
54 | 
55 | - ***Training Hyper-parameters***
56 | > Tune dropout, drop path, and your other training hyper-parameters. The learning rate should be neither too large nor too small; check how your loss progresses.  
57 | 
58 | - ***EMA***
59 | > Using an Exponential Moving Average (EMA) of the model weights helps performance for CNNs, detection models, Transformers, unsupervised models, or any other NLP or CV model, especially when training finishes in a small number of epochs.
60 | 
61 | - ***Testing different Crops***
62 | > Try different crop percentages at test time; this usually improves accuracy by 0.1%.
63 | 
64 | - ***Longer epochs***
65 | > Make sure your model is sufficiently trained.
66 | 
67 | 
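
The EMA heuristic above can be sketched in a few lines. This is a minimal illustration, assuming parameters are held in a plain dictionary of floats and that the decay is a fixed constant; the README does not say which EMA implementation or decay value was used, so `ema_update`, `shadow`, and `decay` are hypothetical names.

```python
def ema_update(shadow, params, decay=0.999):
    """Blend the current parameters into the shadow (averaged) copy, in place."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


params = {"w": 1.0}
shadow = dict(params)          # the EMA starts as a copy of the weights
for step in range(3):
    params["w"] += 1.0         # stand-in for an optimizer step
    ema_update(shadow, params, decay=0.5)
print(params["w"], shadow["w"])
```

The shadow copy lags behind the raw weights and smooths out step-to-step noise; evaluation would then use the shadow values rather than the raw ones.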


--------------------------------------------------------------------------------
/LaNAS/README.md:
--------------------------------------------------------------------------------
 1 | # Introduction
 2 | LaNAS is an application of LA-MCTS to Neural Architecture Search (NAS), though the more general approach (LA-MCTS) was inspired by LaNAS. This release includes:
 3 | 
 4 | - <a href="./LaNAS_NASBench101">**Evaluation on NASBench-101** </a>: Evaluating LaNAS on NASBench-101 without training models. 
 5 | 
 6 | - <a href="./LaNet">**Our Searched Models, LaNet**</a>: SoTA results: • 99.03% on CIFAR-10 • 77.7% @ 240MFLOPS on ImageNet.
 7 | 
 8 | - <a href="./one-shot_LaNAS">**One/Few-shot LaNAS**</a>: Using a supernet to evaluate the model, obtaining results within a few GPU days.
 9 | 
10 | - <a href="./Distributed_LaNAS">**Distributed LaNAS**</a>: Distributed framework for LaNAS, usable with hundreds of GPUs.
11 | 
12 | - <a href="./LaNet">**Training heuristics used**</a>: We list all tricks used in ImageNet training to reach SoTA. 
13 | 
14 | # Publication
15 | 
16 | <a href="https://linnanwang.github.io/latent-actions.pdf">Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search</a> <br>
17 | Linnan Wang (Brown University), Saining Xie (Facebook AI Research), Teng Li (Facebook AI Research), Rodrigo Fonseca (Brown University), Yuandong Tian (Facebook AI Research)<br>
18 | 
19 | Special thanks to Yiyang Zhao (Worcester Polytechnic Institute) for the enormous help.
20 | 
21 | # Performance on NASBench-101
22 | We strictly followed the NASBench-101 guidelines when benchmarking the results; please see our paper for details.
23 | <p align="center">
24 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/Benchmark.png?raw=true' width="800">
25 | </p>
26 | 
27 | # Performance of Searched Networks
28 | **CIFAR-10**: 99.03% top-1 using NASNet search space, SoTA result without using ImageNet or transfer learning.
29 | 
30 | |     Model      | LaNet      | EfficientNet-B7       | GPIPE                 | PyramidNet      | XNAS           |
31 | | -------------- | ---------- | ---------             | ----------            | --------------  | -------------- |
32 | | top-1          | 99.03      | 98.9                  | 99.0                  | 98.64           | 98.4           |
33 | | use ImageNet   | X          | <span>&#10003;</span> | <span>&#10003;</span> | X               | X              |
34 | 
35 | 
36 | **ImageNet**: 77.7% top-1@240 MFLOPS, 80.8% top-1@600 MFLOPS using EfficientNet search space, SoTA results on the EfficientNet space.
37 | 
38 | 
39 | |     Model      | LaNet      | OFA       | FBNetV2-F4 | MobileNet-V3    | FBNet-B        |
40 | | -------------- | ---------- | --------- | ---------- | --------------  | -------------- |
41 | | top-1          | 77.7       | 76.9      | 76.0       | 75.2            | 74.1           |
42 | | MFLOPS         | 240        | 230       | 238        | 219             | 295            |
43 | 
44 | |     Model      | LaNet      | OFA       | FBNetV3    | EfficientNet-B1|
45 | | -------------- | ---------- | --------- |  -----------| -------------- |
46 | | top-1          | 80.8       | 80.0      |  79.6       | 79.1           |
47 | | MFLOPS         | 600        | 595       |  544        | 700            |
48 | 
49 | 
50 | **Applying LaNet to detection** (compared to NAS-FCOS, CVPR 2020):
51 | |     Backbone      | Decoder      | FLOPS(G)       | AP    |
52 | | ----------------- | ------------ | -------------- |-------|
53 | |     LaNet         | FPN-FCOS     | 35.22          | 36.5  |
54 | |     MobileNetV2   | FPN-FCOS     | 105.4          | 31.2  |
55 | |     MobileNetV2   | NAS-FCOS     | 39.3           | 32.0  |
56 | 
57 | <b>We will release the ImageNet model, search framework, training pipeline, and their applications on detection, segmentation soon; stay tuned.</b>
58 | 
59 | 
60 | # Trying Other CV or NLP Applications
61 | In LaNAS, we model a network architecture as a vector encoding, i.e. [x,...,x], and a decoder translates the encoding into a runnable model for PyTorch or TensorFlow. Please see the function `train_net` in `net_training.py`. 
62 | That means you only need to implement an evaluator / cost model for your application to use LaNAS. 
63 | 
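
As a rough illustration of the evaluator idea, the sketch below scores integer encodings with a user-supplied function. Everything here is hypothetical: `search` is a plain exhaustive loop standing in for the actual LaNAS search, and `toy_evaluator` replaces the decode-train-measure step that `train_net` performs in the real code.

```python
from typing import Callable, List, Sequence, Tuple

Encoding = List[int]


def search(candidates: Sequence[Encoding],
           evaluate: Callable[[Encoding], float]) -> Tuple[Encoding, float]:
    """Hypothetical stand-in for a search loop: score candidates, keep the best."""
    best, best_score = None, float("-inf")
    for encoding in candidates:
        score = evaluate(encoding)
        if score > best_score:
            best, best_score = list(encoding), score
    return best, best_score


def toy_evaluator(encoding: Encoding) -> float:
    """Toy cost model; a real one would decode, train, and return accuracy."""
    return float(sum(encoding))


best, score = search([[0, 1, 0], [1, 1, 0], [0, 0, 1]], toy_evaluator)
print(best, score)
```

The point of the design is that the search side only ever sees `(encoding, score)` pairs, so swapping in a different task means replacing the evaluator and the decoder behind it, not the search.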


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/Evaluate/generator.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import numpy as np
  7 | import itertools
  8 | import random
  9 | import copy
 10 | 
 11 | node = 4
 12 | layer_type = [
 13 |     'max_pool_3x3',
 14 |     'skip_connect',
 15 |     'sep_conv_3x3',
 16 |     'sep_conv_5x5'
 17 | ]
 18 | 
 19 | 
 20 | def supernet_generator(node, layer_type):
 21 |     vec_length = len(layer_type)
 22 |     masked_vec = [1.0] * vec_length        # every operation enabled
 23 |     disconnected_vec = [0.0] * vec_length  # no operation: the edge is absent
 24 |     supernet = [[] for v in range(node)]
 25 |     for i in range(node):
 26 |         for j in range(node + 2):
 27 |             if j < i + 2:
 28 |                 supernet[i].append(masked_vec.copy())
 29 |             else:
 30 |                 supernet[i].append(disconnected_vec.copy())
 31 |     return supernet
 32 | 
 33 | def mask_specific_value(supernet, node_id, input_id, operation_id):
 34 |     supernet[node_id][input_id][operation_id] = 0.0
 35 |     return supernet
 36 | 
 37 | 
 38 | def selected_specific_value(supernet, node_id, input_id, operation_id):
 39 |     for i in range(len(supernet[node_id][input_id])):
 40 |         if i != operation_id:
 41 |             supernet[node_id][input_id][i] = 0.0
 42 |     return supernet
 43 | 
 44 | 
 45 | 
 46 | def name_compression_encoder(uncompressed_supernet, layer_type):
 47 |     supernet = copy.deepcopy(uncompressed_supernet)
 48 | 
 49 |     connectivity_domain = [0.0, 1.0]
 50 | 
 51 |     mix_operator = [p for p in itertools.product(connectivity_domain, repeat=len(layer_type))]
 52 |     for i in range(len(mix_operator)):
 53 |         mix_operator[i] = list(mix_operator[i])
 54 | 
 55 |     for i in range(len(supernet)):
 56 |         for j in range(len(supernet[i])):
 57 |             if isinstance(supernet[i][j], list):
 58 |                 for p in range(len(mix_operator)):
 59 |                     if supernet[i][j] == mix_operator[p]:
 60 |                         supernet[i][j] = p
 61 |     return supernet
 62 | 
 63 | def name_compression_decoder(compressed_supernet, layer_type):
 64 |     supernet = copy.deepcopy(compressed_supernet)
 65 | 
 66 |     connectivity_domain = [0.0, 1.0]
 67 | 
 68 |     mix_operator = [p for p in itertools.product(connectivity_domain, repeat=len(layer_type))]
 69 |     for i in range(len(mix_operator)):
 70 |         mix_operator[i] = list(mix_operator[i])
 71 | 
 72 |     for i in range(len(supernet)):
 73 |         for j in range(len(supernet[i])):
 74 |             supernet[i][j] = mix_operator[supernet[i][j]]
 75 | 
 76 |     return supernet
 77 | 
 78 | 
 79 | def layer_type_encoder(layer_type):
 80 |     encoded_type = []
 81 |     for i in range(len(layer_type)):
 82 |         if layer_type[i] == 'skip_connect':
 83 |             encoded_type.append(0)
 84 |         if layer_type[i] == 'max_pool_3x3':
 85 |             encoded_type.append(1)
 86 |         if layer_type[i] == 'sep_conv_3x3':
 87 |             encoded_type.append(2)
 88 |         if layer_type[i] == 'sep_conv_5x5':
 89 |             encoded_type.append(3)
 90 | 
 91 |     return sorted(encoded_type)
 92 | 
 93 | 
 94 | def random_net_generator(supernet, numbers):
 95 | 
 96 |     avail_node = []
 97 |     for i in range(len(supernet)):
 98 |         for j in range(len(supernet[i])):
 99 |             if supernet[i][j] != 0:
100 |                 avail_node.append([i, j])
101 | 
102 |     net_list = []
103 |     i = 0
104 |     while True:
105 |         new_net = copy.deepcopy(supernet)
106 |         changed_time = random.randint(1, 10)
107 | 
108 |         for j in range(changed_time):
109 |             changed_node = random.choice(avail_node)
110 |             changed_value = random.randint(0, 15)
111 |             new_net[changed_node[0]][changed_node[1]] = changed_value
112 | 
113 |         if new_net not in net_list:
114 |             n = copy.deepcopy(new_net)
115 |             net_list.append(n)
116 |             i += 1
117 | 
118 |         if i == numbers:
119 |             break
120 |     # print(net_list)
121 |     return net_list
122 | 
123 | 
124 | def resume_net_from_file(path):
125 |     import ast  # literal_eval parses list literals without executing code
126 |     with open(path, 'r') as f:
127 |         network = ast.literal_eval(f.read())
128 |     return network
129 | 
130 | 
131 | supernet_normal = supernet_generator(node, layer_type)
132 | supernet_normal = mask_specific_value(supernet_normal, 0, 0, 1)
133 | 
134 | supernet_reduce = supernet_generator(node, layer_type)
135 | supernet_reduce = mask_specific_value(supernet_reduce, 0, 0, 1)
136 | 
137 | 
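
The compression performed by `name_compression_encoder` can be checked in isolation. The sketch below rebuilds the same `itertools.product` enumeration and maps a per-edge operation mask to its integer index and back; `mask_to_index` and `index_to_ops` are hypothetical helper names introduced only for this illustration.

```python
import itertools

layer_type = ['max_pool_3x3', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5']

# Same enumeration as name_compression_encoder: every 0/1 mask over the
# operations, in the order itertools.product yields them.
masks = [list(m) for m in itertools.product([0.0, 1.0], repeat=len(layer_type))]


def mask_to_index(mask):
    """Compress a per-edge operation mask into a single integer."""
    return masks.index(list(mask))


def index_to_ops(index):
    """Expand an integer back into the names of the enabled operations."""
    return [op for op, bit in zip(layer_type, masks[index]) if bit == 1.0]


print(mask_to_index([1.0, 0.0, 1.0, 1.0]))
print(index_to_ops(11))
print(mask_to_index([1.0, 1.0, 1.0, 1.0]))
```

With four operations there are 16 masks, indexed 0 to 15, which is the range `random_net_generator` draws from with `random.randint(0, 15)`; index 0 is the all-zero mask, i.e. a disconnected edge.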


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/Evaluate/individual_model.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | import torch.nn.functional as F
  9 | from operations import *
 10 | from torch.autograd import Variable
 11 | 
 12 | 
 13 | 
 14 | class MixedOp(nn.Module):
 15 |     def __init__(self, C, stride, layer_type):
 16 |         super(MixedOp, self).__init__()
 17 |         self._ops = nn.ModuleList()
 18 |         for layer in layer_type:
 19 |             op = OPS[layer](C, stride, False)
 20 |             if 'pool' in layer:
 21 |                 op = nn.Sequential(op, nn.BatchNorm2d(C, affine=False))
 22 |             self._ops.append(op)
 23 | 
 24 |     def forward(self, x):
 25 |         return sum(op(x) for op in self._ops)
 26 |         # return sum(w * op(x) for w, op in zip(weights, self._ops))  # use sum instead concat
 27 | 
 28 | 
 29 | class Cell(nn.Module):
 30 |     def __init__(self, layer_type, steps, multiplier, C_prev_prev, C_prev, C, reduction, reduction_prev, supernet_matrix):
 31 |         super(Cell, self).__init__()
 32 |         self.reduction = reduction
 33 | 
 34 |         if reduction_prev:
 35 |             self.preprocess0 = FactorizedReduce(C_prev_prev, C, affine=False)
 36 |         else:
 37 |             self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0, affine=False)
 38 |         self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0, affine=False)
 39 |         self._steps = steps
 40 |         self._multiplier = multiplier
 41 | 
 42 |         self._ops = nn.ModuleList()
 43 |         self._bns = nn.ModuleList()
 44 |         for i in range(self._steps):
 45 |             for j in range(2 + i):
 46 |                 op_list = []
 47 |                 for op_idx in range(len(supernet_matrix[i][j])):
 48 |                     if supernet_matrix[i][j][op_idx] == 1.0:
 49 |                         op_list.append(layer_type[op_idx])
 50 |                 stride = 2 if reduction and j < 2 else 1
 51 | 
 52 |                 op = MixedOp(C, stride, layer_type=op_list)
 53 |                 self._ops.append(op)
 54 | 
 55 | 
 56 |     def forward(self, s0, s1, supernet_matrix, drop_prob):
 57 |         s0 = self.preprocess0(s0)
 58 |         s1 = self.preprocess1(s1)
 59 | 
 60 |         states = [s0, s1]
 61 |         offset = 0
 62 |         for i in range(self._steps):
 63 |             H = []
 64 |             op = []
 65 |             for j, h in enumerate(states):
 66 |                 if len(self._ops[offset + j]._ops) != 0:
 67 |                     H.append(self._ops[offset + j](h))
 68 |                     op.append(self._ops[offset + j])
 69 | 
 70 |             if self.training and drop_prob > 0.:
 71 |                 for hn_index in range(len(H)):
 72 |                     if len(op[hn_index]._ops) != 0:
 73 |                         # Apply drop-path only when the first op on this edge
 74 |                         # is not an Identity (skip connection).
 75 |                         if not isinstance(op[hn_index]._ops[0], Identity):
 76 |                             H[hn_index] = drop_path(H[hn_index], drop_prob)
 77 | 
 78 | 
 79 |             s = sum(hn for hn in H)
 80 | 
 81 |             offset += len(states)
 82 |             states.append(s)
 83 |         return torch.cat(states[-len(supernet_matrix):], dim=1)
 84 | 
 85 | 
 86 | 
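 87 | def drop_path(x, drop_prob):
 88 |     """Zero each sample's activations with probability drop_prob (in place)."""
 89 |     if drop_prob > 0.:
 90 |         keep_prob = 1. - drop_prob
 91 |         # Build the mask with x's own device and dtype, so no hard-coded
 92 |         # torch.cuda tensor type or try/except fallback is needed.
 93 |         mask = torch.empty(x.size(0), 1, 1, 1, device=x.device,
 94 |                            dtype=x.dtype).bernoulli_(keep_prob)
 95 |         # Rescale the kept samples so the expected activation is unchanged.
 96 |         x.div_(keep_prob)
 97 |         x.mul_(mask)
 98 |     return x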
 99 | 
100 | 
101 | class AuxiliaryHeadCIFAR(nn.Module):
102 | 
103 |     def __init__(self, C, num_classes):
104 |         """assuming input size 8x8"""
105 |         super(AuxiliaryHeadCIFAR, self).__init__()
106 | 
107 |         self.features = nn.Sequential(
108 |             nn.ReLU(inplace=True),
109 |             nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False),  # image size = 2 x 2
110 |             nn.Conv2d(C, 128, 1, bias=False),
111 |             nn.BatchNorm2d(128),
112 |             nn.ReLU(inplace=True),
113 |             nn.Conv2d(128, 768, 2, bias=False),
114 |             nn.BatchNorm2d(768),
115 |             nn.ReLU(inplace=True)
116 |         )
117 |         self.classifier = nn.Linear(768, num_classes)
118 | 
119 |     def forward(self, x):
120 |         x = self.features(x)
121 |         x = self.classifier(x.view(x.size(0), -1))
122 |         return x
123 | 
124 | 
125 | class Network(nn.Module):
126 |     def __init__(self, supernet_normal, supernet_reduce, layer_type, C, num_classes, layers, auxiliary, steps=4, multiplier=4, stem_multiplier=3):
127 |         super(Network, self).__init__()
128 |         self.supernet_normal = supernet_normal
129 |         self.supernet_reduce = supernet_reduce
130 |         self.layer_type = layer_type
131 |         self._C = C
132 |         self._num_classes = num_classes
133 |         self._layers = layers
134 |         self._steps = steps
135 |         self._multiplier = multiplier
136 |         self._auxiliary = auxiliary
137 |         C_curr = stem_multiplier * C
138 |         self.stem = nn.Sequential(
139 |             nn.Conv2d(3, C_curr, 3, padding=1, bias=False),
140 |             nn.BatchNorm2d(C_curr)
141 |         )
142 | 
143 |         C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
144 |         self.cells = nn.ModuleList()
145 |         reduction_prev = False
146 | 
147 |         for i in range(layers):
148 |             if i in [layers // 3, 2 * layers // 3]:
149 |                 C_curr *= 2
150 |                 reduction = True
151 |                 cell = Cell(self.layer_type, len(self.supernet_reduce), multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev, self.supernet_reduce)
152 |             else:
153 |                 reduction = False
154 |                 cell = Cell(self.layer_type, len(self.supernet_normal), multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev, self.supernet_normal)
155 | 
156 |             reduction_prev = reduction
157 |             self.cells += [cell]
158 | 
159 |             C_prev_prev, C_prev = C_prev, multiplier * C_curr
160 |             if i == 2 * layers // 3:
161 |                 C_to_auxiliary = C_prev
162 |         if auxiliary:
163 |             self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes)
164 |         self.global_pooling = nn.AdaptiveAvgPool2d(1)
165 |         self.classifier = nn.Linear(C_prev, num_classes)
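166 |         # forward() reads this attribute; default to 0 so it works even if the caller never sets it.
167 |         self.drop_path_prob = 0.0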
168 | 
169 |     def forward(self, input):
170 | 
171 |         s0 = s1 = self.stem(input)
172 |         for i, cell in enumerate(self.cells):
173 |             if not cell.reduction:
174 |                 s0, s1 = s1, cell(s0, s1, self.supernet_normal, self.drop_path_prob)
175 |             else:
176 |                 s0, s1 = s1, cell(s0, s1, self.supernet_reduce, self.drop_path_prob)
177 |         out = self.global_pooling(s1)
178 |         logits = self.classifier(out.view(out.size(0), -1))
179 |         return logits
180 | 
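
The channel bookkeeping in `Network.__init__` is easy to lose track of, so the sketch below replays it without building any modules. `channel_schedule` is a hypothetical helper, and the default arguments (`C=36`, `layers=24`) are illustrative values, not settings confirmed for this script.

```python
def channel_schedule(C=36, layers=24, multiplier=4, stem_multiplier=3):
    """Replay the channel arithmetic of Network.__init__ without any modules."""
    C_curr = stem_multiplier * C
    C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
    rows = []
    for i in range(layers):
        # Reduction cells sit at one third and two thirds of the depth.
        reduction = i in [layers // 3, 2 * layers // 3]
        if reduction:
            C_curr *= 2
        rows.append((i, reduction, C_prev_prev, C_prev, C_curr))
        # Each cell concatenates `multiplier` intermediate states.
        C_prev_prev, C_prev = C_prev, multiplier * C_curr
    return rows, C_prev


rows, classifier_in = channel_schedule()
reduction_cells = [i for i, reduction, *_ in rows if reduction]
print(reduction_cells, classifier_in)
```

Under these assumed settings the two reduction cells are cells 8 and 16, the per-cell width doubles at each of them (36, 72, 144), and the final classifier receives `4 * 144 = 576` input features.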


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/Evaluate/operations.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | import torch.nn.functional as F
  9 | 
 10 | OPS = {
 11 |   'none' : lambda C, stride, affine: Zero(stride),
 12 |   'avg_pool_3x3' : lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False),
 13 |   'max_pool_3x3' : lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1),
 14 |   'skip_connect' : lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine),
 15 |   'sep_conv_3x3' : lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine),
 16 |   'sep_conv_5x5' : lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine),
 17 |   'sep_conv_7x7' : lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine),
 18 |   'dil_conv_3x3' : lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine),
 19 |   'dil_conv_5x5' : lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine),
 20 |   'conv_7x1_1x7' : lambda C, stride, affine: nn.Sequential(
 21 |     nn.ReLU(inplace=False),
 22 |     nn.Conv2d(C, C, (1,7), stride=(1, stride), padding=(0, 3), bias=False),
 23 |     nn.Conv2d(C, C, (7,1), stride=(stride, 1), padding=(3, 0), bias=False),
 24 |     nn.BatchNorm2d(C, affine=affine)
 25 |     ),
 26 |   'conv_1x1' : lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False),
 27 |   'conv_3x3' : lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False),
 28 |   'conv_5x5' : lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False),
 29 | }
 30 | 
 31 | class ReLUConvBN(nn.Module):
 32 | 
 33 |   def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 34 |     super(ReLUConvBN, self).__init__()
 35 |     self.op = nn.Sequential(
 36 |       nn.ReLU(inplace=False),
 37 |       nn.Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False),
 38 |       nn.BatchNorm2d(C_out, affine=affine)
 39 |     )
 40 | 
 41 |   def forward(self, x):
 42 |     return self.op(x)
 43 | 
 44 | class DilConv(nn.Module):
 45 |     
 46 |   def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True):
 47 |     super(DilConv, self).__init__()
 48 |     self.op = nn.Sequential(
 49 |       nn.ReLU(inplace=False),
 50 |       nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=C_in, bias=False),
 51 |       nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 52 |       nn.BatchNorm2d(C_out, affine=affine),
 53 |       )
 54 | 
 55 |   def forward(self, x):
 56 |     return self.op(x)
 57 | 
 58 | 
 59 | class SepConv(nn.Module):
 60 |     
 61 |   def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 62 |     super(SepConv, self).__init__()
 63 |     self.op = nn.Sequential(
 64 |       nn.ReLU(inplace=False),
 65 |       nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, groups=C_in, bias=False),
 66 |       nn.Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False),
 67 |       nn.BatchNorm2d(C_in, affine=affine),
 68 |       nn.ReLU(inplace=False),
 69 |       nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding, groups=C_in, bias=False),
 70 |       nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 71 |       nn.BatchNorm2d(C_out, affine=affine),
 72 |       )
 73 | 
 74 |   def forward(self, x):
 75 |     return self.op(x)
 76 | 
 77 | 
 78 | class Conv2d(nn.Conv2d):
 79 | 
 80 |     def __init__(self, in_channels, out_channels, kernel_size, stride=1,
 81 |                  padding=0, dilation=1, groups=1, bias=True):
 82 |         super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride,
 83 |                  padding, dilation, groups, bias)
 84 | 
 85 |     def forward(self, x):
 86 |         weight = self.weight
 87 |         weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2,
 88 |                                   keepdim=True).mean(dim=3, keepdim=True)
 89 |         weight = weight - weight_mean
 90 |         std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5
 91 |         weight = weight / std.expand_as(weight)
 92 |         return F.conv2d(x, weight, self.bias, self.stride,
 93 |                         self.padding, self.dilation, self.groups)
 94 | 
 95 | 
 96 | 
 97 | class Identity(nn.Module):
 98 | 
 99 |   def __init__(self):
100 |     super(Identity, self).__init__()
101 | 
102 |   def forward(self, x):
103 |     return x
104 | 
105 | 
106 | class Zero(nn.Module):
107 | 
108 |   def __init__(self, stride):
109 |     super(Zero, self).__init__()
110 |     self.stride = stride
111 | 
112 |   def forward(self, x):
113 |     if self.stride == 1:
114 |       return x.mul(0.)
115 |     return x[:,:,::self.stride,::self.stride].mul(0.)
116 | 
117 | 
118 | class FactorizedReduce(nn.Module):
119 | 
120 |   def __init__(self, C_in, C_out, affine=True):
121 |     super(FactorizedReduce, self).__init__()
122 |     assert C_out % 2 == 0
123 |     self.relu = nn.ReLU(inplace=False)
124 |     self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
125 |     self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 
126 |     self.bn = nn.BatchNorm2d(C_out, affine=affine)
127 | 
128 |   def forward(self, x):
129 |     x = self.relu(x)
130 |     out = torch.cat([self.conv_1(x), self.conv_2(x[:,:,1:,1:])], dim=1)
131 |     out = self.bn(out)
132 |     return out
133 | 
134 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/Evaluate/run.sh:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates.
2 | # All rights reserved.
3 | # This source code is licensed under the license found in the
4 | # LICENSE file in the root directory of this source tree.
5 | # 
6 | 
7 | screen -d -m -S eval_3 srun -p dev --gres=gpu:1 --time=72:00:00 --cpus-per-task=1  python super_individual_train.py --cutout --auxiliary --batch_size=16 --init_ch=36 --masked_code='[1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0]'
8 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/Evaluate/translator.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import numpy as np
  7 | import random
  8 | 
  9 | node = 5
 10 | layer_type = [
 11 |     'max_pool_3x3',
 12 |     'skip_connect',
 13 |     'sep_conv_3x3',
 14 |     'sep_conv_5x5'
 15 | ]
 16 | 
 17 | 
 18 | def get_rand_vector(layer_type):
 19 |     vec_length = len(layer_type)
 20 |     masked_vec = []
 21 |     for i in range(0, vec_length):
 22 |         if random.random() > 0.5:
 23 |             masked_vec.append(1)
 24 |         else:
 25 |             masked_vec.append(0)
 26 |     return masked_vec
 27 | 
 28 | 
 29 | def mask_rand_generator():
 30 |     supernet = [[] for v in range(node)]
 31 |     for i in range(node):
 32 |         for j in range(node + 2):
 33 |             if j < i + 2:
 34 |                 supernet[i].append(get_rand_vector(layer_type))
 35 |             else:
 36 |                 supernet[i].append(0)
 37 |     return supernet
 38 | 
 39 | 
 40 | def supernet_generator(node, layer_type):
 41 |     vec_length = len(layer_type)
 42 |     masked_vec = np.ones((1, vec_length))[0].tolist()
 43 |     supernet = [[] for v in range(node)]
 44 |     for i in range(node):
 45 |         for j in range(node + 2):
 46 |             if j < i + 2:
 47 |                 supernet[i].append(masked_vec.copy())
 48 |             else:
 49 |                 supernet[i].append(0)
 50 |     return supernet
 51 | 
 52 | 
 53 | def mask_specific_value(supernet, node_id, input_id, operation_id):
 54 |     supernet[node_id][input_id][operation_id] = 0.0
 55 |     return supernet
 56 | 
 57 | 
 58 | def selected_specific_value(supernet, node_id, input_id, operation_id):
 59 |     for i in range(len(supernet[node_id][input_id])):
 60 |         if i != operation_id:
 61 |             supernet[node_id][input_id][i] = 0.0
 62 |     return supernet
 63 | 
 64 | 
 65 | def encoding_to_masks(encoding):
 66 |     encoding = np.array(encoding).reshape(-1, 4)
 67 |     supernet_normal = supernet_generator(node, layer_type)
 68 |     supernet_reduce = supernet_generator(node, layer_type)
 69 |     supernet = [supernet_normal, supernet_reduce]
 70 |     mask = []
 71 |     counter = 0
 72 |     for cell in supernet:
 73 |         mask_cell = []
 74 |         for row in cell:
 75 |             mask_row = []
 76 |             for col in row:
 77 |                 if type(col) == type([]):
 78 |                     mask_row.append(encoding[counter].tolist())
 79 |                     counter += 1
 80 |                 else:
 81 |                     mask_row.append(0)
 82 |             mask_cell.append(mask_row)
 83 |         mask.append(mask_cell)
 84 | 
 85 |     normal_mask = mask[0]
 86 |     reduce_mask = mask[1]
 87 | 
 88 |     return normal_mask, reduce_mask
 89 | 
 90 | 
 91 | def supernet_mask():
 92 |     supernet_normal = supernet_generator(node, layer_type)
 93 |     supernet_reduce = supernet_generator(node, layer_type)
 94 |     return supernet_normal, supernet_reduce
 95 | 
 96 | 
 97 | def encode_supernet():
 98 |     supernet_normal = supernet_generator(node, layer_type)
 99 |     supernet_reduce = supernet_generator(node, layer_type)
100 |     supernet = [supernet_normal, supernet_reduce]
101 | 
102 |     layer_types_count = len(layer_type)
103 |     count = 0
104 |     assert type(supernet) == type([])
105 |     for cell in supernet:
106 |         for row in cell:
107 |             for col in row:
108 |                 if type(col) == type([]):
109 |                     count += layer_types_count
110 |     return np.ones((count)).tolist()
111 | 
112 | 
113 | def define_search_space():
114 |     # hit-and-run default
115 |     # A x <= b
116 |     search_space = encode_supernet()
117 |     A = []
118 |     b = []
119 |     init_point = []
120 |     param_pos = 0
121 |     for i in range(0, len(search_space)):
122 |         tmp = np.zeros(len(search_space))
123 |         tmp[i] = 1
124 |         A.append(np.copy(tmp))
125 |         b.append(1.0000001)
126 |         # relax the bounds slightly to avoid floating-point precision issues:
127 |         # A*x <= 1 becomes A*x <= 1 + epsilon
128 |         # A*x >= 0 becomes -A*x <= 0 + epsilon, i.e. x >= -epsilon
129 |         A.append(-1 * np.copy(tmp))
130 |         b.append(0.0000001)
131 |     for i in range(0, len(search_space)):
132 |         if random.random() >= 0.5:
133 |             init_point.append(0.0)
134 |         else:
135 |             init_point.append(1.0)
136 |     return {"A": np.array(A), "b": np.array(b), "init_point": np.array(init_point)}
137 | 
138 | 
139 | c = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
140 | 
141 | 
142 | def expend_to_supernet_code(old_supernet):
143 |     for i in range(len(old_supernet)):
144 |         for j in range(len(old_supernet[i])):
145 |             if old_supernet[i][j] == 0:
146 |                 old_supernet[i][j] = [0] * len(old_supernet[0][0])
147 |     for i in range(len(old_supernet)):
148 |         for j in range(len(old_supernet[i])):
149 |             for n in range(len(old_supernet[i][j])):
150 |                 old_supernet[i][j][n] = float(old_supernet[i][j][n])
151 | 
152 |     return old_supernet
153 | 
154 | 
155 | 
156 | normal, reduce = encoding_to_masks(c)
157 | 
158 | normal = expend_to_supernet_code(normal)
159 | reduce = expend_to_supernet_code(reduce)
160 | 
161 | print(normal)
162 | print(reduce)
163 | 
164 | 
165 | # print(supernet_reduce)
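
The layout that `encoding_to_masks` in translator.py expects can be sketched in plain Python. This is a minimal illustration, not part of the repository: `edges_per_cell` and `decode` are hypothetical helpers, and the only facts taken from the file are the 5 nodes per cell, the 4 operation names, and the iteration order (normal cell first, then reduce cell; node by node; input by input).

```python
NODES = 5  # intermediate nodes per cell, as in translator.py
OPS = ['max_pool_3x3', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5']


def edges_per_cell(nodes=NODES):
    # node i reads from the 2 cell inputs plus the i nodes before it
    return sum(i + 2 for i in range(nodes))


def decode(encoding, nodes=NODES, ops=OPS):
    """Hypothetical helper: name the ops left active on each edge of both cells."""
    width = len(ops)
    needed = 2 * edges_per_cell(nodes) * width
    assert len(encoding) >= needed, "encoding too short for two cells"
    cells = {}
    pos = 0
    for cell_name in ('normal', 'reduce'):
        edges = {}
        for i in range(nodes):
            for j in range(i + 2):
                # each edge consumes one group of len(ops) mask bits
                bits = encoding[pos:pos + width]
                edges[(i, j)] = [op for op, bit in zip(ops, bits) if bit]
                pos += width
        cells[cell_name] = edges
    return cells
```

Under these assumptions each cell has 20 edges, so two cells consume 2 x 20 x 4 = 160 mask bits; `decode(code)['normal'][(0, 0)]` lists the operations kept on the first edge of the normal cell.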


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/Evaluate/utils.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import os
  7 | import numpy as np
  8 | import torch
  9 | import shutil
 10 | import torchvision.transforms as transforms
 11 | from torch.autograd import Variable
 12 | 
 13 | 
 14 | class AvgrageMeter(object):
 15 | 
 16 |   def __init__(self):
 17 |     self.reset()
 18 | 
 19 |   def reset(self):
 20 |     self.avg = 0
 21 |     self.sum = 0
 22 |     self.cnt = 0
 23 | 
 24 |   def update(self, val, n=1):
 25 |     self.sum += val * n
 26 |     self.cnt += n
 27 |     self.avg = self.sum / self.cnt
 28 | 
 29 | 
 30 | def accuracy(output, target, topk=(1,)):
 31 |   maxk = max(topk)
 32 |   batch_size = target.size(0)
 33 | 
 34 |   _, pred = output.topk(maxk, 1, True, True)
 35 |   pred = pred.t()
 36 |   correct = pred.eq(target.view(1, -1).expand_as(pred))
 37 | 
 38 |   res = []
 39 |   for k in topk:
 40 |     correct_k = correct[:k].reshape(-1).float().sum(0)  # reshape: the slice may be non-contiguous
 41 |     res.append(correct_k.mul_(100.0/batch_size))
 42 |   return res
 43 | 
 44 | 
 45 | class Cutout(object):
 46 |     def __init__(self, length):
 47 |         self.length = length
 48 | 
 49 |     def __call__(self, img):
 50 |         h, w = img.size(1), img.size(2)
 51 |         mask = np.ones((h, w), np.float32)
 52 |         y = np.random.randint(h)
 53 |         x = np.random.randint(w)
 54 | 
 55 |         y1 = np.clip(y - self.length // 2, 0, h)
 56 |         y2 = np.clip(y + self.length // 2, 0, h)
 57 |         x1 = np.clip(x - self.length // 2, 0, w)
 58 |         x2 = np.clip(x + self.length // 2, 0, w)
 59 | 
 60 |         mask[y1: y2, x1: x2] = 0.
 61 |         mask = torch.from_numpy(mask)
 62 |         mask = mask.expand_as(img)
 63 |         img *= mask
 64 |         return img
 65 | 
 66 | 
 67 | def _data_transforms_cifar10(cutout, cutout_length):
 68 |   CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
 69 |   CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
 70 | 
 71 |   train_transform = transforms.Compose([
 72 |     transforms.RandomCrop(32, padding=4),
 73 |     transforms.RandomHorizontalFlip(),
 74 |     transforms.ToTensor(),
 75 |     transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
 76 |   ])
 77 |   if cutout:
 78 |     train_transform.transforms.append(Cutout(cutout_length))
 79 | 
 80 |   valid_transform = transforms.Compose([
 81 |     transforms.ToTensor(),
 82 |     transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
 83 |     ])
 84 |   return train_transform, valid_transform
 85 | 
 86 | 
 87 | def count_parameters_in_MB(model):
 88 |   return sum(np.prod(v.size()) for name, v in model.named_parameters() if "auxiliary" not in name)/1e6
 89 | 
 90 | 
 91 | def save_checkpoint(state, is_best, save):
 92 |   filename = os.path.join(save, 'checkpoint.pth.tar')
 93 |   torch.save(state, filename)
 94 |   if is_best:
 95 |     best_filename = os.path.join(save, 'model_best.pth.tar')
 96 |     shutil.copyfile(filename, best_filename)
 97 | 
 98 | 
 99 | def save(model, model_path):
100 |   torch.save(model.state_dict(), model_path)
101 | 
102 | 
103 | def load(model, model_path):
104 |   model.load_state_dict(torch.load(model_path))
105 | 
106 | 
107 | def drop_path(x, drop_prob):
108 |     if drop_prob > 0.:
109 |         keep_prob = 1. - drop_prob
110 | 
111 |         mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
112 |         x.div_(keep_prob)
113 |         try:
114 |             x.mul_(mask)
115 |         except:
116 |             mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
117 |             x.mul_(mask)
118 |     return x
119 | 
120 | def create_exp_dir(path, scripts_to_save=None):
121 |     if not os.path.exists(path):
122 |         os.mkdir(path)
123 |     print('Experiment dir : {}'.format(path))
124 | 
125 |     if scripts_to_save is not None:
126 |         if not os.path.exists(os.path.join(path, 'scripts')):
127 |             os.mkdir(os.path.join(path, 'scripts'))
128 |         for script in scripts_to_save:
129 |             dst_file = os.path.join(path, 'scripts', os.path.basename(script))
130 |             shutil.copyfile(script, dst_file)
131 | 
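
The window arithmetic inside `Cutout.__call__` in utils.py can be checked without tensors. The sketch below is only an illustration: `cutout_window` is a hypothetical helper that mirrors the four `np.clip` calls, with the random centre passed in explicitly so the result is deterministic.

```python
def cutout_window(y, x, length, h, w):
    """Return (y1, y2, x1, x2) of the zeroed patch for a centre (y, x)."""
    def clip(v, lo, hi):
        return max(lo, min(v, hi))

    y1 = clip(y - length // 2, 0, h)
    y2 = clip(y + length // 2, 0, h)
    x1 = clip(x - length // 2, 0, w)
    x2 = clip(x + length // 2, 0, w)
    return y1, y2, x1, x2


# centre of a 32x32 image: the full 16x16 patch is masked
print(cutout_window(16, 16, 16, 32, 32))
# corner of the image: the patch is clipped to 8x9 pixels
print(cutout_window(0, 31, 16, 32, 32))
```

Because the patch is clipped at the image border, the masked area is smaller than `length * length` whenever the centre falls within `length // 2` pixels of an edge.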


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/LaNAS/Classifier.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | from torch.autograd import Variable
  9 | import json
 10 | from torch import optim
 11 | import numpy as np
 12 | 
 13 | # this is the backbone model
 14 | # to split networks at a MCTS state
 15 | class LinearModel(nn.Module):
 16 |     
 17 |     def __init__(self, input_dim, output_dim):
 18 |         super(LinearModel, self).__init__()
 19 |         self.fc1 = nn.Linear(input_dim, output_dim)
 20 |         torch.nn.init.xavier_uniform_( self.fc1.weight )
 21 |     
 22 |     def forward(self, x):
 23 |         y = self.fc1(x)
 24 |         #print("=====>X_shape:", x.shape)
 25 |         return y
 26 | 
 27 | # the input will be samples!
 28 | class Classifier():
 29 |     def __init__(self, samples, input_dim):
 30 |         self.training_counter = 0
 31 |         assert input_dim >= 1
 32 |         assert type(samples) ==  type({})
 33 |         self.input_dim  = input_dim
 34 |         self.samples    = samples
 35 |         self.model      = LinearModel(input_dim, 1)
 36 | 
 37 |         if torch.cuda.is_available():
 38 |             self.model.cuda()
 39 |         self.l_rate     = 0.00001
 40 |         self.optimiser  = optim.Adam(self.model.parameters(), lr=self.l_rate, betas=(0.9, 0.999), eps=1e-08)
 41 |         self.epochs     = 1 #TODO:revise to 100
 42 |         self.boundary   = -1
 43 |         self.nets       = []
 44 |         
 45 |     def get_params(self):
 46 |         
 47 |         return self.model.fc1.weight.detach().cpu().numpy(), self.model.fc1.bias.detach().cpu().numpy()
 48 | 
 49 |     def reinit(self):
 50 |         # the network is stored in self.model and has a single layer (fc1)
 51 |         torch.nn.init.xavier_uniform_( self.model.fc1.weight )
 52 |     
 53 |     def update_samples(self, latest_samples):
 54 |         assert type(latest_samples) == type(self.samples)
 55 |         sampled_nets    = []
 56 |         nets_acc        = []
 57 |         for k, v in latest_samples.items():
 58 |             net = json.loads(k)
 59 |             sampled_nets.append( net )
 60 |             nets_acc.append( v )
 61 |         self.nets = torch.from_numpy(np.asarray(sampled_nets, dtype=np.float32).reshape(-1, self.input_dim))
 62 |         self.acc  = torch.from_numpy(np.asarray(nets_acc,     dtype=np.float32).reshape(-1, 1))
 63 |         self.samples = latest_samples
 64 |         if torch.cuda.is_available():
 65 |             self.nets = self.nets.cuda()
 66 |             self.acc  = self.acc.cuda()
 67 | 
 68 |     def train(self):
 69 |         if self.training_counter == 0:
 70 |             self.epochs = 1000#20000
 71 |         else:
 72 |             self.epochs = 1000#3000
 73 |         self.training_counter += 1
 74 |         # in a rare case, one branch has no networks
 75 |         if len(self.nets) == 0:
 76 |             return
 77 |         for epoch in range(self.epochs):
 78 |             epoch += 1
 79 |             nets = self.nets
 80 |             acc  = self.acc
 81 |             #clear grads
 82 |             self.optimiser.zero_grad()
 83 |             #forward to get predicted values
 84 |             outputs = self.model.forward( nets )
 85 |             loss = nn.MSELoss()(outputs, acc)
 86 |             loss.backward()# back props
 87 |             nn.utils.clip_grad_norm_(self.model.parameters(), 5)
 88 |             self.optimiser.step()# update the parameters
 89 | #            if epoch % 1000 == 0:
 90 | #                print('@' + self.__class__.__name__ + ' epoch {}, loss {}'.format(epoch, loss.data))
 91 | 
 92 |     def predict(self, remaining):
 93 |         assert type(remaining) == type({})
 94 |         remaining_archs    = []
 95 |         for k, v in remaining.items():
 96 |             net = json.loads(k)
 97 |             remaining_archs.append( net )
 98 |         remaining_archs = torch.from_numpy(np.asarray(remaining_archs, dtype=np.float32).reshape(-1, self.input_dim))
 99 |         if torch.cuda.is_available():
100 |             remaining_archs = remaining_archs.cuda()
101 |         outputs = self.model.forward(remaining_archs)
102 |         if torch.cuda.is_available():
103 |             remaining_archs = remaining_archs.cpu()
104 |             outputs         = outputs.cpu()
105 |         result  = {}
106 |         counter = 0
107 |         for k in range(0, len(remaining_archs) ):
108 |             counter += 1
109 |             arch = remaining_archs[k].detach().numpy()
110 |             arch_str = json.dumps( arch.tolist() )
111 |             result[ arch_str ] = outputs[k].detach().numpy().tolist()[0]
112 |         assert len(result) == len(remaining)
113 |         return result
114 | 
115 |     def split_predictions(self, remaining):
116 |         assert type(remaining) == type({})
117 |         samples_badness  = {}
118 |         samples_goodies  = {}
119 |         if len(remaining) == 0:
120 |             return samples_goodies, samples_badness
121 |         predictions = self.predict(remaining)
122 |         avg_acc          = self.predict_mean()
123 |         self.boundary    = avg_acc
124 |         for k, v in predictions.items():
125 |             if v < avg_acc:
126 |                 samples_badness[k] = v
127 |             else:
128 |                 samples_goodies[k] = v
129 |         assert len(samples_badness) + len(samples_goodies) == len(remaining)
130 |         return  samples_goodies, samples_badness
131 | 
132 | 
133 |     def predict_mean(self):
134 |         if len(self.nets) == 0:
135 |             return 0
136 |         # can we use the actual acc?
137 |         outputs    = self.model.forward(self.nets)
138 |         pred_np    = None
139 |         if torch.cuda.is_available():
140 |             pred_np = outputs.detach().cpu().numpy()
141 |         else:
142 |             pred_np = outputs.detach().numpy()
143 |         return np.mean(pred_np)
144 |     
145 |     def split_data(self):
146 |         samples_badness  = {}
147 |         samples_goodies  = {}
148 |         if len(self.nets) == 0:
149 |             return samples_goodies, samples_badness
150 |         self.train()
151 |         avg_acc          = self.predict_mean()
152 |         self.boundary    = avg_acc
153 |         for k, v in self.samples.items():
154 |             if v < avg_acc:
155 |                 samples_badness[k]  = v
156 |             else:
157 |                 samples_goodies[k] = v
158 |         assert len(samples_badness) + len(samples_goodies) == len( self.samples )
159 |         return  samples_goodies, samples_badness
160 | 
161 | #test
162 | #with open('features.json', 'r') as infile:
163 | #    data=json.loads( infile.read() )
164 | #samples = {}
165 | #for d in data:
166 | #    samples[ json.dumps(d['feature']) ] = d['acc']
167 | #
168 | #print("total samples:", len(samples.keys() ))
169 | #c = Classifier(samples, 10)
170 | #goodies, badies = c.split_data()
171 | #print("goodies:", len(goodies.keys() ), np.mean(np.array(list(goodies.values()) ) )," bad:", len(badies.keys() ), np.mean(np.array(list(badies.values()) )) )
172 | 
173 | 
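
The split rule used by `Classifier.split_data` above (boundary = mean predicted accuracy over the node's samples, then a comparison of each sample's recorded accuracy against it) can be sketched without PyTorch. This is a simplified illustration: `split_by_boundary` is a hypothetical name, and `predict` is a stand-in callable for the trained linear model.

```python
import json


def split_by_boundary(samples, predict):
    """samples: {json-encoded architecture: accuracy}.

    predict: callable mapping a decoded architecture (list) to a float;
    here it stands in for the trained linear model.
    Returns (good, bad), in the same order as Classifier.split_data.
    """
    if not samples:
        return {}, {}
    preds = [predict(json.loads(key)) for key in samples]
    boundary = sum(preds) / len(preds)  # mean predicted accuracy
    good = {k: v for k, v in samples.items() if v >= boundary}
    bad = {k: v for k, v in samples.items() if v < boundary}
    return good, bad


samples = {"[1, 0]": 0.9, "[0, 0]": 0.2, "[1, 1]": 0.6}
good, bad = split_by_boundary(samples, lambda arch: 0.5)
print(sorted(good), sorted(bad))
```

`split_predictions` follows the same pattern but compares the predicted accuracy of each unseen architecture, rather than a recorded one, against the boundary.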


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/LaNAS/mlp_predictor.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import numpy as np
  7 | # import matplotlib.pyplot as plt
  8 | import json
  9 | from scipy.stats import norm
 10 | from scipy.optimize import minimize
 11 | import random
 12 | import time
 13 | import os
 14 | import itertools
 15 | import operator
 16 | import torch
 17 | import torch.nn as nn
 18 | from torch.autograd import Variable
 19 | import json
 20 | from torch import optim
 21 | import numpy as np
 22 | import random 
 23 | 
 24 | 
 25 | class LinearModel(nn.Module):
 26 | 
 27 |     # def __init__(self, input_dim, output_dim):
 28 |     #     super(LinearModel, self).__init__()
 29 |     #     self.fc1 = nn.Linear(input_dim, output_dim)
 30 |     #     torch.nn.init.xavier_uniform_( self.fc1.weight )
 31 | 
 32 |     # def forward(self, x):
 33 |     #     y = self.fc1(x)
 34 |     #     return y
 35 | 
 36 |     def __init__(self, input_dim, output_dim):
 37 |         super(LinearModel, self).__init__()
 38 |         self.fc1 = nn.Linear(input_dim, 100)
 39 |         self.fc2 = nn.Linear(100, output_dim)
 40 | 
 41 |         torch.nn.init.xavier_uniform_( self.fc1.weight )
 42 |         torch.nn.init.xavier_uniform_( self.fc2.weight )
 43 | 
 44 |     def weights_init(self):
 45 |         torch.nn.init.xavier_uniform_( self.fc1.weight )
 46 |         torch.nn.init.xavier_uniform_( self.fc2.weight )
 47 | 
 48 |     def forward(self, x):
 49 |         x1 = self.fc1(x)
 50 |         x2 = torch.relu(x1)
 51 |         y  = self.fc2(x2)
 52 |         y  = torch.sigmoid(y)
 53 |         return y
 54 |         
 55 |     def train(self, samples):
 56 |         optimiser  = optim.Adam(self.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08)
 57 |         
 58 |         X_sample = None
 59 |         Y_sample = None
 60 |         for sample in samples:
 61 |             if X_sample is None or Y_sample is None:
 62 |                 X_sample = np.array( json.loads(sample) )
 63 |                 Y_sample = np.array( samples[ sample ]  )
 64 |             else:
 65 |                 X_sample = np.vstack([X_sample, json.loads(sample) ] )
 66 |                 Y_sample = np.vstack([Y_sample, samples[ sample ] ] )
 67 |         batch_size = 100
 68 |         print("dataset:", len(samples) )
 69 |         chunks = int( X_sample.shape[0] / batch_size )
 70 |         if  X_sample.shape[0] % batch_size > 0:
 71 |             chunks += 1
 72 |         for epoch in range(0, 150):
 73 |             X_sample_split = np.array_split(X_sample, chunks)
 74 |             Y_sample_split = np.array_split(Y_sample, chunks)
 75 |             #print("epoch=", epoch)
 76 |             for i in range(0, chunks):
 77 |                 optimiser.zero_grad()
 78 |                 inputs = torch.from_numpy( np.asarray(X_sample_split[i], dtype=np.float32).reshape(X_sample_split[i].shape[0], X_sample_split[i].shape[1]) )
 79 |                 outputs = self.forward( inputs )
 80 |                 loss = nn.MSELoss()(outputs, torch.from_numpy( np.asarray(Y_sample_split[i], dtype=np.float32) ).reshape(-1, 1)  )
 81 |                 loss.backward()# back props
 82 |                 nn.utils.clip_grad_norm_(self.parameters(), 5)
 83 |                 optimiser.step()# update the parameters
 84 |     
 85 |     def propose_networks( self, search_space ):
 86 |         ''' search space to predict by a meta-DNN for points selection  '''
 87 |         networks = []
 88 |         for network in search_space.keys():
 89 |             networks.append( json.loads( network ) )
 90 |         X    = np.array( networks )
 91 |         X    = torch.from_numpy( np.asarray(X, dtype=np.float32).reshape(X.shape[0], X.shape[1]) )
 92 |         Y    = self.forward( X )
 93 |         Y    = Y.data.numpy()
 94 |         Y    = Y.reshape( len(networks) )
 95 |         X    = X.data.numpy( )
 96 |         proposed_networks = []
 97 |         n    = 10
 98 |         if Y.shape[0] < n:
 99 |             n = Y.shape[0]
100 |         indices = np.argsort(Y)[-n:]
101 |         print("indices:", indices.shape)
102 |         proposed_networks = X[indices]
103 |         return proposed_networks.tolist()
104 |     
105 |         
106 |         
107 | 
108 | # ####preprocess data####
109 | # dataset = []
110 | # with open('nasbench_dataset', 'r') as infile:
111 | #     dataset = json.loads( infile.read() )
112 | #
113 | # samples = {}
114 | # for data in dataset:
115 | #     samples[json.dumps(data["feature"])] = data["acc"]
116 | #
117 | # BEST_ACC   = 0
118 | # BEST_ARCH  = None
119 | # CURT_BEST  = 0
120 | # BEST_TRACE = {}
121 | # for i in dataset:
122 | #     arch = i['feature']
123 | #     acc  = i['acc']
124 | #     if acc > BEST_ACC:
125 | #         BEST_ACC  = acc
126 | #         BEST_ARCH = json.dumps( arch )
127 | # print("##target acc:", BEST_ACC)
128 | # #######################
129 | #
130 | # # bounds = np.array([[-1.0, 2.0]])
131 | # noise = 0.2
132 | # #
133 | # #
134 | # # def f(X, noise=noise):
135 | # #     return -np.sin(3*X) - X**2 + 0.7*X + noise * np.random.randn(*X.shape)
136 | # #
137 | # # X_init = np.array([[-0.9], [1.1]])
138 | # # Y_init = f(X_init)
139 | # #
140 | # # X = np.arange(bounds[:, 0], bounds[:, 1], 0.01).reshape(-1, 1)
141 | # # Y = f(X,0)
142 | # #
143 | #
144 | #
145 | #
146 | # # Gaussian process with Matern kernel as surrogate model
147 | #
148 | # init_samples = random.sample(samples.keys(), 100)
149 | #
150 | #
151 | # # Initialize samples
152 | # #
153 | # # Number of iterations
154 | # n_iter = 1000000000000
155 | # #
156 | # # plt.figure(figsize=(12, n_iter * 3))
157 | # # plt.subplots_adjust(hspace=0.4)
158 | # #
159 | # predictor  = LinearModel(49, 1)
160 | #
161 | # window_size = 100
162 | # sample_counter = 0
163 | #
164 | # #     # Obtain next sampling point from the acquisition function (expected_improvement)
165 | #     X_next = propose_location(predictor, X_sample, Y_sample, samples)
166 | # #     # Obtain next noisy sample from the objective function
167 | #     for network in X_next:
168 | #         X_sample = np.vstack([X_sample, network] )
169 | #     for network in X_next:
170 | #         sample_counter += 1
171 | #         acc = samples[ json.dumps( network.tolist() ) ]
172 | #         if acc > CURT_BEST:
173 | #             BEST_TRACE[json.dumps( network.tolist() ) ] = [acc, sample_counter]
174 | #             CURT_BEST = acc
175 | #         if acc == BEST_ACC:
176 | #             sorted_best_traces = sorted(BEST_TRACE.items(), key=operator.itemgetter(1))
177 | #             for item in sorted_best_traces:
178 | #                 print(item[0],"==>", item[1])
179 | #             final_results = []
180 | #             for item in sorted_best_traces:
181 | #                 final_results.append( item[1] )
182 | #             final_results_str = json.dumps(final_results)
183 | #             with open("result.txt", "a") as f:
184 | #                 f.write(final_results_str + '\n')
185 | #             print("$$$$$$$$$$$$$$$$$$$CONGRATULATIONS$$$$$$$$$$$$$$$$$$$")
186 | #             os._exit(1)
187 | #
188 | #         print(network, acc)
189 | #         del samples[ json.dumps( network.tolist() ) ]
190 | #         Y_sample = np.vstack([Y_sample, acc] )
191 | 
192 | 
193 | 
194 | 
195 | 
196 | 
197 | 
198 | 
199 | 
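
The selection step at the end of `propose_networks` in mlp_predictor.py (keep the `n` candidates with the highest predicted score, as `np.argsort(Y)[-n:]` does) can be sketched in plain Python. `propose_top_n` is a hypothetical helper, and the scores are assumed to be already computed by some predictor.

```python
def propose_top_n(scores, n=10):
    """scores: {architecture key: predicted score}.

    Returns up to n keys with the highest scores, lowest of them first,
    matching the ascending order produced by argsort.
    """
    n = min(n, len(scores))
    if n == 0:
        return []  # guard: ranked[-0:] would return every key
    ranked = sorted(scores, key=scores.get)
    return ranked[-n:]


scores = {'a': 0.1, 'b': 0.9, 'c': 0.5}
print(propose_top_n(scores, 2))
```

The explicit guard for an empty candidate set matters because a slice of `[-0:]` selects the whole list rather than nothing.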


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/README.md:
--------------------------------------------------------------------------------
 1 | ## one-shot/few-shot LaNAS
 2 | • <b>fast to get a working result</b>
 3 | 
 4 | • <b>the inaccurate prediction from supernet degrades the final network performance</b>
 5 | 
 6 | One-shot LaNAS uses a pretrained supernet to predict the performance of a proposed architecture via masking. The following figure illustrates the search procedure.
 7 | 
 8 | <p align="center">
 9 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/one-shot_LaNAS_search.png?raw=true' width="600">
10 | </p>
11 | 
12 | Training the supernet is the same as regular training, except that we apply a random mask at each iteration.
13 | 
14 | ## Evaluating search algorithms on the supernet
15 | NASBench-101 contains a limited number of architectures (~420K), whose performance can be easily predicted by a learned predictor. A supernet is a good alternative because it renders a search space of 10^21 architectures. Our supernet can therefore also serve as a benchmark for evaluating different search algorithms; see Fig. 6 in the <a href="https://linnanwang.github.io/latent-actions.pdf">LaNAS paper</a>. Please check how LaNAS interacts with the supernet to sample an architecture and its accuracy.
16 | 
17 | 
18 | ## Training the supernet
19 | You can skip this step if you use our pre-trained supernet.
20 | 
21 | Our supernet is designed for the NASNet search space; adapting it to a new search space requires code changes. We are working on this and will update later. Training the supernet is fairly easy; simply run
22 | 
23 | ``` python train.py ```
24 | 
25 | - **Training on the ImageNet**
26 | 
27 | Please use the training pipeline from <a href="https://github.com/rwightman/pytorch-image-models">Pytorch-Image-Models</a>. The steps are:
28 | 1. Get the supernet model from supernet_train.py, line 94.
29 | 2. Go to Pytorch-Image-Models.
30 | 3. Find pytorch-image-models/blob/master/timm/models/factory.py and replace line 57 as follows:
31 | ``` 
32 | # model = create_fn(**model_args, **kwargs) 
33 | model = our-supernet
34 | ```
35 | 
36 | ## Searching with a supernet
37 | You can download our pre-trained supernet from <a href="https://drive.google.com/file/d/11RqnHAcfhiSYvCSpYZDfilCI1CYmL7WK/view?usp=sharing">here</a>. Place it in the same folder, and start searching with
38 |  
39 |  
40 | ``` python train.py ```
41 | 
42 | The search results are written to result.txt (the file that read_result.py opens), and you can read them with
43 | 
44 | ``` python read_result.py ```
45 | 
46 | The program prints every sample with its test accuracy, e.g.
47 | 
48 | >[[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] 81.69 3774
49 | 
50 | > <b>[1.0 .. 0.0] is the architecture encoding, which can be used to train a network later.</b>
51 | 
52 | > <b>81.69 is the test accuracy predicted by the supernet via weight sharing.</b>
53 | 
54 | > <b>3774 is the index of this sample (read_result.py counts from 0).</b>
55 | 
56 | ## Training a searched network
57 | Once you pick a network after reading the results, you can train the network in the Evaluate folder.
58 | ```
59 | cd Evaluate
60 | # note: supply the encoding of the target architecture via the --masked_code argument
61 | python super_individual_train.py --cutout --auxiliary --batch_size=16 --init_ch=36 --masked_code='[1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0]'
62 | ```
63 | ## Improving with few-shot NAS
64 | One-shot NAS substantially reduces the computation cost by training only one supernet and approximating the performance of every architecture in the search space via weight sharing. However, the performance estimation can be very inaccurate due to co-adaptation among operations.
65 | We recently proposed <b>few-shot NAS</b>, which uses multiple supernets, called sub-supernets, each covering a different region of the search space, to alleviate the undesired co-adaptation. Since each sub-supernet covers only a small part of the search space, few-shot NAS improves the accuracy of architecture evaluation over one-shot NAS with a small increase in evaluation cost. Please see the following paper for details.
66 | 
67 | <a href="https://arxiv.org/abs/2006.06863">Few-shot Neural Architecture Search</a> </br>
68 | in submission</br>
69 | Yiyang Zhao (WPI), Linnan Wang (Brown), Yuandong Tian (FAIR), Rodrigo Fonseca (Brown), Tian Guo (WPI)
70 | 
71 | **To Evaluate Few-shot NAS**, please check this <a href="https://github.com/aoiang/few-shot-NAS">repository</a>. The following figures show the performance improvement of few-shot NAS.
72 | <p align="center">
73 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/few-shot-1.png?raw=true' width="1000">
74 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LaNAS/few-shot-2.png?raw=true' width="1000">
75 | </p>
76 | These figures show that few-shot NAS is an effective trade-off between one-shot NAS and vanilla NAS (i.e., training every architecture from scratch): it retains both accurate performance estimation and fast evaluation.
77 | 
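The idea of sub-supernets covering different regions can be illustrated with a toy partition. This is only a sketch under stated assumptions: the three operations, the three-edge cell, and the rule of splitting on the operation chosen for a single edge are illustrative choices, and `full_search_space` and `split_on_edge` are hypothetical helpers rather than code from this repository.

```python
from itertools import product

OPS = ["sep_conv_3x3", "max_pool_3x3", "skip_connect"]  # illustrative subset
NUM_EDGES = 3  # toy cell with three edges


def full_search_space():
    """Enumerate every architecture as a tuple holding one operation per edge."""
    return list(product(OPS, repeat=NUM_EDGES))


def split_on_edge(space, edge):
    """Group architectures by the operation on one edge.

    Each group is the region that one sub-supernet would cover.
    """
    groups = {}
    for arch in space:
        groups.setdefault(arch[edge], []).append(arch)
    return groups


space = full_search_space()
sub_supernets = split_on_edge(space, edge=0)
for op, archs in sub_supernets.items():
    print(op, len(archs))
```

In this toy setting each sub-supernet covers one third of the space, so fewer architectures share weights inside any single supernet, at the price of training three supernets instead of one.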
78 | 
79 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/read_result.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | 
 7 | import json
 8 | 
 9 | with open('result.txt') as json_data:
10 |     data = json.load(json_data)
11 | 
12 | counter = 0
13 | for elem in data:
14 |     print(elem[0], elem[1], counter)
15 |     counter += 1
16 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/search.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | from supernet.generator import encode_supernet, define_search_space
 7 | from LaNAS.MCTS import MCTS
 8 | from supernet.supernet_train import Trainer
 9 | import numpy as np
10 | import os
11 | import pickle
12 | 
13 | ############## first step, training supernet 
14 | 
15 | # trainer.run(300)
16 | 
17 | ############## second step, searching over the supernet
18 | search_space = define_search_space()
19 | print( search_space["init_point"], len( search_space["init_point"] ) )
20 | # sample = mcts.zero_supernet_generator()
21 | # print(sample)
22 | #
23 | #
24 | trainer = Trainer( batch_size=40, init_channels= 48  )
25 | trainer.load_model("./NASNet_Supernet.pt")
26 | 
27 | #
28 | node_path = "mcts_agent"
29 | if os.path.isfile(node_path):
30 |     with open(node_path, 'rb') as agent_file:
31 |         agent = pickle.load(agent_file)
32 |         print("=====>loads:", len(agent.samples)," samples" )
33 |         print("=====>loads:", agent.SEARCH_COUNTER," counter" )
34 |         agent.search( )
35 | else:
36 |     mcts   = MCTS(search_space, trainer, 5)
37 |     sample = mcts.trainer.propose_nasnet_mask()
38 |     result = trainer.infer_masks( sample )
39 |     mcts.collect_samples( result )
40 |     mcts.search( )
41 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/supernet/operations.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | import torch.nn.functional as F
  9 | 
 10 | OPS = {
 11 |   'none' : lambda C, stride, affine: Zero(stride),
 12 |   'avg_pool_3x3' : lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False),
 13 |   'max_pool_3x3' : lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1),
 14 |   'skip_connect' : lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine),
 15 |   'sep_conv_3x3' : lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine),
 16 |   'sep_conv_5x5' : lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine),
 17 |   'sep_conv_7x7' : lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine),
 18 |   'dil_conv_3x3' : lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine),
 19 |   'dil_conv_5x5' : lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine),
 20 |   'conv_7x1_1x7' : lambda C, stride, affine: nn.Sequential(
 21 |     nn.ReLU(inplace=False),
 22 |     nn.Conv2d(C, C, (1,7), stride=(1, stride), padding=(0, 3), bias=False),
 23 |     nn.Conv2d(C, C, (7,1), stride=(stride, 1), padding=(3, 0), bias=False),
 24 |     nn.BatchNorm2d(C, affine=affine)
 25 |     ),
 26 |   'conv_1x1' : lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False),
 27 |   'conv_3x3' : lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False),
 28 |   'conv_5x5' : lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False),
 29 | }
 30 | 
 31 | class ReLUConvBN(nn.Module):
 32 | 
 33 |   def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 34 |     super(ReLUConvBN, self).__init__()
 35 |     self.op = nn.Sequential(
 36 |       nn.ReLU(inplace=False),
 37 |       nn.Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False),
 38 |       nn.BatchNorm2d(C_out, affine=affine)
 39 |     )
 40 | 
 41 |   def forward(self, x):
 42 |     return self.op(x)
 43 | 
 44 | class DilConv(nn.Module):
 45 |     
 46 |   def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True):
 47 |     super(DilConv, self).__init__()
 48 |     self.op = nn.Sequential(
 49 |       nn.ReLU(inplace=False),
 50 |       nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=C_in, bias=False),
 51 |       nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 52 |       nn.BatchNorm2d(C_out, affine=affine),
 53 |       )
 54 | 
 55 |   def forward(self, x):
 56 |     return self.op(x)
 57 | 
 58 | 
 59 | class SepConv(nn.Module):
 60 |     
 61 |   def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True):
 62 |     super(SepConv, self).__init__()
 63 |     self.op = nn.Sequential(
 64 |       nn.ReLU(inplace=False),
 65 |       nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, groups=C_in, bias=False),
 66 |       nn.Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False),
 67 |       nn.BatchNorm2d(C_in, affine=affine),
 68 |       nn.ReLU(inplace=False),
 69 |       nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding, groups=C_in, bias=False),
 70 |       nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False),
 71 |       nn.BatchNorm2d(C_out, affine=affine),
 72 |       )
 73 | 
 74 |   def forward(self, x):
 75 |     return self.op(x)
 76 | 
 77 | 
 78 | class Conv2d(nn.Conv2d):
 79 | 
 80 |     def __init__(self, in_channels, out_channels, kernel_size, stride=1,
 81 |                  padding=0, dilation=1, groups=1, bias=True):
 82 |         super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride,
 83 |                  padding, dilation, groups, bias)
 84 | 
 85 |     def forward(self, x):
 86 |         weight = self.weight
 87 |         weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2,
 88 |                                   keepdim=True).mean(dim=3, keepdim=True)
 89 |         weight = weight - weight_mean
 90 |         std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5
 91 |         weight = weight / std.expand_as(weight)
 92 |         return F.conv2d(x, weight, self.bias, self.stride,
 93 |                         self.padding, self.dilation, self.groups)
 94 | 
 95 | 
 96 | 
 97 | class Identity(nn.Module):
 98 | 
 99 |   def __init__(self):
100 |     super(Identity, self).__init__()
101 | 
102 |   def forward(self, x):
103 |     return x
104 | 
105 | 
106 | class Zero(nn.Module):
107 | 
108 |   def __init__(self, stride):
109 |     super(Zero, self).__init__()
110 |     self.stride = stride
111 | 
112 |   def forward(self, x):
113 |     if self.stride == 1:
114 |       return x.mul(0.)
115 |     return x[:,:,::self.stride,::self.stride].mul(0.)
116 | 
117 | 
118 | class FactorizedReduce(nn.Module):
119 | 
120 |   def __init__(self, C_in, C_out, affine=True):
121 |     super(FactorizedReduce, self).__init__()
122 |     assert C_out % 2 == 0
123 |     self.relu = nn.ReLU(inplace=False)
124 |     self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
125 |     self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 
126 |     self.bn = nn.BatchNorm2d(C_out, affine=affine)
127 | 
128 |   def forward(self, x):
129 |     x = self.relu(x)
130 |     out = torch.cat([self.conv_1(x), self.conv_2(x[:,:,1:,1:])], dim=1)
131 |     out = self.bn(out)
132 |     return out
133 | 
134 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/supernet/supernet_model.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import torch
  7 | import torch.nn as nn
  8 | import torch.nn.functional as F
  9 | from .operations import *
 10 | from torch.autograd import Variable
 11 | 
 12 | 
 13 | 
 14 | class MixedOp(nn.Module):
 15 |     def __init__(self, C, stride, layer_type):
 16 |         super(MixedOp, self).__init__()
 17 |         self._ops = nn.ModuleList()
 18 |         for layer in layer_type:
 19 |             op = OPS[layer](C, stride, False)
 20 |             if 'pool' in layer:
 21 |                 op = nn.Sequential(op, nn.BatchNorm2d(C, affine=False))
 22 |             self._ops.append(op)
 23 | 
 24 |     def forward(self, x, weights):
 25 |         # print(len(weights), len(self._ops))
 26 |         return sum(w * op(x) for w, op in zip(weights, self._ops))  # use sum instead concat
 27 | 
 28 | 
 29 | class Cell(nn.Module):
 30 |     def __init__(self, layer_type, steps, multiplier, C_prev_prev, C_prev, C, reduction, reduction_prev):
 31 |         super(Cell, self).__init__()
 32 |         self.reduction = reduction
 33 | 
 34 |         if reduction_prev:
 35 |             self.preprocess0 = FactorizedReduce(C_prev_prev, C, affine=False)
 36 |         else:
 37 |             self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0, affine=False)
 38 |         self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0, affine=False)
 39 |         self._steps = steps
 40 |         self._multiplier = multiplier
 41 | 
 42 |         self._ops = nn.ModuleList()
 43 |         self._bns = nn.ModuleList()
 44 |         for i in range(self._steps):
 45 |             for j in range(2 + i):
 46 |                 stride = 2 if reduction and j < 2 else 1
 47 |                 op = MixedOp(C, stride, layer_type)
 48 |                 self._ops.append(op)
 49 | 
 50 |     def forward(self, s0, s1, supernet_matrix):
 51 |         s0 = self.preprocess0(s0)
 52 |         s1 = self.preprocess1(s1)
 53 | 
 54 |         states = [s0, s1]
 55 |         offset = 0
 56 |         for i in range(self._steps):
 57 |             s = sum(self._ops[offset + j](h, supernet_matrix[i][j]) for j, h in enumerate(states))
 58 | 
 59 |             offset += len(states)
 60 |             states.append(s)
 61 | 
 62 |         return torch.cat(states[-self._multiplier:], dim=1)
 63 | 
 64 | 
 65 | class Network(nn.Module):
 66 |     def __init__(self, supernet_normal, supernet_reduce, layer_type, C, num_classes, layers, criterion, steps=4, multiplier=4, stem_multiplier=3):
 67 |         super(Network, self).__init__()
 68 |         self.supernet_normal = supernet_normal
 69 |         self.supernet_reduce = supernet_reduce
 70 |         self.layer_type = layer_type
 71 |         self._C = C
 72 |         self._num_classes = num_classes
 73 |         self._layers = layers
 74 |         self._criterion = criterion
 75 |         self._steps = steps
 76 |         self._multiplier = multiplier
 77 | 
 78 |         C_curr = stem_multiplier * C
 79 |         self.stem = nn.Sequential(
 80 |             nn.Conv2d(3, C_curr, 3, padding=1, bias=False),
 81 |             nn.BatchNorm2d(C_curr)
 82 |         )
 83 | 
 84 |         C_prev_prev, C_prev, C_curr = C_curr, C_curr, C
 85 |         self.cells = nn.ModuleList()
 86 |         reduction_prev = False
 87 |         for i in range(layers):
 88 |             if i in [layers // 3, 2 * layers // 3]:
 89 |                 C_curr *= 2
 90 |                 reduction = True
 91 |                 cell = Cell(self.layer_type, len(self.supernet_reduce), multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev)
 92 |             else:
 93 |                 reduction = False
 94 |                 cell = Cell(self.layer_type, steps, multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev)
 95 | 
 96 |             reduction_prev = reduction
 97 |             self.cells += [cell]
 98 |             C_prev_prev, C_prev = C_prev, multiplier * C_curr
 99 | 
100 |         self.global_pooling = nn.AdaptiveAvgPool2d(1)
101 |         self.classifier = nn.Linear(C_prev, num_classes)
102 | 
103 |     def change_masks(self, normal_mask, reduce_mask):            
104 |         self.supernet_normal = normal_mask
105 |         self.supernet_reduce = reduce_mask
106 | 
107 |     def forward(self, input):       
108 |         s0 = s1 = self.stem(input)
109 |         for i, cell in enumerate(self.cells):
110 |             if not cell.reduction:
111 |                 s0, s1 = s1, cell(s0, s1, self.supernet_normal)
112 |             else:
113 |                 s0, s1 = s1, cell(s0, s1, self.supernet_reduce)
114 |         out = self.global_pooling(s1)
115 |         logits = self.classifier(out.view(out.size(0), -1))
116 |         return logits
117 | 
118 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/supernet/utils.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) Facebook, Inc. and its affiliates.
  2 | # All rights reserved.
  3 | # This source code is licensed under the license found in the
  4 | # LICENSE file in the root directory of this source tree.
  5 | # 
  6 | import os
  7 | import numpy as np
  8 | import torch
  9 | import shutil
 10 | import torchvision.transforms as transforms
 11 | from torch.autograd import Variable
 12 | 
 13 | 
 14 | class AvgrageMeter(object):
 15 | 
 16 |   def __init__(self):
 17 |     self.reset()
 18 | 
 19 |   def reset(self):
 20 |     self.avg = 0
 21 |     self.sum = 0
 22 |     self.cnt = 0
 23 | 
 24 |   def update(self, val, n=1):
 25 |     self.sum += val * n
 26 |     self.cnt += n
 27 |     self.avg = self.sum / self.cnt
 28 | 
 29 | 
 30 | def accuracy(output, target, topk=(1,)):
 31 |   maxk = max(topk)
 32 |   batch_size = target.size(0)
 33 | 
 34 |   _, pred = output.topk(maxk, 1, True, True)
 35 |   pred = pred.t()
 36 |   correct = pred.eq(target.view(1, -1).expand_as(pred))
 37 | 
 38 |   res = []
 39 |   for k in topk:
40 |     correct_k = correct[:k].reshape(-1).float().sum(0)
 41 |     res.append(correct_k.mul_(100.0/batch_size))
 42 |   return res
 43 | 
 44 | 
 45 | class Cutout(object):
 46 |     def __init__(self, length):
 47 |         self.length = length
 48 | 
 49 |     def __call__(self, img):
 50 |         h, w = img.size(1), img.size(2)
 51 |         mask = np.ones((h, w), np.float32)
 52 |         y = np.random.randint(h)
 53 |         x = np.random.randint(w)
 54 | 
 55 |         y1 = np.clip(y - self.length // 2, 0, h)
 56 |         y2 = np.clip(y + self.length // 2, 0, h)
 57 |         x1 = np.clip(x - self.length // 2, 0, w)
 58 |         x2 = np.clip(x + self.length // 2, 0, w)
 59 | 
 60 |         mask[y1: y2, x1: x2] = 0.
 61 |         mask = torch.from_numpy(mask)
 62 |         mask = mask.expand_as(img)
 63 |         img *= mask
 64 |         return img
 65 | 
 66 | 
 67 | def _data_transforms_cifar10(cutout, cutout_length):
 68 |   CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
 69 |   CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
 70 | 
 71 |   train_transform = transforms.Compose([
 72 |     transforms.RandomCrop(32, padding=4),
 73 |     transforms.RandomHorizontalFlip(),
 74 |     transforms.ToTensor(),
 75 |     transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
 76 |   ])
 77 |   if cutout:
 78 |     train_transform.transforms.append(Cutout(cutout_length))
 79 | 
 80 |   valid_transform = transforms.Compose([
 81 |     transforms.ToTensor(),
 82 |     transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
 83 |     ])
 84 |   return train_transform, valid_transform
 85 | 
 86 | 
 87 | def count_parameters_in_MB(model):
88 |   return sum(np.prod(v.size()) for name, v in model.named_parameters() if "auxiliary" not in name)/1e6
 89 | 
 90 | 
 91 | def save_checkpoint(state, is_best, save):
 92 |   filename = os.path.join(save, 'checkpoint.pth.tar')
 93 |   torch.save(state, filename)
 94 |   if is_best:
 95 |     best_filename = os.path.join(save, 'model_best.pth.tar')
 96 |     shutil.copyfile(filename, best_filename)
 97 | 
 98 | 
 99 | def save(model, model_path):
100 |   torch.save(model.state_dict(), model_path)
101 | 
102 | 
103 | def load(model, model_path):
104 |   model.load_state_dict(torch.load(model_path))
105 | 
106 | 
107 | def drop_path(x, drop_prob):
108 |     if drop_prob > 0.:
109 |         keep_prob = 1. - drop_prob
110 | 
111 |         mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
112 |         x.div_(keep_prob)
113 |         try:
114 |             x.mul_(mask)
115 |         except:
116 |             mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
117 |             x.mul_(mask)
118 |     return x
119 | 
120 | #
121 | # def create_exp_dir(path, scripts_to_save=None):
122 | #   if not os.path.exists(path):
123 | #     os.mkdir(path)
124 | #   print('Experiment dir : {}'.format(path))
125 | #
126 | #   if scripts_to_save is not None:
127 | #     os.mkdir(os.path.join(path, 'scripts'))
128 | #     for script in scripts_to_save:
129 | #       dst_file = os.path.join(path, 'scripts', os.path.basename(script))
130 | #       shutil.copyfile(script, dst_file)
131 | 
132 | 
133 | def create_exp_dir(path, scripts_to_save=None):
134 |     if not os.path.exists(path):
135 |         os.mkdir(path)
136 |     print('Experiment dir : {}'.format(path))
137 | 
138 |     if scripts_to_save is not None:
139 |         if not os.path.exists(os.path.join(path, 'scripts')):
140 |             os.mkdir(os.path.join(path, 'scripts'))
141 |         for script in scripts_to_save:
142 |             dst_file = os.path.join(path, 'scripts', os.path.basename(script))
143 |             shutil.copyfile(script, dst_file)
144 | 


--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/train.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Facebook, Inc. and its affiliates.
 2 | # All rights reserved.
 3 | # This source code is licensed under the license found in the
 4 | # LICENSE file in the root directory of this source tree.
 5 | # 
 6 | from supernet.generator import encode_supernet, define_search_space
 7 | from LaNAS.MCTS import MCTS
 8 | from supernet.supernet_train import Trainer
 9 | 
10 | 
11 | trainer = Trainer( batch_size=80, init_channels= 48, epochs = 300 )
12 | trainer.run()
13 | 
14 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | <p align="center">
 2 | <img src='https://github.com/linnanwang/paper-image-repo/blob/master/LA-MCTS/logo.png?raw=true' width="600">
 3 | </p>
 4 | 
 5 | # Latent Action Monte Carlo Tree Search (LA-MCTS)
 6 | 
 7 | LA-MCTS is a new MCTS based derivative-free meta-solver. It learns to partition the search space, so that solvers such as Bayesian optimization or evolutionary algorithms can focus on a smaller region to find better solutions with fewer samples.
 8 | 
 9 | Please 🌟star🌟  the repo if you like our work, thank you.
10 | 
11 | # Contributors
12 | Linnan Wang (First Author), Yuandong Tian (Principal Investigator), Yiyang Zhao, Saining Xie, Teng Li and Rodrigo Fonseca.
13 | 
14 | # What's in this release?
15 | 
16 | This release contains our implementation of LA-MCTS and its application to Neural Architecture Search (LaNAS), but it can also be applied to large-scale hyper-parameter optimization, reinforcement learning, scheduling, optimizing computer systems, and many others.
17 | 
18 | ## Neural Architecture Search (NAS) 
19 | - <a href="./LaNAS/LaNAS_NASBench101">**Evaluation on NASBench-101** </a>: Evaluating LaNAS on NASBench-101 on your laptop without training models. 
20 | 
21 | - <a href="./LaNAS/LaNet">**Our Searched Models, LaNet**</a>: SoTA results: • 99.03% on CIFAR-10 • 77.7% @ 240MFLOPS on ImageNet.
22 | 
23 | - <a href="./LaNAS/one-shot_LaNAS">**One/Few-shot LaNAS**</a>: Using a supernet to evaluate the model, obtaining results within a few GPU days.
24 | 
25 | - <a href="./LaNAS/Distributed_LaNAS">**Distributed LaNAS**</a>: Distributed framework for LaNAS, usable with hundreds of GPUs.
26 | 
27 | - <a href="./LaNAS/LaNet">**Training heuristics used**</a>: We list all tricks used in ImageNet training to reach SoTA. 
28 | 
29 | ## Black-box optimization 
30 | - <a href="./LA-MCTS">**Performance with baselines**</a>: 1-minute evaluations of LA-MCTS vs. Bayesian Optimization and Evolutionary Search. </br>
31 |   **In the NeurIPS-2020 black box optimization challenge, the concept of LA-MCTS was used by the 3rd (JetBrains) and 8th (KAIST) place teams. Check out the leaderboard <a href="https://bbochallenge.com/leaderboard">here</a>.**
32 | 
33 | - <a href="./LA-MCTS">**Mujoco Experiments**</a>: LA-MCTS on Mujoco environment. 
34 | 
35 | 
36 | #  Project Logs
37 | ## Building the MCTS based NAS agent
38 | 
39 | >Inspired by AlphaGo, we built the very first NAS search algorithm based on Monte Carlo Tree Search (MCTS) in 2017, namely AlphaX. The action space is fixed (layer-by-layer construction) and MCTS is used to steer the search towards promising regions. We showed that the Convolutional Neural Networks designed by AlphaX improve downstream applications such as detection, style transfer, image captioning, and many others.
40 | 
41 | <a href="https://arxiv.org/pdf/1805.07440.pdf">Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search</a> </br>
42 | AAAI-2020, [<a href="https://github.com/linnanwang/AlphaX-NASBench101">code</a>]</br>
43 | Linnan Wang (Brown), Yiyang Zhao(WPI), Yuu Jinnai(Brown), Yuandong Tian(FAIR), Rodrigo Fonseca(Brown)</br>
44 | 
45 | ## From AlphaX to LaNAS
46 | >On AlphaX, we found that the choice of action space used in MCTS significantly affects search efficiency, which motivates the idea of learning the action space for MCTS on the fly during training.
47 | This leads to LaNAS. 
48 | LaNAS uses a linear classifier at each decision node of MCTS to learn good versus bad actions. Each leaf node now represents a subregion of the search space rather than a single architecture, and is evaluated by uniformly sampling one architecture from that subregion and evaluating it. 
49 | The first version of LaNAS implemented a distributed system that performs NAS by training every such sample from scratch using 500 GPUs. 
50 | The second version of LaNAS, called one-shot LaNAS, uses a single one-shot supernet to evaluate the quality of samples, trading some evaluation accuracy for efficiency. 
51 | One-shot LaNAS finds a reasonable solution in a few GPU days.  
52 | 
53 | <a href="https://linnanwang.github.io/latent-actions.pdf">Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search</a> </br>
54 | TPAMI 2021 </br>
55 | Linnan Wang (Brown), Saining Xie (FAIR), Teng Li(FAIR), Rodrigo Fonseca (Brown), Yuandong Tian (FAIR)</br>
56 | 
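The node-level split described above can be sketched in a few lines. This is a simplified illustration and not the code in `LaNAS/*/Classifier.py`: it assumes that a node fits a linear model from architecture encodings to accuracies and calls a sample "good" when its predicted accuracy exceeds the mean accuracy at that node. The names `fit_linear`, `predict`, and `split_node`, the toy data, and the plain gradient-descent fit are all assumptions made for this example.

```python
import random


def fit_linear(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ w.x + b by batch gradient descent on the mean squared error."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def split_node(xs, ys):
    """Split samples into (good, bad) by predicted accuracy vs. the node mean."""
    w, b = fit_linear(xs, ys)
    threshold = sum(ys) / len(ys)
    good = [x for x in xs if predict(w, b, x) > threshold]
    bad = [x for x in xs if predict(w, b, x) <= threshold]
    return good, bad


encodings = [[1, 0], [1, 1], [0, 0], [0, 1]]   # toy architecture encodings
accuracies = [0.9, 0.8, 0.3, 0.2]              # toy accuracies in [0, 1]

good, bad = split_node(encodings, accuracies)
rng = random.Random(0)
candidate = rng.choice(good)  # uniform pick from the promising side
print(good, bad, candidate)
```

For brevity the sketch draws the next candidate from the already-seen good samples; in the description above, the leaf stands for a whole subregion of the search space, so a real implementation would sample new architectures from that subregion.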
57 | 
58 | ## From LaNAS to a generic solver LA-MCTS
59 | > Since LaNAS works very well on NAS datasets, e.g. NASBench-101, and the core of the algorithm can be easily generalized to other problems, we extend it to be a generic solver for black-box function optimization. 
60 | LA-MCTS further improves on this by using a nonlinear classifier at each decision node in MCTS and a surrogate (e.g., a function approximator) to evaluate each sample in the leaf node. 
61 | The surrogate can come from any existing Black-box optimizer (e.g., Bayesian Optimization). 
62 | The details of LA-MCTS can be found in the following paper.  
63 | 
64 | <a href="https://arxiv.org/abs/2007.00708">Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search</a> </br>
65 | NeurIPS 2020 </br>
66 | Linnan Wang (Brown University), Rodrigo Fonseca (Brown University), Yuandong Tian (Facebook AI Research) </br>
67 | 
68 | ## From one-shot NAS to few-shot NAS
69 | > To overcome the issues of one-shot NAS, we propose few-shot NAS, which uses multiple supernets, each covering a different region of the search space specified by the intermediate nodes of the search tree. Extensive experiments show that few-shot NAS significantly improves upon one-shot methods. See the paper below for details.
70 | 
71 | <a href="https://arxiv.org/abs/2006.06863">Few-shot Neural Architecture Search</a> </br> [<a href="https://github.com/aoiang/few-shot-NAS">code</a>] </br>
72 | Yiyang Zhao (WPI), Linnan Wang (Brown), Yuandong Tian (FAIR), Rodrigo Fonseca (Brown), Tian Guo (WPI)
73 | 
74 | 
75 | ## License
76 | LA-MCTS is under [CC-BY-NC 4.0 license](./LICENSE).
77 | 


--------------------------------------------------------------------------------