├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LA-MCTS-baselines ├── Bayesian-Optimization │ ├── functions.py │ └── run.py └── Nevergrad │ ├── functions.py │ └── run.py ├── LA-MCTS ├── README.md ├── functions │ ├── README.md │ ├── functions.py │ ├── mujoco_functions.py │ └── visualize_policy.py ├── images │ └── mujoco_experiments.png ├── lamcts │ ├── Classifier.py │ ├── MCTS.py │ ├── Node.py │ ├── __init__.py │ └── utils.py ├── requirements.txt ├── revision.patch ├── run.py ├── setup.py └── test.py ├── LICENSE ├── LaNAS ├── Distributed_LaNAS │ ├── README.md │ ├── clientX │ │ ├── client.py │ │ ├── continue_train.py │ │ ├── model.py │ │ ├── nasnet_set.py │ │ ├── operations.py │ │ ├── train_client.py │ │ └── utils.py │ ├── collect_results.py │ ├── launch_clients.sh │ ├── read_results.py │ ├── server │ │ ├── Classifier.py │ │ ├── MCTS.py │ │ ├── Node.py │ │ ├── net_training.py │ │ └── search_space.zip │ └── total_trace.json ├── LaNAS_NASBench101 │ ├── Classifier.py │ ├── MCTS.py │ ├── Node.py │ ├── README.md │ ├── extract_end_time.py │ ├── net_training.py │ └── our_past_results.txt ├── LaNet │ ├── CIFAR10 │ │ ├── README.md │ │ ├── auto_augment.py │ │ ├── model.py │ │ ├── nasnet_set.py │ │ ├── operations.py │ │ ├── test.py │ │ ├── train.py │ │ └── utils.py │ └── README.md ├── README.md └── one-shot_LaNAS │ ├── Evaluate │ ├── generator.py │ ├── individual_model.py │ ├── operations.py │ ├── run.sh │ ├── super_individual_train.py │ ├── translator.py │ └── utils.py │ ├── LaNAS │ ├── Classifier.py │ ├── MCTS.py │ ├── Node.py │ └── mlp_predictor.py │ ├── README.md │ ├── read_result.py │ ├── result.txt │ ├── search.py │ ├── supernet │ ├── generator.py │ ├── operations.py │ ├── supernet_model.py │ ├── supernet_train.py │ └── utils.py │ └── train.py └── README.md /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, 
we as 6 | contributors and maintainers pledge to make participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 
45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies within all project spaces, and it also applies when 49 | an individual is representing the project or its community in public spaces. 50 | Examples of representing a project or community include using an official 51 | project e-mail address, posting via an official social media account, or acting 52 | as an appointed representative at an online or offline event. Representation of 53 | a project may be further defined and clarified by project maintainers. 54 | 55 | ## Enforcement 56 | 57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 58 | reported by contacting the project team at . All 59 | complaints will be reviewed and investigated and will result in a response that 60 | is deemed necessary and appropriate to the circumstances. The project team is 61 | obligated to maintain confidentiality with regard to the reporter of an incident. 62 | Further details of specific enforcement policies may be posted separately. 63 | 64 | Project maintainers who do not follow or enforce the Code of Conduct in good 65 | faith may face temporary or permanent repercussions as determined by other 66 | members of the project's leadership. 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 72 | 73 | [homepage]: https://www.contributor-covenant.org 74 | 75 | For answers to common questions about this code of conduct, see 76 | https://www.contributor-covenant.org/faq 77 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to LaMCTS 2 | We want to make contributing to this project as easy and transparent as 3 | possible. 4 | 5 | ## Pull Requests 6 | We actively welcome your pull requests. 7 | 8 | 1. 
Fork the repo and create your branch from `master`. 9 | 2. If you've added code that should be tested, add tests. 10 | 3. If you've changed APIs, update the documentation. 11 | 4. Ensure the test suite passes. 12 | 5. Make sure your code lints. 13 | 6. If you haven't already, complete the Contributor License Agreement ("CLA"). 14 | 15 | ## Contributor License Agreement ("CLA") 16 | In order to accept your pull request, we need you to submit a CLA. You only need 17 | to do this once to work on any of Facebook's open source projects. 18 | 19 | Complete your CLA here: 20 | 21 | ## Issues 22 | We use GitHub issues to track public bugs. Please ensure your description is 23 | clear and has sufficient instructions to be able to reproduce the issue. 24 | 25 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 26 | disclosure of security bugs. In those cases, please go through the process 27 | outlined on that page and do not file a public issue. 28 | 29 | ## License 30 | By contributing to LaMCTS, you agree that your contributions will be licensed 31 | under the LICENSE file in the root directory of this source tree. 32 | -------------------------------------------------------------------------------- /LA-MCTS-baselines/Bayesian-Optimization/functions.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import numpy as np 7 | import gym 8 | import json 9 | import os 10 | 11 | 12 | class tracker: 13 | def __init__(self, foldername): 14 | self.counter = 0 15 | self.results = [] 16 | self.curt_best = float("inf") 17 | self.curt_best_x = None 18 | self.foldername = foldername 19 | try: 20 | os.mkdir(foldername) 21 | except OSError: 22 | print ("Creation of the directory %s failed" % foldername) 23 | else: 24 | print ("Successfully created the directory %s " % foldername) 25 | 26 | def dump_trace(self): 27 | trace_path = self.foldername + '/result' + str(len( self.results) ) 28 | final_results_str = json.dumps(self.results) 29 | with open(trace_path, "a") as f: 30 | f.write(final_results_str + '\n') 31 | 32 | def track(self, result, x = None): 33 | if result < self.curt_best: 34 | self.curt_best = result 35 | self.curt_best_x = x 36 | print("") 37 | print("="*10) 38 | print("iteration:", self.counter, "total samples:", len(self.results) ) 39 | print("="*10) 40 | print("current best f(x):", self.curt_best) 41 | print("current best x:", np.around(self.curt_best_x, decimals=1)) 42 | self.results.append(self.curt_best) 43 | self.counter += 1 44 | if len(self.results) % 100 == 0: 45 | self.dump_trace() 46 | 47 | class Levy: 48 | def __init__(self, dims=1): 49 | self.dims = dims 50 | self.lb = -10 * np.ones(dims) 51 | self.ub = 10 * np.ones(dims) 52 | self.counter = 0 53 | print("####dims:", dims) 54 | self.tracker = tracker('Levy'+str(dims)) 55 | 56 | def __call__(self, x): 57 | x = np.array(x) 58 | self.counter += 1 59 | assert len(x) == self.dims 60 | assert x.ndim == 1 61 | assert np.all(x <= self.ub) and np.all(x >= self.lb) 62 | w = [] 63 | for idx in range(0, len(x)): 64 | w.append( 1 + (x[idx] - 1) / 4 ) 65 | w = np.array(w) 66 | 67 | term1 = ( np.sin( np.pi*w[0] ) )**2; 68 | 69 | term3 = ( w[-1] - 1 )**2 * ( 1 + ( np.sin( 2 * np.pi * w[-1] ) )**2 ); 70 | 71 | 72 | term2 = 0; 73 | for idx in range(1, len(w) ): 74 | wi = w[idx] 75 | new = (wi-1)**2 * ( 1 + 
10 * ( np.sin( np.pi* wi + 1 ) )**2) 76 | term2 = term2 + new 77 | 78 | result = term1 + term2 + term3 79 | 80 | self.tracker.track( result, x ) 81 | 82 | return result 83 | 84 | 85 | class Ackley: 86 | def __init__(self, dims=3): 87 | self.dims = dims 88 | self.lb = -5 * np.ones(dims) 89 | self.ub = 10 * np.ones(dims) 90 | self.counter = 0 91 | self.tracker = tracker('Ackley'+str(dims)) 92 | 93 | 94 | def __call__(self, x): 95 | x = np.array(x) 96 | self.counter += 1 97 | assert len(x) == self.dims 98 | assert x.ndim == 1 99 | # assert np.all(x <= self.ub) and np.all(x >= self.lb) 100 | w = 1 + (x - 1.0) / 4.0 101 | result = (-20*np.exp(-0.2 * np.sqrt(np.inner(x,x) / x.size )) -np.exp(np.cos(2*np.pi*x).sum() /x.size) + 20 +np.e ) 102 | 103 | self.tracker.track( result, x ) 104 | 105 | return result 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | -------------------------------------------------------------------------------- /LA-MCTS-baselines/Bayesian-Optimization/run.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | from functions import * 7 | import argparse 8 | import os 9 | from skopt import gp_minimize 10 | import argparse 11 | 12 | parser = argparse.ArgumentParser(description='Process inputs') 13 | parser.add_argument('--func', help='specify the test function') 14 | parser.add_argument('--dims', type=int, help='specify the problem dimensions') 15 | parser.add_argument('--iterations', type=int, help='specify the iterations to collect in the search') 16 | 17 | 18 | args = parser.parse_args() 19 | 20 | f = None 21 | iteration = 0 22 | if args.func == 'ackley': 23 | assert args.dims > 0 24 | f = Ackley(dims =args.dims) 25 | elif args.func == 'levy': 26 | f = Levy(dims = args.dims) 27 | else: 28 | print('function not defined') 29 | os._exit(1) 30 | 31 | assert args.dims > 0 32 | assert f is not None 33 | assert args.iterations > 0 34 | 35 | lower = f.lb 36 | upper = f.ub 37 | 38 | bounds = [] 39 | for idx in range(0, len(f.lb) ): 40 | bounds.append( ( float(f.lb[idx]), float(f.ub[idx])) ) 41 | 42 | res = gp_minimize(f, # the function to minimize 43 | bounds, # the bounds on each dimension of x 44 | acq_func="EI", # the acquisition function 45 | n_calls=args.iterations, 46 | acq_optimizer = "sampling", # using sampling to be consistent with our BO implementation 47 | n_initial_points=40 48 | ) -------------------------------------------------------------------------------- /LA-MCTS-baselines/Nevergrad/functions.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import numpy as np 7 | import gym 8 | import json 9 | import os 10 | 11 | 12 | class tracker: 13 | def __init__(self, foldername): 14 | self.counter = 0 15 | self.results = [] 16 | self.curt_best = float("inf") 17 | self.curt_best_x = None 18 | self.foldername = foldername 19 | try: 20 | os.mkdir(foldername) 21 | except OSError: 22 | print ("Creation of the directory %s failed" % foldername) 23 | else: 24 | print ("Successfully created the directory %s " % foldername) 25 | 26 | def dump_trace(self): 27 | trace_path = self.foldername + '/result' + str(len( self.results) ) 28 | final_results_str = json.dumps(self.results) 29 | with open(trace_path, "a") as f: 30 | f.write(final_results_str + '\n') 31 | 32 | def track(self, result, x = None): 33 | if result < self.curt_best: 34 | self.curt_best = result 35 | self.curt_best_x = x 36 | print("") 37 | print("="*10) 38 | print("iteration:", self.counter, " total samples:", len(self.results)) 39 | print("="*10) 40 | print("current best f(x):", self.curt_best) 41 | print("current best x:", np.around(self.curt_best_x, decimals=1)) 42 | self.results.append(self.curt_best) 43 | self.counter += 1 44 | if len(self.results) % 100 == 0: 45 | self.dump_trace() 46 | 47 | class Levy: 48 | def __init__(self, dims=1): 49 | self.dims = dims 50 | self.lb = -10 * np.ones(dims) 51 | self.ub = 10 * np.ones(dims) 52 | self.counter = 0 53 | print("####dims:", dims) 54 | self.tracker = tracker('Levy'+str(dims)) 55 | 56 | def __call__(self, x): 57 | x = np.array(x) 58 | self.counter += 1 59 | assert len(x) == self.dims 60 | assert x.ndim == 1 61 | assert np.all(x <= self.ub) and np.all(x >= self.lb) 62 | w = [] 63 | for idx in range(0, len(x)): 64 | w.append( 1 + (x[idx] - 1) / 4 ) 65 | w = np.array(w) 66 | 67 | term1 = ( np.sin( np.pi*w[0] ) )**2; 68 | 69 | term3 = ( w[-1] - 1 )**2 * ( 1 + ( np.sin( 2 * np.pi * w[-1] ) )**2 ); 70 | 71 | 72 | term2 = 0; 73 | for idx in range(1, len(w) ): 74 | wi = w[idx] 75 | new = (wi-1)**2 * ( 1 + 
10 * ( np.sin( np.pi* wi + 1 ) )**2) 76 | term2 = term2 + new 77 | 78 | result = term1 + term2 + term3 79 | 80 | self.tracker.track( result, x ) 81 | 82 | return result 83 | 84 | class Ackley: 85 | def __init__(self, dims=3): 86 | self.dims = dims 87 | self.lb = -5 * np.ones(dims) 88 | self.ub = 10 * np.ones(dims) 89 | self.counter = 0 90 | self.tracker = tracker('Ackley'+str(dims)) 91 | 92 | 93 | def __call__(self, x): 94 | self.counter += 1 95 | assert len(x) == self.dims 96 | assert x.ndim == 1 97 | # assert np.all(x <= self.ub) and np.all(x >= self.lb) 98 | w = 1 + (x - 1.0) / 4.0 99 | result = (-20*np.exp(-0.2 * np.sqrt(np.inner(x,x) / x.size )) -np.exp(np.cos(2*np.pi*x).sum() /x.size) + 20 +np.e ) 100 | 101 | self.tracker.track( result, x ) 102 | 103 | return result -------------------------------------------------------------------------------- /LA-MCTS-baselines/Nevergrad/run.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | from functions import * 7 | import argparse 8 | import os 9 | import nevergrad as ng 10 | 11 | import argparse 12 | 13 | 14 | 15 | 16 | parser = argparse.ArgumentParser(description='Process inputs') 17 | parser.add_argument('--func', help='specify the test function') 18 | parser.add_argument('--dims', type=int, help='specify the problem dimensions') 19 | parser.add_argument('--iterations', type=int, help='specify the iterations to collect in the search') 20 | 21 | 22 | args = parser.parse_args() 23 | 24 | f = None 25 | iteration = 0 26 | if args.func == 'ackley': 27 | assert args.dims > 0 28 | f = Ackley(dims =args.dims) 29 | elif args.func == 'levy': 30 | f = Levy(dims = args.dims) 31 | else: 32 | print('function not defined') 33 | os._exit(1) 34 | 35 | assert args.dims > 0 36 | assert f is not None 37 | assert args.iterations > 0 38 | 39 | 40 | def from_unit_cube(x, lb, ub): 41 | """Project from [0, 1]^d to hypercube with bounds lb and ub""" 42 | assert np.all(lb < ub) and lb.ndim == 1 and ub.ndim == 1 and x.ndim == 2 43 | xx = x * (ub - lb) + lb 44 | return np.ravel(xx) 45 | 46 | init = from_unit_cube( np.random.rand(f.dims).reshape(1,-1), f.lb, f.ub) 47 | 48 | param = ng.p.Array(init=init ).set_bounds(f.lb, f.ub) 49 | 50 | optimizer = ng.optimizers.NGOpt(parametrization=param, budget=args.iterations) 51 | 52 | recommendation = optimizer.minimize(f) -------------------------------------------------------------------------------- /LA-MCTS/README.md: -------------------------------------------------------------------------------- 1 | # Latent Action Monte Carlo Tree Search (LA-MCTS) 2 | LA-MCTS is a meta-algorithm that partitions the search space for black-box optimization. LA-MCTS progressively learns to partition the search space and explores its promising regions, so that solvers such as Bayesian Optimization (BO) can focus on promising subregions, mitigating the over-exploring issue in high-dimensional problems. 3 | 4 |
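To make the idea of "partition, then focus on the promising side" concrete, here is a minimal sketch of a single partition step. It is an illustration under stated assumptions, not the code in `lamcts/`: the good/bad labels come from a simple median split of f(x), a least-squares linear separator stands in for the SVM that `Classifier.py` uses, and the helper names `split_region` and `sample_in_good_region` are hypothetical.

```python
import numpy as np

def ackley(x):
    # Same Ackley form as in functions.py; the minimum is 0 at the origin.
    x = np.asarray(x, dtype=float)
    return (-20 * np.exp(-0.2 * np.sqrt(np.inner(x, x) / x.size))
            - np.exp(np.cos(2 * np.pi * x).sum() / x.size) + 20 + np.e)

def split_region(X, fX):
    """Hypothetical helper: label samples by the median of f(x), then fit
    a linear boundary with least squares (a stand-in for the SVM).
    Returns (w, b) such that X @ w + b > 0 predicts the good side."""
    labels = np.where(fX <= np.median(fX), 1.0, -1.0)
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, labels, rcond=None)
    return coef[:-1], coef[-1]

def sample_in_good_region(w, b, lb, ub, n, rng, max_tries=10000):
    """Hypothetical helper: rejection-sample n points inside [lb, ub]
    that fall on the predicted good side of the boundary."""
    out = []
    for _ in range(max_tries):
        x = rng.uniform(lb, ub)
        if x @ w + b > 0:
            out.append(x)
            if len(out) == n:
                break
    return np.array(out)

rng = np.random.default_rng(0)
dims = 2
lb, ub = -5 * np.ones(dims), 10 * np.ones(dims)  # Ackley bounds from functions.py

# Draw initial samples, learn one split, then sample only in the good half.
X = rng.uniform(lb, ub, size=(200, dims))
fX = np.array([ackley(x) for x in X])
w, b = split_region(X, fX)
good = sample_in_good_region(w, b, lb, ub, 50, rng)

print("mean f(x), whole box:  ", fX.mean())
print("mean f(x), good region:", np.mean([ackley(x) for x in good]))
```

LA-MCTS applies splits like this recursively to build a tree and hands the selected leaf region to a solver such as BO; the sketch stops after one split and uses plain uniform sampling in place of a solver.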

5 | 6 |

 7 | 8 | Please reference the following publication when using this package. ArXiv link. 9 | 10 | ``` 11 | @article{wang2020learning, 12 | title={Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search}, 13 | author={Wang, Linnan and Fonseca, Rodrigo and Tian, Yuandong}, 14 | journal={NeurIPS}, 15 | year={2020} 16 | } 17 | ``` 18 | 19 | ## Run LaMCTS and Baselines on test functions (1-minute tutorial) 20 | For a 1-minute evaluation of Bayesian Optimization (BO) and an Evolutionary Algorithm (EA) vs. LA-MCTS boosted BO, please follow the procedures below. 21 | 22 | Here we test on Ackley or Levy in 10 dimensions; please run multiple times to compare the average performance. 23 | 24 | - ***Evaluate LA-MCTS boosted Bayesian Optimization***: using GP surrogate, EI acquisition, plus LA-MCTS. 25 | ``` 26 | cd LA-MCTS 27 | python run.py --func ackley --dims 10 --iterations 100 28 | ``` 29 | 30 | - ***Evaluate Bayesian Optimization***: using GP surrogate, EI acquisition. 31 | ``` 32 | pip install scikit-optimize 33 | cd LA-MCTS-baselines/Bayesian-Optimization 34 | python run.py --func ackley --dims 10 --iterations 100 35 | ``` 36 | 37 | - ***Evaluate Evolutionary Algorithm***: using NGOpt from Nevergrad. 38 | ``` 39 | pip install nevergrad 40 | cd LA-MCTS-baselines/Nevergrad 41 | python run.py --func ackley --dims 10 --iterations 100 42 | ``` 43 | 44 | 45 | ## How to use LA-MCTS to optimize your own function? 46 | Please wrap your function into a class defined as follows; functions/functions.py provides a few examples. 
47 | 48 | ``` 49 | class myFunc: 50 | def __init__(self, dims=1): 51 | self.dims = dims #problem dimensions 52 | self.lb = -1 * np.ones(dims) #lower bound for each dimension (example value) 53 | self.ub = np.ones(dims) #upper bound for each dimension (example value) 54 | self.tracker = tracker('myFunc') #defined in functions.py 55 | 56 | def __call__(self, x): 57 | # some sanity check of x 58 | result = float(np.sum(x ** 2)) # placeholder objective; replace with your own f(x) 59 | self.tracker.track( result ) 60 | return result 61 | ``` 62 | 63 | After defining your function, e.g. f = myFunc(), minimizing f(x) is as easy as passing f into MCTS. 64 | ``` 65 | f = myFunc() 66 | agent = MCTS(lb = f.lb, # the lower bound of each problem dimension 67 | ub = f.ub, # the upper bound of each problem dimension 68 | dims = f.dims, # the problem dimensions 69 | ninits = 40, # the number of random samples used in initialization 70 | func = f # function object to be optimized 71 | ) 72 | agent.search(iterations = 100) 73 | ``` 74 | Please check `run.py`. 75 | 76 | 77 | ## What it can and cannot do 78 | In this release, the code only supports optimizing continuous black-box functions. 79 | 80 | ## Tuning LA-MCTS 81 | 82 | ### **Cp**: controlling the amount of exploration, MCTS.py line 27. 83 | > We set Cp = 0.1 * max of f(x). 84 | 85 | > For example, if f(x) measures the test accuracy of a neural network x, Cp = 0.1. The best value should still be tuned for each specific case. A large Cp encourages LA-MCTS to visit bad regions more often (exploration), and a small Cp does the opposite (exploitation). LA-MCTS degenerates to random search when Cp is very large, and to a purely greedy policy, e.g. a regression tree, at Cp = 0. Both are undesirable. 86 | 87 | ### **Leaf Size**: the splitting threshold (θ), MCTS.py line 38. 88 | > We set θ ∈ [20, 100] in our experiments. 89 | 90 | > The splitting threshold controls the speed of tree growth. Given the same \#samples, a smaller θ leads to a deeper tree. 
If the search space Ω is very large, more splits let LA-MCTS quickly focus on a small promising region, which yields good results. However, if θ is too small, the performance and the boundary estimation of the region become more unreliable. 91 | 92 | ### **SVM kernel**: the type of kernel used by the SVM, Classifier.py line 35. 93 | > The kernel can be 'linear', 'poly', or 'rbf'. 94 | 95 | > In our experiments, the linear kernel is the fastest, but rbf or poly generally produce better results. If you want to draw > 1000 samples, we suggest the linear kernel; otherwise use rbf or poly. 96 | 97 | ## MuJoCo tasks and Gym games. 98 |

99 | 100 |

 101 | 102 | - **Run Lunarlanding**: 103 | 1. Install gym and start running. 104 | ``` 105 | pip install gym 106 | python run.py --func lunar --samples 500 107 | ``` 108 | Copy the final "current best x" from the output to visualize your policy. 109 | 110 | 2. Visualize your policy 111 | ``` 112 | cd functions 113 | # Replace our policy value with your learned policy from the previous step, i.e. policy = np.array([xx]). 114 | python visualize_policy.py 115 | ``` 116 | - **Run MuJoCo**: 117 |

118 |       119 | 120 |

 121 | 122 | 1. Set up mujoco-py, see here. 123 | 124 | 2. Download TuRBO from here. Extract and find the turbo folder. 125 | 126 | 3. Copy revision.patch to the turbo folder and apply it, 127 | ``` 128 | cp revision.patch TuRBO-master/turbo 129 | cd TuRBO-master/turbo 130 | patch -p1 < revision.patch 131 | cd ../.. 132 | mv TuRBO-master/turbo ./turbo_1 133 | ``` 134 | 4. Open file Classifier.py and uncomment line 23 and lines 343 to 368. Open file MCTS.py and change solver_type on line 47 from bo to turbo. 135 | 136 | 5. Now it is ready to run, 137 | ``` 138 | python run.py --func swimmer --iterations 1000 139 | ``` 140 | 141 | ![Mujoco Experiments](images/mujoco_experiments.png) 142 | 143 | 144 | ## Possible Extensions 145 | In MCTS.py line 229, the select step returns a path and a leaf that bound the sampling space. We used ```get_sample_ratio_in_region``` in Classifier.py to acquire samples in the selected partition. Other samplers can also be used. 146 | 147 | LA-MCTS can be used together with any value function / evaluator / cost model. 148 | -------------------------------------------------------------------------------- /LA-MCTS/functions/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /LA-MCTS/functions/functions.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import numpy as np 7 | import gym 8 | import json 9 | import os 10 | 11 | import imageio 12 | 13 | 14 | class tracker: 15 | def __init__(self, foldername): 16 | self.counter = 0 17 | self.results = [] 18 | self.curt_best = float("inf") 19 | self.foldername = foldername 20 | try: 21 | os.mkdir(foldername) 22 | except OSError: 23 | print ("Creation of the directory %s failed" % foldername) 24 | else: 25 | print ("Successfully created the directory %s " % foldername) 26 | 27 | def dump_trace(self): 28 | trace_path = self.foldername + '/result' + str(len( self.results) ) 29 | final_results_str = json.dumps(self.results) 30 | with open(trace_path, "a") as f: 31 | f.write(final_results_str + '\n') 32 | 33 | def track(self, result): 34 | if result < self.curt_best: 35 | self.curt_best = result 36 | self.results.append(self.curt_best) 37 | if len(self.results) % 100 == 0: 38 | self.dump_trace() 39 | 40 | class Levy: 41 | def __init__(self, dims=10): 42 | self.dims = dims 43 | self.lb = -10 * np.ones(dims) 44 | self.ub = 10 * np.ones(dims) 45 | self.tracker = tracker('Levy'+str(dims)) 46 | 47 | #tunable hyper-parameters in LA-MCTS 48 | self.Cp = 10 49 | self.leaf_size = 8 50 | self.kernel_type = "poly" 51 | self.ninits = 40 52 | self.gamma_type = "auto" 53 | print("initialize levy at dims:", self.dims) 54 | 55 | def __call__(self, x): 56 | assert len(x) == self.dims 57 | assert x.ndim == 1 58 | assert np.all(x <= self.ub) and np.all(x >= self.lb) 59 | 60 | w = [] 61 | for idx in range(0, len(x)): 62 | w.append( 1 + (x[idx] - 1) / 4 ) 63 | w = np.array(w) 64 | 65 | term1 = ( np.sin( np.pi*w[0] ) )**2; 66 | 67 | term3 = ( w[-1] - 1 )**2 * ( 1 + ( np.sin( 2 * np.pi * w[-1] ) )**2 ); 68 | 69 | 70 | term2 = 0; 71 | for idx in range(1, len(w) ): 72 | wi = w[idx] 73 | new = (wi-1)**2 * ( 1 + 10 * ( np.sin( np.pi* wi + 1 ) )**2) 74 | term2 = term2 + new 75 | 76 | result = term1 + term2 + term3 77 | self.tracker.track( result ) 78 | 79 | return result 80 | 81 | class 
Ackley: 82 | def __init__(self, dims=10): 83 | self.dims = dims 84 | self.lb = -5 * np.ones(dims) 85 | self.ub = 10 * np.ones(dims) 86 | self.counter = 0 87 | self.tracker = tracker('Ackley'+str(dims) ) 88 | 89 | #tunable hyper-parameters in LA-MCTS 90 | self.Cp = 1 91 | self.leaf_size = 10 92 | self.ninits = 40 93 | self.kernel_type = "rbf" 94 | self.gamma_type = "auto" 95 | 96 | 97 | def __call__(self, x): 98 | self.counter += 1 99 | assert len(x) == self.dims 100 | assert x.ndim == 1 101 | assert np.all(x <= self.ub) and np.all(x >= self.lb) 102 | result = (-20*np.exp(-0.2 * np.sqrt(np.inner(x,x) / x.size )) -np.exp(np.cos(2*np.pi*x).sum() /x.size) + 20 +np.e ) 103 | self.tracker.track( result ) 104 | 105 | return result 106 | 107 | class Lunarlanding: 108 | def __init__(self): 109 | self.dims = 12 110 | self.lb = np.zeros(12) 111 | self.ub = 2 * np.ones(12) 112 | self.counter = 0 113 | self.env = gym.make('LunarLander-v2') 114 | 115 | #tunable hyper-parameters in LA-MCTS 116 | self.Cp = 50 117 | self.leaf_size = 10 118 | self.kernel_type = "poly" 119 | self.ninits = 40 120 | self.gamma_type = "scale" 121 | 122 | self.render = False 123 | 124 | 125 | def heuristic_Controller(self, s, w): 126 | angle_targ = s[0] * w[0] + s[2] * w[1] 127 | if angle_targ > w[2]: 128 | angle_targ = w[2] 129 | if angle_targ < -w[2]: 130 | angle_targ = -w[2] 131 | hover_targ = w[3] * np.abs(s[0]) 132 | 133 | angle_todo = (angle_targ - s[4]) * w[4] - (s[5]) * w[5] 134 | hover_todo = (hover_targ - s[1]) * w[6] - (s[3]) * w[7] 135 | 136 | if s[6] or s[7]: 137 | angle_todo = w[8] 138 | hover_todo = -(s[3]) * w[9] 139 | 140 | a = 0 141 | if hover_todo > np.abs(angle_todo) and hover_todo > w[10]: 142 | a = 2 143 | elif angle_todo < -w[11]: 144 | a = 3 145 | elif angle_todo > +w[11]: 146 | a = 1 147 | return a 148 | 149 | def __call__(self, x): 150 | self.counter += 1 151 | assert len(x) == self.dims 152 | assert x.ndim == 1 153 | assert np.all(x <= self.ub) and np.all(x >= self.lb) 154 | 
155 | total_rewards = [] 156 | for i in range(0, 3): # controls the number of episode/plays per trial 157 | state = self.env.reset() 158 | rewards_for_episode = [] 159 | num_steps = 2000 160 | 161 | for step in range(num_steps): 162 | if self.render: 163 | self.env.render() 164 | received_action = self.heuristic_Controller(state, x) 165 | next_state, reward, done, info = self.env.step(received_action) 166 | rewards_for_episode.append( reward ) 167 | state = next_state 168 | if done: 169 | break 170 | 171 | rewards_for_episode = np.array(rewards_for_episode) 172 | total_rewards.append( np.sum(rewards_for_episode) ) 173 | total_rewards = np.array(total_rewards) 174 | mean_rewards = np.mean( total_rewards ) 175 | 176 | return mean_rewards*-1 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | -------------------------------------------------------------------------------- /LA-MCTS/functions/mujoco_functions.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import numpy as np 7 | import gym 8 | import json 9 | import os 10 | 11 | class Swimmer: 12 | 13 | def __init__(self): 14 | self.policy_shape = (2, 8) 15 | self.mean = 0 16 | self.std = 1 17 | self.dims = 16 18 | self.lb = -1 * np.ones(self.dims) 19 | self.ub = 1 * np.ones(self.dims) 20 | self.counter = 0 21 | self.env = gym.make('Swimmer-v2') 22 | self.num_rollouts = 3 23 | 24 | #tunable hyper-parameters in LA-MCTS 25 | self.Cp = 20 26 | self.leaf_size = 10 27 | self.kernel_type = "poly" 28 | self.gamma_type = "scale" 29 | self.ninits = 40 30 | print("===========initialization===========") 31 | print("mean:", self.mean) 32 | print("std:", self.std) 33 | print("dims:", self.dims) 34 | print("policy:", self.policy_shape ) 35 | 36 | self.render = False 37 | 38 | 39 | def __call__(self, x): 40 | self.counter += 1 41 | assert len(x) == self.dims 42 | assert x.ndim == 1 43 | assert np.all(x <= self.ub) and np.all(x >= self.lb) 44 | 45 | M = x.reshape(self.policy_shape) 46 | 47 | returns = [] 48 | observations = [] 49 | actions = [] 50 | 51 | 52 | for i in range(self.num_rollouts): 53 | obs = self.env.reset() 54 | done = False 55 | totalr = 0. 
56 | steps = 0 57 | 58 | while not done: 59 | 60 | action = np.dot(M, (obs - self.mean)/self.std) 61 | observations.append(obs) 62 | actions.append(action) 63 | obs, r, done, _ = self.env.step(action) 64 | totalr += r 65 | steps += 1 66 | if self.render: 67 | self.env.render() 68 | returns.append(totalr) 69 | 70 | 71 | return np.mean(returns)*-1 72 | 73 | 74 | class Hopper: 75 | 76 | def __init__(self): 77 | self.mean = np.array([1.41599384, -0.05478602, -0.25522216, -0.25404721, 78 | 0.27525085, 2.60889529, -0.0085352, 0.0068375, 79 | -0.07123674, -0.05044839, -0.45569644]) 80 | self.std = np.array([0.19805723, 0.07824488, 0.17120271, 0.32000514, 81 | 0.62401884, 0.82814161, 1.51915814, 1.17378372, 82 | 1.87761249, 3.63482761, 5.7164752 ]) 83 | self.dims = 33 84 | self.lb = -1 * np.ones(self.dims) 85 | self.ub = 1 * np.ones(self.dims) 86 | self.counter = 0 87 | self.env = gym.make('Hopper-v2') 88 | self.num_rollouts = 3 89 | self.render = False 90 | self.policy_shape = (3, 11) 91 | 92 | #tunable hyper-parameters in LA-MCTS 93 | self.Cp = 10 94 | self.leaf_size = 100 95 | self.kernel_type = "poly" 96 | self.gamma_type = "auto" 97 | self.ninits = 150 98 | 99 | print("===========initialization===========") 100 | print("mean:", self.mean) 101 | print("std:", self.std) 102 | print("dims:", self.dims) 103 | print("policy:", self.policy_shape ) 104 | 105 | def __call__(self, x): 106 | self.counter += 1 107 | assert len(x) == self.dims 108 | assert x.ndim == 1 109 | assert np.all(x <= self.ub) and np.all(x >= self.lb) 110 | 111 | M = x.reshape(self.policy_shape) 112 | 113 | returns = [] 114 | observations = [] 115 | actions = [] 116 | 117 | for i in range(self.num_rollouts): 118 | obs = self.env.reset() 119 | done = False 120 | totalr = 0. 
121 | steps = 0 122 | while not done: 123 | # M = self.policy 124 | inputs = (obs - self.mean)/self.std 125 | action = np.dot(M, inputs) 126 | observations.append(obs) 127 | actions.append(action) 128 | obs, r, done, _ = self.env.step(action) 129 | totalr += r 130 | steps += 1 131 | if self.render: 132 | self.env.render() 133 | returns.append(totalr) 134 | 135 | return np.mean(returns)*-1 136 | 137 | # ######################################## # 138 | # Visualize the learned policy for Swimmer # 139 | # ######################################## # 140 | # f = Swimmer() 141 | # x = np.array([-0.5343142,-0.46203456, -0.70218485, 142 | # -0.00929887, 0.4072553, 0.04604763, 143 | # 0.67289615, -0.5894774, 0.79874759, 144 | # 0.84010238, 0.54327755, 0.25715409, 145 | # 0.89032131, -0.56112252, -0.0960243, 146 | # 0.13397496]) 147 | # f.render = True 148 | # result = f(x) 149 | # print( result ) 150 | 151 | # f = Hopper() 152 | # x = np.random.rand(f.dims) 153 | # result = f(x) 154 | # print( result ) 155 | 156 | 157 | 158 | -------------------------------------------------------------------------------- /LA-MCTS/functions/visualize_policy.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | 7 | from functions import Lunarlanding 8 | from mujoco_functions import * 9 | import numpy as np 10 | 11 | 12 | 13 | 14 | # policy = np.array([ 0.6829919, 0.4348611, 1.93635682, 1.53007997, 15 | # 1.69574236, 0.66056938, 0.28328839, 1.12798157, 16 | # 0.06496076, 1.71219888, 0.23686494, 0.20135697 ] ) 17 | # f = Lunarlanding() 18 | # f.render = True 19 | # result = f(policy) 20 | # print( result ) 21 | 22 | policy = np.array( 23 | [ 0.40721659, 0.64248771, 0.31267019, -0.69240676, -0.00208609, -0.86336196, 24 | -0.54423801, 0.28333422, -0.68388651, -0.26167397, -0.58448575, 0.11981415, 25 | -0.90660989, 0.55700556, -0.22651554, 0.42790948, 0.15368999, 0.7514032, 26 | -0.42978046, -0.60632853, -0.88724493, -0.01787839, 0.74753749, -0.8137155, 27 | 0.41300612, 0.08062934, -0.25451053, -0.77197475, -0.09003459, -0.76673666, 28 | -0.30785222, 0.41125726, -0.11475573] 29 | ) 30 | f = Hopper() 31 | f.render = True 32 | result = f(policy) 33 | print(result) 34 | -------------------------------------------------------------------------------- /LA-MCTS/images/mujoco_experiments.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/LaMCTS/489bd60886f23b0b76b10aa8602ea6722f334ad6/LA-MCTS/images/mujoco_experiments.png -------------------------------------------------------------------------------- /LA-MCTS/lamcts/Node.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | from .Classifier import Classifier 7 | import json 8 | import numpy as np 9 | import math 10 | import operator 11 | 12 | class Node: 13 | obj_counter = 0 14 | # If a leave holds >= SPLIT_THRESH, we split into two new nodes. 
15 | 16 | def __init__(self, parent = None, dims = 0, reset_id = False, kernel_type = "rbf", gamma_type = "auto"): 17 | # Note: every node is initialized as a leaf, 18 | # only internal nodes equip with classifiers to make decisions 19 | # if not is_root: 20 | # assert type( parent ) == type( self ) 21 | self.dims = dims 22 | self.x_bar = float('inf') 23 | self.n = 0 24 | self.uct = 0 25 | self.classifier = Classifier( [], self.dims, kernel_type, gamma_type ) 26 | 27 | #insert curt into the kids of parent 28 | self.parent = parent 29 | self.kids = [] # 0:good, 1:bad 30 | 31 | self.bag = [] 32 | self.is_svm_splittable = False 33 | 34 | if reset_id: 35 | Node.obj_counter = 0 36 | 37 | self.id = Node.obj_counter 38 | 39 | #data for good and bad kids, respectively 40 | Node.obj_counter += 1 41 | 42 | def update_kids(self, good_kid, bad_kid): 43 | assert len(self.kids) == 0 44 | self.kids.append( good_kid ) 45 | self.kids.append( bad_kid ) 46 | assert self.kids[0].classifier.get_mean() > self.kids[1].classifier.get_mean() 47 | 48 | def is_good_kid(self): 49 | if self.parent is not None: 50 | if self.parent.kids[0] == self: 51 | return True 52 | else: 53 | return False 54 | else: 55 | return False 56 | 57 | def is_leaf(self): 58 | if len(self.kids) == 0: 59 | return True 60 | else: 61 | return False 62 | 63 | def visit(self): 64 | self.n += 1 65 | 66 | def print_bag(self): 67 | sorted_bag = sorted(self.bag.items(), key=operator.itemgetter(1)) 68 | print("BAG"+"#"*10) 69 | for item in sorted_bag: 70 | print(item[0],"==>", item[1]) 71 | print("BAG"+"#"*10) 72 | print('\n') 73 | 74 | def update_bag(self, samples): 75 | assert len(samples) > 0 76 | 77 | self.bag.clear() 78 | self.bag.extend( samples ) 79 | self.classifier.update_samples( self.bag ) 80 | if len(self.bag) <= 2: 81 | self.is_svm_splittable = False 82 | else: 83 | self.is_svm_splittable = self.classifier.is_splittable_svm() 84 | self.x_bar = self.classifier.get_mean() 85 | self.n = len( self.bag ) 86 | 87 | def 
clear_data(self): 88 | self.bag.clear() 89 | 90 | def get_name(self): 91 | # state is a list of jsons 92 | return "node" + str(self.id) 93 | 94 | def pad_str_to_8chars(self, ins, total): 95 | if len(ins) <= total: 96 | ins += ' '*(total - len(ins) ) 97 | return ins 98 | else: 99 | return ins[0:total] 100 | 101 | def get_rand_sample_from_bag(self): 102 | if len( self.bag ) > 0: 103 | upeer_boundary = len(list(self.bag)) 104 | rand_idx = np.random.randint(0, upeer_boundary) 105 | return self.bag[rand_idx][0] 106 | else: 107 | return None 108 | 109 | def get_parent_str(self): 110 | return self.parent.get_name() 111 | 112 | def propose_samples_bo(self, num_samples, path, lb, ub, samples): 113 | proposed_X = self.classifier.propose_samples_bo(num_samples, path, lb, ub, samples) 114 | return proposed_X 115 | 116 | def propose_samples_turbo(self, num_samples, path, func): 117 | proposed_X, fX = self.classifier.propose_samples_turbo(num_samples, path, func) 118 | return proposed_X, fX 119 | 120 | def propose_samples_rand(self, num_samples): 121 | assert num_samples > 0 122 | samples = self.classifier.propose_samples_rand(num_samples) 123 | return samples 124 | 125 | def __str__(self): 126 | name = self.get_name() 127 | name = self.pad_str_to_8chars(name, 7) 128 | name += ( self.pad_str_to_8chars( 'is good:' + str(self.is_good_kid() ), 15 ) ) 129 | name += ( self.pad_str_to_8chars( 'is leaf:' + str(self.is_leaf() ), 15 ) ) 130 | 131 | val = 0 132 | name += ( self.pad_str_to_8chars( ' val:{0:.4f} '.format(round(self.get_xbar(), 3) ), 20 ) ) 133 | name += ( self.pad_str_to_8chars( ' uct:{0:.4f} '.format(round(self.get_uct(), 3) ), 20 ) ) 134 | 135 | name += self.pad_str_to_8chars( 'sp/n:'+ str(len(self.bag))+"/"+str(self.n), 15 ) 136 | upper_bound = np.around( np.max(self.classifier.X, axis = 0), decimals=2 ) 137 | lower_bound = np.around( np.min(self.classifier.X, axis = 0), decimals=2 ) 138 | boundary = '' 139 | for idx in range(0, self.dims): 140 | boundary += 
str(lower_bound[idx])+'>'+str(upper_bound[idx])+' ' 141 | 142 | #name += ( self.pad_str_to_8chars( 'bound:' + boundary, 60 ) ) 143 | 144 | parent = '----' 145 | if self.parent is not None: 146 | parent = self.parent.get_name() 147 | parent = self.pad_str_to_8chars(parent, 10) 148 | 149 | name += (' parent:' + parent) 150 | 151 | kids = '' 152 | kid = '' 153 | for k in self.kids: 154 | kid = self.pad_str_to_8chars( k.get_name(), 10 ) 155 | kids += kid 156 | name += (' kids:' + kids) 157 | 158 | return name 159 | 160 | 161 | def get_uct(self, Cp = 10 ): 162 | if self.parent == None: 163 | return float('inf') 164 | if self.n == 0: 165 | return float('inf') 166 | return self.x_bar + 2*Cp*math.sqrt( 2* np.power(self.parent.n, 0.5) / self.n ) 167 | 168 | def get_xbar(self): 169 | return self.x_bar 170 | 171 | def get_n(self): 172 | return self.n 173 | 174 | def train_and_split(self): 175 | assert len(self.bag) >= 2 176 | self.classifier.update_samples( self.bag ) 177 | good_kid_data, bad_kid_data = self.classifier.split_data() 178 | assert len( good_kid_data ) + len( bad_kid_data ) == len( self.bag ) 179 | return good_kid_data, bad_kid_data 180 | 181 | def plot_samples_and_boundary(self, func): 182 | name = self.get_name() + ".pdf" 183 | self.classifier.plot_samples_and_boundary(func, name) 184 | 185 | def sample_arch(self): 186 | if len(self.bag) == 0: 187 | return None 188 | net_str = np.random.choice( list(self.bag.keys() ) ) 189 | del self.bag[net_str] 190 | return json.loads(net_str ) 191 | 192 | 193 | 194 | # print(root) 195 | # 196 | # with open('features.json', 'r') as infile: 197 | # data=json.loads( infile.read() ) 198 | # samples = {} 199 | # for d in data: 200 | # samples[ json.dumps(d['feature']) ] = d['acc'] 201 | # n1 = Node(samples, root) 202 | # print(n1) 203 | # 204 | # n1 = 205 | 206 | -------------------------------------------------------------------------------- /LA-MCTS/lamcts/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .MCTS import MCTS -------------------------------------------------------------------------------- /LA-MCTS/lamcts/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | 7 | import numpy as np 8 | 9 | def from_unit_cube(point, lb, ub): 10 | assert np.all(lb < ub) 11 | assert lb.ndim == 1 12 | assert ub.ndim == 1 13 | assert point.ndim == 2 14 | new_point = point * (ub - lb) + lb 15 | return new_point 16 | 17 | def latin_hypercube(n, dims): 18 | points = np.zeros((n, dims)) 19 | centers = (1.0 + 2.0 * np.arange(0.0, n)) 20 | centers = centers / float(2 * n) 21 | for i in range(0, dims): 22 | points[:, i] = centers[np.random.permutation(n)] 23 | 24 | perturbation = np.random.uniform(-1.0, 1.0, (n, dims)) 25 | perturbation = perturbation / float(2 * n) 26 | points += perturbation 27 | return points 28 | 29 | -------------------------------------------------------------------------------- /LA-MCTS/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | torch 3 | scikit-learn 4 | matplotlib -------------------------------------------------------------------------------- /LA-MCTS/revision.patch: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/LaMCTS/489bd60886f23b0b76b10aa8602ea6722f334ad6/LA-MCTS/revision.patch -------------------------------------------------------------------------------- /LA-MCTS/run.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 
3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | from functions.functions import * 7 | from functions.mujoco_functions import * 8 | from lamcts import MCTS 9 | import argparse 10 | 11 | 12 | 13 | 14 | parser = argparse.ArgumentParser(description='Process inputs') 15 | parser.add_argument('--func', help='specify the test function') 16 | parser.add_argument('--dims', type=int, help='specify the problem dimensions') 17 | parser.add_argument('--iterations', type=int, help='specify the iterations to collect in the search') 18 | 19 | 20 | args = parser.parse_args() 21 | 22 | f = None 23 | iteration = 0 24 | if args.func == 'ackley': 25 | assert args.dims > 0 26 | f = Ackley(dims =args.dims) 27 | elif args.func == 'levy': 28 | assert args.dims > 0 29 | f = Levy(dims = args.dims) 30 | elif args.func == 'lunar': 31 | f = Lunarlanding() 32 | elif args.func == 'swimmer': 33 | f = Swimmer() 34 | elif args.func == 'hopper': 35 | f = Hopper() 36 | else: 37 | print('function not defined') 38 | os._exit(1) 39 | 40 | assert f is not None 41 | assert args.iterations > 0 42 | 43 | 44 | # f = Ackley(dims = 10) 45 | # f = Levy(dims = 10) 46 | # f = Swimmer() 47 | # f = Hopper() 48 | # f = Lunarlanding() 49 | 50 | agent = MCTS( 51 | lb = f.lb, # the lower bound of each problem dimension 52 | ub = f.ub, # the upper bound of each problem dimension 53 | dims = f.dims, # the number of problem dimensions 54 | ninits = f.ninits, # the number of random samples used for initialization 55 | func = f, # function object to be optimized 56 | Cp = f.Cp, # Cp for MCTS 57 | leaf_size = f.leaf_size, # tree leaf size 58 | kernel_type = f.kernel_type, # SVM configuration 59 | gamma_type = f.gamma_type # SVM configuration 60 | ) 61 | 62 | agent.search(iterations = args.iterations) 63 | 64 | """ 65 | FAQ: 66 | 67 | 1. How to retrieve every f(x) during the search?
68 | 69 | During the optimization, the function creates a folder to store the f(x) trace; 70 | the folder name has the format function name + function dimensions, e.g. Ackley10. 71 | 72 | Every 100 samples, the function writes a row to a file named result + total samples, e.g. result100 73 | means the f(x) trace of the first 100 samples. 74 | 75 | The last row of each result file contains the f(x) trace from the 1st sample to the current sample. 76 | Earlier rows come from previous experiments, as the results of a new experiment are always appended 77 | as the last row. 78 | 79 | Here is an example of how to interpret a row of the f(x) trace: 80 | [5, 3.2, 2.1, ..., 1.1] 81 | The first sampled f(x) is 5, the second sampled f(x) is 3.2, and the last sampled f(x) is 1.1. 82 | 83 | 2. How to improve the performance? 84 | Tune Cp and leaf_size, and try replacing the BO sampler with other samplers. 85 | 86 | """ 87 | -------------------------------------------------------------------------------- /LA-MCTS/setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | 3 | with open("README.md", "r") as fh: 4 | long_description = fh.read() 5 | 6 | with open("requirements.txt") as f: 7 | required = f.read().splitlines() 8 | 9 | setuptools.setup( 10 | name = 'LA-MCTS', 11 | version = '0.1', 12 | author = "Linnan Wang", 13 | author_email = "wangnan318@gmail.com", 14 | description = "", 15 | long_description = long_description, 16 | long_description_content_type = "text/markdown", 17 | url = "https://github.com/facebookresearch/LaMCTS", 18 | packages = ["lamcts"], 19 | install_requires=required, 20 | include_package_data = True, 21 | classifiers = [ 22 | "Programming Language :: Python :: 3", 23 | "Operating System :: OS Independent", 24 | ] 25 | ) 26 | -------------------------------------------------------------------------------- /LA-MCTS/test.py:
-------------------------------------------------------------------------------- 1 | from lamcts import MCTS -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/README.md: -------------------------------------------------------------------------------- 1 | ## Distributed LaNAS 2 | • accurate evaluations, well suited to chasing SoTA results 3 | 4 | • you need a lot of GPUs 5 | 6 | This provides a simple distributed framework for training using LaNAS, with which we achieved SoTA results with 500 GPUs. The distributed LaNAS trains every sampled network from scratch, and we believe techniques such as early prediction would be a very nice improvement to 7 | the current implementation. Because sending network configurations is fairly cheap, we implemented a simple client-server system to parallelize the distributed search. This figure depicts the general idea. 8 | 9 |

10 | 11 |
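The client-server exchange described above can be sketched with `multiprocessing.connection`, the module that client.py imports. This is only a minimal sketch under stated assumptions, not the repository's server code: the function names `serve_once`, `run_client` and `demo`, the loopback address, the example network `[1, 0, 1, 1]` and the placeholder accuracy are all hypothetical. Only the `authkey` value `b'nasnet'` and the message shapes (`[network]` from the server, `[client_name, network_str, acc]` from the client) mirror what client.py does.

```python
import json
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"nasnet"  # same key that client.py passes to Client(...)


def serve_once(listener, network, results):
    """Hypothetical server side: hand out one network, then collect one result."""
    # First connection: send the network configuration to the client.
    with listener.accept() as conn:
        conn.send([network])
    # Second connection: receive [client_name, network_str, acc].
    with listener.accept() as conn:
        results.append(conn.recv())


def run_client(address, client_name="client"):
    """Hypothetical client side: fetch a network, 'train' it, report the accuracy."""
    with Client(address, authkey=AUTHKEY) as conn:
        [network] = conn.recv()
    acc = 0.5  # placeholder; the real client trains the network here
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send([client_name, json.dumps(network), acc])
    return network, acc


def demo():
    results = []
    # Port 0 lets the OS pick a free port; the README uses a fixed port (8000).
    with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
        server = threading.Thread(
            target=serve_once, args=(listener, [1, 0, 1, 1], results)
        )
        server.start()
        network, acc = run_client(listener.address)
        server.join()
    return network, acc, results


if __name__ == "__main__":
    print(demo())
```

Because each message is a short list rather than model weights, one server can in principle keep many such clients busy, which is the design choice the paragraph above motivates.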

12 | 13 | ## Starting the server 14 | 15 | We uniformly sampled a few million networks from the NASNet search space and pre-built the search space in the file "search_space". The server loads this file and searches the networks within it. Feel free to replace this with a random generator and merge it into this branch. 16 | 17 | Here are the steps to start: 18 | 1. Go to the server folder and unzip search_space.zip. 19 | 2. Run ifconfig to get your IP address. 20 | 3. Change line 212 in MCTS.py: 21 | ``` 22 | address = ('XXX.XX.XX.XXX', 8000), # replace XX with your IP address, and change to a different port if 8000 does not work. 23 | ``` 24 | 4. To start the server, run ``` python MCTS.py & ```. 25 | 26 | ## Starting the clients 27 | Each client folder corresponds to one GPU; you can create as many client folders as you want by copying and pasting. 28 | 29 | Once the server is running, start the clients as follows: 30 | 1. Go to a client folder and open client.py. 31 | 2. Change the hard-coded server address on lines 25, 76, and 114 of client.py to the server's IP address. 32 | 3. Set the client to an unused GPU. 33 | 4. Run python client.py. 34 | 35 | If you have 500 GPUs, create 500 folders, and repeat the above process 500x. ;) 36 | 37 | ## Collecting the results 38 | We wrote a script, collect_results.py, to collect all the results in the client folders. Once it creates total_trace.json (we also uploaded the total trace collected from our experiments), you can read the results with ``` python read_results.py```. The results are ranked in reverse, i.e. the last row is the best. 39 | 40 | Here is a snapshot of the best architectures found in our distributed search. 41 |

42 | 43 |

44 | 45 | The last column is the test accuracy after training each network for 200 epochs. We assume the best network is the one with the best test accuracy. 46 | 47 | ## Training the top model 48 | You can train the best "searched" network using the training pipeline here. 49 | 50 | ## Fault Tolerance 51 | Fault tolerance is very important if you use hundreds of GPUs. The current implementation already takes care of it. 52 | 53 | On the server side, the pickled current state is dumped at every search iteration to a file named "mcts_agent", and the search can be resumed from that state. MCTS.py looks for mcts_agent in the current folder, so if your server is preempted, simply run python MCTS.py again. 54 | 55 | On the client side, the training state is dumped, and training resumes if a job was preempted in the middle of training. To restart a client, run python client.py. That's it. ;) 56 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/clientX/client.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree.
5 | # 6 | import time 7 | import json 8 | import random 9 | import sys 10 | import os 11 | import pickle 12 | import socket 13 | import signal 14 | import numpy as np 15 | import traceback 16 | import train_client 17 | from nasnet_set import * 18 | from array import array 19 | from multiprocessing.connection import Client 20 | 21 | 22 | class Client_t: 23 | 24 | def __init__(self): 25 | self.addr = ('100.97.66.131', 8000) 26 | self.client_name = "client" 27 | self.total_send = 0 28 | self.total_recv = 0 29 | self.accuracy_trace = {} 30 | self.load_acc_trace() 31 | signal.signal(signal.SIGUSR1, self.sig_handler) 32 | signal.signal(signal.SIGTERM, self.term_handler) 33 | # the fields below track the client status 34 | self.received = False 35 | self.network = [] 36 | self.acc = 0 37 | 38 | def print_client_status(self): 39 | print("client->receive status: ", self.received ) 40 | print("client->network: ", self.network ) 41 | print("client->acc: ", self.acc ) 42 | print("client->trace_len:", len(self.accuracy_trace) ) 43 | 44 | 45 | def sig_handler(self, signum, frame): 46 | print("caught signal", signum," about to exit, dump client") 47 | self.dump_client() 48 | if os.path.isfile('client.inst'): 49 | print("dump successful") 50 | 51 | def term_handler(self, signum, frame): 52 | self.dump_client() 53 | if os.path.isfile('client.inst'): 54 | print("dump successful") 55 | print("terminated caught", flush=True) 56 | 57 | def dump_client(self): 58 | client_path = 'client.inst' 59 | with open(client_path,"wb") as outfile: 60 | pickle.dump(self, outfile) 61 | 62 | def dump_acc_trace(self): 63 | with open('acc_trace.json', 'w') as fp: 64 | json.dump(self.accuracy_trace, fp) 65 | 66 | def load_acc_trace(self): 67 | if os.path.isfile('acc_trace.json'): 68 | with open('acc_trace.json') as fp: 69 | self.accuracy_trace = json.load(fp) 70 | print("loading #", len(self.accuracy_trace )," prev trained networks") 71 | 72 | def train(self): 73 | while True: 74 | while not
self.received: 75 | try: 76 | send_address = ('100.97.66.131', 8000) 77 | conn = Client(send_address, authkey=b'nasnet') 78 | if conn.poll(2): 79 | [ self.network ] = conn.recv() 80 | self.total_recv += 1 81 | conn.close() 82 | self.received = True 83 | self.dump_client() 84 | print("RECEIEVE:=>", self.network) 85 | print("RECEIEVE:=>", " total_send:", self.total_send, " total_recv:", self.total_recv) 86 | self.print_client_status() 87 | except Exception as e: 88 | print(e) 89 | print(traceback.format_exc()) 90 | print("client recv error") 91 | 92 | if self.received: 93 | print("prepare training the network:", self.network) 94 | network = np.array( self.network, dtype = 'int' ) 95 | network = network.tolist() 96 | net = gen_code_from_list( network, node_num=7 ) #TODO: change it to 7 97 | net_str = json.dumps( network ) 98 | if net_str in self.accuracy_trace: 99 | self.acc = self.accuracy_trace[net_str] 100 | else: 101 | genotype_net = translator([net, net], max_node=7) #TODO: change it to 7 102 | print("--"*15) 103 | print(genotype_net) 104 | print("training the above network") 105 | print("--"*15) 106 | self.acc = train_client.run(genotype_net, epochs=600, batch_size=200) 107 | self.accuracy_trace[net_str] = self.acc 108 | self.dump_acc_trace() 109 | 110 | #TODO: train the actual network 111 | #time.sleep(random.randint(2, 5) ) 112 | while self.received: 113 | try: 114 | recv_address = ('100.97.66.131', 8000) 115 | conn = Client(recv_address, authkey=b'nasnet') 116 | network_str = json.dumps( np.array(network).tolist() ) 117 | conn.send([self.client_name, network_str, self.acc]) 118 | self.total_send += 1 119 | print("SEND:=>", self.network, self.acc) 120 | self.network = [] 121 | self.acc = 0 122 | self.received = False 123 | self.dump_client() 124 | print("SEND:=>", " total_send:", self.total_send, " total_recv:", self.total_recv) 125 | conn.close() 126 | except Exception as e: 127 | print(e) 128 | print(traceback.format_exc()) 129 | print("client send error, 
reconnecting") 130 | 131 | inst_path = 'client.inst' 132 | if os.path.isfile( inst_path ) == True: 133 | with open(inst_path, 'rb') as client_data: 134 | client = pickle.load( client_data ) 135 | client.print_client_status() 136 | client.train() 137 | else: 138 | client = Client_t() 139 | client.train() 140 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/clientX/continue_train.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import os 7 | import sys 8 | import time 9 | import glob 10 | import numpy as np 11 | import torch 12 | import utils 13 | import logging 14 | import argparse 15 | import torch.nn as nn 16 | import genotypes 17 | import torch.utils 18 | import torchvision.datasets as dset 19 | import torch.backends.cudnn as cudnn 20 | 21 | from model import NetworkCIFAR as Network 22 | 23 | parser = argparse.ArgumentParser("cifar10") 24 | parser.add_argument('--data', type=str, default='../data', help='location of the data corpus') 25 | parser.add_argument('--batch_size', type=int, default=96, help='batch size') 26 | parser.add_argument('--lr', type=float, default=0.025, help='init learning rate') 27 | parser.add_argument('--momentum', type=float, default=0.9, help='momentum') 28 | parser.add_argument('--wd', type=float, default=3e-4, help='weight decay') 29 | parser.add_argument('--report_freq', type=float, default=50, help='report frequency') 30 | parser.add_argument('--gpu', type=int, default=0, help='gpu device id') 31 | parser.add_argument('--epochs', type=int, default=600, help='num of training epochs') 32 | parser.add_argument('--init_ch', type=int, default=36, help='num of init channels') 33 | parser.add_argument('--layers', type=int, 
default=20, help='total number of layers') 34 | parser.add_argument('--model_path', type=str, default='saved_models', help='path to save the model') 35 | parser.add_argument('--auxiliary', action='store_true', default=False, help='use auxiliary tower') 36 | parser.add_argument('--auxiliary_weight', type=float, default=0.4, help='weight for auxiliary loss') 37 | parser.add_argument('--cutout', action='store_true', default=False, help='use cutout') 38 | parser.add_argument('--cutout_length', type=int, default=16, help='cutout length') 39 | parser.add_argument('--drop_path_prob', type=float, default=0.2, help='drop path probability') 40 | parser.add_argument('--exp_path', type=str, default='exp/cifar10', help='experiment name') 41 | parser.add_argument('--seed', type=int, default=0, help='random seed') 42 | parser.add_argument('--arch', type=str, default='DARTS', help='which architecture to use') 43 | parser.add_argument('--grad_clip', type=float, default=5, help='gradient clipping') 44 | parser.add_argument('--cur_epoch', type=int, default=0, help='num of training epochs') 45 | parser.add_argument('--save', type=str, default='EXP', help='experiment name') 46 | 47 | args = parser.parse_args() 48 | 49 | args.save = 'eval-{}-{}-{}'.format(args.arch, time.strftime("%Y%m%d-%H%M%S"), args.init_ch) 50 | utils.create_exp_dir(args.save, scripts_to_save=glob.glob('*.py')) 51 | 52 | log_format = '%(asctime)s %(message)s' 53 | logging.basicConfig(stream=sys.stdout, level=logging.INFO, 54 | format=log_format, datefmt='%m/%d %I:%M:%S %p') 55 | fh = logging.FileHandler(os.path.join(args.save, 'log.txt')) 56 | fh.setFormatter(logging.Formatter(log_format)) 57 | logging.getLogger().addHandler(fh) 58 | 59 | 60 | 61 | def main(): 62 | 63 | 64 | np.random.seed(args.seed) 65 | torch.cuda.set_device(args.gpu) 66 | cudnn.benchmark = True 67 | cudnn.enabled = True 68 | torch.manual_seed(args.seed) 69 | logging.info('gpu device = %d' % args.gpu) 70 | logging.info("args = %s", args) 71 | 72 | 
genotype = eval("genotypes.%s" % args.arch) 73 | 74 | # model = Network(args.init_ch, 10, args.layers, args.auxiliary, genotype).cuda() 75 | model = torch.load(os.path.join(args.model_path, 'model.pt')) 76 | 77 | logging.info("param size = %fMB", utils.count_parameters_in_MB(model)) 78 | 79 | criterion = nn.CrossEntropyLoss().cuda() 80 | optimizer = torch.optim.SGD( 81 | model.parameters(), 82 | args.lr, 83 | momentum=args.momentum, 84 | weight_decay=args.wd 85 | ) 86 | 87 | train_transform, valid_transform = utils._data_transforms_cifar10(args) 88 | train_data = dset.CIFAR10(root=args.data, train=True, download=True, transform=train_transform) 89 | valid_data = dset.CIFAR10(root=args.data, train=False, download=True, transform=valid_transform) 90 | 91 | train_queue = torch.utils.data.DataLoader( 92 | train_data, batch_size=args.batch_size, shuffle=True, pin_memory=True, num_workers=2) 93 | 94 | valid_queue = torch.utils.data.DataLoader( 95 | valid_data, batch_size=args.batch_size, shuffle=False, pin_memory=True, num_workers=2) 96 | 97 | scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, float(args.epochs)) 98 | 99 | best_acc = 0.0 100 | 101 | for i in range(args.cur_epoch): 102 | scheduler.step() 103 | 104 | for epoch in range(args.cur_epoch, args.epochs): 105 | scheduler.step() 106 | logging.info('epoch %d lr %e', epoch, scheduler.get_lr()[0]) 107 | model.drop_path_prob = args.drop_path_prob * epoch / args.epochs 108 | 109 | 110 | 111 | valid_acc, valid_obj = infer(valid_queue, model, criterion) 112 | logging.info('valid_acc: %f', valid_acc) 113 | 114 | if valid_acc > best_acc: 115 | best_acc = valid_acc 116 | print('this model is the best') 117 | torch.save(model, os.path.join(args.save, 'AlphaX_1.pt')) 118 | 119 | torch.save(model, os.path.join(args.save, 'trained.pt')) 120 | print('current best acc is', best_acc) 121 | 122 | 123 | train_acc, train_obj = train(train_queue, model, criterion, optimizer) 124 | logging.info('train_acc: %f', 
train_acc) 125 | 126 | 127 | 128 | # utils.save(model, os.path.join(args.save, 'trained.pt')) 129 | print('saved to: trained.pt') 130 | 131 | 132 | def train(train_queue, model, criterion, optimizer): 133 | 134 | objs = utils.AverageMeter() 135 | top1 = utils.AverageMeter() 136 | top5 = utils.AverageMeter() 137 | model.train() 138 | 139 | for step, (x, target) in enumerate(train_queue): 140 | x = x.cuda() 141 | target = target.cuda(non_blocking=True) 142 | 143 | optimizer.zero_grad() 144 | logits, logits_aux = model(x) 145 | loss = criterion(logits, target) 146 | if args.auxiliary: 147 | loss_aux = criterion(logits_aux, target) 148 | loss += args.auxiliary_weight * loss_aux 149 | loss.backward() 150 | nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip) 151 | optimizer.step() 152 | 153 | prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5)) 154 | n = x.size(0) 155 | objs.update(loss.item(), n) 156 | top1.update(prec1.item(), n) 157 | top5.update(prec5.item(), n) 158 | 159 | if step % args.report_freq == 0: 160 | logging.info('train %03d %e %f %f', step, objs.avg, top1.avg, top5.avg) 161 | 162 | return top1.avg, objs.avg 163 | 164 | 165 | def infer(valid_queue, model, criterion): 166 | 167 | objs = utils.AverageMeter() 168 | top1 = utils.AverageMeter() 169 | top5 = utils.AverageMeter() 170 | model.eval() 171 | 172 | for step, (x, target) in enumerate(valid_queue): 173 | x = x.cuda() 174 | target = target.cuda(non_blocking=True) 175 | 176 | with torch.no_grad(): 177 | logits, _ = model(x) 178 | loss = criterion(logits, target) 179 | 180 | prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5)) 181 | n = x.size(0) 182 | objs.update(loss.item(), n) 183 | top1.update(prec1.item(), n) 184 | top5.update(prec5.item(), n) 185 | 186 | if step % args.report_freq == 0: 187 | logging.info('>>Validation: %03d %e %f %f', step, objs.avg, top1.avg, top5.avg) 188 | 189 | return top1.avg, objs.avg 190 | 191 | 192 | if __name__ == '__main__': 193 | main() 194 | 
-------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/clientX/model.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import torch 7 | import torch.nn as nn 8 | from operations import * 9 | 10 | 11 | class Cell(nn.Module): 12 | 13 | def __init__(self, genotype, C_prev_prev, C_prev, C, reduction, reduction_prev): 14 | 15 | super(Cell, self).__init__() 16 | 17 | print(C_prev_prev, C_prev, C) 18 | 19 | if reduction_prev: 20 | self.preprocess0 = FactorizedReduce(C_prev_prev, C) 21 | else: 22 | self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0) 23 | self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0) 24 | 25 | if reduction: 26 | op_names, indices = zip(*genotype.reduce) 27 | concat = genotype.reduce_concat 28 | else: 29 | op_names, indices = zip(*genotype.normal) 30 | concat = genotype.normal_concat 31 | self._compile(C, op_names, indices, concat, reduction) 32 | 33 | def _compile(self, C, op_names, indices, concat, reduction): 34 | 35 | assert len(op_names) == len(indices) 36 | 37 | self._steps = len(op_names) // 2 38 | self._concat = concat 39 | self.multiplier = len(concat) 40 | 41 | self._ops = nn.ModuleList() 42 | for name, index in zip(op_names, indices): 43 | stride = 2 if reduction and index < 2 else 1 44 | op = OPS[name](C, stride, True) 45 | self._ops += [op] 46 | self._indices = indices 47 | 48 | def forward(self, s0, s1, drop_prob): 49 | 50 | s0 = self.preprocess0(s0) 51 | s1 = self.preprocess1(s1) 52 | 53 | states = [s0, s1] 54 | for i in range(self._steps): 55 | h1 = states[self._indices[2 * i]] 56 | h2 = states[self._indices[2 * i + 1]] 57 | op1 = self._ops[2 * i] 58 | op2 = self._ops[2 * i + 1] 59 | h1 = op1(h1) 60 | h2 = op2(h2) 61 | 62 | if 
self.training and drop_prob > 0.: 63 | if not isinstance(op1, Identity): 64 | h1 = drop_path(h1, drop_prob) 65 | if not isinstance(op2, Identity): 66 | h2 = drop_path(h2, drop_prob) 67 | s = h1 + h2 68 | states += [s] 69 | return torch.cat([states[i] for i in self._concat], dim=1) 70 | 71 | def drop_path(x, drop_prob): 72 | if drop_prob > 0.: 73 | keep_prob = 1. - drop_prob 74 | 75 | mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 76 | x.div_(keep_prob) 77 | try: 78 | x.mul_(mask) 79 | except: 80 | mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 81 | x.mul_(mask) 82 | return x 83 | 84 | 85 | 86 | 87 | class AuxiliaryHeadCIFAR(nn.Module): 88 | 89 | def __init__(self, C, num_classes): 90 | """assuming input size 8x8""" 91 | super(AuxiliaryHeadCIFAR, self).__init__() 92 | 93 | self.features = nn.Sequential( 94 | nn.ReLU(inplace=True), 95 | nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False), # image size = 2 x 2 96 | nn.Conv2d(C, 128, 1, bias=False), 97 | nn.BatchNorm2d(128), 98 | nn.ReLU(inplace=True), 99 | nn.Conv2d(128, 768, 2, bias=False), 100 | nn.BatchNorm2d(768), 101 | nn.ReLU(inplace=True) 102 | ) 103 | self.classifier = nn.Linear(768, num_classes) 104 | 105 | def forward(self, x): 106 | x = self.features(x) 107 | x = self.classifier(x.view(x.size(0), -1)) 108 | return x 109 | 110 | 111 | class NetworkCIFAR(nn.Module): 112 | 113 | def __init__(self, C, num_classes, layers, auxiliary, genotype): 114 | super(NetworkCIFAR, self).__init__() 115 | 116 | self._layers = layers 117 | self._auxiliary = auxiliary 118 | 119 | stem_multiplier = 3 120 | C_curr = stem_multiplier * C 121 | self.stem = nn.Sequential( 122 | nn.Conv2d(3, C_curr, 3, padding=1, bias=False), 123 | nn.BatchNorm2d(C_curr) 124 | ) 125 | 126 | C_prev_prev, C_prev, C_curr = C_curr, C_curr, C 127 | self.cells = nn.ModuleList() 128 | reduction_prev = False 129 | for i in range(layers): 130 | if i in [layers // 3, 2 * layers // 3]: 131 | C_curr *= 2 
132 | reduction = True 133 | else: 134 | reduction = False 135 | cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, reduction_prev) 136 | reduction_prev = reduction 137 | self.cells += [cell] 138 | C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr 139 | if i == 2 * layers // 3: 140 | C_to_auxiliary = C_prev 141 | 142 | if auxiliary: 143 | self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes) 144 | self.global_pooling = nn.AdaptiveAvgPool2d(1) 145 | self.classifier = nn.Linear(C_prev, num_classes) 146 | 147 | def forward(self, input): 148 | logits_aux = None 149 | s0 = s1 = self.stem(input) 150 | for i, cell in enumerate(self.cells): 151 | s0, s1 = s1, cell(s0, s1, self.drop_path_prob) 152 | if i == 2 * self._layers // 3: 153 | if self._auxiliary and self.training: 154 | logits_aux = self.auxiliary_head(s1) 155 | out = self.global_pooling(s1) 156 | logits = self.classifier(out.view(out.size(0), -1)) 157 | return logits, logits_aux 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/clientX/operations.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import torch 7 | import torch.nn as nn 8 | # from torch import functional as F 9 | import torch.nn.functional as F 10 | 11 | OPS = { 12 | 'avg_pool_3x3': lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False), 13 | 'max_pool_2x2' : lambda C, stride, affine: nn.MaxPool2d(2, stride=stride, padding=0), 14 | 'max_pool_3x3': lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1), 15 | 'max_pool_5x5': lambda C, stride, affine: nn.MaxPool2d(5, stride=stride, padding=2), 16 | 'skip_connect': lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine), 17 | 'sep_conv_3x3': lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine), 18 | 'sep_conv_5x5': lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine), 19 | 'sep_conv_7x7': lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine), 20 | 'dil_conv_3x3' : lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine), 21 | 'dil_conv_5x5' : lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine), 22 | 'conv_1x1' : lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False), 23 | 'conv_3x3' : lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False), 24 | 'conv_5x5' : lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False), 25 | } 26 | 27 | 28 | class ReLUConvBN(nn.Module): 29 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 30 | 31 | super(ReLUConvBN, self).__init__() 32 | 33 | self.op = nn.Sequential( 34 | nn.ReLU(inplace=False), 35 | Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False), 36 | nn.BatchNorm2d(C_out, affine=affine) 37 | ) 38 | 39 | def forward(self, x): 40 | return self.op(x) 41 | 42 | class Conv2d(nn.Conv2d): 43 | 44 | def __init__(self, in_channels, out_channels, 
kernel_size, stride=1, 45 | padding=0, dilation=1, groups=1, bias=True): 46 | super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride, 47 | padding, dilation, groups, bias) 48 | 49 | def forward(self, x): 50 | weight = self.weight 51 | weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2, 52 | keepdim=True).mean(dim=3, keepdim=True) 53 | weight = weight - weight_mean 54 | std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5 55 | weight = weight / std.expand_as(weight) 56 | return F.conv2d(x, weight, self.bias, self.stride, 57 | self.padding, self.dilation, self.groups) 58 | 59 | 60 | class SepConv(nn.Module): 61 | 62 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 63 | super(SepConv, self).__init__() 64 | 65 | self.op = nn.Sequential( 66 | nn.ReLU(inplace=False), 67 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, 68 | groups=C_in, bias=False), 69 | Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False), 70 | nn.BatchNorm2d(C_in, affine=affine), 71 | nn.ReLU(inplace=False), 72 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding, 73 | groups=C_in, bias=False), 74 | Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 75 | nn.BatchNorm2d(C_out, affine=affine), 76 | ) 77 | 78 | def forward(self, x): 79 | return self.op(x) 80 | 81 | 82 | class Identity(nn.Module): 83 | 84 | def __init__(self): 85 | super(Identity, self).__init__() 86 | 87 | def forward(self, x): 88 | return x 89 | 90 | 91 | class DilConv(nn.Module): 92 | def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True): 93 | super(DilConv, self).__init__() 94 | self.op = nn.Sequential( 95 | nn.ReLU(inplace=False), 96 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, 97 | groups=C_in, bias=False), 98 | Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 99 | nn.BatchNorm2d(C_out, 
affine=affine), 100 | ) 101 | 102 | def forward(self, x): 103 | return self.op(x) 104 | 105 | 106 | class FactorizedReduce(nn.Module): 107 | 108 | def __init__(self, C_in, C_out, affine=True): 109 | 110 | super(FactorizedReduce, self).__init__() 111 | 112 | assert C_out % 2 == 0 113 | 114 | self.relu = nn.ReLU(inplace=False) 115 | self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 116 | self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 117 | self.bn = nn.BatchNorm2d(C_out, affine=affine) 118 | 119 | def forward(self, x): 120 | x = self.relu(x) 121 | 122 | # x: torch.Size([32, 32, 32, 32]) 123 | # conv1: [b, c_out//2, d//2, d//2] 124 | # conv2: [] 125 | # out: torch.Size([32, 32, 16, 16]) 126 | 127 | out = torch.cat([self.conv_1(x), self.conv_2(x[:, :, 1:, 1:])], dim=1) 128 | out = self.bn(out) 129 | return out 130 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/clientX/train_client.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import os 7 | import sys 8 | import time 9 | import glob 10 | import numpy as np 11 | import torch 12 | import utils 13 | import logging 14 | import argparse 15 | import torch.nn as nn 16 | import torch.utils 17 | import torchvision.datasets as dset 18 | import torch.backends.cudnn as cudnn 19 | import json 20 | import hashlib 21 | import apex 22 | 23 | 24 | from model import NetworkCIFAR as Network 25 | 26 | 27 | 28 | def run(net, init_ch=32, layers=20, auxiliary=True, lr=0.025, momentum=0.9, wd=3e-4, cutout=True, cutout_length=16, data='../data', batch_size=96, epochs=600, drop_path_prob=0.2, auxiliary_weight=0.4): 29 | save = '/checkpoint/linnanwang/nasnet/' + hashlib.md5(json.dumps(net).encode()).hexdigest() 30 | utils.create_exp_dir(save, scripts_to_save=glob.glob('*.py')) 31 | 32 | log_format = '%(asctime)s %(message)s' 33 | logging.basicConfig(stream=sys.stdout, level=logging.INFO, 34 | format=log_format, datefmt='%m/%d %I:%M:%S %p') 35 | fh = logging.FileHandler(os.path.join(save, 'log.txt')) 36 | fh.setFormatter(logging.Formatter(log_format)) 37 | logging.getLogger().addHandler(fh) 38 | 39 | 40 | np.random.seed(0) 41 | torch.cuda.set_device(0) 42 | cudnn.benchmark = True 43 | cudnn.enabled = True 44 | torch.manual_seed(0) 45 | logging.info('gpu device = %d' % 0) 46 | # logging.info("args = %s", args) 47 | 48 | 49 | genotype = net 50 | model = Network(init_ch, 10, layers, auxiliary, genotype).cuda() 51 | 52 | logging.info("param size = %fMB", utils.count_parameters_in_MB(model)) 53 | 54 | criterion = nn.CrossEntropyLoss().cuda() 55 | optimizer = torch.optim.SGD( 56 | model.parameters(), 57 | lr, 58 | momentum=momentum, 59 | weight_decay=wd 60 | ) 61 | model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O3") 62 | 63 | 64 | 65 | train_transform, valid_transform = utils._data_transforms_cifar10(cutout, cutout_length) 66 | train_data = dset.CIFAR10(root=data, train=True, download=True, transform=train_transform) 67 | valid_data = 
dset.CIFAR10(root=data, train=False, download=True, transform=valid_transform) 68 | 69 | train_queue = torch.utils.data.DataLoader( 70 | train_data, batch_size=batch_size, shuffle=True, pin_memory=True, num_workers=2) 71 | 72 | valid_queue = torch.utils.data.DataLoader( 73 | valid_data, batch_size=batch_size, shuffle=False, pin_memory=True, num_workers=2) 74 | 75 | scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, float(epochs)) 76 | 77 | best_acc = 0.0 78 | 79 | for epoch in range(epochs): 80 | scheduler.step() 81 | logging.info('epoch %d lr %e', epoch, scheduler.get_lr()[0]) 82 | model.drop_path_prob = drop_path_prob * epoch / epochs 83 | 84 | 85 | train_acc, train_obj = train(train_queue, model, criterion, optimizer, auxiliary=auxiliary, auxiliary_weight=auxiliary_weight) 86 | logging.info('train_acc: %f', train_acc) 87 | 88 | valid_acc, valid_obj = infer(valid_queue, model, criterion) 89 | logging.info('valid_acc: %f', valid_acc) 90 | 91 | if valid_acc > best_acc and epoch >= 50: 92 | print('this model is the best') 93 | torch.save(model.state_dict(), os.path.join(save, 'model.pt')) 94 | if valid_acc > best_acc: 95 | best_acc = valid_acc 96 | print('current best acc is', best_acc) 97 | 98 | if epoch == 100: 99 | break 100 | 101 | # utils.save(model, os.path.join(args.save, 'trained.pt')) 102 | print('saved to: model.pt') 103 | 104 | return best_acc 105 | 106 | 107 | def train(train_queue, model, criterion, optimizer, auxiliary=True, auxiliary_weight=0.4, grad_clip=float(5), report_freq=float(50)): 108 | 109 | objs = utils.AverageMeter() 110 | top1 = utils.AverageMeter() 111 | top5 = utils.AverageMeter() 112 | model.train() 113 | 114 | for step, (x, target) in enumerate(train_queue): 115 | x = x.cuda() 116 | target = target.cuda(non_blocking=True) 117 | 118 | optimizer.zero_grad() 119 | logits, logits_aux = model(x) 120 | loss = criterion(logits, target) 121 | if auxiliary: 122 | loss_aux = criterion(logits_aux, target) 123 | loss += 
auxiliary_weight * loss_aux 124 | loss.backward() 125 | nn.utils.clip_grad_norm_(model.parameters(), grad_clip) 126 | optimizer.step() 127 | 128 | prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5)) 129 | n = x.size(0) 130 | objs.update(loss.item(), n) 131 | top1.update(prec1.item(), n) 132 | top5.update(prec5.item(), n) 133 | 134 | if step % report_freq == 0: 135 | logging.info('train %03d %e %f %f', step, objs.avg, top1.avg, top5.avg) 136 | 137 | return top1.avg, objs.avg 138 | 139 | 140 | def infer(valid_queue, model, criterion, report_freq=float(50)): 141 | 142 | objs = utils.AverageMeter() 143 | top1 = utils.AverageMeter() 144 | top5 = utils.AverageMeter() 145 | model.eval() 146 | 147 | for step, (x, target) in enumerate(valid_queue): 148 | x = x.cuda() 149 | target = target.cuda(non_blocking=True) 150 | 151 | with torch.no_grad(): 152 | logits, _ = model(x) 153 | loss = criterion(logits, target) 154 | 155 | prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5)) 156 | n = x.size(0) 157 | objs.update(loss.item(), n) 158 | top1.update(prec1.item(), n) 159 | top5.update(prec5.item(), n) 160 | 161 | if step % report_freq == 0: 162 | logging.info('>>Validation: %03d %e %f %f', step, objs.avg, top1.avg, top5.avg) 163 | 164 | return top1.avg, objs.avg 165 | 166 | 167 | 168 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/clientX/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import os 7 | import numpy as np 8 | import torch 9 | import shutil 10 | import torchvision.transforms as transforms 11 | 12 | 13 | class AverageMeter: 14 | 15 | def __init__(self): 16 | self.reset() 17 | 18 | def reset(self): 19 | self.avg = 0 20 | self.sum = 0 21 | self.cnt = 0 22 | 23 | def update(self, val, n=1): 24 | self.sum += val * n 25 | self.cnt += n 26 | self.avg = self.sum / self.cnt 27 | 28 | 29 | def accuracy(output, target, topk=(1,)): 30 | """ 31 | Compute the top-k accuracy, in percent, for each k in topk. 32 | :param output: logits, [b, classes] 33 | :param target: [b] 34 | :param topk: tuple of k values to evaluate 35 | :return: list of 0-dim tensors, one accuracy per k 36 | """ 37 | maxk = max(topk) 38 | batch_size = target.size(0) 39 | 40 | _, pred = output.topk(maxk, 1, True, True) 41 | pred = pred.t() 42 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 43 | 44 | res = [] 45 | for k in topk: 46 | correct_k = correct[:k].reshape(-1).float().sum(0) # reshape, not view: correct may be non-contiguous after pred.t() 47 | res.append(correct_k.mul_(100.0 / batch_size)) 48 | 49 | return res 50 | 51 | 52 | class Cutout: 53 | def __init__(self, length): 54 | self.length = length 55 | 56 | def __call__(self, img): 57 | h, w = img.size(1), img.size(2) 58 | mask = np.ones((h, w), np.float32) 59 | y = np.random.randint(h) 60 | x = np.random.randint(w) 61 | 62 | y1 = np.clip(y - self.length // 2, 0, h) 63 | y2 = np.clip(y + self.length // 2, 0, h) 64 | x1 = np.clip(x - self.length // 2, 0, w) 65 | x2 = np.clip(x + self.length // 2, 0, w) 66 | 67 | mask[y1: y2, x1: x2] = 0. 
68 | mask = torch.from_numpy(mask) 69 | mask = mask.expand_as(img) 70 | img *= mask 71 | return img 72 | 73 | 74 | def _data_transforms_cifar10(cutout, cutout_length): 75 | """ 76 | :param cutout: whether to append Cutout to the training transform 77 | :param cutout_length: side length of the Cutout mask 78 | :return: (train_transform, valid_transform) 79 | """ 80 | CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124] 81 | CIFAR_STD = [0.24703233, 0.24348505, 0.26158768] 82 | 83 | train_transform = transforms.Compose([ 84 | transforms.RandomCrop(32, padding=4), 85 | transforms.RandomHorizontalFlip(), 86 | transforms.ToTensor(), 87 | transforms.Normalize(CIFAR_MEAN, CIFAR_STD), 88 | ]) 89 | if cutout: 90 | train_transform.transforms.append(Cutout(cutout_length)) 91 | 92 | valid_transform = transforms.Compose([ 93 | transforms.ToTensor(), 94 | transforms.Normalize(CIFAR_MEAN, CIFAR_STD), 95 | ]) 96 | return train_transform, valid_transform 97 | 98 | 99 | def count_parameters_in_MB(model): 100 | """ 101 | count all parameters excluding auxiliary 102 | :param model: network whose parameters are counted 103 | :return: parameter count in millions 104 | """ 105 | return sum(v.numel() for name, v in model.named_parameters() if "auxiliary" not in name) / 1e6 106 | 107 | 108 | def save_checkpoint(state, is_best, save): 109 | filename = os.path.join(save, 'checkpoint.pth.tar') 110 | torch.save(state, filename) 111 | if is_best: 112 | best_filename = os.path.join(save, 'model_best.pth.tar') 113 | shutil.copyfile(filename, best_filename) 114 | 115 | 116 | def save(model, model_path): 117 | print('saved to model:', model_path) 118 | torch.save(model.state_dict(), model_path) 119 | 120 | 121 | def load(model, model_path): 122 | print('load from model:', model_path) 123 | model.load_state_dict(torch.load(model_path)) 124 | 125 | 126 | def drop_path(x, drop_prob): 127 | if drop_prob > 0.: 128 | keep_prob = 1. 
- drop_prob 129 | 130 | mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 131 | x.div_(keep_prob) 132 | try: 133 | x.mul_(mask) 134 | except RuntimeError: # dtype mismatch when x is half precision; retry with a half-precision mask 135 | mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 136 | x.mul_(mask) 137 | return x 138 | 139 | 140 | def create_exp_dir(path, scripts_to_save=None): 141 | if not os.path.exists(path): 142 | os.makedirs(path) 143 | print('Experiment dir : {}'.format(path)) 144 | 145 | if scripts_to_save is not None: 146 | if not os.path.exists(os.path.join(path, 'scripts')): 147 | os.mkdir(os.path.join(path, 'scripts')) 148 | for script in scripts_to_save: 149 | dst_file = os.path.join(path, 'scripts', os.path.basename(script)) 150 | shutil.copyfile(script, dst_file) 151 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/collect_results.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import json 7 | import os 8 | import operator 9 | 10 | total_trace = {} 11 | for i in range(1, 800): 12 | path = 'client' + str(i) + '/' + 'acc_trace.json' 13 | if os.path.exists(path): 14 | with open(path, 'r') as json_data: 15 | data = json.load(json_data) 16 | for k, v in data.items(): 17 | total_trace[k] = v 18 | 19 | with open('total_trace.json', 'w') as outfile: 20 | json.dump(total_trace, outfile) 21 | print("total elements:", len(total_trace) ) 22 | 23 | 24 | 25 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/launch_clients.sh: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 
3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | 7 | for (( c=1; c < 600; c++ )) 8 | do 9 | echo "---------------------------------" 10 | echo $PWD 11 | cd "client$c" 12 | echo $PWD 13 | screen -S client -d -m srun --gres=gpu:1 --time=24:00:00 --cpus-per-task=1 python client.py 14 | cd ".." 15 | echo "$PWD" 16 | done 17 | 18 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/read_results.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import json 7 | import os 8 | import operator 9 | 10 | data = {} 11 | 12 | with open('total_trace.json') as json_file: 13 | data = json.load(json_file) 14 | 15 | 16 | sorted_trace = {} 17 | sorted_trace = sorted(data.items(), key=operator.itemgetter(1)) 18 | for k,v in sorted_trace: 19 | print(k, v) 20 | 21 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/server/Classifier.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import torch 7 | import torch.nn as nn 8 | from torch.autograd import Variable 9 | import json 10 | from torch import optim 11 | import numpy as np 12 | 13 | # this is the backbone model 14 | # used to split networks at an MCTS state 15 | class LinearModel(nn.Module): 16 | 17 | def __init__(self, input_dim, output_dim): 18 | super(LinearModel, self).__init__() 19 | self.fc1 = nn.Linear(input_dim, output_dim) 20 | torch.nn.init.xavier_uniform_( self.fc1.weight ) 21 | 22 | def forward(self, x): 23 | y = self.fc1(x) 24 | return y 25 | 26 | # the input will be samples! 27 | class Classifier(): 28 | def __init__(self, samples, input_dim): 29 | self.training_counter = 0 30 | assert input_dim >= 1 31 | assert type(samples) == type({}) 32 | self.input_dim = input_dim 33 | self.samples = samples 34 | self.model = LinearModel(input_dim, 1) 35 | if torch.cuda.is_available(): 36 | self.model.cuda() 37 | self.l_rate = 0.00001 38 | self.optimiser = optim.Adam(self.model.parameters(), lr=self.l_rate, betas=(0.9, 0.999), eps=1e-08) 39 | self.epochs = 10000 40 | self.boundary = -1 41 | self.nets = [] 42 | 43 | 44 | def reinit(self): 45 | torch.nn.init.xavier_uniform_( self.model.fc1.weight ) 46 | # LinearModel has a single layer (fc1), so there is no fc2 to reinitialize 47 | 48 | def update_samples(self, latest_samples): 49 | assert type(latest_samples) == type(self.samples) 50 | sampled_nets = [] 51 | nets_acc = [] 52 | for k, v in latest_samples.items(): 53 | net = json.loads(k) 54 | sampled_nets.append( net ) 55 | nets_acc.append( v ) 56 | self.nets = torch.from_numpy(np.asarray(sampled_nets, dtype=np.float32).reshape(-1, self.input_dim)) 57 | self.acc = torch.from_numpy(np.asarray(nets_acc, dtype=np.float32).reshape(-1, 1)) 58 | self.samples = latest_samples 59 | if torch.cuda.is_available(): 60 | self.nets = self.nets.cuda() 61 | self.acc = self.acc.cuda() 62 | 63 | def train(self): 64 | if self.training_counter == 0: 65 | self.epochs = 20000 66 | else: 67 | self.epochs = 3000 68 | 
self.training_counter += 1 69 | # in a rare case, one branch has no networks 70 | if len(self.nets) == 0: 71 | return 72 | for epoch in range(self.epochs): 73 | epoch += 1 74 | nets = self.nets 75 | acc = self.acc 76 | #clear grads 77 | self.optimiser.zero_grad() 78 | #forward to get predicted values 79 | outputs = self.model.forward( nets ) 80 | loss = nn.MSELoss()(outputs, acc) 81 | loss.backward()# back props 82 | nn.utils.clip_grad_norm_(self.model.parameters(), 5) 83 | self.optimiser.step()# update the parameters 84 | # if epoch % 1000 == 0: 85 | # print('@' + self.__class__.__name__ + ' epoch {}, loss {}'.format(epoch, loss.data)) 86 | 87 | def predict(self, remaining): 88 | assert type(remaining) == type({}) 89 | remaining_archs = [] 90 | for k, v in remaining.items(): 91 | net = json.loads(k) 92 | remaining_archs.append( net ) 93 | remaining_archs = torch.from_numpy(np.asarray(remaining_archs, dtype=np.float32).reshape(-1, self.input_dim)) 94 | if torch.cuda.is_available(): 95 | remaining_archs = remaining_archs.cuda() 96 | outputs = self.model.forward(remaining_archs) 97 | if torch.cuda.is_available(): 98 | remaining_archs = remaining_archs.cpu() 99 | outputs = outputs.cpu() 100 | result = {} 101 | counter = 0 102 | for k in range(0, len(remaining_archs) ): 103 | counter += 1 104 | arch = remaining_archs[k].detach().numpy() 105 | arch_str = json.dumps( arch.tolist() ) 106 | result[ arch_str ] = outputs[k].detach().numpy().tolist()[0] 107 | assert len(result) == len(remaining) 108 | return result 109 | 110 | def split_predictions(self, remaining): 111 | assert type(remaining) == type({}) 112 | samples_badness = {} 113 | samples_goodies = {} 114 | if len(remaining) == 0: 115 | return samples_badness, samples_goodies 116 | predictions = self.predict(remaining) 117 | avg_acc = self.predict_mean() 118 | self.boundary = avg_acc 119 | for k, v in predictions.items(): 120 | if v < avg_acc: 121 | samples_badness[k] = v 122 | else: 123 | samples_goodies[k] = v 124 | 
assert len(samples_badness) + len(samples_goodies) == len(remaining) 125 | return samples_goodies, samples_badness 126 | 127 | 128 | def predict_mean(self): 129 | if len(self.nets) == 0: 130 | return 0 131 | # can we use the actual acc? 132 | outputs = self.model.forward(self.nets) 133 | pred_np = None 134 | if torch.cuda.is_available(): 135 | pred_np = outputs.detach().cpu().numpy() 136 | else: 137 | pred_np = outputs.detach().numpy() 138 | return np.mean(pred_np) 139 | 140 | def split_data(self): 141 | samples_badness = {} 142 | samples_goodies = {} 143 | if len(self.nets) == 0: 144 | return samples_goodies, samples_badness 145 | self.train() 146 | avg_acc = self.predict_mean() 147 | self.boundary = avg_acc 148 | for k, v in self.samples.items(): 149 | if v < avg_acc: 150 | samples_badness[k] = v 151 | else: 152 | samples_goodies[k] = v 153 | assert len(samples_badness) + len(samples_goodies) == len( self.samples ) 154 | return samples_goodies, samples_badness 155 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/server/Node.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | from Classifier import Classifier 7 | import json 8 | import numpy as np 9 | import math 10 | 11 | class Node: 12 | obj_counter = 0 13 | # If a leaf holds >= SPLIT_THRESH samples, we split it into two new nodes. 
14 | 15 | def __init__(self, parent = None, is_good_kid = False, arch_code_len = 0, is_root = False): 16 | # Note: every node is initialized as a leaf; 17 | # only internal nodes are equipped with classifiers to make decisions 18 | if not is_root: 19 | assert type( parent ) == type( self ) 20 | self.is_root = is_root 21 | self.ARCH_CODE_LEN = arch_code_len 22 | self.x_bar = float("inf") 23 | self.n = 0 24 | self.classifier = Classifier({}, self.ARCH_CODE_LEN) 25 | self.parent = parent 26 | self.is_good_kid = is_good_kid 27 | self.uct = 0 28 | 29 | # insert the current node into the kids of its parent 30 | if parent is not None: 31 | self.parent.kids.append(self) 32 | if self.parent.is_leaf == True: 33 | self.parent.is_leaf = False 34 | assert len(self.parent.kids) <= 2 35 | self.kids = [] 36 | self.bag = { } 37 | self.good_kid_data = {} 38 | self.bad_kid_data = {} 39 | 40 | self.is_leaf = True 41 | self.id = Node.obj_counter 42 | 43 | # data for good and bad kids, respectively 44 | Node.obj_counter += 1 45 | 46 | def visit(self): 47 | self.n += 1 48 | 49 | def collect_sample(self, arch, acc): 50 | self.bag[json.dumps(arch) ] = acc 51 | self.n = len( self.bag ) 52 | 53 | def print_bag(self): 54 | print("BAG"+"#"*10) 55 | for k, v in self.bag.items(): 56 | print("arch:", k, "acc:", v) 57 | print("BAG"+"#"*10) 58 | print('\n') 59 | 60 | 61 | def put_in_bag(self, net, acc): 62 | assert type(net) == type([]) 63 | assert type(acc) == type(float(0.1)) 64 | net_k = json.dumps(net) 65 | self.bag[net_k] = acc 66 | 67 | def get_name(self): 68 | # the name is "node" followed by the node id 69 | return "node" + str(self.id) 70 | 71 | def pad_str_to_8chars(self, ins): 72 | if len(ins) <= 14: 73 | ins += ' '*(14 - len(ins) ) 74 | return ins 75 | else: 76 | return ins 77 | 78 | def __str__(self): 79 | name = self.get_name() 80 | name = self.pad_str_to_8chars(name) 81 | name += ( self.pad_str_to_8chars( 'lf:' + str(self.is_leaf)) ) 82 | 83 | val = 0 84 | name += ( self.pad_str_to_8chars( ' val:{0:.4f} 
'.format(round(self.get_xbar(), 4) ) ) ) 85 | name += ( self.pad_str_to_8chars( ' uct:{0:.4f} '.format(round(self.get_uct(), 4) ) ) ) 86 | 87 | name += self.pad_str_to_8chars( 'n:'+str(self.n) ) 88 | name += self.pad_str_to_8chars( 'sp:'+ str(len(self.bag)) ) 89 | name += ( self.pad_str_to_8chars( 'g_k:' + str( len(self.good_kid_data) ) ) ) 90 | name += ( self.pad_str_to_8chars( 'b_k:' + str( len(self.bad_kid_data ) ) ) ) 91 | 92 | parent = '----' 93 | if self.parent is not None: 94 | parent = self.parent.get_name() 95 | parent = self.pad_str_to_8chars(parent) 96 | 97 | name += (' parent:' + parent) 98 | 99 | kids = '' 100 | kid = '' 101 | for k in self.kids: 102 | kid = self.pad_str_to_8chars( k.get_name() ) 103 | kids += kid 104 | name += (' kids:' + kids) 105 | 106 | return name 107 | 108 | 109 | def get_uct(self): 110 | if self.is_root and self.parent == None: 111 | return float('inf') 112 | if self.n == 0: 113 | return float('inf') 114 | Cp = 0.5 115 | return self.x_bar + 2*Cp*math.sqrt( 2* math.log(self.parent.n) / self.n ) 116 | 117 | def get_xbar(self): 118 | return self.x_bar 119 | 120 | def get_n(self): 121 | return self.n 122 | 123 | def get_parent_str(self): 124 | return self.parent.get_name() 125 | 126 | def train(self): 127 | if self.parent == None and self.is_root == True: 128 | # training starts from the bag 129 | assert len(self.bag) > 0 130 | self.classifier.update_samples(self.bag ) 131 | self.good_kid_data, self.bad_kid_data = self.classifier.split_data() 132 | elif self.is_leaf: 133 | if self.is_good_kid: 134 | self.bag = self.parent.good_kid_data 135 | else: 136 | self.bag = self.parent.bad_kid_data 137 | else: 138 | if self.is_good_kid: 139 | self.bag = self.parent.good_kid_data 140 | self.classifier.update_samples(self.parent.good_kid_data ) 141 | self.good_kid_data, self.bad_kid_data = self.classifier.split_data() 142 | else: 143 | self.bag = self.parent.bad_kid_data 144 | self.classifier.update_samples(self.parent.bad_kid_data ) 145 | 
self.good_kid_data, self.bad_kid_data = self.classifier.split_data() 146 | if len(self.bag) == 0: 147 | self.x_bar = float('inf') 148 | self.n = 0 149 | else: 150 | self.x_bar = np.mean( np.array(list(self.bag.values())) ) 151 | self.n = len( self.bag.values() ) 152 | 153 | def predict(self): 154 | if self.parent == None and self.is_root == True and self.is_leaf == False: 155 | self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.bag) 156 | elif self.is_leaf: 157 | if self.is_good_kid: 158 | self.bag = self.parent.good_kid_data 159 | else: 160 | self.bag = self.parent.bad_kid_data 161 | else: 162 | if self.is_good_kid: 163 | self.bag = self.parent.good_kid_data 164 | self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.good_kid_data) 165 | else: 166 | self.bag = self.parent.bad_kid_data 167 | self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.bad_kid_data) 168 | 169 | def sample_arch(self): 170 | if len(self.bag) == 0: 171 | return None 172 | net_str = np.random.choice( list(self.bag.keys() ) ) 173 | del self.bag[net_str] 174 | return json.loads(net_str ) 175 | 176 | def clear_data(self): 177 | self.bag.clear() 178 | self.bad_kid_data.clear() 179 | self.good_kid_data.clear() 180 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/server/net_training.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import numpy as np 7 | import time 8 | import sys 9 | import copy 10 | from datetime import datetime 11 | import collections 12 | import json 13 | import operator 14 | import os 15 | 16 | class Net_Trainer: 17 | best_trace = collections.OrderedDict() 18 | dataset = collections.OrderedDict() 19 | training_trace = collections.OrderedDict() 20 | best_arch = None 21 | best_acc = 0 22 | best_accuracy = 0 23 | counter = 0 24 | 25 | def __init__(self): 26 | raw_data = [] 27 | with open('features.json', 'r') as infile: 28 | raw_data = json.loads( infile.read() ) 29 | for i in raw_data: 30 | arch = i['feature'] 31 | acc = i['acc'] 32 | self.dataset[json.dumps(arch) ] = acc 33 | if acc > self.best_acc: 34 | self.best_acc = acc 35 | self.best_arch = json.dumps( arch ) 36 | print("searching target:", self.best_arch," acc:", self.best_acc) 37 | 38 | print("trainer loaded:", len(self.dataset)," entries" ) 39 | 40 | def print_best_traces(self): 41 | print("%"*20) 42 | print("=====> best accuracy so far:", self.best_accuracy) 43 | sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1)) 44 | for item in sorted_best_traces: 45 | print(item[0],"==>", item[1]) 46 | for item in sorted_best_traces: 47 | print(item[1]) 48 | print("%"*20) 49 | 50 | def train_net(self, network): 51 | # input is a code of an architecture 52 | assert type( network ) == type( [] ) 53 | network_str = json.dumps( network ) 54 | assert network_str in self.dataset 55 | is_found = False 56 | acc = self.dataset[network_str] 57 | # we ensure not to repetitatively sample same architectures 58 | assert network_str not in self.training_trace.keys() 59 | self.training_trace[network_str] = acc 60 | self.counter += 1 61 | if acc > self.best_accuracy: 62 | print("@@@update best state:", network) 63 | print("@@@update best acc:", acc) 64 | print("target str:", self.best_arch) 65 | self.best_accuracy = acc 66 | item = [acc, self.counter] 67 | self.best_trace[network_str] = item 68 | if 
network_str == self.best_arch: 69 | sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1)) 70 | final_results = [] 71 | for item in sorted_best_traces: 72 | final_results.append( item[1] ) 73 | final_results_str = json.dumps(final_results) 74 | with open("result.txt", "a") as f: 75 | f.write(final_results_str + '\n') 76 | print("$$$$$$$$$$$$$$$$$$$CONGRATUGLATIONS$$$$$$$$$$$$$$$$$$$") 77 | os._exit(1) 78 | 79 | return acc 80 | -------------------------------------------------------------------------------- /LaNAS/Distributed_LaNAS/server/search_space.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/LaMCTS/489bd60886f23b0b76b10aa8602ea6722f334ad6/LaNAS/Distributed_LaNAS/server/search_space.zip -------------------------------------------------------------------------------- /LaNAS/LaNAS_NASBench101/Classifier.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import torch 7 | import torch.nn as nn 8 | from torch.autograd import Variable 9 | import json 10 | from torch import optim 11 | import numpy as np 12 | 13 | # this is the backbone model 14 | # to split networks at a MCTS state 15 | class LinearModel(nn.Module): 16 | 17 | def __init__(self, input_dim, output_dim): 18 | super(LinearModel, self).__init__() 19 | self.fc1 = nn.Linear(input_dim, output_dim) 20 | torch.nn.init.xavier_uniform_( self.fc1.weight ) 21 | 22 | def forward(self, x): 23 | y = self.fc1(x) 24 | #print("=====>X_shape:", x.shape) 25 | return y 26 | 27 | # the input will be samples! 
28 | class Classifier(): 29 | def __init__(self, samples, input_dim): 30 | self.training_counter = 0 31 | assert input_dim >= 1 32 | assert type(samples) == type({}) 33 | self.input_dim = input_dim 34 | self.samples = samples 35 | self.model = LinearModel(input_dim, 1) 36 | 37 | if torch.cuda.is_available(): 38 | self.model.cuda() 39 | self.l_rate = 0.00001 40 | self.optimiser = optim.Adam(self.model.parameters(), lr=self.l_rate, betas=(0.9, 0.999), eps=1e-08) 41 | self.epochs = 1 #TODO:revise to 100 42 | self.boundary = -1 43 | self.nets = [] 44 | 45 | def get_params(self): 46 | return self.model.fc1.weight.detach().cpu().numpy(), self.model.fc1.bias.detach().cpu().numpy() 47 | 48 | def reinit(self): 49 | torch.nn.init.xavier_uniform_( self.model.fc1.weight ) 50 | # LinearModel has a single layer (fc1), so there is no fc2 to re-initialize 51 | 52 | def update_samples(self, latest_samples): 53 | assert type(latest_samples) == type(self.samples) 54 | sampled_nets = [] 55 | nets_acc = [] 56 | for k, v in latest_samples.items(): 57 | net = json.loads(k) 58 | sampled_nets.append( net ) 59 | nets_acc.append( v ) 60 | self.nets = torch.from_numpy(np.asarray(sampled_nets, dtype=np.float32).reshape(-1, self.input_dim)) 61 | self.acc = torch.from_numpy(np.asarray(nets_acc, dtype=np.float32).reshape(-1, 1)) 62 | self.samples = latest_samples 63 | if torch.cuda.is_available(): 64 | self.nets = self.nets.cuda() 65 | self.acc = self.acc.cuda() 66 | 67 | def train(self): 68 | if self.training_counter == 0: 69 | self.epochs = 20000 70 | else: 71 | self.epochs = 3000 72 | self.training_counter += 1 73 | # in a rare case, one branch has no networks 74 | if len(self.nets) == 0: 75 | return 76 | for epoch in range(self.epochs): 77 | epoch += 1 78 | nets = self.nets 79 | acc = self.acc 80 | #clear grads 81 | self.optimiser.zero_grad() 82 | #forward to get predicted values 83 | outputs = self.model.forward( nets ) 84 | loss = nn.MSELoss()(outputs, acc) 85 | loss.backward()# back props 86 |
nn.utils.clip_grad_norm_(self.model.parameters(), 5) 87 | self.optimiser.step()# update the parameters 88 | # if epoch % 1000 == 0: 89 | # print('@' + self.__class__.__name__ + ' epoch {}, loss {}'.format(epoch, loss.data)) 90 | 91 | def predict(self, remaining): 92 | assert type(remaining) == type({}) 93 | remaining_archs = [] 94 | for k, v in remaining.items(): 95 | net = json.loads(k) 96 | remaining_archs.append( net ) 97 | remaining_archs = torch.from_numpy(np.asarray(remaining_archs, dtype=np.float32).reshape(-1, self.input_dim)) 98 | if torch.cuda.is_available(): 99 | remaining_archs = remaining_archs.cuda() 100 | outputs = self.model.forward(remaining_archs) 101 | if torch.cuda.is_available(): 102 | remaining_archs = remaining_archs.cpu() 103 | outputs = outputs.cpu() 104 | result = {} 105 | counter = 0 106 | for k in range(0, len(remaining_archs) ): 107 | counter += 1 108 | arch = remaining_archs[k].detach().numpy() 109 | arch_str = json.dumps( arch.tolist() ) 110 | result[ arch_str ] = outputs[k].detach().numpy().tolist()[0] 111 | assert len(result) == len(remaining) 112 | return result 113 | 114 | def split_predictions(self, remaining): 115 | assert type(remaining) == type({}) 116 | samples_badness = {} 117 | samples_goodies = {} 118 | if len(remaining) == 0: 119 | return samples_badness, samples_goodies 120 | predictions = self.predict(remaining) 121 | avg_acc = self.predict_mean() 122 | self.boundary = avg_acc 123 | for k, v in predictions.items(): 124 | if v < avg_acc: 125 | samples_badness[k] = v 126 | else: 127 | samples_goodies[k] = v 128 | assert len(samples_badness) + len(samples_goodies) == len(remaining) 129 | return samples_goodies, samples_badness 130 | 131 | 132 | def predict_mean(self): 133 | if len(self.nets) == 0: 134 | return 0 135 | # can we use the actual acc? 
136 | outputs = self.model.forward(self.nets) 137 | pred_np = None 138 | if torch.cuda.is_available(): 139 | pred_np = outputs.detach().cpu().numpy() 140 | else: 141 | pred_np = outputs.detach().numpy() 142 | return np.mean(pred_np) 143 | 144 | def split_data(self): 145 | samples_badness = {} 146 | samples_goodies = {} 147 | if len(self.nets) == 0: 148 | return samples_goodies, samples_badness 149 | self.train() 150 | avg_acc = self.predict_mean() 151 | self.boundary = avg_acc 152 | for k, v in self.samples.items(): 153 | if v < avg_acc: 154 | samples_badness[k] = v 155 | else: 156 | samples_goodies[k] = v 157 | assert len(samples_badness) + len(samples_goodies) == len( self.samples ) 158 | return samples_goodies, samples_badness 159 | -------------------------------------------------------------------------------- /LaNAS/LaNAS_NASBench101/Node.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | from Classifier import Classifier 7 | import json 8 | import numpy as np 9 | import math 10 | 11 | class Node: 12 | obj_counter = 0 13 | # If a leaf holds >= SPLIT_THRESH, we split into two new nodes.
14 | 15 | def __init__(self, parent = None, is_good_kid = False, arch_code_len = 0, is_root = False): 16 | # Note: every node is initialized as a leaf, 17 | # only internal nodes equip with classifiers to make decisions 18 | if not is_root: 19 | assert type( parent ) == type( self ) 20 | self.is_root = is_root 21 | self.ARCH_CODE_LEN = arch_code_len 22 | self.x_bar = float("inf") 23 | self.n = 0 24 | self.classifier = Classifier({}, self.ARCH_CODE_LEN) 25 | self.parent = parent 26 | self.is_good_kid = is_good_kid 27 | self.uct = 0 28 | self.best_arch = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 0.0] 29 | 30 | #insert curt into the kids of parent 31 | if parent is not None: 32 | self.parent.kids.append(self) 33 | if self.parent.is_leaf == True: 34 | self.parent.is_leaf = False 35 | assert len(self.parent.kids) <= 2 36 | self.kids = [] 37 | self.bag = { } 38 | self.good_kid_data = {} 39 | self.bad_kid_data = {} 40 | 41 | self.is_leaf = True 42 | self.id = Node.obj_counter 43 | 44 | #data for good and bad kids, respectively 45 | Node.obj_counter += 1 46 | 47 | def visit(self): 48 | self.n += 1 49 | 50 | def collect_sample(self, arch, acc): 51 | self.bag[json.dumps(arch) ] = acc 52 | self.n = len( self.bag ) 53 | 54 | def print_bag(self): 55 | print("BAG"+"#"*10) 56 | for k, v in self.bag.items(): 57 | print("arch:", k, "acc:", v) 58 | print("BAG"+"#"*10) 59 | print('\n') 60 | 61 | 62 | def put_in_bag(self, net, acc): 63 | assert type(net) == type([]) 64 | assert type(acc) == type(float(0.1)) 65 | net_k = json.dumps(net) 66 | self.bag[net_k] = acc 67 | 68 | def get_name(self): 69 | # state is a list of jsons 70 | return "node" + str(self.id) 71 | 72 | def pad_str_to_8chars(self, ins): 73 | if len(ins) <= 14: 74 | ins += ' '*(14 - len(ins) ) 75 | return ins 76 | 77 | def 
__str__(self): 78 | name = self.get_name() 79 | name = self.pad_str_to_8chars(name) 80 | name += ( self.pad_str_to_8chars( 'lf:' + str(self.is_leaf)) ) 81 | 82 | val = 0 83 | name += ( self.pad_str_to_8chars( ' val:{0:.4f} '.format(round(self.get_xbar(), 4) ) ) ) 84 | name += ( self.pad_str_to_8chars( ' uct:{0:.4f} '.format(round(self.get_uct(), 4) ) ) ) 85 | 86 | name += self.pad_str_to_8chars( 'n:'+str(self.n) ) 87 | name += self.pad_str_to_8chars( 'sp:'+ str(len(self.bag)) ) 88 | name += ( self.pad_str_to_8chars( 'g_k:' + str( len(self.good_kid_data) ) ) ) 89 | name += ( self.pad_str_to_8chars( 'b_k:' + str( len(self.bad_kid_data ) ) ) ) 90 | name += ( self.pad_str_to_8chars( 'best:' + str( json.dumps(self.best_arch) in self.bag ) ) ) 91 | 92 | 93 | parent = '----' 94 | if self.parent is not None: 95 | parent = self.parent.get_name() 96 | parent = self.pad_str_to_8chars(parent) 97 | 98 | name += (' parent:' + parent) 99 | 100 | kids = '' 101 | kid = '' 102 | for k in self.kids: 103 | kid = self.pad_str_to_8chars( k.get_name() ) 104 | kids += kid 105 | name += (' kids:' + kids) 106 | 107 | return name 108 | 109 | 110 | def get_uct(self, Cp = 0.000002): 111 | if self.is_root and self.parent == None: 112 | return float('inf') 113 | if self.n == 0: 114 | return float('inf') 115 | return self.x_bar + 2*Cp*math.sqrt( 2* math.log(self.parent.n) / self.n ) 116 | 117 | def get_xbar(self): 118 | return self.x_bar 119 | 120 | def get_n(self): 121 | return self.n 122 | 123 | def get_parent_str(self): 124 | return self.parent.get_name() 125 | 126 | def train(self): 127 | if self.parent == None and self.is_root == True: 128 | # training starts from the bag 129 | assert len(self.bag) > 0 130 | self.classifier.update_samples(self.bag ) 131 | self.good_kid_data, self.bad_kid_data = self.classifier.split_data() 132 | elif self.is_leaf: 133 | if self.is_good_kid: 134 | self.bag = self.parent.good_kid_data 135 | else: 136 | self.bag = self.parent.bad_kid_data 137 | else: 138 | if 
self.is_good_kid: 139 | self.bag = self.parent.good_kid_data 140 | self.classifier.update_samples(self.parent.good_kid_data ) 141 | self.good_kid_data, self.bad_kid_data = self.classifier.split_data() 142 | else: 143 | self.bag = self.parent.bad_kid_data 144 | self.classifier.update_samples(self.parent.bad_kid_data ) 145 | self.good_kid_data, self.bad_kid_data = self.classifier.split_data() 146 | if len(self.bag) == 0: 147 | self.x_bar = float('inf') 148 | self.n = 0 149 | else: 150 | self.x_bar = np.mean( np.array(list(self.bag.values())) ) 151 | self.n = len( self.bag.values() ) 152 | 153 | def predict(self): 154 | if self.parent == None and self.is_root == True and self.is_leaf == False: 155 | self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.bag) 156 | elif self.is_leaf: 157 | if self.is_good_kid: 158 | self.bag = self.parent.good_kid_data 159 | else: 160 | self.bag = self.parent.bad_kid_data 161 | else: 162 | if self.is_good_kid: 163 | self.bag = self.parent.good_kid_data 164 | self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.good_kid_data) 165 | else: 166 | self.bag = self.parent.bad_kid_data 167 | self.good_kid_data, self.bad_kid_data = self.classifier.split_predictions(self.parent.bad_kid_data) 168 | 169 | def sample_arch(self): 170 | if len(self.bag) == 0: 171 | return None 172 | net_str = np.random.choice( list(self.bag.keys() ) ) 173 | del self.bag[net_str] 174 | return json.loads(net_str ) 175 | 176 | def clear_data(self): 177 | self.bag.clear() 178 | self.bad_kid_data.clear() 179 | self.good_kid_data.clear() 180 | -------------------------------------------------------------------------------- /LaNAS/LaNAS_NASBench101/README.md: -------------------------------------------------------------------------------- 1 | ## LaNAS on NASBench-101 2 | 3 | This folder has everything you need to test LaNAS on NASBench-101. 
Before you start, please download a preprocessed NASBench-101 from AlphaX (see section Download the dataset). 4 | ``` 5 | place nasbench_dataset in LaNAS/LaNAS_NASBench101 6 | python MCTS.py 7 | ``` 8 | The program stops once it finds the global optimum. The search usually takes a few hours to a day. Once it finishes, the search results are appended as the last row of result.txt. Here is an example of how to interpret a result row. 9 | 10 | >[[0.9313568472862244, 1], [0.9326255321502686, 47], [0.9332265059153239, 51], [0.9342948794364929, 72], [0.9343950351079305, 76], [0.93873530626297, 81], [0.9388020833333334, 224], [0.9388688604036967, 472], [0.9391693472862244, 639], [0.9407051205635071, 740], [0.9420072237650553, 831], [0.9423410892486572, 1545], [0.943175752957662, 3259]] 11 | 12 | Each pair is [best validation accuracy so far, number of samples evaluated when it was reached]. Here the first sample reaches 0.9313568472862244, the 47th sample raises the best validation accuracy to 0.9326255321502686, and LaNAS finds the best network after 3259 samples. The results of each new experiment are appended as a new row in result.txt. 13 | 14 | We also provide the results of our past runs in our_past_results.txt for comparison; feel free to reproduce them with this release. 15 | 16 | ## About NASBench-101 17 | Please check AlphaX to see our encoding of NASBench. 18 | 19 | ## About Predictor based Search Methods 20 | 21 | The simplest way to see why a predictor alone does not work is to try it on the 10-dimensional continuous Ackley function (in functions.py in LA-MCTS). In practice, the search space has 10^30 architectures; you CANNOT predict every one of them, so whatever predictor you use will fail. 22 | 23 | Why do predictors work well on NASBench? The main issue with predictor-based methods is that they need to predict every architecture in the search space to perform well, and they lack an acquisition function (as in Bayesian Optimization) to trade off exploration and exploitation. NASBench only has 4.2*10^5 networks, which can be predicted in a second.
We show that a simple MLP performs well (fewer than 1000 samples) when it predicts on all the architectures in NASBench. In addition, the following figure visualizes the distribution of network accuracy on NASBench-101, with the y-axis in log scale. It is therefore not surprising that even random search finds a reasonable result, since most networks are already quite accurate. 24 | 25 |

26 | [Figure: distribution of network accuracy on NASBench-101, y-axis in log scale] 27 |
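To make the "predict every architecture" point concrete, here is a minimal sketch of a predictor-based search on a toy space that is small enough to enumerate. Everything in it is an assumption for illustration: `toy_accuracy` is a made-up stand-in for a benchmark lookup (not NASBench data), the additive per-bit predictor replaces the MLP mentioned above, and the function names are hypothetical rather than part of this repo.

```python
import itertools
import random


def toy_accuracy(arch):
    """Hypothetical stand-in for a benchmark lookup (not NASBench data)."""
    return sum((i + 1) * bit for i, bit in enumerate(arch)) / 100.0


def fit_additive_predictor(samples):
    """Estimate, per position, how much setting the bit to 1 changes accuracy."""
    dim = len(next(iter(samples)))
    effects = []
    for pos in range(dim):
        ones = [acc for arch, acc in samples.items() if arch[pos] == 1]
        zeros = [acc for arch, acc in samples.items() if arch[pos] == 0]
        if ones and zeros:
            effects.append(sum(ones) / len(ones) - sum(zeros) / len(zeros))
        else:
            effects.append(0.0)  # no evidence for this position yet
    return effects


def predictor_search(dim=8, init_samples=10, rounds=5, seed=0):
    rng = random.Random(seed)
    # The whole search space is enumerated; this is only feasible because it is tiny.
    space = list(itertools.product([0, 1], repeat=dim))
    samples = {arch: toy_accuracy(arch) for arch in rng.sample(space, init_samples)}
    for _ in range(rounds):
        effects = fit_additive_predictor(samples)
        # Score every unseen architecture, then evaluate the top-ranked one.
        unseen = [arch for arch in space if arch not in samples]
        pick = max(unseen, key=lambda a: sum(e * b for e, b in zip(effects, a)))
        samples[pick] = toy_accuracy(pick)
    best = max(samples, key=samples.get)
    return best, samples[best], len(samples)


if __name__ == "__main__":
    best_arch, best_acc, n_evaluated = predictor_search()
    print("best:", best_arch, "acc:", best_acc, "evaluated:", n_evaluated)
```

The loop always picks the highest predicted score, so it has no exploration term, and it ranks all 2^8 candidates in every round. Both properties are affordable here and on a benchmark of NASBench's size, but not on a space with 10^30 architectures.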

28 | 29 | In fact, the purpose of neural predictors, e.g. Graph Neural Network-based predictors, is very similar to that of the surrogate model (e.g. a Gaussian Process) used in Bayesian Optimization. The original NASBench-101 paper chose a set of very good baselines for comparisons. 30 | 31 | 32 | Why do predictors work in the NASNet or EfficientNet search spaces? These search spaces are constructed with very strong prior experience, and the final network accuracy can be boosted with the bag of tricks listed here. 33 | 34 | In this implementation, we use an MLP to predict samples in order to assign an architecture to a partition. This is an engineering simplification and can be replaced by a hit-and-run sampler, i.e. sampling from a convex polytope. We do replace it with a sampling method in one-shot LaNAS, i.e. Fig. 6(c) in LaNAS; see also LA-MCTS. Thank you. 35 | -------------------------------------------------------------------------------- /LaNAS/LaNAS_NASBench101/extract_end_time.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | 7 | import json 8 | import csv 9 | 10 | 11 | 12 | with open("our_past_results.txt", "r") as f: 13 | l = f.readlines() 14 | 15 | 16 | list_net = [] 17 | for i in range(len(l)): 18 | l[i] = l[i].rstrip('\n') 19 | list_net.append(json.loads(l[i])) 20 | #print(json.loads(l[i])) 21 | print(str(json.loads(l[i])[-1][1]), end=", ") 22 | print("") 23 | 24 | -------------------------------------------------------------------------------- /LaNAS/LaNAS_NASBench101/net_training.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved.
3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import numpy as np 7 | import time 8 | import sys 9 | import copy 10 | from datetime import datetime 11 | import collections 12 | import json 13 | import operator 14 | import os 15 | 16 | class Net_Trainer: 17 | 18 | def __init__(self): 19 | self.best_trace = collections.OrderedDict() 20 | self.dataset = collections.OrderedDict() 21 | self.training_trace = collections.OrderedDict() 22 | self.best_arch = None 23 | self.best_acc = 0 24 | self.best_accuracy = 0 25 | self.counter = 0 26 | 27 | raw_data = [] 28 | with open('nasbench_dataset', 'r') as infile: 29 | raw_data = json.loads( infile.read() ) 30 | for i in raw_data: 31 | arch = i['feature'] 32 | acc = i['acc'] 33 | self.dataset[json.dumps(arch) ] = acc 34 | if acc > self.best_acc: 35 | self.best_acc = acc 36 | self.best_arch = json.dumps( arch ) 37 | print("searching target:", self.best_arch," acc:", self.best_acc) 38 | 39 | print("trainer loaded:", len(self.dataset)," entries" ) 40 | 41 | def print_best_traces(self): 42 | print("%"*20) 43 | print("=====> best accuracy so far:", self.best_accuracy) 44 | sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1)) 45 | for item in sorted_best_traces: 46 | print(item[0],"==>", item[1]) 47 | for item in sorted_best_traces: 48 | print(item[1]) 49 | print("%"*20) 50 | 51 | def train_net(self, network): 52 | # input is a code of an architecture 53 | assert type( network ) == type( [] ) 54 | network_str = json.dumps( network ) 55 | assert network_str in self.dataset 56 | is_found = False 57 | acc = self.dataset[network_str] 58 | # we ensure not to repetitatively sample same architectures 59 | assert network_str not in self.training_trace.keys() 60 | self.training_trace[network_str] = acc 61 | self.counter += 1 62 | if acc > self.best_accuracy: 63 | print("@@@update best state:", network) 64 | print("@@@update 
best acc:", acc) 65 | print("target str:", self.best_arch) 66 | self.best_accuracy = acc 67 | item = [acc, self.counter] 68 | self.best_trace[network_str] = item 69 | if network_str == self.best_arch: 70 | sorted_best_traces = sorted(self.best_trace.items(), key=operator.itemgetter(1)) 71 | final_results = [] 72 | for item in sorted_best_traces: 73 | final_results.append( item[1] ) 74 | final_results_str = json.dumps(final_results) 75 | with open("result.txt", "a") as f: 76 | f.write(final_results_str + '\n') 77 | print("$$$$$$$$$$$$$$$$$$$CONGRATULATIONS$$$$$$$$$$$$$$$$$$$") 78 | os._exit(0) # the global optimum was found, so exit with a success status 79 | 80 | return acc 81 | -------------------------------------------------------------------------------- /LaNAS/LaNet/CIFAR10/README.md: -------------------------------------------------------------------------------- 1 | ## Testing LaNet 2 | 3 | 1. Download the pre-trained checkpoint from here, then place and unzip it in this folder. 4 | 5 | 2. Run the following command to test. 6 | ``` 7 | python test.py --checkpoint ./lanas_128_99.03 --layers 24 --init_ch 128 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' 8 | ``` 9 | 10 | ```[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]``` is the best network found during the search. The snapshot below shows the top-performing architectures (bottom to top) found during the distributed search. You can get the whole trace from here. 11 | 12 |

13 | [Figure: snapshot of the top-performing architectures found during the distributed search] 14 |
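Before passing a long `--arch` string on the command line, it can help to check that it parses cleanly. The snippet below is a minimal sketch, assuming the string is valid JSON (true for the example above). `parse_arch` is a hypothetical helper and not part of this repo, and how `test.py` decodes the list into cells is not covered here.

```python
import json


def parse_arch(arch_str):
    """Parse an --arch string into a list of ints (hypothetical helper)."""
    arch = json.loads(arch_str)
    if not isinstance(arch, list) or not all(isinstance(v, int) for v in arch):
        raise ValueError("expected a JSON list of integers")
    return arch


best = parse_arch('[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]')
print(len(best), "entries, distinct values:", sorted(set(best)))
```

A typo such as a missing bracket or a stray quote raises an error here, so it surfaces before a model is built.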

15 | 16 | 17 | 18 | ## Training LaNet 19 | 1. Install cutmix 20 | 21 | ```pip install git+https://github.com/ildoonet/cutmix``` 22 | 23 | 2. Run training with the following command. 24 | 25 | ``` 26 | mkdir checkpoints 27 | python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 1500 28 | ``` 29 | 30 | - **Training on ImageNet** 31 | 32 | Please use the training pipeline from Pytorch-Image-Models. The procedure is as follows: 33 | 1. Get the network from train.py, line 121. 34 | 2. Go to Pytorch-Image-Models. 35 | 3. Find pytorch-image-models/blob/master/timm/models/factory.py and replace line 57 as follows: 36 | ``` 37 | # model = create_fn(**model_args, **kwargs) 38 | model = our_network # the network object obtained from train.py 39 | ``` 40 | Our ImageNet pipeline will be released soon; stay tuned. 41 | 42 | 43 | -------------------------------------------------------------------------------- /LaNAS/LaNet/CIFAR10/model.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree.
5 | # 6 | from operations import * 7 | 8 | 9 | class Cell(nn.Module): 10 | 11 | def __init__(self, genotype, C_prev_prev, C_prev, C, reduction, reduction_prev): 12 | 13 | super(Cell, self).__init__() 14 | 15 | if reduction_prev: 16 | self.preprocess0 = FactorizedReduce(C_prev_prev, C) 17 | else: 18 | self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0) 19 | self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0) 20 | 21 | if reduction: 22 | op_names, indices = zip(*genotype.reduce) 23 | concat = genotype.reduce_concat 24 | else: 25 | op_names, indices = zip(*genotype.normal) 26 | concat = genotype.normal_concat 27 | self._compile(C, op_names, indices, concat, reduction) 28 | 29 | def _compile(self, C, op_names, indices, concat, reduction): 30 | 31 | assert len(op_names) == len(indices) 32 | 33 | self._steps = len(op_names) // 2 34 | self._concat = concat 35 | self.multiplier = len(concat) 36 | 37 | self._ops = nn.ModuleList() 38 | for name, index in zip(op_names, indices): 39 | stride = 2 if reduction and index < 2 else 1 40 | op = OPS[name](C, stride, True) 41 | self._ops += [op] 42 | self._indices = indices 43 | 44 | def forward(self, s0, s1, drop_prob): 45 | 46 | s0 = self.preprocess0(s0) 47 | s1 = self.preprocess1(s1) 48 | 49 | states = [s0, s1] 50 | for i in range(self._steps): 51 | h1 = states[self._indices[2 * i]] 52 | h2 = states[self._indices[2 * i + 1]] 53 | op1 = self._ops[2 * i] 54 | op2 = self._ops[2 * i + 1] 55 | h1 = op1(h1) 56 | h2 = op2(h2) 57 | 58 | if self.training and drop_prob > 0.: 59 | if not isinstance(op1, Identity): 60 | h1 = drop_path(h1, drop_prob) 61 | if not isinstance(op2, Identity): 62 | h2 = drop_path(h2, drop_prob) 63 | s = h1 + h2 64 | states += [s] 65 | return torch.cat([states[i] for i in self._concat], dim=1) 66 | 67 | def drop_path(x, drop_prob): 68 | if drop_prob > 0.: 69 | keep_prob = 1. 
- drop_prob 70 | 71 | # Create the mask on x's own device and dtype so this also works on CPU and in half precision. 72 | mask = torch.empty(x.size(0), 1, 1, 1, dtype=x.dtype, device=x.device).bernoulli_(keep_prob) 73 | x.div_(keep_prob) 74 | x.mul_(mask) 75 | 76 | 77 | 78 | return x 79 | 80 | 81 | 82 | 83 | class AuxiliaryHeadCIFAR(nn.Module): 84 | 85 | def __init__(self, C, num_classes): 86 | """assuming input size 8x8""" 87 | super(AuxiliaryHeadCIFAR, self).__init__() 88 | 89 | self.features = nn.Sequential( 90 | nn.ReLU(inplace=True), 91 | nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False), # image size = 2 x 2 92 | nn.Conv2d(C, 128, 1, bias=False), 93 | nn.BatchNorm2d(128), 94 | nn.ReLU(inplace=True), 95 | nn.Conv2d(128, 768, 2, bias=False), 96 | nn.BatchNorm2d(768), 97 | nn.ReLU(inplace=True) 98 | ) 99 | self.classifier = nn.Linear(768, num_classes) 100 | 101 | def forward(self, x): 102 | x = self.features(x) 103 | x = self.classifier(x.view(x.size(0), -1)) 104 | return x 105 | 106 | 107 | class NetworkCIFAR(nn.Module): 108 | 109 | def __init__(self, C, num_classes, layers, auxiliary, genotype): 110 | super(NetworkCIFAR, self).__init__() 111 | 112 | self._layers = layers 113 | self._auxiliary = auxiliary 114 | 115 | stem_multiplier = 3 116 | C_curr = stem_multiplier * C 117 | self.stem = nn.Sequential( 118 | nn.Conv2d(3, C_curr, 3, padding=1, bias=False), 119 | nn.BatchNorm2d(C_curr) 120 | ) 121 | 122 | C_prev_prev, C_prev, C_curr = C_curr, C_curr, C 123 | self.cells = nn.ModuleList() 124 | reduction_prev = False 125 | for i in range(layers): 126 | if i in [layers // 3, 2 * layers // 3]: 127 | C_curr *= 2 128 | reduction = True 129 | else: 130 | reduction = False 131 | cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, reduction_prev) 132 | reduction_prev = reduction 133 | self.cells += [cell] 134 | C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr 135 | if i == 2 * layers // 3: 136 | C_to_auxiliary = C_prev 137 | 138 | if auxiliary: 139 |
self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes) 140 | self.global_pooling = nn.AdaptiveAvgPool2d(1) 141 | self.classifier = nn.Linear(C_prev, num_classes) 142 | 143 | def forward(self, input): 144 | logits_aux = None 145 | s0 = s1 = self.stem(input) 146 | for i, cell in enumerate(self.cells): 147 | if self.training: 148 | s0, s1 = s1, cell(s0, s1, self.drop_path_prob) 149 | else: 150 | s0, s1 = s1, cell(s0, s1, 0) 151 | if i == 2 * self._layers // 3: 152 | if self._auxiliary and self.training: 153 | logits_aux = self.auxiliary_head(s1) 154 | out = self.global_pooling(s1) 155 | logits = self.classifier(out.view(out.size(0), -1)) 156 | return logits, logits_aux 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | -------------------------------------------------------------------------------- /LaNAS/LaNet/CIFAR10/operations.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | 10 | OPS = { 11 | 'avg_pool_3x3': lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False), 12 | 'max_pool_2x2' : lambda C, stride, affine: nn.MaxPool2d(2, stride=stride, padding=0), 13 | 'max_pool_3x3': lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1), 14 | 'max_pool_5x5': lambda C, stride, affine: nn.MaxPool2d(5, stride=stride, padding=2), 15 | 'skip_connect': lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine), 16 | 'sep_conv_3x3': lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine), 17 | 'sep_conv_5x5': lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine), 18 | 'sep_conv_7x7': lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine), 19 | 'dil_conv_3x3' : lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine), 20 | 'dil_conv_5x5' : lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine), 21 | 'conv_1x1' : lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False), 22 | 'conv_3x3' : lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False), 23 | 'conv_5x5' : lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False), 24 | } 25 | 26 | 27 | class ReLUConvBN(nn.Module): 28 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 29 | 30 | super(ReLUConvBN, self).__init__() 31 | 32 | self.op = nn.Sequential( 33 | nn.ReLU(inplace=False), 34 | Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False), 35 | nn.BatchNorm2d(C_out, affine=affine) 36 | ) 37 | 38 | def forward(self, x): 39 | return self.op(x) 40 | 41 | class Conv2d(nn.Conv2d): 42 | 43 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 44 | padding=0, dilation=1, 
groups=1, bias=True): 45 | super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride, 46 | padding, dilation, groups, bias) 47 | 48 | def forward(self, x): 49 | weight = self.weight 50 | weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2, 51 | keepdim=True).mean(dim=3, keepdim=True) 52 | weight = weight - weight_mean 53 | std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5 54 | weight = weight / std.expand_as(weight) 55 | return F.conv2d(x, weight, self.bias, self.stride, 56 | self.padding, self.dilation, self.groups) 57 | 58 | 59 | class SepConv(nn.Module): 60 | 61 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 62 | super(SepConv, self).__init__() 63 | 64 | self.op = nn.Sequential( 65 | nn.ReLU(inplace=False), 66 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, 67 | groups=C_in, bias=False), 68 | Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False), 69 | nn.BatchNorm2d(C_in, affine=affine), 70 | nn.ReLU(inplace=False), 71 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding, 72 | groups=C_in, bias=False), 73 | Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 74 | nn.BatchNorm2d(C_out, affine=affine), 75 | ) 76 | 77 | def forward(self, x): 78 | return self.op(x) 79 | 80 | 81 | class Identity(nn.Module): 82 | 83 | def __init__(self): 84 | super(Identity, self).__init__() 85 | 86 | def forward(self, x): 87 | return x 88 | 89 | 90 | class DilConv(nn.Module): 91 | def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True): 92 | super(DilConv, self).__init__() 93 | self.op = nn.Sequential( 94 | nn.ReLU(inplace=False), 95 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, 96 | groups=C_in, bias=False), 97 | Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 98 | nn.BatchNorm2d(C_out, affine=affine), 99 | ) 100 | 101 | def forward(self, x): 102 | 
return self.op(x) 103 | 104 | 105 | class FactorizedReduce(nn.Module): 106 | 107 | def __init__(self, C_in, C_out, affine=True): 108 | 109 | super(FactorizedReduce, self).__init__() 110 | 111 | assert C_out % 2 == 0 112 | 113 | self.relu = nn.ReLU(inplace=False) 114 | self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 115 | self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 116 | self.bn = nn.BatchNorm2d(C_out, affine=affine) 117 | 118 | def forward(self, x): 119 | x = self.relu(x) 120 | 121 | out = torch.cat([self.conv_1(x), self.conv_2(x[:, :, 1:, 1:])], dim=1) 122 | out = self.bn(out) 123 | return out 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | -------------------------------------------------------------------------------- /LaNAS/LaNet/CIFAR10/test.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import sys 7 | import utils 8 | import argparse 9 | import torch.nn as nn 10 | import torch.utils 11 | import torchvision.datasets as dset 12 | import torch.backends.cudnn as cudnn 13 | from collections import namedtuple 14 | from model import NetworkCIFAR as Network 15 | from utils import * 16 | from torch.utils.data.dataset import Subset 17 | import logging 18 | from nasnet_set import * 19 | 20 | 21 | 22 | parser = argparse.ArgumentParser("cifar10") 23 | parser.add_argument('--data', type=str, default='../data', help='location of the data corpus') 24 | parser.add_argument('--batch_size', type=int, default=96, help='batch size') 25 | parser.add_argument('--lr', type=float, default=0.025, help='init learning rate') 26 | parser.add_argument('--momentum', type=float, default=0.9, help='momentum') 27 | parser.add_argument('--wd', type=float, default=3e-4, help='weight decay') 28 | parser.add_argument('--report_freq', type=float, default=50, help='report frequency') 29 | parser.add_argument('--gpu', type=int, default=0, help='gpu device id') 30 | parser.add_argument('--epochs', type=int, default=600, help='num of training epochs') 31 | parser.add_argument('--layers', type=int, default=24, help='total number of layers') 32 | parser.add_argument('--init_ch', type=int, default=36, help='num of init channels') 33 | parser.add_argument('--model_path', type=str, default='saved_models', help='path to save the model') 34 | parser.add_argument('--auxiliary_weight', type=float, default=0.4, help='weight for auxiliary loss') 35 | parser.add_argument('--cutout', action='store_true', default=False, help='use cutout') 36 | parser.add_argument('--cutout_length', type=int, default=16, help='cutout length') 37 | parser.add_argument('--drop_path_prob', type=float, default=0.2, help='drop path probability') 38 | parser.add_argument('--seed', type=int, default=0, help='random seed') 39 | parser.add_argument('--arch', type=str, default='', help='which architecture to use') 40 | 
parser.add_argument('--checkpoint', type=str, default='', help='load from checkpoint') 41 | parser.add_argument('--save', type=str, default='EXP', help='experiment name') 42 | 43 | 44 | 45 | 46 | 47 | args = parser.parse_args() 48 | 49 | 50 | net = eval(args.arch) 51 | print(net) 52 | code = gen_code_from_list(net, node_num=int((len(net) / 4))) 53 | genotype = translator([code, code], max_node=int((len(net) / 4))) 54 | print(genotype) 55 | 56 | 57 | 58 | 59 | 60 | log_format = '%(asctime)s %(message)s' 61 | logging.basicConfig(stream=sys.stdout, level=logging.INFO, 62 | format=log_format, datefmt='%m/%d %I:%M:%S %p') 63 | fh = logging.FileHandler(os.path.join(args.checkpoint, 'log.txt')) 64 | fh.setFormatter(logging.Formatter(log_format)) 65 | logging.getLogger().addHandler(fh) 66 | 67 | 68 | 69 | def main(): 70 | 71 | torch.cuda.set_device(args.gpu) 72 | cudnn.benchmark = True 73 | cudnn.enabled = True 74 | 75 | logging.info('gpu device = %d' % args.gpu) 76 | logging.info("args = %s", args) 77 | 78 | 79 | model = Network(args.init_ch, 10, args.layers, True, genotype).cuda() 80 | 81 | logging.info("param size = %fMB", utils.count_parameters_in_MB(model)) 82 | 83 | checkpoint = torch.load(args.checkpoint + '/top1.pt') 84 | model.load_state_dict(checkpoint['model_state_dict']) 85 | criterion = nn.CrossEntropyLoss().cuda() 86 | 87 | 88 | CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124] 89 | CIFAR_STD = [0.24703233, 0.24348505, 0.26158768] 90 | 91 | valid_transform = transforms.Compose([ 92 | transforms.ToTensor(), 93 | transforms.Normalize(CIFAR_MEAN, CIFAR_STD), 94 | ]) 95 | 96 | 97 | 98 | 99 | valid_queue = torch.utils.data.DataLoader( 100 | dset.CIFAR10(root=args.data, train=False, transform=valid_transform), 101 | batch_size=args.batch_size, shuffle=True, num_workers=2, pin_memory=True) 102 | 103 | 104 | valid_acc, valid_obj = infer(valid_queue, model, criterion) 105 | logging.info('valid_acc: %f', valid_acc) 106 | 107 | 108 | 109 | def infer(valid_queue, model, 
criterion): 110 | 111 | objs = utils.AverageMeter() 112 | top1 = utils.AverageMeter() 113 | top5 = utils.AverageMeter() 114 | model.eval() 115 | 116 | for step, (x, target) in enumerate(valid_queue): 117 | x = x.cuda() 118 | target = target.cuda(non_blocking=True) 119 | 120 | with torch.no_grad(): 121 | logits, _ = model(x) 122 | loss = criterion(logits, target) 123 | 124 | prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5)) 125 | n = x.size(0) 126 | objs.update(loss.item(), n) 127 | top1.update(prec1.item(), n) 128 | top5.update(prec5.item(), n) 129 | 130 | 131 | if step % args.report_freq == 0: 132 | logging.info('>>Validation: %03d %e %f %f', step, objs.avg, top1.avg, top5.avg) 133 | 134 | 135 | 136 | return top1.avg, objs.avg 137 | 138 | 139 | 140 | if __name__ == '__main__': 141 | main() -------------------------------------------------------------------------------- /LaNAS/LaNet/README.md: -------------------------------------------------------------------------------- 1 | CIFAR10 folder currently contains the test and training pipeline using the NASNet search space. 2 | 3 | The code for EfficientNet search space on ImageNet will be released later. 4 | 5 | ## Performance of LaNet 6 | **CIFAR-10**: 99.03% top-1 using NASNet search space, SoTA result without using ImageNet or transfer learning. 7 | 8 | | Model | LaNet | EfficientNet-B7 | GPIPE | PyramidNet | XNAS | 9 | | -------------- | ---------- | --------- | ---------- | -------------- | -------------- | 10 | | top-1 | 99.03 | 98.9 | 99.0 | 98.64 | 98.4 | 11 | | use ImageNet | X | | | X | X | 12 | 13 | 14 | **ImageNet**: 77.7% top-1@240 MFLOPS, 80.8% top-1@600 MFLOPS using EfficientNet search space, SoTA results on the efficentNet space. 
15 | 16 | 17 | | Model | LaNet | OFA | FBNetV2-F4 | MobileNet-V3 | FBNet-B | 18 | | -------------- | ---------- | --------- | ---------- | -------------- | -------------- | 19 | | top-1 | 77.7 | 76.9 | 76.0 | 75.2 | 74.1 | 20 | | MFLOPS | 240 | 230 | 238 | 219 | 295 | 21 | 22 | | Model | LaNet | OFA | FBNetV3 | EfficientNet-B1| 23 | | -------------- | ---------- | --------- | -----------| -------------- | 24 | | top-1 | 80.8 | 80.0 | 79.6 | 79.1 | 25 | | MFLOPS | 600 | 595 | 544 | 700 | 26 | 27 | 28 | **Applying LaNet to detection**: Compared to NAS-FCOS in CVPR-2020, 29 | | Backbone | Decoder | FLOPS(G) | AP | 30 | | ----------------- | ------------ | -------------- |-------| 31 | | LaNet | FPN-FCOS | 35.22 | 36.5 | 32 | | MobileNetV2 | FPN-FCOS | 105.4 | 31.2 | 33 | | MobileNetV2 | NAS-FCOS | 39.3 | 32.0 | 34 | 35 | We will release the ImageNet model, search framework, training pipeline, and their applications to detection and segmentation soon; stay tuned. 36 | 37 | 38 | 39 | ## Bag of Tricks 40 | Here are the training heuristics we used in our project: 41 | 42 | - ***Data Augmentation*** 43 | > We use CutOut, CutMix and RandAugment. Pytorch-Image-Models has a very nice implementation, but keep an eye on SoTA data augmentation techniques. 44 | 45 | - ***Distillation*** 46 | 47 | >a) The main source of the performance improvement in the recent NAS EfficientNet papers. 48 | It seems that training a student network together with a teacher from scratch can further improve the current SoTA, 49 | better than transferring weights from a fancy supernet. 50 | 51 | >b) Using a better teacher helps. 52 | 53 | >c) A better teacher may require larger images than the student; use interpolation to resize the batch before feeding it to the student and the teacher. 54 | 55 | - ***Training Hyper-parameters*** 56 | > Tune dropout, drop path, and your other training hyper-parameters. The learning rate should be neither too large nor too small; check your loss progress.
57 | 58 | - ***EMA*** 59 | > Using Exponential Moving Average (EMA) in your models (CNNs, detection models, Transformers, unsupervised models, or any other NLP or CV model) helps performance, especially when training finishes in a small number of epochs. 60 | 61 | - ***Testing different Crops*** 62 | > Try different crop percentages at test time; this usually gives an improvement of about 0.1%. 63 | 64 | - ***Longer epochs*** 65 | > Make sure your model is sufficiently trained. 66 | 67 | -------------------------------------------------------------------------------- /LaNAS/README.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | LaNAS is an application of LA-MCTS to Neural Architecture Search (NAS), though the more general approach (LA-MCTS) was inspired by LaNAS. Here is what is included in this release: 3 | 4 | - **Evaluation on NASBench-101** : Evaluating LaNAS on NASBench-101 without training models. 5 | 6 | - **Our Searched Models, LaNet**: SoTA results: • 99.03% on CIFAR-10 • 77.7% @ 240MFLOPS on ImageNet. 7 | 8 | - **One/Few-shot LaNAS**: Using a supernet to evaluate the model, obtaining results within a few GPU days. 9 | 10 | - **Distributed LaNAS**: Distributed framework for LaNAS, usable with hundreds of GPUs. 11 | 12 | - **Training heuristics used**: We list all tricks used in ImageNet training to reach SoTA. 13 | 14 | # Publication 15 | 16 | Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search
17 | Linnan Wang (Brown University), Saining Xie (Facebook AI Research), Teng Li (Facebook AI Research), Rodrigo Fonseca (Brown University), Yuandong Tian (Facebook AI Research)
18 | 19 | And special thanks to Yiyang Zhao (Worcester Polytechnic Institute) for the enormous help. 20 | 21 | # Performance on NASBench-101 22 | We have strictly followed the NASBench-101 guidelines in benchmarking the results; please see our paper for details. 23 |

24 | 25 |

26 | 27 | # Performance of Searched Networks 28 | **CIFAR-10**: 99.03% top-1 using NASNet search space, SoTA result without using ImageNet or transfer learning. 29 | 30 | | Model | LaNet | EfficientNet-B7 | GPIPE | PyramidNet | XNAS | 31 | | -------------- | ---------- | --------- | ---------- | -------------- | -------------- | 32 | | top-1 | 99.03 | 98.9 | 99.0 | 98.64 | 98.4 | 33 | | use ImageNet | X | | | X | X | 34 | 35 | 36 | **ImageNet**: 77.7% top-1@240 MFLOPS, 80.8% top-1@600 MFLOPS using EfficientNet search space, SoTA results on the EfficientNet space. 37 | 38 | 39 | | Model | LaNet | OFA | FBNetV2-F4 | MobileNet-V3 | FBNet-B | 40 | | -------------- | ---------- | --------- | ---------- | -------------- | -------------- | 41 | | top-1 | 77.7 | 76.9 | 76.0 | 75.2 | 74.1 | 42 | | MFLOPS | 240 | 230 | 238 | 219 | 295 | 43 | 44 | | Model | LaNet | OFA | FBNetV3 | EfficientNet-B1| 45 | | -------------- | ---------- | --------- | -----------| -------------- | 46 | | top-1 | 80.8 | 80.0 | 79.6 | 79.1 | 47 | | MFLOPS | 600 | 595 | 544 | 700 | 48 | 49 | 50 | **Applying LaNet to detection**: Compared to NAS-FCOS in CVPR-2020, 51 | | Backbone | Decoder | FLOPS(G) | AP | 52 | | ----------------- | ------------ | -------------- |-------| 53 | | LaNet | FPN-FCOS | 35.22 | 36.5 | 54 | | MobileNetV2 | FPN-FCOS | 105.4 | 31.2 | 55 | | MobileNetV2 | NAS-FCOS | 39.3 | 32.0 | 56 | 57 | We will release the ImageNet model, search framework, training pipeline, and their applications to detection and segmentation soon; stay tuned. 58 | 59 | 60 | # Trying Other CV or NLP Applications 61 | In LaNAS, we model a network architecture as a vector encoding, i.e. [x,...,x], and a decoder translates the encoding into a runnable model for PyTorch or TensorFlow. Please see the function train_net in `net_training.py`. 62 | That means you only need to implement an evaluator / cost model for your application to use LaNAS.
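The encoding / decoder / evaluator split described above can be sketched as follows. This is a minimal illustration and not the repository's implementation: `OPS`, `COST`, `decode`, `evaluate`, and `pick_best` are hypothetical names, and the toy cost table stands in for what a real evaluator would do (train the decoded model and return its validation score, as `train_net` in `net_training.py` does for the released search spaces).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical operation vocabulary; a real search space defines its own.
OPS: List[str] = ["max_pool_3x3", "skip_connect", "sep_conv_3x3", "sep_conv_5x5"]

# Hypothetical per-operation cost, standing in for a real training run.
COST: Dict[str, int] = {
    "max_pool_3x3": 1,
    "skip_connect": 0,
    "sep_conv_3x3": 3,
    "sep_conv_5x5": 5,
}


def decode(encoding: Sequence[int]) -> List[str]:
    """Translate a vector encoding into a list of operation names."""
    return [OPS[i] for i in encoding]


def evaluate(encoding: Sequence[int]) -> float:
    """Toy evaluator: higher is better (here, the negated total cost).

    Replace the body with your own training / validation code.
    """
    architecture = decode(encoding)
    return float(-sum(COST[op] for op in architecture))


def pick_best(candidates: Sequence[Sequence[int]],
              evaluator: Callable[[Sequence[int]], float]) -> Sequence[int]:
    """Return the candidate encoding with the highest evaluator score."""
    return max(candidates, key=evaluator)


if __name__ == "__main__":
    candidates = [[3, 3], [1, 0], [2, 2]]
    best = pick_best(candidates, evaluate)
    print(best, decode(best), evaluate(best))
```

In this sketch only `evaluate` is application-specific; a search algorithm needs nothing beyond a function that maps an encoding to a score.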
63 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/Evaluate/generator.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import numpy as np 7 | import itertools 8 | import random 9 | import copy 10 | 11 | node = 4 12 | layer_type = [ 13 | 'max_pool_3x3', 14 | 'skip_connect', 15 | 'sep_conv_3x3', 16 | 'sep_conv_5x5' 17 | ] 18 | 19 | 20 | def supernet_generator(node, layer_type): 21 | vec_length = len(layer_type) 22 | masked_vec = np.ones((1, vec_length))[0].tolist() 23 | disconnected_vec = np.zeros((1, vec_length))[0].tolist() 24 | supernet = [[] for v in range(node)] 25 | for i in range(node): 26 | for j in range(node + 2): 27 | if j < i + 2: 28 | supernet[i].append(masked_vec.copy()) 29 | else: 30 | supernet[i].append(disconnected_vec.copy()) 31 | return supernet 32 | 33 | def mask_specific_value(supernet, node_id, input_id, operation_id): 34 | supernet[node_id][input_id][operation_id] = 0.0 35 | return supernet 36 | 37 | 38 | def selected_specific_value(supernet, node_id, input_id, operation_id): 39 | for i in range(len(supernet[node_id][input_id])): 40 | if i != operation_id: 41 | supernet[node_id][input_id][i] = 0.0 42 | return supernet 43 | 44 | 45 | 46 | def name_compression_encoder(uncompressed_supernet, layer_type): 47 | supernet = copy.deepcopy(uncompressed_supernet) 48 | 49 | connectivity_domain = [0.0, 1.0] 50 | 51 | mix_operator = [p for p in itertools.product(connectivity_domain, repeat=len(layer_type))] 52 | for i in range(len(mix_operator)): 53 | mix_operator[i] = list(mix_operator[i]) 54 | 55 | for i in range(len(supernet)): 56 | for j in range(len(supernet[i])): 57 | if type(supernet[i][j]) is list: 58 | for p in range(len(mix_operator)): 59 | 
if supernet[i][j] == mix_operator[p]: 60 | supernet[i][j] = p 61 | return supernet 62 | 63 | def name_compression_decoder(compressed_supernet, layer_type): 64 | supernet = copy.deepcopy(compressed_supernet) 65 | 66 | connectivity_domain = [0.0, 1.0] 67 | 68 | mix_operator = [p for p in itertools.product(connectivity_domain, repeat=len(layer_type))] 69 | for i in range(len(mix_operator)): 70 | mix_operator[i] = list(mix_operator[i]) 71 | 72 | for i in range(len(supernet)): 73 | for j in range(len(supernet[i])): 74 | supernet[i][j] = mix_operator[supernet[i][j]] 75 | 76 | return supernet 77 | 78 | 79 | def layer_type_encoder(layer_type): 80 | encoded_type = [] 81 | for i in range(len(layer_type)): 82 | if layer_type[i] == 'skip_connect': 83 | encoded_type.append(0) 84 | if layer_type[i] == 'max_pool_3x3': 85 | encoded_type.append(1) 86 | if layer_type[i] == 'sep_conv_3x3': 87 | encoded_type.append(2) 88 | if layer_type[i] == 'sep_conv_5x5': 89 | encoded_type.append(3) 90 | 91 | return sorted(encoded_type) 92 | 93 | 94 | def random_net_generator(supernet, numbers): 95 | 96 | avail_node = [] 97 | for i in range(len(supernet)): 98 | for j in range(len(supernet[i])): 99 | if supernet[i][j] != 0: 100 | avail_node.append([i, j]) 101 | 102 | net_list = [] 103 | i = 0 104 | while True: 105 | new_net = copy.deepcopy(supernet) 106 | changed_time = random.randint(1, 10) 107 | 108 | for j in range(changed_time): 109 | changed_node = random.choice(avail_node) 110 | changed_value = random.randint(0, 15) 111 | new_net[changed_node[0]][changed_node[1]] = changed_value 112 | 113 | if new_net not in net_list: 114 | n = copy.deepcopy(new_net) 115 | net_list.append(n) 116 | i += 1 117 | 118 | if i == numbers: 119 | break 120 | # print(net_list) 121 | return net_list 122 | 123 | 124 | def resume_net_from_file(path): 125 | with open(path, 'r') as f: 126 | network = eval(f.read()) 127 | return network 128 | 129 | 130 | 131 | supernet_normal = supernet_generator(node, layer_type) 132 | 
supernet_normal = mask_specific_value(supernet_normal, 0, 0, 1) 133 | 134 | supernet_reduce = supernet_generator(node, layer_type) 135 | supernet_reduce = mask_specific_value(supernet_reduce, 0, 0, 1) 136 | 137 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/Evaluate/individual_model.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from operations import * 10 | from torch.autograd import Variable 11 | 12 | 13 | 14 | class MixedOp(nn.Module): 15 | def __init__(self, C, stride, layer_type): 16 | super(MixedOp, self).__init__() 17 | self._ops = nn.ModuleList() 18 | for layer in layer_type: 19 | op = OPS[layer](C, stride, False) 20 | if 'pool' in layer: 21 | op = nn.Sequential(op, nn.BatchNorm2d(C, affine=False)) 22 | self._ops.append(op) 23 | 24 | def forward(self, x): 25 | return sum(op(x) for op in self._ops) 26 | # return sum(w * op(x) for w, op in zip(weights, self._ops)) # use sum instead concat 27 | 28 | 29 | class Cell(nn.Module): 30 | def __init__(self, layer_type, steps, multiplier, C_prev_prev, C_prev, C, reduction, reduction_prev, supernet_matrix): 31 | super(Cell, self).__init__() 32 | self.reduction = reduction 33 | 34 | if reduction_prev: 35 | self.preprocess0 = FactorizedReduce(C_prev_prev, C, affine=False) 36 | else: 37 | self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0, affine=False) 38 | self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0, affine=False) 39 | self._steps = steps 40 | self._multiplier = multiplier 41 | 42 | self._ops = nn.ModuleList() 43 | self._bns = nn.ModuleList() 44 | for i in range(self._steps): 45 | for j in range(2 + i): 46 | op_list = [] 47 
| for type in range(len(supernet_matrix[i][j])): 48 | if supernet_matrix[i][j][type] == 1.0: 49 | op_list.append(layer_type[type]) 50 | stride = 2 if reduction and j < 2 else 1 51 | 52 | op = MixedOp(C, stride, layer_type=op_list) 53 | self._ops.append(op) 54 | 55 | 56 | def forward(self, s0, s1, supernet_matrix, drop_prob): 57 | s0 = self.preprocess0(s0) 58 | s1 = self.preprocess1(s1) 59 | 60 | states = [s0, s1] 61 | offset = 0 62 | for i in range(self._steps): 63 | H = [] 64 | op = [] 65 | for j, h in enumerate(states): 66 | if len(self._ops[offset + j]._ops) != 0: 67 | H.append(self._ops[offset + j](h)) 68 | op.append(self._ops[offset + j]) 69 | 70 | if self.training and drop_prob > 0.: 71 | for hn_index in range(len(H)): 72 | if len(op[hn_index]._ops) != 0: 73 | if not isinstance(op[hn_index]._ops[0], Identity): 74 | # print(len(op[hn_index]._ops)) 75 | # print(op[hn_index]._ops[0]) 76 | # print(op[hn_index]) 77 | H[hn_index] = drop_path(H[hn_index], drop_prob) 78 | 79 | s = sum(hn for hn in H) 80 | 81 | offset += len(states) 82 | states.append(s) 83 | return torch.cat(states[-len(supernet_matrix):], dim=1) 84 | 85 | 86 | 87 | def drop_path(x, drop_prob): 88 | if drop_prob > 0.: 89 | keep_prob = 1. 
- drop_prob 90 | 91 | mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 92 | x.div_(keep_prob) 93 | try: 94 | x.mul_(mask) 95 | except: 96 | mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 97 | x.mul_(mask) 98 | return x 99 | 100 | 101 | class AuxiliaryHeadCIFAR(nn.Module): 102 | 103 | def __init__(self, C, num_classes): 104 | """assuming input size 8x8""" 105 | super(AuxiliaryHeadCIFAR, self).__init__() 106 | 107 | self.features = nn.Sequential( 108 | nn.ReLU(inplace=True), 109 | nn.AvgPool2d(5, stride=3, padding=0, count_include_pad=False), # image size = 2 x 2 110 | nn.Conv2d(C, 128, 1, bias=False), 111 | nn.BatchNorm2d(128), 112 | nn.ReLU(inplace=True), 113 | nn.Conv2d(128, 768, 2, bias=False), 114 | nn.BatchNorm2d(768), 115 | nn.ReLU(inplace=True) 116 | ) 117 | self.classifier = nn.Linear(768, num_classes) 118 | 119 | def forward(self, x): 120 | x = self.features(x) 121 | x = self.classifier(x.view(x.size(0), -1)) 122 | return x 123 | 124 | 125 | class Network(nn.Module): 126 | def __init__(self, supernet_normal, supernet_reduce, layer_type, C, num_classes, layers, auxiliary, steps=4, multiplier=4, stem_multiplier=3): 127 | super(Network, self).__init__() 128 | self.supernet_normal = supernet_normal 129 | self.supernet_reduce = supernet_reduce 130 | self.layer_type = layer_type 131 | self._C = C 132 | self._num_classes = num_classes 133 | self._layers = layers 134 | self._steps = steps 135 | self._multiplier = multiplier 136 | self._auxiliary = auxiliary 137 | C_curr = stem_multiplier * C 138 | self.stem = nn.Sequential( 139 | nn.Conv2d(3, C_curr, 3, padding=1, bias=False), 140 | nn.BatchNorm2d(C_curr) 141 | ) 142 | 143 | C_prev_prev, C_prev, C_curr = C_curr, C_curr, C 144 | self.cells = nn.ModuleList() 145 | reduction_prev = False 146 | 147 | for i in range(layers): 148 | if i in [layers // 3, 2 * layers // 3]: 149 | C_curr *= 2 150 | reduction = True 151 | cell = Cell(self.layer_type, len(self.supernet_reduce), 
multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev, self.supernet_reduce) 152 | else: 153 | reduction = False 154 | cell = Cell(self.layer_type, len(self.supernet_normal), multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev, self.supernet_normal) 155 | 156 | reduction_prev = reduction 157 | self.cells += [cell] 158 | 159 | C_prev_prev, C_prev = C_prev, multiplier * C_curr 160 | if i == 2 * layers // 3: 161 | C_to_auxiliary = C_prev 162 | if auxiliary: 163 | self.auxiliary_head = AuxiliaryHeadCIFAR(C_to_auxiliary, num_classes) 164 | self.global_pooling = nn.AdaptiveAvgPool2d(1) 165 | self.classifier = nn.Linear(C_prev, num_classes) 166 | 167 | 168 | 169 | def forward(self, input): 170 | 171 | s0 = s1 = self.stem(input) 172 | for i, cell in enumerate(self.cells): 173 | if not cell.reduction: 174 | s0, s1 = s1, cell(s0, s1, self.supernet_normal, self.drop_path_prob) 175 | else: 176 | s0, s1 = s1, cell(s0, s1, self.supernet_reduce, self.drop_path_prob) 177 | out = self.global_pooling(s1) 178 | logits = self.classifier(out.view(out.size(0), -1)) 179 | return logits 180 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/Evaluate/operations.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | 10 | OPS = { 11 | 'none' : lambda C, stride, affine: Zero(stride), 12 | 'avg_pool_3x3' : lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False), 13 | 'max_pool_3x3' : lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1), 14 | 'skip_connect' : lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine), 15 | 'sep_conv_3x3' : lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine), 16 | 'sep_conv_5x5' : lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine), 17 | 'sep_conv_7x7' : lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine), 18 | 'dil_conv_3x3' : lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine), 19 | 'dil_conv_5x5' : lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine), 20 | 'conv_7x1_1x7' : lambda C, stride, affine: nn.Sequential( 21 | nn.ReLU(inplace=False), 22 | nn.Conv2d(C, C, (1,7), stride=(1, stride), padding=(0, 3), bias=False), 23 | nn.Conv2d(C, C, (7,1), stride=(stride, 1), padding=(3, 0), bias=False), 24 | nn.BatchNorm2d(C, affine=affine) 25 | ), 26 | 'conv_1x1' : lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False), 27 | 'conv_3x3' : lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False), 28 | 'conv_5x5' : lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False), 29 | } 30 | 31 | class ReLUConvBN(nn.Module): 32 | 33 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 34 | super(ReLUConvBN, self).__init__() 35 | self.op = nn.Sequential( 36 | nn.ReLU(inplace=False), 37 | nn.Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False), 38 | nn.BatchNorm2d(C_out, affine=affine) 39 | ) 40 | 41 | def forward(self, x): 42 
| return self.op(x) 43 | 44 | class DilConv(nn.Module): 45 | 46 | def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True): 47 | super(DilConv, self).__init__() 48 | self.op = nn.Sequential( 49 | nn.ReLU(inplace=False), 50 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=C_in, bias=False), 51 | nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 52 | nn.BatchNorm2d(C_out, affine=affine), 53 | ) 54 | 55 | def forward(self, x): 56 | return self.op(x) 57 | 58 | 59 | class SepConv(nn.Module): 60 | 61 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 62 | super(SepConv, self).__init__() 63 | self.op = nn.Sequential( 64 | nn.ReLU(inplace=False), 65 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, groups=C_in, bias=False), 66 | nn.Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False), 67 | nn.BatchNorm2d(C_in, affine=affine), 68 | nn.ReLU(inplace=False), 69 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding, groups=C_in, bias=False), 70 | nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 71 | nn.BatchNorm2d(C_out, affine=affine), 72 | ) 73 | 74 | def forward(self, x): 75 | return self.op(x) 76 | 77 | 78 | class Conv2d(nn.Conv2d): 79 | 80 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 81 | padding=0, dilation=1, groups=1, bias=True): 82 | super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride, 83 | padding, dilation, groups, bias) 84 | 85 | def forward(self, x): 86 | weight = self.weight 87 | weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2, 88 | keepdim=True).mean(dim=3, keepdim=True) 89 | weight = weight - weight_mean 90 | std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5 91 | weight = weight / std.expand_as(weight) 92 | return F.conv2d(x, weight, self.bias, self.stride, 93 | self.padding, 
self.dilation, self.groups) 94 | 95 | 96 | 97 | class Identity(nn.Module): 98 | 99 | def __init__(self): 100 | super(Identity, self).__init__() 101 | 102 | def forward(self, x): 103 | return x 104 | 105 | 106 | class Zero(nn.Module): 107 | 108 | def __init__(self, stride): 109 | super(Zero, self).__init__() 110 | self.stride = stride 111 | 112 | def forward(self, x): 113 | if self.stride == 1: 114 | return x.mul(0.) 115 | return x[:,:,::self.stride,::self.stride].mul(0.) 116 | 117 | 118 | class FactorizedReduce(nn.Module): 119 | 120 | def __init__(self, C_in, C_out, affine=True): 121 | super(FactorizedReduce, self).__init__() 122 | assert C_out % 2 == 0 123 | self.relu = nn.ReLU(inplace=False) 124 | self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 125 | self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 126 | self.bn = nn.BatchNorm2d(C_out, affine=affine) 127 | 128 | def forward(self, x): 129 | x = self.relu(x) 130 | out = torch.cat([self.conv_1(x), self.conv_2(x[:,:,1:,1:])], dim=1) 131 | out = self.bn(out) 132 | return out 133 | 134 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/Evaluate/run.sh: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | 7 | screen -d -m -S eval_3 srun -p dev --gres=gpu:1 --time=72:00:00 --cpus-per-task=1 python super_individual_train.py --cutout --auxiliary --batch_size=16 --init_ch=36 --masked_code='[1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0]' 8 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/Evaluate/translator.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import numpy as np 7 | import random 8 | 9 | node = 5 10 | layer_type = [ 11 | 'max_pool_3x3', 12 | 'skip_connect', 13 | 'sep_conv_3x3', 14 | 'sep_conv_5x5' 15 | ] 16 | 17 | 18 | def get_rand_vector(layer_type): 19 | vec_length = len(layer_type) 20 | masked_vec = [] 21 | for i in range(0, vec_length): 22 | if random.random() > 0.5: 23 | masked_vec.append(1) 24 | else: 25 | masked_vec.append(0) 26 | return masked_vec 27 | 28 | 29 | def mask_rand_generator(): 30 | supernet = [[] for v in range(node)] 31 | for i in range(node): 32 | for j in range(node + 2): 33 | if j < i + 2: 34 | supernet[i].append(get_rand_vector(layer_type)) 35 | else: 36 | supernet[i].append(0) 37 | return supernet 38 | 39 | 40 | def supernet_generator(node, layer_type): 41 | vec_length = len(layer_type) 42 | masked_vec = np.ones((1, vec_length))[0].tolist() 43 | supernet = [[] for v in range(node)] 44 | for i in range(node): 45 | for j in range(node + 2): 46 | if j < i + 2: 47 | supernet[i].append(masked_vec.copy()) 48 | else: 49 | supernet[i].append(0) 50 | return supernet 51 | 52 | 53 | def mask_specific_value(supernet, node_id, input_id, operation_id): 54 | supernet[node_id][input_id][operation_id] = 0.0 55 | return supernet 56 | 57 | 58 | def selected_specific_value(supernet, node_id, input_id, operation_id): 59 | for i in range(len(supernet[node_id][input_id])): 60 | if i != operation_id: 61 | supernet[node_id][input_id][i] = 0.0 62 | return supernet 63 | 64 | 65 | def encoding_to_masks(encoding): 66 | encoding = np.array(encoding).reshape(-1, 4) 67 | supernet_normal = supernet_generator(node, layer_type) 68 | supernet_reduce = supernet_generator(node, layer_type) 69 | supernet = [supernet_normal, supernet_reduce] 70 | mask = [] 71 | counter = 0 72 | for cell in supernet: 73 | mask_cell = [] 74 | for row in cell: 75 | mask_row = [] 76 | for col in row: 77 | if type(col) == type([]): 78 | mask_row.append(encoding[counter].tolist()) 79 | counter += 1 80 | else: 81 | 
mask_row.append(0) 82 | mask_cell.append(mask_row) 83 | mask.append(mask_cell) 84 | 85 | normal_mask = mask[0] 86 | reduce_mask = mask[1] 87 | 88 | return normal_mask, reduce_mask 89 | 90 | 91 | def supernet_mask(): 92 | supernet_normal = supernet_generator(node, layer_type) 93 | supernet_reduce = supernet_generator(node, layer_type) 94 | return supernet_normal, supernet_reduce 95 | 96 | 97 | def encode_supernet(): 98 | supernet_normal = supernet_generator(node, layer_type) 99 | supernet_reduce = supernet_generator(node, layer_type) 100 | supernet = [supernet_normal, supernet_reduce] 101 | 102 | layer_types_count = len(layer_type) 103 | count = 0 104 | assert type(supernet) == type([]) 105 | for cell in supernet: 106 | for row in cell: 107 | for col in row: 108 | if type(col) == type([]): 109 | count += layer_types_count 110 | return np.ones((count)).tolist() 111 | 112 | 113 | def define_search_space(): 114 | # hit-and-run default 115 | # A x <= b 116 | search_space = encode_supernet() 117 | A = [] 118 | b = [] 119 | init_point = [] 120 | param_pos = 0 121 | for i in range(0, len(search_space)): 122 | tmp = np.zeros(len(search_space)) 123 | tmp[i] = 1 124 | A.append(np.copy(tmp)) 125 | b.append(1.0000001) 126 | # we need relax a little bit here for the precision issue 127 | # A*x <= 1, we use 1.000+epsilon 128 | # A*x >= 0, we use 0-epslon 129 | A.append(-1 * np.copy(tmp)) 130 | b.append(0.0000001) 131 | for i in range(0, len(search_space)): 132 | if random.random() >= 0.5: 133 | init_point.append(0.0) 134 | else: 135 | init_point.append(1.0) 136 | return {"A": np.array(A), "b": np.array(b), "init_point": np.array(init_point)} 137 | 138 | 139 | c = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 
0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1] 140 | 141 | 142 | def expend_to_supernet_code(old_supernet): 143 | for i in range(len(old_supernet)): 144 | for j in range(len(old_supernet[i])): 145 | if old_supernet[i][j] == 0: 146 | old_supernet[i][j] = [0] * len(old_supernet[0][0]) 147 | for i in range(len(old_supernet)): 148 | for j in range(len(old_supernet[i])): 149 | for n in range(len(old_supernet[i][j])): 150 | old_supernet[i][j][n] = float(old_supernet[i][j][n]) 151 | 152 | return old_supernet 153 | 154 | 155 | 156 | normal, reduce = encoding_to_masks(c) 157 | 158 | normal = expend_to_supernet_code(normal) 159 | reduce = expend_to_supernet_code(reduce) 160 | 161 | print(normal) 162 | print(reduce) 163 | 164 | 165 | # print(supernet_reduce) -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/Evaluate/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import os 7 | import numpy as np 8 | import torch 9 | import shutil 10 | import torchvision.transforms as transforms 11 | from torch.autograd import Variable 12 | 13 | 14 | class AvgrageMeter(object): 15 | 16 | def __init__(self): 17 | self.reset() 18 | 19 | def reset(self): 20 | self.avg = 0 21 | self.sum = 0 22 | self.cnt = 0 23 | 24 | def update(self, val, n=1): 25 | self.sum += val * n 26 | self.cnt += n 27 | self.avg = self.sum / self.cnt 28 | 29 | 30 | def accuracy(output, target, topk=(1,)): 31 | maxk = max(topk) 32 | batch_size = target.size(0) 33 | 34 | _, pred = output.topk(maxk, 1, True, True) 35 | pred = pred.t() 36 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 37 | 38 | res = [] 39 | for k in topk: 40 | correct_k = correct[:k].view(-1).float().sum(0) 41 | res.append(correct_k.mul_(100.0/batch_size)) 42 | return res 43 | 44 | 45 | class Cutout(object): 46 | def __init__(self, length): 47 | self.length = length 48 | 49 | def __call__(self, img): 50 | h, w = img.size(1), img.size(2) 51 | mask = np.ones((h, w), np.float32) 52 | y = np.random.randint(h) 53 | x = np.random.randint(w) 54 | 55 | y1 = np.clip(y - self.length // 2, 0, h) 56 | y2 = np.clip(y + self.length // 2, 0, h) 57 | x1 = np.clip(x - self.length // 2, 0, w) 58 | x2 = np.clip(x + self.length // 2, 0, w) 59 | 60 | mask[y1: y2, x1: x2] = 0. 
61 | mask = torch.from_numpy(mask) 62 | mask = mask.expand_as(img) 63 | img *= mask 64 | return img 65 | 66 | 67 | def _data_transforms_cifar10(cutout, cutout_length): 68 | CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124] 69 | CIFAR_STD = [0.24703233, 0.24348505, 0.26158768] 70 | 71 | train_transform = transforms.Compose([ 72 | transforms.RandomCrop(32, padding=4), 73 | transforms.RandomHorizontalFlip(), 74 | transforms.ToTensor(), 75 | transforms.Normalize(CIFAR_MEAN, CIFAR_STD), 76 | ]) 77 | if cutout: 78 | train_transform.transforms.append(Cutout(cutout_length)) 79 | 80 | valid_transform = transforms.Compose([ 81 | transforms.ToTensor(), 82 | transforms.Normalize(CIFAR_MEAN, CIFAR_STD), 83 | ]) 84 | return train_transform, valid_transform 85 | 86 | 87 | def count_parameters_in_MB(model): 88 | return sum(np.prod(v.size()) for name, v in model.named_parameters() if "auxiliary" not in name)/1e6 # builtin sum: np.sum over a generator is deprecated 89 | 90 | 91 | def save_checkpoint(state, is_best, save): 92 | filename = os.path.join(save, 'checkpoint.pth.tar') 93 | torch.save(state, filename) 94 | if is_best: 95 | best_filename = os.path.join(save, 'model_best.pth.tar') 96 | shutil.copyfile(filename, best_filename) 97 | 98 | 99 | def save(model, model_path): 100 | torch.save(model.state_dict(), model_path) 101 | 102 | 103 | def load(model, model_path): 104 | model.load_state_dict(torch.load(model_path)) 105 | 106 | 107 | def drop_path(x, drop_prob): 108 | if drop_prob > 0.: 109 | keep_prob = 1.
- drop_prob 110 | 111 | mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 112 | x.div_(keep_prob) 113 | try: 114 | x.mul_(mask) 115 | except: 116 | mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob) 117 | x.mul_(mask) 118 | return x 119 | 120 | def create_exp_dir(path, scripts_to_save=None): 121 | if not os.path.exists(path): 122 | os.mkdir(path) 123 | print('Experiment dir : {}'.format(path)) 124 | 125 | if scripts_to_save is not None: 126 | if not os.path.exists(os.path.join(path, 'scripts')): 127 | os.mkdir(os.path.join(path, 'scripts')) 128 | for script in scripts_to_save: 129 | dst_file = os.path.join(path, 'scripts', os.path.basename(script)) 130 | shutil.copyfile(script, dst_file) 131 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/LaNAS/Classifier.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | import torch 7 | import torch.nn as nn 8 | from torch.autograd import Variable 9 | import json 10 | from torch import optim 11 | import numpy as np 12 | 13 | # this is the backbone model 14 | # to split networks at a MCTS state 15 | class LinearModel(nn.Module): 16 | 17 | def __init__(self, input_dim, output_dim): 18 | super(LinearModel, self).__init__() 19 | self.fc1 = nn.Linear(input_dim, output_dim) 20 | torch.nn.init.xavier_uniform_( self.fc1.weight ) 21 | 22 | def forward(self, x): 23 | y = self.fc1(x) 24 | #print("=====>X_shape:", x.shape) 25 | return y 26 | 27 | # the input will be samples! 
28 | class Classifier(): 29 | def __init__(self, samples, input_dim): 30 | self.training_counter = 0 31 | assert input_dim >= 1 32 | assert type(samples) == type({}) 33 | self.input_dim = input_dim 34 | self.samples = samples 35 | self.model = LinearModel(input_dim, 1) 36 | 37 | if torch.cuda.is_available(): 38 | self.model.cuda() 39 | self.l_rate = 0.00001 40 | self.optimiser = optim.Adam(self.model.parameters(), lr=self.l_rate, betas=(0.9, 0.999), eps=1e-08) 41 | self.epochs = 1 #TODO:revise to 100 42 | self.boundary = -1 43 | self.nets = [] 44 | 45 | def get_params(self): 46 | 47 | return self.model.fc1.weight.detach().cpu().numpy(), self.model.fc1.bias.detach().cpu().numpy() 48 | 49 | def reinit(self): 50 | # the backbone is stored in self.model and has a single layer, fc1 51 | torch.nn.init.xavier_uniform_( self.model.fc1.weight ) 52 | 53 | def update_samples(self, latest_samples): 54 | assert type(latest_samples) == type(self.samples) 55 | sampled_nets = [] 56 | nets_acc = [] 57 | for k, v in latest_samples.items(): 58 | net = json.loads(k) 59 | sampled_nets.append( net ) 60 | nets_acc.append( v ) 61 | self.nets = torch.from_numpy(np.asarray(sampled_nets, dtype=np.float32).reshape(-1, self.input_dim)) 62 | self.acc = torch.from_numpy(np.asarray(nets_acc, dtype=np.float32).reshape(-1, 1)) 63 | self.samples = latest_samples 64 | if torch.cuda.is_available(): 65 | self.nets = self.nets.cuda() 66 | self.acc = self.acc.cuda() 67 | 68 | def train(self): 69 | if self.training_counter == 0: 70 | self.epochs = 1000#20000 71 | else: 72 | self.epochs = 1000#3000 73 | self.training_counter += 1 74 | # in a rare case, one branch has no networks 75 | if len(self.nets) == 0: 76 | return 77 | for epoch in range(self.epochs): 78 | epoch += 1 79 | nets = self.nets 80 | acc = self.acc 81 | #clear grads 82 | self.optimiser.zero_grad() 83 | #forward to get predicted values 84 | outputs = self.model.forward( nets ) 85 | loss = nn.MSELoss()(outputs, acc) 86 | loss.backward()# back props 87 |
nn.utils.clip_grad_norm_(self.model.parameters(), 5) 88 | self.optimiser.step()# update the parameters 89 | # if epoch % 1000 == 0: 90 | # print('@' + self.__class__.__name__ + ' epoch {}, loss {}'.format(epoch, loss.data)) 91 | 92 | def predict(self, remaining): 93 | assert type(remaining) == type({}) 94 | remaining_archs = [] 95 | for k, v in remaining.items(): 96 | net = json.loads(k) 97 | remaining_archs.append( net ) 98 | remaining_archs = torch.from_numpy(np.asarray(remaining_archs, dtype=np.float32).reshape(-1, self.input_dim)) 99 | if torch.cuda.is_available(): 100 | remaining_archs = remaining_archs.cuda() 101 | outputs = self.model.forward(remaining_archs) 102 | if torch.cuda.is_available(): 103 | remaining_archs = remaining_archs.cpu() 104 | outputs = outputs.cpu() 105 | result = {} 106 | counter = 0 107 | for k in range(0, len(remaining_archs) ): 108 | counter += 1 109 | arch = remaining_archs[k].detach().numpy() 110 | arch_str = json.dumps( arch.tolist() ) 111 | result[ arch_str ] = outputs[k].detach().numpy().tolist()[0] 112 | assert len(result) == len(remaining) 113 | return result 114 | 115 | def split_predictions(self, remaining): 116 | assert type(remaining) == type({}) 117 | samples_badness = {} 118 | samples_goodies = {} 119 | if len(remaining) == 0: 120 | return samples_badness, samples_goodies 121 | predictions = self.predict(remaining) 122 | avg_acc = self.predict_mean() 123 | self.boundary = avg_acc 124 | for k, v in predictions.items(): 125 | if v < avg_acc: 126 | samples_badness[k] = v 127 | else: 128 | samples_goodies[k] = v 129 | assert len(samples_badness) + len(samples_goodies) == len(remaining) 130 | return samples_goodies, samples_badness 131 | 132 | 133 | def predict_mean(self): 134 | if len(self.nets) == 0: 135 | return 0 136 | # can we use the actual acc? 
137 | outputs = self.model.forward(self.nets) 138 | pred_np = None 139 | if torch.cuda.is_available(): 140 | pred_np = outputs.detach().cpu().numpy() 141 | else: 142 | pred_np = outputs.detach().numpy() 143 | return np.mean(pred_np) 144 | 145 | def split_data(self): 146 | samples_badness = {} 147 | samples_goodies = {} 148 | if len(self.nets) == 0: 149 | return samples_badness, samples_goodies 150 | self.train() 151 | avg_acc = self.predict_mean() 152 | self.boundary = avg_acc 153 | for k, v in self.samples.items(): 154 | if v < avg_acc: 155 | samples_badness[k] = v 156 | else: 157 | samples_goodies[k] = v 158 | assert len(samples_badness) + len(samples_goodies) == len( self.samples ) 159 | return samples_goodies, samples_badness 160 | 161 | #test 162 | #with open('features.json', 'r') as infile: 163 | # data=json.loads( infile.read() ) 164 | #samples = {} 165 | #for d in data: 166 | # samples[ json.dumps(d['feature']) ] = d['acc'] 167 | # 168 | #print("total samples:", len(samples.keys() )) 169 | #c = Classifier(samples, 10) 170 | #goodies, badies = c.split_data() 171 | #print("goodies:", len(goodies.keys() ), np.mean(np.array(list(goodies.values()) ) )," bad:", len(badies.keys() ), np.mean(np.array(list(badies.values()) )) ) 172 | 173 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/LaNAS/mlp_predictor.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import numpy as np 7 | # import matplotlib.pyplot as plt 8 | import json 9 | from scipy.stats import norm 10 | from scipy.optimize import minimize 11 | import random 12 | import time 13 | import os 14 | import itertools 15 | import operator 16 | import torch 17 | import torch.nn as nn 18 | from torch.autograd import Variable 19 | import json 20 | from torch import optim 21 | import numpy as np 22 | import random 23 | 24 | 25 | class LinearModel(nn.Module): 26 | 27 | # def __init__(self, input_dim, output_dim): 28 | # super(LinearModel, self).__init__() 29 | # self.fc1 = nn.Linear(input_dim, output_dim) 30 | # torch.nn.init.xavier_uniform_( self.fc1.weight ) 31 | 32 | # def forward(self, x): 33 | # y = self.fc1(x) 34 | # return y 35 | 36 | def __init__(self, input_dim, output_dim): 37 | super(LinearModel, self).__init__() 38 | self.fc1 = nn.Linear(input_dim, 100) 39 | self.fc2 = nn.Linear(100, output_dim) 40 | 41 | torch.nn.init.xavier_uniform_( self.fc1.weight ) 42 | torch.nn.init.xavier_uniform_( self.fc2.weight ) 43 | 44 | def weights_init(self): 45 | torch.nn.init.xavier_uniform_( self.fc1.weight ) 46 | torch.nn.init.xavier_uniform_( self.fc2.weight ) 47 | 48 | def forward(self, x): 49 | x1 = self.fc1(x) 50 | x2 = torch.relu(x1) 51 | y = self.fc2(x2) 52 | y = torch.sigmoid(y) 53 | return y 54 | 55 | def train(self, samples): 56 | optimiser = optim.Adam(self.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08) 57 | 58 | X_sample = None 59 | Y_sample = None 60 | for sample in samples: 61 | if X_sample is None or Y_sample is None: 62 | X_sample = np.array( json.loads(sample) ) 63 | Y_sample = np.array( samples[ sample ] ) 64 | else: 65 | X_sample = np.vstack([X_sample, json.loads(sample) ] ) 66 | Y_sample = np.vstack([Y_sample, samples[ sample ] ] ) 67 | batch_size = 100 68 | print("dataset:", len(samples) ) 69 | chunks = int( X_sample.shape[0] / batch_size ) 70 | if X_sample.shape[0] % batch_size > 0: 71 | chunks += 1 72 | for epoch in range(0, 150): 
73 | X_sample_split = np.array_split(X_sample, chunks) 74 | Y_sample_split = np.array_split(Y_sample, chunks) 75 | #print("epoch=", epoch) 76 | for i in range(0, chunks): 77 | optimiser.zero_grad() 78 | inputs = torch.from_numpy( np.asarray(X_sample_split[i], dtype=np.float32).reshape(X_sample_split[i].shape[0], X_sample_split[i].shape[1]) ) 79 | outputs = self.forward( inputs ) 80 | loss = nn.MSELoss()(outputs, torch.from_numpy( np.asarray(Y_sample_split[i], dtype=np.float32) ).reshape(-1, 1) ) 81 | loss.backward()# back props 82 | nn.utils.clip_grad_norm_(self.parameters(), 5) 83 | optimiser.step()# update the parameters 84 | 85 | def propose_networks( self, search_space ): 86 | ''' search space to predict by a meta-DNN for points selection ''' 87 | networks = [] 88 | for network in search_space.keys(): 89 | networks.append( json.loads( network ) ) 90 | X = np.array( networks ) 91 | X = torch.from_numpy( np.asarray(X, dtype=np.float32).reshape(X.shape[0], X.shape[1]) ) 92 | Y = self.forward( X ) 93 | Y = Y.data.numpy() 94 | Y = Y.reshape( len(networks) ) 95 | X = X.data.numpy( ) 96 | proposed_networks = [] 97 | n = 10 98 | if Y.shape[0] < n: 99 | n = Y.shape[0] 100 | indices = np.argsort(Y)[-n:] 101 | print("indices:", indices.shape) 102 | proposed_networks = X[indices] 103 | return proposed_networks.tolist() 104 | 105 | 106 | 107 | 108 | # ####preprocess data#### 109 | # dataset = [] 110 | # with open('nasbench_dataset', 'r') as infile: 111 | # dataset = json.loads( infile.read() ) 112 | # 113 | # samples = {} 114 | # for data in dataset: 115 | # samples[json.dumps(data["feature"])] = data["acc"] 116 | # 117 | # BEST_ACC = 0 118 | # BEST_ARCH = None 119 | # CURT_BEST = 0 120 | # BEST_TRACE = {} 121 | # for i in dataset: 122 | # arch = i['feature'] 123 | # acc = i['acc'] 124 | # if acc > BEST_ACC: 125 | # BEST_ACC = acc 126 | # BEST_ARCH = json.dumps( arch ) 127 | # print("##target acc:", BEST_ACC) 128 | # ####################### 129 | # 130 | # # bounds = 
np.array([[-1.0, 2.0]]) 131 | # noise = 0.2 132 | # # 133 | # # 134 | # # def f(X, noise=noise): 135 | # # return -np.sin(3*X) - X**2 + 0.7*X + noise * np.random.randn(*X.shape) 136 | # # 137 | # # X_init = np.array([[-0.9], [1.1]]) 138 | # # Y_init = f(X_init) 139 | # # 140 | # # X = np.arange(bounds[:, 0], bounds[:, 1], 0.01).reshape(-1, 1) 141 | # # Y = f(X,0) 142 | # # 143 | # 144 | # 145 | # 146 | # # Gaussian process with Matern kernel as surrogate model 147 | # 148 | # init_samples = random.sample(samples.keys(), 100) 149 | # 150 | # 151 | # # Initialize samples 152 | # # 153 | # # Number of iterations 154 | # n_iter = 1000000000000 155 | # # 156 | # # plt.figure(figsize=(12, n_iter * 3)) 157 | # # plt.subplots_adjust(hspace=0.4) 158 | # # 159 | # predictor = LinearModel(49, 1) 160 | # 161 | # window_size = 100 162 | # sample_counter = 0 163 | # 164 | # # # Obtain next sampling point from the acquisition function (expected_improvement) 165 | # X_next = propose_location(predictor, X_sample, Y_sample, samples) 166 | # # # Obtain next noisy sample from the objective function 167 | # for network in X_next: 168 | # X_sample = np.vstack([X_sample, network] ) 169 | # for network in X_next: 170 | # sample_counter += 1 171 | # acc = samples[ json.dumps( network.tolist() ) ] 172 | # if acc > CURT_BEST: 173 | # BEST_TRACE[json.dumps( network.tolist() ) ] = [acc, sample_counter] 174 | # CURT_BEST = acc 175 | # if acc == BEST_ACC: 176 | # sorted_best_traces = sorted(BEST_TRACE.items(), key=operator.itemgetter(1)) 177 | # for item in sorted_best_traces: 178 | # print(item[0],"==>", item[1]) 179 | # final_results = [] 180 | # for item in sorted_best_traces: 181 | # final_results.append( item[1] ) 182 | # final_results_str = json.dumps(final_results) 183 | # with open("result.txt", "a") as f: 184 | # f.write(final_results_str + '\n') 185 | # print("$$$$$$$$$$$$$$$$$$$CONGRATUGLATIONS$$$$$$$$$$$$$$$$$$$") 186 | # os._exit(1) 187 | # 188 | # print(network, acc) 189 | # del 
samples[ json.dumps( network.tolist() ) ] 190 | # Y_sample = np.vstack([Y_sample, acc] ) 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/README.md: -------------------------------------------------------------------------------- 1 | ## one-shot/few-shot LaNAS 2 | • fast to get a working result 3 | 4 | • the inaccurate prediction from supernet degrades the final network performance 5 | 6 | The one-shot LaNAS uses a pretrained supernet to predict the performance of a proposed architecture via masking. The following figure illustrates the search procedures. 7 | 8 |

9 | 10 |

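To make the masking idea above concrete, here is a minimal sketch of how a flat 0/1 architecture encoding can be split into per-edge operation masks for the normal and reduction cells. It mirrors `encoding_to_masks` in `Evaluate/translator.py` and assumes that file's settings (5 nodes, 4 candidate operations). `edges_per_cell` and `encoding_to_cell_masks` are illustrative names, not functions from this repository, and the zero padding that the repository code adds to each row is omitted for brevity.

```python
import numpy as np

# Settings copied from Evaluate/translator.py; treat them as assumptions
# if your supernet uses a different cell layout.
NODES = 5
LAYER_TYPES = ["max_pool_3x3", "skip_connect", "sep_conv_3x3", "sep_conv_5x5"]


def edges_per_cell(nodes=NODES):
    # Node i reads from the two cell inputs plus the i earlier nodes.
    return sum(i + 2 for i in range(nodes))


def encoding_to_cell_masks(encoding, nodes=NODES, n_ops=len(LAYER_TYPES)):
    """Split a flat 0/1 encoding into (normal, reduce) per-edge operation masks."""
    per_cell = edges_per_cell(nodes)
    rows = np.asarray(encoding, dtype=float).reshape(-1, n_ops)
    # Hypothetical sanity check, not present in the repository code.
    assert rows.shape[0] == 2 * per_cell, "encoding length does not match the cell layout"
    cells, counter = [], 0
    for _ in range(2):  # normal cell first, then reduction cell
        cell = []
        for i in range(nodes):
            cell.append([rows[counter + j].tolist() for j in range(i + 2)])
            counter += i + 2
        cells.append(cell)
    return cells[0], cells[1]


rng = np.random.default_rng(0)
encoding = rng.integers(0, 2, size=2 * edges_per_cell() * len(LAYER_TYPES)).tolist()
normal, reduce_cell = encoding_to_cell_masks(encoding)
print(len(encoding), [len(row) for row in normal])  # 160 [2, 3, 4, 5, 6]
```

With these assumed settings each cell has 20 edges, so a full encoding has 2 x 20 x 4 = 160 entries. A 1 keeps the corresponding operation on that edge and a 0 masks it out.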
11 | 12 | Training the supernet is the same as regular training, except that we apply a random mask at each iteration. 13 | 14 | ## Evaluating search algorithms on the supernet 15 | NASBench-101 covers a limited set of architectures (~420K), whose accuracies can be easily fit by a predictor. A supernet is a good alternative because it renders a search space of about 10^21 architectures. Our supernet can therefore also serve as a benchmark for evaluating different search algorithms; see Fig. 6 in the LaNAS paper. Please check how LaNAS interacts with the supernet and samples an architecture together with its accuracy. 16 | 17 | 18 | ## Training the supernet 19 | You can skip this step if you use our pre-trained supernet. 20 | 21 | Our supernet is designed for the NASNet search space, and adapting it to a new design space requires some changes to the code. We are working on this and will update later. Training the supernet is fairly easy; simply run 22 | 23 | ``` python train.py ``` 24 | 25 | - **Training on ImageNet** 26 | 27 | Please use the training pipeline from Pytorch-Image-Models. The procedure is as follows: 28 | 1. get the supernet model from supernet_train.py, line 94 29 | 2. go to Pytorch-Image-Models 30 | 3. find pytorch-image-models/blob/master/timm/models/factory.py and replace line 57 as follows 31 | ``` 32 | # model = create_fn(**model_args, **kwargs) 33 | model = our-supernet 34 | ``` 35 | 36 | ## Searching with a supernet 37 | You can download the supernet pre-trained by us from here. Place it in the same folder, and start searching with 38 | 39 | 40 | ``` python train.py ``` 41 | 42 | The search results will be written to result.txt, and you can read them with 43 | 44 | ``` python read_result.py ``` 45 | 46 | The program outputs every sample with its test accuracy, e.g.
47 | 48 | >[[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]] 81.69 3774 49 | 50 | > [1.0 .. 0.0] is the architecture encoding, which can be used to train a network later. 51 | 52 | > 81.69 is the test accuracy predicted from supernet via weight sharing. 53 | 54 | > 3774 means this is the 3774th sample. 55 | 56 | ## Training a searched network 57 | Once you pick a network after reading the results, you can train the network in the Evaluate folder. 
58 | ``` 59 | cd Evaluate 60 | # note: supply the code of the target architecture via the --masked_code argument 61 | python super_individual_train.py --cutout --auxiliary --batch_size=16 --init_ch=36 --masked_code='[1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0]' 62 | ``` 63 | ## Improving with few-shot NAS 64 | One-shot NAS substantially reduces the computation cost by training only one supernet and approximating the performance of every architecture in the search space via weight-sharing. However, the performance estimation can be very inaccurate due to the co-adaption among operations. 65 | Recently, we proposed few-shot NAS, which uses multiple supernets, called sub-supernets, each covering a different region of the search space, to alleviate the undesired co-adaption. Since each sub-supernet covers only a small part of the search space, few-shot NAS improves the accuracy of architecture evaluation over one-shot NAS with a small increase in evaluation cost. Please see the following paper for details. 66 | 67 | Few-shot Neural Architecture Search
68 | in submission
69 | Yiyang Zhao (WPI), Linnan Wang (Brown), Yuandong Tian (FAIR), Rodrigo Fonseca (Brown), Tian Guo (WPI) 70 | 71 | **To Evaluate Few-shot NAS**, please check this repository. The following figures show the performance improvement of few-shot NAS. 72 |

73 | 74 | 75 |

76 | These figures show that few-shot NAS is an effective trade-off between one-shot NAS and vanilla NAS (i.e. training every network from scratch): it retains both accurate performance estimation and fast search. 77 | 78 | 79 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/read_result.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | 7 | import json 8 | 9 | with open('result.txt') as json_data: 10 | data = json.load(json_data) 11 | 12 | counter = 0 13 | for elem in data: 14 | print(elem[0], elem[1], counter) 15 | counter += 1 16 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/search.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree.
5 | # 6 | from supernet.generator import encode_supernet, define_search_space 7 | from LaNAS.MCTS import MCTS 8 | from supernet.supernet_train import Trainer 9 | import numpy as np 10 | import os 11 | import pickle 12 | 13 | ############## first step, training supernet 14 | 15 | # trainer.run(300) 16 | 17 | ############## second step, searching over the supernet 18 | search_space = define_search_space() 19 | print( search_space["init_point"], len( search_space["init_point"] ) ) 20 | # sample = mcts.zero_supernet_generator() 21 | # print(sample) 22 | # 23 | # 24 | trainer = Trainer( batch_size=40, init_channels= 48 ) 25 | trainer.load_model("./NASNet_Supernet.pt") 26 | 27 | # 28 | node_path = "mcts_agent" 29 | if os.path.isfile(node_path) == True: 30 | with open(node_path, 'rb') as json_data: 31 | agent = pickle.load(json_data) 32 | print("=====>loads:", len(agent.samples)," samples" ) 33 | print("=====>loads:", agent.SEARCH_COUNTER," counter" ) 34 | agent.search( ) 35 | else: 36 | mcts = MCTS(search_space, trainer, 5) 37 | sample = mcts.trainer.propose_nasnet_mask() 38 | result = trainer.infer_masks( sample ) 39 | mcts.collect_samples( result ) 40 | mcts.search( ) 41 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/supernet/operations.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | 10 | OPS = { 11 | 'none' : lambda C, stride, affine: Zero(stride), 12 | 'avg_pool_3x3' : lambda C, stride, affine: nn.AvgPool2d(3, stride=stride, padding=1, count_include_pad=False), 13 | 'max_pool_3x3' : lambda C, stride, affine: nn.MaxPool2d(3, stride=stride, padding=1), 14 | 'skip_connect' : lambda C, stride, affine: Identity() if stride == 1 else FactorizedReduce(C, C, affine=affine), 15 | 'sep_conv_3x3' : lambda C, stride, affine: SepConv(C, C, 3, stride, 1, affine=affine), 16 | 'sep_conv_5x5' : lambda C, stride, affine: SepConv(C, C, 5, stride, 2, affine=affine), 17 | 'sep_conv_7x7' : lambda C, stride, affine: SepConv(C, C, 7, stride, 3, affine=affine), 18 | 'dil_conv_3x3' : lambda C, stride, affine: DilConv(C, C, 3, stride, 2, 2, affine=affine), 19 | 'dil_conv_5x5' : lambda C, stride, affine: DilConv(C, C, 5, stride, 4, 2, affine=affine), 20 | 'conv_7x1_1x7' : lambda C, stride, affine: nn.Sequential( 21 | nn.ReLU(inplace=False), 22 | nn.Conv2d(C, C, (1,7), stride=(1, stride), padding=(0, 3), bias=False), 23 | nn.Conv2d(C, C, (7,1), stride=(stride, 1), padding=(3, 0), bias=False), 24 | nn.BatchNorm2d(C, affine=affine) 25 | ), 26 | 'conv_1x1' : lambda C, stride, affine: nn.Conv2d(C, C, (1,1), stride=(stride, stride), padding=(0,0), bias=False), 27 | 'conv_3x3' : lambda C, stride, affine: nn.Conv2d(C, C, (3,3), stride=(stride, stride), padding=(1,1), bias=False), 28 | 'conv_5x5' : lambda C, stride, affine: nn.Conv2d(C, C, (5,5), stride=(stride, stride), padding=(2,2), bias=False), 29 | } 30 | 31 | class ReLUConvBN(nn.Module): 32 | 33 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 34 | super(ReLUConvBN, self).__init__() 35 | self.op = nn.Sequential( 36 | nn.ReLU(inplace=False), 37 | nn.Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False), 38 | nn.BatchNorm2d(C_out, affine=affine) 39 | ) 40 | 41 | def forward(self, x): 42 
| return self.op(x) 43 | 44 | class DilConv(nn.Module): 45 | 46 | def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation, affine=True): 47 | super(DilConv, self).__init__() 48 | self.op = nn.Sequential( 49 | nn.ReLU(inplace=False), 50 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=C_in, bias=False), 51 | nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 52 | nn.BatchNorm2d(C_out, affine=affine), 53 | ) 54 | 55 | def forward(self, x): 56 | return self.op(x) 57 | 58 | 59 | class SepConv(nn.Module): 60 | 61 | def __init__(self, C_in, C_out, kernel_size, stride, padding, affine=True): 62 | super(SepConv, self).__init__() 63 | self.op = nn.Sequential( 64 | nn.ReLU(inplace=False), 65 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=stride, padding=padding, groups=C_in, bias=False), 66 | nn.Conv2d(C_in, C_in, kernel_size=1, padding=0, bias=False), 67 | nn.BatchNorm2d(C_in, affine=affine), 68 | nn.ReLU(inplace=False), 69 | nn.Conv2d(C_in, C_in, kernel_size=kernel_size, stride=1, padding=padding, groups=C_in, bias=False), 70 | nn.Conv2d(C_in, C_out, kernel_size=1, padding=0, bias=False), 71 | nn.BatchNorm2d(C_out, affine=affine), 72 | ) 73 | 74 | def forward(self, x): 75 | return self.op(x) 76 | 77 | 78 | class Conv2d(nn.Conv2d): 79 | 80 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, 81 | padding=0, dilation=1, groups=1, bias=True): 82 | super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride, 83 | padding, dilation, groups, bias) 84 | 85 | def forward(self, x): 86 | weight = self.weight 87 | weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2, 88 | keepdim=True).mean(dim=3, keepdim=True) 89 | weight = weight - weight_mean 90 | std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5 91 | weight = weight / std.expand_as(weight) 92 | return F.conv2d(x, weight, self.bias, self.stride, 93 | self.padding, 
self.dilation, self.groups) 94 | 95 | 96 | 97 | class Identity(nn.Module): 98 | 99 | def __init__(self): 100 | super(Identity, self).__init__() 101 | 102 | def forward(self, x): 103 | return x 104 | 105 | 106 | class Zero(nn.Module): 107 | 108 | def __init__(self, stride): 109 | super(Zero, self).__init__() 110 | self.stride = stride 111 | 112 | def forward(self, x): 113 | if self.stride == 1: 114 | return x.mul(0.) 115 | return x[:,:,::self.stride,::self.stride].mul(0.) 116 | 117 | 118 | class FactorizedReduce(nn.Module): 119 | 120 | def __init__(self, C_in, C_out, affine=True): 121 | super(FactorizedReduce, self).__init__() 122 | assert C_out % 2 == 0 123 | self.relu = nn.ReLU(inplace=False) 124 | self.conv_1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 125 | self.conv_2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False) 126 | self.bn = nn.BatchNorm2d(C_out, affine=affine) 127 | 128 | def forward(self, x): 129 | x = self.relu(x) 130 | out = torch.cat([self.conv_1(x), self.conv_2(x[:,:,1:,1:])], dim=1) 131 | out = self.bn(out) 132 | return out 133 | 134 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/supernet/supernet_model.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | # This source code is licensed under the license found in the 4 | # LICENSE file in the root directory of this source tree. 
5 | # 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from .operations import * 10 | from torch.autograd import Variable 11 | 12 | 13 | 14 | class MixedOp(nn.Module): 15 | def __init__(self, C, stride, layer_type): 16 | super(MixedOp, self).__init__() 17 | self._ops = nn.ModuleList() 18 | for layer in layer_type: 19 | op = OPS[layer](C, stride, False) 20 | if 'pool' in layer: 21 | op = nn.Sequential(op, nn.BatchNorm2d(C, affine=False)) 22 | self._ops.append(op) 23 | 24 | def forward(self, x, weights): 25 | # print(len(weights), len(self._ops)) 26 | return sum(w * op(x) for w, op in zip(weights, self._ops)) # use sum instead concat 27 | 28 | 29 | class Cell(nn.Module): 30 | def __init__(self, layer_type, steps, multiplier, C_prev_prev, C_prev, C, reduction, reduction_prev): 31 | super(Cell, self).__init__() 32 | self.reduction = reduction 33 | 34 | if reduction_prev: 35 | self.preprocess0 = FactorizedReduce(C_prev_prev, C, affine=False) 36 | else: 37 | self.preprocess0 = ReLUConvBN(C_prev_prev, C, 1, 1, 0, affine=False) 38 | self.preprocess1 = ReLUConvBN(C_prev, C, 1, 1, 0, affine=False) 39 | self._steps = steps 40 | self._multiplier = multiplier 41 | 42 | self._ops = nn.ModuleList() 43 | self._bns = nn.ModuleList() 44 | for i in range(self._steps): 45 | for j in range(2 + i): 46 | stride = 2 if reduction and j < 2 else 1 47 | op = MixedOp(C, stride, layer_type) 48 | self._ops.append(op) 49 | 50 | def forward(self, s0, s1, supernet_matrix): 51 | s0 = self.preprocess0(s0) 52 | s1 = self.preprocess1(s1) 53 | 54 | states = [s0, s1] 55 | offset = 0 56 | for i in range(self._steps): 57 | s = sum(self._ops[offset + j](h, supernet_matrix[i][j]) for j, h in enumerate(states)) 58 | 59 | offset += len(states) 60 | states.append(s) 61 | 62 | return torch.cat(states[-self._multiplier:], dim=1) 63 | 64 | 65 | class Network(nn.Module): 66 | def __init__(self, supernet_normal, supernet_reduce, layer_type, C, num_classes, layers, criterion, 
steps=4, multiplier=4, stem_multiplier=3): 67 | super(Network, self).__init__() 68 | self.supernet_normal = supernet_normal 69 | self.supernet_reduce = supernet_reduce 70 | self.layer_type = layer_type 71 | self._C = C 72 | self._num_classes = num_classes 73 | self._layers = layers 74 | self._criterion = criterion 75 | self._steps = steps 76 | self._multiplier = multiplier 77 | 78 | C_curr = stem_multiplier * C 79 | self.stem = nn.Sequential( 80 | nn.Conv2d(3, C_curr, 3, padding=1, bias=False), 81 | nn.BatchNorm2d(C_curr) 82 | ) 83 | 84 | C_prev_prev, C_prev, C_curr = C_curr, C_curr, C 85 | self.cells = nn.ModuleList() 86 | reduction_prev = False 87 | for i in range(layers): 88 | if i in [layers // 3, 2 * layers // 3]: 89 | C_curr *= 2 90 | reduction = True 91 | cell = Cell(self.layer_type, len(self.supernet_reduce), multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev) 92 | else: 93 | reduction = False 94 | cell = Cell(self.layer_type, steps, multiplier, C_prev_prev, C_prev, C_curr, reduction, reduction_prev) 95 | 96 | reduction_prev = reduction 97 | self.cells += [cell] 98 | C_prev_prev, C_prev = C_prev, multiplier * C_curr 99 | 100 | self.global_pooling = nn.AdaptiveAvgPool2d(1) 101 | self.classifier = nn.Linear(C_prev, num_classes) 102 | 103 | def change_masks(self, normal_mask, reduce_mask): 104 | self.supernet_normal = normal_mask 105 | self.supernet_reduce = reduce_mask 106 | 107 | def forward(self, input): 108 | s0 = s1 = self.stem(input) 109 | for i, cell in enumerate(self.cells): 110 | if not cell.reduction: 111 | s0, s1 = s1, cell(s0, s1, self.supernet_normal) 112 | else: 113 | s0, s1 = s1, cell(s0, s1, self.supernet_reduce) 114 | out = self.global_pooling(s1) 115 | logits = self.classifier(out.view(out.size(0), -1)) 116 | return logits 117 | 118 | -------------------------------------------------------------------------------- /LaNAS/one-shot_LaNAS/supernet/utils.py: 
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates.
2 | # All rights reserved.
3 | # This source code is licensed under the license found in the
4 | # LICENSE file in the root directory of this source tree.
5 | #
6 | import os
7 | import numpy as np
8 | import torch
9 | import shutil
10 | import torchvision.transforms as transforms
11 | from torch.autograd import Variable
12 |
13 |
14 | class AvgrageMeter(object):
15 |
16 |     def __init__(self):
17 |         self.reset()
18 |
19 |     def reset(self):
20 |         self.avg = 0
21 |         self.sum = 0
22 |         self.cnt = 0
23 |
24 |     def update(self, val, n=1):
25 |         self.sum += val * n
26 |         self.cnt += n
27 |         self.avg = self.sum / self.cnt
28 |
29 |
30 | def accuracy(output, target, topk=(1,)):
31 |     maxk = max(topk)
32 |     batch_size = target.size(0)
33 |
34 |     _, pred = output.topk(maxk, 1, True, True)
35 |     pred = pred.t()
36 |     correct = pred.eq(target.view(1, -1).expand_as(pred))
37 |
38 |     res = []
39 |     for k in topk:
40 |         correct_k = correct[:k].reshape(-1).float().sum(0)  # reshape, not view: a slice of the transposed tensor may be non-contiguous
41 |         res.append(correct_k.mul_(100.0/batch_size))
42 |     return res
43 |
44 |
45 | class Cutout(object):
46 |     def __init__(self, length):
47 |         self.length = length
48 |
49 |     def __call__(self, img):
50 |         h, w = img.size(1), img.size(2)
51 |         mask = np.ones((h, w), np.float32)
52 |         y = np.random.randint(h)
53 |         x = np.random.randint(w)
54 |
55 |         y1 = np.clip(y - self.length // 2, 0, h)
56 |         y2 = np.clip(y + self.length // 2, 0, h)
57 |         x1 = np.clip(x - self.length // 2, 0, w)
58 |         x2 = np.clip(x + self.length // 2, 0, w)
59 |
60 |         mask[y1: y2, x1: x2] = 0.
61 |         mask = torch.from_numpy(mask)
62 |         mask = mask.expand_as(img)
63 |         img *= mask
64 |         return img
65 |
66 |
67 | def _data_transforms_cifar10(cutout, cutout_length):
68 |     CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
69 |     CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
70 |
71 |     train_transform = transforms.Compose([
72 |         transforms.RandomCrop(32, padding=4),
73 |         transforms.RandomHorizontalFlip(),
74 |         transforms.ToTensor(),
75 |         transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
76 |     ])
77 |     if cutout:
78 |         train_transform.transforms.append(Cutout(cutout_length))
79 |
80 |     valid_transform = transforms.Compose([
81 |         transforms.ToTensor(),
82 |         transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
83 |     ])
84 |     return train_transform, valid_transform
85 |
86 |
87 | def count_parameters_in_MB(model):
88 |     return sum(np.prod(v.size()) for name, v in model.named_parameters() if "auxiliary" not in name)/1e6  # built-in sum: np.sum over a generator is deprecated
89 |
90 |
91 | def save_checkpoint(state, is_best, save):
92 |     filename = os.path.join(save, 'checkpoint.pth.tar')
93 |     torch.save(state, filename)
94 |     if is_best:
95 |         best_filename = os.path.join(save, 'model_best.pth.tar')
96 |         shutil.copyfile(filename, best_filename)
97 |
98 |
99 | def save(model, model_path):
100 |     torch.save(model.state_dict(), model_path)
101 |
102 |
103 | def load(model, model_path):
104 |     model.load_state_dict(torch.load(model_path))
105 |
106 |
107 | def drop_path(x, drop_prob):
108 |     if drop_prob > 0.:
109 |         keep_prob = 1. - drop_prob
110 |
111 |         mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
112 |         x.div_(keep_prob)
113 |         try:
114 |             x.mul_(mask)
115 |         except:
116 |             mask = torch.cuda.HalfTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
117 |             x.mul_(mask)
118 |     return x
119 |
120 | #
121 | # def create_exp_dir(path, scripts_to_save=None):
122 | #     if not os.path.exists(path):
123 | #         os.mkdir(path)
124 | #     print('Experiment dir : {}'.format(path))
125 | #
126 | #     if scripts_to_save is not None:
127 | #         os.mkdir(os.path.join(path, 'scripts'))
128 | #         for script in scripts_to_save:
129 | #             dst_file = os.path.join(path, 'scripts', os.path.basename(script))
130 | #             shutil.copyfile(script, dst_file)
131 |
132 |
133 | def create_exp_dir(path, scripts_to_save=None):
134 |     if not os.path.exists(path):
135 |         os.mkdir(path)
136 |     print('Experiment dir : {}'.format(path))
137 |
138 |     if scripts_to_save is not None:
139 |         if not os.path.exists(os.path.join(path, 'scripts')):
140 |             os.mkdir(os.path.join(path, 'scripts'))
141 |         for script in scripts_to_save:
142 |             dst_file = os.path.join(path, 'scripts', os.path.basename(script))
143 |             shutil.copyfile(script, dst_file)
144 |
--------------------------------------------------------------------------------
/LaNAS/one-shot_LaNAS/train.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates.
2 | # All rights reserved.
3 | # This source code is licensed under the license found in the
4 | # LICENSE file in the root directory of this source tree.
5 | #
6 | from supernet.generator import encode_supernet, define_search_space
7 | from LaNAS.MCTS import MCTS
8 | from supernet.supernet_train import Trainer
9 |
10 |
11 | trainer = Trainer(batch_size=80, init_channels=48, epochs=300)
12 | trainer.run()
13 |
14 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |

2 | 3 |

4 |
5 | # Latent Action Monte Carlo Tree Search (LA-MCTS)
6 |
7 | LA-MCTS is a new MCTS-based derivative-free meta-solver. It learns to partition the search space, so that solvers such as Bayesian optimization or evolutionary algorithms can focus on a smaller region to find better solutions with fewer samples.
8 |
9 | Please 🌟star🌟 the repo if you like our work, thank you.
10 |
11 | # Contributors
12 | Linnan Wang (First Author), Yuandong Tian (Principal Investigator), Yiyang Zhao, Saining Xie, Teng Li and Rodrigo Fonseca.
13 |
14 | # What's in this release?
15 |
16 | This release contains our implementation of LA-MCTS and its application to Neural Architecture Search (LaNAS), but it can also be applied to large-scale hyper-parameter optimization, reinforcement learning, scheduling, optimizing computer systems, and many others.
17 |
18 | ## Neural Architecture Search (NAS)
19 | - **Evaluation on NASBench-101**: Evaluating LaNAS on NASBench-101 on your laptop without training models.
20 |
21 | - **Our Searched Models, LaNet**: SoTA results: • 99.03% on CIFAR-10 • 77.7% @ 240MFLOPS on ImageNet.
22 |
23 | - **One/Few-shot LaNAS**: Using a supernet to evaluate the model, obtaining results within a few GPU days.
24 |
25 | - **Distributed LaNAS**: Distributed framework for LaNAS, usable with hundreds of GPUs.
26 |
27 | - **Training heuristics used**: We list all tricks used in ImageNet training to reach SoTA.
28 |
29 | ## Black-box optimization
30 | - **Performance with baselines**: 1-minute evaluations of LA-MCTS vs. Bayesian Optimization and Evolutionary Search.
31 | **In the NeurIPS-2020 black box optimization challenge, the concept of LA-MCTS was used by the 3rd-place (JetBrains) and 8th-place (KAIST) teams. Check out the leaderboard here.**
32 |
33 | - **Mujoco Experiments**: LA-MCTS on Mujoco environments.
34 |
35 |
36 | # Project Logs
37 | ## Building the MCTS-based NAS agent
38 |
39 | >Inspired by AlphaGo, we built the very first NAS search algorithm based on Monte Carlo Tree Search (MCTS) in 2017, namely AlphaX. The action space is fixed (layer-by-layer construction) and MCTS is used to steer towards promising search regions. We showed that the convolutional neural networks designed by AlphaX improve downstream applications such as detection, style transfer, image captioning, and many others.
40 |
41 | Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search
42 | AAAI-2020, [code]
43 | Linnan Wang (Brown), Yiyang Zhao(WPI), Yuu Jinnai(Brown), Yuandong Tian(FAIR), Rodrigo Fonseca(Brown)
44 |
45 | ## From AlphaX to LaNAS
46 | >With AlphaX, we found that the choice of action space used in MCTS significantly affects the search efficiency, which motivated the idea of learning the action space for MCTS on the fly during training.
47 | This led to LaNAS.
48 | LaNAS uses a linear classifier at each decision node of MCTS to learn good versus bad actions. Each leaf node now represents a subregion of the search space rather than a single architecture, and LaNAS evaluates a leaf by uniformly sampling one architecture from its subregion and evaluating that architecture.
49 | The first version of LaNAS implemented a distributed system to perform NAS by training every such sample from scratch using 500 GPUs.
50 | The second version of LaNAS, called one-shot LaNAS, uses a single one-shot subnetwork to evaluate the quality of samples, trading accuracy for evaluation efficiency.
51 | One-shot LaNAS finds a reasonable solution in a few GPU days.
52 |
53 | Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search
54 | TPAMI 2021
55 | Linnan Wang (Brown), Saining Xie (FAIR), Teng Li (FAIR), Rodrigo Fonseca (Brown), Yuandong Tian (FAIR)
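
The node-splitting idea in the LaNAS paragraph above can be illustrated with a small sketch. It is not the code from this repository: the two-number architecture encoding, the toy accuracy function, the gradient-descent fit, and the helper names `fit_linear` and `split_node` are all assumptions made for illustration. It also assumes that a node splits its samples by comparing a linear model's prediction with the mean accuracy seen at that node.

```python
import random


def predict(w, b, x):
    """Linear model: predicted accuracy for an encoded architecture x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_linear(xs, ys, lr=0.5, steps=2000):
    """Fit y ~ w.x + b by plain gradient descent on the mean squared error."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        errs = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        w = [wi - lr * sum(e * x[i] for e, x in zip(errs, xs)) / n
             for i, wi in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


def split_node(xs, ys):
    """Split samples into a 'good' and a 'bad' child (assumed rule:
    predicted accuracy at or above the node's mean accuracy is good)."""
    w, b = fit_linear(xs, ys)
    threshold = sum(ys) / len(ys)
    good, bad = [], []
    for x, y in zip(xs, ys):
        (good if predict(w, b, x) >= threshold else bad).append((x, y))
    return good, bad


# Hypothetical data: 40 random encodings with a made-up, exactly linear accuracy.
rng = random.Random(0)
encodings = [(rng.random(), rng.random()) for _ in range(40)]
accuracies = [0.5 + 0.3 * x0 - 0.2 * x1 for x0, x1 in encodings]

good, bad = split_node(encodings, accuracies)
print(len(good), "samples in the good child,", len(bad), "in the bad child")
```

Applying `split_node` recursively to the good and bad children would build a tree whose leaves are subregions of the search space, which is the structure the paragraph describes.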
56 |
57 |
58 | ## From LaNAS to a generic solver LA-MCTS
59 | > Since LaNAS works very well on NAS datasets, e.g. NASBench-101, and the core of the algorithm can be easily generalized to other problems, we extended it to be a generic solver for black-box function optimization.
60 | LA-MCTS improves on LaNAS by using a nonlinear classifier at each decision node in MCTS and by using a surrogate (e.g., a function approximator) to evaluate each sample in the leaf node.
61 | The surrogate can come from any existing black-box optimizer (e.g., Bayesian Optimization).
62 | The details of LA-MCTS can be found in the following paper.
63 |
64 | Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search
65 | NeurIPS 2020
66 | Linnan Wang (Brown University), Rodrigo Fonseca (Brown University), Yuandong Tian (Facebook AI Research)
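
To make the partitioning step in the LA-MCTS paragraph above concrete, here is a minimal, standard-library-only sketch. It is an illustration under assumptions, not the implementation in `lamcts/`. The toy objective, the median-based labeling, the 1-nearest-neighbor classifier standing in for the nonlinear classifier, and every function name are hypothetical. Uniform rejection sampling stands in for the solver (e.g., Bayesian optimization) that would normally propose points inside the selected region.

```python
import random


def objective(x):
    """Toy black-box function to maximize; its optimum is at (0.5, 0.5)."""
    return -((x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2)


def uniform_point(rng):
    """Draw one point uniformly from the full search space [-1, 1]^2."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))


def label_samples(values):
    """Mark a sample as good (True) if its value is at least the median."""
    median = sorted(values)[len(values) // 2]
    return [v >= median for v in values]


def in_good_region(x, points, labels):
    """1-nearest-neighbor classifier: x inherits the label of its closest sample."""
    nearest = min(
        range(len(points)),
        key=lambda i: (points[i][0] - x[0]) ** 2 + (points[i][1] - x[1]) ** 2,
    )
    return labels[nearest]


def sample_good_region(rng, points, labels, count):
    """Rejection-sample `count` points that the classifier places in the good region."""
    accepted = []
    while len(accepted) < count:
        x = uniform_point(rng)
        if in_good_region(x, points, labels):
            accepted.append(x)
    return accepted


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)
points = [uniform_point(rng) for _ in range(60)]
labels = label_samples([objective(p) for p in points])

focused = sample_good_region(rng, points, labels, 200)
baseline = [uniform_point(rng) for _ in range(200)]
print("mean f, good region:", mean([objective(x) for x in focused]))
print("mean f, whole space:", mean([objective(x) for x in baseline]))
```

The learned boundary is nonlinear because it follows the nearest-neighbor cells of the good samples. Restricting new samples to that region is what lets a downstream solver spend its budget on a smaller, more promising part of the space.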
67 |
68 | ## From one-shot NAS to few-shot NAS
69 | > To overcome the issues of one-shot NAS, we propose few-shot NAS, which uses multiple supernets, each covering a different region of the search space specified by the intermediate nodes of the search tree. Extensive experiments show that few-shot NAS significantly improves upon one-shot methods. See the paper below for details.
70 |
71 | Few-shot Neural Architecture Search
[code]
72 | Yiyang Zhao (WPI), Linnan Wang (Brown), Yuandong Tian (FAIR), Rodrigo Fonseca (Brown), Tian Guo (WPI) 73 | 74 | 75 | ## License 76 | LA-MCTS is under [CC-BY-NC 4.0 license](./LICENSE). 77 | --------------------------------------------------------------------------------