├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Data ├── MIME_Test_Data.npy └── Roboturk_Test_Data.npy ├── DataGenerator ├── A_array_newcont_cond.npy ├── A_goal_directed.npy ├── B_array_newcont_cond.npy ├── B_goal_directed.npy ├── ContinuousNonZero.py ├── ContinuousTrajs.py ├── DeterministicGoalDirectedTraj.py ├── DirectedContinuousNonZero.py ├── DirectedContinuousTrajs.py ├── G_array_newcont_cond.npy ├── G_goal_directed.npy ├── GoalDirectedTrajs.py ├── NewGoalDirectedTraj.py ├── PolicyVisualizer.py ├── S_array_newcont_cond.npy ├── SeparableTrajs.py ├── X_array_newcont_cond.npy ├── X_goal_directed.npy ├── Y_array_newcont_cond.npy └── Y_goal_directed.npy ├── DataLoaders ├── GridWorld_DataLoader.py ├── InteractiveDataLoader.py ├── MIME_DataLoader.py ├── MIME_DataLoader.pyc ├── MIME_Img_DataLoader.py ├── MIMEandPlan_DataLoader.py ├── Plan_DataLoader.py ├── RandomWalks.py ├── RandomWalks.pyc ├── RoboturkeExp.py ├── SmallMaps_DataLoader.py ├── Translation.py ├── __init__.py ├── __init__.pyc ├── headers.py └── headers.pyc ├── DownstreamRL ├── PolicyNet.py └── TrainZPolicyRL.py ├── Experiments ├── Code_Runs │ └── CycleTransfer_Runs.py ├── DMP.py ├── DataLoaders.py ├── Eval_RLRewards.py ├── MIME_DataLoader.py ├── Master.py ├── MocapVisualizationExample.py ├── MocapVisualizationUtils.py ├── Mocap_DataLoader.py ├── PolicyManagers.py ├── PolicyNetworks.py ├── Processing_MocapData.py ├── RLUtils.py ├── Roboturk_DataLoader.py ├── TFLogger.py ├── TestClass.py ├── Visualizers.py ├── cluster_run.py └── headers.py ├── LICENSE └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore files. 2 | *.bvh 3 | *.html 4 | *.png 5 | *.jpg 6 | *.gif 7 | *.pyc 8 | Experiments/Experimental_Logs/* 9 | Experiments/Code_Runs/* -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to make participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 
11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies within all project spaces, and it also applies when 49 | an individual is representing the project or its community in public spaces. 50 | Examples of representing a project or community include using an official 51 | project e-mail address, posting via an official social media account, or acting 52 | as an appointed representative at an online or offline event. Representation of 53 | a project may be further defined and clarified by project maintainers. 54 | 55 | ## Enforcement 56 | 57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 58 | reported by contacting the project team at . All 59 | complaints will be reviewed and investigated and will result in a response that 60 | is deemed necessary and appropriate to the circumstances. The project team is 61 | obligated to maintain confidentiality with regard to the reporter of an incident. 62 | Further details of specific enforcement policies may be posted separately. 63 | 64 | Project maintainers who do not follow or enforce the Code of Conduct in good 65 | faith may face temporary or permanent repercussions as determined by other 66 | members of the project's leadership. 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 72 | 73 | [homepage]: https://www.contributor-covenant.org 74 | 75 | For answers to common questions about this code of conduct, see 76 | https://www.contributor-covenant.org/faq 77 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to CausalSkillLearning 2 | We want to make contributing to this project as easy and transparent as 3 | possible. 4 | 5 | ## Pull Requests 6 | We actively welcome your pull requests. 7 | 8 | 1. 
Fork the repo and create your branch from `master`. 9 | 2. If you've added code that should be tested, add tests. 10 | 3. If you've changed APIs, update the documentation. 11 | 4. Ensure the test suite passes. 12 | 5. Make sure your code lints. 13 | 6. If you haven't already, complete the Contributor License Agreement ("CLA"). 14 | 15 | ## Contributor License Agreement ("CLA") 16 | In order to accept your pull request, we need you to submit a CLA. You only need 17 | to do this once to work on any of Facebook's open source projects. 18 | 19 | Complete your CLA here: 20 | 21 | ## Issues 22 | We use GitHub issues to track public bugs. Please ensure your description is 23 | clear and has sufficient instructions to be able to reproduce the issue. 24 | 25 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 26 | disclosure of security bugs. In those cases, please go through the process 27 | outlined on that page and do not file a public issue. 28 | 29 | ## License 30 | By contributing to CausalSkillLearning, you agree that your contributions will be licensed 31 | under the LICENSE file in the root directory of this source tree. -------------------------------------------------------------------------------- /Data/MIME_Test_Data.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/Data/MIME_Test_Data.npy -------------------------------------------------------------------------------- /Data/Roboturk_Test_Data.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/Data/Roboturk_Test_Data.npy -------------------------------------------------------------------------------- /DataGenerator/A_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/A_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/A_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/A_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/B_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/B_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/B_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/B_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/ContinuousNonZero.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 
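# Overview: generates 50,000 synthetic 2-D trajectories of 20 timesteps, built
# from four axis-aligned "options": states X, actions A, option labels Y and
# segment-boundary indicators B, with start states drawn uniformly from
# [-2.5, 2.5]^2. A minimal loading sketch (file names as in the np.save calls
# at the bottom of this script):
#   X = np.load("X_array_continuous_nonzero.npy")  # (50000, 20, 2) states
#   A = np.load("A_array_continuous_nonzero.npy")  # (50000, 19, 2) actions
#   Y = np.load("Y_array_continuous_nonzero.npy")  # (50000, 19) option labels
#   B = np.load("B_array_continuous_nonzero.npy")  # (50000, 19) boundary flags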
3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 20 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | 18 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 19 | 20 | for i in range(number_datapoints): 21 | if i%1000==0: 22 | print("Processing Datapoint: ",i) 23 | b_array_dataset[i,0] = 1. 24 | 25 | x_array_dataset[i,0] = 5*(np.random.random((2))-0.5) 26 | 27 | reset_counter = 0 28 | for t in range(number_timesteps-1): 29 | 30 | # GET B 31 | if t>0: 32 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 33 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 34 | 35 | # If 3,4,5 timesteps have passed, terminate. 36 | if reset_counter>=3 and reset_counter<5: 37 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 38 | elif reset_counter==5: 39 | b_array_dataset[i,t] = 1 40 | 41 | # GET Y 42 | if b_array_dataset[i,t]: 43 | y_array_dataset[i,t] = np.random.random_integers(0,high=3) 44 | reset_counter = 0 45 | else: 46 | reset_counter+=1 47 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 48 | 49 | # GET A 50 | 51 | # -0.05 is because the noise is from 0-0.1, so to balance this we make it -0.05 52 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 53 | 54 | # GET X 55 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 56 | 57 | # embed() 58 | 59 | np.save("X_array_continuous_nonzero.npy",x_array_dataset) 60 | np.save("Y_array_continuous_nonzero.npy",y_array_dataset) 61 | np.save("B_array_continuous_nonzero.npy",b_array_dataset) 62 | np.save("A_array_continuous_nonzero.npy",a_array_dataset) -------------------------------------------------------------------------------- /DataGenerator/ContinuousTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 20 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | 18 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 19 | 20 | for i in range(number_datapoints): 21 | if i%1000==0: 22 | print("Processing Datapoint: ",i) 23 | b_array_dataset[i,0] = 1. 24 | 25 | reset_counter = 0 26 | for t in range(number_timesteps-1): 27 | 28 | # GET B 29 | if t>0: 30 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 31 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 32 | 33 | # If 3,4,5 timesteps have passed, terminate. 
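# reset_counter counts how long the current option has been active: a boundary
# cannot fire for the first 3 steps of a segment, is sampled with p=0.33 while
# the counter is 3 or 4, and is forced once it reaches 5, so each option
# persists for roughly 4-6 timesteps before a new one is drawn.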
34 | if reset_counter>=3 and reset_counter<5: 35 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 36 | elif reset_counter==5: 37 | b_array_dataset[i,t] = 1 38 | 39 | # GET Y 40 | if b_array_dataset[i,t]: 41 | y_array_dataset[i,t] = np.random.random_integers(0,high=3) 42 | reset_counter = 0 43 | else: 44 | reset_counter+=1 45 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 46 | 47 | # GET A 48 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 49 | 50 | # GET X 51 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 52 | 53 | # embed() 54 | 55 | np.save("X_array_continuous.npy",x_array_dataset) 56 | np.save("Y_array_continuous.npy",y_array_dataset) 57 | np.save("B_array_continuous.npy",b_array_dataset) 58 | np.save("A_array_continuous.npy",a_array_dataset) -------------------------------------------------------------------------------- /DataGenerator/DeterministicGoalDirectedTraj.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | #number_datapoints = 20 12 | number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 23 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 24 | 25 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 26 | 27 | lim = 25 28 | 29 | for i in range(number_datapoints): 30 | 31 | if i%1000==0: 32 | print("Processing Datapoint: ",i) 33 | 34 | # b_array_dataset[i,0] = 1. 35 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 36 | 37 | # Adding random noise to start state. 38 | x_array_dataset[i,-1] = goal_states[goal_array_dataset[i]] + 0.1*(np.random.random(2)-0.5) 39 | goal = goal_states[goal_array_dataset[i]] 40 | 41 | reset_counter = 0 42 | # for t in range(number_timesteps-1): 43 | for t in reversed(range(number_timesteps-1)): 44 | 45 | # GET B # Must end on b==0. 46 | if t<(number_timesteps-2): 47 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 48 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 49 | 50 | # If 3,4,5 timesteps have passed, terminate. 51 | if t<3: 52 | b_array_dataset[i,t] = 0 53 | elif reset_counter>=3 and reset_counter<5: 54 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 55 | elif reset_counter==5: 56 | b_array_dataset[i,t] = 1 57 | elif t==(number_timesteps-2): 58 | b_array_dataset[i,t] = 1 59 | 60 | # GET Y 61 | if b_array_dataset[i,t]: 62 | current_state = x_array_dataset[i,t+1] 63 | unnorm_directions = current_state-goal.squeeze(0) 64 | directions = unnorm_directions/abs(unnorm_directions) 65 | 66 | # Set valid options. 
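# This trajectory is generated backwards in time (x[t] = x[t+1] - a[t]), so an
# option is treated as valid if its action has a non-positive dot product with
# the sign-normalised offset from the goal to the current state; among those,
# the option best aligned with the unnormalised offset is chosen deterministically.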
67 | dot_product = np.dot(action_map, directions) 68 | # valid_options = np.where(dot_product>=0)[0] 69 | # Sincer we're going backwards in time, 70 | valid_options = np.where(dot_product<=0)[0] 71 | 72 | # Compare states. If x-g_x>y_g_y, choose to go along... 73 | # embed() 74 | 75 | # y_array_dataset[i,t] = np.random.choice(valid_options) 76 | y_array_dataset[i,t] = valid_options[np.argmax(np.dot(action_map,unnorm_directions)[valid_options])] 77 | 78 | reset_counter = 0 79 | else: 80 | reset_counter+=1 81 | y_array_dataset[i,t] = y_array_dataset[i,t+1] 82 | 83 | # GET A 84 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 85 | 86 | # GET X 87 | # x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 88 | x_array_dataset[i,t] = x_array_dataset[i,t+1]-a_array_dataset[i,t] 89 | 90 | plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 91 | plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(25)) 92 | plt.xlim(-lim, lim) 93 | plt.ylim(-lim, lim) 94 | plt.show() 95 | 96 | # Roll over b's. 97 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 98 | 99 | 100 | np.save("X_deter_goal_directed.npy",x_array_dataset) 101 | np.save("Y_deter_goal_directed.npy",y_array_dataset) 102 | np.save("B_deter_goal_directed.npy",b_array_dataset) 103 | np.save("A_deter_goal_directed.npy",a_array_dataset) 104 | np.save("G_deter_goal_directed.npy",goal_array_dataset) 105 | -------------------------------------------------------------------------------- /DataGenerator/DirectedContinuousNonZero.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 25 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 18 | 19 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 20 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 21 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 22 | 23 | for i in range(number_datapoints): 24 | 25 | if i%1000==0: 26 | print("Processing Datapoint: ",i) 27 | b_array_dataset[i,0] = 1. 28 | 29 | # Select one of four starting points. (-2,-2), (-2,2), (2,-2), (2,2) 30 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 31 | # Adding random noise to start state. 32 | x_array_dataset[i,0] = start_states[goal_array_dataset[i]] + 0.2*(np.random.random(2)-0.5) 33 | goal = -start_states[goal_array_dataset[i]] 34 | 35 | reset_counter = 0 36 | for t in range(number_timesteps-1): 37 | 38 | # GET B 39 | if t>0: 40 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 41 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 42 | 43 | # If 3,4,5 timesteps have passed, terminate. 
44 | if reset_counter>=3 and reset_counter<5: 45 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 46 | elif reset_counter==5: 47 | b_array_dataset[i,t] = 1 48 | 49 | # GET Y 50 | if b_array_dataset[i,t]: 51 | 52 | axes = -goal/abs(goal) 53 | step1 = 30*np.ones((2))-axes*np.abs(x_array_dataset[i,t]-x_array_dataset[i,0]) 54 | # baseline = t*20*np.sqrt(2)/20 55 | baseline = t 56 | step2 = step1-baseline 57 | step3 = step2/step2.sum() 58 | y_array_dataset[i,t] = np.random.choice(valid_options[goal_array_dataset[i][0]]) 59 | 60 | reset_counter = 0 61 | else: 62 | reset_counter+=1 63 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 64 | 65 | # GET A 66 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 67 | 68 | # GET X 69 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 70 | 71 | np.save("X_dir_cont_nonzero.npy",x_array_dataset) 72 | np.save("Y_dir_cont_nonzero.npy",y_array_dataset) 73 | np.save("B_dir_cont_nonzero.npy",b_array_dataset) 74 | np.save("A_dir_cont_nonzero.npy",a_array_dataset) 75 | np.save("G_dir_cont_nonzero.npy",goal_array_dataset) 76 | -------------------------------------------------------------------------------- /DataGenerator/DirectedContinuousTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 25 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 18 | 19 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 20 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 21 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 22 | 23 | for i in range(number_datapoints): 24 | 25 | if i%1000==0: 26 | print("Processing Datapoint: ",i) 27 | b_array_dataset[i,0] = 1. 28 | 29 | # Select one of four starting points. (-2,-2), (-2,2), (2,-2), (2,2) 30 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 31 | x_array_dataset[i,0] = start_states[goal_array_dataset[i]] 32 | goal = -start_states[goal_array_dataset[i]] 33 | 34 | reset_counter = 0 35 | for t in range(number_timesteps-1): 36 | 37 | # GET B 38 | if t>0: 39 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 40 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 41 | 42 | # If 3,4,5 timesteps have passed, terminate. 
43 | if reset_counter>=3 and reset_counter<5: 44 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 45 | elif reset_counter==5: 46 | b_array_dataset[i,t] = 1 47 | 48 | # GET Y 49 | if b_array_dataset[i,t]: 50 | 51 | axes = -goal/abs(goal) 52 | step1 = 30*np.ones((2))-axes*np.abs(x_array_dataset[i,t]-x_array_dataset[i,0]) 53 | # baseline = t*20*np.sqrt(2)/20 54 | baseline = t 55 | step2 = step1-baseline 56 | step3 = step2/step2.sum() 57 | y_array_dataset[i,t] = np.random.choice(valid_options[goal_array_dataset[i][0]]) 58 | 59 | reset_counter = 0 60 | else: 61 | reset_counter+=1 62 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 63 | 64 | # GET A 65 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 66 | 67 | # GET X 68 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 69 | 70 | np.save("X_array_directed_continuous.npy",x_array_dataset) 71 | np.save("Y_array_directed_continuous.npy",y_array_dataset) 72 | np.save("B_array_directed_continuous.npy",b_array_dataset) 73 | np.save("A_array_directed_continuous.npy",a_array_dataset) -------------------------------------------------------------------------------- /DataGenerator/G_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/G_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/G_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/G_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/GoalDirectedTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | number_datapoints = 1 12 | # number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 23 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 24 | 25 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 26 | 27 | for i in range(number_datapoints): 28 | 29 | if i%1000==0: 30 | print("Processing Datapoint: ",i) 31 | 32 | # b_array_dataset[i,0] = 1. 33 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 34 | 35 | # Adding random noise to start state. 
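# (The "start" here is the final timestep: like the deterministic variant above,
# this script builds the trajectory backwards from a state seeded near the
# sampled goal, so x[:, -1] is initialised rather than x[:, 0].)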
36 | x_array_dataset[i,-1] = goal_states[goal_array_dataset[i]] + 0.1*(np.random.random(2)-0.5) 37 | goal = goal_states[goal_array_dataset[i]] 38 | 39 | reset_counter = 0 40 | # for t in range(number_timesteps-1): 41 | for t in reversed(range(number_timesteps-1)): 42 | 43 | # GET B # Must end on b==0. 44 | if t<(number_timesteps-2): 45 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 46 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 47 | 48 | # If 3,4,5 timesteps have passed, terminate. 49 | if t<3: 50 | b_array_dataset[i,t] = 0 51 | elif reset_counter>=3 and reset_counter<5: 52 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 53 | elif reset_counter==5: 54 | b_array_dataset[i,t] = 1 55 | elif t==(number_timesteps-2): 56 | b_array_dataset[i,t] = 1 57 | 58 | # GET Y 59 | if b_array_dataset[i,t]: 60 | current_state = x_array_dataset[i,t+1] 61 | # directions = current_state-goal.squeeze(0) 62 | directions = goal.squeeze(0)-current_state 63 | norm_directions = directions/abs(directions) 64 | 65 | # # Set valid options. 66 | dot_product = np.dot(action_map, norm_directions) 67 | # valid_options = np.where(dot_product>=0)[0] 68 | # # Sincer we're going backwards in time, 69 | valid_options = np.where(dot_product<=0)[0] 70 | 71 | # # axes = -goal/abs(goal) 72 | # # step1 = 30*np.ones((2))-axes*np.abs(x_array_dataset[i,t]-x_array_dataset[i,0]) 73 | # # # baseline = t*20*np.sqrt(2)/20 74 | # # baseline = t 75 | # # step2 = step1-baseline 76 | # # step3 = step2/step2.sum() 77 | # # y_array_dataset[i,t] = np.random.choice(valid_options[goal_array_dataset[i][0]]) 78 | # embed() 79 | dot_product = np.dot(action_map,directions) 80 | 81 | y_array_dataset[i,t] = np.argmax(dot_product) 82 | # y_array_dataset[i,t] = np.random.choice(valid_options) 83 | 84 | reset_counter = 0 85 | else: 86 | reset_counter+=1 87 | y_array_dataset[i,t] = y_array_dataset[i,t+1] 88 | 89 | # GET A 90 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 91 | 92 | # GET X 93 | # x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 94 | x_array_dataset[i,t] = x_array_dataset[i,t+1]-a_array_dataset[i,t] 95 | 96 | plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 97 | plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(25)) 98 | plt.xlim(-25,25) 99 | plt.ylim(-25,25) 100 | plt.show() 101 | 102 | # Roll over b's. 103 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 104 | 105 | 106 | np.save("X_goal_directed.npy",x_array_dataset) 107 | np.save("Y_goal_directed.npy",y_array_dataset) 108 | np.save("B_goal_directed.npy",b_array_dataset) 109 | np.save("A_goal_directed.npy",a_array_dataset) 110 | np.save("G_goal_directed.npy",goal_array_dataset) 111 | -------------------------------------------------------------------------------- /DataGenerator/NewGoalDirectedTraj.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
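# Overview: generates goal-directed trajectories from a hand-coded 9x9 "policy
# map" of option indices laid out around one of four goal states. get_bucket()
# discretises the current state relative to the goal, the map entry for that
# bucket selects the next option, and the boundary indicators B are shifted by
# one timestep (np.roll) before saving.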
6 | 7 | import numpy as np, copy 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | number_datapoints = 20 12 | # number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | # start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 23 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*10 24 | 25 | # Creating a policy map. 26 | lim = 50 27 | size = 9 28 | scale = 5 29 | policy_map = np.zeros((size,size),dtype=int) 30 | 31 | # Row wise assignment: 32 | policy_map[0,:] = 2 33 | 34 | policy_map[1,:7] = 2 35 | policy_map[1,7:] = 1 36 | 37 | policy_map[2:4,0] = 2 38 | policy_map[2:4,1:4] = 3 39 | policy_map[2:4,4:7] = 2 40 | policy_map[2:4,7:] = 1 41 | 42 | policy_map[4,:4] = 3 43 | policy_map[4,4] = 3 44 | policy_map[4,5:] = 1 45 | 46 | policy_map[5,:3] = 3 47 | policy_map[5,3:5] = 0 48 | policy_map[5,5:] = 1 49 | 50 | policy_map[6,:2] = 3 51 | policy_map[6,2:7] = 0 52 | policy_map[6,7:] = 1 53 | 54 | policy_map[7:,0] = 3 55 | policy_map[7:,1:7] = 0 56 | policy_map[7:,7:] = 1 57 | 58 | # policy_map = np.transpose(policy_map) 59 | 60 | goal_based_policy_maps = np.zeros((4,size,size)) 61 | goal_based_policy_maps[0] = copy.deepcopy(policy_map) 62 | goal_based_policy_maps[1] = np.flipud(policy_map) 63 | goal_based_policy_maps[2] = np.fliplr(policy_map) 64 | goal_based_policy_maps[3] = np.flipud(np.fliplr(policy_map)) 65 | 66 | def get_bucket(state, reference_state): 67 | # baseline = 4*np.ones(2) 68 | baseline = np.zeros(2) 69 | compensated_state = state - reference_state 70 | # compensated_state = (np.round(state - reference_state) + baseline).astype(int) 71 | 72 | x = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 73 | 74 | bucket = np.zeros((2)) 75 | 76 | bucket[0] = min(np.searchsorted(x,compensated_state[0]),size-1) 77 | bucket[1] = min(np.searchsorted(x,compensated_state[1]),size-1) 78 | 79 | return bucket.astype(int) 80 | 81 | for i in range(number_datapoints): 82 | 83 | if i%1000==0: 84 | print("Processing Datapoint: ",i) 85 | 86 | # b_array_dataset[i,0] = 1. 87 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 88 | 89 | # Adding random noise to start state. 90 | # x_array_dataset[i,0] = goal_states[goal_array_dataset[i]] + 0.1*(np.random.random(2)-0.5) 91 | 92 | scale = 25 93 | x_array_dataset[i,0] = goal_states[goal_array_dataset[i]] + scale*(np.random.random(2)-0.5) 94 | goal = goal_states[goal_array_dataset[i]] 95 | 96 | reset_counter = 0 97 | for t in range(number_timesteps-1): 98 | 99 | # GET B 100 | if t>0: 101 | # If 3,4,5 timesteps have passed, terminate. 102 | if reset_counter>=3 and reset_counter<5: 103 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 104 | elif reset_counter==5: 105 | b_array_dataset[i,t] = 1 106 | 107 | # GET Y 108 | if b_array_dataset[i,t]: 109 | current_state = x_array_dataset[i,t] 110 | 111 | # Select options from policy map, based on the bucket the current state falls in. 112 | bucket = get_bucket(current_state, goal_states[goal_array_dataset[i]][0]) 113 | # Now that we've the bucket, pick the option we should be executing given the bucket. 
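# If the state falls in the corner bucket (0,0), an option is drawn uniformly at
# random; otherwise, note that the goal-specific lookup below is immediately
# overwritten by the lookup into the unrotated policy_map on the following line.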
114 | 115 | if (bucket==0).all(): 116 | y_array_dataset[i,t] = np.random.randint(0,high=4) 117 | else: 118 | y_array_dataset[i,t] = goal_based_policy_maps[goal_array_dataset[i], bucket[0], bucket[1]] 119 | y_array_dataset[i,t] = policy_map[bucket[0], bucket[1]] 120 | reset_counter = 0 121 | else: 122 | reset_counter+=1 123 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 124 | 125 | # GET A 126 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.1*(np.random.random((2))-0.5) 127 | 128 | # GET X 129 | # Already taking care of backwards generation here, no need to use action_compliments. 130 | 131 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 132 | 133 | plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 134 | # plt.scatter() 135 | plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(25)) 136 | plt.xlim(-lim,lim) 137 | plt.ylim(-lim,lim) 138 | plt.show() 139 | 140 | # Roll over b's. 141 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 142 | 143 | 144 | np.save("X_goal_directed.npy",x_array_dataset) 145 | np.save("Y_goal_directed.npy",y_array_dataset) 146 | np.save("B_goal_directed.npy",b_array_dataset) 147 | np.save("A_goal_directed.npy",a_array_dataset) 148 | np.save("G_goal_directed.npy",goal_array_dataset) 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | -------------------------------------------------------------------------------- /DataGenerator/PolicyVisualizer.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np, copy 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | number_datapoints = 20 12 | # number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | # action_map = np.array([[-1,0],[0,-1],[1,0],[0,1]]) 23 | 24 | # start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 25 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 26 | 27 | # Creating a policy map. 
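# The policy map is a 9x9 grid of option indices filled in row by row below;
# each entry indexes into action_map, so the map encodes which of the four
# axis-aligned moves to take from each cell. The quiver and trajectory plots
# later in this script only visualise that map.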
28 | size = 9 29 | scale = 5 30 | policy_map = np.zeros((size,size),dtype=int) 31 | 32 | # Row wise assignment: 33 | policy_map[0,:] = 2 34 | 35 | policy_map[1,:7] = 2 36 | policy_map[1,7:] = 1 37 | 38 | policy_map[2:4,0] = 2 39 | policy_map[2:4,1:4] = 3 40 | policy_map[2:4,4:7] = 2 41 | policy_map[2:4,7:] = 1 42 | 43 | policy_map[4,:4] = 3 44 | policy_map[4,4] = 3 45 | policy_map[4,5:] = 1 46 | 47 | policy_map[5,:3] = 3 48 | policy_map[5,3:5] = 0 49 | policy_map[5,5:] = 1 50 | 51 | policy_map[6,:2] = 3 52 | policy_map[6,2:7] = 0 53 | policy_map[6,7:] = 1 54 | 55 | policy_map[7:,0] = 3 56 | policy_map[7:,1:7] = 0 57 | policy_map[7:,7:] = 1 58 | 59 | policy_map = np.transpose(policy_map) 60 | 61 | 62 | # x = np.meshgrid(range(9),range(9)) 63 | x = np.meshgrid(np.arange(9),np.arange(9)) 64 | dxdy = action_map[policy_map[x[0],x[1]]] 65 | 66 | traj = np.zeros((10,2)) 67 | traj[0] = [0,8] 68 | for t in range(9): 69 | # embed() 70 | action_index = policy_map[int(traj[t,0]),int(traj[t,1])] 71 | action = action_map[action_index] 72 | traj[t+1] = traj[t] + action 73 | print(action_index, action) 74 | 75 | plt.ylim(9,-1) 76 | plt.plot(traj[:,0],traj[:,1],'or') 77 | plt.plot(traj[:,0],traj[:,1],'r') 78 | 79 | plt.scatter(x[0],x[1]) 80 | for i in range(9): 81 | for j in range(9): 82 | plt.arrow(x[0][i,j],x[1][i,j],0.1*dxdy[i,j,0],0.1*dxdy[i,j,1],width=0.01) 83 | 84 | plt.show() 85 | 86 | # embed() 87 | 88 | # Transformed vis. 89 | size = 9 90 | scale = 5 91 | scaled_size = scale*size 92 | # policy_map = np.flipud(np.transpose(policy_map)) 93 | policy_map = np.transpose(policy_map) 94 | # goal_based_policy_maps = np.zeros((4,size,size),dtype=int) 95 | # goal_based_policy_maps[0] = copy.deepcopy(policy_map) 96 | # goal_based_policy_maps[1] = np.rot90(policy_map) 97 | # goal_based_policy_maps[2] = np.rot90(policy_map,k=2) 98 | # goal_based_policy_maps[3] = np.rot90(policy_map,k=3) 99 | 100 | def get_bucket(state, reference_state): 101 | # baseline = 4*np.ones(2) 102 | baseline = np.zeros(2) 103 | compensated_state = state - reference_state 104 | # compensated_state = (np.round(state - reference_state) + baseline).astype(int) 105 | 106 | scaled_size = scale*size 107 | x = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 108 | 109 | bucket = np.zeros((2)) 110 | 111 | bucket[0] = min(np.searchsorted(x,compensated_state[0]),size-1) 112 | bucket[1] = min(np.searchsorted(x,compensated_state[1]),size-1) 113 | 114 | return bucket.astype(int) 115 | 116 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*10 117 | 118 | # goal_index = 1 119 | # # meshrange = np.arange(-scaled_size/2,scaled_size/2+1,5) 120 | # meshrange = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 121 | # evalrange = (np.arange(-(size-1)/2,(size-1)/2+1)-1)*scale 122 | 123 | # x = np.meshgrid(goal_states[goal_index,0]+meshrange,goal_states[goal_index,1]+meshrange) 124 | 125 | # dxdy = np.zeros((9,9,2)) 126 | # # dxdy = action_map[policy_map[x[0],x[1]]] 127 | # plt.scatter(x[0],x[1]) 128 | # plt.ylim(50,-50) 129 | 130 | # arr = np.zeros((9,9,2)) 131 | 132 | # for i in range(9): 133 | # for j in range(9): 134 | # a = goal_states[goal_index,0]+evalrange[i] 135 | # b = goal_states[goal_index,1]+evalrange[j] 136 | # bucket = get_bucket(np.array([a,b]), goal_states[goal_index]) 137 | # arr[i,j,0] = i 138 | # arr[i,j,1] = j 139 | # dxdy[bucket[0],bucket[1]] = action_map[policy_map[bucket[0],bucket[1]]] 140 | # plt.arrow(x[0][i,j],x[1][i,j],0.1*dxdy[i,j,0],0.1*dxdy[i,j,1],width=0.01*scale) 141 | 142 | # plt.show() 143 | 144 | for goal_index in 
range(4): 145 | # embed() 146 | # meshrange = np.arange(-scaled_size/2,scaled_size/2+1,5) 147 | meshrange = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 148 | evalrange = (np.arange(-(size-1)/2,(size-1)/2+1)-1)*scale 149 | 150 | x = np.meshgrid(goal_states[goal_index,0]+meshrange,goal_states[goal_index,1]+meshrange) 151 | 152 | dxdy = np.zeros((9,9,2)) 153 | # dxdy = action_map[policy_map[x[0],x[1]]] 154 | plt.scatter(x[0],x[1]) 155 | plt.ylim(50,-50) 156 | plt.xlim(-50,50) 157 | 158 | arr = np.zeros((9,9,2)) 159 | 160 | for i in range(9): 161 | for j in range(9): 162 | a = goal_states[goal_index,0]+evalrange[i] 163 | b = goal_states[goal_index,1]+evalrange[j] 164 | bucket = get_bucket(np.array([a,b]), goal_states[goal_index]) 165 | arr[i,j,0] = i 166 | arr[i,j,1] = j 167 | # dxdy[bucket[0],bucket[1]] = action_map[goal_based_policy_maps[goal_index,bucket[0],bucket[1]]] 168 | dxdy[bucket[0],bucket[1]] = action_map[policy_map[bucket[0],bucket[1]]] 169 | # plt.arrow(x[0][i,j],x[1][i,j],0.1*dxdy[i,j,0],0.1*dxdy[i,j,1],width=0.01*scale) 170 | 171 | # plt.quiver(x[0],x[1],0.1*dxdy[:,:,1],0.1*dxdy[:,:,0],width=0.0001,headwidth=4,headlength=2) 172 | plt.quiver(x[0],x[1],0.1*dxdy[:,:,1],0.1*dxdy[:,:,0]) 173 | 174 | traj_len = 20 175 | traj = np.zeros((20,2)) 176 | traj[0] = np.random.randint(-25,high=25,size=2) 177 | 178 | for t in range(traj_len-1): 179 | 180 | bucket = get_bucket(traj[t], goal_states[goal_index]) 181 | action_index = policy_map[bucket[0],bucket[1]] 182 | action = action_map[action_index] 183 | traj[t+1] = traj[t] + action 184 | 185 | plt.plot(traj[:,0],traj[:,1],'r') 186 | plt.plot(traj[:,0],traj[:,1],'or') 187 | 188 | plt.show() 189 | 190 | -------------------------------------------------------------------------------- /DataGenerator/S_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/S_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/SeparableTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
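# Overview: generates trajectories whose option ordering depends on a sampled
# start configuration. Each of the four goal corners has two valid options, and
# progression_of_options fixes the order in which those two options are used
# for each of the five start offsets, yielding "separable" skill sequences.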
6 | 7 | import numpy as np 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | # number_datapoints = 20 12 | number_datapoints = 50000 13 | number_timesteps = 20 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | start_config_dataset = np.zeros((number_datapoints, 1),dtype=int) 21 | 22 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 23 | start_scale = 15 24 | start_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*start_scale 25 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 26 | scale = 5 27 | start_configs = np.zeros((4,5,2),dtype=int) 28 | start_configs[[0,3]] = np.array([[-2,2],[-1,1],[0,0],[1,-1],[2,-2]])*scale 29 | start_configs[[1,2]] = np.array([[-2,-2],[-1,-1],[0,0],[1,1],[2,2]])*scale 30 | 31 | # valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 32 | valid_options = np.array([[3,2],[3,0],[2,1],[0,1]]) 33 | lim = 50 34 | 35 | progression_of_options = np.zeros((5,4),dtype=int) 36 | progression_of_options[1,0] = 1 37 | progression_of_options[2,:2] = 1 38 | progression_of_options[3,1:] = 1 39 | progression_of_options[4,:] = 1 40 | 41 | for i in range(number_datapoints): 42 | 43 | if i%1000==0: 44 | print("Processing Datapoint: ",i) 45 | 46 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 47 | start_config_dataset[i] = np.random.random_integers(0,high=4) 48 | # start_config_dataset[i] = 4 49 | 50 | # Adding random noise to start state. 51 | x_array_dataset[i,0] = start_states[goal_array_dataset[i]] + start_configs[goal_array_dataset[i],start_config_dataset[i]] + 0.1*(np.random.random(2)-0.5) 52 | 53 | reset_counter = 0 54 | option_counter = 0 55 | 56 | for t in range(number_timesteps-1): 57 | 58 | # GET B 59 | if t==0: 60 | b_array_dataset[i,t] = 1 61 | if t>0: 62 | # If 3,4,5 timesteps have passed, terminate. 63 | if reset_counter>=3 and reset_counter<5: 64 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 65 | elif reset_counter==5: 66 | b_array_dataset[i,t] = 1 67 | 68 | # GET Y 69 | if b_array_dataset[i,t]: 70 | current_state = x_array_dataset[i,t] 71 | 72 | # select new y_array_dataset[i,t] 73 | y_array_dataset[i,t] = valid_options[goal_array_dataset[i]][0][progression_of_options[start_config_dataset[i],min(option_counter,3)]] 74 | 75 | option_counter+=1 76 | reset_counter = 0 77 | else: 78 | reset_counter+=1 79 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 80 | 81 | # GET A 82 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]+0.1*(np.random.random((2))-0.5) 83 | 84 | # GET X 85 | # Already taking care of backwards generation here, no need to use action_compliments. 86 | 87 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 88 | 89 | # plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 90 | # # plt.scatter() 91 | # plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(number_timesteps)) 92 | # plt.xlim(-lim,lim) 93 | # plt.ylim(-lim,lim) 94 | # plt.show() 95 | 96 | 97 | # Roll over b's. 
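# np.roll with shift=1 along the time axis moves every boundary flag one step
# later (b_new[:, t] = b_old[:, t-1], the final flag wrapping around to t=0),
# changing which timestep within a segment carries the b=1 marker before saving.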
98 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 99 | 100 | 101 | np.save("X_separable.npy",x_array_dataset) 102 | np.save("Y_separable.npy",y_array_dataset) 103 | np.save("B_separable.npy",b_array_dataset) 104 | np.save("A_separable.npy",a_array_dataset) 105 | np.save("G_separable.npy",goal_array_dataset) 106 | np.save("StartConfig_separable.npy",start_config_dataset) 107 | -------------------------------------------------------------------------------- /DataGenerator/X_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/X_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/X_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/X_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/Y_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/Y_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/Y_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/Y_goal_directed.npy -------------------------------------------------------------------------------- /DataLoaders/GridWorld_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | class GridWorldDataset(Dataset): 10 | 11 | # Class implementing instance of dataset class for gridworld data. 12 | 13 | def __init__(self, dataset_directory): 14 | self.dataset_directory = dataset_directory 15 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 16 | 17 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 18 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 19 | 20 | def __len__(self): 21 | 22 | # Find out how many images we've stored. 23 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 24 | 25 | # FOR NOW: USE ONLY till 3200 images. 26 | return 3200 27 | # return len(filelist) 28 | 29 | def parse_trajectory_actions(self, coordinate_trajectory): 30 | # Takes coordinate trajectory, returns action index taken. 31 | 32 | state_diffs = np.diff(coordinate_trajectory,axis=0) 33 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 34 | 35 | for i in range(len(state_diffs)): 36 | for k in range(len(self.action_map)): 37 | if (state_diffs[i]==self.action_map[k]).all(): 38 | action_sequence[i]=k 39 | 40 | return action_sequence.astype(float) 41 | 42 | def __getitem__(self, index): 43 | 44 | # The getitem function must return a Map-Trajectory pair. 
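# Concretely, the returned tuple is (image, coordinate_trajectory, action_sequence):
# the rendered map read with cv2.imread, the (T, 2) grid-coordinate trajectory,
# and the (T-1,) sequence of indices into self.action_map recovered by
# parse_trajectory_actions.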
45 | # We will handle per-timestep processes within our code. 46 | # Assumes index is within range [0,len(filelist)-1] 47 | image = cv2.imread(os.path.join(self.dataset_directory,"Image{0}.png".format(index))) 48 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Image{0}_Traj1.npy".format(index))).astype(float) 49 | 50 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 51 | 52 | return image, coordinate_trajectory, action_sequence -------------------------------------------------------------------------------- /DataLoaders/InteractiveDataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from .headers import * 11 | from . import MIME_DataLoader 12 | 13 | opts = flags.FLAGS 14 | 15 | def main(_): 16 | 17 | dataset = MIME_DataLoader.MIME_Dataset(opts) 18 | print("Created DataLoader.") 19 | 20 | embed() 21 | 22 | if __name__ == '__main__': 23 | app.run(main) -------------------------------------------------------------------------------- /DataLoaders/MIME_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | 14 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 15 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 16 | flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 17 | # flags.DEFINE_boolean('downsampling', True, 'Whether to downsample trajectories. ') 18 | flags.DEFINE_integer('ds_freq', 20, 'Downsample joint trajectories by this fraction. Original recroding rate = 100Hz') 19 | flags.DEFINE_boolean('remote', False, 'Whether operating from a remote server or not.') 20 | # opts = flags.FLAGS 21 | 22 | 23 | def select_baxter_angles(trajectory, joint_names, arm='right'): 24 | # joint names in order as used via mujoco visualizer 25 | baxter_joint_names = ['right_s0', 'right_s1', 'right_e0', 'right_e1', 'right_w0', 'right_w1', 'right_w2', 'left_s0', 'left_s1', 'left_e0', 'left_e1', 'left_w0', 'left_w1', 'left_w2'] 26 | if arm == 'right': 27 | select_joints = baxter_joint_names[:7] 28 | elif arm == 'left': 29 | select_joints = baxter_joint_names[7:] 30 | elif arm == 'both': 31 | select_joints = baxter_joint_names 32 | inds = [joint_names.index(j) for j in select_joints] 33 | return trajectory[:, inds] 34 | 35 | 36 | def resample(original_trajectory, desired_number_timepoints): 37 | original_traj_len = len(original_trajectory) 38 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 39 | return original_trajectory[new_timepoints] 40 | 41 | 42 | class MIME_Dataset(Dataset): 43 | ''' 44 | Class implementing instance of dataset class for MIME data. 
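Each element returned by __getitem__ is a dict holding downsampled joint-angle,
end-effector and gripper trajectories (plus left-arm/right-arm slices), the
source path prefix, and an is_valid flag that rejects recordings with large
jumps between consecutive joint-angle samples.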
45 | ''' 46 | def __init__(self, opts, split='all'): 47 | self.dataset_directory = opts.MIME_dir 48 | 49 | # Default: /checkpoint/tanmayshankar/MIME/ 50 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 51 | 52 | if opts.remote: 53 | self.suff_filelist = np.load(osp.join(self.dataset_directory,"Suffix_Filelist.npy")) 54 | self.filelist = [] 55 | for j in range(len(self.suff_filelist)): 56 | self.filelist.append(osp.join(self.dataset_directory,self.suff_filelist[j])) 57 | else: 58 | self.filelist = glob.glob(self.fulltext) 59 | 60 | self.ds_freq = opts.ds_freq 61 | 62 | with open(self.filelist[0], 'r') as file: 63 | lines = file.readlines() 64 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 65 | 66 | if split == 'all': 67 | self.filelist = self.filelist 68 | else: 69 | self.task_lists = np.load(os.path.join( 70 | self.dataset_directory, 'MIME_jointangles/{}_Lists.npy'.format(split.capitalize()))) 71 | 72 | self.filelist = [] 73 | for i in range(20): 74 | self.filelist.extend(self.task_lists[i]) 75 | self.filelist = [f.replace('/checkpoint/tanmayshankar/MIME/', opts.MIME_dir) for f in self.filelist] 76 | # print(len(self.filelist)) 77 | 78 | def __len__(self): 79 | # Return length of file list. 80 | return len(self.filelist) 81 | 82 | def __getitem__(self, index): 83 | ''' 84 | # Returns Joint Angles as: 85 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 86 | # Assumes index is within range [0,len(filelist)-1] 87 | ''' 88 | file = self.filelist[index] 89 | 90 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 91 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 92 | 93 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 94 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 95 | 96 | joint_angle_trajectory = [] 97 | # Open file. 98 | with open(file, 'r') as file: 99 | lines = file.readlines() 100 | for line in lines: 101 | dict_element = eval(line.rstrip('\n')) 102 | if len(dict_element.keys()) == len(self.joint_names): 103 | # some files have extra lines with gripper keys e.g. MIME_jointangles/4/12405Nov19/joint_angles.txt 104 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 105 | joint_angle_trajectory.append(array_element) 106 | 107 | joint_angle_trajectory = np.array(joint_angle_trajectory) 108 | 109 | n_samples = len(orig_left_traj) // self.ds_freq 110 | 111 | elem = {} 112 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 113 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 114 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 115 | elem['left_gripper'] = resample(left_gripper, n_samples)/100 116 | elem['right_gripper'] = resample(right_gripper, n_samples)/100 117 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 118 | elem['ra_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='right') 119 | elem['la_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='left') 120 | # If max norm of differences is <1.0, valid. 
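# i.e. a demonstration is flagged invalid when any two consecutive (downsampled)
# joint-angle samples differ by more than 1.0 in Euclidean norm.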
121 | elem['is_valid'] = int(np.linalg.norm(np.diff(elem['joint_angle_trajectory'],axis=0),axis=1).max() < 1.0) 122 | 123 | return elem 124 | 125 | def recreate_dictionary(self, arm, joint_angles): 126 | if arm=="left": 127 | offset = 2 128 | width = 7 129 | elif arm=="right": 130 | offset = 9 131 | width = 7 132 | elif arm=="full": 133 | offset = 0 134 | width = len(self.joint_names) 135 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 136 | 137 | # ------------ Data Loader ----------- # 138 | # ------------------------------------ # 139 | def data_loader(opts, split='all', shuffle=True): 140 | dset = MIME_Dataset(opts, split=split) 141 | 142 | return DataLoader( 143 | dset, 144 | batch_size=opts.batch_size, 145 | shuffle=shuffle, 146 | num_workers=opts.n_data_workers, 147 | drop_last=True) 148 | -------------------------------------------------------------------------------- /DataLoaders/MIME_DataLoader.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/MIME_DataLoader.pyc -------------------------------------------------------------------------------- /DataLoaders/MIME_Img_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | import pdb 14 | import scipy.misc 15 | 16 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 17 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 18 | flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 19 | flags.DEFINE_string('MIME_imgs_dir', '/checkpoint/shubhtuls/data/MIME/', 'Data Directory') 20 | flags.DEFINE_integer('img_h', 64, 'Height') 21 | flags.DEFINE_integer('img_w', 128, 'Width') 22 | flags.DEFINE_integer('ds_freq', 20, 'Downsample joint trajectories by this fraction. Original recroding rate = 100Hz') 23 | 24 | 25 | def resample(original_trajectory, desired_number_timepoints): 26 | original_traj_len = len(original_trajectory) 27 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 28 | return original_trajectory[new_timepoints] 29 | 30 | 31 | class MIME_Img_Dataset(Dataset): 32 | ''' 33 | Class implementing instance of dataset class for MIME data. 
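In addition to the downsampled joint-angle and gripper trajectories, each
element carries three frames of the demonstration (first, middle and last),
resized to (img_h, img_w).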
34 | ''' 35 | def __init__(self, opts, split='all'): 36 | self.dataset_directory = opts.MIME_dir 37 | self.imgs_dataset_directory = opts.MIME_imgs_dir 38 | self.img_h = opts.img_h 39 | self.img_w = opts.img_w 40 | 41 | # Default: /checkpoint/tanmayshankar/MIME/ 42 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 43 | self.filelist = glob.glob(self.fulltext) 44 | 45 | self.ds_freq = opts.ds_freq 46 | 47 | with open(self.filelist[0], 'r') as file: 48 | lines = file.readlines() 49 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 50 | 51 | if split == 'all': 52 | self.filelist = self.filelist 53 | else: 54 | self.task_lists = np.load(os.path.join( 55 | self.dataset_directory, 'MIME_jointangles/{}_Lists.npy'.format(split.capitalize()))) 56 | self.filelist = [] 57 | for i in range(20): 58 | self.filelist.extend(self.task_lists[i]) 59 | self.filelist = [f.replace('/checkpoint/tanmayshankar/MIME/', opts.MIME_dir) for f in self.filelist] 60 | 61 | def __len__(self): 62 | # Return length of file list. 63 | return len(self.filelist) 64 | 65 | def __getitem__(self, index): 66 | ''' 67 | # Returns Joint Angles as: 68 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 69 | # Assumes index is within range [0,len(filelist)-1] 70 | ''' 71 | file = self.filelist[index] 72 | file_split = file.split('/') 73 | frames_folder = osp.join(self.imgs_dataset_directory, file_split[-3], file_split[-2], 'frames') 74 | n_frames = len(os.listdir(frames_folder)) 75 | 76 | imgs = [] 77 | frame_inds = [0, n_frames//2, n_frames-1] 78 | for fi in frame_inds: 79 | img = scipy.misc.imread(osp.join(frames_folder, 'im_{}.png'.format(fi+1))) 80 | imgs.append(scipy.misc.imresize(img, (self.img_h, self.img_w))) 81 | imgs = np.stack(imgs) 82 | 83 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 84 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 85 | 86 | joint_angle_trajectory = [] 87 | # Open file. 
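# Each line of joint_angles.txt is a Python dict literal mapping joint name to
# angle, so it is parsed with eval(); lines that also carry gripper keys are
# skipped by the length check below. (ast.literal_eval would parse the same
# literals without executing arbitrary code, if the files cannot be trusted.)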
88 | with open(file, 'r') as file: 89 | lines = file.readlines() 90 | for line in lines: 91 | dict_element = eval(line.rstrip('\n')) 92 | if len(dict_element.keys()) == len(self.joint_names): 93 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 94 | joint_angle_trajectory.append(array_element) 95 | 96 | joint_angle_trajectory = np.array(joint_angle_trajectory) 97 | 98 | n_samples = len(joint_angle_trajectory) // self.ds_freq 99 | 100 | elem = {} 101 | elem['imgs'] = imgs 102 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 103 | elem['left_gripper'] = resample(left_gripper, n_samples)/100 104 | elem['right_gripper'] = resample(right_gripper, n_samples)/100 105 | elem['is_valid'] = int(np.linalg.norm(np.diff(elem['joint_angle_trajectory'],axis=0),axis=1).max() < 1.0) 106 | 107 | return elem 108 | 109 | def recreate_dictionary(self, arm, joint_angles): 110 | if arm=="left": 111 | offset = 2 112 | width = 7 113 | elif arm=="right": 114 | offset = 9 115 | width = 7 116 | elif arm=="full": 117 | offset = 0 118 | width = len(self.joint_names) 119 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 120 | 121 | # ------------ Data Loader ----------- # 122 | # ------------------------------------ # 123 | 124 | def data_loader(opts, split='all', shuffle=True): 125 | dset = MIME_Img_Dataset(opts, split=split) 126 | 127 | return DataLoader( 128 | dset, 129 | batch_size=opts.batch_size, 130 | shuffle=shuffle, 131 | num_workers=opts.n_data_workers, 132 | drop_last=True) 133 | -------------------------------------------------------------------------------- /DataLoaders/MIMEandPlan_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | 14 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 15 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 16 | flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 17 | # flags.DEFINE_boolean('downsampling', True, 'Whether to downsample trajectories. ') 18 | flags.DEFINE_integer('ds_freq', 20, 'Downsample joint trajectories by this fraction. Original recroding rate = 100Hz') 19 | flags.DEFINE_boolean('remote', False, 'Whether operating from a remote server or not.') 20 | # opts = flags.FLAGS 21 | 22 | def resample(original_trajectory, desired_number_timepoints): 23 | original_traj_len = len(original_trajectory) 24 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 25 | return original_trajectory[new_timepoints] 26 | 27 | class MIME_Dataset(Dataset): 28 | ''' 29 | Class implementing instance of dataset class for MIME data. 
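    This variant additionally exposes pre-computed motion plans: when a plan run
    index is passed, the returned element carries 'EE_Plan' and 'JA_Plan' arrays
    loaded from the New_Plans folder stored next to each demonstration. A minimal
    sketch, assuming parsed flags in `opts` and that run 0 exists (the run index
    here is only illustrative):

        dset = MIME_Dataset(opts)
        dset.setup_splits()        # builds the train/val/test file lists
        elem = dset.getit(0, split='train', return_plan_run=0)
        ja_plan, ee_plan = elem['JA_Plan'], elem['EE_Plan']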
30 | ''' 31 | def __init__(self, opts): 32 | self.dataset_directory = opts.MIME_dir 33 | 34 | # Default: /checkpoint/tanmayshankar/MIME/ 35 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 36 | 37 | if opts.remote: 38 | self.suff_filelist = np.load(osp.join(self.dataset_directory,"Suffix_Filelist.npy")) 39 | self.filelist = [] 40 | for j in range(len(self.suff_filelist)): 41 | self.filelist.append(osp.join(self.dataset_directory,self.suff_filelist[j])) 42 | else: 43 | self.filelist = sorted(glob.glob(self.fulltext)) 44 | 45 | self.ds_freq = opts.ds_freq 46 | 47 | with open(self.filelist[0], 'r') as file: 48 | print(self.filelist[0]) 49 | lines = file.readlines() 50 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 51 | 52 | self.train_lists = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Train_Lists.npy")) 53 | self.val_lists = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Val_Lists.npy")) 54 | self.test_lists = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Test_Lists.npy")) 55 | 56 | def __len__(self): 57 | # Return length of file list. 58 | return len(self.filelist) 59 | 60 | def setup_splits(self): 61 | self.train_filelist = [] 62 | self.val_filelist = [] 63 | self.test_filelist = [] 64 | 65 | for i in range(20): 66 | self.train_filelist.extend(self.train_lists[i]) 67 | self.val_filelist.extend(self.val_lists[i]) 68 | self.test_filelist.extend(self.test_lists[i]) 69 | 70 | def getit(self, index, split=None, return_plan_run=None): 71 | ''' 72 | # Returns Joint Angles as: 73 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 74 | # Assumes index is within range [0,len(filelist)-1] 75 | ''' 76 | 77 | if split=="train": 78 | file = self.train_filelist[index] 79 | elif split=="val": 80 | file = self.val_filelist[index] 81 | elif split=="test": 82 | file = self.test_filelist[index] 83 | elif split is None: 84 | file = self.filelist[index] 85 | 86 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 87 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 88 | 89 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 90 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 91 | 92 | joint_angle_trajectory = [] 93 | 94 | folder = "New_Plans" 95 | if return_plan_run is not None: 96 | ee_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_EE_Plan.npy".format(folder,return_plan_run))) 97 | ja_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_Joint_Plan.npy".format(folder,return_plan_run))) 98 | 99 | # Open file. 100 | with open(file, 'r') as file: 101 | lines = file.readlines() 102 | for line in lines: 103 | dict_element = eval(line.rstrip('\n')) 104 | if len(dict_element.keys()) == len(self.joint_names): 105 | # some files have extra lines with gripper keys e.g. 
MIME_jointangles/4/12405Nov19/joint_angles.txt 106 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 107 | joint_angle_trajectory.append(array_element) 108 | 109 | joint_angle_trajectory = np.array(joint_angle_trajectory) 110 | 111 | n_samples = len(orig_left_traj) // self.ds_freq 112 | 113 | elem = {} 114 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 115 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 116 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 117 | elem['left_gripper'] = resample(left_gripper, n_samples) 118 | elem['right_gripper'] = resample(right_gripper, n_samples) 119 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 120 | elem['JA_Plan'] = ja_plan 121 | elem['EE_Plan'] = ee_plan 122 | 123 | return elem 124 | 125 | 126 | def __getitem__(self, index, split=None, return_plan_run=None): 127 | # def __getitem__(self, inputs): 128 | ''' 129 | # Returns Joint Angles as: 130 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 131 | # Assumes index is within range [0,len(filelist)-1] 132 | ''' 133 | 134 | if split=="train": 135 | file = self.train_filelist[index] 136 | elif split=="val": 137 | file = self.val_filelist[index] 138 | elif split=="test": 139 | file = self.test_filelist[index] 140 | elif split is None: 141 | file = self.filelist[index] 142 | 143 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 144 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 145 | 146 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 147 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 148 | 149 | joint_angle_trajectory = [] 150 | 151 | folder = "New_Plans" 152 | if return_plan_run is not None: 153 | ee_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_EE_Plan.npy".format(folder,return_plan_run))) 154 | ja_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_JA_Plan.npy".format(folder,return_plan_run))) 155 | 156 | # Open file. 157 | with open(file, 'r') as file: 158 | lines = file.readlines() 159 | for line in lines: 160 | dict_element = eval(line.rstrip('\n')) 161 | if len(dict_element.keys()) == len(self.joint_names): 162 | # some files have extra lines with gripper keys e.g. 
MIME_jointangles/4/12405Nov19/joint_angles.txt 163 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 164 | joint_angle_trajectory.append(array_element) 165 | 166 | joint_angle_trajectory = np.array(joint_angle_trajectory) 167 | 168 | n_samples = len(orig_left_traj) // self.ds_freq 169 | 170 | elem = {} 171 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 172 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 173 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 174 | elem['left_gripper'] = resample(left_gripper, n_samples) 175 | elem['right_gripper'] = resample(right_gripper, n_samples) 176 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 177 | elem['JA_Plan'] = ja_plan 178 | elem['EE_Plan'] = ee_plan 179 | 180 | return elem 181 | 182 | def recreate_dictionary(self, arm, joint_angles): 183 | if arm=="left": 184 | offset = 2 185 | width = 7 186 | elif arm=="right": 187 | offset = 9 188 | width = 7 189 | elif arm=="full": 190 | offset = 0 191 | width = len(self.joint_names) 192 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 193 | 194 | # ------------ Data Loader ----------- # 195 | # ------------------------------------ # 196 | def data_loader(opts, shuffle=True): 197 | dset = MIME_Dataset(opts) 198 | 199 | return DataLoader( 200 | dset, 201 | batch_size=opts.batch_size, 202 | shuffle=shuffle, 203 | num_workers=opts.n_data_workers, 204 | drop_last=True) 205 | -------------------------------------------------------------------------------- /DataLoaders/Plan_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | import pdb 14 | 15 | # flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 16 | # flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 17 | # flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 18 | flags.DEFINE_enum('arm', 'both', ['left', 'right', 'both'], 'Which arms data to load') 19 | 20 | class Plan_Dataset(Dataset): 21 | ''' 22 | Class implementing instance of dataset class for MIME data. 23 | ''' 24 | def __init__(self, opts, split='all'): 25 | self.opts = opts 26 | self.split = split 27 | self.dataset_directory = self.opts.MIME_dir 28 | 29 | # # Must consider permutations of arm and split. 30 | # Right Arm: New_Plans / Run*_EE_Plan 31 | # / Run*_Joint_Plan 32 | # / Run*_RG_Traj 33 | 34 | # Left Arm: New_Plans_Left / Run*_EE_Plan 35 | # / Run*_Joint_Plan 36 | # / Run*_LG_traj 37 | 38 | # Both Arms: Ambidextrous_Plans / Run*_EE_Plan 39 | # / Run*_Joint_Plan 40 | # / Run*_Grip_Traj 41 | 42 | # Set these parameters to replace. 
43 | if self.opts.arm=='left': 44 | folder = 'New_Plans' 45 | gripper_suffix = "_LG_Traj" 46 | elif self.opts.arm=='right': 47 | folder = 'New_Plans_Left' 48 | gripper_suffix = "_RG_Traj" 49 | elif self.opts.arm=='both': 50 | folder = 'Ambidextrous_Plans' 51 | gripper_suffix = "_Grip_Traj" 52 | 53 | # Default: /checkpoint/tanmayshankar/MIME/ 54 | 55 | if self.split=='all': 56 | # Collect list of all EE Plans, we will select all Joint Angle Plans correspondingly. 57 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/New_Plans/Run*_EE_Plan.npy') 58 | # Joint angle plans filelist is in same order thanks to glob. 59 | self.jatext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/New_Plans/Run*_Joint_Plan.npy') 60 | # Gripper plans filelist is in same order thanks to glob. 61 | # self.rgtext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/New_Plans/Run*_RG_Traj.npy') 62 | 63 | self.filelist = sorted(glob.glob(self.fulltext)) 64 | self.joint_filelist = sorted(glob.glob(self.jatext)) 65 | # self.gripper_filelist = sorted(glob.glob(self.rgtext)) 66 | 67 | elif self.split=='train': 68 | self.filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanTrainList.npy")) 69 | self.joint_filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanJointTrainList.npy")) 70 | elif self.split=='val': 71 | self.filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanValList.npy")) 72 | self.joint_filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanJointValList.npy")) 73 | elif self.split=='test': 74 | self.filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanTestList.npy")) 75 | self.joint_filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanJointTestList.npy")) 76 | 77 | # the loaded np arrays give byte strings, and not strings, which breaks later code 78 | if not isinstance(self.filelist[0], str): 79 | self.filelist = [f.decode() for f in self.filelist] 80 | self.joint_filelist = [f.decode() for f in self.joint_filelist] 81 | 82 | # Now replace terms in filelists based on what arm it is. 83 | # The EE file list only needs folder replaced. 84 | self.filelist = [f.replace("New_Plans",folder).replace('/checkpoint/tanmayshankar/MIME',self.opts.MIME_dir) for f in self.filelist] 85 | # The Joint file list also only needs folder replaced. 86 | self.joint_filelist = [f.replace("New_Plans",folder).replace('/checkpoint/tanmayshankar/MIME',self.opts.MIME_dir) for f in self.joint_filelist] 87 | # Since we didn't create split lists for Gripper, use the filelist and replace to Gripper. 88 | self.gripper_filelist = [f.replace("New_Plans",folder).replace("_EE_Plan",gripper_suffix).replace('/checkpoint/tanmayshankar/MIME',self.opts.MIME_dir) for f in self.filelist] 89 | 90 | # Set joint names. 91 | self.left_joint_names = ['left_s0','left_s1','left_e0','left_e1','left_w0','left_w1','left_w2'] 92 | self.right_joint_names = ['right_s0','right_s1','right_e0','right_e1','right_w0','right_w1','right_w2'] 93 | self.both_joint_names = self.left_joint_names+self.right_joint_names 94 | 95 | def __len__(self): 96 | # Return length of file list. 97 | return len(self.filelist) 98 | 99 | def __getitem__(self, index): 100 | 101 | file = self.filelist[index] 102 | joint_file = self.joint_filelist[index] 103 | gripper_file = self.gripper_filelist[index] 104 | 105 | # Load items. 
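# Each element pairs the end-effector plan with the matching joint-angle plan and
# gripper trajectory; the three file lists built above are index-aligned, and the
# gripper values are divided by 100 below, matching the 0-100 gripper range used
# elsewhere in these loaders. A minimal usage sketch (assuming parsed flags in
# `opts`):
#
#   loader = data_loader(opts, split='train')
#   for batch in loader:
#       ee, ja, grip = batch['EE_Plan'], batch['JA_Plan'], batch['Grip_Plan']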
106 | elem = {} 107 | elem['EE_Plan'] = np.load(file) 108 | elem['JA_Plan'] = np.load(joint_file) 109 | elem['Grip_Plan'] = np.load(gripper_file)/100 110 | 111 | return elem 112 | 113 | # ------------ Data Loader ----------- # 114 | # ------------------------------------ # 115 | def data_loader(opts, split='all', shuffle=True): 116 | dset = Plan_Dataset(opts, split=split) 117 | 118 | return DataLoader( 119 | dset, 120 | batch_size=opts.batch_size, 121 | shuffle=shuffle, 122 | num_workers=opts.n_data_workers, 123 | drop_last=True) 124 | 125 | -------------------------------------------------------------------------------- /DataLoaders/RandomWalks.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import sys 12 | import os 13 | import random as stdlib_random, string 14 | 15 | import matplotlib 16 | matplotlib.use('Agg') 17 | import matplotlib.pyplot as plt 18 | 19 | 20 | import numpy as np 21 | 22 | from absl import flags, app 23 | 24 | import torch 25 | from torch.utils.data import Dataset 26 | from torch.utils.data import DataLoader 27 | from torch.utils.data.dataloader import default_collate 28 | from ..utils import plotting as plot_util 29 | 30 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 31 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 32 | flags.DEFINE_integer('n_segments_min', 4, 'Min Number of gt segments per trajectory') 33 | flags.DEFINE_integer('n_segments_max', 4, 'Max number of gt segments per trajectory') 34 | 35 | dirs_2d = np.array([ 36 | [1,0], 37 | [0,1], 38 | [-1,0], 39 | [0,-1] 40 | ]) 41 | 42 | 43 | def vis_walk(walk): 44 | ''' 45 | Args: 46 | walk: (nT+1) X 2 array 47 | Returns: 48 | im: 200 X 200 X 4 numpy array 49 | ''' 50 | 51 | t = walk.shape[0] 52 | xs = walk[:,0] 53 | ys = walk[:,1] 54 | color_inds = np.linspace(0, 255, t).astype(np.int).tolist() 55 | cs = plot_util.colormap[color_inds, :] 56 | 57 | fig = plt.figure(figsize=(4, 4), dpi=50) 58 | ax = fig.subplots() 59 | 60 | ax.scatter(xs, ys, c=cs) 61 | ax.set_xlim(-2, 2) 62 | ax.set_ylim(-2, 2) 63 | ax.set_aspect('equal', 'box') 64 | 65 | ax.tick_params( 66 | axis='x', 67 | which='both', 68 | bottom=False, 69 | top=False, 70 | labelbottom=False) 71 | 72 | ax.tick_params( 73 | axis='y', 74 | which='both', 75 | left=False, 76 | right=False, 77 | labelleft=False) 78 | 79 | fig.tight_layout() 80 | fname = '/tmp/' + ''.join(stdlib_random.choices(string.ascii_letters, k=8)) + '.png' 81 | fig.savefig(fname) 82 | plt.close(fig) 83 | 84 | im = plt.imread(fname) 85 | os.remove(fname) 86 | 87 | return im 88 | 89 | 90 | def walk_segment(origin, direction, n_steps=10, step_size=0.1, noise=0.02, rng=None): 91 | ''' 92 | Args: 93 | origin: nd numpy array 94 | direction: nd numpy array with unit norm 95 | n_steps: length of time seq 96 | step_size: size of each step 97 | noise: magintude of max actuation noise 98 | Returns: 99 | segment: n_steps X nd array 100 | note that the first position in segment is different from origin 101 | ''' 102 | if rng is None: 103 | rng = np.random 104 | 105 | nd = origin.shape[0] 106 | segment = np.zeros((n_steps, nd)) + origin 107 | segment += 
np.arange(1, n_steps+1).reshape((-1,1))*direction*step_size 108 | segment += rng.uniform(low=-1, high=1, size=(n_steps, nd)) * noise/nd 109 | return segment 110 | 111 | 112 | def random_walk2d(origin, num_segments=4, rng=None): 113 | ''' 114 | Args: 115 | origin: 2d numpy array 116 | num_segments: length of time seq 117 | Returns: 118 | walk: (nT+1) X 2 array 119 | ''' 120 | if rng is None: 121 | rng = np.random 122 | 123 | dir_ind = rng.randint(4) 124 | walk = origin.reshape(1,2) 125 | seg_lengths = [] 126 | for s in range(num_segments): 127 | seg_length = rng.randint(6,10) 128 | seg_lengths.append(seg_length) 129 | step_size = 0.1 + (rng.uniform() - 0.5)*0.05 130 | 131 | segment = walk_segment(origin, dirs_2d[dir_ind], n_steps=seg_length, step_size=step_size, rng=rng) 132 | origin = segment[-1] 133 | walk = np.concatenate((walk, segment), axis=0) 134 | 135 | dir_ind += 2 * rng.randint(2) -1 136 | dir_ind = dir_ind % 4 137 | 138 | return walk, seg_lengths 139 | 140 | 141 | class RandomWalksDataset(Dataset): 142 | 143 | def __init__(self, opts): 144 | self.opts = opts 145 | self.n_segments_min = self.opts.n_segments_min 146 | self.n_segments_max = self.opts.n_segments_max 147 | 148 | def __len__(self): 149 | return int(1e6) 150 | 151 | def __getitem__(self, ix): 152 | rng = np.random.RandomState(ix) 153 | ns = rng.randint(self.n_segments_min, self.n_segments_max+1) 154 | trajectory, self.seg_lengths_ix = random_walk2d(np.zeros(2), num_segments=ns, rng=rng) 155 | return trajectory 156 | 157 | # ------------ Data Loader ----------- # 158 | # ------------------------------------ # 159 | def data_loader(opts, shuffle=True): 160 | dset = RandomWalksDataset(opts) 161 | 162 | return DataLoader( 163 | dset, 164 | batch_size=opts.batch_size, 165 | shuffle=shuffle, 166 | num_workers=opts.n_data_workers, 167 | drop_last=True) 168 | 169 | 170 | if __name__ == '__main__': 171 | walk = random_walk2d(np.zeros(2), num_segments=4) 172 | print(walk) 173 | -------------------------------------------------------------------------------- /DataLoaders/RandomWalks.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/RandomWalks.pyc -------------------------------------------------------------------------------- /DataLoaders/RoboturkeExp.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | """ 9 | A convenience script to playback random demonstrations from 10 | a set of demonstrations stored in a hdf5 file. 11 | Arguments: 12 | --folder (str): Path to demonstrations 13 | --use_actions (optional): If this flag is provided, the actions are played back 14 | through the MuJoCo simulator, instead of loading the simulator states 15 | one by one. 
16 | Example: 17 | $ python playback_demonstrations_from_hdf5.py --folder ../models/assets/demonstrations/SawyerPickPlace/ 18 | """ 19 | import os 20 | import h5py 21 | import argparse 22 | import random 23 | import numpy as np 24 | 25 | import robosuite 26 | from robosuite.utils.mjcf_utils import postprocess_model_xml 27 | from IPython import embed 28 | 29 | if __name__ == "__main__": 30 | parser = argparse.ArgumentParser() 31 | parser.add_argument( 32 | "--folder", 33 | type=str, 34 | default=os.path.join( 35 | robosuite.models.assets_root, "demonstrations/SawyerNutAssembly" 36 | ), 37 | ) 38 | parser.add_argument( 39 | "--use-actions", 40 | action='store_true', 41 | ) 42 | args = parser.parse_args() 43 | 44 | demo_path = args.folder 45 | hdf5_path = os.path.join(demo_path, "demo.hdf5") 46 | f = h5py.File(hdf5_path, "r") 47 | env_name = f["data"].attrs["env"] 48 | 49 | env = robosuite.make( 50 | env_name, 51 | has_renderer=False, 52 | # has_renderer=True, 53 | ignore_done=True, 54 | use_camera_obs=False, 55 | gripper_visualization=True, 56 | reward_shaping=True, 57 | control_freq=100, 58 | ) 59 | 60 | # list of all demonstrations episodes 61 | demos = list(f["data"].keys()) 62 | 63 | while True: 64 | print("Playing back random episode... (press ESC to quit)") 65 | 66 | # # select an episode randomly 67 | ep = random.choice(demos) 68 | 69 | # read the model xml, using the metadata stored in the attribute for this episode 70 | model_file = f["data/{}".format(ep)].attrs["model_file"] 71 | model_path = os.path.join(demo_path, "models", model_file) 72 | with open(model_path, "r") as model_f: 73 | model_xml = model_f.read() 74 | 75 | env.reset() 76 | xml = postprocess_model_xml(model_xml) 77 | env.reset_from_xml_string(xml) 78 | env.sim.reset() 79 | # env.viewer.set_camera(0) 80 | 81 | # load the flattened mujoco states 82 | states = f["data/{}/states".format(ep)].value 83 | 84 | if args.use_actions: 85 | 86 | # load the initial state 87 | env.sim.set_state_from_flattened(states[0]) 88 | env.sim.forward() 89 | 90 | # load the actions and play them back open-loop 91 | jvels = f["data/{}/joint_velocities".format(ep)].value 92 | grip_acts = f["data/{}/gripper_actuations".format(ep)].value 93 | actions = np.concatenate([jvels, grip_acts], axis=1) 94 | num_actions = actions.shape[0] 95 | 96 | for j, action in enumerate(actions): 97 | env.step(action) 98 | # env.render() 99 | 100 | if j < num_actions - 1: 101 | # ensure that the actions deterministically lead to the same recorded states 102 | state_playback = env.sim.get_state().flatten() 103 | 104 | embed() 105 | assert(np.all(np.equal(states[j + 1], state_playback))) 106 | 107 | else: 108 | 109 | print("Embedding in not use actions branch") 110 | embed() 111 | # force the sequence of internal mujoco states one by one 112 | for state in states: 113 | env.sim.set_state_from_flattened(state) 114 | env.sim.forward() 115 | # env.render() 116 | 117 | f.close() -------------------------------------------------------------------------------- /DataLoaders/SmallMaps_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | class GridWorldDataset(Dataset): 10 | 11 | # Class implementing instance of dataset class for gridworld data. 
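# Trajectories are stored as grid coordinates; parse_trajectory_actions below
# converts each consecutive coordinate difference into one of eight discrete
# actions by matching it against action_map (up, down, left, right and the four
# diagonals). For example, consecutive positions (3, 4) -> (4, 5) give a state
# difference of [1, 1], which matches index 7, i.e. DOWNRIGHT.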
12 | 13 | def __init__(self, dataset_directory): 14 | self.dataset_directory = dataset_directory 15 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 16 | 17 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 18 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 19 | 20 | def __len__(self): 21 | 22 | # Find out how many images we've stored. 23 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 24 | return 4000 25 | # return len(filelist) 26 | 27 | def parse_trajectory_actions(self, coordinate_trajectory): 28 | # Takes coordinate trajectory, returns action index taken. 29 | 30 | state_diffs = np.diff(coordinate_trajectory,axis=0) 31 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 32 | 33 | for i in range(len(state_diffs)): 34 | for k in range(len(self.action_map)): 35 | if (state_diffs[i]==self.action_map[k]).all(): 36 | action_sequence[i]=k 37 | 38 | return action_sequence.astype(float) 39 | 40 | def __getitem__(self, index): 41 | 42 | # The getitem function must return a Map-Trajectory pair. 43 | # We will handle per-timestep processes within our code. 44 | # Assumes index is within range [0,len(filelist)-1] 45 | image = np.load(os.path.join(self.dataset_directory,"Map{0}.npy".format(index))) 46 | time_limit = 20 47 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Map{0}_Traj1.npy".format(index))).astype(float)[:time_limit] 48 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 49 | 50 | return image, coordinate_trajectory, action_sequence -------------------------------------------------------------------------------- /DataLoaders/Translation.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | from .headers import * 13 | import os.path as osp 14 | 15 | from io import open 16 | import unicodedata 17 | import string 18 | import re 19 | import random 20 | 21 | import torch 22 | import torch.nn as nn 23 | from torch import optim 24 | import torch.nn.functional as F 25 | 26 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 27 | 28 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 29 | flags.DEFINE_integer('batch_size', 1, 'Batch size. 
Code currently only handles bs=1') 30 | flags.DEFINE_string('lang_dir', '/private/home/shubhtuls/code/sfd/cachedir/data/lang/', 'Data Directory') 31 | 32 | SOS_token = 0 33 | EOS_token = 1 34 | 35 | 36 | class Lang: 37 | def __init__(self, name): 38 | self.name = name 39 | self.word2index = {} 40 | self.word2count = {} 41 | self.index2word = {0: "SOS", 1: "EOS"} 42 | self.n_words = 2 # Count SOS and EOS 43 | 44 | def addSentence(self, sentence): 45 | for word in sentence.split(' '): 46 | self.addWord(word) 47 | 48 | def addWord(self, word): 49 | if word not in self.word2index: 50 | self.word2index[word] = self.n_words 51 | self.word2count[word] = 1 52 | self.index2word[self.n_words] = word 53 | self.n_words += 1 54 | else: 55 | self.word2count[word] += 1 56 | 57 | # Turn a Unicode string to plain ASCII, thanks to 58 | # https://stackoverflow.com/a/518232/2809427 59 | def unicodeToAscii(s): 60 | return ''.join( 61 | c for c in unicodedata.normalize('NFD', s) 62 | if unicodedata.category(c) != 'Mn' 63 | ) 64 | 65 | # Lowercase, trim, and remove non-letter characters 66 | def normalizeString(s): 67 | s = unicodeToAscii(s.lower().strip()) 68 | s = re.sub(r"([.!?])", r" \1", s) 69 | s = re.sub(r"[^a-zA-Z.!?]+", r" ", s) 70 | return s 71 | 72 | 73 | def readLangs(data_dir, lang1, lang2, reverse=False): 74 | print("Reading lines...") 75 | 76 | # Read the file and split into lines 77 | lines = open(osp.join(data_dir, '%s-%s.txt' % (lang1, lang2)), encoding='utf-8').\ 78 | read().strip().split('\n') 79 | 80 | # Split every line into pairs and normalize 81 | pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines] 82 | 83 | # Reverse pairs, make Lang instances 84 | if reverse: 85 | pairs = [list(reversed(p)) for p in pairs] 86 | input_lang = Lang(lang2) 87 | output_lang = Lang(lang1) 88 | else: 89 | input_lang = Lang(lang1) 90 | output_lang = Lang(lang2) 91 | 92 | return input_lang, output_lang, pairs 93 | 94 | 95 | MAX_LENGTH = 10 96 | 97 | eng_prefixes = ( 98 | "i am ", "i m ", 99 | "he is", "he s ", 100 | "she is", "she s ", 101 | "you are", "you re ", 102 | "we are", "we re ", 103 | "they are", "they re " 104 | ) 105 | 106 | 107 | def filterPair(p): 108 | return len(p[0].split(' ')) < MAX_LENGTH and \ 109 | len(p[1].split(' ')) < MAX_LENGTH 110 | # and \ 111 | # p[1].startswith(eng_prefixes) 112 | 113 | 114 | def filterPairs(pairs): 115 | return [pair for pair in pairs if filterPair(pair)] 116 | 117 | 118 | def prepareData(data_dir, lang1, lang2, reverse=False): 119 | input_lang, output_lang, pairs = readLangs(data_dir, lang1, lang2, reverse) 120 | print("Read %s sentence pairs" % len(pairs)) 121 | pairs = filterPairs(pairs) 122 | print("Trimmed to %s sentence pairs" % len(pairs)) 123 | print("Counting words...") 124 | for pair in pairs: 125 | input_lang.addSentence(pair[0]) 126 | output_lang.addSentence(pair[1]) 127 | print("Counted words:") 128 | print(input_lang.name, input_lang.n_words) 129 | print(output_lang.name, output_lang.n_words) 130 | return input_lang, output_lang, pairs 131 | 132 | 133 | def indexesFromSentence(lang, sentence): 134 | return [lang.word2index[word] for word in sentence.split(' ')] 135 | 136 | 137 | def tensorFromSentence(lang, sentence): 138 | indexes = indexesFromSentence(lang, sentence) 139 | indexes.append(EOS_token) 140 | return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1) 141 | 142 | 143 | class TranslationDataset(Dataset): 144 | ''' 145 | Class implementing instance of dataset class for MIME data. 
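    In this file the dataset actually serves English-French translation pairs:
    each element holds one normalized sentence pair together with its
    index-tensor encodings ('l1' and 'l2', column LongTensors ending in the EOS
    token), as produced by prepareData and tensorFromSentence above. A minimal
    sketch, assuming flags parsed into `opts` with lang_dir containing an
    eng-fra.txt file:

        dset = TranslationDataset(opts)
        elem = dset[0]
        eng_sentence, fra_sentence = elem['pair']
        eng_ids, fra_ids = elem['l1'], elem['l2']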
146 | ''' 147 | def __init__(self, opts): 148 | self.dataset_directory = opts.lang_dir 149 | self.l1, self.l2, self.pairs = prepareData(self.dataset_directory, 'eng', 'fra', reverse=False) 150 | 151 | def __len__(self): 152 | # Return length of file list. 153 | return len(self.l1) 154 | 155 | def tensorsFromPair(self, pair): 156 | input_tensor = tensorFromSentence(self.l1, pair[0]) 157 | target_tensor = tensorFromSentence(self.l2, pair[1]) 158 | return (input_tensor, target_tensor) 159 | 160 | def __getitem__(self, index): 161 | elem = {} 162 | elem['pair'] = self.pairs[index] 163 | elem['l1'], elem['l2'] = self.tensorsFromPair(elem['pair']) 164 | 165 | return elem -------------------------------------------------------------------------------- /DataLoaders/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/__init__.py -------------------------------------------------------------------------------- /DataLoaders/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/__init__.pyc -------------------------------------------------------------------------------- /DataLoaders/headers.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | import torch 9 | import glob, cv2, os 10 | from torch.utils.data import Dataset, DataLoader 11 | from torchvision import transforms, utils 12 | from absl import flags 13 | from IPython import embed 14 | from absl import flags, app -------------------------------------------------------------------------------- /DataLoaders/headers.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/headers.pyc -------------------------------------------------------------------------------- /DownstreamRL/PolicyNet.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | from ..SkillNetwork.headers import * 9 | from ..SkillNetwork.LSTMNetwork import LSTMNetwork, LSTMNetwork_Fixed 10 | 11 | class PolicyNetwork(torch.nn.Module): 12 | 13 | def __init__(self, opts, input_size, hidden_size, output_size, fixed=True): 14 | 15 | super(PolicyNetwork, self).__init__() 16 | 17 | self.opts = opts 18 | self.input_size = input_size 19 | self.hidden_size = hidden_size 20 | self.output_size = output_size 21 | 22 | if fixed: 23 | self.lstmnet = LSTMNetwork_Fixed(input_size=input_size, hidden_size=hidden_size, output_size=output_size).cuda() 24 | else: 25 | self.lstmnet = LSTMNetwork(input_size=input_size, hidden_size=hidden_size, output_size=output_size).cuda() 26 | 27 | # Create linear layer to split prediction into mu and sigma. 
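# The two heads below parameterize a diagonal Gaussian over the predicted latent
# z. In the forward pass this is sampled with the reparameterization trick,
#     std = exp(0.5 * logvar),   z = mu + eps * std,   eps ~ N(0, I),
# and the per-timestep KL divergence to a standard normal prior is accumulated as
#     -0.5 * sum(1 + logvar - mu^2 - exp(logvar)),
# before MultivariateNormal distributions are built to evaluate log-probabilities.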
28 | self.mu_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 29 | self.sig_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 30 | 31 | # Stopping probability predictor. (Softmax, not sigmoid) 32 | self.stopping_probability_layer = torch.nn.Linear(self.hidden_size, 2) 33 | self.softmax_layer = torch.nn.Softmax(dim=-1) 34 | 35 | def forward(self, input): 36 | 37 | format_input = torch.tensor(input).view(1,1,self.input_size).cuda().float() 38 | predicted_Z_preparam, stop_probabilities = self.lstmnet.forward(format_input) 39 | 40 | predicted_Z_preparam = predicted_Z_preparam.squeeze(1) 41 | 42 | self.latent_z_seq = [] 43 | self.latent_mu_seq = [] 44 | self.latent_log_sigma_seq = [] 45 | self.kld_loss = 0. 46 | 47 | t = 0 48 | 49 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 50 | latent_z_mu_seq = self.mu_linear_layer(predicted_Z_preparam) 51 | latent_z_log_sig_seq = self.sig_linear_layer(predicted_Z_preparam) 52 | 53 | # Compute standard deviation. 54 | std = torch.exp(0.5*latent_z_log_sig_seq).cuda() 55 | # Sample random variable. 56 | eps = torch.randn_like(std).cuda() 57 | 58 | self.latent_z_seq = latent_z_mu_seq+eps*std 59 | 60 | # Compute KL Divergence Loss term here, so we don't have to return mu's and sigma's. 61 | self.kld_loss = torch.zeros(1) 62 | for t in range(latent_z_mu_seq.shape[0]): 63 | # Taken from mime_plan_skill.py Line 159 - KL Divergence for Gaussian prior and Gaussian prediction. 64 | self.kld_loss += -0.5 * torch.sum(1. + latent_z_log_sig_seq[t] - latent_z_mu_seq[t].pow(2) - latent_z_log_sig_seq[t].exp()) 65 | 66 | # Create distributions so that we can evaluate log probability. 67 | self.dists = [torch.distributions.MultivariateNormal(loc = latent_z_mu_seq[t], covariance_matrix = std[t]*torch.eye((self.opts.nz)).cuda()) for t in range(latent_z_mu_seq.shape[0])] 68 | 69 | # Evaluate log probability in forward so we don't have to do it elswhere. 70 | self.log_probs = [self.dists[i].log_prob(self.latent_z_seq[i]) for i in range(self.latent_z_seq.shape[0])] 71 | 72 | return self.latent_z_seq, stop_probabilities 73 | 74 | class PolicyNetworkSingleTimestep(torch.nn.Module): 75 | 76 | # Policy Network inherits from torch.nn.Module. 77 | # Now we overwrite the init, forward functions. And define anything else that we need. 78 | 79 | def __init__(self, opts, input_size, hidden_size, output_size): 80 | 81 | # Ensures inheriting from torch.nn.Module goes nicely and cleanly. 82 | super(PolicyNetworkSingleTimestep, self).__init__() 83 | 84 | self.opts = opts 85 | self.input_size = input_size 86 | self.hidden_size = hidden_size 87 | self.output_size = output_size 88 | self.num_layers = 4 89 | self.maximum_length = 15 90 | 91 | # Define a bidirectional LSTM now. 92 | self.lstm = torch.nn.LSTM(input_size=self.input_size,hidden_size=self.hidden_size,num_layers=self.num_layers) 93 | 94 | # Define output layers for the LSTM, and activations for this output layer. 95 | self.output_layer = torch.nn.Linear(self.hidden_size, self.output_size) 96 | # Create linear layer to split prediction into mu and sigma. 97 | self.mu_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 98 | self.sig_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 99 | 100 | # Stopping probability predictor. 
(Softmax, not sigmoid) 101 | self.stopping_probability_layer = torch.nn.Linear(self.hidden_size, 2) 102 | 103 | self.softmax_layer = torch.nn.Softmax(dim=-1) 104 | self.logsoftmax_layer = torch.nn.LogSoftmax(dim=-1) 105 | 106 | def forward(self, input, hidden=None): 107 | # Input format must be: Sequence_Length x 1 x Input_Size. 108 | # Assuming input is a numpy array. 109 | format_input = torch.tensor(input).view(input.shape[0],1,self.input_size).cuda().float() 110 | 111 | # Instead of iterating over time and passing each timestep's input to the LSTM, we can now just pass the entire input sequence. 112 | outputs, hidden = self.lstm(format_input, hidden) 113 | 114 | # Predict parameters 115 | latentz_preparam = self.output_layer(outputs[-1]) 116 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 117 | latent_z_mu = self.mu_linear_layer(latentz_preparam) 118 | latent_z_log_sig = self.sig_linear_layer(latentz_preparam) 119 | 120 | # Predict stop probability. 121 | preact_stop_probs = self.stopping_probability_layer(outputs[-1]) 122 | stop_probability = self.softmax_layer(preact_stop_probs) 123 | 124 | stop = self.sample_action(stop_probability) 125 | 126 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 127 | latent_z_mu = self.mu_linear_layer(latentz_preparam) 128 | latent_z_log_sig = self.sig_linear_layer(latentz_preparam) 129 | 130 | # Compute standard deviation. 131 | std = torch.exp(0.5*latent_z_log_sig).cuda() 132 | # Sample random variable. 133 | eps = torch.randn_like(std).cuda() 134 | 135 | latent_z = latent_z_mu+eps*std 136 | 137 | # Compute KL Divergence Loss term here, so we don't have to return mu's and sigma's. 138 | # Taken from mime_plan_skill.py Line 159 - KL Divergence for Gaussian prior and Gaussian prediction. 139 | kld_loss = -0.5 * torch.sum(1. + latent_z_log_sig - latent_z_mu.pow(2) - latent_z_log_sig.exp()) 140 | 141 | # Create distributions so that we can evaluate log probability. 142 | dist = torch.distributions.MultivariateNormal(loc = latent_z_mu, covariance_matrix = std*torch.eye((self.opts.nz)).cuda()) 143 | 144 | # Evaluate log probability in forward so we don't have to do it elswhere. 145 | log_prob = dist.log_prob(latent_z) 146 | 147 | return latent_z, stop_probability, stop, log_prob, kld_loss, hidden 148 | 149 | def sample_action(self, action_probabilities): 150 | # Categorical distribution sampling. 151 | sample_action = torch.distributions.Categorical(probs=action_probabilities).sample().squeeze(0) 152 | return sample_action 153 | 154 | class AltPolicyNetworkSingleTimestep(torch.nn.Module): 155 | 156 | # Policy Network inherits from torch.nn.Module. 157 | # Now we overwrite the init, forward functions. And define anything else that we need. 158 | 159 | def __init__(self, opts, input_size, hidden_size, output_size): 160 | 161 | # Ensures inheriting from torch.nn.Module goes nicely and cleanly. 162 | super(AltPolicyNetworkSingleTimestep, self).__init__() 163 | 164 | self.opts = opts 165 | self.input_size = input_size 166 | self.hidden_size = hidden_size 167 | self.output_size = output_size 168 | self.num_layers = 4 169 | self.maximum_length = 15 170 | 171 | # Define a bidirectional LSTM now. 172 | self.lstm = torch.nn.LSTM(input_size=self.input_size,hidden_size=self.hidden_size,num_layers=self.num_layers) 173 | 174 | # Define output layers for the LSTM, and activations for this output layer. 
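# Compared to PolicyNetworkSingleTimestep above, this variant passes the sigma
# head through a Softplus so the predicted standard deviation is positive by
# construction, samples directly from the resulting MultivariateNormal, and
# computes the KL term with torch.distributions.kl_divergence against a standard
# normal instead of the closed-form expression.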
175 | self.output_layer = torch.nn.Linear(self.hidden_size, self.output_size) 176 | # Create linear layer to split prediction into mu and sigma. 177 | self.mu_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 178 | self.sig_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 179 | self.softplus_activation_layer = torch.nn.Softplus() 180 | 181 | # Stopping probability predictor. (Softmax, not sigmoid) 182 | self.stopping_probability_layer = torch.nn.Linear(self.hidden_size, 2) 183 | 184 | self.softmax_layer = torch.nn.Softmax(dim=-1) 185 | self.logsoftmax_layer = torch.nn.LogSoftmax(dim=-1) 186 | 187 | def forward(self, input, hidden=None): 188 | # Input format must be: Sequence_Length x 1 x Input_Size. 189 | # Assuming input is a numpy array. 190 | format_input = torch.tensor(input).view(input.shape[0],1,self.input_size).cuda().float() 191 | 192 | # Instead of iterating over time and passing each timestep's input to the LSTM, we can now just pass the entire input sequence. 193 | outputs, hidden = self.lstm(format_input, hidden) 194 | 195 | # Predict parameters 196 | latentz_preparam = self.output_layer(outputs[-1]) 197 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 198 | latent_z_mu = self.mu_linear_layer(latentz_preparam) 199 | latent_z_log_sig = self.sig_linear_layer(latentz_preparam) 200 | latent_z_sig = self.softplus_activation_layer(self.sig_linear_layer(latentz_preparam)) 201 | 202 | # Predict stop probability. 203 | preact_stop_probs = self.stopping_probability_layer(outputs[-1]) 204 | stop_probability = self.softmax_layer(preact_stop_probs) 205 | 206 | stop = self.sample_action(stop_probability) 207 | 208 | # Create distributions so that we can evaluate log probability. 209 | dist = torch.distributions.MultivariateNormal(loc = latent_z_mu, covariance_matrix = torch.diag_embed(latent_z_sig)) 210 | 211 | latent_z = dist.sample() 212 | 213 | # Evaluate log probability in forward so we don't have to do it elswhere. 214 | log_prob = dist.log_prob(latent_z) 215 | 216 | 217 | # Set standard distribution for KL. 218 | standard_distribution = torch.distributions.MultivariateNormal(torch.zeros((self.output_size)).cuda(),torch.eye((self.output_size)).cuda()) 219 | # Compute KL. 220 | kl_divergence = torch.distributions.kl_divergence(dist, standard_distribution) 221 | 222 | return latent_z, stop_probability, stop, log_prob, kl_divergence, hidden 223 | 224 | def sample_action(self, action_probabilities): 225 | # Categorical distribution sampling. 226 | sample_action = torch.distributions.Categorical(probs=action_probabilities).sample().squeeze(0) 227 | return sample_action -------------------------------------------------------------------------------- /DownstreamRL/TrainZPolicyRL.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | """ 9 | 10 | # For both arms and grippers. 
11 | python -m SkillsfromDemonstrations.Experiments.UseSkillsRL.TrainZPolicyRL --train --transformer --nz=64 --nh=64 --variable_nseg=False --network_dir=saved_models/T356_fnseg_vae_sl2pt0_kldwt0pt002_finetune --variable_ns=False --st_space=joint_both_gripper --vae_enc 12 | """ 13 | 14 | from __future__ import absolute_import 15 | 16 | import os, sys, torch 17 | import matplotlib.pyplot as plt 18 | from ...DataLoaders import MIME_DataLoader 19 | from ..abstraction import mime_eval 20 | from ..abstraction.abstraction_utils import ScoreFunctionEstimator 21 | from .PolicyNet import PolicyNetwork, PolicyNetworkSingleTimestep, AltPolicyNetworkSingleTimestep 22 | from absl import app, flags 23 | import imageio, numpy as np, copy, os, shutil 24 | from IPython import embed 25 | import robosuite 26 | import tensorboard, tensorboardX 27 | 28 | flags.DEFINE_boolean('train',False,'Whether to run train.') 29 | flags.DEFINE_boolean('debug',False,'Whether to debug.') 30 | # flags.DEFINE_float('sf_loss_wt', 0.1, 'Weight of pseudo loss for SF estimator') 31 | # flags.DEFINE_float('kld_loss_wt', 0, 'Weight for KL Divergence loss if using VAE encoder.') 32 | flags.DEFINE_float('reinforce_loss_wt', 1., 'Weight for primary reinforce loss.') 33 | # flags.DEFINE_string('name',None,'Name to give run.') 34 | 35 | class ZPolicyTrainer(object): 36 | 37 | def __init__(self, opts): 38 | 39 | self.opts = opts 40 | 41 | self.input_size = self.opts.n_state 42 | self.zpolicy_input_size = 85 43 | self.hidden_size = 20 44 | self.output_size = self.opts.nz 45 | 46 | self.primitive_length = 10 47 | self.learning_rate = 1e-4 48 | self.number_epochs = 200 49 | self.number_episodes = 500 50 | self.save_every_epoch = 5 51 | self.maximum_skills = 6 52 | 53 | def initialize_plots(self): 54 | self.log_dir = os.path.join("SkillsfromDemonstrations/cachedir/logs/RL",self.opts.name) 55 | if not(os.path.isdir(self.log_dir)): 56 | os.mkdir(self.log_dir) 57 | self.writer = tensorboardX.SummaryWriter(self.log_dir) 58 | 59 | def setup_networks(self): 60 | # Set up evaluator to load mime model and stuff. 61 | self.evaluator = mime_eval.PrimitiveDiscoverEvaluator(self.opts) 62 | self.evaluator.setup_testing(split='val') 63 | 64 | # Also create a ZPolicy. 65 | # self.z_policy = PolicyNetworkSingleTimestep(opts=self.opts, input_size=self.zpolicy_input_size, hidden_size=self.hidden_size, output_size=self.output_size).cuda() 66 | self.z_policy = AltPolicyNetworkSingleTimestep(opts=self.opts, input_size=self.zpolicy_input_size, hidden_size=self.hidden_size, output_size=self.output_size).cuda() 67 | 68 | if self.opts.variable_nseg: 69 | self.sf_loss_fn = ScoreFunctionEstimator() 70 | 71 | # Creating optimizer. 72 | self.z_policy_optimizer = torch.optim.Adam(self.z_policy.parameters(), lr=self.learning_rate) 73 | 74 | def load_network(self, network_dir): 75 | # Load the evaluator networks (Abstraction network and skill network) 76 | self.evaluator.load_network(self.evaluator.model, 'pred', 'latest', network_dir=network_dir) 77 | 78 | # Freeze parameters of the IntendedTrajectoryPredictorModel. 
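# Note: the attribute that disables gradients is `requires_grad` (with an s);
# assigning to `require_grad`, as the loop below does, only creates an unused
# attribute, so the decoder weights still receive gradients and stay fixed only
# because no optimizer ever steps them (only the z-policy optimizer is created).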
79 | for parameter in self.evaluator.model.parameters(): 80 | parameter.require_grad = False 81 | 82 | def save_zpolicy_model(self, path, suffix): 83 | if not(os.path.isdir(path)): 84 | os.mkdir(path) 85 | save_object = {} 86 | save_object['ZPolicy'] = self.z_policy.state_dict() 87 | torch.save(save_object,os.path.join(path,"ZPolicyModel"+suffix)) 88 | 89 | def load_all_models(self, path): 90 | load_object = torch.load(path) 91 | self.z_policy.load_state_dict(load_object['ZPolicy']) 92 | 93 | # def update_plots(self, counter, sample_map, loglikelihood): 94 | def update_plots(self, counter): 95 | 96 | if self.opts.variable_nseg: 97 | self.writer.add_scalar('Stop_Prob_Reinforce_Loss', torch.mean(self.stop_prob_reinforce_loss), counter) 98 | self.writer.add_scalar('Predicted_Zs_Reinforce_Loss', torch.mean(self.reinforce_predicted_Zs), counter) 99 | self.writer.add_scalar('KL_Divergence_Loss', torch.mean(self.kld_loss_seq), counter) 100 | self.writer.add_scalar('Total_Loss', torch.mean(self.total_loss), counter) 101 | 102 | def assemble_input(self, trajectory): 103 | traj_start = trajectory[0] 104 | traj_end = trajectory[-1] 105 | return torch.cat([torch.tensor(traj_start).cuda(),torch.tensor(traj_end).cuda()],dim=0) 106 | 107 | # def update_networks(self, state_traj, reward_traj, predicted_Zs): 108 | def update_networks(self, state_traj_torch, reward_traj, latent_z_seq, log_prob_seq, stop_prob_seq, stop_seq, kld_loss_seq): 109 | # embed() 110 | # Get cummulative rewards corresponding to actions executed after selecting a particular Z. -# This is basically adding up the rewards from the end of the array. 111 | # cumm_reward_to_go = torch.cumsum(torch.tensor(reward_traj[::-1]).cuda().float())[::-1] 112 | cumm_reward_to_go_numpy = copy.deepcopy(np.cumsum(copy.deepcopy(reward_traj[::-1]))[::-1]) 113 | cumm_reward_to_go = torch.tensor(cumm_reward_to_go_numpy).cuda().float() 114 | 115 | self.total_loss = 0. 116 | 117 | if self.opts.variable_nseg: 118 | # Remember, this stop probability loss is for stopping predicting Z's, #NOT INTERMEDIATE TIMESTEPS! 119 | # So we still use cumm_reward_to_go rather than cumm_reward_to_go_array 120 | 121 | self.stop_prob_reinforce_loss = self.sf_loss_fn.forward(cumm_reward_to_go, stop_prob_seq.unsqueeze(1), stop_seq.long()) 122 | # Add reinforce loss and loss value. 123 | self.total_loss += self.opts.sf_loss_wt*self.stop_prob_reinforce_loss 124 | 125 | # Now adding the reinforce loss associated with predicted Zs. 126 | # (Remember, we want to maximize reward times log prob, so multiply by -1 to minimize.) 127 | 128 | self.reinforce_predicted_Zs = (self.opts.reinforce_loss_wt * -1. * cumm_reward_to_go*log_prob_seq.view(-1)).sum() 129 | self.total_loss += self.reinforce_predicted_Zs 130 | 131 | # Add loss term with KL Divergence between 0 mean Gaussian and predicted Zs. 132 | 133 | self.kld_loss_seq = kld_loss_seq 134 | self.total_loss += self.opts.kld_loss_wt*self.kld_loss_seq[0] 135 | 136 | # Zero gradients of optimizer, compute backward, then step optimizer. 137 | self.z_policy_optimizer.zero_grad() 138 | self.total_loss.sum().backward() 139 | self.z_policy_optimizer.step() 140 | 141 | def reorder_actions(self, actions): 142 | 143 | # Assume that the actions are 16 dimensional, and are ordered as: 144 | # 7 DoF for left arm, 7 DoF for right arm, 1 for left gripper, and 1 for right gripper. 145 | 146 | # The original trajectory has gripper values from 0 (Close) to 1 (Open), but we've to rescale to -1 (Open) to 1 (Close) for Mujoco. 
147 | # And handle joint velocities. 148 | # MIME Gripper values are from 0 to 100 (Close to Open), but we assume actions has values from 0 to 1 (Close to Open), and then rescale to (-1 Open to 1 Close) for Mujoco. 149 | # Mujoco needs them flipped. 150 | 151 | indices = np.array([ 7, 8, 9, 10, 11, 12, 13, 0, 1, 2, 3, 4, 5, 6, 15, 14]) 152 | reordered_actions = actions[:,indices] 153 | reordered_actions[:,14:] = 1 - 2*reordered_actions[:,14:] 154 | return reordered_actions 155 | 156 | def run_episode(self, counter): 157 | 158 | # For number of epochs: 159 | # # 1) Given start and goal (for reaching task, say) 160 | # # 2) Run Z_Policy on start and goal to retrieve predicted Zs. 161 | # # 3) Decode predicted Zs into trajectory. 162 | # # 4) Retrieve "actions" from trajectory. 163 | # # 5) Feed "actions" into RL environment and collect reward. 164 | # # 6) Train ZPolicy to maximize cummulative reward with favorite RL algorithm. 165 | 166 | # Reset environment. 167 | state = self.environment.reset() 168 | terminal = False 169 | reward_traj = None 170 | state_traj_torch = None 171 | t_out = 0 172 | stop = False 173 | hidden = None 174 | latent_z_seq = None 175 | stop_prob_seq = None 176 | stop_seq = None 177 | log_prob_seq = None 178 | kld_loss_seq = 0. 179 | previous_state = None 180 | 181 | while terminal==False and stop==False: 182 | 183 | ######################################################## 184 | ######## 1) Collect input for first timestep. ########## 185 | ######################################################## 186 | zpolicy_input = np.concatenate([state['robot-state'],state['object-state']]).reshape(1,self.zpolicy_input_size) 187 | 188 | ######################################################## 189 | # 2) Feed into the Z policy to retrieve the predicted Z. 190 | ######################################################## 191 | latent_z, stop_probability, stop, log_prob, kld_loss, hidden = self.z_policy.forward(zpolicy_input, hidden=hidden) 192 | latent_z = latent_z.squeeze(1) 193 | 194 | ######################################################## 195 | ############## 3) Decode into trajectory. ############## 196 | ######################################################## 197 | 198 | primitive_and_skill_stop_prob = self.evaluator.model.primitive_decoder(latent_z) 199 | traj_seg = primitive_and_skill_stop_prob[0].squeeze(1).detach().cpu().numpy() 200 | 201 | if previous_state is None: 202 | previous_state = traj_seg[-1].reshape(1,self.opts.n_state) 203 | else: 204 | # Concatenate previous state to trajectory, so that when we take actions we get an action from previous segment to the current one. 205 | traj_seg = np.concatenate([previous_state,traj_seg],axis=0) 206 | previous_state = traj_seg[-1].reshape(-1,self.opts.n_state) 207 | 208 | ######################################################## 209 | ## 4) Finite diff along time axis to retrieve actions ## 210 | ######################################################## 211 | actions = np.diff(traj_seg,axis=0) 212 | actions = self.reorder_actions(actions) 213 | actions_torch = torch.tensor(actions).cuda().float() 214 | 215 | cummulative_reward_in_segment = 0. 216 | # Run step into evironment for all actions in this segment. 217 | t = 0 218 | while t=self.maximum_skills: 259 | stop = True 260 | 261 | # if self.opts.debug==True: 262 | # embed() 263 | 264 | if self.opts.train: 265 | # 6) Feed states, actions, reward, and predicted Zs to update. (These are all lists of tensors.) 
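# update_networks() below turns the per-step rewards into rewards-to-go with a
# reversed cumulative sum and forms the REINFORCE objective
#     loss = -reinforce_loss_wt * sum_t( R_to_go[t] * log pi(z_t) ),
# optionally adds the score-function loss on the stop probabilities (when
# variable_nseg is set) plus the weighted KL penalty, and takes one Adam step on
# the z-policy parameters.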
266 | # self.update_networks(state_traj_torch, action_torch, reward_traj, latent_zs) 267 | self.update_networks(state_traj_torch, reward_traj, latent_z_seq, log_prob_seq, stop_prob_seq, stop_seq, kld_loss_seq) 268 | self.update_plots(counter) 269 | 270 | def setup_RL_environment(self, has_display=False): 271 | 272 | # Create Mujoco environment. 273 | self.environment = robosuite.make("BaxterLift", has_renderer=has_display) 274 | self.initialize_plots() 275 | 276 | def trainRL(self): 277 | 278 | 279 | # Basic function to train. 280 | counter = 0 281 | 282 | for e in range(self.number_epochs): 283 | 284 | # Number of episodes per epoch. 285 | for i in range(self.number_episodes): 286 | 287 | print("#########################################") 288 | print("Epoch: ",e,"Traj: ",i) 289 | 290 | # Run an episode. 291 | self.run_episode(counter) 292 | 293 | counter += 1 294 | 295 | if self.opts.train and e%self.save_every_epoch==0: 296 | self.save_zpolicy_model(os.path.join("saved_models/RL",self.opts.name), "epoch{0}".format(e)) 297 | 298 | def main(_): 299 | 300 | # This is only to be executed for notebooks. 301 | # flags.FLAGS(['']) 302 | opts = flags.FLAGS 303 | 304 | # Set state space. 305 | if opts.st_space == 'ee_r' or opts.st_space == 'ee_l': 306 | opts.n_state = 7 307 | if opts.st_space == 'joint_ra' or opts.st_space == 'joint_la': 308 | opts.n_state = 7 309 | if opts.st_space == 'joint_both': 310 | opts.n_state = 14 311 | elif opts.st_space == 'ee_all': 312 | opts.n_state = 14 313 | elif opts.st_space == 'joint': 314 | opts.n_state = 17 315 | elif opts.st_space =='joint_both_gripper': 316 | opts.n_state = 16 317 | 318 | opts.logging_dir = os.path.join(opts.logging_dir, 'mime') 319 | opts.transformer = True 320 | 321 | torch.manual_seed(0) 322 | 323 | # Create instance of class. 324 | zpolicy_trainer = ZPolicyTrainer(opts) 325 | zpolicy_trainer.setup_networks() 326 | zpolicy_trainer.setup_RL_environment() 327 | # Still need this to load primitive decoder network. 328 | zpolicy_trainer.load_network(opts.network_dir) 329 | zpolicy_trainer.trainRL() 330 | 331 | 332 | if __name__ == '__main__': 333 | app.run(main) 334 | -------------------------------------------------------------------------------- /Experiments/Code_Runs/CycleTransfer_Runs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | 3 | # Debugging cycle consistency transfer. 4 | 5 | python Master.py --name=CTdebug --train=1 --setting=cycle_transfer --source_domain=ContinuousNonZero --target_domain=ContinuousNonZero --z_dimensions=64 --number_layers=5 --hidden_size=64 --data=ContinuousNonZero --training_phase_size=10000 --display_freq=1000 --eval_freq=4 --alternating_phase_size=200 --discriminator_phase_size=2 --vae_loss_weight=1. --discriminability_weight=2.0 --kl_weight=0.001 6 | -------------------------------------------------------------------------------- /Experiments/DMP.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
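# Dynamic Movement Primitives (DMP): learn_DMP() fits forcing-function kernel weights to a
# demonstration trajectory, and rollout() regenerates the motion between a new start and goal.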
6 | 7 | from headers import * 8 | 9 | class DMP(): 10 | 11 | # def __init__(self, time_steps=100, num_ker=25, dimensions=3, kernel_bandwidth=None, alphaz=None, time_basis=False): 12 | def __init__(self, time_steps=40, num_ker=15, dimensions=7, kernel_bandwidth=3.5, alphaz=5., time_basis=True): 13 | # DMP(dimensions=7,time_steps=40,num_ker=15,kernel_bandwidth=3.5,alphaz=5.,time_basis=True) 14 | 15 | # self.alphaz = 25.0 16 | if alphaz is not None: 17 | self.alphaz = alphaz 18 | else: 19 | self.alphaz = 10. 20 | self.betaz = self.alphaz/4 21 | self.alpha = self.alphaz/3 22 | 23 | self.time_steps = time_steps 24 | self.tau = self.time_steps 25 | # self.tau = 1. 26 | self.use_time_basis = time_basis 27 | 28 | self.dimensions = dimensions 29 | # self.number_kernels = max(500,self.time_steps) 30 | self.number_kernels = num_ker 31 | if kernel_bandwidth is not None: 32 | self.kernel_bandwidth = kernel_bandwidth 33 | else: 34 | self.kernel_bandwidth = self.calculate_good_sigma(self.time_steps, self.number_kernels) 35 | self.epsilon = 0.001 36 | self.setup() 37 | 38 | def setup(self): 39 | 40 | self.gaussian_kernels = np.zeros((self.number_kernels,2)) 41 | 42 | self.weights = np.zeros((self.number_kernels, self.dimensions)) 43 | 44 | self.demo_pos = np.zeros((self.time_steps, self.dimensions)) 45 | self.demo_vel = np.zeros((self.time_steps, self.dimensions)) 46 | self.demo_acc = np.zeros((self.time_steps, self.dimensions)) 47 | 48 | self.target_forces = np.zeros((self.time_steps, self.dimensions)) 49 | self.phi = np.zeros((self.number_kernels, self.time_steps, self.time_steps)) 50 | self.eta = np.zeros((self.time_steps, self.dimensions)) 51 | self.vector_phase = np.zeros(self.time_steps) 52 | 53 | # Defining Rollout variables. 54 | self.rollout_time = self.time_steps 55 | self.dt = 1./self.rollout_time 56 | self.pos_roll = np.zeros((self.rollout_time,self.dimensions)) 57 | self.vel_roll = np.zeros((self.rollout_time,self.dimensions)) 58 | self.acc_roll = np.zeros((self.rollout_time,self.dimensions)) 59 | self.force_roll = np.zeros((self.rollout_time,self.dimensions)) 60 | self.goal = np.zeros(self.dimensions) 61 | self.start = np.zeros(self.dimensions) 62 | 63 | def calculate_good_sigma(self, time, number_kernels, threshold=0.15): 64 | return time/(2*(number_kernels-1)*(np.sqrt(-np.log(threshold)))) 65 | 66 | def load_trajectory(self,pos,vel=None,acc=None): 67 | 68 | self.demo_pos = np.zeros((self.time_steps, self.dimensions)) 69 | self.demo_vel = np.zeros((self.time_steps, self.dimensions)) 70 | self.demo_acc = np.zeros((self.time_steps, self.dimensions)) 71 | 72 | if vel is not None and acc is not None: 73 | self.demo_pos = copy.deepcopy(pos) 74 | self.demo_vel = copy.deepcopy(vel) 75 | self.demo_acc = copy.deepcopy(acc) 76 | else: 77 | self.smooth_interpolate(pos) 78 | 79 | def smooth_interpolate(self, pos): 80 | # Filter the posiiton input by Gaussian smoothing. 
81 | smooth_pos = gaussian_filter1d(pos,3.5,axis=0,mode='nearest') 82 | 83 | time_range = np.linspace(0, pos.shape[0]-1, pos.shape[0]) 84 | new_time_range = np.linspace(0,pos.shape[0]-1,self.time_steps+2) 85 | 86 | self.interpolated_pos = np.zeros((self.time_steps+2,self.dimensions)) 87 | interpolating_objects = [] 88 | 89 | for i in range(self.dimensions): 90 | interpolating_objects.append(interp1d(time_range,pos[:,i],kind='linear')) 91 | self.interpolated_pos[:,i] = interpolating_objects[i](new_time_range) 92 | 93 | self.demo_vel = np.diff(self.interpolated_pos,axis=0)[:self.time_steps] 94 | self.demo_acc = np.diff(self.interpolated_pos,axis=0,n=2)[:self.time_steps] 95 | self.demo_pos = self.interpolated_pos[:self.time_steps] 96 | 97 | def initialize_variables(self): 98 | self.weights = np.zeros((self.number_kernels, self.dimensions)) 99 | self.target_forces = np.zeros((self.time_steps, self.dimensions)) 100 | self.phi = np.zeros((self.number_kernels, self.time_steps, self.time_steps)) 101 | self.eta = np.zeros((self.time_steps, self.dimensions)) 102 | 103 | self.kernel_centers = np.linspace(0,self.time_steps,self.number_kernels) 104 | 105 | self.vector_phase = self.calc_vector_phase(self.kernel_centers) 106 | self.gaussian_kernels[:,0] = self.vector_phase 107 | 108 | # Different kernel parameters that have worked before, giving different behavior. 109 | # # dummy = (np.diff(self.gaussian_kernels[:,0]*0.55))**2 110 | # # dummy = (np.diff(self.gaussian_kernels[:,0]*2))**2 111 | # # dummy = (np.diff(self.gaussian_kernels[:,0]))**2 112 | 113 | dummy = (np.diff(self.gaussian_kernels[:,0]*self.kernel_bandwidth))**2 114 | self.gaussian_kernels[:,1] = 1. / np.append(dummy,dummy[-1]) 115 | 116 | # self.gaussian_kernels[:,1] = self.number_kernels/self.gaussian_kernels[:,0] 117 | 118 | def calc_phase(self,time): 119 | return np.exp(-self.alpha*float(time)/self.tau) 120 | 121 | def calc_vector_phase(self,time): 122 | return np.exp(-self.alpha*time.astype(float)/self.tau) 123 | 124 | def basis(self,index,time): 125 | return np.exp(-(self.gaussian_kernels[index,1])*((self.calc_phase(time)-self.gaussian_kernels[index,0])**2)) 126 | 127 | def time_basis(self, index, time): 128 | # return np.exp(-(self.gaussian_kernels[index,1])*((time-self.kernel_centers[index])**2)) 129 | # return np.exp(-(time-self.kernel_centers[index])**2) 130 | return np.exp(-((time-self.kernel_centers[index])**2)/(self.kernel_bandwidth)) 131 | 132 | def vector_basis(self, index, time_range): 133 | return np.exp(-(self.gaussian_kernels[index,1])*((self.calc_vector_phase(time_range)-self.gaussian_kernels[index,0])**2)) 134 | 135 | def update_target_force_itau(self): 136 | self.target_forces = (self.tau**2)*self.demo_acc - self.alphaz*(self.betaz*(self.demo_pos[self.time_steps-1]-self.demo_pos)-self.tau*self.demo_vel) 137 | 138 | def update_target_force_dtau(self): 139 | self.target_forces = self.demo_acc/(self.tau**2) - self.alphaz*(self.betaz*(self.demo_pos[self.time_steps-1]-self.demo_pos)-self.demo_vel/self.tau) 140 | 141 | def update_target_force(self): 142 | self.target_forces = self.demo_acc - self.alphaz*(self.betaz*(self.demo_pos[self.time_steps-1]-self.demo_pos)-self.demo_vel) 143 | 144 | def update_phi(self): 145 | for i in range(self.number_kernels): 146 | for t in range(self.time_steps): 147 | if self.use_time_basis: 148 | self.phi[i,t,t] = self.time_basis(i,t) 149 | else: 150 | self.phi[i,t,t] = self.basis(i,t) 151 | 152 | def update_eta(self): 153 | t_range = np.linspace(0,self.time_steps,self.time_steps) 154 | 
vector_phase = self.calc_vector_phase(t_range) 155 | 156 | for k in range(self.dimensions): 157 | self.eta[:,k] = vector_phase*(self.demo_pos[self.time_steps-1,k]-self.demo_pos[0,k]) 158 | 159 | def learn_DMP(self, pos, forces="i"): 160 | self.setup() 161 | self.load_trajectory(pos) 162 | self.initialize_variables() 163 | self.learn_weights(forces=forces) 164 | 165 | def learn_weights(self, forces="i"): 166 | 167 | if forces=="i": 168 | self.update_target_force_itau() 169 | elif forces=="d": 170 | self.update_target_force_dtau() 171 | elif forces=="n": 172 | self.update_target_force() 173 | self.update_phi() 174 | self.update_eta() 175 | 176 | for j in range(self.dimensions): 177 | for i in range(self.number_kernels): 178 | self.weights[i,j] = np.dot(self.eta[:,j],np.dot(self.phi[i],self.target_forces[:,j])) 179 | self.weights[i,j] /= np.dot(self.eta[:,j],np.dot(self.phi[i],self.eta[:,j])) + self.epsilon 180 | 181 | def initialize_rollout(self,start,goal,init_vel): 182 | 183 | self.pos_roll = np.zeros((self.rollout_time,self.dimensions)) 184 | self.vel_roll = np.zeros((self.rollout_time,self.dimensions)) 185 | self.acc_roll = np.zeros((self.rollout_time,self.dimensions)) 186 | 187 | self.tau = self.rollout_time 188 | self.pos_roll[0] = copy.deepcopy(start) 189 | self.vel_roll[0] = copy.deepcopy(init_vel) 190 | self.goal = goal 191 | self.start = start 192 | self.dt = self.tau/self.rollout_time 193 | # print(self.dt,self.tau,self.rollout_time) 194 | 195 | def calc_rollout_force(self, roll_time): 196 | den = 0 197 | time = copy.deepcopy(roll_time) 198 | for i in range(self.number_kernels): 199 | 200 | if self.use_time_basis: 201 | self.force_roll[roll_time] += self.time_basis(i,time)*self.weights[i] 202 | den += self.time_basis(i,time) 203 | else: 204 | self.force_roll[roll_time] += self.basis(i,time)*self.weights[i] 205 | den += self.basis(i,time) 206 | 207 | self.force_roll[roll_time] *= (self.goal-self.start)*self.calc_phase(time)/den 208 | 209 | def calc_rollout_acceleration(self,time): 210 | self.acc_roll[time] = (1./self.tau**2)*(self.alphaz * (self.betaz * (self.goal - self.pos_roll[time]) - self.tau*self.vel_roll[time]) + self.force_roll[time]) 211 | 212 | def calc_rollout_vel(self,time): 213 | self.vel_roll[time] = self.vel_roll[time-1] + self.acc_roll[time-1]*self.dt 214 | 215 | def calc_rollout_pos(self,time): 216 | self.pos_roll[time] = self.pos_roll[time-1] + self.vel_roll[time-1]*self.dt 217 | 218 | def rollout(self,start,goal,init_vel): 219 | self.initialize_rollout(start,goal,init_vel) 220 | self.calc_rollout_force(0) 221 | self.calc_rollout_acceleration(0) 222 | for i in range(1,self.rollout_time): 223 | self.calc_rollout_force(i) 224 | self.calc_rollout_vel(i) 225 | self.calc_rollout_pos(i) 226 | self.calc_rollout_acceleration(i) 227 | return self.pos_roll 228 | 229 | def load_weights(self, weight): 230 | self.weights = copy.deepcopy(weight) 231 | 232 | def main(args): 233 | 234 | pos = np.load(str(sys.argv[1]))[:,:3] 235 | vel = np.load(str(sys.argv[2]))[:,:3] 236 | acc = np.load(str(sys.argv[3]))[:,:3] 237 | 238 | rolltime = 500 239 | dmp = DMP(time_steps=rolltime, dimensions=3) 240 | 241 | dmp.load_trajectory(pos) 242 | dmp.initialize_variables() 243 | dmp.learn_weights() 244 | 245 | start = np.zeros(dmp.dimensions) 246 | goal = np.ones(dmp.dimensions) 247 | norm_vector = pos[-1]-pos[0] 248 | init_vel = np.divide(vel[0],norm_vector) 249 | 250 | dmp.rollout(start, goal, init_vel) 251 | np.save("dmp_rollout.npy", dmp.pos_roll) # Save the rollout positions (filename is illustrative). 252 | 253 | --------------------------------------------------------------------------------
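# A minimal usage sketch of the DMP class above (not part of the original repository): fit a DMP to a
# synthetic 7-DoF demonstration and roll it out between a new start and goal. The sine-wave demo and
# the import style are assumptions; it presumes the Experiments/ directory is on the path so that
# `headers` (and hence numpy/scipy/copy) and this module import cleanly.

import numpy as np
from DMP import DMP

# Synthetic demonstration: 200 timesteps of a 7-dimensional trajectory.
t = np.linspace(0., 1., 200)
demo_pos = np.stack([np.sin(np.pi * t * (i + 1)) for i in range(7)], axis=1)

# Fit: smooths and resamples the demo to time_steps points, then solves for the kernel weights.
dmp = DMP(time_steps=40, num_ker=15, dimensions=7, kernel_bandwidth=3.5, alphaz=5., time_basis=True)
dmp.learn_DMP(demo_pos)

# Roll out the learned primitive between a new start and goal (unit-normalized, as in main() above).
start = np.zeros(dmp.dimensions)
goal = np.ones(dmp.dimensions)
init_vel = np.zeros(dmp.dimensions)
rollout_positions = dmp.rollout(start, goal, init_vel)   # (rollout_time, dimensions) array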
/Experiments/DataLoaders.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | class GridWorldDataset(Dataset): 10 | 11 | # Class implementing instance of dataset class for gridworld data. 12 | 13 | def __init__(self, dataset_directory): 14 | self.dataset_directory = dataset_directory 15 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 16 | 17 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 18 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 19 | 20 | def __len__(self): 21 | 22 | # Find out how many images we've stored. 23 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 24 | 25 | # FOR NOW: USE ONLY till 3200 images. 26 | return 3200 27 | # return len(filelist) 28 | 29 | def parse_trajectory_actions(self, coordinate_trajectory): 30 | # Takes coordinate trajectory, returns action index taken. 31 | 32 | state_diffs = np.diff(coordinate_trajectory,axis=0) 33 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 34 | 35 | for i in range(len(state_diffs)): 36 | for k in range(len(self.action_map)): 37 | if (state_diffs[i]==self.action_map[k]).all(): 38 | action_sequence[i]=k 39 | 40 | return action_sequence.astype(float) 41 | 42 | def __getitem__(self, index): 43 | 44 | # The getitem function must return a Map-Trajectory pair. 45 | # We will handle per-timestep processes within our code. 46 | # Assumes index is within range [0,len(filelist)-1] 47 | image = cv2.imread(os.path.join(self.dataset_directory,"Image{0}.png".format(index))) 48 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Image{0}_Traj1.npy".format(index))).astype(float) 49 | 50 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 51 | 52 | return image, coordinate_trajectory, action_sequence 53 | 54 | class SmallMapsDataset(Dataset): 55 | 56 | # Class implementing instance of dataset class for gridworld data. 57 | 58 | def __init__(self, dataset_directory): 59 | self.dataset_directory = dataset_directory 60 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 61 | 62 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 63 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 64 | 65 | def __len__(self): 66 | 67 | # Find out how many images we've stored. 68 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 69 | return 4000 70 | # return len(filelist) 71 | 72 | def parse_trajectory_actions(self, coordinate_trajectory): 73 | # Takes coordinate trajectory, returns action index taken. 74 | 75 | state_diffs = np.diff(coordinate_trajectory,axis=0) 76 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 77 | 78 | for i in range(len(state_diffs)): 79 | for k in range(len(self.action_map)): 80 | if (state_diffs[i]==self.action_map[k]).all(): 81 | action_sequence[i]=k 82 | 83 | return action_sequence.astype(float) 84 | 85 | def __getitem__(self, index): 86 | 87 | # The getitem function must return a Map-Trajectory pair. 88 | # We will handle per-timestep processes within our code. 
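# Returns the map array, the coordinate trajectory (clipped to time_limit steps), and the action index
# sequence produced by parse_trajectory_actions above; e.g. a per-step state difference of [1, 1]
# corresponds to action index 7 (DOWNRIGHT) in self.action_map.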
89 | # Assumes index is within range [0,len(filelist)-1] 90 | image = np.load(os.path.join(self.dataset_directory,"Map{0}.npy".format(index))) 91 | time_limit = 20 92 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Map{0}_Traj1.npy".format(index))).astype(float)[:time_limit] 93 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 94 | 95 | return image, coordinate_trajectory, action_sequence 96 | 97 | class ToyDataset(Dataset): 98 | 99 | # Class implementing instance of dataset class for toy data. 100 | 101 | def __init__(self, dataset_directory): 102 | self.dataset_directory = dataset_directory 103 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 104 | 105 | self.x_path = os.path.join(self.dataset_directory,"X_array_actions.npy") 106 | self.a_path = os.path.join(self.dataset_directory,"A_array_actions.npy") 107 | 108 | self.X_array = np.load(self.x_path) 109 | self.A_array = np.load(self.a_path) 110 | 111 | def __len__(self): 112 | return 50000 113 | 114 | def __getitem__(self, index): 115 | 116 | # Return trajectory and action sequence. 117 | return self.X_array[index],self.A_array[index] 118 | 119 | class ContinuousToyDataset(Dataset): 120 | 121 | # Class implementing instance of dataset class for toy data. 122 | 123 | def __init__(self, dataset_directory): 124 | self.dataset_directory = dataset_directory 125 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 126 | 127 | self.x_path = os.path.join(self.dataset_directory,"X_array_continuous.npy") 128 | self.a_path = os.path.join(self.dataset_directory,"A_array_continuous.npy") 129 | self.y_path = os.path.join(self.dataset_directory,"Y_array_continuous.npy") 130 | self.b_path = os.path.join(self.dataset_directory,"B_array_continuous.npy") 131 | 132 | self.X_array = np.load(self.x_path) 133 | self.A_array = np.load(self.a_path) 134 | self.Y_array = np.load(self.y_path) 135 | self.B_array = np.load(self.b_path) 136 | 137 | def __len__(self): 138 | return 50000 139 | 140 | def __getitem__(self, index): 141 | 142 | # Return trajectory and action sequence. 143 | return self.X_array[index],self.A_array[index] 144 | 145 | def get_latent_variables(self, index): 146 | return self.B_array[index],self.Y_array[index] 147 | 148 | class ContinuousDirectedToyDataset(Dataset): 149 | 150 | # Class implementing instance of dataset class for toy data. 151 | 152 | def __init__(self, dataset_directory): 153 | self.dataset_directory = dataset_directory 154 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 155 | 156 | self.x_path = os.path.join(self.dataset_directory,"X_array_directed_continuous.npy") 157 | self.a_path = os.path.join(self.dataset_directory,"A_array_directed_continuous.npy") 158 | self.y_path = os.path.join(self.dataset_directory,"Y_array_directed_continuous.npy") 159 | self.b_path = os.path.join(self.dataset_directory,"B_array_directed_continuous.npy") 160 | 161 | self.X_array = np.load(self.x_path) 162 | self.A_array = np.load(self.a_path) 163 | self.Y_array = np.load(self.y_path) 164 | self.B_array = np.load(self.b_path) 165 | 166 | def __len__(self): 167 | return 50000 168 | 169 | def __getitem__(self, index): 170 | 171 | # Return trajectory and action sequence. 
172 | return self.X_array[index],self.A_array[index] 173 | 174 | def get_latent_variables(self, index): 175 | return self.B_array[index],self.Y_array[index] 176 | 177 | class ContinuousNonZeroToyDataset(Dataset): 178 | 179 | # Class implementing instance of dataset class for toy data. 180 | 181 | def __init__(self, dataset_directory): 182 | self.dataset_directory = dataset_directory 183 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 184 | 185 | self.x_path = os.path.join(self.dataset_directory,"X_array_continuous_nonzero.npy") 186 | self.a_path = os.path.join(self.dataset_directory,"A_array_continuous_nonzero.npy") 187 | self.y_path = os.path.join(self.dataset_directory,"Y_array_continuous_nonzero.npy") 188 | self.b_path = os.path.join(self.dataset_directory,"B_array_continuous_nonzero.npy") 189 | 190 | self.X_array = np.load(self.x_path) 191 | self.A_array = np.load(self.a_path) 192 | self.Y_array = np.load(self.y_path) 193 | self.B_array = np.load(self.b_path) 194 | 195 | def __len__(self): 196 | return 50000 197 | 198 | def __getitem__(self, index): 199 | 200 | # Return trajectory and action sequence. 201 | return self.X_array[index],self.A_array[index] 202 | 203 | def get_latent_variables(self, index): 204 | return self.B_array[index],self.Y_array[index] 205 | 206 | class ContinuousDirectedNonZeroToyDataset(Dataset): 207 | 208 | # Class implementing instance of dataset class for toy data. 209 | 210 | def __init__(self, dataset_directory): 211 | self.dataset_directory = dataset_directory 212 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 213 | 214 | self.x_path = os.path.join(self.dataset_directory,"X_dir_cont_nonzero.npy") 215 | self.a_path = os.path.join(self.dataset_directory,"A_dir_cont_nonzero.npy") 216 | self.y_path = os.path.join(self.dataset_directory,"Y_dir_cont_nonzero.npy") 217 | self.b_path = os.path.join(self.dataset_directory,"B_dir_cont_nonzero.npy") 218 | self.g_path = os.path.join(self.dataset_directory,"G_dir_cont_nonzero.npy") 219 | 220 | self.X_array = np.load(self.x_path) 221 | self.A_array = np.load(self.a_path) 222 | self.Y_array = np.load(self.y_path) 223 | self.B_array = np.load(self.b_path) 224 | self.G_array = np.load(self.g_path) 225 | 226 | def __len__(self): 227 | return 50000 228 | 229 | def __getitem__(self, index): 230 | 231 | # Return trajectory and action sequence. 232 | return self.X_array[index],self.A_array[index] 233 | 234 | def get_latent_variables(self, index): 235 | return self.B_array[index],self.Y_array[index] 236 | 237 | class GoalDirectedDataset(Dataset): 238 | 239 | # Class implementing instance of dataset class for toy data. 
240 | 241 | def __init__(self, dataset_directory): 242 | self.dataset_directory = dataset_directory 243 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 244 | 245 | self.x_path = os.path.join(self.dataset_directory,"X_goal_directed.npy") 246 | self.a_path = os.path.join(self.dataset_directory,"A_goal_directed.npy") 247 | self.y_path = os.path.join(self.dataset_directory,"Y_goal_directed.npy") 248 | self.b_path = os.path.join(self.dataset_directory,"B_goal_directed.npy") 249 | self.g_path = os.path.join(self.dataset_directory,"G_goal_directed.npy") 250 | 251 | self.X_array = np.load(self.x_path) 252 | self.A_array = np.load(self.a_path) 253 | self.Y_array = np.load(self.y_path) 254 | self.B_array = np.load(self.b_path) 255 | self.G_array = np.load(self.g_path) 256 | 257 | def __len__(self): 258 | return 50000 259 | 260 | def __getitem__(self, index): 261 | 262 | # Return trajectory and action sequence. 263 | return self.X_array[index],self.A_array[index] 264 | 265 | def get_latent_variables(self, index): 266 | return self.B_array[index],self.Y_array[index] 267 | 268 | def get_goal(self, index): 269 | return self.G_array[index] 270 | 271 | class DeterministicGoalDirectedDataset(Dataset): 272 | 273 | # Class implementing instance of dataset class for toy data. 274 | 275 | def __init__(self, dataset_directory): 276 | self.dataset_directory = dataset_directory 277 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 278 | 279 | self.x_path = os.path.join(self.dataset_directory,"X_deter_goal_directed.npy") 280 | self.a_path = os.path.join(self.dataset_directory,"A_deter_goal_directed.npy") 281 | self.y_path = os.path.join(self.dataset_directory,"Y_deter_goal_directed.npy") 282 | self.b_path = os.path.join(self.dataset_directory,"B_deter_goal_directed.npy") 283 | self.g_path = os.path.join(self.dataset_directory,"G_deter_goal_directed.npy") 284 | 285 | self.X_array = np.load(self.x_path) 286 | self.A_array = np.load(self.a_path) 287 | self.Y_array = np.load(self.y_path) 288 | self.B_array = np.load(self.b_path) 289 | self.G_array = np.load(self.g_path) 290 | 291 | self.goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 292 | 293 | def __len__(self): 294 | return 50000 295 | 296 | def __getitem__(self, index): 297 | 298 | # Return trajectory and action sequence. 299 | return self.X_array[index],self.A_array[index] 300 | 301 | def get_latent_variables(self, index): 302 | return self.B_array[index],self.Y_array[index] 303 | 304 | def get_goal(self, index): 305 | return self.G_array[index] 306 | 307 | def get_goal_position(self, index): 308 | return self.goal_states[self.G_array[index]] 309 | 310 | class SeparableDataset(Dataset): 311 | 312 | # Class implementing instance of dataset class for toy data. 
313 | 314 | def __init__(self, dataset_directory): 315 | self.dataset_directory = dataset_directory 316 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 317 | 318 | self.x_path = os.path.join(self.dataset_directory,"X_separable.npy") 319 | self.a_path = os.path.join(self.dataset_directory,"A_separable.npy") 320 | self.y_path = os.path.join(self.dataset_directory,"Y_separable.npy") 321 | self.b_path = os.path.join(self.dataset_directory,"B_separable.npy") 322 | self.g_path = os.path.join(self.dataset_directory,"G_separable.npy") 323 | self.s_path = os.path.join(self.dataset_directory,"StartConfig_separable.npy") 324 | 325 | self.X_array = np.load(self.x_path) 326 | self.A_array = np.load(self.a_path) 327 | self.Y_array = np.load(self.y_path) 328 | self.B_array = np.load(self.b_path) 329 | self.G_array = np.load(self.g_path) 330 | self.S_array = np.load(self.s_path) 331 | 332 | def __len__(self): 333 | return 50000 334 | 335 | def __getitem__(self, index): 336 | 337 | # Return trajectory and action sequence. 338 | return self.X_array[index],self.A_array[index] 339 | 340 | def get_latent_variables(self, index): 341 | return self.B_array[index],self.Y_array[index] 342 | 343 | def get_goal(self, index): 344 | return self.G_array[index] 345 | 346 | def get_startconfig(self, index): 347 | return self.S_array[index] -------------------------------------------------------------------------------- /Experiments/Eval_RLRewards.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | import numpy as np, glob, os 9 | from IPython import embed 10 | 11 | # Env list. 12 | environment_names = ["SawyerPickPlaceBread","SawyerPickPlaceCan","SawyerPickPlaceCereal","SawyerPickPlaceMilk","SawyerNutAssemblyRound","SawyerNutAssemblySquare"] 13 | 14 | # Evaluate baselineRL methods. 
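# Pattern for each block below: [a, b] is the inclusive range of run indices (folders named
# prefix + index, e.g. RL130 ... RL137), and `increment` is the epoch spacing between saved evaluations;
# for each run we find the latest saved model and gather its per-epoch Mean_Reward (or
# Mean_Trajectory_Distance) .npy files into reward_list / distance_list.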
15 | a = 86 16 | b = 86 17 | 18 | a = 130 19 | b = 137 20 | prefix = 'RL' 21 | increment = 100 22 | reward_list = [] 23 | 24 | for i in range(a,b+1): 25 | 26 | model_template = "RL{0}/saved_models/Model_epoch*".format(i) 27 | models = glob.glob(model_template) 28 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 29 | max_model = int(models[-1].lstrip("RL{0}/saved_models/Model_epoch".format(i))) 30 | 31 | model_range = np.arange(0,max_model+increment,increment) 32 | rewards = np.zeros((len(model_range))) 33 | 34 | for j in range(len(model_range)): 35 | rewards[j] = np.load("RL{0}/MEval/m{1}/Mean_Reward_RL{0}.npy".format(i,model_range[j])) 36 | 37 | reward_list.append(rewards) 38 | 39 | embed() 40 | # x = np.arange(0,260,20) 41 | # dists = np.zeros((6,len(x),100)) 42 | # a = 6 43 | # b = 12 44 | # for i in range(a,b): 45 | # for j in range(len(x)): 46 | # dists[i-a,j] = np.load("IL0{0}/MEval/m{1}/Total_Rewards_IL0{0}.npy".format(str(i).zfill(2),x[j])) 47 | 48 | 49 | # IL 50 | a = 18 51 | b = 23 52 | prefix = 'IL0' 53 | increment = 20 54 | reward_list = [] 55 | 56 | for i in range(a,b+1): 57 | 58 | model_template = "{0}{1}/saved_models/Model_epoch*".format(prefix,i) 59 | models = glob.glob(model_template) 60 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 61 | max_model = int(models[-1].lstrip("{0}{1}/saved_models/Model_epoch".format(prefix,i))) 62 | 63 | model_range = np.arange(0,max_model+increment,increment) 64 | rewards = np.zeros((len(model_range))) 65 | 66 | for j in range(len(model_range)): 67 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(i,model_range[j],prefix)) 68 | 69 | reward_list.append(rewards) 70 | 71 | # Get distances 72 | a = 30 73 | b = 37 74 | prefix = 'RJ' 75 | increment = 20 76 | distance_list = [] 77 | 78 | for i in range(a,b+1): 79 | 80 | model_template = "{0}{1}/saved_models/Model_epoch*".format(prefix,i) 81 | models = glob.glob(model_template) 82 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 83 | max_model = int(models[-1].lstrip("{0}{1}/saved_models/Model_epoch".format(prefix,i))) 84 | max_model = max_model-max_model%increment 85 | model_range = np.arange(0,max_model+increment,increment) 86 | distances = np.zeros((len(model_range))) 87 | 88 | for j in range(len(model_range)): 89 | distances[j] = np.load("{2}{0}/MEval/m{1}/Mean_Trajectory_Distance_{2}{0}.npy".format(i,model_range[j],prefix)) 90 | 91 | distance_list.append(distances) 92 | 93 | ################################################ 94 | # Env list. 95 | environment_names = ["SawyerPickPlaceBread","SawyerPickPlaceCan","SawyerPickPlaceCereal","SawyerPickPlaceMilk","SawyerNutAssemblyRound","SawyerNutAssemblySquare"] 96 | 97 | # Evaluate baselineRL methods. 
98 | a = 5 99 | b = 12 100 | prefix = 'downRL' 101 | increment = 20 102 | reward_list = [] 103 | 104 | for i in range(a,b+1): 105 | 106 | padded_index = str(i).zfill(3) 107 | 108 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 109 | models = glob.glob(model_template) 110 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 111 | max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 112 | max_model = max_model-max_model%increment 113 | model_range = np.arange(0,max_model+increment,increment) 114 | rewards = np.zeros((len(model_range))) 115 | 116 | for j in range(len(model_range)): 117 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 118 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 119 | reward_list.append(rewards) 120 | 121 | ############################################## 122 | # MOcap distances 123 | 124 | # Get distances 125 | a = 1 126 | b = 2 127 | prefix = 'Mocap00' 128 | increment = 20 129 | distance_list = [] 130 | 131 | for i in range(a,b+1): 132 | 133 | model_template = "{0}{1}/saved_models/Model_epoch*".format(prefix,i) 134 | models = glob.glob(model_template) 135 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 136 | max_model = int(models[-1].lstrip("{0}{1}/saved_models/Model_epoch".format(prefix,i))) 137 | max_model = max_model-max_model%increment 138 | model_range = np.arange(0,max_model+increment,increment) 139 | distances = np.zeros((len(model_range))) 140 | 141 | for j in range(len(model_range)): 142 | distances[j] = np.load("{2}{0}/MEval/m{1}/Mean_Trajectory_Distance_{2}{0}.npy".format(i,model_range[j],prefix)) 143 | 144 | distance_list.append(distances) 145 | 146 | ############################################## 147 | 148 | ################################################ 149 | # Env list. 150 | environment_names = ["SawyerPickPlaceBread","SawyerPickPlaceCan","SawyerPickPlaceCereal","SawyerPickPlaceMilk","SawyerNutAssemblyRound","SawyerNutAssemblySquare"] 151 | 152 | 153 | def remove_start(inputstring, word_to_remove): 154 | return inputstring[len(word_to_remove):] if inputstring.startswith(word_to_remove) else inputstring 155 | 156 | # Evaluate baselineRL methods. 
157 | a = 23 158 | b = 28 159 | 160 | 161 | prefix = 'downRL_pi' 162 | increment = 20 163 | reward_list = [] 164 | 165 | for i in range(a,b+1): 166 | 167 | padded_index = str(i).zfill(3) 168 | 169 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 170 | models = glob.glob(model_template) 171 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 172 | # max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 173 | max_model = int(remove_start(models[-1],"{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 174 | 175 | max_model = max_model-max_model%increment 176 | model_range = np.arange(0,max_model+increment,increment) 177 | rewards = np.zeros((len(model_range))) 178 | 179 | for j in range(len(model_range)-1): 180 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 181 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 182 | reward_list.append(rewards) 183 | 184 | for i in range(a,b+1): 185 | 186 | print("For environment: ", environment_names[i-a]) 187 | print("Average reward:", np.array(reward_list[i-a]).max()) 188 | 189 | def evalrl(a,b): 190 | 191 | prefix = 'downRL_pi' 192 | increment = 20 193 | reward_list = [] 194 | 195 | for i in range(a,b+1): 196 | 197 | padded_index = str(i).zfill(3) 198 | 199 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 200 | models = glob.glob(model_template) 201 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 202 | # max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 203 | max_model = int(remove_start(models[-1],"{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 204 | 205 | max_model = max_model-max_model%increment 206 | model_range = np.arange(0,max_model+increment,increment) 207 | rewards = np.zeros((len(model_range))) 208 | 209 | for j in range(len(model_range)-1): 210 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 211 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 212 | reward_list.append(rewards) 213 | 214 | for i in range(a,b+1): 215 | 216 | print("For environment: ", environment_names[i-a]) 217 | print("Average reward:", np.array(reward_list[i-a]).max()) 218 | 219 | def evalrl(a,b): 220 | 221 | prefix = 'RL' 222 | increment = 20 223 | reward_list = [] 224 | 225 | for i in range(a,b+1): 226 | 227 | padded_index = str(i).zfill(2) 228 | 229 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 230 | models = glob.glob(model_template) 231 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 232 | # max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 233 | max_model = int(remove_start(models[-1],"{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 234 | 235 | max_model = max_model-max_model%increment 236 | model_range = np.arange(0,max_model+increment,increment) 237 | rewards = np.zeros((len(model_range))) 238 | 239 | for j in range(len(model_range)-1): 240 | rewards[j] = 
np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 241 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 242 | reward_list.append(rewards) 243 | 244 | for i in range(a,b+1): 245 | 246 | print("For environment: ", environment_names[i-a]) 247 | print("Average reward:", np.array(reward_list[i-a]).max()) -------------------------------------------------------------------------------- /Experiments/MIME_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from headers import * 12 | import os.path as osp 13 | 14 | def select_baxter_angles(trajectory, joint_names, arm='right'): 15 | # joint names in order as used via mujoco visualizer 16 | baxter_joint_names = ['right_s0', 'right_s1', 'right_e0', 'right_e1', 'right_w0', 'right_w1', 'right_w2', 'left_s0', 'left_s1', 'left_e0', 'left_e1', 'left_w0', 'left_w1', 'left_w2'] 17 | if arm == 'right': 18 | select_joints = baxter_joint_names[:7] 19 | elif arm == 'left': 20 | select_joints = baxter_joint_names[7:] 21 | elif arm == 'both': 22 | select_joints = baxter_joint_names 23 | inds = [joint_names.index(j) for j in select_joints] 24 | return trajectory[:, inds] 25 | 26 | def resample(original_trajectory, desired_number_timepoints): 27 | original_traj_len = len(original_trajectory) 28 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 29 | return original_trajectory[new_timepoints] 30 | 31 | class MIME_Dataset(Dataset): 32 | ''' 33 | Class implementing instance of dataset class for MIME data. 34 | ''' 35 | def __init__(self, split='all'): 36 | self.dataset_directory = '/checkpoint/tanmayshankar/MIME/' 37 | self.ds_freq = 20 38 | 39 | # Default: /checkpoint/tanmayshankar/MIME/ 40 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 41 | self.filelist = glob.glob(self.fulltext) 42 | 43 | with open(self.filelist[0], 'r') as file: 44 | lines = file.readlines() 45 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 46 | 47 | if split == 'all': 48 | self.filelist = self.filelist 49 | else: 50 | self.task_lists = np.load(os.path.join( 51 | self.dataset_directory, 'MIME_jointangles/{}_Lists.npy'.format(split.capitalize()))) 52 | 53 | self.filelist = [] 54 | for i in range(20): 55 | self.filelist.extend(self.task_lists[i]) 56 | self.filelist = [f.replace('/checkpoint/tanmayshankar/MIME/', self.dataset_directory) for f in self.filelist] 57 | # print(len(self.filelist)) 58 | 59 | def __len__(self): 60 | # Return length of file list. 61 | return len(self.filelist) 62 | 63 | def __getitem__(self, index): 64 | ''' 65 | # Returns Joint Angles as: 66 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 
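# In the current implementation, the returned element is a dictionary with keys
# 'joint_angle_trajectory', 'left_trajectory', 'right_trajectory', 'left_gripper', 'right_gripper'
# (gripper values rescaled to [0, 1]), 'path_prefix', 'ra_trajectory', 'la_trajectory', and 'is_valid',
# with the trajectories resampled to (original length // ds_freq) timesteps.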
67 | # Assumes index is within range [0,len(filelist)-1] 68 | ''' 69 | file = self.filelist[index] 70 | 71 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 72 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 73 | 74 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 75 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 76 | 77 | joint_angle_trajectory = [] 78 | # Open file. 79 | with open(file, 'r') as file: 80 | lines = file.readlines() 81 | for line in lines: 82 | dict_element = eval(line.rstrip('\n')) 83 | if len(dict_element.keys()) == len(self.joint_names): 84 | # some files have extra lines with gripper keys e.g. MIME_jointangles/4/12405Nov19/joint_angles.txt 85 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 86 | joint_angle_trajectory.append(array_element) 87 | 88 | joint_angle_trajectory = np.array(joint_angle_trajectory) 89 | 90 | n_samples = len(orig_left_traj) // self.ds_freq 91 | 92 | elem = {} 93 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 94 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 95 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 96 | elem['left_gripper'] = resample(left_gripper, n_samples)/100 97 | elem['right_gripper'] = resample(right_gripper, n_samples)/100 98 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 99 | elem['ra_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='right') 100 | elem['la_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='left') 101 | # If max norm of differences is <1.0, valid. 102 | 103 | # if elem['joint_angle_trajectory'].shape[0]>1: 104 | elem['is_valid'] = int(np.linalg.norm(np.diff(elem['joint_angle_trajectory'],axis=0),axis=1).max() < 1.0) 105 | 106 | return elem 107 | 108 | def recreate_dictionary(self, arm, joint_angles): 109 | if arm=="left": 110 | offset = 2 111 | width = 7 112 | elif arm=="right": 113 | offset = 9 114 | width = 7 115 | elif arm=="full": 116 | offset = 0 117 | width = len(self.joint_names) 118 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 119 | 120 | class MIME_NewDataset(Dataset): 121 | 122 | def __init__(self, split='all'): 123 | self.dataset_directory = '/checkpoint/tanmayshankar/MIME/' 124 | 125 | # Load the entire set of trajectories. 126 | self.data_list = np.load(os.path.join(self.dataset_directory, "Data_List.npy"),allow_pickle=True) 127 | 128 | self.dataset_length = len(self.data_list) 129 | 130 | def __len__(self): 131 | # Return length of file list. 132 | return self.dataset_length 133 | 134 | def __getitem__(self, index): 135 | # Return n'th item of dataset. 136 | # This has already processed everything. 137 | 138 | return self.data_list[index] 139 | 140 | def compute_statistics(self): 141 | 142 | self.state_size = 16 143 | self.total_length = self.__len__() 144 | mean = np.zeros((self.state_size)) 145 | variance = np.zeros((self.state_size)) 146 | mins = np.zeros((self.total_length, self.state_size)) 147 | maxs = np.zeros((self.total_length, self.state_size)) 148 | lens = np.zeros((self.total_length)) 149 | 150 | # And velocity statistics. 
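# (Computed the same way as the position statistics above, but over the per-step differences np.diff(demo, axis=0).)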
151 | vel_mean = np.zeros((self.state_size)) 152 | vel_variance = np.zeros((self.state_size)) 153 | vel_mins = np.zeros((self.total_length, self.state_size)) 154 | vel_maxs = np.zeros((self.total_length, self.state_size)) 155 | 156 | 157 | for i in range(self.total_length): 158 | 159 | print("Phase 1: DP: ",i) 160 | data_element = self.__getitem__(i) 161 | 162 | if data_element['is_valid']: 163 | demo = data_element['demo'] 164 | vel = np.diff(demo,axis=0) 165 | mins[i] = demo.min(axis=0) 166 | maxs[i] = demo.max(axis=0) 167 | mean += demo.sum(axis=0) 168 | lens[i] = demo.shape[0] 169 | 170 | vel_mins[i] = abs(vel).min(axis=0) 171 | vel_maxs[i] = abs(vel).max(axis=0) 172 | vel_mean += vel.sum(axis=0) 173 | 174 | mean /= lens.sum() 175 | vel_mean /= lens.sum() 176 | 177 | for i in range(self.total_length): 178 | 179 | print("Phase 2: DP: ",i) 180 | data_element = self.__getitem__(i) 181 | 182 | # Just need to normalize the demonstration. Not the rest. 183 | if data_element['is_valid']: 184 | demo = data_element['demo'] 185 | vel = np.diff(demo,axis=0) 186 | variance += ((demo-mean)**2).sum(axis=0) 187 | vel_variance += ((vel-vel_mean)**2).sum(axis=0) 188 | 189 | variance /= lens.sum() 190 | variance = np.sqrt(variance) 191 | 192 | vel_variance /= lens.sum() 193 | vel_variance = np.sqrt(vel_variance) 194 | 195 | max_value = maxs.max(axis=0) 196 | min_value = mins.min(axis=0) 197 | 198 | vel_max_value = vel_maxs.max(axis=0) 199 | vel_min_value = vel_mins.min(axis=0) 200 | 201 | np.save("MIME_Orig_Mean.npy", mean) 202 | np.save("MIME_Orig_Var.npy", variance) 203 | np.save("MIME_Orig_Min.npy", min_value) 204 | np.save("MIME_Orig_Max.npy", max_value) 205 | np.save("MIME_Orig_Vel_Mean.npy", vel_mean) 206 | np.save("MIME_Orig_Vel_Var.npy", vel_variance) 207 | np.save("MIME_Orig_Vel_Min.npy", vel_min_value) 208 | np.save("MIME_Orig_Vel_Max.npy", vel_max_value) 209 | 210 | class MIME_Dataloader_Tester(unittest.TestCase): 211 | 212 | def test_MIMEdataloader(self): 213 | 214 | self.dataset = MIME_NewDataset() 215 | 216 | # Check the first index of the dataset. 217 | data_element = self.dataset[0] 218 | 219 | validity = data_element['is_valid']==1 220 | check_demo_data = (data_element['demo']==np.load("Test_Data/MIME_Dataloader_DE.npy")).all() 221 | 222 | self.assertTrue(validity and check_demo_data) 223 | 224 | if __name__ == '__main__': 225 | # Run all tests defined for the dataloader. 226 | unittest.main() -------------------------------------------------------------------------------- /Experiments/Master.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | import DataLoaders, MIME_DataLoader, Roboturk_DataLoader, Mocap_DataLoader 9 | from PolicyManagers import * 10 | import TestClass 11 | 12 | def return_dataset(args, data=None): 13 | 14 | # The data parameter overrides the data in args.data. 15 | # This is so that we can call return_dataset with source and target data for transfer setting. 16 | if data is not None: 17 | args.data = data 18 | 19 | # Define Data Loader. 
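# Note: there is no fallback branch, so if args.data matches none of the options below, `dataset`
# is never assigned and the return statement raises an UnboundLocalError.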
20 | if args.data=='Continuous': 21 | dataset = DataLoaders.ContinuousToyDataset(args.datadir) 22 | elif args.data=='ContinuousNonZero': 23 | dataset = DataLoaders.ContinuousNonZeroToyDataset(args.datadir) 24 | elif args.data=='DeterGoal': 25 | dataset = DataLoaders.DeterministicGoalDirectedDataset(args.datadir) 26 | elif args.data=='MIME': 27 | dataset = MIME_DataLoader.MIME_NewDataset() 28 | elif args.data=='Roboturk': 29 | dataset = Roboturk_DataLoader.Roboturk_NewSegmentedDataset(args) 30 | elif args.data=='OrigRoboturk': 31 | dataset = Roboturk_DataLoader.Roboturk_Dataset(args) 32 | elif args.data=='FullRoboturk': 33 | dataset = Roboturk_DataLoader.Roboturk_FullDataset(args) 34 | elif args.data=='Mocap': 35 | dataset = Mocap_DataLoader.Mocap_Dataset(args) 36 | 37 | return dataset 38 | 39 | class Master(): 40 | 41 | def __init__(self, arguments): 42 | self.args = arguments 43 | 44 | self.dataset = return_dataset(self.args) 45 | 46 | # Now define policy manager. 47 | if self.args.setting=='learntsub': 48 | self.policy_manager = PolicyManager_Joint(self.args.number_policies, self.dataset, self.args) 49 | elif self.args.setting=='pretrain_sub': 50 | self.policy_manager = PolicyManager_Pretrain(self.args.number_policies, self.dataset, self.args) 51 | elif self.args.setting=='baselineRL': 52 | self.policy_manager = PolicyManager_BaselineRL(args=self.args) 53 | elif self.args.setting=='downstreamRL': 54 | self.policy_manager = PolicyManager_DownstreamRL(args=self.args) 55 | elif self.args.setting=='DMP': 56 | self.policy_manager = PolicyManager_DMPBaselines(self.args.number_policies, self.dataset, self.args) 57 | elif self.args.setting=='imitation': 58 | self.policy_manager = PolicyManager_Imitation(self.args.number_policies, self.dataset, self.args) 59 | elif self.args.setting=='transfer' or self.args.setting=='cycle_transfer': 60 | source_dataset = return_dataset(self.args, data=self.args.source_domain) 61 | target_dataset = return_dataset(self.args, data=self.args.target_domain) 62 | 63 | if self.args.setting=='transfer': 64 | self.policy_manager = PolicyManager_Transfer(args=self.args, source_dataset=source_dataset, target_dataset=target_dataset) 65 | elif self.args.setting=='cycle_transfer': 66 | self.policy_manager = PolicyManager_CycleConsistencyTransfer(args=self.args, source_dataset=source_dataset, target_dataset=target_dataset) 67 | 68 | if self.args.debug: 69 | embed() 70 | 71 | # Create networks and training operations. 
72 | self.policy_manager.setup() 73 | 74 | def run(self): 75 | if self.args.setting=='pretrain_sub' or self.args.setting=='pretrain_prior' or \ 76 | self.args.setting=='imitation' or self.args.setting=='baselineRL' or self.args.setting=='downstreamRL' or \ 77 | self.args.setting=='transfer' or self.args.setting=='cycle_transfer': 78 | if self.args.train: 79 | if self.args.model: 80 | self.policy_manager.train(self.args.model) 81 | else: 82 | self.policy_manager.train() 83 | else: 84 | if self.args.setting=='pretrain_prior': 85 | self.policy_manager.train(self.args.model) 86 | else: 87 | self.policy_manager.evaluate(model=self.args.model) 88 | 89 | elif self.args.setting=='learntsub': 90 | if self.args.train: 91 | if self.args.model: 92 | self.policy_manager.train(self.args.model) 93 | else: 94 | if self.args.subpolicy_model: 95 | print("Just loading subpolicies.") 96 | self.policy_manager.load_all_models(self.args.subpolicy_model, just_subpolicy=True) 97 | self.policy_manager.train() 98 | else: 99 | # self.policy_manager.train(self.args.model) 100 | self.policy_manager.evaluate(self.args.model) 101 | 102 | # elif self.args.setting=='baselineRL' or self.args.setting=='downstreamRL': 103 | # if self.args.train: 104 | # if self.args.model: 105 | # self.policy_manager.train(self.args.model) 106 | # else: 107 | # self.policy_manager.train() 108 | 109 | elif self.args.setting=='DMP': 110 | self.policy_manager.evaluate_across_testset() 111 | 112 | def test(self): 113 | if self.args.test_code: 114 | loader = TestClass.TestLoaderWithKwargs() 115 | suite = loader.loadTestsFromTestCase(TestClass.MetaTestClass, policy_manager=self.policy_manager) 116 | unittest.TextTestRunner().run(suite) 117 | 118 | def parse_arguments(): 119 | parser = argparse.ArgumentParser(description='Learning Skills from Demonstrations') 120 | 121 | # Setup training. 122 | parser.add_argument('--datadir', dest='datadir',type=str,default='../Data/ContData/') 123 | parser.add_argument('--train',dest='train',type=int,default=0) 124 | parser.add_argument('--debug',dest='debug',type=int,default=0) 125 | parser.add_argument('--notes',dest='notes',type=str) 126 | parser.add_argument('--name',dest='name',type=str,default=None) 127 | parser.add_argument('--fake_batch_size',dest='fake_batch_size',type=int,default=1) 128 | parser.add_argument('--batch_size',dest='batch_size',type=int,default=1) 129 | parser.add_argument('--training_phase_size',dest='training_phase_size',type=int,default=500000) 130 | parser.add_argument('--initial_counter_value',dest='initial_counter_value',type=int,default=0) 131 | parser.add_argument('--data',dest='data',type=str,default='Continuous') 132 | parser.add_argument('--setting',dest='setting',type=str,default='gtsub') 133 | parser.add_argument('--test_code',dest='test_code',type=int,default=0) 134 | parser.add_argument('--model',dest='model',type=str) 135 | parser.add_argument('--logdir',dest='logdir',type=str,default='Experiment_Logs/') 136 | parser.add_argument('--epochs',dest='epochs',type=int,default=500) # Number of epochs to train for. Reduce for Mocap. 137 | 138 | # Training setting. 
139 | parser.add_argument('--discrete_z',dest='discrete_z',type=int,default=0) 140 | # parser.add_argument('--transformer',dest='transformer',type=int,default=0) 141 | parser.add_argument('--z_dimensions',dest='z_dimensions',type=int,default=64) 142 | parser.add_argument('--number_layers',dest='number_layers',type=int,default=5) 143 | parser.add_argument('--hidden_size',dest='hidden_size',type=int,default=64) 144 | parser.add_argument('--environment',dest='environment',type=str,default='SawyerLift') # Defines robosuite environment for RL. 145 | 146 | # Data parameters. 147 | parser.add_argument('--traj_segments',dest='traj_segments',type=int,default=1) # Defines whether to use trajectory segments for pretraining or entire trajectories. Useful for baseline implementation. 148 | parser.add_argument('--gripper',dest='gripper',type=int,default=1) # Whether to use gripper training in roboturk. 149 | parser.add_argument('--ds_freq',dest='ds_freq',type=int,default=1) # Additional downsample frequency. 150 | parser.add_argument('--condition_size',dest='condition_size',type=int,default=4) 151 | parser.add_argument('--smoothen', dest='smoothen',type=int,default=0) # Whether to smoothen the original dataset. 152 | parser.add_argument('--smoothing_kernel_bandwidth', dest='smoothing_kernel_bandwidth',type=float,default=3.5) # The smoothing bandwidth that is applied to data loader trajectories. 153 | 154 | parser.add_argument('--new_gradient',dest='new_gradient',type=int,default=1) 155 | parser.add_argument('--b_prior',dest='b_prior',type=int,default=1) 156 | parser.add_argument('--constrained_b_prior',dest='constrained_b_prior',type=int,default=1) # Whether to use constrained b prior var network or just normal b prior one. 157 | parser.add_argument('--reparam',dest='reparam',type=int,default=1) 158 | parser.add_argument('--number_policies',dest='number_policies',type=int,default=4) 159 | parser.add_argument('--fix_subpolicy',dest='fix_subpolicy',type=int,default=1) 160 | parser.add_argument('--train_only_policy',dest='train_only_policy',type=int,default=0) # Train only the policy network and use a pretrained encoder. This is weird but whatever. 161 | parser.add_argument('--load_latent',dest='load_latent',type=int,default=1) # Whether to load latent policy from model or not. 162 | parser.add_argument('--subpolicy_model',dest='subpolicy_model',type=str) 163 | parser.add_argument('--traj_length',dest='traj_length',type=int,default=10) 164 | parser.add_argument('--skill_length',dest='skill_length',type=int,default=5) 165 | parser.add_argument('--var_skill_length',dest='var_skill_length',type=int,default=0) 166 | parser.add_argument('--display_freq',dest='display_freq',type=int,default=10000) 167 | parser.add_argument('--save_freq',dest='save_freq',type=int,default=1) 168 | parser.add_argument('--eval_freq',dest='eval_freq',type=int,default=20) 169 | parser.add_argument('--perplexity',dest='perplexity',type=float,default=30,help='Value of perplexity fed to TSNE.') 170 | 171 | parser.add_argument('--entropy',dest='entropy',type=int,default=0) 172 | parser.add_argument('--var_entropy',dest='var_entropy',type=int,default=0) 173 | parser.add_argument('--ent_weight',dest='ent_weight',type=float,default=0.) 174 | parser.add_argument('--var_ent_weight',dest='var_ent_weight',type=float,default=2.) 175 | 176 | parser.add_argument('--pretrain_bias_sampling',type=float,default=0.) # Defines percentage of trajectory within which to sample trajectory segments for pretraining. 
177 | parser.add_argument('--pretrain_bias_sampling_prob',type=float,default=0.) 178 | parser.add_argument('--action_scale_factor',type=float,default=1) 179 | 180 | parser.add_argument('--z_exploration_bias',dest='z_exploration_bias',type=float,default=0.) 181 | parser.add_argument('--b_exploration_bias',dest='b_exploration_bias',type=float,default=0.) 182 | parser.add_argument('--lat_z_wt',dest='lat_z_wt',type=float,default=0.1) 183 | parser.add_argument('--lat_b_wt',dest='lat_b_wt',type=float,default=1.) 184 | parser.add_argument('--z_probability_factor',dest='z_probability_factor',type=float,default=0.1) 185 | parser.add_argument('--b_probability_factor',dest='b_probability_factor',type=float,default=0.1) 186 | parser.add_argument('--subpolicy_clamp_value',dest='subpolicy_clamp_value',type=float,default=-5) 187 | parser.add_argument('--latent_clamp_value',dest='latent_clamp_value',type=float,default=-5) 188 | parser.add_argument('--min_variance_bias',dest='min_variance_bias',type=float,default=0.01) 189 | parser.add_argument('--normalization',dest='normalization',type=str,default='None') 190 | 191 | parser.add_argument('--likelihood_penalty',dest='likelihood_penalty',type=int,default=10) 192 | parser.add_argument('--subpolicy_ratio',dest='subpolicy_ratio',type=float,default=0.01) 193 | parser.add_argument('--latentpolicy_ratio',dest='latentpolicy_ratio',type=float,default=0.1) 194 | parser.add_argument('--temporal_latentpolicy_ratio',dest='temporal_latentpolicy_ratio',type=float,default=0.) 195 | parser.add_argument('--latent_loss_weight',dest='latent_loss_weight',type=float,default=0.1) 196 | parser.add_argument('--kl_weight',dest='kl_weight',type=float,default=0.01) 197 | parser.add_argument('--var_loss_weight',dest='var_loss_weight',type=float,default=1.) 198 | parser.add_argument('--prior_weight',dest='prior_weight',type=float,default=0.00001) 199 | 200 | # Cross Domain Skill Transfer parameters. 201 | parser.add_argument('--discriminability_weight',dest='discriminability_weight',type=float,default=1.,help='Weight of discriminability loss in cross domain skill transfer.') 202 | parser.add_argument('--vae_loss_weight',dest='vae_loss_weight',type=float,default=1.,help='Weight of VAE loss in cross domain skill transfer.') 203 | parser.add_argument('--alternating_phase_size',dest='alternating_phase_size',type=int,default=2000, help='Size of alternating training phases.') 204 | parser.add_argument('--discriminator_phase_size',dest='discriminator_phase_size',type=int,default=2,help='Factor by which to train discriminator more than generator.') 205 | parser.add_argument('--cycle_reconstruction_loss_weight',dest='cycle_reconstruction_loss_weight',type=float,default=1.,help='Weight of the cycle-consistency reconstruction loss term.') 206 | 207 | # Exploration and learning rate parameters. 208 | parser.add_argument('--epsilon_from',dest='epsilon_from',type=float,default=0.3) 209 | parser.add_argument('--epsilon_to',dest='epsilon_to',type=float,default=0.05) 210 | parser.add_argument('--epsilon_over',dest='epsilon_over',type=int,default=30) 211 | parser.add_argument('--learning_rate',dest='learning_rate',type=float,default=1e-4) 212 | 213 | # Baseline parameters. 
214 | parser.add_argument('--baseline_kernels',dest='baseline_kernels',type=int,default=15) 215 | parser.add_argument('--baseline_window',dest='baseline_window',type=int,default=15) 216 | parser.add_argument('--baseline_kernel_bandwidth',dest='baseline_kernel_bandwidth',type=float,default=3.5) 217 | 218 | # Reinforcement Learning parameters. 219 | parser.add_argument('--TD',dest='TD',type=int,default=0) # Whether or not to use Temporal difference while training the critic network. 220 | parser.add_argument('--OU',dest='OU',type=int,default=1) # Whether or not to use the Ornstein Uhlenbeck noise process while training. 221 | parser.add_argument('--OU_max_sigma',dest='OU_max_sigma',type=float,default=0.2) # Max Sigma value of the Ornstein Uhlenbeck noise process. 222 | parser.add_argument('--OU_min_sigma',dest='OU_min_sigma',type=float,default=0.2) # Min Sigma value of the Ornstein Uhlenbeck noise process. 223 | parser.add_argument('--MLP_policy',dest='MLP_policy',type=int,default=0) # Whether or not to use MLP policy. 224 | parser.add_argument('--mean_nonlinearity',dest='mean_nonlinearity',type=int,default=0) # Whether or not to use Tanh activation. 225 | parser.add_argument('--burn_in_eps',dest='burn_in_eps',type=int,default=500) # How many episodes to burn in. 226 | parser.add_argument('--random_memory_burn_in',dest='random_memory_burn_in',type=int,default=1) # Whether to burn in episodes into memory randomly or not. 227 | parser.add_argument('--shaped_reward',dest='shaped_reward',type=int,default=0) # Whether or not to use shaped rewards. 228 | parser.add_argument('--memory_size',dest='memory_size',type=int,default=2000) # Size of replay memory. 2000 is okay, but is still kind of short-sighted. 229 | 230 | # Transfer learning domains, etc. 231 | parser.add_argument('--source_domain',dest='source_domain',type=str,help='What the source domain is in transfer.') 232 | parser.add_argument('--target_domain',dest='target_domain',type=str,help='What the target domain is in transfer.') 233 | 234 | return parser.parse_args() 235 | 236 | def main(args): 237 | 238 | args = parse_arguments() 239 | master = Master(args) 240 | 241 | if args.test_code: 242 | master.test() 243 | else: 244 | master.run() 245 | 246 | if __name__=='__main__': 247 | main(sys.argv) 248 | 249 | -------------------------------------------------------------------------------- /Experiments/MocapVisualizationExample.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree.
6 | 7 | 8 | import MocapVisualizationUtils 9 | import threading, time, numpy as np 10 | 11 | # bvh_filename = "/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments/01_01_poses.bvh" 12 | bvh_filename = "/private/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments/01_01_poses.bvh" 13 | filenames = [bvh_filename] 14 | file_num = 0 15 | 16 | print("About to run viewer.") 17 | 18 | cam_cur = MocapVisualizationUtils.camera.Camera(pos=np.array([6.0, 0.0, 2.0]), 19 | origin=np.array([0.0, 0.0, 0.0]), 20 | vup=np.array([0.0, 0.0, 1.0]), 21 | fov=45.0) 22 | 23 | def run_thread(): 24 | MocapVisualizationUtils.viewer.run( 25 | title='BVH viewer', 26 | cam=cam_cur, 27 | size=(1280, 720), 28 | keyboard_callback=None, 29 | render_callback=MocapVisualizationUtils.render_callback_time_independent, 30 | idle_callback=MocapVisualizationUtils.idle_callback, 31 | ) 32 | 33 | def run_thread(): 34 | MocapVisualizationUtils.viewer.run( 35 | title='BVH viewer', 36 | cam=cam_cur, 37 | size=(1280, 720), 38 | keyboard_callback=None, 39 | render_callback=MocapVisualizationUtils.render_callback_time_independent, 40 | idle_callback=MocapVisualizationUtils.idle_callback_return, 41 | ) 42 | 43 | 44 | # Run init before loading animation. 45 | MocapVisualizationUtils.init() 46 | MocapVisualizationUtils.global_positions, MocapVisualizationUtils.joint_parents, MocapVisualizationUtils.time_per_frame = MocapVisualizationUtils.load_animation(filenames[file_num]) 47 | thread = threading.Thread(target=run_thread) 48 | thread.start() 49 | 50 | print("Going to actually call callback now.") 51 | MocapVisualizationUtils.whether_to_render = True 52 | 53 | x_count = 0 54 | while MocapVisualizationUtils.done_with_render==False and MocapVisualizationUtils.whether_to_render==True: 55 | x_count += 1 56 | time.sleep(1) 57 | print("x_count is now: ",x_count) 58 | 59 | print("We finished with the visualization!") 60 | -------------------------------------------------------------------------------- /Experiments/MocapVisualizationUtils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from mocap_processing.motion.pfnn import Animation, BVH 8 | from basecode.render import glut_viewer as viewer 9 | from basecode.render import gl_render, camera 10 | from basecode.utils import basics 11 | from basecode.math import mmMath 12 | 13 | import numpy as np, imageio 14 | 15 | from OpenGL.GL import * 16 | from OpenGL.GLU import * 17 | from OpenGL.GLUT import * 18 | 19 | import time, threading 20 | from IPython import embed 21 | 22 | global whether_to_render 23 | whether_to_render = False 24 | 25 | def init(): 26 | global whether_to_render, global_positions, counter, joint_parents, done_with_render, save_path, name_prefix, image_list 27 | whether_to_render = False 28 | done_with_render = False 29 | global_positions = None 30 | joint_parents = None 31 | save_path = "/private/home/tanmayshankar/Research/Code/" 32 | name_prefix = "Viz_Image" 33 | image_list = [] 34 | counter = 0 35 | 36 | # Define function to load animation file. 
37 | def load_animation(bvh_filename): 38 | animation, joint_names, time_per_frame = BVH.load(bvh_filename) 39 | joint_parents = animation.parents 40 | global_positions = Animation.positions_global(animation) 41 | return global_positions, joint_parents, time_per_frame 42 | 43 | # Function that draws body of animated character from the global positions. 44 | def render_pose_by_capsule(global_positions, frame_num, joint_parents, scale=1.0, color=[0.5, 0.5, 0.5, 1], radius=0.05): 45 | glPushMatrix() 46 | glScalef(scale, scale, scale) 47 | 48 | for i in range(len(joint_parents)): 49 | pos = global_positions[frame_num][i] 50 | # gl_render.render_point(pos, radius=radius, color=color) 51 | j = joint_parents[i] 52 | if j!=-1: 53 | pos_parent = global_positions[frame_num][j] 54 | p = 0.5 * (pos_parent + pos) 55 | l = np.linalg.norm(pos_parent-pos) 56 | R = mmMath.getSO3FromVectors(np.array([0, 0, 1]), pos_parent-pos) 57 | gl_render.render_capsule(mmMath.Rp2T(R,p), l, radius, color=color, slice=16) 58 | glPopMatrix() 59 | 60 | # Callback that renders one pose. 61 | def render_callback_time_independent(): 62 | global global_positions, joint_parents, counter 63 | 64 | if counter=global_positions.shape[0]: 105 | if counter>=10: 106 | whether_to_render = False 107 | done_with_render = True 108 | 109 | # If whether to render is false, reset the counter. 110 | else: 111 | counter = 0 112 | 113 | def idle_callback_return(): 114 | # # Increment counter 115 | # # Set frame number of trajectory to be rendered 116 | # # Using the time independent rendering. 117 | # # Call drawGL and savescreen. 118 | # # Since this is an idle callback, drawGL won't call itself (only calls render callback). 119 | 120 | global whether_to_render, counter, global_positions, done_with_render, save_path, name_prefix, image_list 121 | done_with_render = False 122 | 123 | if whether_to_render and counter=global_positions.shape[0]: 138 | # if counter>=10: 139 | whether_to_render = False 140 | done_with_render = True 141 | 142 | # If whether to render is false, reset the counter. 143 | else: 144 | counter = 0 -------------------------------------------------------------------------------- /Experiments/Mocap_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from headers import * 12 | 13 | def resample(original_trajectory, desired_number_timepoints): 14 | original_traj_len = len(original_trajectory) 15 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 16 | return original_trajectory[new_timepoints] 17 | 18 | class Mocap_Dataset(Dataset): 19 | 20 | def __init__(self, args, split='all'): 21 | self.dataset_directory = '/checkpoint/tanmayshankar/Mocap/' 22 | self.args = args 23 | # Load the entire set of trajectories. 24 | self.data_list = np.load(os.path.join(self.dataset_directory, "Demo_Array.npy"),allow_pickle=True) 25 | self.dataset_length = len(self.data_list) 26 | self.ds_freq = self.args.ds_freq 27 | 28 | def __len__(self): 29 | # Return length of file list. 
30 | return self.dataset_length 31 | 32 | def process_item(self, item): 33 | resample_length = len(item['global_positions']) // self.ds_freq 34 | 35 | if resample_length<5: 36 | item['is_valid'] = False 37 | else: 38 | item['is_valid'] = True 39 | item['global_positions'] = resample(item['global_positions'], resample_length) 40 | demo = resample(item['local_positions'], resample_length) 41 | item['local_positions'] = demo 42 | item['local_rotations'] = resample(item['local_rotations'], resample_length) 43 | item['animation'] = resample(item['animation'], resample_length) 44 | 45 | # Replicate as demo for downstream dataloading. # Reshape to TxNumber of dimensions. 46 | item['demo'] = demo.reshape((demo.shape[0],-1)) 47 | 48 | return item 49 | 50 | def __getitem__(self, index): 51 | # Return n'th item of dataset. 52 | # This has already processed everything. 53 | 54 | # Remember, the global and local positions are all stored as Number_Frames x Number_Joints x 3 array. 55 | # Change this to # Number_Frames x Number_Dimensions...? But the dimensions are not independent.. so what do we do? 56 | 57 | return self.process_item(copy.deepcopy(self.data_list[index])) 58 | 59 | def compute_statistics(self): 60 | embed() -------------------------------------------------------------------------------- /Experiments/Processing_MocapData.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import mocap_processing, glob, numpy as np, os 8 | from mocap_processing.motion.pfnn import Animation, BVH 9 | from mocap_processing.motion.pfnn import Animation, BVH 10 | from IPython import embed 11 | 12 | # Define function that loads global and local positions, and the rotations from a datafile. 13 | def load_animation_data(bvh_filename): 14 | animation, joint_names, time_per_frame = BVH.load(bvh_filename) 15 | global_positions = Animation.positions_global(animation) 16 | # return global_positions, joint_parents, time_per_frame 17 | return global_positions, animation.positions, animation.rotations, animation 18 | 19 | # Set directory. 20 | directory = "/checkpoint/dgopinath/amass/CMU" 21 | save_directory = "/checkpoint/tanmayshankar/Mocap" 22 | # Get file list. 23 | filelist = glob.glob(os.path.join(directory,"*/*.bvh")) 24 | 25 | demo_list = [] 26 | 27 | print("Starting to preprocess data.") 28 | 29 | for i in range(len(filelist)): 30 | 31 | print("Processing file number: ",i, " of ",len(filelist)) 32 | # Get filename. 33 | filename = os.path.join(directory, filelist[i]) 34 | # Actually load file. 35 | global_positions, local_positions, local_rotations, animation = load_animation_data(filename) 36 | 37 | # Create data element object. 38 | data_element = {} 39 | data_element['global_positions'] = global_positions 40 | data_element['local_positions'] = local_positions 41 | # Get quaternion as array. 42 | data_element['local_rotations'] = local_rotations.qs 43 | data_element['animation'] = animation 44 | 45 | demo_list.append(data_element) 46 | 47 | demo_array = np.array(demo_list) 48 | np.save(os.path.join(save_directory,"Demo_Array.npy"),demo_array) -------------------------------------------------------------------------------- /Experiments/RLUtils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc.
and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | def resample(original_trajectory, desired_number_timepoints): 10 | original_traj_len = len(original_trajectory) 11 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 12 | return original_trajectory[new_timepoints] 13 | 14 | class Transition(): 15 | 16 | def __init__(self, state, action, next_state, onestep_reward, terminal, success): 17 | # Now that we're doing 1step TD, and AC architectures rather than MC, 18 | # Don't need an explicit value of return. 19 | self.state = state 20 | self.action = action 21 | self.next_state = next_state 22 | self.onestep_reward = onestep_reward 23 | self.terminal = terminal 24 | self.success = success 25 | 26 | class Episode_TransitionList(): 27 | 28 | def __init__(self, transition_list): 29 | self.episode = transition_list 30 | 31 | def length(self): 32 | return len(self.episode) 33 | 34 | # Alternate way of implementing an episode... 35 | # Make it a class that has state_list, action_list, etc. over the episode.. 36 | class Episode(): 37 | 38 | def __init__(self, state_list=None, action_list=None, reward_list=None, terminal_list=None): 39 | self.state_list = state_list 40 | self.action_list = action_list 41 | self.reward_list = reward_list 42 | self.terminal_list = terminal_list 43 | self.episode_lenth = len(self.state_list) 44 | 45 | def length(self): 46 | return self.episode_lenth 47 | 48 | class HierarchicalEpisode(Episode): 49 | 50 | def __init__(self, state_list=None, action_list=None, reward_list=None, terminal_list=None, latent_z_list=None, latent_b_list=None): 51 | 52 | super(HierarchicalEpisode, self).__init__(state_list, action_list, reward_list, terminal_list) 53 | 54 | self.latent_z_list = latent_z_list 55 | self.latent_b_list = latent_b_list 56 | 57 | class ReplayMemory(): 58 | 59 | def __init__(self, memory_size=10000): 60 | 61 | # Implementing the memory as a list of EPISODES. 62 | # This acts as a queue. 63 | self.memory = [] 64 | 65 | # Accessing the memory with indices should be constant time, so it's okay to use a list. 66 | # Not using a priority either. 67 | self.memory_len = 0 68 | self.memory_size = memory_size 69 | 70 | print("Setup Memory.") 71 | 72 | def append_to_memory(self, episode): 73 | 74 | if self.check_full(): 75 | # Remove first episode in the memory (queue). 76 | self.memory.pop(0) 77 | # Now push the episode to the end of the queue.
78 | self.memory.append(episode) 79 | else: 80 | self.memory.append(episode) 81 | 82 | self.memory_len+=1 83 | 84 | def sample_batch(self, batch_size=25): 85 | 86 | self.memory_len = len(self.memory) 87 | 88 | indices = np.random.randint(0,high=self.memory_len,size=(batch_size)) 89 | 90 | return indices 91 | 92 | def retrieve_batch(self, batch_size=25): 93 | # self.memory_len = len(self.memory) 94 | 95 | return np.arange(0,batch_size) 96 | 97 | def check_full(self): 98 | 99 | self.memory_len = len(self.memory) 100 | 101 | if self.memory_len0 and segmentations[t]==1: 76 | image_list.append(255*np.ones_like(new_image)+new_image) 77 | 78 | if return_and_save: 79 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 80 | return image_list 81 | elif return_gif: 82 | return image_list 83 | else: 84 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 85 | 86 | class BaxterVisualizer(): 87 | 88 | def __init__(self, has_display=False): 89 | 90 | # Create environment. 91 | print("Do I have a display?", has_display) 92 | # self.base_env = robosuite.make('BaxterLift', has_renderer=has_display) 93 | self.base_env = robosuite.make("BaxterViz",has_renderer=has_display) 94 | 95 | # Create kinematics object. 96 | self.baxter_IK_object = IKWrapper(self.base_env) 97 | self.environment = self.baxter_IK_object.env 98 | 99 | def update_state(self): 100 | # Updates all joint states 101 | self.full_state = self.environment._get_observation() 102 | 103 | def set_ee_pose_return_image(self, ee_pose, arm='right', seed=None): 104 | 105 | # Assumes EE pose is Position in the first three elements, and quaternion in last 4 elements. 106 | self.update_state() 107 | 108 | if seed is None: 109 | # Set seed to current state. 110 | seed = self.full_state['joint_pos'] 111 | 112 | if arm == 'right': 113 | joint_positions = self.baxter_IK_object.controller.inverse_kinematics( 114 | target_position_right=ee_pose[:3], 115 | target_orientation_right=ee_pose[3:], 116 | target_position_left=self.full_state['left_eef_pos'], 117 | target_orientation_left=self.full_state['left_eef_quat'], 118 | rest_poses=seed 119 | ) 120 | 121 | elif arm == 'left': 122 | joint_positions = self.baxter_IK_object.controller.inverse_kinematics( 123 | target_position_right=self.full_state['right_eef_pos'], 124 | target_orientation_right=self.full_state['right_eef_quat'], 125 | target_position_left=ee_pose[:3], 126 | target_orientation_left=ee_pose[3:], 127 | rest_poses=seed 128 | ) 129 | 130 | elif arm == 'both': 131 | joint_positions = self.baxter_IK_object.controller.inverse_kinematics( 132 | target_position_right=ee_pose[:3], 133 | target_orientation_right=ee_pose[3:7], 134 | target_position_left=ee_pose[7:10], 135 | target_orientation_left=ee_pose[10:], 136 | rest_poses=seed 137 | ) 138 | image = self.set_joint_pose_return_image(joint_positions, arm=arm, gripper=False) 139 | return image 140 | 141 | def set_joint_pose_return_image(self, joint_pose, arm='both', gripper=False): 142 | 143 | # FOR FULL 16 DOF STATE: ASSUMES JOINT_POSE IS . 144 | 145 | self.update_state() 146 | self.state = copy.deepcopy(self.full_state['joint_pos']) 147 | # THE FIRST 7 JOINT ANGLES IN MUJOCO ARE THE RIGHT HAND. 148 | # THE LAST 7 JOINT ANGLES IN MUJOCO ARE THE LEFT HAND. 149 | 150 | if arm=='right': 151 | # Assume joint_pose is 8 DoF - 7 for the arm, and 1 for the gripper. 152 | self.state[:7] = copy.deepcopy(joint_pose[:7]) 153 | elif arm=='left': 154 | # Assume joint_pose is 8 DoF - 7 for the arm, and 1 for the gripper. 
155 | self.state[7:] = copy.deepcopy(joint_pose[:7]) 156 | elif arm=='both': 157 | # The Plans were generated as: Left arm, Right arm, left gripper, right gripper. 158 | # Assume joint_pose is 16 DoF. 7 DoF for left arm, 7 DoF for right arm. (These need to be flipped)., 1 for left gripper. 1 for right gripper. 159 | # First right hand. 160 | self.state[:7] = joint_pose[7:14] 161 | # Now left hand. 162 | self.state[7:] = joint_pose[:7] 163 | # Set the joint angles magically. 164 | self.environment.set_robot_joint_positions(self.state) 165 | 166 | action = np.zeros((16)) 167 | if gripper: 168 | # Left gripper is 15. Right gripper is 14. 169 | # MIME Gripper values are from 0 to 100 (Close to Open), but we treat the inputs to this function as 0 to 1 (Close to Open), and then rescale to (-1 Open to 1 Close) for Mujoco. 170 | if arm=='right': 171 | action[14] = -joint_pose[-1]*2+1 172 | elif arm=='left': 173 | action[15] = -joint_pose[-1]*2+1 174 | elif arm=='both': 175 | action[14] = -joint_pose[15]*2+1 176 | action[15] = -joint_pose[14]*2+1 177 | # Move gripper positions. 178 | self.environment.step(action) 179 | 180 | image = np.flipud(self.environment.sim.render(600, 600, camera_name='vizview1')) 181 | return image 182 | 183 | def visualize_joint_trajectory(self, trajectory, return_gif=False, gif_path=None, gif_name="Traj.gif", segmentations=None, return_and_save=False, additional_info=None): 184 | 185 | image_list = [] 186 | for t in range(trajectory.shape[0]): 187 | new_image = self.set_joint_pose_return_image(trajectory[t]) 188 | image_list.append(new_image) 189 | 190 | # Insert white 191 | if segmentations is not None: 192 | if t>0 and segmentations[t]==1: 193 | image_list.append(255*np.ones_like(new_image)+new_image) 194 | 195 | if return_and_save: 196 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 197 | return image_list 198 | elif return_gif: 199 | return image_list 200 | else: 201 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 202 | 203 | # class MocapVisualizer(): 204 | 205 | # def __init__(self, has_display=False, args=None): 206 | 207 | # # Load some things from the MocapVisualizationUtils and set things up so that they're ready to go. 208 | # # self.cam_cur = MocapVisualizationUtils.camera.Camera(pos=np.array([6.0, 0.0, 2.0]), 209 | # # origin=np.array([0.0, 0.0, 0.0]), 210 | # # vup=np.array([0.0, 0.0, 1.0]), 211 | # # fov=45.0) 212 | 213 | # self.args = args 214 | 215 | # # Default is local data. 216 | # self.global_data = False 217 | 218 | # self.cam_cur = MocapVisualizationUtils.camera.Camera(pos=np.array([4.5, 0.0, 2.0]), 219 | # origin=np.array([0.0, 0.0, 0.0]), 220 | # vup=np.array([0.0, 0.0, 1.0]), 221 | # fov=45.0) 222 | 223 | # # Path to dummy file that is going to populate joint_parents, initial global positions, etc. 224 | # bvh_filename = "/private/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments/01_01_poses.bvh" 225 | 226 | # # Run init before loading animation. 227 | # MocapVisualizationUtils.init() 228 | # MocapVisualizationUtils.global_positions, MocapVisualizationUtils.joint_parents, MocapVisualizationUtils.time_per_frame = MocapVisualizationUtils.load_animation(bvh_filename) 229 | 230 | # # State sizes. 231 | # self.number_joints = 22 232 | # self.number_dimensions = 3 233 | # self.total_dimensions = self.number_joints*self.number_dimensions 234 | 235 | # # Run thread of viewer, so that callbacks start running. 
236 | # thread = threading.Thread(target=self.run_thread) 237 | # thread.start() 238 | 239 | # # Also create dummy animation object. 240 | # self.animation_object, _, _ = BVH.load(bvh_filename) 241 | 242 | # def run_thread(self): 243 | # MocapVisualizationUtils.viewer.run( 244 | # title='BVH viewer', 245 | # cam=self.cam_cur, 246 | # size=(1280, 720), 247 | # keyboard_callback=None, 248 | # render_callback=MocapVisualizationUtils.render_callback_time_independent, 249 | # idle_callback=MocapVisualizationUtils.idle_callback_return, 250 | # ) 251 | 252 | # def get_global_positions(self, positions, animation_object=None): 253 | # # Function to get global positions corresponding to predicted or actual local positions. 254 | 255 | # traj_len = positions.shape[0] 256 | 257 | # def resample(original_trajectory, desired_number_timepoints): 258 | # original_traj_len = len(original_trajectory) 259 | # new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 260 | # return original_trajectory[new_timepoints] 261 | 262 | # if animation_object is not None: 263 | # # Now copy over from animation_object instead of just dummy animation object. 264 | # new_animation_object = Animation.Animation(resample(animation_object.rotations, traj_len), positions, animation_object.orients, animation_object.offsets, animation_object.parents) 265 | # else: 266 | # # Create a dummy animation object. 267 | # new_animation_object = Animation.Animation(self.animation_object.rotations[:traj_len], positions, self.animation_object.orients, self.animation_object.offsets, self.animation_object.parents) 268 | 269 | # # Then transform them. 270 | # transformed_global_positions = Animation.positions_global(new_animation_object) 271 | 272 | # # Now return coordinates. 273 | # return transformed_global_positions 274 | 275 | # def visualize_joint_trajectory(self, trajectory, return_gif=False, gif_path=None, gif_name="Traj.gif", segmentations=None, return_and_save=False, additional_info=None): 276 | 277 | # image_list = [] 278 | 279 | # if self.global_data: 280 | # # If we predicted in the global setting, just reshape. 281 | # predicted_global_positions = np.reshape(trajectory, (-1,self.number_joints,self.number_dimensions)) 282 | # else: 283 | # # If it's local data, then transform to global. 284 | # # Assume trajectory is number of timesteps x number_dimensions. 285 | # # Convert to number_of_timesteps x number_of_joints x 3. 286 | # predicted_local_positions = np.reshape(trajectory, (-1,self.number_joints,self.number_dimensions)) 287 | 288 | # # Assume trajectory was predicted in local coordinates. Transform to global for visualization. 289 | # predicted_global_positions = self.get_global_positions(predicted_local_positions, animation_object=additional_info) 290 | 291 | # # Copy into the global variable. 292 | # MocapVisualizationUtils.global_positions = predicted_global_positions 293 | 294 | # # Reset Image List. 295 | # MocapVisualizationUtils.image_list = [] 296 | # # Set save_path and prefix. 297 | # MocapVisualizationUtils.save_path = gif_path 298 | # MocapVisualizationUtils.name_prefix = gif_name.rstrip('.gif') 299 | # # Now set the whether_to_render as true. 300 | # MocapVisualizationUtils.whether_to_render = True 301 | 302 | # # Wait till rendering is complete. 
303 | # x_count = 0 304 | # while MocapVisualizationUtils.done_with_render==False and MocapVisualizationUtils.whether_to_render==True: 305 | # x_count += 1 306 | # time.sleep(1) 307 | 308 | # # Now that rendering is complete, load images. 309 | # image_list = MocapVisualizationUtils.image_list 310 | 311 | # # Now actually save the GIF or return. 312 | # if return_and_save: 313 | # imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 314 | # return image_list 315 | # elif return_gif: 316 | # return image_list 317 | # else: 318 | # imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 319 | 320 | class ToyDataVisualizer(): 321 | 322 | def __init__(self): 323 | 324 | pass 325 | 326 | def visualize_joint_trajectory(self, trajectory, return_gif=False, gif_path=None, gif_name="Traj.gif", segmentations=None, return_and_save=False, additional_info=None): 327 | 328 | fig = plt.figure() 329 | ax = fig.gca() 330 | ax.scatter(trajectory[:,0],trajectory[:,1],c=range(len(trajectory)),cmap='jet') 331 | plt.xlim(-10,10) 332 | plt.ylim(-10,10) 333 | 334 | fig.canvas.draw() 335 | 336 | width, height = fig.get_size_inches() * fig.get_dpi() 337 | image = np.fromstring(fig.canvas.tostring_rgb(), dtype=np.uint8).reshape(int(height), int(width), 3) 338 | image = np.transpose(image, axes=[2,0,1]) 339 | 340 | return image 341 | 342 | 343 | if __name__ == '__main__': 344 | # end_eff_pose = [0.3, -0.3, 0.09798524029948213, 0.38044099037703677, 0.9228975092885654, -0.021717379118030174, 0.05525572942370394] 345 | # end_eff_pose = [0.53303758, -0.59997265, 0.09359371, 0.77337391, 0.34998901, 0.46797516, -0.24576358] 346 | # end_eff_pose = np.array([0.64, -0.83, 0.09798524029948213, 0.38044099037703677, 0.9228975092885654, -0.021717379118030174, 0.05525572942370394]) 347 | visualizer = MujocoVisualizer() 348 | # img = visualizer.set_ee_pose_return_image(end_eff_pose, arm='right') 349 | # scipy.misc.imsave('mj_vis.png', img) 350 | -------------------------------------------------------------------------------- /Experiments/cluster_run.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | """ 9 | Wrapper script for launching a job on the fair cluster. 
10 | Sample usage: 11 | python cluster_run.py --name=trial --setup='/path/to/setup.sh' --cmd='job_command' 12 | """ 13 | 14 | from __future__ import absolute_import 15 | from __future__ import division 16 | from __future__ import print_function 17 | 18 | import pdb 19 | from absl import app 20 | from absl import flags 21 | import os 22 | import sys 23 | import random 24 | import string 25 | import datetime 26 | import re 27 | 28 | opts = flags.FLAGS 29 | 30 | flags.DEFINE_integer('nodes', 1, 'Number of nodes per task') 31 | flags.DEFINE_integer('ntp', 1, 'Number of tasks per node') 32 | flags.DEFINE_integer('ncpus', 40, 'Number of cpu cores per task') 33 | flags.DEFINE_integer('ngpus', 1, 'Number of gpus per task') 34 | 35 | flags.DEFINE_string('name', '', 'Job name') 36 | flags.DEFINE_enum('partition', 'learnfair', ['dev', 'priority','uninterrupted','learnfair'], 'Cluster partition') 37 | flags.DEFINE_string('comment', 'for ICML deadline in 2020.', 'Comment') 38 | flags.DEFINE_string('time', '72:00:00', 'Time for which the job should run') 39 | 40 | flags.DEFINE_string('setup', '/private/home/tanmayshankar/Research/Code/Setup.bash', 'Setup script that will be run before the command') 41 | # flags.DEFINE_string('workdir', os.getcwd(), 'Job command') 42 | flags.DEFINE_string('workdir', '/private/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments', 'Directory to run job from') 43 | # flags.DEFINE_string('workdir', '/private/home/tanmayshankar/Research/Code/SkillsfromDemonstrations/Experiments/BidirectionalInfoModel/', 'Job command') 44 | flags.DEFINE_string('cmd', 'echo $PWD', 'Job command') 45 | 46 | 47 | def mkdir(path): 48 | if not os.path.exists(path): 49 | os.makedirs(path) 50 | 51 | def main(_): 52 | job_folder = '/checkpoint/tanmayshankar/jobs/' + datetime.date.today().strftime('%y_%m_%d') 53 | mkdir(job_folder) 54 | 55 | if len(opts.name) == 0: 56 | # read name from command 57 | opts.name = re.search('--name=\w+', opts.cmd).group(0)[7:] 58 | print(opts.name) 59 | slurm_cmd = '#!/bin/bash\n\n' 60 | slurm_cmd += '#SBATCH --job-name={}\n'.format(opts.name) 61 | slurm_cmd += '#SBATCH --output={}/{}-%j.out\n'.format(job_folder, opts.name) 62 | slurm_cmd += '#SBATCH --error={}/{}-%j.err\n'.format(job_folder, opts.name) 63 | # slurm_cmd += '#SBATCH --exclude=learnfair2038' 64 | slurm_cmd += '\n' 65 | 66 | slurm_cmd += '#SBATCH --partition={}\n'.format(opts.partition) 67 | if len(opts.comment) > 0: 68 | slurm_cmd += '#SBATCH --comment="{}"\n'.format(opts.comment) 69 | slurm_cmd += '\n' 70 | 71 | slurm_cmd += '#SBATCH --nodes={}\n'.format(opts.nodes) 72 | slurm_cmd += '#SBATCH --ntasks-per-node={}\n'.format(opts.ntp) 73 | if opts.ngpus > 0: 74 | slurm_cmd += '#SBATCH --gres=gpu:{}\n'.format(opts.ngpus) 75 | slurm_cmd += '#SBATCH --cpus-per-task={}\n'.format(opts.ncpus) 76 | slurm_cmd += '#SBATCH --time={}\n'.format(opts.time) 77 | slurm_cmd += '\n' 78 | 79 | slurm_cmd += 'source {}\n'.format(opts.setup) 80 | slurm_cmd += 'cd {} \n\n'.format(opts.workdir) 81 | slurm_cmd += '{}\n'.format(opts.cmd) 82 | 83 | job_fname = '{}/{}.sh'.format(job_folder, ''.join(random.choices(string.ascii_letters, k=8))) 84 | 85 | with open(job_fname, 'w') as f: 86 | f.write(slurm_cmd) 87 | 88 | #print('sbatch {}'.format(job_fname)) 89 | os.system('sbatch {}'.format(job_fname)) 90 | 91 | 92 | if __name__ == '__main__': 93 | app.run(main) 94 | 95 | -------------------------------------------------------------------------------- /Experiments/headers.py:
-------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | import glob, os, sys, argparse 9 | import torch, copy 10 | from torch.utils.data import Dataset, DataLoader 11 | from torchvision import transforms, utils 12 | from IPython import embed 13 | 14 | import matplotlib 15 | matplotlib.use('Agg') 16 | # matplotlib.rcParams['animation.ffmpeg_args'] = '-report' 17 | matplotlib.rcParams['animation.bitrate'] = 2000 18 | import matplotlib.pyplot as plt 19 | import tensorboardX 20 | from scipy import stats 21 | from absl import flags 22 | from memory_profiler import profile 23 | from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas 24 | from matplotlib.figure import Figure 25 | 26 | from IPython import embed 27 | import pdb 28 | import sklearn.manifold as skl_manifold 29 | from sklearn.decomposition import PCA 30 | from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage, 31 | AnnotationBbox) 32 | from matplotlib.animation import FuncAnimation 33 | import tensorflow as tf 34 | import tempfile 35 | import moviepy.editor as mpy 36 | import subprocess 37 | import h5py 38 | import time 39 | import robosuite 40 | import unittest 41 | import cProfile 42 | 43 | from scipy import stats, signal 44 | from scipy.interpolate import interp1d 45 | from scipy.ndimage.filters import gaussian_filter1d 46 | from scipy.signal import find_peaks, argrelextrema 47 | 48 | from sklearn.neighbors import NearestNeighbors 49 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Learning Robot Skills with Temporal Variational Inference 2 | 3 | ### What is this? ### 4 | 5 | This repository has code for our ICML 2020 paper on [Learning Robot Skills with Temporal Variational Inference](https://proceedings.icml.cc/static/paper_files/icml/2020/2847-Paper.pdf), authored by Tanmay Shankar and Abhinav Gupta. 6 | 7 | ### I want a TL;DR of what this paper does. ### 8 | 9 | Our paper presents a way to jointly learn robot skills and how to use them from demonstrations in an unsupervised manner. 10 | The code implements the training procedure for this across 3 different datasets, and provides tools to visualize the learnt skills. 11 | 12 | ### Cool. Can I use your code? ### 13 | 14 | Yes! If you would like to use our code, please cite our paper and this repository in your work. 15 | Also, be aware of the license for this repository: the Creative Commons Attribution-NonCommercial 4.0 International. Details may be viewed in the License file. 16 | 17 | ### I need help, or I have brilliant ideas to make this code even better. ### 18 | 19 | Great! Feel free to mail Tanmay (tanmay.shankar@gmail.com), for help, suggestions, questions and feedback. You can also create issues in the repository, if you feel like the problem is pertinent to others. 20 | 21 | ### How do I set up this repository? ### 22 | 23 | #### Dependencies #### 24 | You will need a few packages to be able to run the code in this repository. 25 | For Robotic environments, you will need to install Mujoco, Mujoco_Py, OpenAI Gym, and Robosuite. 
[Here](https://docs.google.com/document/d/1V6BJf4R-2TXKO_IEOII5rLJbGj0jrJPptjtBCfczPk8/edit?usp=sharing) is a list of instructions on how to set these up. 26 | 27 | You will also need some standard deep learning packages: Pytorch, Tensorflow, Tensorboard, and TensorboardX. Usually you can just pip install these packages. We recommend using a virtual environment for them. 28 | 29 | #### Data #### 30 | We run our model on various publicly available datasets, i.e. the [MIME dataset](https://sites.google.com/view/mimedataset), the [Roboturk dataset](https://roboturk.stanford.edu/), and the [CMU Mocap dataset](http://mocap.cs.cmu.edu/). In the case of the MIME and Roboturk datasets, we collate relevant data modalities and store them in quickly accessible formats for our code. You can find the links to these files below. 31 | 32 | [MIME Dataset]() 33 | [Roboturk Dataset]() 34 | [CMU Mocap Dataset]() 35 | 36 | Once you have downloaded this data locally, you will want to pass the path to these datasets via the `--dataset_directory` command line flag when you run the code (see the example invocation at the end of this README). 37 | 38 | ### Tell me how to run the code already! ### 39 | 40 | Here is a list of commands to run pre-training and joint skill learning on the various datasets used in our paper. The hyper-parameter values specified here are the ones used in the paper. Depending on your use case, you may want to play with these values. For a full list of the hyper-parameters, look at `Experiments/Master.py`. 41 | 42 | #### The MIME Dataset #### 43 | For the MIME dataset, to run pre-training of the low-level policy: 44 | 45 | ``` 46 | python Master.py --train=1 --setting=pretrain_sub --name=MIME_Pretraining --data=MIME --number_layers=8 --hidden_size=128 --kl_weight=0.01 --var_skill_length=1 --z_dimensions=64 --normalization=meanvar 47 | ``` 48 | 49 | This should automatically run some evaluation and visualization tools every few epochs, and you can view the results in Experimental_Logs/<run_name>/. 50 | Once you've run this pre-training, you can run the joint training using: 51 | 52 | ``` 53 | python Master.py --train=1 --setting=learntsub --name=J100 --normalization=meanvar --kl_weight=0.0001 --subpolicy_ratio=0.1 --latentpolicy_ratio=0.001 --b_probability_factor=0.01 --data=MIME --subpolicy_model=Experiment_Logs/<pretraining_run_name>/saved_models/Model_epoch480 --latent_loss_weight=0.01 --z_dimensions=64 --traj_length=-1 --var_skill_length=1 --training_phase_size=200000 54 | ``` 55 | 56 | #### The Roboturk Dataset #### 57 | For the Roboturk dataset, to run pre-training of the low-level policy: 58 | 59 | ``` 60 | python Master.py --train=1 --setting=pretrain_sub --name=Roboturk_Pretraining --data=FullRoboturk --kl_weight=0.0001 --var_skill_length=1 --z_dimensions=64 --number_layers=8 --hidden_size=128 61 | ``` 62 | 63 | Just as in the case of the MIME dataset, you can then run the joint training using: 64 | 65 | ``` 66 | python Master.py --train=1 --setting=learntsub --name=RJ80 --latent_loss_weight=1. --latentpolicy_ratio=0.01 --kl_weight=0.0001 --subpolicy_ratio=0.1 --b_probability_factor=0.001 --data=Roboturk --subpolicy_model=Experiment_Logs/<pretraining_run_name>/saved_models/Model_epoch20 --z_dimensions=64 --traj_length=-1 --var_skill_length=1 --number_layers=8 --hidden_size=128 67 | ``` 68 | Stay tuned for more! 69 | --------------------------------------------------------------------------------
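#### Example: pointing a run at your local data ####
Here is a minimal sketch of the MIME pre-training command above with an explicit data path. The path `/path/to/MIME_Dataset` is a hypothetical placeholder for wherever you extracted the MIME data; every other flag is taken verbatim from the MIME pre-training command in the README.

```
python Master.py --train=1 --setting=pretrain_sub --name=MIME_Pretraining --data=MIME --number_layers=8 --hidden_size=128 --kl_weight=0.01 --var_skill_length=1 --z_dimensions=64 --normalization=meanvar --dataset_directory=/path/to/MIME_Dataset
```

The same `--dataset_directory` flag applies to the other commands above, with the path swapped for your local copy of the corresponding dataset.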