├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Data ├── MIME_Test_Data.npy └── Roboturk_Test_Data.npy ├── DataGenerator ├── A_array_newcont_cond.npy ├── A_goal_directed.npy ├── B_array_newcont_cond.npy ├── B_goal_directed.npy ├── ContinuousNonZero.py ├── ContinuousTrajs.py ├── DeterministicGoalDirectedTraj.py ├── DirectedContinuousNonZero.py ├── DirectedContinuousTrajs.py ├── G_array_newcont_cond.npy ├── G_goal_directed.npy ├── GoalDirectedTrajs.py ├── NewGoalDirectedTraj.py ├── PolicyVisualizer.py ├── S_array_newcont_cond.npy ├── SeparableTrajs.py ├── X_array_newcont_cond.npy ├── X_goal_directed.npy ├── Y_array_newcont_cond.npy └── Y_goal_directed.npy ├── DataLoaders ├── GridWorld_DataLoader.py ├── InteractiveDataLoader.py ├── MIME_DataLoader.py ├── MIME_DataLoader.pyc ├── MIME_Img_DataLoader.py ├── MIMEandPlan_DataLoader.py ├── Plan_DataLoader.py ├── RandomWalks.py ├── RandomWalks.pyc ├── RoboturkeExp.py ├── SmallMaps_DataLoader.py ├── Translation.py ├── __init__.py ├── __init__.pyc ├── headers.py └── headers.pyc ├── DownstreamRL ├── PolicyNet.py └── TrainZPolicyRL.py ├── Experiments ├── Code_Runs │ └── CycleTransfer_Runs.py ├── DMP.py ├── DataLoaders.py ├── Eval_RLRewards.py ├── MIME_DataLoader.py ├── Master.py ├── MocapVisualizationExample.py ├── MocapVisualizationUtils.py ├── Mocap_DataLoader.py ├── PolicyManagers.py ├── PolicyNetworks.py ├── Processing_MocapData.py ├── RLUtils.py ├── Roboturk_DataLoader.py ├── TFLogger.py ├── TestClass.py ├── Visualizers.py ├── cluster_run.py └── headers.py ├── LICENSE └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore files. 2 | *.bvh 3 | *.html 4 | *.png 5 | *.jpg 6 | *.gif 7 | *.pyc 8 | Experiments/Experimental_Logs/* 9 | Experiments/Code_Runs/* -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to make participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, sex characteristics, gender identity and expression, 9 | level of experience, education, socio-economic status, nationality, personal 10 | appearance, race, religion, or sexual identity and orientation. 
11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or 41 | reject comments, commits, code, wiki edits, issues, and other contributions 42 | that are not aligned to this Code of Conduct, or to ban temporarily or 43 | permanently any contributor for other behaviors that they deem inappropriate, 44 | threatening, offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies within all project spaces, and it also applies when 49 | an individual is representing the project or its community in public spaces. 50 | Examples of representing a project or community include using an official 51 | project e-mail address, posting via an official social media account, or acting 52 | as an appointed representative at an online or offline event. Representation of 53 | a project may be further defined and clarified by project maintainers. 54 | 55 | ## Enforcement 56 | 57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 58 | reported by contacting the project team at . All 59 | complaints will be reviewed and investigated and will result in a response that 60 | is deemed necessary and appropriate to the circumstances. The project team is 61 | obligated to maintain confidentiality with regard to the reporter of an incident. 62 | Further details of specific enforcement policies may be posted separately. 63 | 64 | Project maintainers who do not follow or enforce the Code of Conduct in good 65 | faith may face temporary or permanent repercussions as determined by other 66 | members of the project's leadership. 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, 71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 72 | 73 | [homepage]: https://www.contributor-covenant.org 74 | 75 | For answers to common questions about this code of conduct, see 76 | https://www.contributor-covenant.org/faq 77 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to CausalSkillLearning 2 | We want to make contributing to this project as easy and transparent as 3 | possible. 4 | 5 | ## Pull Requests 6 | We actively welcome your pull requests. 7 | 8 | 1. 
Fork the repo and create your branch from `master`. 9 | 2. If you've added code that should be tested, add tests. 10 | 3. If you've changed APIs, update the documentation. 11 | 4. Ensure the test suite passes. 12 | 5. Make sure your code lints. 13 | 6. If you haven't already, complete the Contributor License Agreement ("CLA"). 14 | 15 | ## Contributor License Agreement ("CLA") 16 | In order to accept your pull request, we need you to submit a CLA. You only need 17 | to do this once to work on any of Facebook's open source projects. 18 | 19 | Complete your CLA here: 20 | 21 | ## Issues 22 | We use GitHub issues to track public bugs. Please ensure your description is 23 | clear and has sufficient instructions to be able to reproduce the issue. 24 | 25 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 26 | disclosure of security bugs. In those cases, please go through the process 27 | outlined on that page and do not file a public issue. 28 | 29 | ## License 30 | By contributing to CausalSkillLearning, you agree that your contributions will be licensed 31 | under the LICENSE file in the root directory of this source tree. -------------------------------------------------------------------------------- /Data/MIME_Test_Data.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/Data/MIME_Test_Data.npy -------------------------------------------------------------------------------- /Data/Roboturk_Test_Data.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/Data/Roboturk_Test_Data.npy -------------------------------------------------------------------------------- /DataGenerator/A_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/A_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/A_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/A_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/B_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/B_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/B_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/B_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/ContinuousNonZero.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 
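# Overview: generates 50,000 synthetic 2-D trajectories of 20 timesteps, built
# from four axis-aligned "options": states X, actions A, option labels Y and
# segment-boundary indicators B, with start states drawn uniformly from
# [-2.5, 2.5]^2. A minimal loading sketch (file names as in the np.save calls
# at the bottom of this script):
#   X = np.load("X_array_continuous_nonzero.npy")  # (50000, 20, 2) states
#   A = np.load("A_array_continuous_nonzero.npy")  # (50000, 19, 2) actions
#   Y = np.load("Y_array_continuous_nonzero.npy")  # (50000, 19) option labels
#   B = np.load("B_array_continuous_nonzero.npy")  # (50000, 19) boundary flags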
3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 20 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | 18 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 19 | 20 | for i in range(number_datapoints): 21 | if i%1000==0: 22 | print("Processing Datapoint: ",i) 23 | b_array_dataset[i,0] = 1. 24 | 25 | x_array_dataset[i,0] = 5*(np.random.random((2))-0.5) 26 | 27 | reset_counter = 0 28 | for t in range(number_timesteps-1): 29 | 30 | # GET B 31 | if t>0: 32 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 33 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 34 | 35 | # If 3,4,5 timesteps have passed, terminate. 36 | if reset_counter>=3 and reset_counter<5: 37 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 38 | elif reset_counter==5: 39 | b_array_dataset[i,t] = 1 40 | 41 | # GET Y 42 | if b_array_dataset[i,t]: 43 | y_array_dataset[i,t] = np.random.random_integers(0,high=3) 44 | reset_counter = 0 45 | else: 46 | reset_counter+=1 47 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 48 | 49 | # GET A 50 | 51 | # -0.05 is because the noise is from 0-0.1, so to balance this we make it -0.05 52 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 53 | 54 | # GET X 55 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 56 | 57 | # embed() 58 | 59 | np.save("X_array_continuous_nonzero.npy",x_array_dataset) 60 | np.save("Y_array_continuous_nonzero.npy",y_array_dataset) 61 | np.save("B_array_continuous_nonzero.npy",b_array_dataset) 62 | np.save("A_array_continuous_nonzero.npy",a_array_dataset) -------------------------------------------------------------------------------- /DataGenerator/ContinuousTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 20 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | 18 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 19 | 20 | for i in range(number_datapoints): 21 | if i%1000==0: 22 | print("Processing Datapoint: ",i) 23 | b_array_dataset[i,0] = 1. 24 | 25 | reset_counter = 0 26 | for t in range(number_timesteps-1): 27 | 28 | # GET B 29 | if t>0: 30 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 31 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 32 | 33 | # If 3,4,5 timesteps have passed, terminate. 
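# reset_counter counts how long the current option has been active: a boundary
# cannot fire for the first 3 steps of a segment, is sampled with p=0.33 while
# the counter is 3 or 4, and is forced once it reaches 5, so each option
# persists for roughly 4-6 timesteps before a new one is drawn.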
34 | if reset_counter>=3 and reset_counter<5: 35 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 36 | elif reset_counter==5: 37 | b_array_dataset[i,t] = 1 38 | 39 | # GET Y 40 | if b_array_dataset[i,t]: 41 | y_array_dataset[i,t] = np.random.random_integers(0,high=3) 42 | reset_counter = 0 43 | else: 44 | reset_counter+=1 45 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 46 | 47 | # GET A 48 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 49 | 50 | # GET X 51 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 52 | 53 | # embed() 54 | 55 | np.save("X_array_continuous.npy",x_array_dataset) 56 | np.save("Y_array_continuous.npy",y_array_dataset) 57 | np.save("B_array_continuous.npy",b_array_dataset) 58 | np.save("A_array_continuous.npy",a_array_dataset) -------------------------------------------------------------------------------- /DataGenerator/DeterministicGoalDirectedTraj.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | #number_datapoints = 20 12 | number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 23 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 24 | 25 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 26 | 27 | lim = 25 28 | 29 | for i in range(number_datapoints): 30 | 31 | if i%1000==0: 32 | print("Processing Datapoint: ",i) 33 | 34 | # b_array_dataset[i,0] = 1. 35 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 36 | 37 | # Adding random noise to start state. 38 | x_array_dataset[i,-1] = goal_states[goal_array_dataset[i]] + 0.1*(np.random.random(2)-0.5) 39 | goal = goal_states[goal_array_dataset[i]] 40 | 41 | reset_counter = 0 42 | # for t in range(number_timesteps-1): 43 | for t in reversed(range(number_timesteps-1)): 44 | 45 | # GET B # Must end on b==0. 46 | if t<(number_timesteps-2): 47 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 48 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 49 | 50 | # If 3,4,5 timesteps have passed, terminate. 51 | if t<3: 52 | b_array_dataset[i,t] = 0 53 | elif reset_counter>=3 and reset_counter<5: 54 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 55 | elif reset_counter==5: 56 | b_array_dataset[i,t] = 1 57 | elif t==(number_timesteps-2): 58 | b_array_dataset[i,t] = 1 59 | 60 | # GET Y 61 | if b_array_dataset[i,t]: 62 | current_state = x_array_dataset[i,t+1] 63 | unnorm_directions = current_state-goal.squeeze(0) 64 | directions = unnorm_directions/abs(unnorm_directions) 65 | 66 | # Set valid options. 
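# This trajectory is generated backwards in time (x[t] = x[t+1] - a[t]), so an
# option is treated as valid if its action has a non-positive dot product with
# the sign-normalised offset from the goal to the current state; among those,
# the option best aligned with the unnormalised offset is chosen deterministically.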
67 | dot_product = np.dot(action_map, directions) 68 | # valid_options = np.where(dot_product>=0)[0] 69 | # Sincer we're going backwards in time, 70 | valid_options = np.where(dot_product<=0)[0] 71 | 72 | # Compare states. If x-g_x>y_g_y, choose to go along... 73 | # embed() 74 | 75 | # y_array_dataset[i,t] = np.random.choice(valid_options) 76 | y_array_dataset[i,t] = valid_options[np.argmax(np.dot(action_map,unnorm_directions)[valid_options])] 77 | 78 | reset_counter = 0 79 | else: 80 | reset_counter+=1 81 | y_array_dataset[i,t] = y_array_dataset[i,t+1] 82 | 83 | # GET A 84 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 85 | 86 | # GET X 87 | # x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 88 | x_array_dataset[i,t] = x_array_dataset[i,t+1]-a_array_dataset[i,t] 89 | 90 | plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 91 | plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(25)) 92 | plt.xlim(-lim, lim) 93 | plt.ylim(-lim, lim) 94 | plt.show() 95 | 96 | # Roll over b's. 97 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 98 | 99 | 100 | np.save("X_deter_goal_directed.npy",x_array_dataset) 101 | np.save("Y_deter_goal_directed.npy",y_array_dataset) 102 | np.save("B_deter_goal_directed.npy",b_array_dataset) 103 | np.save("A_deter_goal_directed.npy",a_array_dataset) 104 | np.save("G_deter_goal_directed.npy",goal_array_dataset) 105 | -------------------------------------------------------------------------------- /DataGenerator/DirectedContinuousNonZero.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 25 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 18 | 19 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 20 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 21 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 22 | 23 | for i in range(number_datapoints): 24 | 25 | if i%1000==0: 26 | print("Processing Datapoint: ",i) 27 | b_array_dataset[i,0] = 1. 28 | 29 | # Select one of four starting points. (-2,-2), (-2,2), (2,-2), (2,2) 30 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 31 | # Adding random noise to start state. 32 | x_array_dataset[i,0] = start_states[goal_array_dataset[i]] + 0.2*(np.random.random(2)-0.5) 33 | goal = -start_states[goal_array_dataset[i]] 34 | 35 | reset_counter = 0 36 | for t in range(number_timesteps-1): 37 | 38 | # GET B 39 | if t>0: 40 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 41 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 42 | 43 | # If 3,4,5 timesteps have passed, terminate. 
44 | if reset_counter>=3 and reset_counter<5: 45 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 46 | elif reset_counter==5: 47 | b_array_dataset[i,t] = 1 48 | 49 | # GET Y 50 | if b_array_dataset[i,t]: 51 | 52 | axes = -goal/abs(goal) 53 | step1 = 30*np.ones((2))-axes*np.abs(x_array_dataset[i,t]-x_array_dataset[i,0]) 54 | # baseline = t*20*np.sqrt(2)/20 55 | baseline = t 56 | step2 = step1-baseline 57 | step3 = step2/step2.sum() 58 | y_array_dataset[i,t] = np.random.choice(valid_options[goal_array_dataset[i][0]]) 59 | 60 | reset_counter = 0 61 | else: 62 | reset_counter+=1 63 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 64 | 65 | # GET A 66 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 67 | 68 | # GET X 69 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 70 | 71 | np.save("X_dir_cont_nonzero.npy",x_array_dataset) 72 | np.save("Y_dir_cont_nonzero.npy",y_array_dataset) 73 | np.save("B_dir_cont_nonzero.npy",b_array_dataset) 74 | np.save("A_dir_cont_nonzero.npy",a_array_dataset) 75 | np.save("G_dir_cont_nonzero.npy",goal_array_dataset) 76 | -------------------------------------------------------------------------------- /DataGenerator/DirectedContinuousTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | 10 | number_datapoints = 50000 11 | number_timesteps = 25 12 | 13 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 14 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 15 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 16 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 17 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 18 | 19 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 20 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 21 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 22 | 23 | for i in range(number_datapoints): 24 | 25 | if i%1000==0: 26 | print("Processing Datapoint: ",i) 27 | b_array_dataset[i,0] = 1. 28 | 29 | # Select one of four starting points. (-2,-2), (-2,2), (2,-2), (2,2) 30 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 31 | x_array_dataset[i,0] = start_states[goal_array_dataset[i]] 32 | goal = -start_states[goal_array_dataset[i]] 33 | 34 | reset_counter = 0 35 | for t in range(number_timesteps-1): 36 | 37 | # GET B 38 | if t>0: 39 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 40 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 41 | 42 | # If 3,4,5 timesteps have passed, terminate. 
43 | if reset_counter>=3 and reset_counter<5: 44 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 45 | elif reset_counter==5: 46 | b_array_dataset[i,t] = 1 47 | 48 | # GET Y 49 | if b_array_dataset[i,t]: 50 | 51 | axes = -goal/abs(goal) 52 | step1 = 30*np.ones((2))-axes*np.abs(x_array_dataset[i,t]-x_array_dataset[i,0]) 53 | # baseline = t*20*np.sqrt(2)/20 54 | baseline = t 55 | step2 = step1-baseline 56 | step3 = step2/step2.sum() 57 | y_array_dataset[i,t] = np.random.choice(valid_options[goal_array_dataset[i][0]]) 58 | 59 | reset_counter = 0 60 | else: 61 | reset_counter+=1 62 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 63 | 64 | # GET A 65 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 66 | 67 | # GET X 68 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 69 | 70 | np.save("X_array_directed_continuous.npy",x_array_dataset) 71 | np.save("Y_array_directed_continuous.npy",y_array_dataset) 72 | np.save("B_array_directed_continuous.npy",b_array_dataset) 73 | np.save("A_array_directed_continuous.npy",a_array_dataset) -------------------------------------------------------------------------------- /DataGenerator/G_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/G_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/G_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/G_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/GoalDirectedTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | number_datapoints = 1 12 | # number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 23 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 24 | 25 | valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 26 | 27 | for i in range(number_datapoints): 28 | 29 | if i%1000==0: 30 | print("Processing Datapoint: ",i) 31 | 32 | # b_array_dataset[i,0] = 1. 33 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 34 | 35 | # Adding random noise to start state. 
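# (The "start" here is the final timestep: like the deterministic variant above,
# this script builds the trajectory backwards from a state seeded near the
# sampled goal, so x[:, -1] is initialised rather than x[:, 0].)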
36 | x_array_dataset[i,-1] = goal_states[goal_array_dataset[i]] + 0.1*(np.random.random(2)-0.5) 37 | goal = goal_states[goal_array_dataset[i]] 38 | 39 | reset_counter = 0 40 | # for t in range(number_timesteps-1): 41 | for t in reversed(range(number_timesteps-1)): 42 | 43 | # GET B # Must end on b==0. 44 | if t<(number_timesteps-2): 45 | # b_array[t] = np.random.binomial(1,prob_b_given_x) 46 | # b_array_dataset[i,t] = np.random.binomial(1,pb_x[0,x_array_dataset[i,t]]) 47 | 48 | # If 3,4,5 timesteps have passed, terminate. 49 | if t<3: 50 | b_array_dataset[i,t] = 0 51 | elif reset_counter>=3 and reset_counter<5: 52 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 53 | elif reset_counter==5: 54 | b_array_dataset[i,t] = 1 55 | elif t==(number_timesteps-2): 56 | b_array_dataset[i,t] = 1 57 | 58 | # GET Y 59 | if b_array_dataset[i,t]: 60 | current_state = x_array_dataset[i,t+1] 61 | # directions = current_state-goal.squeeze(0) 62 | directions = goal.squeeze(0)-current_state 63 | norm_directions = directions/abs(directions) 64 | 65 | # # Set valid options. 66 | dot_product = np.dot(action_map, norm_directions) 67 | # valid_options = np.where(dot_product>=0)[0] 68 | # # Sincer we're going backwards in time, 69 | valid_options = np.where(dot_product<=0)[0] 70 | 71 | # # axes = -goal/abs(goal) 72 | # # step1 = 30*np.ones((2))-axes*np.abs(x_array_dataset[i,t]-x_array_dataset[i,0]) 73 | # # # baseline = t*20*np.sqrt(2)/20 74 | # # baseline = t 75 | # # step2 = step1-baseline 76 | # # step3 = step2/step2.sum() 77 | # # y_array_dataset[i,t] = np.random.choice(valid_options[goal_array_dataset[i][0]]) 78 | # embed() 79 | dot_product = np.dot(action_map,directions) 80 | 81 | y_array_dataset[i,t] = np.argmax(dot_product) 82 | # y_array_dataset[i,t] = np.random.choice(valid_options) 83 | 84 | reset_counter = 0 85 | else: 86 | reset_counter+=1 87 | y_array_dataset[i,t] = y_array_dataset[i,t+1] 88 | 89 | # GET A 90 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.05+0.1*np.random.random((2)) 91 | 92 | # GET X 93 | # x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 94 | x_array_dataset[i,t] = x_array_dataset[i,t+1]-a_array_dataset[i,t] 95 | 96 | plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 97 | plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(25)) 98 | plt.xlim(-25,25) 99 | plt.ylim(-25,25) 100 | plt.show() 101 | 102 | # Roll over b's. 103 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 104 | 105 | 106 | np.save("X_goal_directed.npy",x_array_dataset) 107 | np.save("Y_goal_directed.npy",y_array_dataset) 108 | np.save("B_goal_directed.npy",b_array_dataset) 109 | np.save("A_goal_directed.npy",a_array_dataset) 110 | np.save("G_goal_directed.npy",goal_array_dataset) 111 | -------------------------------------------------------------------------------- /DataGenerator/NewGoalDirectedTraj.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
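# Overview: generates goal-directed trajectories from a hand-coded 9x9 "policy
# map" of option indices laid out around one of four goal states. get_bucket()
# discretises the current state relative to the goal, the map entry for that
# bucket selects the next option, and the boundary indicators B are shifted by
# one timestep (np.roll) before saving.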
6 | 7 | import numpy as np, copy 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | number_datapoints = 20 12 | # number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | # start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 23 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*10 24 | 25 | # Creating a policy map. 26 | lim = 50 27 | size = 9 28 | scale = 5 29 | policy_map = np.zeros((size,size),dtype=int) 30 | 31 | # Row wise assignment: 32 | policy_map[0,:] = 2 33 | 34 | policy_map[1,:7] = 2 35 | policy_map[1,7:] = 1 36 | 37 | policy_map[2:4,0] = 2 38 | policy_map[2:4,1:4] = 3 39 | policy_map[2:4,4:7] = 2 40 | policy_map[2:4,7:] = 1 41 | 42 | policy_map[4,:4] = 3 43 | policy_map[4,4] = 3 44 | policy_map[4,5:] = 1 45 | 46 | policy_map[5,:3] = 3 47 | policy_map[5,3:5] = 0 48 | policy_map[5,5:] = 1 49 | 50 | policy_map[6,:2] = 3 51 | policy_map[6,2:7] = 0 52 | policy_map[6,7:] = 1 53 | 54 | policy_map[7:,0] = 3 55 | policy_map[7:,1:7] = 0 56 | policy_map[7:,7:] = 1 57 | 58 | # policy_map = np.transpose(policy_map) 59 | 60 | goal_based_policy_maps = np.zeros((4,size,size)) 61 | goal_based_policy_maps[0] = copy.deepcopy(policy_map) 62 | goal_based_policy_maps[1] = np.flipud(policy_map) 63 | goal_based_policy_maps[2] = np.fliplr(policy_map) 64 | goal_based_policy_maps[3] = np.flipud(np.fliplr(policy_map)) 65 | 66 | def get_bucket(state, reference_state): 67 | # baseline = 4*np.ones(2) 68 | baseline = np.zeros(2) 69 | compensated_state = state - reference_state 70 | # compensated_state = (np.round(state - reference_state) + baseline).astype(int) 71 | 72 | x = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 73 | 74 | bucket = np.zeros((2)) 75 | 76 | bucket[0] = min(np.searchsorted(x,compensated_state[0]),size-1) 77 | bucket[1] = min(np.searchsorted(x,compensated_state[1]),size-1) 78 | 79 | return bucket.astype(int) 80 | 81 | for i in range(number_datapoints): 82 | 83 | if i%1000==0: 84 | print("Processing Datapoint: ",i) 85 | 86 | # b_array_dataset[i,0] = 1. 87 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 88 | 89 | # Adding random noise to start state. 90 | # x_array_dataset[i,0] = goal_states[goal_array_dataset[i]] + 0.1*(np.random.random(2)-0.5) 91 | 92 | scale = 25 93 | x_array_dataset[i,0] = goal_states[goal_array_dataset[i]] + scale*(np.random.random(2)-0.5) 94 | goal = goal_states[goal_array_dataset[i]] 95 | 96 | reset_counter = 0 97 | for t in range(number_timesteps-1): 98 | 99 | # GET B 100 | if t>0: 101 | # If 3,4,5 timesteps have passed, terminate. 102 | if reset_counter>=3 and reset_counter<5: 103 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 104 | elif reset_counter==5: 105 | b_array_dataset[i,t] = 1 106 | 107 | # GET Y 108 | if b_array_dataset[i,t]: 109 | current_state = x_array_dataset[i,t] 110 | 111 | # Select options from policy map, based on the bucket the current state falls in. 112 | bucket = get_bucket(current_state, goal_states[goal_array_dataset[i]][0]) 113 | # Now that we've the bucket, pick the option we should be executing given the bucket. 
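# If the state falls in the corner bucket (0,0), an option is drawn uniformly at
# random; otherwise, note that the goal-specific lookup below is immediately
# overwritten by the lookup into the unrotated policy_map on the following line.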
114 | 115 | if (bucket==0).all(): 116 | y_array_dataset[i,t] = np.random.randint(0,high=4) 117 | else: 118 | y_array_dataset[i,t] = goal_based_policy_maps[goal_array_dataset[i], bucket[0], bucket[1]] 119 | y_array_dataset[i,t] = policy_map[bucket[0], bucket[1]] 120 | reset_counter = 0 121 | else: 122 | reset_counter+=1 123 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 124 | 125 | # GET A 126 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]-0.1*(np.random.random((2))-0.5) 127 | 128 | # GET X 129 | # Already taking care of backwards generation here, no need to use action_compliments. 130 | 131 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 132 | 133 | plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 134 | # plt.scatter() 135 | plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(25)) 136 | plt.xlim(-lim,lim) 137 | plt.ylim(-lim,lim) 138 | plt.show() 139 | 140 | # Roll over b's. 141 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 142 | 143 | 144 | np.save("X_goal_directed.npy",x_array_dataset) 145 | np.save("Y_goal_directed.npy",y_array_dataset) 146 | np.save("B_goal_directed.npy",b_array_dataset) 147 | np.save("A_goal_directed.npy",a_array_dataset) 148 | np.save("G_goal_directed.npy",goal_array_dataset) 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | -------------------------------------------------------------------------------- /DataGenerator/PolicyVisualizer.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np, copy 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | number_datapoints = 20 12 | # number_datapoints = 50000 13 | number_timesteps = 25 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | 21 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 22 | # action_map = np.array([[-1,0],[0,-1],[1,0],[0,1]]) 23 | 24 | # start_states = np.array([[-2,-2],[-2,2],[2,-2],[2,2]])*5 25 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 26 | 27 | # Creating a policy map. 
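# The policy map is a 9x9 grid of option indices filled in row by row below;
# each entry indexes into action_map, so the map encodes which of the four
# axis-aligned moves to take from each cell. The quiver and trajectory plots
# later in this script only visualise that map.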
28 | size = 9 29 | scale = 5 30 | policy_map = np.zeros((size,size),dtype=int) 31 | 32 | # Row wise assignment: 33 | policy_map[0,:] = 2 34 | 35 | policy_map[1,:7] = 2 36 | policy_map[1,7:] = 1 37 | 38 | policy_map[2:4,0] = 2 39 | policy_map[2:4,1:4] = 3 40 | policy_map[2:4,4:7] = 2 41 | policy_map[2:4,7:] = 1 42 | 43 | policy_map[4,:4] = 3 44 | policy_map[4,4] = 3 45 | policy_map[4,5:] = 1 46 | 47 | policy_map[5,:3] = 3 48 | policy_map[5,3:5] = 0 49 | policy_map[5,5:] = 1 50 | 51 | policy_map[6,:2] = 3 52 | policy_map[6,2:7] = 0 53 | policy_map[6,7:] = 1 54 | 55 | policy_map[7:,0] = 3 56 | policy_map[7:,1:7] = 0 57 | policy_map[7:,7:] = 1 58 | 59 | policy_map = np.transpose(policy_map) 60 | 61 | 62 | # x = np.meshgrid(range(9),range(9)) 63 | x = np.meshgrid(np.arange(9),np.arange(9)) 64 | dxdy = action_map[policy_map[x[0],x[1]]] 65 | 66 | traj = np.zeros((10,2)) 67 | traj[0] = [0,8] 68 | for t in range(9): 69 | # embed() 70 | action_index = policy_map[int(traj[t,0]),int(traj[t,1])] 71 | action = action_map[action_index] 72 | traj[t+1] = traj[t] + action 73 | print(action_index, action) 74 | 75 | plt.ylim(9,-1) 76 | plt.plot(traj[:,0],traj[:,1],'or') 77 | plt.plot(traj[:,0],traj[:,1],'r') 78 | 79 | plt.scatter(x[0],x[1]) 80 | for i in range(9): 81 | for j in range(9): 82 | plt.arrow(x[0][i,j],x[1][i,j],0.1*dxdy[i,j,0],0.1*dxdy[i,j,1],width=0.01) 83 | 84 | plt.show() 85 | 86 | # embed() 87 | 88 | # Transformed vis. 89 | size = 9 90 | scale = 5 91 | scaled_size = scale*size 92 | # policy_map = np.flipud(np.transpose(policy_map)) 93 | policy_map = np.transpose(policy_map) 94 | # goal_based_policy_maps = np.zeros((4,size,size),dtype=int) 95 | # goal_based_policy_maps[0] = copy.deepcopy(policy_map) 96 | # goal_based_policy_maps[1] = np.rot90(policy_map) 97 | # goal_based_policy_maps[2] = np.rot90(policy_map,k=2) 98 | # goal_based_policy_maps[3] = np.rot90(policy_map,k=3) 99 | 100 | def get_bucket(state, reference_state): 101 | # baseline = 4*np.ones(2) 102 | baseline = np.zeros(2) 103 | compensated_state = state - reference_state 104 | # compensated_state = (np.round(state - reference_state) + baseline).astype(int) 105 | 106 | scaled_size = scale*size 107 | x = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 108 | 109 | bucket = np.zeros((2)) 110 | 111 | bucket[0] = min(np.searchsorted(x,compensated_state[0]),size-1) 112 | bucket[1] = min(np.searchsorted(x,compensated_state[1]),size-1) 113 | 114 | return bucket.astype(int) 115 | 116 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*10 117 | 118 | # goal_index = 1 119 | # # meshrange = np.arange(-scaled_size/2,scaled_size/2+1,5) 120 | # meshrange = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 121 | # evalrange = (np.arange(-(size-1)/2,(size-1)/2+1)-1)*scale 122 | 123 | # x = np.meshgrid(goal_states[goal_index,0]+meshrange,goal_states[goal_index,1]+meshrange) 124 | 125 | # dxdy = np.zeros((9,9,2)) 126 | # # dxdy = action_map[policy_map[x[0],x[1]]] 127 | # plt.scatter(x[0],x[1]) 128 | # plt.ylim(50,-50) 129 | 130 | # arr = np.zeros((9,9,2)) 131 | 132 | # for i in range(9): 133 | # for j in range(9): 134 | # a = goal_states[goal_index,0]+evalrange[i] 135 | # b = goal_states[goal_index,1]+evalrange[j] 136 | # bucket = get_bucket(np.array([a,b]), goal_states[goal_index]) 137 | # arr[i,j,0] = i 138 | # arr[i,j,1] = j 139 | # dxdy[bucket[0],bucket[1]] = action_map[policy_map[bucket[0],bucket[1]]] 140 | # plt.arrow(x[0][i,j],x[1][i,j],0.1*dxdy[i,j,0],0.1*dxdy[i,j,1],width=0.01*scale) 141 | 142 | # plt.show() 143 | 144 | for goal_index in 
range(4): 145 | # embed() 146 | # meshrange = np.arange(-scaled_size/2,scaled_size/2+1,5) 147 | meshrange = (np.arange(-(size-1)/2,(size-1)/2+1)-0.5)*scale 148 | evalrange = (np.arange(-(size-1)/2,(size-1)/2+1)-1)*scale 149 | 150 | x = np.meshgrid(goal_states[goal_index,0]+meshrange,goal_states[goal_index,1]+meshrange) 151 | 152 | dxdy = np.zeros((9,9,2)) 153 | # dxdy = action_map[policy_map[x[0],x[1]]] 154 | plt.scatter(x[0],x[1]) 155 | plt.ylim(50,-50) 156 | plt.xlim(-50,50) 157 | 158 | arr = np.zeros((9,9,2)) 159 | 160 | for i in range(9): 161 | for j in range(9): 162 | a = goal_states[goal_index,0]+evalrange[i] 163 | b = goal_states[goal_index,1]+evalrange[j] 164 | bucket = get_bucket(np.array([a,b]), goal_states[goal_index]) 165 | arr[i,j,0] = i 166 | arr[i,j,1] = j 167 | # dxdy[bucket[0],bucket[1]] = action_map[goal_based_policy_maps[goal_index,bucket[0],bucket[1]]] 168 | dxdy[bucket[0],bucket[1]] = action_map[policy_map[bucket[0],bucket[1]]] 169 | # plt.arrow(x[0][i,j],x[1][i,j],0.1*dxdy[i,j,0],0.1*dxdy[i,j,1],width=0.01*scale) 170 | 171 | # plt.quiver(x[0],x[1],0.1*dxdy[:,:,1],0.1*dxdy[:,:,0],width=0.0001,headwidth=4,headlength=2) 172 | plt.quiver(x[0],x[1],0.1*dxdy[:,:,1],0.1*dxdy[:,:,0]) 173 | 174 | traj_len = 20 175 | traj = np.zeros((20,2)) 176 | traj[0] = np.random.randint(-25,high=25,size=2) 177 | 178 | for t in range(traj_len-1): 179 | 180 | bucket = get_bucket(traj[t], goal_states[goal_index]) 181 | action_index = policy_map[bucket[0],bucket[1]] 182 | action = action_map[action_index] 183 | traj[t+1] = traj[t] + action 184 | 185 | plt.plot(traj[:,0],traj[:,1],'r') 186 | plt.plot(traj[:,0],traj[:,1],'or') 187 | 188 | plt.show() 189 | 190 | -------------------------------------------------------------------------------- /DataGenerator/S_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/S_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/SeparableTrajs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
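# Overview: generates trajectories whose option ordering depends on a sampled
# start configuration. Each of the four goal corners has two valid options, and
# progression_of_options fixes the order in which those two options are used
# for each of the five start offsets, yielding "separable" skill sequences.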
6 | 7 | import numpy as np 8 | from IPython import embed 9 | import matplotlib.pyplot as plt 10 | 11 | # number_datapoints = 20 12 | number_datapoints = 50000 13 | number_timesteps = 20 14 | 15 | x_array_dataset = np.zeros((number_datapoints, number_timesteps, 2)) 16 | a_array_dataset = np.zeros((number_datapoints, number_timesteps-1, 2)) 17 | y_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 18 | b_array_dataset = np.zeros((number_datapoints, number_timesteps-1),dtype=int) 19 | goal_array_dataset = np.zeros((number_datapoints, 1),dtype=int) 20 | start_config_dataset = np.zeros((number_datapoints, 1),dtype=int) 21 | 22 | action_map = np.array([[0,-1],[-1,0],[0,1],[1,0]]) 23 | start_scale = 15 24 | start_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*start_scale 25 | goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 26 | scale = 5 27 | start_configs = np.zeros((4,5,2),dtype=int) 28 | start_configs[[0,3]] = np.array([[-2,2],[-1,1],[0,0],[1,-1],[2,-2]])*scale 29 | start_configs[[1,2]] = np.array([[-2,-2],[-1,-1],[0,0],[1,1],[2,2]])*scale 30 | 31 | # valid_options = np.array([[2,3],[3,0],[1,2],[0,1]]) 32 | valid_options = np.array([[3,2],[3,0],[2,1],[0,1]]) 33 | lim = 50 34 | 35 | progression_of_options = np.zeros((5,4),dtype=int) 36 | progression_of_options[1,0] = 1 37 | progression_of_options[2,:2] = 1 38 | progression_of_options[3,1:] = 1 39 | progression_of_options[4,:] = 1 40 | 41 | for i in range(number_datapoints): 42 | 43 | if i%1000==0: 44 | print("Processing Datapoint: ",i) 45 | 46 | goal_array_dataset[i] = np.random.random_integers(0,high=3) 47 | start_config_dataset[i] = np.random.random_integers(0,high=4) 48 | # start_config_dataset[i] = 4 49 | 50 | # Adding random noise to start state. 51 | x_array_dataset[i,0] = start_states[goal_array_dataset[i]] + start_configs[goal_array_dataset[i],start_config_dataset[i]] + 0.1*(np.random.random(2)-0.5) 52 | 53 | reset_counter = 0 54 | option_counter = 0 55 | 56 | for t in range(number_timesteps-1): 57 | 58 | # GET B 59 | if t==0: 60 | b_array_dataset[i,t] = 1 61 | if t>0: 62 | # If 3,4,5 timesteps have passed, terminate. 63 | if reset_counter>=3 and reset_counter<5: 64 | b_array_dataset[i,t] = np.random.binomial(1,0.33) 65 | elif reset_counter==5: 66 | b_array_dataset[i,t] = 1 67 | 68 | # GET Y 69 | if b_array_dataset[i,t]: 70 | current_state = x_array_dataset[i,t] 71 | 72 | # select new y_array_dataset[i,t] 73 | y_array_dataset[i,t] = valid_options[goal_array_dataset[i]][0][progression_of_options[start_config_dataset[i],min(option_counter,3)]] 74 | 75 | option_counter+=1 76 | reset_counter = 0 77 | else: 78 | reset_counter+=1 79 | y_array_dataset[i,t] = y_array_dataset[i,t-1] 80 | 81 | # GET A 82 | a_array_dataset[i,t] = action_map[y_array_dataset[i,t]]+0.1*(np.random.random((2))-0.5) 83 | 84 | # GET X 85 | # Already taking care of backwards generation here, no need to use action_compliments. 86 | 87 | x_array_dataset[i,t+1] = x_array_dataset[i,t]+a_array_dataset[i,t] 88 | 89 | # plt.scatter(goal_states[:,0],goal_states[:,1],s=50) 90 | # # plt.scatter() 91 | # plt.scatter(x_array_dataset[i,:,0],x_array_dataset[i,:,1],cmap='jet',c=range(number_timesteps)) 92 | # plt.xlim(-lim,lim) 93 | # plt.ylim(-lim,lim) 94 | # plt.show() 95 | 96 | 97 | # Roll over b's. 
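# np.roll with shift=1 along the time axis moves every boundary flag one step
# later (b_new[:, t] = b_old[:, t-1], the final flag wrapping around to t=0),
# changing which timestep within a segment carries the b=1 marker before saving.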
98 | b_array_dataset = np.roll(b_array_dataset,1,axis=1) 99 | 100 | 101 | np.save("X_separable.npy",x_array_dataset) 102 | np.save("Y_separable.npy",y_array_dataset) 103 | np.save("B_separable.npy",b_array_dataset) 104 | np.save("A_separable.npy",a_array_dataset) 105 | np.save("G_separable.npy",goal_array_dataset) 106 | np.save("StartConfig_separable.npy",start_config_dataset) 107 | -------------------------------------------------------------------------------- /DataGenerator/X_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/X_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/X_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/X_goal_directed.npy -------------------------------------------------------------------------------- /DataGenerator/Y_array_newcont_cond.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/Y_array_newcont_cond.npy -------------------------------------------------------------------------------- /DataGenerator/Y_goal_directed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataGenerator/Y_goal_directed.npy -------------------------------------------------------------------------------- /DataLoaders/GridWorld_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | class GridWorldDataset(Dataset): 10 | 11 | # Class implementing instance of dataset class for gridworld data. 12 | 13 | def __init__(self, dataset_directory): 14 | self.dataset_directory = dataset_directory 15 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 16 | 17 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 18 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 19 | 20 | def __len__(self): 21 | 22 | # Find out how many images we've stored. 23 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 24 | 25 | # FOR NOW: USE ONLY till 3200 images. 26 | return 3200 27 | # return len(filelist) 28 | 29 | def parse_trajectory_actions(self, coordinate_trajectory): 30 | # Takes coordinate trajectory, returns action index taken. 31 | 32 | state_diffs = np.diff(coordinate_trajectory,axis=0) 33 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 34 | 35 | for i in range(len(state_diffs)): 36 | for k in range(len(self.action_map)): 37 | if (state_diffs[i]==self.action_map[k]).all(): 38 | action_sequence[i]=k 39 | 40 | return action_sequence.astype(float) 41 | 42 | def __getitem__(self, index): 43 | 44 | # The getitem function must return a Map-Trajectory pair. 
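# Concretely, the returned tuple is (image, coordinate_trajectory, action_sequence):
# the rendered map read with cv2.imread, the (T, 2) grid-coordinate trajectory,
# and the (T-1,) sequence of indices into self.action_map recovered by
# parse_trajectory_actions.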
45 | # We will handle per-timestep processes within our code. 46 | # Assumes index is within range [0,len(filelist)-1] 47 | image = cv2.imread(os.path.join(self.dataset_directory,"Image{0}.png".format(index))) 48 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Image{0}_Traj1.npy".format(index))).astype(float) 49 | 50 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 51 | 52 | return image, coordinate_trajectory, action_sequence -------------------------------------------------------------------------------- /DataLoaders/InteractiveDataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from .headers import * 11 | from . import MIME_DataLoader 12 | 13 | opts = flags.FLAGS 14 | 15 | def main(_): 16 | 17 | dataset = MIME_DataLoader.MIME_Dataset(opts) 18 | print("Created DataLoader.") 19 | 20 | embed() 21 | 22 | if __name__ == '__main__': 23 | app.run(main) -------------------------------------------------------------------------------- /DataLoaders/MIME_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | 14 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 15 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 16 | flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 17 | # flags.DEFINE_boolean('downsampling', True, 'Whether to downsample trajectories. ') 18 | flags.DEFINE_integer('ds_freq', 20, 'Downsample joint trajectories by this fraction. Original recroding rate = 100Hz') 19 | flags.DEFINE_boolean('remote', False, 'Whether operating from a remote server or not.') 20 | # opts = flags.FLAGS 21 | 22 | 23 | def select_baxter_angles(trajectory, joint_names, arm='right'): 24 | # joint names in order as used via mujoco visualizer 25 | baxter_joint_names = ['right_s0', 'right_s1', 'right_e0', 'right_e1', 'right_w0', 'right_w1', 'right_w2', 'left_s0', 'left_s1', 'left_e0', 'left_e1', 'left_w0', 'left_w1', 'left_w2'] 26 | if arm == 'right': 27 | select_joints = baxter_joint_names[:7] 28 | elif arm == 'left': 29 | select_joints = baxter_joint_names[7:] 30 | elif arm == 'both': 31 | select_joints = baxter_joint_names 32 | inds = [joint_names.index(j) for j in select_joints] 33 | return trajectory[:, inds] 34 | 35 | 36 | def resample(original_trajectory, desired_number_timepoints): 37 | original_traj_len = len(original_trajectory) 38 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 39 | return original_trajectory[new_timepoints] 40 | 41 | 42 | class MIME_Dataset(Dataset): 43 | ''' 44 | Class implementing instance of dataset class for MIME data. 
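Each element returned by __getitem__ is a dict holding downsampled joint-angle,
end-effector and gripper trajectories (plus left-arm/right-arm slices), the
source path prefix, and an is_valid flag that rejects recordings with large
jumps between consecutive joint-angle samples.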
45 | ''' 46 | def __init__(self, opts, split='all'): 47 | self.dataset_directory = opts.MIME_dir 48 | 49 | # Default: /checkpoint/tanmayshankar/MIME/ 50 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 51 | 52 | if opts.remote: 53 | self.suff_filelist = np.load(osp.join(self.dataset_directory,"Suffix_Filelist.npy")) 54 | self.filelist = [] 55 | for j in range(len(self.suff_filelist)): 56 | self.filelist.append(osp.join(self.dataset_directory,self.suff_filelist[j])) 57 | else: 58 | self.filelist = glob.glob(self.fulltext) 59 | 60 | self.ds_freq = opts.ds_freq 61 | 62 | with open(self.filelist[0], 'r') as file: 63 | lines = file.readlines() 64 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 65 | 66 | if split == 'all': 67 | self.filelist = self.filelist 68 | else: 69 | self.task_lists = np.load(os.path.join( 70 | self.dataset_directory, 'MIME_jointangles/{}_Lists.npy'.format(split.capitalize()))) 71 | 72 | self.filelist = [] 73 | for i in range(20): 74 | self.filelist.extend(self.task_lists[i]) 75 | self.filelist = [f.replace('/checkpoint/tanmayshankar/MIME/', opts.MIME_dir) for f in self.filelist] 76 | # print(len(self.filelist)) 77 | 78 | def __len__(self): 79 | # Return length of file list. 80 | return len(self.filelist) 81 | 82 | def __getitem__(self, index): 83 | ''' 84 | # Returns Joint Angles as: 85 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 86 | # Assumes index is within range [0,len(filelist)-1] 87 | ''' 88 | file = self.filelist[index] 89 | 90 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 91 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 92 | 93 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 94 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 95 | 96 | joint_angle_trajectory = [] 97 | # Open file. 98 | with open(file, 'r') as file: 99 | lines = file.readlines() 100 | for line in lines: 101 | dict_element = eval(line.rstrip('\n')) 102 | if len(dict_element.keys()) == len(self.joint_names): 103 | # some files have extra lines with gripper keys e.g. MIME_jointangles/4/12405Nov19/joint_angles.txt 104 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 105 | joint_angle_trajectory.append(array_element) 106 | 107 | joint_angle_trajectory = np.array(joint_angle_trajectory) 108 | 109 | n_samples = len(orig_left_traj) // self.ds_freq 110 | 111 | elem = {} 112 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 113 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 114 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 115 | elem['left_gripper'] = resample(left_gripper, n_samples)/100 116 | elem['right_gripper'] = resample(right_gripper, n_samples)/100 117 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 118 | elem['ra_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='right') 119 | elem['la_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='left') 120 | # If max norm of differences is <1.0, valid. 
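# i.e. a demonstration is flagged invalid when any two consecutive (downsampled)
# joint-angle samples differ by more than 1.0 in Euclidean norm.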
121 | elem['is_valid'] = int(np.linalg.norm(np.diff(elem['joint_angle_trajectory'],axis=0),axis=1).max() < 1.0) 122 | 123 | return elem 124 | 125 | def recreate_dictionary(self, arm, joint_angles): 126 | if arm=="left": 127 | offset = 2 128 | width = 7 129 | elif arm=="right": 130 | offset = 9 131 | width = 7 132 | elif arm=="full": 133 | offset = 0 134 | width = len(self.joint_names) 135 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 136 | 137 | # ------------ Data Loader ----------- # 138 | # ------------------------------------ # 139 | def data_loader(opts, split='all', shuffle=True): 140 | dset = MIME_Dataset(opts, split=split) 141 | 142 | return DataLoader( 143 | dset, 144 | batch_size=opts.batch_size, 145 | shuffle=shuffle, 146 | num_workers=opts.n_data_workers, 147 | drop_last=True) 148 | -------------------------------------------------------------------------------- /DataLoaders/MIME_DataLoader.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/MIME_DataLoader.pyc -------------------------------------------------------------------------------- /DataLoaders/MIME_Img_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | import pdb 14 | import scipy.misc 15 | 16 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 17 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 18 | flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 19 | flags.DEFINE_string('MIME_imgs_dir', '/checkpoint/shubhtuls/data/MIME/', 'Data Directory') 20 | flags.DEFINE_integer('img_h', 64, 'Height') 21 | flags.DEFINE_integer('img_w', 128, 'Width') 22 | flags.DEFINE_integer('ds_freq', 20, 'Downsample joint trajectories by this fraction. Original recroding rate = 100Hz') 23 | 24 | 25 | def resample(original_trajectory, desired_number_timepoints): 26 | original_traj_len = len(original_trajectory) 27 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 28 | return original_trajectory[new_timepoints] 29 | 30 | 31 | class MIME_Img_Dataset(Dataset): 32 | ''' 33 | Class implementing instance of dataset class for MIME data. 
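In addition to the downsampled joint-angle and gripper trajectories, each
element carries three frames of the demonstration (first, middle and last),
resized to (img_h, img_w).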
34 | ''' 35 | def __init__(self, opts, split='all'): 36 | self.dataset_directory = opts.MIME_dir 37 | self.imgs_dataset_directory = opts.MIME_imgs_dir 38 | self.img_h = opts.img_h 39 | self.img_w = opts.img_w 40 | 41 | # Default: /checkpoint/tanmayshankar/MIME/ 42 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 43 | self.filelist = glob.glob(self.fulltext) 44 | 45 | self.ds_freq = opts.ds_freq 46 | 47 | with open(self.filelist[0], 'r') as file: 48 | lines = file.readlines() 49 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 50 | 51 | if split == 'all': 52 | self.filelist = self.filelist 53 | else: 54 | self.task_lists = np.load(os.path.join( 55 | self.dataset_directory, 'MIME_jointangles/{}_Lists.npy'.format(split.capitalize()))) 56 | self.filelist = [] 57 | for i in range(20): 58 | self.filelist.extend(self.task_lists[i]) 59 | self.filelist = [f.replace('/checkpoint/tanmayshankar/MIME/', opts.MIME_dir) for f in self.filelist] 60 | 61 | def __len__(self): 62 | # Return length of file list. 63 | return len(self.filelist) 64 | 65 | def __getitem__(self, index): 66 | ''' 67 | # Returns Joint Angles as: 68 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 69 | # Assumes index is within range [0,len(filelist)-1] 70 | ''' 71 | file = self.filelist[index] 72 | file_split = file.split('/') 73 | frames_folder = osp.join(self.imgs_dataset_directory, file_split[-3], file_split[-2], 'frames') 74 | n_frames = len(os.listdir(frames_folder)) 75 | 76 | imgs = [] 77 | frame_inds = [0, n_frames//2, n_frames-1] 78 | for fi in frame_inds: 79 | img = scipy.misc.imread(osp.join(frames_folder, 'im_{}.png'.format(fi+1))) 80 | imgs.append(scipy.misc.imresize(img, (self.img_h, self.img_w))) 81 | imgs = np.stack(imgs) 82 | 83 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 84 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 85 | 86 | joint_angle_trajectory = [] 87 | # Open file. 
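# Each line of joint_angles.txt is a Python dict literal mapping joint name to
# angle, so it is parsed with eval(); lines that also carry gripper keys are
# skipped by the length check below. (ast.literal_eval would parse the same
# literals without executing arbitrary code, if the files cannot be trusted.)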
88 | with open(file, 'r') as file: 89 | lines = file.readlines() 90 | for line in lines: 91 | dict_element = eval(line.rstrip('\n')) 92 | if len(dict_element.keys()) == len(self.joint_names): 93 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 94 | joint_angle_trajectory.append(array_element) 95 | 96 | joint_angle_trajectory = np.array(joint_angle_trajectory) 97 | 98 | n_samples = len(joint_angle_trajectory) // self.ds_freq 99 | 100 | elem = {} 101 | elem['imgs'] = imgs 102 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 103 | elem['left_gripper'] = resample(left_gripper, n_samples)/100 104 | elem['right_gripper'] = resample(right_gripper, n_samples)/100 105 | elem['is_valid'] = int(np.linalg.norm(np.diff(elem['joint_angle_trajectory'],axis=0),axis=1).max() < 1.0) 106 | 107 | return elem 108 | 109 | def recreate_dictionary(self, arm, joint_angles): 110 | if arm=="left": 111 | offset = 2 112 | width = 7 113 | elif arm=="right": 114 | offset = 9 115 | width = 7 116 | elif arm=="full": 117 | offset = 0 118 | width = len(self.joint_names) 119 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 120 | 121 | # ------------ Data Loader ----------- # 122 | # ------------------------------------ # 123 | 124 | def data_loader(opts, split='all', shuffle=True): 125 | dset = MIME_Img_Dataset(opts, split=split) 126 | 127 | return DataLoader( 128 | dset, 129 | batch_size=opts.batch_size, 130 | shuffle=shuffle, 131 | num_workers=opts.n_data_workers, 132 | drop_last=True) 133 | -------------------------------------------------------------------------------- /DataLoaders/MIMEandPlan_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | 14 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 15 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 16 | flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 17 | # flags.DEFINE_boolean('downsampling', True, 'Whether to downsample trajectories. ') 18 | flags.DEFINE_integer('ds_freq', 20, 'Downsample joint trajectories by this fraction. Original recroding rate = 100Hz') 19 | flags.DEFINE_boolean('remote', False, 'Whether operating from a remote server or not.') 20 | # opts = flags.FLAGS 21 | 22 | def resample(original_trajectory, desired_number_timepoints): 23 | original_traj_len = len(original_trajectory) 24 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 25 | return original_trajectory[new_timepoints] 26 | 27 | class MIME_Dataset(Dataset): 28 | ''' 29 | Class implementing instance of dataset class for MIME data. 
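    This variant additionally exposes pre-computed motion plans: when a plan run
    index is passed, the returned element carries 'EE_Plan' and 'JA_Plan' arrays
    loaded from the New_Plans folder stored next to each demonstration. A minimal
    sketch, assuming parsed flags in `opts` and that run 0 exists (the run index
    here is only illustrative):

        dset = MIME_Dataset(opts)
        dset.setup_splits()        # builds the train/val/test file lists
        elem = dset.getit(0, split='train', return_plan_run=0)
        ja_plan, ee_plan = elem['JA_Plan'], elem['EE_Plan']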
30 | ''' 31 | def __init__(self, opts): 32 | self.dataset_directory = opts.MIME_dir 33 | 34 | # Default: /checkpoint/tanmayshankar/MIME/ 35 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 36 | 37 | if opts.remote: 38 | self.suff_filelist = np.load(osp.join(self.dataset_directory,"Suffix_Filelist.npy")) 39 | self.filelist = [] 40 | for j in range(len(self.suff_filelist)): 41 | self.filelist.append(osp.join(self.dataset_directory,self.suff_filelist[j])) 42 | else: 43 | self.filelist = sorted(glob.glob(self.fulltext)) 44 | 45 | self.ds_freq = opts.ds_freq 46 | 47 | with open(self.filelist[0], 'r') as file: 48 | print(self.filelist[0]) 49 | lines = file.readlines() 50 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 51 | 52 | self.train_lists = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Train_Lists.npy")) 53 | self.val_lists = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Val_Lists.npy")) 54 | self.test_lists = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Test_Lists.npy")) 55 | 56 | def __len__(self): 57 | # Return length of file list. 58 | return len(self.filelist) 59 | 60 | def setup_splits(self): 61 | self.train_filelist = [] 62 | self.val_filelist = [] 63 | self.test_filelist = [] 64 | 65 | for i in range(20): 66 | self.train_filelist.extend(self.train_lists[i]) 67 | self.val_filelist.extend(self.val_lists[i]) 68 | self.test_filelist.extend(self.test_lists[i]) 69 | 70 | def getit(self, index, split=None, return_plan_run=None): 71 | ''' 72 | # Returns Joint Angles as: 73 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 74 | # Assumes index is within range [0,len(filelist)-1] 75 | ''' 76 | 77 | if split=="train": 78 | file = self.train_filelist[index] 79 | elif split=="val": 80 | file = self.val_filelist[index] 81 | elif split=="test": 82 | file = self.test_filelist[index] 83 | elif split is None: 84 | file = self.filelist[index] 85 | 86 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 87 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 88 | 89 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 90 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 91 | 92 | joint_angle_trajectory = [] 93 | 94 | folder = "New_Plans" 95 | if return_plan_run is not None: 96 | ee_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_EE_Plan.npy".format(folder,return_plan_run))) 97 | ja_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_Joint_Plan.npy".format(folder,return_plan_run))) 98 | 99 | # Open file. 100 | with open(file, 'r') as file: 101 | lines = file.readlines() 102 | for line in lines: 103 | dict_element = eval(line.rstrip('\n')) 104 | if len(dict_element.keys()) == len(self.joint_names): 105 | # some files have extra lines with gripper keys e.g. 
MIME_jointangles/4/12405Nov19/joint_angles.txt 106 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 107 | joint_angle_trajectory.append(array_element) 108 | 109 | joint_angle_trajectory = np.array(joint_angle_trajectory) 110 | 111 | n_samples = len(orig_left_traj) // self.ds_freq 112 | 113 | elem = {} 114 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 115 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 116 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 117 | elem['left_gripper'] = resample(left_gripper, n_samples) 118 | elem['right_gripper'] = resample(right_gripper, n_samples) 119 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 120 | elem['JA_Plan'] = ja_plan 121 | elem['EE_Plan'] = ee_plan 122 | 123 | return elem 124 | 125 | 126 | def __getitem__(self, index, split=None, return_plan_run=None): 127 | # def __getitem__(self, inputs): 128 | ''' 129 | # Returns Joint Angles as: 130 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 131 | # Assumes index is within range [0,len(filelist)-1] 132 | ''' 133 | 134 | if split=="train": 135 | file = self.train_filelist[index] 136 | elif split=="val": 137 | file = self.val_filelist[index] 138 | elif split=="test": 139 | file = self.test_filelist[index] 140 | elif split is None: 141 | file = self.filelist[index] 142 | 143 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 144 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 145 | 146 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 147 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 148 | 149 | joint_angle_trajectory = [] 150 | 151 | folder = "New_Plans" 152 | if return_plan_run is not None: 153 | ee_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_EE_Plan.npy".format(folder,return_plan_run))) 154 | ja_plan = np.load(os.path.join(os.path.split(file)[0],"{0}/Run{1}_JA_Plan.npy".format(folder,return_plan_run))) 155 | 156 | # Open file. 157 | with open(file, 'r') as file: 158 | lines = file.readlines() 159 | for line in lines: 160 | dict_element = eval(line.rstrip('\n')) 161 | if len(dict_element.keys()) == len(self.joint_names): 162 | # some files have extra lines with gripper keys e.g. 
MIME_jointangles/4/12405Nov19/joint_angles.txt 163 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 164 | joint_angle_trajectory.append(array_element) 165 | 166 | joint_angle_trajectory = np.array(joint_angle_trajectory) 167 | 168 | n_samples = len(orig_left_traj) // self.ds_freq 169 | 170 | elem = {} 171 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 172 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 173 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 174 | elem['left_gripper'] = resample(left_gripper, n_samples) 175 | elem['right_gripper'] = resample(right_gripper, n_samples) 176 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 177 | elem['JA_Plan'] = ja_plan 178 | elem['EE_Plan'] = ee_plan 179 | 180 | return elem 181 | 182 | def recreate_dictionary(self, arm, joint_angles): 183 | if arm=="left": 184 | offset = 2 185 | width = 7 186 | elif arm=="right": 187 | offset = 9 188 | width = 7 189 | elif arm=="full": 190 | offset = 0 191 | width = len(self.joint_names) 192 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 193 | 194 | # ------------ Data Loader ----------- # 195 | # ------------------------------------ # 196 | def data_loader(opts, shuffle=True): 197 | dset = MIME_Dataset(opts) 198 | 199 | return DataLoader( 200 | dset, 201 | batch_size=opts.batch_size, 202 | shuffle=shuffle, 203 | num_workers=opts.n_data_workers, 204 | drop_last=True) 205 | -------------------------------------------------------------------------------- /DataLoaders/Plan_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .headers import * 12 | import os.path as osp 13 | import pdb 14 | 15 | # flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 16 | # flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 17 | # flags.DEFINE_string('MIME_dir', '/checkpoint/tanmayshankar/MIME/', 'Data Directory') 18 | flags.DEFINE_enum('arm', 'both', ['left', 'right', 'both'], 'Which arms data to load') 19 | 20 | class Plan_Dataset(Dataset): 21 | ''' 22 | Class implementing instance of dataset class for MIME data. 23 | ''' 24 | def __init__(self, opts, split='all'): 25 | self.opts = opts 26 | self.split = split 27 | self.dataset_directory = self.opts.MIME_dir 28 | 29 | # # Must consider permutations of arm and split. 30 | # Right Arm: New_Plans / Run*_EE_Plan 31 | # / Run*_Joint_Plan 32 | # / Run*_RG_Traj 33 | 34 | # Left Arm: New_Plans_Left / Run*_EE_Plan 35 | # / Run*_Joint_Plan 36 | # / Run*_LG_traj 37 | 38 | # Both Arms: Ambidextrous_Plans / Run*_EE_Plan 39 | # / Run*_Joint_Plan 40 | # / Run*_Grip_Traj 41 | 42 | # Set these parameters to replace. 
43 | if self.opts.arm=='left': 44 | folder = 'New_Plans' 45 | gripper_suffix = "_LG_Traj" 46 | elif self.opts.arm=='right': 47 | folder = 'New_Plans_Left' 48 | gripper_suffix = "_RG_Traj" 49 | elif self.opts.arm=='both': 50 | folder = 'Ambidextrous_Plans' 51 | gripper_suffix = "_Grip_Traj" 52 | 53 | # Default: /checkpoint/tanmayshankar/MIME/ 54 | 55 | if self.split=='all': 56 | # Collect list of all EE Plans, we will select all Joint Angle Plans correspondingly. 57 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/New_Plans/Run*_EE_Plan.npy') 58 | # Joint angle plans filelist is in same order thanks to glob. 59 | self.jatext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/New_Plans/Run*_Joint_Plan.npy') 60 | # Gripper plans filelist is in same order thanks to glob. 61 | # self.rgtext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/New_Plans/Run*_RG_Traj.npy') 62 | 63 | self.filelist = sorted(glob.glob(self.fulltext)) 64 | self.joint_filelist = sorted(glob.glob(self.jatext)) 65 | # self.gripper_filelist = sorted(glob.glob(self.rgtext)) 66 | 67 | elif self.split=='train': 68 | self.filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanTrainList.npy")) 69 | self.joint_filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanJointTrainList.npy")) 70 | elif self.split=='val': 71 | self.filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanValList.npy")) 72 | self.joint_filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanJointValList.npy")) 73 | elif self.split=='test': 74 | self.filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanTestList.npy")) 75 | self.joint_filelist = np.load(os.path.join(self.dataset_directory,"MIME_jointangles/Plan_Lists/PlanJointTestList.npy")) 76 | 77 | # the loaded np arrays give byte strings, and not strings, which breaks later code 78 | if not isinstance(self.filelist[0], str): 79 | self.filelist = [f.decode() for f in self.filelist] 80 | self.joint_filelist = [f.decode() for f in self.joint_filelist] 81 | 82 | # Now replace terms in filelists based on what arm it is. 83 | # The EE file list only needs folder replaced. 84 | self.filelist = [f.replace("New_Plans",folder).replace('/checkpoint/tanmayshankar/MIME',self.opts.MIME_dir) for f in self.filelist] 85 | # The Joint file list also only needs folder replaced. 86 | self.joint_filelist = [f.replace("New_Plans",folder).replace('/checkpoint/tanmayshankar/MIME',self.opts.MIME_dir) for f in self.joint_filelist] 87 | # Since we didn't create split lists for Gripper, use the filelist and replace to Gripper. 88 | self.gripper_filelist = [f.replace("New_Plans",folder).replace("_EE_Plan",gripper_suffix).replace('/checkpoint/tanmayshankar/MIME',self.opts.MIME_dir) for f in self.filelist] 89 | 90 | # Set joint names. 91 | self.left_joint_names = ['left_s0','left_s1','left_e0','left_e1','left_w0','left_w1','left_w2'] 92 | self.right_joint_names = ['right_s0','right_s1','right_e0','right_e1','right_w0','right_w1','right_w2'] 93 | self.both_joint_names = self.left_joint_names+self.right_joint_names 94 | 95 | def __len__(self): 96 | # Return length of file list. 97 | return len(self.filelist) 98 | 99 | def __getitem__(self, index): 100 | 101 | file = self.filelist[index] 102 | joint_file = self.joint_filelist[index] 103 | gripper_file = self.gripper_filelist[index] 104 | 105 | # Load items. 
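# Each element pairs the end-effector plan with the matching joint-angle plan and
# gripper trajectory; the three file lists built above are index-aligned, and the
# gripper values are divided by 100 below, matching the 0-100 gripper range used
# elsewhere in these loaders. A minimal usage sketch (assuming parsed flags in
# `opts`):
#
#   loader = data_loader(opts, split='train')
#   for batch in loader:
#       ee, ja, grip = batch['EE_Plan'], batch['JA_Plan'], batch['Grip_Plan']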
106 | elem = {} 107 | elem['EE_Plan'] = np.load(file) 108 | elem['JA_Plan'] = np.load(joint_file) 109 | elem['Grip_Plan'] = np.load(gripper_file)/100 110 | 111 | return elem 112 | 113 | # ------------ Data Loader ----------- # 114 | # ------------------------------------ # 115 | def data_loader(opts, split='all', shuffle=True): 116 | dset = Plan_Dataset(opts, split=split) 117 | 118 | return DataLoader( 119 | dset, 120 | batch_size=opts.batch_size, 121 | shuffle=shuffle, 122 | num_workers=opts.n_data_workers, 123 | drop_last=True) 124 | 125 | -------------------------------------------------------------------------------- /DataLoaders/RandomWalks.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import sys 12 | import os 13 | import random as stdlib_random, string 14 | 15 | import matplotlib 16 | matplotlib.use('Agg') 17 | import matplotlib.pyplot as plt 18 | 19 | 20 | import numpy as np 21 | 22 | from absl import flags, app 23 | 24 | import torch 25 | from torch.utils.data import Dataset 26 | from torch.utils.data import DataLoader 27 | from torch.utils.data.dataloader import default_collate 28 | from ..utils import plotting as plot_util 29 | 30 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 31 | flags.DEFINE_integer('batch_size', 1, 'Batch size. Code currently only handles bs=1') 32 | flags.DEFINE_integer('n_segments_min', 4, 'Min Number of gt segments per trajectory') 33 | flags.DEFINE_integer('n_segments_max', 4, 'Max number of gt segments per trajectory') 34 | 35 | dirs_2d = np.array([ 36 | [1,0], 37 | [0,1], 38 | [-1,0], 39 | [0,-1] 40 | ]) 41 | 42 | 43 | def vis_walk(walk): 44 | ''' 45 | Args: 46 | walk: (nT+1) X 2 array 47 | Returns: 48 | im: 200 X 200 X 4 numpy array 49 | ''' 50 | 51 | t = walk.shape[0] 52 | xs = walk[:,0] 53 | ys = walk[:,1] 54 | color_inds = np.linspace(0, 255, t).astype(np.int).tolist() 55 | cs = plot_util.colormap[color_inds, :] 56 | 57 | fig = plt.figure(figsize=(4, 4), dpi=50) 58 | ax = fig.subplots() 59 | 60 | ax.scatter(xs, ys, c=cs) 61 | ax.set_xlim(-2, 2) 62 | ax.set_ylim(-2, 2) 63 | ax.set_aspect('equal', 'box') 64 | 65 | ax.tick_params( 66 | axis='x', 67 | which='both', 68 | bottom=False, 69 | top=False, 70 | labelbottom=False) 71 | 72 | ax.tick_params( 73 | axis='y', 74 | which='both', 75 | left=False, 76 | right=False, 77 | labelleft=False) 78 | 79 | fig.tight_layout() 80 | fname = '/tmp/' + ''.join(stdlib_random.choices(string.ascii_letters, k=8)) + '.png' 81 | fig.savefig(fname) 82 | plt.close(fig) 83 | 84 | im = plt.imread(fname) 85 | os.remove(fname) 86 | 87 | return im 88 | 89 | 90 | def walk_segment(origin, direction, n_steps=10, step_size=0.1, noise=0.02, rng=None): 91 | ''' 92 | Args: 93 | origin: nd numpy array 94 | direction: nd numpy array with unit norm 95 | n_steps: length of time seq 96 | step_size: size of each step 97 | noise: magintude of max actuation noise 98 | Returns: 99 | segment: n_steps X nd array 100 | note that the first position in segment is different from origin 101 | ''' 102 | if rng is None: 103 | rng = np.random 104 | 105 | nd = origin.shape[0] 106 | segment = np.zeros((n_steps, nd)) + origin 107 | segment += 
np.arange(1, n_steps+1).reshape((-1,1))*direction*step_size 108 | segment += rng.uniform(low=-1, high=1, size=(n_steps, nd)) * noise/nd 109 | return segment 110 | 111 | 112 | def random_walk2d(origin, num_segments=4, rng=None): 113 | ''' 114 | Args: 115 | origin: 2d numpy array 116 | num_segments: length of time seq 117 | Returns: 118 | walk: (nT+1) X 2 array 119 | ''' 120 | if rng is None: 121 | rng = np.random 122 | 123 | dir_ind = rng.randint(4) 124 | walk = origin.reshape(1,2) 125 | seg_lengths = [] 126 | for s in range(num_segments): 127 | seg_length = rng.randint(6,10) 128 | seg_lengths.append(seg_length) 129 | step_size = 0.1 + (rng.uniform() - 0.5)*0.05 130 | 131 | segment = walk_segment(origin, dirs_2d[dir_ind], n_steps=seg_length, step_size=step_size, rng=rng) 132 | origin = segment[-1] 133 | walk = np.concatenate((walk, segment), axis=0) 134 | 135 | dir_ind += 2 * rng.randint(2) -1 136 | dir_ind = dir_ind % 4 137 | 138 | return walk, seg_lengths 139 | 140 | 141 | class RandomWalksDataset(Dataset): 142 | 143 | def __init__(self, opts): 144 | self.opts = opts 145 | self.n_segments_min = self.opts.n_segments_min 146 | self.n_segments_max = self.opts.n_segments_max 147 | 148 | def __len__(self): 149 | return int(1e6) 150 | 151 | def __getitem__(self, ix): 152 | rng = np.random.RandomState(ix) 153 | ns = rng.randint(self.n_segments_min, self.n_segments_max+1) 154 | trajectory, self.seg_lengths_ix = random_walk2d(np.zeros(2), num_segments=ns, rng=rng) 155 | return trajectory 156 | 157 | # ------------ Data Loader ----------- # 158 | # ------------------------------------ # 159 | def data_loader(opts, shuffle=True): 160 | dset = RandomWalksDataset(opts) 161 | 162 | return DataLoader( 163 | dset, 164 | batch_size=opts.batch_size, 165 | shuffle=shuffle, 166 | num_workers=opts.n_data_workers, 167 | drop_last=True) 168 | 169 | 170 | if __name__ == '__main__': 171 | walk = random_walk2d(np.zeros(2), num_segments=4) 172 | print(walk) 173 | -------------------------------------------------------------------------------- /DataLoaders/RandomWalks.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/RandomWalks.pyc -------------------------------------------------------------------------------- /DataLoaders/RoboturkeExp.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | """ 9 | A convenience script to playback random demonstrations from 10 | a set of demonstrations stored in a hdf5 file. 11 | Arguments: 12 | --folder (str): Path to demonstrations 13 | --use_actions (optional): If this flag is provided, the actions are played back 14 | through the MuJoCo simulator, instead of loading the simulator states 15 | one by one. 
16 | Example: 17 | $ python playback_demonstrations_from_hdf5.py --folder ../models/assets/demonstrations/SawyerPickPlace/ 18 | """ 19 | import os 20 | import h5py 21 | import argparse 22 | import random 23 | import numpy as np 24 | 25 | import robosuite 26 | from robosuite.utils.mjcf_utils import postprocess_model_xml 27 | from IPython import embed 28 | 29 | if __name__ == "__main__": 30 | parser = argparse.ArgumentParser() 31 | parser.add_argument( 32 | "--folder", 33 | type=str, 34 | default=os.path.join( 35 | robosuite.models.assets_root, "demonstrations/SawyerNutAssembly" 36 | ), 37 | ) 38 | parser.add_argument( 39 | "--use-actions", 40 | action='store_true', 41 | ) 42 | args = parser.parse_args() 43 | 44 | demo_path = args.folder 45 | hdf5_path = os.path.join(demo_path, "demo.hdf5") 46 | f = h5py.File(hdf5_path, "r") 47 | env_name = f["data"].attrs["env"] 48 | 49 | env = robosuite.make( 50 | env_name, 51 | has_renderer=False, 52 | # has_renderer=True, 53 | ignore_done=True, 54 | use_camera_obs=False, 55 | gripper_visualization=True, 56 | reward_shaping=True, 57 | control_freq=100, 58 | ) 59 | 60 | # list of all demonstrations episodes 61 | demos = list(f["data"].keys()) 62 | 63 | while True: 64 | print("Playing back random episode... (press ESC to quit)") 65 | 66 | # # select an episode randomly 67 | ep = random.choice(demos) 68 | 69 | # read the model xml, using the metadata stored in the attribute for this episode 70 | model_file = f["data/{}".format(ep)].attrs["model_file"] 71 | model_path = os.path.join(demo_path, "models", model_file) 72 | with open(model_path, "r") as model_f: 73 | model_xml = model_f.read() 74 | 75 | env.reset() 76 | xml = postprocess_model_xml(model_xml) 77 | env.reset_from_xml_string(xml) 78 | env.sim.reset() 79 | # env.viewer.set_camera(0) 80 | 81 | # load the flattened mujoco states 82 | states = f["data/{}/states".format(ep)].value 83 | 84 | if args.use_actions: 85 | 86 | # load the initial state 87 | env.sim.set_state_from_flattened(states[0]) 88 | env.sim.forward() 89 | 90 | # load the actions and play them back open-loop 91 | jvels = f["data/{}/joint_velocities".format(ep)].value 92 | grip_acts = f["data/{}/gripper_actuations".format(ep)].value 93 | actions = np.concatenate([jvels, grip_acts], axis=1) 94 | num_actions = actions.shape[0] 95 | 96 | for j, action in enumerate(actions): 97 | env.step(action) 98 | # env.render() 99 | 100 | if j < num_actions - 1: 101 | # ensure that the actions deterministically lead to the same recorded states 102 | state_playback = env.sim.get_state().flatten() 103 | 104 | embed() 105 | assert(np.all(np.equal(states[j + 1], state_playback))) 106 | 107 | else: 108 | 109 | print("Embedding in not use actions branch") 110 | embed() 111 | # force the sequence of internal mujoco states one by one 112 | for state in states: 113 | env.sim.set_state_from_flattened(state) 114 | env.sim.forward() 115 | # env.render() 116 | 117 | f.close() -------------------------------------------------------------------------------- /DataLoaders/SmallMaps_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | class GridWorldDataset(Dataset): 10 | 11 | # Class implementing instance of dataset class for gridworld data. 
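# Trajectories are stored as grid coordinates; parse_trajectory_actions below
# converts each consecutive coordinate difference into one of eight discrete
# actions by matching it against action_map (up, down, left, right and the four
# diagonals). For example, consecutive positions (3, 4) -> (4, 5) give a state
# difference of [1, 1], which matches index 7, i.e. DOWNRIGHT.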
12 | 13 | def __init__(self, dataset_directory): 14 | self.dataset_directory = dataset_directory 15 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 16 | 17 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 18 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 19 | 20 | def __len__(self): 21 | 22 | # Find out how many images we've stored. 23 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 24 | return 4000 25 | # return len(filelist) 26 | 27 | def parse_trajectory_actions(self, coordinate_trajectory): 28 | # Takes coordinate trajectory, returns action index taken. 29 | 30 | state_diffs = np.diff(coordinate_trajectory,axis=0) 31 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 32 | 33 | for i in range(len(state_diffs)): 34 | for k in range(len(self.action_map)): 35 | if (state_diffs[i]==self.action_map[k]).all(): 36 | action_sequence[i]=k 37 | 38 | return action_sequence.astype(float) 39 | 40 | def __getitem__(self, index): 41 | 42 | # The getitem function must return a Map-Trajectory pair. 43 | # We will handle per-timestep processes within our code. 44 | # Assumes index is within range [0,len(filelist)-1] 45 | image = np.load(os.path.join(self.dataset_directory,"Map{0}.npy".format(index))) 46 | time_limit = 20 47 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Map{0}_Traj1.npy".format(index))).astype(float)[:time_limit] 48 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 49 | 50 | return image, coordinate_trajectory, action_sequence -------------------------------------------------------------------------------- /DataLoaders/Translation.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | from .headers import * 13 | import os.path as osp 14 | 15 | from io import open 16 | import unicodedata 17 | import string 18 | import re 19 | import random 20 | 21 | import torch 22 | import torch.nn as nn 23 | from torch import optim 24 | import torch.nn.functional as F 25 | 26 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 27 | 28 | flags.DEFINE_integer('n_data_workers', 4, 'Number of data loading workers') 29 | flags.DEFINE_integer('batch_size', 1, 'Batch size. 
Code currently only handles bs=1') 30 | flags.DEFINE_string('lang_dir', '/private/home/shubhtuls/code/sfd/cachedir/data/lang/', 'Data Directory') 31 | 32 | SOS_token = 0 33 | EOS_token = 1 34 | 35 | 36 | class Lang: 37 | def __init__(self, name): 38 | self.name = name 39 | self.word2index = {} 40 | self.word2count = {} 41 | self.index2word = {0: "SOS", 1: "EOS"} 42 | self.n_words = 2 # Count SOS and EOS 43 | 44 | def addSentence(self, sentence): 45 | for word in sentence.split(' '): 46 | self.addWord(word) 47 | 48 | def addWord(self, word): 49 | if word not in self.word2index: 50 | self.word2index[word] = self.n_words 51 | self.word2count[word] = 1 52 | self.index2word[self.n_words] = word 53 | self.n_words += 1 54 | else: 55 | self.word2count[word] += 1 56 | 57 | # Turn a Unicode string to plain ASCII, thanks to 58 | # https://stackoverflow.com/a/518232/2809427 59 | def unicodeToAscii(s): 60 | return ''.join( 61 | c for c in unicodedata.normalize('NFD', s) 62 | if unicodedata.category(c) != 'Mn' 63 | ) 64 | 65 | # Lowercase, trim, and remove non-letter characters 66 | def normalizeString(s): 67 | s = unicodeToAscii(s.lower().strip()) 68 | s = re.sub(r"([.!?])", r" \1", s) 69 | s = re.sub(r"[^a-zA-Z.!?]+", r" ", s) 70 | return s 71 | 72 | 73 | def readLangs(data_dir, lang1, lang2, reverse=False): 74 | print("Reading lines...") 75 | 76 | # Read the file and split into lines 77 | lines = open(osp.join(data_dir, '%s-%s.txt' % (lang1, lang2)), encoding='utf-8').\ 78 | read().strip().split('\n') 79 | 80 | # Split every line into pairs and normalize 81 | pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines] 82 | 83 | # Reverse pairs, make Lang instances 84 | if reverse: 85 | pairs = [list(reversed(p)) for p in pairs] 86 | input_lang = Lang(lang2) 87 | output_lang = Lang(lang1) 88 | else: 89 | input_lang = Lang(lang1) 90 | output_lang = Lang(lang2) 91 | 92 | return input_lang, output_lang, pairs 93 | 94 | 95 | MAX_LENGTH = 10 96 | 97 | eng_prefixes = ( 98 | "i am ", "i m ", 99 | "he is", "he s ", 100 | "she is", "she s ", 101 | "you are", "you re ", 102 | "we are", "we re ", 103 | "they are", "they re " 104 | ) 105 | 106 | 107 | def filterPair(p): 108 | return len(p[0].split(' ')) < MAX_LENGTH and \ 109 | len(p[1].split(' ')) < MAX_LENGTH 110 | # and \ 111 | # p[1].startswith(eng_prefixes) 112 | 113 | 114 | def filterPairs(pairs): 115 | return [pair for pair in pairs if filterPair(pair)] 116 | 117 | 118 | def prepareData(data_dir, lang1, lang2, reverse=False): 119 | input_lang, output_lang, pairs = readLangs(data_dir, lang1, lang2, reverse) 120 | print("Read %s sentence pairs" % len(pairs)) 121 | pairs = filterPairs(pairs) 122 | print("Trimmed to %s sentence pairs" % len(pairs)) 123 | print("Counting words...") 124 | for pair in pairs: 125 | input_lang.addSentence(pair[0]) 126 | output_lang.addSentence(pair[1]) 127 | print("Counted words:") 128 | print(input_lang.name, input_lang.n_words) 129 | print(output_lang.name, output_lang.n_words) 130 | return input_lang, output_lang, pairs 131 | 132 | 133 | def indexesFromSentence(lang, sentence): 134 | return [lang.word2index[word] for word in sentence.split(' ')] 135 | 136 | 137 | def tensorFromSentence(lang, sentence): 138 | indexes = indexesFromSentence(lang, sentence) 139 | indexes.append(EOS_token) 140 | return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1) 141 | 142 | 143 | class TranslationDataset(Dataset): 144 | ''' 145 | Class implementing instance of dataset class for MIME data. 
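    In this file the dataset actually serves English-French translation pairs:
    each element holds one normalized sentence pair together with its
    index-tensor encodings ('l1' and 'l2', column LongTensors ending in the EOS
    token), as produced by prepareData and tensorFromSentence above. A minimal
    sketch, assuming flags parsed into `opts` with lang_dir containing an
    eng-fra.txt file:

        dset = TranslationDataset(opts)
        elem = dset[0]
        eng_sentence, fra_sentence = elem['pair']
        eng_ids, fra_ids = elem['l1'], elem['l2']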
146 | ''' 147 | def __init__(self, opts): 148 | self.dataset_directory = opts.lang_dir 149 | self.l1, self.l2, self.pairs = prepareData(self.dataset_directory, 'eng', 'fra', reverse=False) 150 | 151 | def __len__(self): 152 | # Return length of file list. 153 | return len(self.l1) 154 | 155 | def tensorsFromPair(self, pair): 156 | input_tensor = tensorFromSentence(self.l1, pair[0]) 157 | target_tensor = tensorFromSentence(self.l2, pair[1]) 158 | return (input_tensor, target_tensor) 159 | 160 | def __getitem__(self, index): 161 | elem = {} 162 | elem['pair'] = self.pairs[index] 163 | elem['l1'], elem['l2'] = self.tensorsFromPair(elem['pair']) 164 | 165 | return elem -------------------------------------------------------------------------------- /DataLoaders/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/__init__.py -------------------------------------------------------------------------------- /DataLoaders/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/__init__.pyc -------------------------------------------------------------------------------- /DataLoaders/headers.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | import torch 9 | import glob, cv2, os 10 | from torch.utils.data import Dataset, DataLoader 11 | from torchvision import transforms, utils 12 | from absl import flags 13 | from IPython import embed 14 | from absl import flags, app -------------------------------------------------------------------------------- /DataLoaders/headers.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/CausalSkillLearning/b840101102017455d79a4e6bfa21af929c9cf4de/DataLoaders/headers.pyc -------------------------------------------------------------------------------- /DownstreamRL/PolicyNet.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | from ..SkillNetwork.headers import * 9 | from ..SkillNetwork.LSTMNetwork import LSTMNetwork, LSTMNetwork_Fixed 10 | 11 | class PolicyNetwork(torch.nn.Module): 12 | 13 | def __init__(self, opts, input_size, hidden_size, output_size, fixed=True): 14 | 15 | super(PolicyNetwork, self).__init__() 16 | 17 | self.opts = opts 18 | self.input_size = input_size 19 | self.hidden_size = hidden_size 20 | self.output_size = output_size 21 | 22 | if fixed: 23 | self.lstmnet = LSTMNetwork_Fixed(input_size=input_size, hidden_size=hidden_size, output_size=output_size).cuda() 24 | else: 25 | self.lstmnet = LSTMNetwork(input_size=input_size, hidden_size=hidden_size, output_size=output_size).cuda() 26 | 27 | # Create linear layer to split prediction into mu and sigma. 
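# The two heads below parameterize a diagonal Gaussian over the predicted latent
# z. In the forward pass this is sampled with the reparameterization trick,
#     std = exp(0.5 * logvar),   z = mu + eps * std,   eps ~ N(0, I),
# and the per-timestep KL divergence to a standard normal prior is accumulated as
#     -0.5 * sum(1 + logvar - mu^2 - exp(logvar)),
# before MultivariateNormal distributions are built to evaluate log-probabilities.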
28 | self.mu_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 29 | self.sig_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 30 | 31 | # Stopping probability predictor. (Softmax, not sigmoid) 32 | self.stopping_probability_layer = torch.nn.Linear(self.hidden_size, 2) 33 | self.softmax_layer = torch.nn.Softmax(dim=-1) 34 | 35 | def forward(self, input): 36 | 37 | format_input = torch.tensor(input).view(1,1,self.input_size).cuda().float() 38 | predicted_Z_preparam, stop_probabilities = self.lstmnet.forward(format_input) 39 | 40 | predicted_Z_preparam = predicted_Z_preparam.squeeze(1) 41 | 42 | self.latent_z_seq = [] 43 | self.latent_mu_seq = [] 44 | self.latent_log_sigma_seq = [] 45 | self.kld_loss = 0. 46 | 47 | t = 0 48 | 49 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 50 | latent_z_mu_seq = self.mu_linear_layer(predicted_Z_preparam) 51 | latent_z_log_sig_seq = self.sig_linear_layer(predicted_Z_preparam) 52 | 53 | # Compute standard deviation. 54 | std = torch.exp(0.5*latent_z_log_sig_seq).cuda() 55 | # Sample random variable. 56 | eps = torch.randn_like(std).cuda() 57 | 58 | self.latent_z_seq = latent_z_mu_seq+eps*std 59 | 60 | # Compute KL Divergence Loss term here, so we don't have to return mu's and sigma's. 61 | self.kld_loss = torch.zeros(1) 62 | for t in range(latent_z_mu_seq.shape[0]): 63 | # Taken from mime_plan_skill.py Line 159 - KL Divergence for Gaussian prior and Gaussian prediction. 64 | self.kld_loss += -0.5 * torch.sum(1. + latent_z_log_sig_seq[t] - latent_z_mu_seq[t].pow(2) - latent_z_log_sig_seq[t].exp()) 65 | 66 | # Create distributions so that we can evaluate log probability. 67 | self.dists = [torch.distributions.MultivariateNormal(loc = latent_z_mu_seq[t], covariance_matrix = std[t]*torch.eye((self.opts.nz)).cuda()) for t in range(latent_z_mu_seq.shape[0])] 68 | 69 | # Evaluate log probability in forward so we don't have to do it elswhere. 70 | self.log_probs = [self.dists[i].log_prob(self.latent_z_seq[i]) for i in range(self.latent_z_seq.shape[0])] 71 | 72 | return self.latent_z_seq, stop_probabilities 73 | 74 | class PolicyNetworkSingleTimestep(torch.nn.Module): 75 | 76 | # Policy Network inherits from torch.nn.Module. 77 | # Now we overwrite the init, forward functions. And define anything else that we need. 78 | 79 | def __init__(self, opts, input_size, hidden_size, output_size): 80 | 81 | # Ensures inheriting from torch.nn.Module goes nicely and cleanly. 82 | super(PolicyNetworkSingleTimestep, self).__init__() 83 | 84 | self.opts = opts 85 | self.input_size = input_size 86 | self.hidden_size = hidden_size 87 | self.output_size = output_size 88 | self.num_layers = 4 89 | self.maximum_length = 15 90 | 91 | # Define a bidirectional LSTM now. 92 | self.lstm = torch.nn.LSTM(input_size=self.input_size,hidden_size=self.hidden_size,num_layers=self.num_layers) 93 | 94 | # Define output layers for the LSTM, and activations for this output layer. 95 | self.output_layer = torch.nn.Linear(self.hidden_size, self.output_size) 96 | # Create linear layer to split prediction into mu and sigma. 97 | self.mu_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 98 | self.sig_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 99 | 100 | # Stopping probability predictor. 
(Softmax, not sigmoid) 101 | self.stopping_probability_layer = torch.nn.Linear(self.hidden_size, 2) 102 | 103 | self.softmax_layer = torch.nn.Softmax(dim=-1) 104 | self.logsoftmax_layer = torch.nn.LogSoftmax(dim=-1) 105 | 106 | def forward(self, input, hidden=None): 107 | # Input format must be: Sequence_Length x 1 x Input_Size. 108 | # Assuming input is a numpy array. 109 | format_input = torch.tensor(input).view(input.shape[0],1,self.input_size).cuda().float() 110 | 111 | # Instead of iterating over time and passing each timestep's input to the LSTM, we can now just pass the entire input sequence. 112 | outputs, hidden = self.lstm(format_input, hidden) 113 | 114 | # Predict parameters 115 | latentz_preparam = self.output_layer(outputs[-1]) 116 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 117 | latent_z_mu = self.mu_linear_layer(latentz_preparam) 118 | latent_z_log_sig = self.sig_linear_layer(latentz_preparam) 119 | 120 | # Predict stop probability. 121 | preact_stop_probs = self.stopping_probability_layer(outputs[-1]) 122 | stop_probability = self.softmax_layer(preact_stop_probs) 123 | 124 | stop = self.sample_action(stop_probability) 125 | 126 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 127 | latent_z_mu = self.mu_linear_layer(latentz_preparam) 128 | latent_z_log_sig = self.sig_linear_layer(latentz_preparam) 129 | 130 | # Compute standard deviation. 131 | std = torch.exp(0.5*latent_z_log_sig).cuda() 132 | # Sample random variable. 133 | eps = torch.randn_like(std).cuda() 134 | 135 | latent_z = latent_z_mu+eps*std 136 | 137 | # Compute KL Divergence Loss term here, so we don't have to return mu's and sigma's. 138 | # Taken from mime_plan_skill.py Line 159 - KL Divergence for Gaussian prior and Gaussian prediction. 139 | kld_loss = -0.5 * torch.sum(1. + latent_z_log_sig - latent_z_mu.pow(2) - latent_z_log_sig.exp()) 140 | 141 | # Create distributions so that we can evaluate log probability. 142 | dist = torch.distributions.MultivariateNormal(loc = latent_z_mu, covariance_matrix = std*torch.eye((self.opts.nz)).cuda()) 143 | 144 | # Evaluate log probability in forward so we don't have to do it elswhere. 145 | log_prob = dist.log_prob(latent_z) 146 | 147 | return latent_z, stop_probability, stop, log_prob, kld_loss, hidden 148 | 149 | def sample_action(self, action_probabilities): 150 | # Categorical distribution sampling. 151 | sample_action = torch.distributions.Categorical(probs=action_probabilities).sample().squeeze(0) 152 | return sample_action 153 | 154 | class AltPolicyNetworkSingleTimestep(torch.nn.Module): 155 | 156 | # Policy Network inherits from torch.nn.Module. 157 | # Now we overwrite the init, forward functions. And define anything else that we need. 158 | 159 | def __init__(self, opts, input_size, hidden_size, output_size): 160 | 161 | # Ensures inheriting from torch.nn.Module goes nicely and cleanly. 162 | super(AltPolicyNetworkSingleTimestep, self).__init__() 163 | 164 | self.opts = opts 165 | self.input_size = input_size 166 | self.hidden_size = hidden_size 167 | self.output_size = output_size 168 | self.num_layers = 4 169 | self.maximum_length = 15 170 | 171 | # Define a bidirectional LSTM now. 172 | self.lstm = torch.nn.LSTM(input_size=self.input_size,hidden_size=self.hidden_size,num_layers=self.num_layers) 173 | 174 | # Define output layers for the LSTM, and activations for this output layer. 
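# Compared to PolicyNetworkSingleTimestep above, this variant passes the sigma
# head through a Softplus so the predicted standard deviation is positive by
# construction, samples directly from the resulting MultivariateNormal, and
# computes the KL term with torch.distributions.kl_divergence against a standard
# normal instead of the closed-form expression.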
175 | self.output_layer = torch.nn.Linear(self.hidden_size, self.output_size) 176 | # Create linear layer to split prediction into mu and sigma. 177 | self.mu_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 178 | self.sig_linear_layer = torch.nn.Linear(self.opts.nz, self.opts.nz) 179 | self.softplus_activation_layer = torch.nn.Softplus() 180 | 181 | # Stopping probability predictor. (Softmax, not sigmoid) 182 | self.stopping_probability_layer = torch.nn.Linear(self.hidden_size, 2) 183 | 184 | self.softmax_layer = torch.nn.Softmax(dim=-1) 185 | self.logsoftmax_layer = torch.nn.LogSoftmax(dim=-1) 186 | 187 | def forward(self, input, hidden=None): 188 | # Input format must be: Sequence_Length x 1 x Input_Size. 189 | # Assuming input is a numpy array. 190 | format_input = torch.tensor(input).view(input.shape[0],1,self.input_size).cuda().float() 191 | 192 | # Instead of iterating over time and passing each timestep's input to the LSTM, we can now just pass the entire input sequence. 193 | outputs, hidden = self.lstm(format_input, hidden) 194 | 195 | # Predict parameters 196 | latentz_preparam = self.output_layer(outputs[-1]) 197 | # Remember, the policy is Gaussian (so we can implement VAE-KLD on it). 198 | latent_z_mu = self.mu_linear_layer(latentz_preparam) 199 | latent_z_log_sig = self.sig_linear_layer(latentz_preparam) 200 | latent_z_sig = self.softplus_activation_layer(self.sig_linear_layer(latentz_preparam)) 201 | 202 | # Predict stop probability. 203 | preact_stop_probs = self.stopping_probability_layer(outputs[-1]) 204 | stop_probability = self.softmax_layer(preact_stop_probs) 205 | 206 | stop = self.sample_action(stop_probability) 207 | 208 | # Create distributions so that we can evaluate log probability. 209 | dist = torch.distributions.MultivariateNormal(loc = latent_z_mu, covariance_matrix = torch.diag_embed(latent_z_sig)) 210 | 211 | latent_z = dist.sample() 212 | 213 | # Evaluate log probability in forward so we don't have to do it elswhere. 214 | log_prob = dist.log_prob(latent_z) 215 | 216 | 217 | # Set standard distribution for KL. 218 | standard_distribution = torch.distributions.MultivariateNormal(torch.zeros((self.output_size)).cuda(),torch.eye((self.output_size)).cuda()) 219 | # Compute KL. 220 | kl_divergence = torch.distributions.kl_divergence(dist, standard_distribution) 221 | 222 | return latent_z, stop_probability, stop, log_prob, kl_divergence, hidden 223 | 224 | def sample_action(self, action_probabilities): 225 | # Categorical distribution sampling. 226 | sample_action = torch.distributions.Categorical(probs=action_probabilities).sample().squeeze(0) 227 | return sample_action -------------------------------------------------------------------------------- /DownstreamRL/TrainZPolicyRL.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | """ 9 | 10 | # For both arms and grippers. 
11 | python -m SkillsfromDemonstrations.Experiments.UseSkillsRL.TrainZPolicyRL --train --transformer --nz=64 --nh=64 --variable_nseg=False --network_dir=saved_models/T356_fnseg_vae_sl2pt0_kldwt0pt002_finetune --variable_ns=False --st_space=joint_both_gripper --vae_enc 12 | """ 13 | 14 | from __future__ import absolute_import 15 | 16 | import os, sys, torch 17 | import matplotlib.pyplot as plt 18 | from ...DataLoaders import MIME_DataLoader 19 | from ..abstraction import mime_eval 20 | from ..abstraction.abstraction_utils import ScoreFunctionEstimator 21 | from .PolicyNet import PolicyNetwork, PolicyNetworkSingleTimestep, AltPolicyNetworkSingleTimestep 22 | from absl import app, flags 23 | import imageio, numpy as np, copy, os, shutil 24 | from IPython import embed 25 | import robosuite 26 | import tensorboard, tensorboardX 27 | 28 | flags.DEFINE_boolean('train',False,'Whether to run train.') 29 | flags.DEFINE_boolean('debug',False,'Whether to debug.') 30 | # flags.DEFINE_float('sf_loss_wt', 0.1, 'Weight of pseudo loss for SF estimator') 31 | # flags.DEFINE_float('kld_loss_wt', 0, 'Weight for KL Divergence loss if using VAE encoder.') 32 | flags.DEFINE_float('reinforce_loss_wt', 1., 'Weight for primary reinforce loss.') 33 | # flags.DEFINE_string('name',None,'Name to give run.') 34 | 35 | class ZPolicyTrainer(object): 36 | 37 | def __init__(self, opts): 38 | 39 | self.opts = opts 40 | 41 | self.input_size = self.opts.n_state 42 | self.zpolicy_input_size = 85 43 | self.hidden_size = 20 44 | self.output_size = self.opts.nz 45 | 46 | self.primitive_length = 10 47 | self.learning_rate = 1e-4 48 | self.number_epochs = 200 49 | self.number_episodes = 500 50 | self.save_every_epoch = 5 51 | self.maximum_skills = 6 52 | 53 | def initialize_plots(self): 54 | self.log_dir = os.path.join("SkillsfromDemonstrations/cachedir/logs/RL",self.opts.name) 55 | if not(os.path.isdir(self.log_dir)): 56 | os.mkdir(self.log_dir) 57 | self.writer = tensorboardX.SummaryWriter(self.log_dir) 58 | 59 | def setup_networks(self): 60 | # Set up evaluator to load mime model and stuff. 61 | self.evaluator = mime_eval.PrimitiveDiscoverEvaluator(self.opts) 62 | self.evaluator.setup_testing(split='val') 63 | 64 | # Also create a ZPolicy. 65 | # self.z_policy = PolicyNetworkSingleTimestep(opts=self.opts, input_size=self.zpolicy_input_size, hidden_size=self.hidden_size, output_size=self.output_size).cuda() 66 | self.z_policy = AltPolicyNetworkSingleTimestep(opts=self.opts, input_size=self.zpolicy_input_size, hidden_size=self.hidden_size, output_size=self.output_size).cuda() 67 | 68 | if self.opts.variable_nseg: 69 | self.sf_loss_fn = ScoreFunctionEstimator() 70 | 71 | # Creating optimizer. 72 | self.z_policy_optimizer = torch.optim.Adam(self.z_policy.parameters(), lr=self.learning_rate) 73 | 74 | def load_network(self, network_dir): 75 | # Load the evaluator networks (Abstraction network and skill network) 76 | self.evaluator.load_network(self.evaluator.model, 'pred', 'latest', network_dir=network_dir) 77 | 78 | # Freeze parameters of the IntendedTrajectoryPredictorModel. 
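# Note: the attribute that disables gradients is `requires_grad` (with an s);
# assigning to `require_grad`, as the loop below does, only creates an unused
# attribute, so the decoder weights still receive gradients and stay fixed only
# because no optimizer ever steps them (only the z-policy optimizer is created).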
79 | for parameter in self.evaluator.model.parameters(): 80 | parameter.require_grad = False 81 | 82 | def save_zpolicy_model(self, path, suffix): 83 | if not(os.path.isdir(path)): 84 | os.mkdir(path) 85 | save_object = {} 86 | save_object['ZPolicy'] = self.z_policy.state_dict() 87 | torch.save(save_object,os.path.join(path,"ZPolicyModel"+suffix)) 88 | 89 | def load_all_models(self, path): 90 | load_object = torch.load(path) 91 | self.z_policy.load_state_dict(load_object['ZPolicy']) 92 | 93 | # def update_plots(self, counter, sample_map, loglikelihood): 94 | def update_plots(self, counter): 95 | 96 | if self.opts.variable_nseg: 97 | self.writer.add_scalar('Stop_Prob_Reinforce_Loss', torch.mean(self.stop_prob_reinforce_loss), counter) 98 | self.writer.add_scalar('Predicted_Zs_Reinforce_Loss', torch.mean(self.reinforce_predicted_Zs), counter) 99 | self.writer.add_scalar('KL_Divergence_Loss', torch.mean(self.kld_loss_seq), counter) 100 | self.writer.add_scalar('Total_Loss', torch.mean(self.total_loss), counter) 101 | 102 | def assemble_input(self, trajectory): 103 | traj_start = trajectory[0] 104 | traj_end = trajectory[-1] 105 | return torch.cat([torch.tensor(traj_start).cuda(),torch.tensor(traj_end).cuda()],dim=0) 106 | 107 | # def update_networks(self, state_traj, reward_traj, predicted_Zs): 108 | def update_networks(self, state_traj_torch, reward_traj, latent_z_seq, log_prob_seq, stop_prob_seq, stop_seq, kld_loss_seq): 109 | # embed() 110 | # Get cummulative rewards corresponding to actions executed after selecting a particular Z. -# This is basically adding up the rewards from the end of the array. 111 | # cumm_reward_to_go = torch.cumsum(torch.tensor(reward_traj[::-1]).cuda().float())[::-1] 112 | cumm_reward_to_go_numpy = copy.deepcopy(np.cumsum(copy.deepcopy(reward_traj[::-1]))[::-1]) 113 | cumm_reward_to_go = torch.tensor(cumm_reward_to_go_numpy).cuda().float() 114 | 115 | self.total_loss = 0. 116 | 117 | if self.opts.variable_nseg: 118 | # Remember, this stop probability loss is for stopping predicting Z's, #NOT INTERMEDIATE TIMESTEPS! 119 | # So we still use cumm_reward_to_go rather than cumm_reward_to_go_array 120 | 121 | self.stop_prob_reinforce_loss = self.sf_loss_fn.forward(cumm_reward_to_go, stop_prob_seq.unsqueeze(1), stop_seq.long()) 122 | # Add reinforce loss and loss value. 123 | self.total_loss += self.opts.sf_loss_wt*self.stop_prob_reinforce_loss 124 | 125 | # Now adding the reinforce loss associated with predicted Zs. 126 | # (Remember, we want to maximize reward times log prob, so multiply by -1 to minimize.) 127 | 128 | self.reinforce_predicted_Zs = (self.opts.reinforce_loss_wt * -1. * cumm_reward_to_go*log_prob_seq.view(-1)).sum() 129 | self.total_loss += self.reinforce_predicted_Zs 130 | 131 | # Add loss term with KL Divergence between 0 mean Gaussian and predicted Zs. 132 | 133 | self.kld_loss_seq = kld_loss_seq 134 | self.total_loss += self.opts.kld_loss_wt*self.kld_loss_seq[0] 135 | 136 | # Zero gradients of optimizer, compute backward, then step optimizer. 137 | self.z_policy_optimizer.zero_grad() 138 | self.total_loss.sum().backward() 139 | self.z_policy_optimizer.step() 140 | 141 | def reorder_actions(self, actions): 142 | 143 | # Assume that the actions are 16 dimensional, and are ordered as: 144 | # 7 DoF for left arm, 7 DoF for right arm, 1 for left gripper, and 1 for right gripper. 145 | 146 | # The original trajectory has gripper values from 0 (Close) to 1 (Open), but we've to rescale to -1 (Open) to 1 (Close) for Mujoco. 
147 | # And handle joint velocities. 148 | # MIME Gripper values are from 0 to 100 (Close to Open), but we assume actions has values from 0 to 1 (Close to Open), and then rescale to (-1 Open to 1 Close) for Mujoco. 149 | # Mujoco needs them flipped. 150 | 151 | indices = np.array([ 7, 8, 9, 10, 11, 12, 13, 0, 1, 2, 3, 4, 5, 6, 15, 14]) 152 | reordered_actions = actions[:,indices] 153 | reordered_actions[:,14:] = 1 - 2*reordered_actions[:,14:] 154 | return reordered_actions 155 | 156 | def run_episode(self, counter): 157 | 158 | # For number of epochs: 159 | # # 1) Given start and goal (for reaching task, say) 160 | # # 2) Run Z_Policy on start and goal to retrieve predicted Zs. 161 | # # 3) Decode predicted Zs into trajectory. 162 | # # 4) Retrieve "actions" from trajectory. 163 | # # 5) Feed "actions" into RL environment and collect reward. 164 | # # 6) Train ZPolicy to maximize cummulative reward with favorite RL algorithm. 165 | 166 | # Reset environment. 167 | state = self.environment.reset() 168 | terminal = False 169 | reward_traj = None 170 | state_traj_torch = None 171 | t_out = 0 172 | stop = False 173 | hidden = None 174 | latent_z_seq = None 175 | stop_prob_seq = None 176 | stop_seq = None 177 | log_prob_seq = None 178 | kld_loss_seq = 0. 179 | previous_state = None 180 | 181 | while terminal==False and stop==False: 182 | 183 | ######################################################## 184 | ######## 1) Collect input for first timestep. ########## 185 | ######################################################## 186 | zpolicy_input = np.concatenate([state['robot-state'],state['object-state']]).reshape(1,self.zpolicy_input_size) 187 | 188 | ######################################################## 189 | # 2) Feed into the Z policy to retrieve the predicted Z. 190 | ######################################################## 191 | latent_z, stop_probability, stop, log_prob, kld_loss, hidden = self.z_policy.forward(zpolicy_input, hidden=hidden) 192 | latent_z = latent_z.squeeze(1) 193 | 194 | ######################################################## 195 | ############## 3) Decode into trajectory. ############## 196 | ######################################################## 197 | 198 | primitive_and_skill_stop_prob = self.evaluator.model.primitive_decoder(latent_z) 199 | traj_seg = primitive_and_skill_stop_prob[0].squeeze(1).detach().cpu().numpy() 200 | 201 | if previous_state is None: 202 | previous_state = traj_seg[-1].reshape(1,self.opts.n_state) 203 | else: 204 | # Concatenate previous state to trajectory, so that when we take actions we get an action from previous segment to the current one. 205 | traj_seg = np.concatenate([previous_state,traj_seg],axis=0) 206 | previous_state = traj_seg[-1].reshape(-1,self.opts.n_state) 207 | 208 | ######################################################## 209 | ## 4) Finite diff along time axis to retrieve actions ## 210 | ######################################################## 211 | actions = np.diff(traj_seg,axis=0) 212 | actions = self.reorder_actions(actions) 213 | actions_torch = torch.tensor(actions).cuda().float() 214 | 215 | cummulative_reward_in_segment = 0. 216 | # Run step into evironment for all actions in this segment. 217 | t = 0 218 | while t=self.maximum_skills: 259 | stop = True 260 | 261 | # if self.opts.debug==True: 262 | # embed() 263 | 264 | if self.opts.train: 265 | # 6) Feed states, actions, reward, and predicted Zs to update. (These are all lists of tensors.) 
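# update_networks() below turns the per-step rewards into rewards-to-go with a
# reversed cumulative sum and forms the REINFORCE objective
#     loss = -reinforce_loss_wt * sum_t( R_to_go[t] * log pi(z_t) ),
# optionally adds the score-function loss on the stop probabilities (when
# variable_nseg is set) plus the weighted KL penalty, and takes one Adam step on
# the z-policy parameters.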
266 | # self.update_networks(state_traj_torch, action_torch, reward_traj, latent_zs) 267 | self.update_networks(state_traj_torch, reward_traj, latent_z_seq, log_prob_seq, stop_prob_seq, stop_seq, kld_loss_seq) 268 | self.update_plots(counter) 269 | 270 | def setup_RL_environment(self, has_display=False): 271 | 272 | # Create Mujoco environment. 273 | self.environment = robosuite.make("BaxterLift", has_renderer=has_display) 274 | self.initialize_plots() 275 | 276 | def trainRL(self): 277 | 278 | 279 | # Basic function to train. 280 | counter = 0 281 | 282 | for e in range(self.number_epochs): 283 | 284 | # Number of episodes per epoch. 285 | for i in range(self.number_episodes): 286 | 287 | print("#########################################") 288 | print("Epoch: ",e,"Traj: ",i) 289 | 290 | # Run an episode. 291 | self.run_episode(counter) 292 | 293 | counter += 1 294 | 295 | if self.opts.train and e%self.save_every_epoch==0: 296 | self.save_zpolicy_model(os.path.join("saved_models/RL",self.opts.name), "epoch{0}".format(e)) 297 | 298 | def main(_): 299 | 300 | # This is only to be executed for notebooks. 301 | # flags.FLAGS(['']) 302 | opts = flags.FLAGS 303 | 304 | # Set state space. 305 | if opts.st_space == 'ee_r' or opts.st_space == 'ee_l': 306 | opts.n_state = 7 307 | if opts.st_space == 'joint_ra' or opts.st_space == 'joint_la': 308 | opts.n_state = 7 309 | if opts.st_space == 'joint_both': 310 | opts.n_state = 14 311 | elif opts.st_space == 'ee_all': 312 | opts.n_state = 14 313 | elif opts.st_space == 'joint': 314 | opts.n_state = 17 315 | elif opts.st_space =='joint_both_gripper': 316 | opts.n_state = 16 317 | 318 | opts.logging_dir = os.path.join(opts.logging_dir, 'mime') 319 | opts.transformer = True 320 | 321 | torch.manual_seed(0) 322 | 323 | # Create instance of class. 324 | zpolicy_trainer = ZPolicyTrainer(opts) 325 | zpolicy_trainer.setup_networks() 326 | zpolicy_trainer.setup_RL_environment() 327 | # Still need this to load primitive decoder network. 328 | zpolicy_trainer.load_network(opts.network_dir) 329 | zpolicy_trainer.trainRL() 330 | 331 | 332 | if __name__ == '__main__': 333 | app.run(main) 334 | -------------------------------------------------------------------------------- /Experiments/Code_Runs/CycleTransfer_Runs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | 3 | # Debugging cycle consistency transfer. 4 | 5 | python Master.py --name=CTdebug --train=1 --setting=cycle_transfer --source_domain=ContinuousNonZero --target_domain=ContinuousNonZero --z_dimensions=64 --number_layers=5 --hidden_size=64 --data=ContinuousNonZero --training_phase_size=10000 --display_freq=1000 --eval_freq=4 --alternating_phase_size=200 --discriminator_phase_size=2 --vae_loss_weight=1. --discriminability_weight=2.0 --kl_weight=0.001 6 | -------------------------------------------------------------------------------- /Experiments/DMP.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
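# Dynamic Movement Primitives (DMP): learn_DMP() fits forcing-function kernel weights to a
# demonstration trajectory, and rollout() regenerates the motion between a new start and goal.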
6 | 7 | from headers import * 8 | 9 | class DMP(): 10 | 11 | # def __init__(self, time_steps=100, num_ker=25, dimensions=3, kernel_bandwidth=None, alphaz=None, time_basis=False): 12 | def __init__(self, time_steps=40, num_ker=15, dimensions=7, kernel_bandwidth=3.5, alphaz=5., time_basis=True): 13 | # DMP(dimensions=7,time_steps=40,num_ker=15,kernel_bandwidth=3.5,alphaz=5.,time_basis=True) 14 | 15 | # self.alphaz = 25.0 16 | if alphaz is not None: 17 | self.alphaz = alphaz 18 | else: 19 | self.alphaz = 10. 20 | self.betaz = self.alphaz/4 21 | self.alpha = self.alphaz/3 22 | 23 | self.time_steps = time_steps 24 | self.tau = self.time_steps 25 | # self.tau = 1. 26 | self.use_time_basis = time_basis 27 | 28 | self.dimensions = dimensions 29 | # self.number_kernels = max(500,self.time_steps) 30 | self.number_kernels = num_ker 31 | if kernel_bandwidth is not None: 32 | self.kernel_bandwidth = kernel_bandwidth 33 | else: 34 | self.kernel_bandwidth = self.calculate_good_sigma(self.time_steps, self.number_kernels) 35 | self.epsilon = 0.001 36 | self.setup() 37 | 38 | def setup(self): 39 | 40 | self.gaussian_kernels = np.zeros((self.number_kernels,2)) 41 | 42 | self.weights = np.zeros((self.number_kernels, self.dimensions)) 43 | 44 | self.demo_pos = np.zeros((self.time_steps, self.dimensions)) 45 | self.demo_vel = np.zeros((self.time_steps, self.dimensions)) 46 | self.demo_acc = np.zeros((self.time_steps, self.dimensions)) 47 | 48 | self.target_forces = np.zeros((self.time_steps, self.dimensions)) 49 | self.phi = np.zeros((self.number_kernels, self.time_steps, self.time_steps)) 50 | self.eta = np.zeros((self.time_steps, self.dimensions)) 51 | self.vector_phase = np.zeros(self.time_steps) 52 | 53 | # Defining Rollout variables. 54 | self.rollout_time = self.time_steps 55 | self.dt = 1./self.rollout_time 56 | self.pos_roll = np.zeros((self.rollout_time,self.dimensions)) 57 | self.vel_roll = np.zeros((self.rollout_time,self.dimensions)) 58 | self.acc_roll = np.zeros((self.rollout_time,self.dimensions)) 59 | self.force_roll = np.zeros((self.rollout_time,self.dimensions)) 60 | self.goal = np.zeros(self.dimensions) 61 | self.start = np.zeros(self.dimensions) 62 | 63 | def calculate_good_sigma(self, time, number_kernels, threshold=0.15): 64 | return time/(2*(number_kernels-1)*(np.sqrt(-np.log(threshold)))) 65 | 66 | def load_trajectory(self,pos,vel=None,acc=None): 67 | 68 | self.demo_pos = np.zeros((self.time_steps, self.dimensions)) 69 | self.demo_vel = np.zeros((self.time_steps, self.dimensions)) 70 | self.demo_acc = np.zeros((self.time_steps, self.dimensions)) 71 | 72 | if vel is not None and acc is not None: 73 | self.demo_pos = copy.deepcopy(pos) 74 | self.demo_vel = copy.deepcopy(vel) 75 | self.demo_acc = copy.deepcopy(acc) 76 | else: 77 | self.smooth_interpolate(pos) 78 | 79 | def smooth_interpolate(self, pos): 80 | # Filter the posiiton input by Gaussian smoothing. 
81 | smooth_pos = gaussian_filter1d(pos,3.5,axis=0,mode='nearest') 82 | 83 | time_range = np.linspace(0, pos.shape[0]-1, pos.shape[0]) 84 | new_time_range = np.linspace(0,pos.shape[0]-1,self.time_steps+2) 85 | 86 | self.interpolated_pos = np.zeros((self.time_steps+2,self.dimensions)) 87 | interpolating_objects = [] 88 | 89 | for i in range(self.dimensions): 90 | interpolating_objects.append(interp1d(time_range,pos[:,i],kind='linear')) 91 | self.interpolated_pos[:,i] = interpolating_objects[i](new_time_range) 92 | 93 | self.demo_vel = np.diff(self.interpolated_pos,axis=0)[:self.time_steps] 94 | self.demo_acc = np.diff(self.interpolated_pos,axis=0,n=2)[:self.time_steps] 95 | self.demo_pos = self.interpolated_pos[:self.time_steps] 96 | 97 | def initialize_variables(self): 98 | self.weights = np.zeros((self.number_kernels, self.dimensions)) 99 | self.target_forces = np.zeros((self.time_steps, self.dimensions)) 100 | self.phi = np.zeros((self.number_kernels, self.time_steps, self.time_steps)) 101 | self.eta = np.zeros((self.time_steps, self.dimensions)) 102 | 103 | self.kernel_centers = np.linspace(0,self.time_steps,self.number_kernels) 104 | 105 | self.vector_phase = self.calc_vector_phase(self.kernel_centers) 106 | self.gaussian_kernels[:,0] = self.vector_phase 107 | 108 | # Different kernel parameters that have worked before, giving different behavior. 109 | # # dummy = (np.diff(self.gaussian_kernels[:,0]*0.55))**2 110 | # # dummy = (np.diff(self.gaussian_kernels[:,0]*2))**2 111 | # # dummy = (np.diff(self.gaussian_kernels[:,0]))**2 112 | 113 | dummy = (np.diff(self.gaussian_kernels[:,0]*self.kernel_bandwidth))**2 114 | self.gaussian_kernels[:,1] = 1. / np.append(dummy,dummy[-1]) 115 | 116 | # self.gaussian_kernels[:,1] = self.number_kernels/self.gaussian_kernels[:,0] 117 | 118 | def calc_phase(self,time): 119 | return np.exp(-self.alpha*float(time)/self.tau) 120 | 121 | def calc_vector_phase(self,time): 122 | return np.exp(-self.alpha*time.astype(float)/self.tau) 123 | 124 | def basis(self,index,time): 125 | return np.exp(-(self.gaussian_kernels[index,1])*((self.calc_phase(time)-self.gaussian_kernels[index,0])**2)) 126 | 127 | def time_basis(self, index, time): 128 | # return np.exp(-(self.gaussian_kernels[index,1])*((time-self.kernel_centers[index])**2)) 129 | # return np.exp(-(time-self.kernel_centers[index])**2) 130 | return np.exp(-((time-self.kernel_centers[index])**2)/(self.kernel_bandwidth)) 131 | 132 | def vector_basis(self, index, time_range): 133 | return np.exp(-(self.gaussian_kernels[index,1])*((self.calc_vector_phase(time_range)-self.gaussian_kernels[index,0])**2)) 134 | 135 | def update_target_force_itau(self): 136 | self.target_forces = (self.tau**2)*self.demo_acc - self.alphaz*(self.betaz*(self.demo_pos[self.time_steps-1]-self.demo_pos)-self.tau*self.demo_vel) 137 | 138 | def update_target_force_dtau(self): 139 | self.target_forces = self.demo_acc/(self.tau**2) - self.alphaz*(self.betaz*(self.demo_pos[self.time_steps-1]-self.demo_pos)-self.demo_vel/self.tau) 140 | 141 | def update_target_force(self): 142 | self.target_forces = self.demo_acc - self.alphaz*(self.betaz*(self.demo_pos[self.time_steps-1]-self.demo_pos)-self.demo_vel) 143 | 144 | def update_phi(self): 145 | for i in range(self.number_kernels): 146 | for t in range(self.time_steps): 147 | if self.use_time_basis: 148 | self.phi[i,t,t] = self.time_basis(i,t) 149 | else: 150 | self.phi[i,t,t] = self.basis(i,t) 151 | 152 | def update_eta(self): 153 | t_range = np.linspace(0,self.time_steps,self.time_steps) 154 | 
vector_phase = self.calc_vector_phase(t_range) 155 | 156 | for k in range(self.dimensions): 157 | self.eta[:,k] = vector_phase*(self.demo_pos[self.time_steps-1,k]-self.demo_pos[0,k]) 158 | 159 | def learn_DMP(self, pos, forces="i"): 160 | self.setup() 161 | self.load_trajectory(pos) 162 | self.initialize_variables() 163 | self.learn_weights(forces=forces) 164 | 165 | def learn_weights(self, forces="i"): 166 | 167 | if forces=="i": 168 | self.update_target_force_itau() 169 | elif forces=="d": 170 | self.update_target_force_dtau() 171 | elif forces=="n": 172 | self.update_target_force() 173 | self.update_phi() 174 | self.update_eta() 175 | 176 | for j in range(self.dimensions): 177 | for i in range(self.number_kernels): 178 | self.weights[i,j] = np.dot(self.eta[:,j],np.dot(self.phi[i],self.target_forces[:,j])) 179 | self.weights[i,j] /= np.dot(self.eta[:,j],np.dot(self.phi[i],self.eta[:,j])) + self.epsilon 180 | 181 | def initialize_rollout(self,start,goal,init_vel): 182 | 183 | self.pos_roll = np.zeros((self.rollout_time,self.dimensions)) 184 | self.vel_roll = np.zeros((self.rollout_time,self.dimensions)) 185 | self.acc_roll = np.zeros((self.rollout_time,self.dimensions)) 186 | 187 | self.tau = self.rollout_time 188 | self.pos_roll[0] = copy.deepcopy(start) 189 | self.vel_roll[0] = copy.deepcopy(init_vel) 190 | self.goal = goal 191 | self.start = start 192 | self.dt = self.tau/self.rollout_time 193 | # print(self.dt,self.tau,self.rollout_time) 194 | 195 | def calc_rollout_force(self, roll_time): 196 | den = 0 197 | time = copy.deepcopy(roll_time) 198 | for i in range(self.number_kernels): 199 | 200 | if self.use_time_basis: 201 | self.force_roll[roll_time] += self.time_basis(i,time)*self.weights[i] 202 | den += self.time_basis(i,time) 203 | else: 204 | self.force_roll[roll_time] += self.basis(i,time)*self.weights[i] 205 | den += self.basis(i,time) 206 | 207 | self.force_roll[roll_time] *= (self.goal-self.start)*self.calc_phase(time)/den 208 | 209 | def calc_rollout_acceleration(self,time): 210 | self.acc_roll[time] = (1./self.tau**2)*(self.alphaz * (self.betaz * (self.goal - self.pos_roll[time]) - self.tau*self.vel_roll[time]) + self.force_roll[time]) 211 | 212 | def calc_rollout_vel(self,time): 213 | self.vel_roll[time] = self.vel_roll[time-1] + self.acc_roll[time-1]*self.dt 214 | 215 | def calc_rollout_pos(self,time): 216 | self.pos_roll[time] = self.pos_roll[time-1] + self.vel_roll[time-1]*self.dt 217 | 218 | def rollout(self,start,goal,init_vel): 219 | self.initialize_rollout(start,goal,init_vel) 220 | self.calc_rollout_force(0) 221 | self.calc_rollout_acceleration(0) 222 | for i in range(1,self.rollout_time): 223 | self.calc_rollout_force(i) 224 | self.calc_rollout_vel(i) 225 | self.calc_rollout_pos(i) 226 | self.calc_rollout_acceleration(i) 227 | return self.pos_roll 228 | 229 | def load_weights(self, weight): 230 | self.weights = copy.deepcopy(weight) 231 | 232 | def main(args): 233 | 234 | pos = np.load(str(sys.argv[1]))[:,:3] 235 | vel = np.load(str(sys.argv[2]))[:,:3] 236 | acc = np.load(str(sys.argv[3]))[:,:3] 237 | 238 | rolltime = 500 239 | dmp = DMP(time_steps=rolltime, dimensions=3) 240 | 241 | dmp.load_trajectory(pos) 242 | dmp.initialize_variables() 243 | dmp.learn_weights() 244 | 245 | start = np.zeros(dmp.dimensions) 246 | goal = np.ones(dmp.dimensions) 247 | norm_vector = pos[-1]-pos[0] 248 | init_vel = np.divide(vel[0],norm_vector) 249 | 250 | dmp.rollout(start, goal, init_vel) 251 | np.save("dmp_rollout.npy", dmp.pos_roll) # Save the rollout positions (filename is illustrative). 252 | 253 | --------------------------------------------------------------------------------
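# A minimal usage sketch of the DMP class above (not part of the original repository): fit a DMP to a
# synthetic 7-DoF demonstration and roll it out between a new start and goal. The sine-wave demo and
# the import style are assumptions; it presumes the Experiments/ directory is on the path so that
# `headers` (and hence numpy/scipy/copy) and this module import cleanly.

import numpy as np
from DMP import DMP

# Synthetic demonstration: 200 timesteps of a 7-dimensional trajectory.
t = np.linspace(0., 1., 200)
demo_pos = np.stack([np.sin(np.pi * t * (i + 1)) for i in range(7)], axis=1)

# Fit: smooths and resamples the demo to time_steps points, then solves for the kernel weights.
dmp = DMP(time_steps=40, num_ker=15, dimensions=7, kernel_bandwidth=3.5, alphaz=5., time_basis=True)
dmp.learn_DMP(demo_pos)

# Roll out the learned primitive between a new start and goal (unit-normalized, as in main() above).
start = np.zeros(dmp.dimensions)
goal = np.ones(dmp.dimensions)
init_vel = np.zeros(dmp.dimensions)
rollout_positions = dmp.rollout(start, goal, init_vel)   # (rollout_time, dimensions) array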
/Experiments/DataLoaders.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | class GridWorldDataset(Dataset): 10 | 11 | # Class implementing instance of dataset class for gridworld data. 12 | 13 | def __init__(self, dataset_directory): 14 | self.dataset_directory = dataset_directory 15 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 16 | 17 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 18 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 19 | 20 | def __len__(self): 21 | 22 | # Find out how many images we've stored. 23 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 24 | 25 | # FOR NOW: USE ONLY till 3200 images. 26 | return 3200 27 | # return len(filelist) 28 | 29 | def parse_trajectory_actions(self, coordinate_trajectory): 30 | # Takes coordinate trajectory, returns action index taken. 31 | 32 | state_diffs = np.diff(coordinate_trajectory,axis=0) 33 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 34 | 35 | for i in range(len(state_diffs)): 36 | for k in range(len(self.action_map)): 37 | if (state_diffs[i]==self.action_map[k]).all(): 38 | action_sequence[i]=k 39 | 40 | return action_sequence.astype(float) 41 | 42 | def __getitem__(self, index): 43 | 44 | # The getitem function must return a Map-Trajectory pair. 45 | # We will handle per-timestep processes within our code. 46 | # Assumes index is within range [0,len(filelist)-1] 47 | image = cv2.imread(os.path.join(self.dataset_directory,"Image{0}.png".format(index))) 48 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Image{0}_Traj1.npy".format(index))).astype(float) 49 | 50 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 51 | 52 | return image, coordinate_trajectory, action_sequence 53 | 54 | class SmallMapsDataset(Dataset): 55 | 56 | # Class implementing instance of dataset class for gridworld data. 57 | 58 | def __init__(self, dataset_directory): 59 | self.dataset_directory = dataset_directory 60 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 61 | 62 | self.action_map = np.array([[-1,0],[1,0],[0,-1],[0,1],[-1,-1],[-1,1],[1,-1],[1,1]]) 63 | ## UP, DOWN, LEFT, RIGHT, UPLEFT, UPRIGHT, DOWNLEFT, DOWNRIGHT. ## 64 | 65 | def __len__(self): 66 | 67 | # Find out how many images we've stored. 68 | filelist = glob.glob(os.path.join(self.dataset_directory,"*.png")) 69 | return 4000 70 | # return len(filelist) 71 | 72 | def parse_trajectory_actions(self, coordinate_trajectory): 73 | # Takes coordinate trajectory, returns action index taken. 74 | 75 | state_diffs = np.diff(coordinate_trajectory,axis=0) 76 | action_sequence = np.zeros((len(state_diffs)),dtype=int) 77 | 78 | for i in range(len(state_diffs)): 79 | for k in range(len(self.action_map)): 80 | if (state_diffs[i]==self.action_map[k]).all(): 81 | action_sequence[i]=k 82 | 83 | return action_sequence.astype(float) 84 | 85 | def __getitem__(self, index): 86 | 87 | # The getitem function must return a Map-Trajectory pair. 88 | # We will handle per-timestep processes within our code. 
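# Returns the map array, the coordinate trajectory (clipped to time_limit steps), and the action index
# sequence produced by parse_trajectory_actions above; e.g. a per-step state difference of [1, 1]
# corresponds to action index 7 (DOWNRIGHT) in self.action_map.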
89 | # Assumes index is within range [0,len(filelist)-1] 90 | image = np.load(os.path.join(self.dataset_directory,"Map{0}.npy".format(index))) 91 | time_limit = 20 92 | coordinate_trajectory = np.load(os.path.join(self.dataset_directory,"Map{0}_Traj1.npy".format(index))).astype(float)[:time_limit] 93 | action_sequence = self.parse_trajectory_actions(coordinate_trajectory) 94 | 95 | return image, coordinate_trajectory, action_sequence 96 | 97 | class ToyDataset(Dataset): 98 | 99 | # Class implementing instance of dataset class for toy data. 100 | 101 | def __init__(self, dataset_directory): 102 | self.dataset_directory = dataset_directory 103 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 104 | 105 | self.x_path = os.path.join(self.dataset_directory,"X_array_actions.npy") 106 | self.a_path = os.path.join(self.dataset_directory,"A_array_actions.npy") 107 | 108 | self.X_array = np.load(self.x_path) 109 | self.A_array = np.load(self.a_path) 110 | 111 | def __len__(self): 112 | return 50000 113 | 114 | def __getitem__(self, index): 115 | 116 | # Return trajectory and action sequence. 117 | return self.X_array[index],self.A_array[index] 118 | 119 | class ContinuousToyDataset(Dataset): 120 | 121 | # Class implementing instance of dataset class for toy data. 122 | 123 | def __init__(self, dataset_directory): 124 | self.dataset_directory = dataset_directory 125 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 126 | 127 | self.x_path = os.path.join(self.dataset_directory,"X_array_continuous.npy") 128 | self.a_path = os.path.join(self.dataset_directory,"A_array_continuous.npy") 129 | self.y_path = os.path.join(self.dataset_directory,"Y_array_continuous.npy") 130 | self.b_path = os.path.join(self.dataset_directory,"B_array_continuous.npy") 131 | 132 | self.X_array = np.load(self.x_path) 133 | self.A_array = np.load(self.a_path) 134 | self.Y_array = np.load(self.y_path) 135 | self.B_array = np.load(self.b_path) 136 | 137 | def __len__(self): 138 | return 50000 139 | 140 | def __getitem__(self, index): 141 | 142 | # Return trajectory and action sequence. 143 | return self.X_array[index],self.A_array[index] 144 | 145 | def get_latent_variables(self, index): 146 | return self.B_array[index],self.Y_array[index] 147 | 148 | class ContinuousDirectedToyDataset(Dataset): 149 | 150 | # Class implementing instance of dataset class for toy data. 151 | 152 | def __init__(self, dataset_directory): 153 | self.dataset_directory = dataset_directory 154 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 155 | 156 | self.x_path = os.path.join(self.dataset_directory,"X_array_directed_continuous.npy") 157 | self.a_path = os.path.join(self.dataset_directory,"A_array_directed_continuous.npy") 158 | self.y_path = os.path.join(self.dataset_directory,"Y_array_directed_continuous.npy") 159 | self.b_path = os.path.join(self.dataset_directory,"B_array_directed_continuous.npy") 160 | 161 | self.X_array = np.load(self.x_path) 162 | self.A_array = np.load(self.a_path) 163 | self.Y_array = np.load(self.y_path) 164 | self.B_array = np.load(self.b_path) 165 | 166 | def __len__(self): 167 | return 50000 168 | 169 | def __getitem__(self, index): 170 | 171 | # Return trajectory and action sequence. 
172 | return self.X_array[index],self.A_array[index] 173 | 174 | def get_latent_variables(self, index): 175 | return self.B_array[index],self.Y_array[index] 176 | 177 | class ContinuousNonZeroToyDataset(Dataset): 178 | 179 | # Class implementing instance of dataset class for toy data. 180 | 181 | def __init__(self, dataset_directory): 182 | self.dataset_directory = dataset_directory 183 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 184 | 185 | self.x_path = os.path.join(self.dataset_directory,"X_array_continuous_nonzero.npy") 186 | self.a_path = os.path.join(self.dataset_directory,"A_array_continuous_nonzero.npy") 187 | self.y_path = os.path.join(self.dataset_directory,"Y_array_continuous_nonzero.npy") 188 | self.b_path = os.path.join(self.dataset_directory,"B_array_continuous_nonzero.npy") 189 | 190 | self.X_array = np.load(self.x_path) 191 | self.A_array = np.load(self.a_path) 192 | self.Y_array = np.load(self.y_path) 193 | self.B_array = np.load(self.b_path) 194 | 195 | def __len__(self): 196 | return 50000 197 | 198 | def __getitem__(self, index): 199 | 200 | # Return trajectory and action sequence. 201 | return self.X_array[index],self.A_array[index] 202 | 203 | def get_latent_variables(self, index): 204 | return self.B_array[index],self.Y_array[index] 205 | 206 | class ContinuousDirectedNonZeroToyDataset(Dataset): 207 | 208 | # Class implementing instance of dataset class for toy data. 209 | 210 | def __init__(self, dataset_directory): 211 | self.dataset_directory = dataset_directory 212 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 213 | 214 | self.x_path = os.path.join(self.dataset_directory,"X_dir_cont_nonzero.npy") 215 | self.a_path = os.path.join(self.dataset_directory,"A_dir_cont_nonzero.npy") 216 | self.y_path = os.path.join(self.dataset_directory,"Y_dir_cont_nonzero.npy") 217 | self.b_path = os.path.join(self.dataset_directory,"B_dir_cont_nonzero.npy") 218 | self.g_path = os.path.join(self.dataset_directory,"G_dir_cont_nonzero.npy") 219 | 220 | self.X_array = np.load(self.x_path) 221 | self.A_array = np.load(self.a_path) 222 | self.Y_array = np.load(self.y_path) 223 | self.B_array = np.load(self.b_path) 224 | self.G_array = np.load(self.g_path) 225 | 226 | def __len__(self): 227 | return 50000 228 | 229 | def __getitem__(self, index): 230 | 231 | # Return trajectory and action sequence. 232 | return self.X_array[index],self.A_array[index] 233 | 234 | def get_latent_variables(self, index): 235 | return self.B_array[index],self.Y_array[index] 236 | 237 | class GoalDirectedDataset(Dataset): 238 | 239 | # Class implementing instance of dataset class for toy data. 
240 | 241 | def __init__(self, dataset_directory): 242 | self.dataset_directory = dataset_directory 243 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 244 | 245 | self.x_path = os.path.join(self.dataset_directory,"X_goal_directed.npy") 246 | self.a_path = os.path.join(self.dataset_directory,"A_goal_directed.npy") 247 | self.y_path = os.path.join(self.dataset_directory,"Y_goal_directed.npy") 248 | self.b_path = os.path.join(self.dataset_directory,"B_goal_directed.npy") 249 | self.g_path = os.path.join(self.dataset_directory,"G_goal_directed.npy") 250 | 251 | self.X_array = np.load(self.x_path) 252 | self.A_array = np.load(self.a_path) 253 | self.Y_array = np.load(self.y_path) 254 | self.B_array = np.load(self.b_path) 255 | self.G_array = np.load(self.g_path) 256 | 257 | def __len__(self): 258 | return 50000 259 | 260 | def __getitem__(self, index): 261 | 262 | # Return trajectory and action sequence. 263 | return self.X_array[index],self.A_array[index] 264 | 265 | def get_latent_variables(self, index): 266 | return self.B_array[index],self.Y_array[index] 267 | 268 | def get_goal(self, index): 269 | return self.G_array[index] 270 | 271 | class DeterministicGoalDirectedDataset(Dataset): 272 | 273 | # Class implementing instance of dataset class for toy data. 274 | 275 | def __init__(self, dataset_directory): 276 | self.dataset_directory = dataset_directory 277 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 278 | 279 | self.x_path = os.path.join(self.dataset_directory,"X_deter_goal_directed.npy") 280 | self.a_path = os.path.join(self.dataset_directory,"A_deter_goal_directed.npy") 281 | self.y_path = os.path.join(self.dataset_directory,"Y_deter_goal_directed.npy") 282 | self.b_path = os.path.join(self.dataset_directory,"B_deter_goal_directed.npy") 283 | self.g_path = os.path.join(self.dataset_directory,"G_deter_goal_directed.npy") 284 | 285 | self.X_array = np.load(self.x_path) 286 | self.A_array = np.load(self.a_path) 287 | self.Y_array = np.load(self.y_path) 288 | self.B_array = np.load(self.b_path) 289 | self.G_array = np.load(self.g_path) 290 | 291 | self.goal_states = np.array([[-1,-1],[-1,1],[1,-1],[1,1]])*5 292 | 293 | def __len__(self): 294 | return 50000 295 | 296 | def __getitem__(self, index): 297 | 298 | # Return trajectory and action sequence. 299 | return self.X_array[index],self.A_array[index] 300 | 301 | def get_latent_variables(self, index): 302 | return self.B_array[index],self.Y_array[index] 303 | 304 | def get_goal(self, index): 305 | return self.G_array[index] 306 | 307 | def get_goal_position(self, index): 308 | return self.goal_states[self.G_array[index]] 309 | 310 | class SeparableDataset(Dataset): 311 | 312 | # Class implementing instance of dataset class for toy data. 
313 | 314 | def __init__(self, dataset_directory): 315 | self.dataset_directory = dataset_directory 316 | # For us, this is Research/Code/GraphPlanningNetworks/scripts/DatasetPlanning/CreateDemos/Demos2 317 | 318 | self.x_path = os.path.join(self.dataset_directory,"X_separable.npy") 319 | self.a_path = os.path.join(self.dataset_directory,"A_separable.npy") 320 | self.y_path = os.path.join(self.dataset_directory,"Y_separable.npy") 321 | self.b_path = os.path.join(self.dataset_directory,"B_separable.npy") 322 | self.g_path = os.path.join(self.dataset_directory,"G_separable.npy") 323 | self.s_path = os.path.join(self.dataset_directory,"StartConfig_separable.npy") 324 | 325 | self.X_array = np.load(self.x_path) 326 | self.A_array = np.load(self.a_path) 327 | self.Y_array = np.load(self.y_path) 328 | self.B_array = np.load(self.b_path) 329 | self.G_array = np.load(self.g_path) 330 | self.S_array = np.load(self.s_path) 331 | 332 | def __len__(self): 333 | return 50000 334 | 335 | def __getitem__(self, index): 336 | 337 | # Return trajectory and action sequence. 338 | return self.X_array[index],self.A_array[index] 339 | 340 | def get_latent_variables(self, index): 341 | return self.B_array[index],self.Y_array[index] 342 | 343 | def get_goal(self, index): 344 | return self.G_array[index] 345 | 346 | def get_startconfig(self, index): 347 | return self.S_array[index] -------------------------------------------------------------------------------- /Experiments/Eval_RLRewards.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | import numpy as np, glob, os 9 | from IPython import embed 10 | 11 | # Env list. 12 | environment_names = ["SawyerPickPlaceBread","SawyerPickPlaceCan","SawyerPickPlaceCereal","SawyerPickPlaceMilk","SawyerNutAssemblyRound","SawyerNutAssemblySquare"] 13 | 14 | # Evaluate baselineRL methods. 
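# Pattern for each block below: [a, b] is the inclusive range of run indices (folders named
# prefix + index, e.g. RL130 ... RL137), and `increment` is the epoch spacing between saved evaluations;
# for each run we find the latest saved model and gather its per-epoch Mean_Reward (or
# Mean_Trajectory_Distance) .npy files into reward_list / distance_list.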
15 | a = 86 16 | b = 86 17 | 18 | a = 130 19 | b = 137 20 | prefix = 'RL' 21 | increment = 100 22 | reward_list = [] 23 | 24 | for i in range(a,b+1): 25 | 26 | model_template = "RL{0}/saved_models/Model_epoch*".format(i) 27 | models = glob.glob(model_template) 28 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 29 | max_model = int(models[-1].lstrip("RL{0}/saved_models/Model_epoch".format(i))) 30 | 31 | model_range = np.arange(0,max_model+increment,increment) 32 | rewards = np.zeros((len(model_range))) 33 | 34 | for j in range(len(model_range)): 35 | rewards[j] = np.load("RL{0}/MEval/m{1}/Mean_Reward_RL{0}.npy".format(i,model_range[j])) 36 | 37 | reward_list.append(rewards) 38 | 39 | embed() 40 | # x = np.arange(0,260,20) 41 | # dists = np.zeros((6,len(x),100)) 42 | # a = 6 43 | # b = 12 44 | # for i in range(a,b): 45 | # for j in range(len(x)): 46 | # dists[i-a,j] = np.load("IL0{0}/MEval/m{1}/Total_Rewards_IL0{0}.npy".format(str(i).zfill(2),x[j])) 47 | 48 | 49 | # IL 50 | a = 18 51 | b = 23 52 | prefix = 'IL0' 53 | increment = 20 54 | reward_list = [] 55 | 56 | for i in range(a,b+1): 57 | 58 | model_template = "{0}{1}/saved_models/Model_epoch*".format(prefix,i) 59 | models = glob.glob(model_template) 60 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 61 | max_model = int(models[-1].lstrip("{0}{1}/saved_models/Model_epoch".format(prefix,i))) 62 | 63 | model_range = np.arange(0,max_model+increment,increment) 64 | rewards = np.zeros((len(model_range))) 65 | 66 | for j in range(len(model_range)): 67 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(i,model_range[j],prefix)) 68 | 69 | reward_list.append(rewards) 70 | 71 | # Get distances 72 | a = 30 73 | b = 37 74 | prefix = 'RJ' 75 | increment = 20 76 | distance_list = [] 77 | 78 | for i in range(a,b+1): 79 | 80 | model_template = "{0}{1}/saved_models/Model_epoch*".format(prefix,i) 81 | models = glob.glob(model_template) 82 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 83 | max_model = int(models[-1].lstrip("{0}{1}/saved_models/Model_epoch".format(prefix,i))) 84 | max_model = max_model-max_model%increment 85 | model_range = np.arange(0,max_model+increment,increment) 86 | distances = np.zeros((len(model_range))) 87 | 88 | for j in range(len(model_range)): 89 | distances[j] = np.load("{2}{0}/MEval/m{1}/Mean_Trajectory_Distance_{2}{0}.npy".format(i,model_range[j],prefix)) 90 | 91 | distance_list.append(distances) 92 | 93 | ################################################ 94 | # Env list. 95 | environment_names = ["SawyerPickPlaceBread","SawyerPickPlaceCan","SawyerPickPlaceCereal","SawyerPickPlaceMilk","SawyerNutAssemblyRound","SawyerNutAssemblySquare"] 96 | 97 | # Evaluate baselineRL methods. 
98 | a = 5 99 | b = 12 100 | prefix = 'downRL' 101 | increment = 20 102 | reward_list = [] 103 | 104 | for i in range(a,b+1): 105 | 106 | padded_index = str(i).zfill(3) 107 | 108 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 109 | models = glob.glob(model_template) 110 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 111 | max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 112 | max_model = max_model-max_model%increment 113 | model_range = np.arange(0,max_model+increment,increment) 114 | rewards = np.zeros((len(model_range))) 115 | 116 | for j in range(len(model_range)): 117 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 118 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 119 | reward_list.append(rewards) 120 | 121 | ############################################## 122 | # MOcap distances 123 | 124 | # Get distances 125 | a = 1 126 | b = 2 127 | prefix = 'Mocap00' 128 | increment = 20 129 | distance_list = [] 130 | 131 | for i in range(a,b+1): 132 | 133 | model_template = "{0}{1}/saved_models/Model_epoch*".format(prefix,i) 134 | models = glob.glob(model_template) 135 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 136 | max_model = int(models[-1].lstrip("{0}{1}/saved_models/Model_epoch".format(prefix,i))) 137 | max_model = max_model-max_model%increment 138 | model_range = np.arange(0,max_model+increment,increment) 139 | distances = np.zeros((len(model_range))) 140 | 141 | for j in range(len(model_range)): 142 | distances[j] = np.load("{2}{0}/MEval/m{1}/Mean_Trajectory_Distance_{2}{0}.npy".format(i,model_range[j],prefix)) 143 | 144 | distance_list.append(distances) 145 | 146 | ############################################## 147 | 148 | ################################################ 149 | # Env list. 150 | environment_names = ["SawyerPickPlaceBread","SawyerPickPlaceCan","SawyerPickPlaceCereal","SawyerPickPlaceMilk","SawyerNutAssemblyRound","SawyerNutAssemblySquare"] 151 | 152 | 153 | def remove_start(inputstring, word_to_remove): 154 | return inputstring[len(word_to_remove):] if inputstring.startswith(word_to_remove) else inputstring 155 | 156 | # Evaluate baselineRL methods. 
157 | a = 23 158 | b = 28 159 | 160 | 161 | prefix = 'downRL_pi' 162 | increment = 20 163 | reward_list = [] 164 | 165 | for i in range(a,b+1): 166 | 167 | padded_index = str(i).zfill(3) 168 | 169 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 170 | models = glob.glob(model_template) 171 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 172 | # max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 173 | max_model = int(remove_start(models[-1],"{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 174 | 175 | max_model = max_model-max_model%increment 176 | model_range = np.arange(0,max_model+increment,increment) 177 | rewards = np.zeros((len(model_range))) 178 | 179 | for j in range(len(model_range)-1): 180 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 181 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 182 | reward_list.append(rewards) 183 | 184 | for i in range(a,b+1): 185 | 186 | print("For environment: ", environment_names[i-a]) 187 | print("Average reward:", np.array(reward_list[i-a]).max()) 188 | 189 | def evalrl(a,b): 190 | 191 | prefix = 'downRL_pi' 192 | increment = 20 193 | reward_list = [] 194 | 195 | for i in range(a,b+1): 196 | 197 | padded_index = str(i).zfill(3) 198 | 199 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 200 | models = glob.glob(model_template) 201 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 202 | # max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 203 | max_model = int(remove_start(models[-1],"{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 204 | 205 | max_model = max_model-max_model%increment 206 | model_range = np.arange(0,max_model+increment,increment) 207 | rewards = np.zeros((len(model_range))) 208 | 209 | for j in range(len(model_range)-1): 210 | rewards[j] = np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 211 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 212 | reward_list.append(rewards) 213 | 214 | for i in range(a,b+1): 215 | 216 | print("For environment: ", environment_names[i-a]) 217 | print("Average reward:", np.array(reward_list[i-a]).max()) 218 | 219 | def evalrl(a,b): 220 | 221 | prefix = 'RL' 222 | increment = 20 223 | reward_list = [] 224 | 225 | for i in range(a,b+1): 226 | 227 | padded_index = str(i).zfill(2) 228 | 229 | model_template = "{1}{0}/saved_models/Model_epoch*".format(padded_index,prefix) 230 | models = glob.glob(model_template) 231 | # number_models = [int((model.lstrip("RL{0}/saved_models/Model_epoch".format(i))).zfill(4)) for model in models] 232 | # max_model = int(models[-1].lstrip("{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 233 | max_model = int(remove_start(models[-1],"{1}{0}/saved_models/Model_epoch".format(padded_index,prefix))) 234 | 235 | max_model = max_model-max_model%increment 236 | model_range = np.arange(0,max_model+increment,increment) 237 | rewards = np.zeros((len(model_range))) 238 | 239 | for j in range(len(model_range)-1): 240 | rewards[j] = 
np.load("{2}{0}/MEval/m{1}/Mean_Reward_{2}{0}.npy".format(padded_index,model_range[j],prefix)) 241 | # rewards[j] = np.load("{0}{1}/MEval/m{2}/Mean_Reward_{0}{1}.npy".format(prefix,padded_indexi,model_range[j],prefix)) 242 | reward_list.append(rewards) 243 | 244 | for i in range(a,b+1): 245 | 246 | print("For environment: ", environment_names[i-a]) 247 | print("Average reward:", np.array(reward_list[i-a]).max()) -------------------------------------------------------------------------------- /Experiments/MIME_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from headers import * 12 | import os.path as osp 13 | 14 | def select_baxter_angles(trajectory, joint_names, arm='right'): 15 | # joint names in order as used via mujoco visualizer 16 | baxter_joint_names = ['right_s0', 'right_s1', 'right_e0', 'right_e1', 'right_w0', 'right_w1', 'right_w2', 'left_s0', 'left_s1', 'left_e0', 'left_e1', 'left_w0', 'left_w1', 'left_w2'] 17 | if arm == 'right': 18 | select_joints = baxter_joint_names[:7] 19 | elif arm == 'left': 20 | select_joints = baxter_joint_names[7:] 21 | elif arm == 'both': 22 | select_joints = baxter_joint_names 23 | inds = [joint_names.index(j) for j in select_joints] 24 | return trajectory[:, inds] 25 | 26 | def resample(original_trajectory, desired_number_timepoints): 27 | original_traj_len = len(original_trajectory) 28 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 29 | return original_trajectory[new_timepoints] 30 | 31 | class MIME_Dataset(Dataset): 32 | ''' 33 | Class implementing instance of dataset class for MIME data. 34 | ''' 35 | def __init__(self, split='all'): 36 | self.dataset_directory = '/checkpoint/tanmayshankar/MIME/' 37 | self.ds_freq = 20 38 | 39 | # Default: /checkpoint/tanmayshankar/MIME/ 40 | self.fulltext = osp.join(self.dataset_directory, 'MIME_jointangles/*/*/joint_angles.txt') 41 | self.filelist = glob.glob(self.fulltext) 42 | 43 | with open(self.filelist[0], 'r') as file: 44 | lines = file.readlines() 45 | self.joint_names = sorted(eval(lines[0].rstrip('\n')).keys()) 46 | 47 | if split == 'all': 48 | self.filelist = self.filelist 49 | else: 50 | self.task_lists = np.load(os.path.join( 51 | self.dataset_directory, 'MIME_jointangles/{}_Lists.npy'.format(split.capitalize()))) 52 | 53 | self.filelist = [] 54 | for i in range(20): 55 | self.filelist.extend(self.task_lists[i]) 56 | self.filelist = [f.replace('/checkpoint/tanmayshankar/MIME/', self.dataset_directory) for f in self.filelist] 57 | # print(len(self.filelist)) 58 | 59 | def __len__(self): 60 | # Return length of file list. 61 | return len(self.filelist) 62 | 63 | def __getitem__(self, index): 64 | ''' 65 | # Returns Joint Angles as: 66 | # List of length Number_Timesteps, with each element of the list a dictionary containing the sequence of joint angles. 
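# In the current implementation, the returned element is a dictionary with keys
# 'joint_angle_trajectory', 'left_trajectory', 'right_trajectory', 'left_gripper', 'right_gripper'
# (gripper values rescaled to [0, 1]), 'path_prefix', 'ra_trajectory', 'la_trajectory', and 'is_valid',
# with the trajectories resampled to (original length // ds_freq) timesteps.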
67 | # Assumes index is within range [0,len(filelist)-1] 68 | ''' 69 | file = self.filelist[index] 70 | 71 | left_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'left_gripper.txt')) 72 | right_gripper = np.loadtxt(os.path.join(os.path.split(file)[0],'right_gripper.txt')) 73 | 74 | orig_left_traj = np.load(osp.join(osp.split(file)[0], 'Left_EE.npy')) 75 | orig_right_traj = np.load(osp.join(osp.split(file)[0], 'Right_EE.npy')) 76 | 77 | joint_angle_trajectory = [] 78 | # Open file. 79 | with open(file, 'r') as file: 80 | lines = file.readlines() 81 | for line in lines: 82 | dict_element = eval(line.rstrip('\n')) 83 | if len(dict_element.keys()) == len(self.joint_names): 84 | # some files have extra lines with gripper keys e.g. MIME_jointangles/4/12405Nov19/joint_angles.txt 85 | array_element = np.array([dict_element[joint] for joint in self.joint_names]) 86 | joint_angle_trajectory.append(array_element) 87 | 88 | joint_angle_trajectory = np.array(joint_angle_trajectory) 89 | 90 | n_samples = len(orig_left_traj) // self.ds_freq 91 | 92 | elem = {} 93 | elem['joint_angle_trajectory'] = resample(joint_angle_trajectory, n_samples) 94 | elem['left_trajectory'] = resample(orig_left_traj, n_samples) 95 | elem['right_trajectory'] = resample(orig_right_traj, n_samples) 96 | elem['left_gripper'] = resample(left_gripper, n_samples)/100 97 | elem['right_gripper'] = resample(right_gripper, n_samples)/100 98 | elem['path_prefix'] = os.path.split(self.filelist[index])[0] 99 | elem['ra_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='right') 100 | elem['la_trajectory'] = select_baxter_angles(elem['joint_angle_trajectory'], self.joint_names, arm='left') 101 | # If max norm of differences is <1.0, valid. 102 | 103 | # if elem['joint_angle_trajectory'].shape[0]>1: 104 | elem['is_valid'] = int(np.linalg.norm(np.diff(elem['joint_angle_trajectory'],axis=0),axis=1).max() < 1.0) 105 | 106 | return elem 107 | 108 | def recreate_dictionary(self, arm, joint_angles): 109 | if arm=="left": 110 | offset = 2 111 | width = 7 112 | elif arm=="right": 113 | offset = 9 114 | width = 7 115 | elif arm=="full": 116 | offset = 0 117 | width = len(self.joint_names) 118 | return dict((self.joint_names[i],joint_angles[i-offset]) for i in range(offset,offset+width)) 119 | 120 | class MIME_NewDataset(Dataset): 121 | 122 | def __init__(self, split='all'): 123 | self.dataset_directory = '/checkpoint/tanmayshankar/MIME/' 124 | 125 | # Load the entire set of trajectories. 126 | self.data_list = np.load(os.path.join(self.dataset_directory, "Data_List.npy"),allow_pickle=True) 127 | 128 | self.dataset_length = len(self.data_list) 129 | 130 | def __len__(self): 131 | # Return length of file list. 132 | return self.dataset_length 133 | 134 | def __getitem__(self, index): 135 | # Return n'th item of dataset. 136 | # This has already processed everything. 137 | 138 | return self.data_list[index] 139 | 140 | def compute_statistics(self): 141 | 142 | self.state_size = 16 143 | self.total_length = self.__len__() 144 | mean = np.zeros((self.state_size)) 145 | variance = np.zeros((self.state_size)) 146 | mins = np.zeros((self.total_length, self.state_size)) 147 | maxs = np.zeros((self.total_length, self.state_size)) 148 | lens = np.zeros((self.total_length)) 149 | 150 | # And velocity statistics. 
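# (Computed the same way as the position statistics above, but over the per-step differences np.diff(demo, axis=0).)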
151 | vel_mean = np.zeros((self.state_size)) 152 | vel_variance = np.zeros((self.state_size)) 153 | vel_mins = np.zeros((self.total_length, self.state_size)) 154 | vel_maxs = np.zeros((self.total_length, self.state_size)) 155 | 156 | 157 | for i in range(self.total_length): 158 | 159 | print("Phase 1: DP: ",i) 160 | data_element = self.__getitem__(i) 161 | 162 | if data_element['is_valid']: 163 | demo = data_element['demo'] 164 | vel = np.diff(demo,axis=0) 165 | mins[i] = demo.min(axis=0) 166 | maxs[i] = demo.max(axis=0) 167 | mean += demo.sum(axis=0) 168 | lens[i] = demo.shape[0] 169 | 170 | vel_mins[i] = abs(vel).min(axis=0) 171 | vel_maxs[i] = abs(vel).max(axis=0) 172 | vel_mean += vel.sum(axis=0) 173 | 174 | mean /= lens.sum() 175 | vel_mean /= lens.sum() 176 | 177 | for i in range(self.total_length): 178 | 179 | print("Phase 2: DP: ",i) 180 | data_element = self.__getitem__(i) 181 | 182 | # Just need to normalize the demonstration. Not the rest. 183 | if data_element['is_valid']: 184 | demo = data_element['demo'] 185 | vel = np.diff(demo,axis=0) 186 | variance += ((demo-mean)**2).sum(axis=0) 187 | vel_variance += ((vel-vel_mean)**2).sum(axis=0) 188 | 189 | variance /= lens.sum() 190 | variance = np.sqrt(variance) 191 | 192 | vel_variance /= lens.sum() 193 | vel_variance = np.sqrt(vel_variance) 194 | 195 | max_value = maxs.max(axis=0) 196 | min_value = mins.min(axis=0) 197 | 198 | vel_max_value = vel_maxs.max(axis=0) 199 | vel_min_value = vel_mins.min(axis=0) 200 | 201 | np.save("MIME_Orig_Mean.npy", mean) 202 | np.save("MIME_Orig_Var.npy", variance) 203 | np.save("MIME_Orig_Min.npy", min_value) 204 | np.save("MIME_Orig_Max.npy", max_value) 205 | np.save("MIME_Orig_Vel_Mean.npy", vel_mean) 206 | np.save("MIME_Orig_Vel_Var.npy", vel_variance) 207 | np.save("MIME_Orig_Vel_Min.npy", vel_min_value) 208 | np.save("MIME_Orig_Vel_Max.npy", vel_max_value) 209 | 210 | class MIME_Dataloader_Tester(unittest.TestCase): 211 | 212 | def test_MIMEdataloader(self): 213 | 214 | self.dataset = MIME_NewDataset() 215 | 216 | # Check the first index of the dataset. 217 | data_element = self.dataset[0] 218 | 219 | validity = data_element['is_valid']==1 220 | check_demo_data = (data_element['demo']==np.load("Test_Data/MIME_Dataloader_DE.npy")).all() 221 | 222 | self.assertTrue(validity and check_demo_data) 223 | 224 | if __name__ == '__main__': 225 | # Run all tests defined for the dataloader. 226 | unittest.main() -------------------------------------------------------------------------------- /Experiments/Master.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | import DataLoaders, MIME_DataLoader, Roboturk_DataLoader, Mocap_DataLoader 9 | from PolicyManagers import * 10 | import TestClass 11 | 12 | def return_dataset(args, data=None): 13 | 14 | # The data parameter overrides the data in args.data. 15 | # This is so that we can call return_dataset with source and target data for transfer setting. 16 | if data is not None: 17 | args.data = data 18 | 19 | # Define Data Loader. 
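# Note: there is no fallback branch, so if args.data matches none of the options below, `dataset`
# is never assigned and the return statement raises an UnboundLocalError.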
20 | if args.data=='Continuous': 21 | dataset = DataLoaders.ContinuousToyDataset(args.datadir) 22 | elif args.data=='ContinuousNonZero': 23 | dataset = DataLoaders.ContinuousNonZeroToyDataset(args.datadir) 24 | elif args.data=='DeterGoal': 25 | dataset = DataLoaders.DeterministicGoalDirectedDataset(args.datadir) 26 | elif args.data=='MIME': 27 | dataset = MIME_DataLoader.MIME_NewDataset() 28 | elif args.data=='Roboturk': 29 | dataset = Roboturk_DataLoader.Roboturk_NewSegmentedDataset(args) 30 | elif args.data=='OrigRoboturk': 31 | dataset = Roboturk_DataLoader.Roboturk_Dataset(args) 32 | elif args.data=='FullRoboturk': 33 | dataset = Roboturk_DataLoader.Roboturk_FullDataset(args) 34 | elif args.data=='Mocap': 35 | dataset = Mocap_DataLoader.Mocap_Dataset(args) 36 | 37 | return dataset 38 | 39 | class Master(): 40 | 41 | def __init__(self, arguments): 42 | self.args = arguments 43 | 44 | self.dataset = return_dataset(self.args) 45 | 46 | # Now define policy manager. 47 | if self.args.setting=='learntsub': 48 | self.policy_manager = PolicyManager_Joint(self.args.number_policies, self.dataset, self.args) 49 | elif self.args.setting=='pretrain_sub': 50 | self.policy_manager = PolicyManager_Pretrain(self.args.number_policies, self.dataset, self.args) 51 | elif self.args.setting=='baselineRL': 52 | self.policy_manager = PolicyManager_BaselineRL(args=self.args) 53 | elif self.args.setting=='downstreamRL': 54 | self.policy_manager = PolicyManager_DownstreamRL(args=self.args) 55 | elif self.args.setting=='DMP': 56 | self.policy_manager = PolicyManager_DMPBaselines(self.args.number_policies, self.dataset, self.args) 57 | elif self.args.setting=='imitation': 58 | self.policy_manager = PolicyManager_Imitation(self.args.number_policies, self.dataset, self.args) 59 | elif self.args.setting=='transfer' or self.args.setting=='cycle_transfer': 60 | source_dataset = return_dataset(self.args, data=self.args.source_domain) 61 | target_dataset = return_dataset(self.args, data=self.args.target_domain) 62 | 63 | if self.args.setting=='transfer': 64 | self.policy_manager = PolicyManager_Transfer(args=self.args, source_dataset=source_dataset, target_dataset=target_dataset) 65 | elif self.args.setting=='cycle_transfer': 66 | self.policy_manager = PolicyManager_CycleConsistencyTransfer(args=self.args, source_dataset=source_dataset, target_dataset=target_dataset) 67 | 68 | if self.args.debug: 69 | embed() 70 | 71 | # Create networks and training operations. 
72 | self.policy_manager.setup() 73 | 74 | def run(self): 75 | if self.args.setting=='pretrain_sub' or self.args.setting=='pretrain_prior' or \ 76 | self.args.setting=='imitation' or self.args.setting=='baselineRL' or self.args.setting=='downstreamRL' or \ 77 | self.args.setting=='transfer' or self.args.setting=='cycle_transfer': 78 | if self.args.train: 79 | if self.args.model: 80 | self.policy_manager.train(self.args.model) 81 | else: 82 | self.policy_manager.train() 83 | else: 84 | if self.args.setting=='pretrain_prior': 85 | self.policy_manager.train(self.args.model) 86 | else: 87 | self.policy_manager.evaluate(model=self.args.model) 88 | 89 | elif self.args.setting=='learntsub': 90 | if self.args.train: 91 | if self.args.model: 92 | self.policy_manager.train(self.args.model) 93 | else: 94 | if self.args.subpolicy_model: 95 | print("Just loading subpolicies.") 96 | self.policy_manager.load_all_models(self.args.subpolicy_model, just_subpolicy=True) 97 | self.policy_manager.train() 98 | else: 99 | # self.policy_manager.train(self.args.model) 100 | self.policy_manager.evaluate(self.args.model) 101 | 102 | # elif self.args.setting=='baselineRL' or self.args.setting=='downstreamRL': 103 | # if self.args.train: 104 | # if self.args.model: 105 | # self.policy_manager.train(self.args.model) 106 | # else: 107 | # self.policy_manager.train() 108 | 109 | elif self.args.setting=='DMP': 110 | self.policy_manager.evaluate_across_testset() 111 | 112 | def test(self): 113 | if self.args.test_code: 114 | loader = TestClass.TestLoaderWithKwargs() 115 | suite = loader.loadTestsFromTestCase(TestClass.MetaTestClass, policy_manager=self.policy_manager) 116 | unittest.TextTestRunner().run(suite) 117 | 118 | def parse_arguments(): 119 | parser = argparse.ArgumentParser(description='Learning Skills from Demonstrations') 120 | 121 | # Setup training. 122 | parser.add_argument('--datadir', dest='datadir',type=str,default='../Data/ContData/') 123 | parser.add_argument('--train',dest='train',type=int,default=0) 124 | parser.add_argument('--debug',dest='debug',type=int,default=0) 125 | parser.add_argument('--notes',dest='notes',type=str) 126 | parser.add_argument('--name',dest='name',type=str,default=None) 127 | parser.add_argument('--fake_batch_size',dest='fake_batch_size',type=int,default=1) 128 | parser.add_argument('--batch_size',dest='batch_size',type=int,default=1) 129 | parser.add_argument('--training_phase_size',dest='training_phase_size',type=int,default=500000) 130 | parser.add_argument('--initial_counter_value',dest='initial_counter_value',type=int,default=0) 131 | parser.add_argument('--data',dest='data',type=str,default='Continuous') 132 | parser.add_argument('--setting',dest='setting',type=str,default='gtsub') 133 | parser.add_argument('--test_code',dest='test_code',type=int,default=0) 134 | parser.add_argument('--model',dest='model',type=str) 135 | parser.add_argument('--logdir',dest='logdir',type=str,default='Experiment_Logs/') 136 | parser.add_argument('--epochs',dest='epochs',type=int,default=500) # Number of epochs to train for. Reduce for Mocap. 137 | 138 | # Training setting. 
139 | parser.add_argument('--discrete_z',dest='discrete_z',type=int,default=0) 140 | # parser.add_argument('--transformer',dest='transformer',type=int,default=0) 141 | parser.add_argument('--z_dimensions',dest='z_dimensions',type=int,default=64) 142 | parser.add_argument('--number_layers',dest='number_layers',type=int,default=5) 143 | parser.add_argument('--hidden_size',dest='hidden_size',type=int,default=64) 144 | parser.add_argument('--environment',dest='environment',type=str,default='SawyerLift') # Defines robosuite environment for RL. 145 | 146 | # Data parameters. 147 | parser.add_argument('--traj_segments',dest='traj_segments',type=int,default=1) # Defines whether to use trajectory segments for pretraining or entire trajectories. Useful for baseline implementation. 148 | parser.add_argument('--gripper',dest='gripper',type=int,default=1) # Whether to use gripper training in roboturk. 149 | parser.add_argument('--ds_freq',dest='ds_freq',type=int,default=1) # Additional downsample frequency. 150 | parser.add_argument('--condition_size',dest='condition_size',type=int,default=4) 151 | parser.add_argument('--smoothen', dest='smoothen',type=int,default=0) # Whether to smoothen the original dataset. 152 | parser.add_argument('--smoothing_kernel_bandwidth', dest='smoothing_kernel_bandwidth',type=float,default=3.5) # The smoothing bandwidth that is applied to data loader trajectories. 153 | 154 | parser.add_argument('--new_gradient',dest='new_gradient',type=int,default=1) 155 | parser.add_argument('--b_prior',dest='b_prior',type=int,default=1) 156 | parser.add_argument('--constrained_b_prior',dest='constrained_b_prior',type=int,default=1) # Whether to use constrained b prior var network or just normal b prior one. 157 | parser.add_argument('--reparam',dest='reparam',type=int,default=1) 158 | parser.add_argument('--number_policies',dest='number_policies',type=int,default=4) 159 | parser.add_argument('--fix_subpolicy',dest='fix_subpolicy',type=int,default=1) 160 | parser.add_argument('--train_only_policy',dest='train_only_policy',type=int,default=0) # Train only the policy network and use a pretrained encoder. This is weird but whatever. 161 | parser.add_argument('--load_latent',dest='load_latent',type=int,default=1) # Whether to load latent policy from model or not. 162 | parser.add_argument('--subpolicy_model',dest='subpolicy_model',type=str) 163 | parser.add_argument('--traj_length',dest='traj_length',type=int,default=10) 164 | parser.add_argument('--skill_length',dest='skill_length',type=int,default=5) 165 | parser.add_argument('--var_skill_length',dest='var_skill_length',type=int,default=0) 166 | parser.add_argument('--display_freq',dest='display_freq',type=int,default=10000) 167 | parser.add_argument('--save_freq',dest='save_freq',type=int,default=1) 168 | parser.add_argument('--eval_freq',dest='eval_freq',type=int,default=20) 169 | parser.add_argument('--perplexity',dest='perplexity',type=float,default=30,help='Value of perplexity fed to TSNE.') 170 | 171 | parser.add_argument('--entropy',dest='entropy',type=int,default=0) 172 | parser.add_argument('--var_entropy',dest='var_entropy',type=int,default=0) 173 | parser.add_argument('--ent_weight',dest='ent_weight',type=float,default=0.) 174 | parser.add_argument('--var_ent_weight',dest='var_ent_weight',type=float,default=2.) 175 | 176 | parser.add_argument('--pretrain_bias_sampling',type=float,default=0.) # Defines percentage of trajectory within which to sample trajectory segments for pretraining. 
177 | parser.add_argument('--pretrain_bias_sampling_prob',type=float,default=0.) 178 | parser.add_argument('--action_scale_factor',type=float,default=1) 179 | 180 | parser.add_argument('--z_exploration_bias',dest='z_exploration_bias',type=float,default=0.) 181 | parser.add_argument('--b_exploration_bias',dest='b_exploration_bias',type=float,default=0.) 182 | parser.add_argument('--lat_z_wt',dest='lat_z_wt',type=float,default=0.1) 183 | parser.add_argument('--lat_b_wt',dest='lat_b_wt',type=float,default=1.) 184 | parser.add_argument('--z_probability_factor',dest='z_probability_factor',type=float,default=0.1) 185 | parser.add_argument('--b_probability_factor',dest='b_probability_factor',type=float,default=0.1) 186 | parser.add_argument('--subpolicy_clamp_value',dest='subpolicy_clamp_value',type=float,default=-5) 187 | parser.add_argument('--latent_clamp_value',dest='latent_clamp_value',type=float,default=-5) 188 | parser.add_argument('--min_variance_bias',dest='min_variance_bias',type=float,default=0.01) 189 | parser.add_argument('--normalization',dest='normalization',type=str,default='None') 190 | 191 | parser.add_argument('--likelihood_penalty',dest='likelihood_penalty',type=int,default=10) 192 | parser.add_argument('--subpolicy_ratio',dest='subpolicy_ratio',type=float,default=0.01) 193 | parser.add_argument('--latentpolicy_ratio',dest='latentpolicy_ratio',type=float,default=0.1) 194 | parser.add_argument('--temporal_latentpolicy_ratio',dest='temporal_latentpolicy_ratio',type=float,default=0.) 195 | parser.add_argument('--latent_loss_weight',dest='latent_loss_weight',type=float,default=0.1) 196 | parser.add_argument('--kl_weight',dest='kl_weight',type=float,default=0.01) 197 | parser.add_argument('--var_loss_weight',dest='var_loss_weight',type=float,default=1.) 198 | parser.add_argument('--prior_weight',dest='prior_weight',type=float,default=0.00001) 199 | 200 | # Cross Domain Skill Transfer parameters. 201 | parser.add_argument('--discriminability_weight',dest='discriminability_weight',type=float,default=1.,help='Weight of discriminability loss in cross domain skill transfer.') 202 | parser.add_argument('--vae_loss_weight',dest='vae_loss_weight',type=float,default=1.,help='Weight of VAE loss in cross domain skill transfer.') 203 | parser.add_argument('--alternating_phase_size',dest='alternating_phase_size',type=int,default=2000, help='Size of alternating training phases.') 204 | parser.add_argument('--discriminator_phase_size',dest='discriminator_phase_size',type=int,default=2,help='Factor by which to train discriminator more than generator.') 205 | parser.add_argument('--cycle_reconstruction_loss_weight',dest='cycle_reconstruction_loss_weight',type=float,default=1.,help='Weight of the cycle-consistency reconstruction loss term.') 206 | 207 | # Exploration and learning rate parameters. 208 | parser.add_argument('--epsilon_from',dest='epsilon_from',type=float,default=0.3) 209 | parser.add_argument('--epsilon_to',dest='epsilon_to',type=float,default=0.05) 210 | parser.add_argument('--epsilon_over',dest='epsilon_over',type=int,default=30) 211 | parser.add_argument('--learning_rate',dest='learning_rate',type=float,default=1e-4) 212 | 213 | # Baseline parameters. 
214 | parser.add_argument('--baseline_kernels',dest='baseline_kernels',type=int,default=15) 215 | parser.add_argument('--baseline_window',dest='baseline_window',type=int,default=15) 216 | parser.add_argument('--baseline_kernel_bandwidth',dest='baseline_kernel_bandwidth',type=float,default=3.5) 217 | 218 | # Reinforcement Learning parameters. 219 | parser.add_argument('--TD',dest='TD',type=int,default=0) # Whether or not to use Temporal difference while training the critic network. 220 | parser.add_argument('--OU',dest='OU',type=int,default=1) # Whether or not to use the Ornstein Uhlenbeck noise process while training. 221 | parser.add_argument('--OU_max_sigma',dest='OU_max_sigma',type=float,default=0.2) # Max Sigma value of the Ornstein Uhlenbeck noise process. 222 | parser.add_argument('--OU_min_sigma',dest='OU_min_sigma',type=float,default=0.2) # Min Sigma value of the Ornstein Uhlenbeck noise process. 223 | parser.add_argument('--MLP_policy',dest='MLP_policy',type=int,default=0) # Whether or not to use MLP policy. 224 | parser.add_argument('--mean_nonlinearity',dest='mean_nonlinearity',type=int,default=0) # Whether or not to use Tanh activation. 225 | parser.add_argument('--burn_in_eps',dest='burn_in_eps',type=int,default=500) # How many episodes to burn in. 226 | parser.add_argument('--random_memory_burn_in',dest='random_memory_burn_in',type=int,default=1) # Whether to burn in episodes into memory randomly or not. 227 | parser.add_argument('--shaped_reward',dest='shaped_reward',type=int,default=0) # Whether or not to use shaped rewards. 228 | parser.add_argument('--memory_size',dest='memory_size',type=int,default=2000) # Size of replay memory. 2000 is okay, but is still kind of short-sighted. 229 | 230 | # Transfer learning domains, etc. 231 | parser.add_argument('--source_domain',dest='source_domain',type=str,help='What the source domain is in transfer.') 232 | parser.add_argument('--target_domain',dest='target_domain',type=str,help='What the target domain is in transfer.') 233 | 234 | return parser.parse_args() 235 | 236 | def main(args): 237 | 238 | args = parse_arguments() 239 | master = Master(args) 240 | 241 | if args.test_code: 242 | master.test() 243 | else: 244 | master.run() 245 | 246 | if __name__=='__main__': 247 | main(sys.argv) 248 | 249 | -------------------------------------------------------------------------------- /Experiments/MocapVisualizationExample.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree.
6 | 7 | 8 | import MocapVisualizationUtils 9 | import threading, time, numpy as np 10 | 11 | # bvh_filename = "/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments/01_01_poses.bvh" 12 | bvh_filename = "/private/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments/01_01_poses.bvh" 13 | filenames = [bvh_filename] 14 | file_num = 0 15 | 16 | print("About to run viewer.") 17 | 18 | cam_cur = MocapVisualizationUtils.camera.Camera(pos=np.array([6.0, 0.0, 2.0]), 19 | origin=np.array([0.0, 0.0, 0.0]), 20 | vup=np.array([0.0, 0.0, 1.0]), 21 | fov=45.0) 22 | 23 | def run_thread(): 24 | MocapVisualizationUtils.viewer.run( 25 | title='BVH viewer', 26 | cam=cam_cur, 27 | size=(1280, 720), 28 | keyboard_callback=None, 29 | render_callback=MocapVisualizationUtils.render_callback_time_independent, 30 | idle_callback=MocapVisualizationUtils.idle_callback, 31 | ) 32 | 33 | def run_thread(): 34 | MocapVisualizationUtils.viewer.run( 35 | title='BVH viewer', 36 | cam=cam_cur, 37 | size=(1280, 720), 38 | keyboard_callback=None, 39 | render_callback=MocapVisualizationUtils.render_callback_time_independent, 40 | idle_callback=MocapVisualizationUtils.idle_callback_return, 41 | ) 42 | 43 | 44 | # Run init before loading animation. 45 | MocapVisualizationUtils.init() 46 | MocapVisualizationUtils.global_positions, MocapVisualizationUtils.joint_parents, MocapVisualizationUtils.time_per_frame = MocapVisualizationUtils.load_animation(filenames[file_num]) 47 | thread = threading.Thread(target=run_thread) 48 | thread.start() 49 | 50 | print("Going to actually call callback now.") 51 | MocapVisualizationUtils.whether_to_render = True 52 | 53 | x_count = 0 54 | while MocapVisualizationUtils.done_with_render==False and MocapVisualizationUtils.whether_to_render==True: 55 | x_count += 1 56 | time.sleep(1) 57 | print("x_count is now: ",x_count) 58 | 59 | print("We finished with the visualization!") 60 | -------------------------------------------------------------------------------- /Experiments/MocapVisualizationUtils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from mocap_processing.motion.pfnn import Animation, BVH 8 | from basecode.render import glut_viewer as viewer 9 | from basecode.render import gl_render, camera 10 | from basecode.utils import basics 11 | from basecode.math import mmMath 12 | 13 | import numpy as np, imageio 14 | 15 | from OpenGL.GL import * 16 | from OpenGL.GLU import * 17 | from OpenGL.GLUT import * 18 | 19 | import time, threading 20 | from IPython import embed 21 | 22 | global whether_to_render 23 | whether_to_render = False 24 | 25 | def init(): 26 | global whether_to_render, global_positions, counter, joint_parents, done_with_render, save_path, name_prefix, image_list 27 | whether_to_render = False 28 | done_with_render = False 29 | global_positions = None 30 | joint_parents = None 31 | save_path = "/private/home/tanmayshankar/Research/Code/" 32 | name_prefix = "Viz_Image" 33 | image_list = [] 34 | counter = 0 35 | 36 | # Define function to load animation file. 
37 | def load_animation(bvh_filename): 38 | animation, joint_names, time_per_frame = BVH.load(bvh_filename) 39 | joint_parents = animation.parents 40 | global_positions = Animation.positions_global(animation) 41 | return global_positions, joint_parents, time_per_frame 42 | 43 | # Function that draws body of animated character from the global positions. 44 | def render_pose_by_capsule(global_positions, frame_num, joint_parents, scale=1.0, color=[0.5, 0.5, 0.5, 1], radius=0.05): 45 | glPushMatrix() 46 | glScalef(scale, scale, scale) 47 | 48 | for i in range(len(joint_parents)): 49 | pos = global_positions[frame_num][i] 50 | # gl_render.render_point(pos, radius=radius, color=color) 51 | j = joint_parents[i] 52 | if j!=-1: 53 | pos_parent = global_positions[frame_num][j] 54 | p = 0.5 * (pos_parent + pos) 55 | l = np.linalg.norm(pos_parent-pos) 56 | R = mmMath.getSO3FromVectors(np.array([0, 0, 1]), pos_parent-pos) 57 | gl_render.render_capsule(mmMath.Rp2T(R,p), l, radius, color=color, slice=16) 58 | glPopMatrix() 59 | 60 | # Callback that renders one pose. 61 | def render_callback_time_independent(): 62 | global global_positions, joint_parents, counter 63 | 64 | if counter=global_positions.shape[0]: 105 | if counter>=10: 106 | whether_to_render = False 107 | done_with_render = True 108 | 109 | # If whether to render is false, reset the counter. 110 | else: 111 | counter = 0 112 | 113 | def idle_callback_return(): 114 | # # Increment counter 115 | # # Set frame number of trajectory to be rendered 116 | # # Using the time independent rendering. 117 | # # Call drawGL and savescreen. 118 | # # Since this is an idle callback, drawGL won't call itself (only calls render callback). 119 | 120 | global whether_to_render, counter, global_positions, done_with_render, save_path, name_prefix, image_list 121 | done_with_render = False 122 | 123 | if whether_to_render and counter=global_positions.shape[0]: 138 | # if counter>=10: 139 | whether_to_render = False 140 | done_with_render = True 141 | 142 | # If whether to render is false, reset the counter. 143 | else: 144 | counter = 0 -------------------------------------------------------------------------------- /Experiments/Mocap_DataLoader.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from headers import * 12 | 13 | def resample(original_trajectory, desired_number_timepoints): 14 | original_traj_len = len(original_trajectory) 15 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 16 | return original_trajectory[new_timepoints] 17 | 18 | class Mocap_Dataset(Dataset): 19 | 20 | def __init__(self, args, split='all'): 21 | self.dataset_directory = '/checkpoint/tanmayshankar/Mocap/' 22 | self.args = args 23 | # Load the entire set of trajectories. 24 | self.data_list = np.load(os.path.join(self.dataset_directory, "Demo_Array.npy"),allow_pickle=True) 25 | self.dataset_length = len(self.data_list) 26 | self.ds_freq = self.args.ds_freq 27 | 28 | def __len__(self): 29 | # Return length of file list. 
30 | return self.dataset_length 31 | 32 | def process_item(self, item): 33 | resample_length = len(item['global_positions']) // self.ds_freq 34 | 35 | if resample_length<5: 36 | item['is_valid'] = False 37 | else: 38 | item['is_valid'] = True 39 | item['global_positions'] = resample(item['global_positions'], resample_length) 40 | demo = resample(item['local_positions'], resample_length) 41 | item['local_positions'] = demo 42 | item['local_rotations'] = resample(item['local_rotations'], resample_length) 43 | item['animation'] = resample(item['animation'], resample_length) 44 | 45 | # Replicate as demo for downstream dataloading. # Reshape to TxNumber of dimensions. 46 | item['demo'] = demo.reshape((demo.shape[0],-1)) 47 | 48 | return item 49 | 50 | def __getitem__(self, index): 51 | # Return n'th item of dataset. 52 | # This has already processed everything. 53 | 54 | # Remember, the global and local positions are all stored as Number_Frames x Number_Joints x 3 array. 55 | # Change this to # Number_Frames x Number_Dimensions...? But the dimensions are not independent.. so what do we do? 56 | 57 | return self.process_item(copy.deepcopy(self.data_list[index])) 58 | 59 | def compute_statistics(self): 60 | embed() -------------------------------------------------------------------------------- /Experiments/Processing_MocapData.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import mocap_processing, glob, numpy as np, os 8 | from mocap_processing.motion.pfnn import Animation, BVH 9 | from mocap_processing.motion.pfnn import Animation, BVH 10 | from IPython import embed 11 | 12 | # Define function that loads global and local positions, and the rotations from a datafile. 13 | def load_animation_data(bvh_filename): 14 | animation, joint_names, time_per_frame = BVH.load(bvh_filename) 15 | global_positions = Animation.positions_global(animation) 16 | # return global_positions, joint_parents, time_per_frame 17 | return global_positions, animation.positions, animation.rotations, animation 18 | 19 | # Set directory. 20 | directory = "/checkpoint/dgopinath/amass/CMU" 21 | save_directory = "/checkpoint/tanmayshankar/Mocap" 22 | # Get file list. 23 | filelist = glob.glob(os.path.join(directory,"*/*.bvh")) 24 | 25 | demo_list = [] 26 | 27 | print("Starting to preprocess data.") 28 | 29 | for i in range(len(filelist)): 30 | 31 | print("Processing file number: ",i, " of ",len(filelist)) 32 | # Get filename. 33 | filename = os.path.join(directory, filelist[i]) 34 | # Actually load file. 35 | global_positions, local_positions, local_rotations, animation = load_animation_data(filename) 36 | 37 | # Create data element object. 38 | data_element = {} 39 | data_element['global_positions'] = global_positions 40 | data_element['local_positions'] = local_positions 41 | # Get quaternion as array. 42 | data_element['local_rotations'] = local_rotations.qs 43 | data_element['animation'] = animation 44 | 45 | demo_list.append(data_element) 46 | 47 | demo_array = np.array(demo_list) 48 | np.save(os.path.join(save_directory,"Demo_Array.npy"),demo_array) -------------------------------------------------------------------------------- /Experiments/RLUtils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc.
and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | from headers import * 8 | 9 | def resample(original_trajectory, desired_number_timepoints): 10 | original_traj_len = len(original_trajectory) 11 | new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 12 | return original_trajectory[new_timepoints] 13 | 14 | class Transition(): 15 | 16 | def __init__(self, state, action, next_state, onestep_reward, terminal, success): 17 | # Now that we're doing 1step TD, and AC architectures rather than MC, 18 | # Don't need an explicit value of return. 19 | self.state = state 20 | self.action = action 21 | self.next_state = next_state 22 | self.onestep_reward = onestep_reward 23 | self.terminal = terminal 24 | self.success = success 25 | 26 | class Episode_TransitionList(): 27 | 28 | def __init__(self, transition_list): 29 | self.episode = transition_list 30 | 31 | def length(self): 32 | return len(self.episode) 33 | 34 | # Alternate way of implementing an episode... 35 | # Make it a class that has state_list, action_list, etc. over the episode.. 36 | class Episode(): 37 | 38 | def __init__(self, state_list=None, action_list=None, reward_list=None, terminal_list=None): 39 | self.state_list = state_list 40 | self.action_list = action_list 41 | self.reward_list = reward_list 42 | self.terminal_list = terminal_list 43 | self.episode_lenth = len(self.state_list) 44 | 45 | def length(self): 46 | return self.episode_lenth 47 | 48 | class HierarchicalEpisode(Episode): 49 | 50 | def __init__(self, state_list=None, action_list=None, reward_list=None, terminal_list=None, latent_z_list=None, latent_b_list=None): 51 | 52 | super(HierarchicalEpisode, self).__init__(state_list, action_list, reward_list, terminal_list) 53 | 54 | self.latent_z_list = latent_z_list 55 | self.latent_b_list = latent_b_list 56 | 57 | class ReplayMemory(): 58 | 59 | def __init__(self, memory_size=10000): 60 | 61 | # Implementing the memory as a list of EPISODES. 62 | # This acts as a queue. 63 | self.memory = [] 64 | 65 | # Accessing the memory with indices should be constant time, so it's okay to use a list. 66 | # Not using a priority either. 67 | self.memory_len = 0 68 | self.memory_size = memory_size 69 | 70 | print("Setup Memory.") 71 | 72 | def append_to_memory(self, episode): 73 | 74 | if self.check_full(): 75 | # Remove first episode in the memory (queue). 76 | self.memory.pop(0) 77 | # Now push the episode to the end of the queue.
78 | self.memory.append(episode) 79 | else: 80 | self.memory.append(episode) 81 | 82 | self.memory_len+=1 83 | 84 | def sample_batch(self, batch_size=25): 85 | 86 | self.memory_len = len(self.memory) 87 | 88 | indices = np.random.randint(0,high=self.memory_len,size=(batch_size)) 89 | 90 | return indices 91 | 92 | def retrieve_batch(self, batch_size=25): 93 | # self.memory_len = len(self.memory) 94 | 95 | return np.arange(0,batch_size) 96 | 97 | def check_full(self): 98 | 99 | self.memory_len = len(self.memory) 100 | 101 | if self.memory_len0 and segmentations[t]==1: 76 | image_list.append(255*np.ones_like(new_image)+new_image) 77 | 78 | if return_and_save: 79 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 80 | return image_list 81 | elif return_gif: 82 | return image_list 83 | else: 84 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 85 | 86 | class BaxterVisualizer(): 87 | 88 | def __init__(self, has_display=False): 89 | 90 | # Create environment. 91 | print("Do I have a display?", has_display) 92 | # self.base_env = robosuite.make('BaxterLift', has_renderer=has_display) 93 | self.base_env = robosuite.make("BaxterViz",has_renderer=has_display) 94 | 95 | # Create kinematics object. 96 | self.baxter_IK_object = IKWrapper(self.base_env) 97 | self.environment = self.baxter_IK_object.env 98 | 99 | def update_state(self): 100 | # Updates all joint states 101 | self.full_state = self.environment._get_observation() 102 | 103 | def set_ee_pose_return_image(self, ee_pose, arm='right', seed=None): 104 | 105 | # Assumes EE pose is Position in the first three elements, and quaternion in last 4 elements. 106 | self.update_state() 107 | 108 | if seed is None: 109 | # Set seed to current state. 110 | seed = self.full_state['joint_pos'] 111 | 112 | if arm == 'right': 113 | joint_positions = self.baxter_IK_object.controller.inverse_kinematics( 114 | target_position_right=ee_pose[:3], 115 | target_orientation_right=ee_pose[3:], 116 | target_position_left=self.full_state['left_eef_pos'], 117 | target_orientation_left=self.full_state['left_eef_quat'], 118 | rest_poses=seed 119 | ) 120 | 121 | elif arm == 'left': 122 | joint_positions = self.baxter_IK_object.controller.inverse_kinematics( 123 | target_position_right=self.full_state['right_eef_pos'], 124 | target_orientation_right=self.full_state['right_eef_quat'], 125 | target_position_left=ee_pose[:3], 126 | target_orientation_left=ee_pose[3:], 127 | rest_poses=seed 128 | ) 129 | 130 | elif arm == 'both': 131 | joint_positions = self.baxter_IK_object.controller.inverse_kinematics( 132 | target_position_right=ee_pose[:3], 133 | target_orientation_right=ee_pose[3:7], 134 | target_position_left=ee_pose[7:10], 135 | target_orientation_left=ee_pose[10:], 136 | rest_poses=seed 137 | ) 138 | image = self.set_joint_pose_return_image(joint_positions, arm=arm, gripper=False) 139 | return image 140 | 141 | def set_joint_pose_return_image(self, joint_pose, arm='both', gripper=False): 142 | 143 | # FOR FULL 16 DOF STATE: ASSUMES JOINT_POSE IS . 144 | 145 | self.update_state() 146 | self.state = copy.deepcopy(self.full_state['joint_pos']) 147 | # THE FIRST 7 JOINT ANGLES IN MUJOCO ARE THE RIGHT HAND. 148 | # THE LAST 7 JOINT ANGLES IN MUJOCO ARE THE LEFT HAND. 149 | 150 | if arm=='right': 151 | # Assume joint_pose is 8 DoF - 7 for the arm, and 1 for the gripper. 152 | self.state[:7] = copy.deepcopy(joint_pose[:7]) 153 | elif arm=='left': 154 | # Assume joint_pose is 8 DoF - 7 for the arm, and 1 for the gripper. 
155 | self.state[7:] = copy.deepcopy(joint_pose[:7]) 156 | elif arm=='both': 157 | # The Plans were generated as: Left arm, Right arm, left gripper, right gripper. 158 | # Assume joint_pose is 16 DoF. 7 DoF for left arm, 7 DoF for right arm. (These need to be flipped)., 1 for left gripper. 1 for right gripper. 159 | # First right hand. 160 | self.state[:7] = joint_pose[7:14] 161 | # Now left hand. 162 | self.state[7:] = joint_pose[:7] 163 | # Set the joint angles magically. 164 | self.environment.set_robot_joint_positions(self.state) 165 | 166 | action = np.zeros((16)) 167 | if gripper: 168 | # Left gripper is 15. Right gripper is 14. 169 | # MIME Gripper values are from 0 to 100 (Close to Open), but we treat the inputs to this function as 0 to 1 (Close to Open), and then rescale to (-1 Open to 1 Close) for Mujoco. 170 | if arm=='right': 171 | action[14] = -joint_pose[-1]*2+1 172 | elif arm=='left': 173 | action[15] = -joint_pose[-1]*2+1 174 | elif arm=='both': 175 | action[14] = -joint_pose[15]*2+1 176 | action[15] = -joint_pose[14]*2+1 177 | # Move gripper positions. 178 | self.environment.step(action) 179 | 180 | image = np.flipud(self.environment.sim.render(600, 600, camera_name='vizview1')) 181 | return image 182 | 183 | def visualize_joint_trajectory(self, trajectory, return_gif=False, gif_path=None, gif_name="Traj.gif", segmentations=None, return_and_save=False, additional_info=None): 184 | 185 | image_list = [] 186 | for t in range(trajectory.shape[0]): 187 | new_image = self.set_joint_pose_return_image(trajectory[t]) 188 | image_list.append(new_image) 189 | 190 | # Insert white 191 | if segmentations is not None: 192 | if t>0 and segmentations[t]==1: 193 | image_list.append(255*np.ones_like(new_image)+new_image) 194 | 195 | if return_and_save: 196 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 197 | return image_list 198 | elif return_gif: 199 | return image_list 200 | else: 201 | imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 202 | 203 | # class MocapVisualizer(): 204 | 205 | # def __init__(self, has_display=False, args=None): 206 | 207 | # # Load some things from the MocapVisualizationUtils and set things up so that they're ready to go. 208 | # # self.cam_cur = MocapVisualizationUtils.camera.Camera(pos=np.array([6.0, 0.0, 2.0]), 209 | # # origin=np.array([0.0, 0.0, 0.0]), 210 | # # vup=np.array([0.0, 0.0, 1.0]), 211 | # # fov=45.0) 212 | 213 | # self.args = args 214 | 215 | # # Default is local data. 216 | # self.global_data = False 217 | 218 | # self.cam_cur = MocapVisualizationUtils.camera.Camera(pos=np.array([4.5, 0.0, 2.0]), 219 | # origin=np.array([0.0, 0.0, 0.0]), 220 | # vup=np.array([0.0, 0.0, 1.0]), 221 | # fov=45.0) 222 | 223 | # # Path to dummy file that is going to populate joint_parents, initial global positions, etc. 224 | # bvh_filename = "/private/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments/01_01_poses.bvh" 225 | 226 | # # Run init before loading animation. 227 | # MocapVisualizationUtils.init() 228 | # MocapVisualizationUtils.global_positions, MocapVisualizationUtils.joint_parents, MocapVisualizationUtils.time_per_frame = MocapVisualizationUtils.load_animation(bvh_filename) 229 | 230 | # # State sizes. 231 | # self.number_joints = 22 232 | # self.number_dimensions = 3 233 | # self.total_dimensions = self.number_joints*self.number_dimensions 234 | 235 | # # Run thread of viewer, so that callbacks start running. 
236 | # thread = threading.Thread(target=self.run_thread) 237 | # thread.start() 238 | 239 | # # Also create dummy animation object. 240 | # self.animation_object, _, _ = BVH.load(bvh_filename) 241 | 242 | # def run_thread(self): 243 | # MocapVisualizationUtils.viewer.run( 244 | # title='BVH viewer', 245 | # cam=self.cam_cur, 246 | # size=(1280, 720), 247 | # keyboard_callback=None, 248 | # render_callback=MocapVisualizationUtils.render_callback_time_independent, 249 | # idle_callback=MocapVisualizationUtils.idle_callback_return, 250 | # ) 251 | 252 | # def get_global_positions(self, positions, animation_object=None): 253 | # # Function to get global positions corresponding to predicted or actual local positions. 254 | 255 | # traj_len = positions.shape[0] 256 | 257 | # def resample(original_trajectory, desired_number_timepoints): 258 | # original_traj_len = len(original_trajectory) 259 | # new_timepoints = np.linspace(0, original_traj_len-1, desired_number_timepoints, dtype=int) 260 | # return original_trajectory[new_timepoints] 261 | 262 | # if animation_object is not None: 263 | # # Now copy over from animation_object instead of just dummy animation object. 264 | # new_animation_object = Animation.Animation(resample(animation_object.rotations, traj_len), positions, animation_object.orients, animation_object.offsets, animation_object.parents) 265 | # else: 266 | # # Create a dummy animation object. 267 | # new_animation_object = Animation.Animation(self.animation_object.rotations[:traj_len], positions, self.animation_object.orients, self.animation_object.offsets, self.animation_object.parents) 268 | 269 | # # Then transform them. 270 | # transformed_global_positions = Animation.positions_global(new_animation_object) 271 | 272 | # # Now return coordinates. 273 | # return transformed_global_positions 274 | 275 | # def visualize_joint_trajectory(self, trajectory, return_gif=False, gif_path=None, gif_name="Traj.gif", segmentations=None, return_and_save=False, additional_info=None): 276 | 277 | # image_list = [] 278 | 279 | # if self.global_data: 280 | # # If we predicted in the global setting, just reshape. 281 | # predicted_global_positions = np.reshape(trajectory, (-1,self.number_joints,self.number_dimensions)) 282 | # else: 283 | # # If it's local data, then transform to global. 284 | # # Assume trajectory is number of timesteps x number_dimensions. 285 | # # Convert to number_of_timesteps x number_of_joints x 3. 286 | # predicted_local_positions = np.reshape(trajectory, (-1,self.number_joints,self.number_dimensions)) 287 | 288 | # # Assume trajectory was predicted in local coordinates. Transform to global for visualization. 289 | # predicted_global_positions = self.get_global_positions(predicted_local_positions, animation_object=additional_info) 290 | 291 | # # Copy into the global variable. 292 | # MocapVisualizationUtils.global_positions = predicted_global_positions 293 | 294 | # # Reset Image List. 295 | # MocapVisualizationUtils.image_list = [] 296 | # # Set save_path and prefix. 297 | # MocapVisualizationUtils.save_path = gif_path 298 | # MocapVisualizationUtils.name_prefix = gif_name.rstrip('.gif') 299 | # # Now set the whether_to_render as true. 300 | # MocapVisualizationUtils.whether_to_render = True 301 | 302 | # # Wait till rendering is complete. 
303 | # x_count = 0 304 | # while MocapVisualizationUtils.done_with_render==False and MocapVisualizationUtils.whether_to_render==True: 305 | # x_count += 1 306 | # time.sleep(1) 307 | 308 | # # Now that rendering is complete, load images. 309 | # image_list = MocapVisualizationUtils.image_list 310 | 311 | # # Now actually save the GIF or return. 312 | # if return_and_save: 313 | # imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 314 | # return image_list 315 | # elif return_gif: 316 | # return image_list 317 | # else: 318 | # imageio.mimsave(os.path.join(gif_path,gif_name), image_list) 319 | 320 | class ToyDataVisualizer(): 321 | 322 | def __init__(self): 323 | 324 | pass 325 | 326 | def visualize_joint_trajectory(self, trajectory, return_gif=False, gif_path=None, gif_name="Traj.gif", segmentations=None, return_and_save=False, additional_info=None): 327 | 328 | fig = plt.figure() 329 | ax = fig.gca() 330 | ax.scatter(trajectory[:,0],trajectory[:,1],c=range(len(trajectory)),cmap='jet') 331 | plt.xlim(-10,10) 332 | plt.ylim(-10,10) 333 | 334 | fig.canvas.draw() 335 | 336 | width, height = fig.get_size_inches() * fig.get_dpi() 337 | image = np.fromstring(fig.canvas.tostring_rgb(), dtype=np.uint8).reshape(int(height), int(width), 3) 338 | image = np.transpose(image, axes=[2,0,1]) 339 | 340 | return image 341 | 342 | 343 | if __name__ == '__main__': 344 | # end_eff_pose = [0.3, -0.3, 0.09798524029948213, 0.38044099037703677, 0.9228975092885654, -0.021717379118030174, 0.05525572942370394] 345 | # end_eff_pose = [0.53303758, -0.59997265, 0.09359371, 0.77337391, 0.34998901, 0.46797516, -0.24576358] 346 | # end_eff_pose = np.array([0.64, -0.83, 0.09798524029948213, 0.38044099037703677, 0.9228975092885654, -0.021717379118030174, 0.05525572942370394]) 347 | visualizer = MujocoVisualizer() 348 | # img = visualizer.set_ee_pose_return_image(end_eff_pose, arm='right') 349 | # scipy.misc.imsave('mj_vis.png', img) 350 | -------------------------------------------------------------------------------- /Experiments/cluster_run.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | 8 | """ 9 | Wrapper script for launching a job on the fair cluster. 
10 | Sample usage: 11 | python cluster_run.py --name=trial --setup='/path/to/setup.sh' --cmd='job_command' 12 | """ 13 | 14 | from __future__ import absolute_import 15 | from __future__ import division 16 | from __future__ import print_function 17 | 18 | import pdb 19 | from absl import app 20 | from absl import flags 21 | import os 22 | import sys 23 | import random 24 | import string 25 | import datetime 26 | import re 27 | 28 | opts = flags.FLAGS 29 | 30 | flags.DEFINE_integer('nodes', 1, 'Number of nodes per task') 31 | flags.DEFINE_integer('ntp', 1, 'Number of tasks per node') 32 | flags.DEFINE_integer('ncpus', 40, 'Number of cpu cores per task') 33 | flags.DEFINE_integer('ngpus', 1, 'Number of gpus per task') 34 | 35 | flags.DEFINE_string('name', '', 'Job name') 36 | flags.DEFINE_enum('partition', 'learnfair', ['dev', 'priority','uninterrupted','learnfair'], 'Cluster partition') 37 | flags.DEFINE_string('comment', 'for ICML deadline in 2020.', 'Comment') 38 | flags.DEFINE_string('time', '72:00:00', 'Time for which the job should run') 39 | 40 | flags.DEFINE_string('setup', '/private/home/tanmayshankar/Research/Code/Setup.bash', 'Setup script that will be run before the command') 41 | # flags.DEFINE_string('workdir', os.getcwd(), 'Job command') 42 | flags.DEFINE_string('workdir', '/private/home/tanmayshankar/Research/Code/CausalSkillLearning/Experiments', 'Directory to run job from') 43 | # flags.DEFINE_string('workdir', '/private/home/tanmayshankar/Research/Code/SkillsfromDemonstrations/Experiments/BidirectionalInfoModel/', 'Job command') 44 | flags.DEFINE_string('cmd', 'echo $PWD', 'Job command') 45 | 46 | 47 | def mkdir(path): 48 | if not os.path.exists(path): 49 | os.makedirs(path) 50 | 51 | def main(_): 52 | job_folder = '/checkpoint/tanmayshankar/jobs/' + datetime.date.today().strftime('%y_%m_%d') 53 | mkdir(job_folder) 54 | 55 | if len(opts.name) == 0: 56 | # read name from command 57 | opts.name = re.search('--name=\w+', opts.cmd).group(0)[7:] 58 | print(opts.name) 59 | slurm_cmd = '#!/bin/bash\n\n' 60 | slurm_cmd += '#SBATCH --job-name={}\n'.format(opts.name) 61 | slurm_cmd += '#SBATCH --output={}/{}-%j.out\n'.format(job_folder, opts.name) 62 | slurm_cmd += '#SBATCH --error={}/{}-%j.err\n'.format(job_folder, opts.name) 63 | # slurm_cmd += '#SBATCH --exclude=learnfair2038' 64 | slurm_cmd += '\n' 65 | 66 | slurm_cmd += '#SBATCH --partition={}\n'.format(opts.partition) 67 | if len(opts.comment) > 0: 68 | slurm_cmd += '#SBATCH --comment="{}"\n'.format(opts.comment) 69 | slurm_cmd += '\n' 70 | 71 | slurm_cmd += '#SBATCH --nodes={}\n'.format(opts.nodes) 72 | slurm_cmd += '#SBATCH --ntasks-per-node={}\n'.format(opts.ntp) 73 | if opts.ngpus > 0: 74 | slurm_cmd += '#SBATCH --gres=gpu:{}\n'.format(opts.ngpus) 75 | slurm_cmd += '#SBATCH --cpus-per-task={}\n'.format(opts.ncpus) 76 | slurm_cmd += '#SBATCH --time={}\n'.format(opts.time) 77 | slurm_cmd += '\n' 78 | 79 | slurm_cmd += 'source {}\n'.format(opts.setup) 80 | slurm_cmd += 'cd {} \n\n'.format(opts.workdir) 81 | slurm_cmd += '{}\n'.format(opts.cmd) 82 | 83 | job_fname = '{}/{}.sh'.format(job_folder, ''.join(random.choices(string.ascii_letters, k=8))) 84 | 85 | with open(job_fname, 'w') as f: 86 | f.write(slurm_cmd) 87 | 88 | #print('sbatch {}'.format(job_fname)) 89 | os.system('sbatch {}'.format(job_fname)) 90 | 91 | 92 | if __name__ == '__main__': 93 | app.run(main) 94 | 95 | -------------------------------------------------------------------------------- /Experiments/headers.py:
-------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # All rights reserved. 3 | 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | 7 | import numpy as np 8 | import glob, os, sys, argparse 9 | import torch, copy 10 | from torch.utils.data import Dataset, DataLoader 11 | from torchvision import transforms, utils 12 | from IPython import embed 13 | 14 | import matplotlib 15 | matplotlib.use('Agg') 16 | # matplotlib.rcParams['animation.ffmpeg_args'] = '-report' 17 | matplotlib.rcParams['animation.bitrate'] = 2000 18 | import matplotlib.pyplot as plt 19 | import tensorboardX 20 | from scipy import stats 21 | from absl import flags 22 | from memory_profiler import profile 23 | from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas 24 | from matplotlib.figure import Figure 25 | 26 | from IPython import embed 27 | import pdb 28 | import sklearn.manifold as skl_manifold 29 | from sklearn.decomposition import PCA 30 | from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage, 31 | AnnotationBbox) 32 | from matplotlib.animation import FuncAnimation 33 | import tensorflow as tf 34 | import tempfile 35 | import moviepy.editor as mpy 36 | import subprocess 37 | import h5py 38 | import time 39 | import robosuite 40 | import unittest 41 | import cProfile 42 | 43 | from scipy import stats, signal 44 | from scipy.interpolate import interp1d 45 | from scipy.ndimage.filters import gaussian_filter1d 46 | from scipy.signal import find_peaks, argrelextrema 47 | 48 | from sklearn.neighbors import NearestNeighbors 49 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Learning Robot Skills with Temporal Variational Inference 2 | 3 | ### What is this? ### 4 | 5 | This repository has code for our ICML 2020 paper on [Learning Robot Skills with Temporal Variational Inference](https://proceedings.icml.cc/static/paper_files/icml/2020/2847-Paper.pdf), authored by Tanmay Shankar and Abhinav Gupta. 6 | 7 | ### I want a TL;DR of what this paper does. ### 8 | 9 | Our paper presents a way to jointly learn robot skills and how to use them from demonstrations in an unsupervised manner. 10 | The code implements the training procedure for this across 3 different datasets, and provides tools to visualize the learnt skills. 11 | 12 | ### Cool. Can I use your code? ### 13 | 14 | Yes! If you would like to use our code, please cite our paper and this repository in your work. 15 | Also, be aware of the license for this repository: the Creative Commons Attribution-NonCommercial 4.0 International. Details may be viewed in the License file. 16 | 17 | ### I need help, or I have brilliant ideas to make this code even better. ### 18 | 19 | Great! Feel free to mail Tanmay (tanmay.shankar@gmail.com), for help, suggestions, questions and feedback. You can also create issues in the repository, if you feel like the problem is pertinent to others. 20 | 21 | ### How do I set up this repository? ### 22 | 23 | #### Dependencies #### 24 | You will need a few packages to be able to run the code in this repository. 25 | For Robotic environments, you will need to install Mujoco, Mujoco_Py, OpenAI Gym, and Robosuite. 
[Here](https://docs.google.com/document/d/1V6BJf4R-2TXKO_IEOII5rLJbGj0jrJPptjtBCfczPk8/edit?usp=sharing) is a list of instructions on how to set these up. 26 | 27 | You will also need some standard deep learning packages: Pytorch, Tensorflow, Tensorboard, and TensorboardX. Usually you can just pip install these packages. We recommend using a virtual environment for them. 28 | 29 | #### Data #### 30 | We run our model on various publicly available datasets, i.e. the [MIME dataset](https://sites.google.com/view/mimedataset), the [Roboturk dataset](https://roboturk.stanford.edu/), and the [CMU Mocap dataset](http://mocap.cs.cmu.edu/). In the case of the MIME and Roboturk datasets, we collate relevant data modalities and store them in quickly accessible formats for our code. You can find the links to these files below. 31 | 32 | [MIME Dataset]() 33 | [Roboturk Dataset]() 34 | [CMU Mocap Dataset]() 35 | 36 | Once you have downloaded this data locally, you will want to pass the path to these datasets via the `--dataset_directory` command line flag when you run the code (see the example invocation at the end of this README). 37 | 38 | ### Tell me how to run the code already! ### 39 | 40 | Here is a list of commands to run pre-training and joint skill learning on the various datasets used in our paper. The hyper-parameter values specified here are the ones used in the paper. Depending on your use case, you may want to play with these values. For a full list of the hyper-parameters, look at `Experiments/Master.py`. 41 | 42 | #### The MIME Dataset #### 43 | For the MIME dataset, to run pre-training of the low-level policy: 44 | 45 | ``` 46 | python Master.py --train=1 --setting=pretrain_sub --name=MIME_Pretraining --data=MIME --number_layers=8 --hidden_size=128 --kl_weight=0.01 --var_skill_length=1 --z_dimensions=64 --normalization=meanvar 47 | ``` 48 | 49 | This should automatically run some evaluation and visualization tools every few epochs, and you can view the results in Experimental_Logs/<run_name>/. 50 | Once you've run this pre-training, you can run the joint training using: 51 | 52 | ``` 53 | python Master.py --train=1 --setting=learntsub --name=J100 --normalization=meanvar --kl_weight=0.0001 --subpolicy_ratio=0.1 --latentpolicy_ratio=0.001 --b_probability_factor=0.01 --data=MIME --subpolicy_model=Experiment_Logs/<pretraining_run_name>/saved_models/Model_epoch480 --latent_loss_weight=0.01 --z_dimensions=64 --traj_length=-1 --var_skill_length=1 --training_phase_size=200000 54 | ``` 55 | 56 | #### The Roboturk Dataset #### 57 | For the Roboturk dataset, to run pre-training of the low-level policy: 58 | 59 | ``` 60 | python Master.py --train=1 --setting=pretrain_sub --name=Roboturk_Pretraining --data=FullRoboturk --kl_weight=0.0001 --var_skill_length=1 --z_dimensions=64 --number_layers=8 --hidden_size=128 61 | ``` 62 | 63 | Just as in the case of the MIME dataset, you can then run the joint training using: 64 | 65 | ``` 66 | python Master.py --train=1 --setting=learntsub --name=RJ80 --latent_loss_weight=1. --latentpolicy_ratio=0.01 --kl_weight=0.0001 --subpolicy_ratio=0.1 --b_probability_factor=0.001 --data=Roboturk --subpolicy_model=Experiment_Logs/<pretraining_run_name>/saved_models/Model_epoch20 --z_dimensions=64 --traj_length=-1 --var_skill_length=1 --number_layers=8 --hidden_size=128 67 | ``` 68 | Stay tuned for more! 69 | --------------------------------------------------------------------------------
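#### Example: pointing a run at your local data ####
Here is a minimal sketch of the MIME pre-training command above with an explicit data path. The path `/path/to/MIME_Dataset` is a hypothetical placeholder for wherever you extracted the MIME data; every other flag is taken verbatim from the MIME pre-training command in the README.

```
python Master.py --train=1 --setting=pretrain_sub --name=MIME_Pretraining --data=MIME --number_layers=8 --hidden_size=128 --kl_weight=0.01 --var_skill_length=1 --z_dimensions=64 --normalization=meanvar --dataset_directory=/path/to/MIME_Dataset
```

The same `--dataset_directory` flag applies to the other commands above, with the path swapped for your local copy of the corresponding dataset.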