├── .gitignore
├── LICENSE.md
├── README.md
├── assets
├── DQL_NN_architecture.png
├── DQL_results.jpg
├── DQL_sim_quant
│ ├── entropies.jpg
│ ├── trained_strategy.jpg
│ ├── training_history.jpg
│ └── training_history_moving_average.jpg
├── DQL_sim_quant_results.jpg
├── QL_results.jpg
├── agent_walk.gif
├── bellman_equation.png
├── circuit.jpg
├── encoding.jpg
├── layer_1.jpg
├── number_of_steps.png
└── percentage_of_succesfull_walks.png
├── requirements.txt
├── results
├── auto_hp_tuning
│ ├── results.csv
│ ├── results_scatter_plot.gif
│ └── results_triangular_surface.gif
├── classical_DQL
│ ├── cl_entropies.txt
│ ├── entropies.jpg
│ ├── entropies.txt
│ ├── hyperparameters.txt
│ ├── trained_strategy.jpg
│ ├── training_history.jpg
│ └── training_history_moving_average.jpg
├── classical_DQL_sim_quantum
│ └── _BEST_1_layers_sigmoid_activation_longer
│ │ ├── cl_entropies.txt
│ │ ├── entropies.jpg
│ │ ├── entropies.txt
│ │ ├── hyperparameters.txt
│ │ ├── trained_strategy.jpg
│ │ ├── training_history.jpg
│ │ └── training_history_moving_average.jpg
├── classical_QL
│ ├── hyperparameters.txt
│ ├── trained_strategy.jpg
│ ├── training_history.jpg
│ └── training_history_moving_average.jpg
└── quantum
│ ├── Quantum_DQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp_REWARD_NO20220426183607.png
│ ├── earlystopping_Quantum_DQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp_REWARD_NO20220425223058.png
│ ├── earlystopping_entropies.png
│ └── entropies.png
├── scripts
├── 1._Classical_QL.py
├── 2._Classical_DQL.py
├── 3._Classical_DQL_sim_quant.py
├── 3b._Classical_DQL_sim_quant_grid_search.py
├── 3c._Classical_DQL_sim_quant_finetuning.ipynb
├── QML
│ ├── QML_DQN_FROZEN_LAKE.py
│ ├── README_ENTANGLEMENT.md
│ ├── ShortestPathFrozenLake.py
│ └── run_job.sh
├── auto_hp_tuning_visuals.py
└── src
│ ├── DQL
│ └── quant_sim.py
│ ├── entropies.py
│ ├── utils.py
│ └── visualizations.py
└── tutorials
├── training_example.py
└── tutorial_state_preparation.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 | **/__pycache__
3 | _archive
4 |
5 | /scripts/_external_sv_save.py
6 | /scripts/3z_entropy-calculation.ipynb
7 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 "Qubit" Science Club, Wrocław University of Science and Technology
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Quantum Reinforcement Learning
2 |
3 | ## Requirements:
4 |
5 | The code was developed and run on `Python` version `3.9.11`. All main requirements are listed in the [`requirements.txt`](./requirements.txt) file.
6 |
7 | ## Structure:
8 |
9 | ```
10 | ├───assets # assets for readme
11 | ├───results # results from training
12 | │   ├───auto_hp_tuning # auto hyperparameter finetuning results directory
13 | │ ├───classical_QL # classical Q-learning results directory
14 | │ ├───classical_DQL # classical DQL results directory
15 | │ ├───classical_DQL_sim_quantum # classical DQL simulating quantum model results directory
16 | │ └───quantum # quantum model results directory
17 | │
18 | ├───scripts # scripts for generating results
19 | │ ├───src # source code directory
20 | │ ├───QML # quantum model directory
21 | │ │ 1._Classical_QL.py # classical Q-learning
22 | │ │ 2._Classical_DQL.py # classical Deep Q-learning
23 | │ │ 3._Classical_DQL_sim_quant.py # classical Deep Q-learning simulating quantum circuit
24 | │ │ 3b._Classical_DQL_sim_quant_grid_search.py # script no. 3 for grid search finetuning
25 | │ │ 3c._Classical_DQL_sim_quant_finetuning.ipynb # script no. 3 with automatic hyperparameter finetuning
26 | │   │   auto_hp_tuning_visuals.py # auto hyperparameter finetuning results plotting script
27 | │
28 | └───tutorials # supplementary tutorials to start with
29 |
30 | ```
31 |
32 |
33 | ## Introduction
34 |
35 | The concept of entropy is widely used in various machine learning methods. In this project we wanted to see whether this quantity behaves in the same way as its analogue in quantum physics - the entanglement entropy. To investigate this, we used both classical deep Q-learning and its quantum counterpart to train an agent to move in a simple Frozen Lake environment.
36 |
37 | The output of a quantum circuit is of course a quantum state, for which the concept of entanglement entropy is straightforward. It is not as obvious in the case of classical RL; there, we treated the vector output by the neural network as if it were a quantum state.
38 |
39 | ## Classical method
40 |
41 | ### Reinforcement learning
42 |
43 | Q-learning is a machine learning algorithm of the reinforcement learning type. This family of algorithms differs from supervised and unsupervised learning in that the information for training is collected not from a dataset, but from the interaction of the **agent** (the trained algorithm) with the **environment**. The function that guides the agent is called the **policy**; it takes an **observation** as input (e.g. the position of a player on the board, the car's speed on the track) and returns an **action** (e.g. move right, accelerate).
44 |
45 | Further reading: https://en.wikipedia.org/wiki/Reinforcement_learning
46 |
47 | ### Classical Q-Learning
48 |
49 |
50 |
51 | >Run in [this script](./scripts/1._Classical_QL.py)
52 | 🚀
53 |
54 |
55 |
56 | We will explain the concept of Q-learning using the example of an agent moving around the "FrozenLake" environment. Let's take a 4x4 board:
57 |
58 | ```Python
59 | import gym
60 |
61 | lake = gym.make('FrozenLake-v1', is_slippery=False)
62 | lake.reset()
63 | lake.render()
64 | ```
65 |
66 | SFFF
67 |
68 | FHFH
69 |
70 | FFFH
71 |
72 | HFFG
73 |
74 | **Legend:**
75 | - S: starting point, one per board,
76 | - F: frozen surface - safe field,
77 | - H: hole - an ice hole, a field that gives a large penalty, or ends the game,
78 | - G: goal, end point.
79 |
80 | The agent always occupies one of the fields and, based on its policy, decides to move in one of four directions (left, right, up, down). The whole walk revolves around the reward, which increases or decreases as the agent enters specific fields.
81 | Scoring example:
82 |
83 | - Frozen surface: 0 or -0.01 to the reward (optional penalty to discourage detours and walking in circles),
84 | - Starting point: 0 or -0.01 to the reward,
85 | - Hole: -1 to the reward,
86 | - End point: +1 to the reward.
87 |
88 | The policy is a Q function that takes an action (e.g. move left) and a state (field 5 - second row from the top, second column from the right) as input, and returns the corresponding value. In the most basic case (without the use of deep learning), the policy update (after each step) is given by the Bellman equation:
89 |
90 |
91 | 
92 |
93 |
94 | Further reading: https://en.wikipedia.org/wiki/Q-learning
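
As a minimal illustration of this update rule, here is a NumPy sketch (the learning rate and discount factor below are illustrative values; the actual training loop lives in [`1._Classical_QL.py`](./scripts/1._Classical_QL.py)):

```Python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))   # Q-table: one row per field, one column per move
learning_rate, gamma = 0.1, 0.9       # illustrative values only

def bellman_update(state, action, reward, new_state):
    # Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[state, action] += learning_rate * (
        reward + gamma * np.max(Q[new_state]) - Q[state, action]
    )
```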
95 |
96 |
97 |
98 | ### Deep Q-Learning
99 |
100 |
101 | >Run in [this script](./scripts/2._Classical_DQL.py)
102 | 🚀
103 |
104 |
105 |
106 | In our case, however, we use so-called Deep Q-Learning (DQL). To make decisions, we use a neural network with a 16-element input (the number of fields on the board) and a 4-element output corresponding to the weights assigned to each movement:
107 |
108 | - 0 - move left,
109 | - 1 - move down,
110 | - 2 - move right,
111 | - 3 - move up.
112 |
113 | For example, if we would like to check which way is best to go from field number 5, we activate the input neuron with index 4:
114 |
115 | 
116 |
117 |
118 | We can access the exact values returned by the neural network as follows
119 |
120 | ```Python
121 | agent.Qstate(4)
122 | ```
123 | tensor([0.4165, 0.6063, 0.5308, 0.4209])
124 |
125 |
126 | We can see that the second element has the greatest value, so the agent is going to move down.
127 |
128 |
129 | ### Implementation
130 |
131 | The whole thing was implemented in PyTorch. Our model is a network consisting of linear layers with sigmoid activation functions:
132 | - https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html
133 | - https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
134 |
135 | The loss function used was the SmoothL1Loss: https://pytorch.org/docs/stable/generated/torch.nn.SmoothL1Loss.html
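
A minimal sketch of such a setup is shown below (sizes follow the 16-state / 4-action description above; the hidden layer size, weight initialization and other details of the actual scripts may differ):

```Python
import torch

n_states, n_actions, hidden = 16, 4, 16

model = torch.nn.Sequential(
    torch.nn.Linear(n_states, hidden),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden, n_actions),
)
loss_fn = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)

def q_values(state: int) -> torch.Tensor:
    # One-hot encode the board position and return the four action values
    one_hot = torch.zeros(n_states)
    one_hot[state] = 1.0
    return model(one_hot)
```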
136 |
137 | Several hundred epochs are enough to train the model. During training we track the percentage of walks that end successfully. For example, if the agent reached the goal field in 90% of the walks over the last 50 epochs, we finish the training process.
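
In code, this stopping condition boils down to something like the following sketch (`win_history` holds a 1 for every epoch that ended on the goal field and a 0 otherwise; the window size and threshold are hyperparameters):

```Python
window, target_win_ratio = 50, 0.9

def should_stop(win_history):
    recent = win_history[-window:]
    return len(recent) == window and sum(recent) / window >= target_win_ratio
```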
138 |
139 |
140 | Below we show the success rate averaged over the last 50 epochs of training, for a total of 1000 epochs:
141 |
142 | ```Python
143 | plt.figure(figsize=(12,5))
144 | plt.plot(bin_stat.statistic)
145 | plt.title(label=f'Percentage of successfully finished walks (over the last {epoch_bin} epochs)')
146 | plt.ylim(-1, 101)
147 | plt.xlim(-1, (epochs+1)/epoch_bin)
148 | plt.yticks(ticks=list(range(0, 110, 10)))
149 | plt.xticks(ticks=bin_stat.bin_edges/epoch_bin-1, labels=bin_stat.bin_edges.astype(int))
150 | plt.grid()
151 | plt.show()
152 | ```
153 |
154 | 
155 |
156 |
157 | Additionally, we check whether the learned walk is optimal. We are aiming for an agent that covers the route in 6 steps; if it takes more, it is walking in circles. We look for the moment when the step count starts to oscillate around the optimal value, so after each epoch we track how many steps the agent takes when following its policy:
158 |
159 |
160 | ```Python
161 | plt.figure(figsize=(12,5))
162 | plt.plot(t.jList)
163 | plt.yticks(ticks=list(range(0, 110, 10)))
164 | plt.title(label="Number of steps taken by the agent")
165 | plt.grid()
166 | plt.show()
167 | ```
168 |
169 | 
170 |
171 | Let's see how the trained agent works:
172 |
173 | 
174 |
175 |
176 | ## Variational Quantum Circuit
177 |
178 |
179 | >Run from [this folder](./scripts/QML/)
180 | 🚀
181 |
182 |
183 |
184 | In the quantum approach we replace the neural network with a so-called Variational Quantum Circuit (VQC). This is a quantum circuit with tunable (classical) parameters. Like neural networks, VQCs can approximate arbitrary functions or classifiers. The following implementation is taken from
185 |
186 | [S. Y.-C. Chen, C.-H. H. Yang, J. Qi, P.-Y. Chen, X. Ma and H.-S. Goan, *"Variational Quantum Circuits for Deep Reinforcement Learning"*, 2019](https://arxiv.org/abs/1907.00397)
187 |
188 | ### Encoding of the state
189 |
190 | Recall that our task is, given a state (the agent's position on the board), to find the best corresponding move. The board has 16 possible states (0-15), which can be encoded using 4 bits (0000-1111)
191 |
192 |
193 | 
194 |
195 | or similarly 4 qubits
196 |
197 |
198 | 
199 |
200 | For example, if we are in the 13th state, its binary representation is 1101, which can also be written in terms of qubit states as $|1101\rangle$.
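
This mapping is easy to check in plain Python:

```Python
state = 13
bits = format(state, "04b")   # -> '1101', i.e. the qubit state |1101>
print(bits)
```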
201 |
202 | Finally, the explicit gates which must be applied to consecutive qubits to encode the agent's state are shown below:
203 |
204 | 
205 |
206 | Note that this encoding step uses its own rotation-angle parameters. These parameters won't be trained in the rest of the procedure; they are only used to encode the state properly.
207 |
208 | ### Layers
209 |
210 | On the encoded state we act with the following gates:
211 |
212 | 
213 |
214 | At the beginning we entangle all of the qubits using *CNOT* gates. Then we rotate each qubit around the *X*, *Y* and *Z* axes according to the following formula
215 |
216 |
217 | $R(\alpha_i, \beta_i, \gamma_i) = R_x(\alpha_i)\, R_y(\beta_i)\, R_z(\gamma_i)$
218 |
219 | $\alpha_i$, $\beta_i$ and $\gamma_i$ are the parameters that will be optimized during each training iteration (which also means that the gradient is calculated with respect to exactly these variables).
220 |
221 | At the very end we measure each qubit and, based on the output, make the corresponding move. The whole circuit looks as follows:
222 |
223 | 
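
A minimal PennyLane sketch of this idea is shown below. It is only illustrative: the state encoding here is simplified to a plain basis embedding, and the exact gate pattern used in the QML scripts may differ.

```Python
import pennylane as qml
import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(bits, alpha, beta, gamma):
    # Encode the agent's position as a computational-basis state, e.g. [1, 1, 0, 1]
    qml.BasisState(np.array(bits), wires=range(n_qubits))
    # Entangle neighbouring qubits with CNOT gates
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])
    # Rotations parametrized by alpha_i, beta_i, gamma_i on each qubit
    for i in range(n_qubits):
        qml.RX(alpha[i], wires=i)
        qml.RY(beta[i], wires=i)
        qml.RZ(gamma[i], wires=i)
    # Measure each qubit; the four expectation values play the role of Q-values
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

q_vals = circuit([1, 1, 0, 1], np.zeros(4), np.zeros(4), np.zeros(4))
```

During training, `alpha`, `beta` and `gamma` are the trainable parameters handed to a classical optimizer.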
224 |
225 | ### Early stopping
226 |
227 | Because the training process of a VQC is very unstable, we use early stopping (just as in the classical approach). We terminate the procedure if, during the last 20 epochs, the reward did not change and was positive.
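
A sketch of that stopping rule (assuming `reward_history` is the list of per-epoch rewards):

```Python
patience = 20

def vqc_early_stop(reward_history):
    recent = reward_history[-patience:]
    # stop when the reward has been constant and positive for the last `patience` epochs
    return len(recent) == patience and len(set(recent)) == 1 and recent[-1] > 0
```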
228 |
229 | ## Extended classical model (NN simulating VQC)
230 |
231 |
232 | >Run in [this script](./scripts/3._Classical_DQL_sim_quant.py)
233 | 🚀
234 |
235 |
236 | We have also tested a classical neural network which mimics the behaviour of the quantum model.
237 |
238 |
239 | It's important to note that the state vector produced by the VQC consists of $2^n$ complex variables, where *n* is the number of qubits, which in our case gives 16 numbers. To resemble this situation as closely as possible, we extend the classical model by increasing the number of neurons in each layer (including the output layer) to 32: we use 32 real numbers to encode 16 complex ones.
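
One possible way to read the 32 real outputs as 16 complex amplitudes is sketched below (an illustration of the idea only; the convention actually used in the script may differ):

```Python
import torch

def outputs_to_state_vector(outputs: torch.Tensor) -> torch.Tensor:
    # outputs: 32 real numbers -> 16 complex amplitudes
    # (first half as real parts, second half as imaginary parts)
    real, imag = outputs[:16], outputs[16:]
    state = torch.complex(real, imag)
    # normalise so the result can be treated as a quantum state vector
    return state / torch.linalg.norm(state)
```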
240 |
241 | Unfortunately, due to this extension the agent's training becomes more unstable and converges more slowly.
242 |
243 |
244 | ## Results
245 | ### Quantum DQN
246 |
247 | With early stopping | Without early stopping
248 | :---: | :---:
249 |  | 
250 |  | 
251 |
252 |
253 | As we can see, the two kinds of entropy behave differently and there is no obvious correlation between them (nor between the entropies and the obtained reward).
254 |
255 | ### Classical Q-learning:
256 |
257 | The classical Q-learning model learns for a wide range of parameters and has very predictable behaviour:
258 | 1. the agent wanders until it spots the reward,
259 | 2. then it sticks to that path, usually optimizing it to the shortest one within a few epochs.
260 | The only challenge is to set the hyperparameters so that it converges as fast as possible.
261 |
262 | 
263 |
264 | We didn't calculate entanglement entropies here; this one is just a complementary proof of concept.
265 |
266 | ### Deep Q-learning:
267 |
268 | It learns more slowly than the non-deep version and is slightly more sensitive to the hyperparameters. It almost always converges to the optimal number of steps, but sometimes varies a lot along the way:
269 |
270 | 
271 |
272 | ### Deep Q-learning simulating quantum circuit (Extended classical model):
273 |
274 | This model was a lot of trouble to train. We searched for the best architecture and set of hyperparameters in 3 stages:
275 |
276 | ### **Finetuning stages:**
277 | -----
278 |
279 | #### **1. Manual finetuning on baseline model:**
280 |
281 | * *Model*: one or two hidden layers and a sigmoid activation function
282 | * *Goal*: Perform a few experiments to get a sense of how, and which, hyperparameters influence the model's training, and to establish baseline ranges for a more systematic search.
283 | * *Results*: It turns out the model is very hard to train; only for some very narrow ranges did we obtain a ~40% win ratio averaged over the last few epochs, and even then only rarely, for some seeds.
284 |
285 | You can run our [script](./scripts/3._Classical_DQL_sim_quant.py); all parameters there are the result of this finetuning.
286 |
287 | #### **2. Grid search for the best architecture:**
288 |
289 | * *Model*:
290 | 	* **Hidden layers**: from 1 to 6 inclusive
291 | 	* **Activation functions**: sigmoid, hyperbolic tangent and leaky ReLU with slope 0.1
292 | * *Goal*: Here we slightly lowered the hyperparameters that control the training 'pace' (learning rate, random parameter scaling, etc.) and ran every combination of the tested architectures for `20'000` epochs, in order to choose the best architecture for hyperparameter finetuning.
293 | * *Results*:
294 | 	* None of the models trained to win.
295 | 	* The most promising results came from leaky ReLU, but we dropped it in the next stage since it 'favours' positive values.
296 | 	* The hyperbolic tangent performed the worst.
297 | 	* Models started to train with 1 and 2 hidden layers; with 3 or more hidden layers the architecture seems to be too complicated.
298 |
299 | However, the one-hidden-layer architecture with the sigmoid activation function almost trained (it reached around a 90% win ratio at the end of training). Naturally, we put it into a **training run with 30'000 epochs and it trained, triggering the early stop at epoch 22800**. The model converged to the optimal number of steps, but the training history is very chaotic compared to the previous methods.
300 | The information and history of this experiment are in [this directory](./results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/).
301 |
302 | All the other results are in the [release with full results](./results/classical_DQL_sim_quantum/) in the `results` directory; they were too big to include in the main repository (weighing around 0.5 GB).
303 | The script used for training is [here](./scripts/3b._Classical_DQL_sim_quant_grid_search.py).
304 |
305 | #### **3. Automatic hyperparameter finetuning:**
306 |
307 | For this stage we used the `PyTorch` [finetuning tutorial](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html) with `ray`.
308 |
309 | * *Model*:
310 | * **Hidden layers**: from 1 to 2 incl.
311 | * **Activation functions**: sigmoid
312 | * *Details*:
313 | 	* 150 experiments, run 12 at a time, with the scheduler keeping the next 16 in the queue
314 | 	* 30'000 epochs per experiment, with a 15'000-epoch grace period for the scheduler's early stopper. The early stopper was enabled to avoid wasting resources on models that don't train.
315 | 	* We used [Ray's ASHA scheduler](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html); a sketch of how such a search can be wired up is shown below, after the results.
316 |
317 |
318 | * *Goal*:
319 | * Final, automatic, full scale finetuning.
320 | * *Results*:
321 | 	* Only one of the experiments trained, and only with the win-ratio threshold lowered to 70% over the last 100 epochs (gamma: ~`0.92`, learning rate: ~`0.008`, random scaling: ~`0.9998`, `1` hidden layer).
322 | 	* The only winning architecture is very similar to our parameters from the previous stages:
323 | 		* Best model from the 2nd stage: gamma: `0.9`, learning rate: ~`0.0002`, random scaling: `0.9998`, `1` hidden layer.
324 | 	* Models with 1 hidden layer dominate in the higher win-ratio domain.
325 |
326 |
327 | | Results on scatter plot | Results on triangular surface |
328 | | ------------- | ------------- |
329 | |  |  |
330 | | Violet dots are models with 1 hidden layer; yellow dots are models with 2 hidden layers. | |
331 |
332 | All the details are in the [results CSV file](./results/auto_hp_tuning/results.csv).
333 |
334 | The notebook used for training is [here](./scripts/3c._Classical_DQL_sim_quant_finetuning.ipynb). Finetuning was performed on a desktop with an i5-8600K 3.6 GHz CPU (6 cores) and an Nvidia RTX 2060 GPU.
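
As a rough sketch of how such a search can be wired up with Ray Tune's ASHA scheduler (the trainable `train_agent`, the config ranges and the resource settings below are illustrative assumptions, not the exact setup from the notebook):

```Python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_agent(config):
    # Hypothetical trainable: in the real notebook this runs the DQL-sim-quant
    # training loop and reports metrics after every epoch.
    for epoch in range(30_000):
        tune.report(episode_reward_mean=-1.0, win_ratio=0.0)

config = {
    "gamma": tune.uniform(0.1, 2.0),
    "lr": tune.loguniform(1e-5, 0.5),
    "n_hidden_layers": tune.choice([1, 2]),
    "random_scaling": tune.uniform(0.9995, 1.0),
}

scheduler = ASHAScheduler(
    metric="episode_reward_mean",
    mode="max",
    max_t=30_000,         # epochs per experiment
    grace_period=15_000,  # do not stop a trial before this many epochs
)

analysis = tune.run(
    train_agent,
    config=config,
    num_samples=150,
    scheduler=scheduler,
    resources_per_trial={"cpu": 1, "gpu": 0.25},
)
print(analysis.get_best_config(metric="episode_reward_mean", mode="max"))
```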
335 |
336 | ### **Results of classical model simulating quantum model**:
337 |
338 | For the best classical model simulating the quantum model, we can see both entropies converge to zero right around the epochs where the model started to reach the goal:
339 |
340 | 
341 | 
342 |
343 | All parameters are in [results folder](./results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/).
344 |
345 | However, with this method it is incomparably harder to obtain an effective model for the environment: it is more sensitive to the hyperparameters, takes longer to train, and only a small number of experiments were 'successful', i.e. reached a mean win ratio over 40%. Moreover, to obtain a 'successful' model we need to stop at the right moment, i.e. use a lower win-ratio early-stop condition, which does not guarantee an optimal path.
346 |
347 | Interestingly, models with 'real' quantum circuits (not simulated by a neural network) were able to train, even if rarely.
348 | This shows that simulating quantum distributions with classical neural networks can be tough. In our case this applied particularly to:
349 | * classical data encoded with basis embedding,
350 | * classical data decoded with expectation values of the Pauli Z operator (with flipped sign).
351 |
352 | Similar conclusions - that quantum distributions can be hard to simulate - can be found in the literature, e.g. *Learning hard quantum distributions with variational
353 | autoencoders* [Rocchetto et al., 2018](https://www.nature.com/articles/s41534-018-0077-z.pdf).
354 |
355 | # Conclusions
356 |
357 | We examined two kinds of entropy - the von Neumann (entanglement) entropy and the Shannon entropy of the output vector (treated as a quantum state vector) returned by different ML models. It is worth noticing that these two values coincide when we express the (quantum) state vector in the Schmidt basis (which can be obtained by means of the SVD); see [here](https://physics.stackexchange.com/questions/600120/von-neumann-entropy-vs-shannon-entropy-for-a-quantum-state-vector/608071#608071) for more details. However, the SVD is never performed when training ML models, which explains the different behaviour of the curves presented in the plots above. In the end, we were not able to find any obvious relation between these two entropies, nor between them and the reward obtained by the agent.
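
For reference, the two quantities can be computed along these lines for a length-16 output vector treated as a 4-qubit state (a sketch of the idea behind `scripts/src/entropies.py`; the actual implementation there may use different conventions, e.g. a different logarithm base or bipartition):

```Python
import numpy as np

def entanglement_entropy(vec, cut=2):
    # Treat the vector as a 4-qubit state and split it into the first `cut` qubits vs. the rest
    psi = np.asarray(vec, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    schmidt = np.linalg.svd(psi.reshape(2**cut, -1), compute_uv=False)
    p = schmidt**2
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def shannon_entropy(vec):
    # Shannon entropy of the measurement probabilities |psi_i|^2
    p = np.abs(np.asarray(vec, dtype=complex))**2
    p = p / p.sum()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))
```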
358 |
359 | We can see that, in the case of a classical model trying to reproduce the behaviour of a quantum model, the two analyzed entropies differ from those obtained from a truly quantum model. The two values begin their oscillations at a similar level, but for the classical model the entanglement entropy seems to drop faster. Also, the classical model seems to be "more stable". These conclusions shouldn't be regarded as general rules, however, because other agents might in principle behave differently.
360 |
361 | Finally, **we suggest that the von Neumann entropy can also be used during the training of any (classical) ML model** that outputs a vector of length $4^n$, for some integer $n$. In that case we would simply treat the output as the state vector of some quantum system. **The maximization of entropy is widely used in RL by adding it as a bonus term to the loss function (as described [here](https://awjuliani.medium.com/maximum-entropy-policies-in-reinforcement-learning-everyday-life-f5a1cc18d32d) and [here](https://towardsdatascience.com/entropy-regularization-in-reinforcement-learning-a6fa6d7598df)), so it would be interesting to see whether we could obtain different agent behaviour by utilizing the entanglement entropy in a similar way**. It should be possible, because the von Neumann entropy is differentiable (see [here](https://math.stackexchange.com/questions/3123031/derivative-of-the-von-neumann-entropy), [here](https://math.stackexchange.com/questions/2877997/derivative-of-von-neumann-entropy) and [here](https://quantumcomputing.stackexchange.com/questions/22263/how-to-compute-derivatives-of-partial-traces-of-the-form-frac-partial-operat)).
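
A sketch of what such a bonus term could look like in PyTorch (purely illustrative; the weight `beta`, the bipartition and the base loss are assumptions):

```Python
import torch

def von_neumann_entropy(output: torch.Tensor) -> torch.Tensor:
    # Differentiable entanglement entropy of a length-4^n output treated as a state vector
    psi = output.to(torch.complex64)
    psi = psi / torch.linalg.norm(psi)
    dim = int(psi.numel() ** 0.5)                  # split the qubits into two equal halves
    p = torch.linalg.svdvals(psi.reshape(dim, -1)) ** 2
    p = p.clamp_min(1e-12)
    return -(p * torch.log(p)).sum()

# beta = 0.01
# loss = loss_fn(q_values, target) - beta * von_neumann_entropy(q_values)  # entropy as a bonus
```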
362 |
363 | ## Authors:
364 |
365 | M.Sc. Bartosz Rzepkowski, Janusz Twardak, Michał Łukomski, Marek Kowalik.
366 | The project was developed as part of the work of the [science club "Qubit"](http://qubit.pwr.edu.pl/) at [Wrocław University of Science and Technology](https://pwr.edu.pl/en/).
367 |
368 | ## License:
369 |
370 | Code and other materials are under the [MIT license](./LICENSE.md).
371 |
372 | > **Disclaimer**:
> Our project uses fragments of code from other sources, sometimes with different licenses. All these fragments are marked accordingly. The MIT license refers only to our original code.
373 |
--------------------------------------------------------------------------------
/assets/DQL_NN_architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/DQL_NN_architecture.png
--------------------------------------------------------------------------------
/assets/DQL_results.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/DQL_results.jpg
--------------------------------------------------------------------------------
/assets/DQL_sim_quant/entropies.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/DQL_sim_quant/entropies.jpg
--------------------------------------------------------------------------------
/assets/DQL_sim_quant/trained_strategy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/DQL_sim_quant/trained_strategy.jpg
--------------------------------------------------------------------------------
/assets/DQL_sim_quant/training_history.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/DQL_sim_quant/training_history.jpg
--------------------------------------------------------------------------------
/assets/DQL_sim_quant/training_history_moving_average.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/DQL_sim_quant/training_history_moving_average.jpg
--------------------------------------------------------------------------------
/assets/DQL_sim_quant_results.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/DQL_sim_quant_results.jpg
--------------------------------------------------------------------------------
/assets/QL_results.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/QL_results.jpg
--------------------------------------------------------------------------------
/assets/agent_walk.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/agent_walk.gif
--------------------------------------------------------------------------------
/assets/bellman_equation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/bellman_equation.png
--------------------------------------------------------------------------------
/assets/circuit.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/circuit.jpg
--------------------------------------------------------------------------------
/assets/encoding.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/encoding.jpg
--------------------------------------------------------------------------------
/assets/layer_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/layer_1.jpg
--------------------------------------------------------------------------------
/assets/number_of_steps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/number_of_steps.png
--------------------------------------------------------------------------------
/assets/percentage_of_succesfull_walks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/assets/percentage_of_succesfull_walks.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | gym==0.23.1
2 | matplotlib==3.5.1
3 | numpy==1.21.5
4 | pandas==1.4.2
5 | PennyLane==0.22.2
6 | scipy==1.8.0
7 | torch==1.11.0
8 | torchsummary==1.5.1
9 | tqdm==4.63.1
10 |
--------------------------------------------------------------------------------
/results/auto_hp_tuning/results.csv:
--------------------------------------------------------------------------------
1 | ,if_trained,win_ratio,episode_reward_mean,steps,config/gamma,config/lr,config/n_hidden_layers,config/random_scaling
2 | 0,False,0.0,-0.01,30000,0.6429321044920745,0.3381538660004886,2,0.9996758915938184
3 | 1,False,0.0,-0.01,30000,0.9634394487957104,0.2477646826964672,1,0.9995080556877116
4 | 2,False,0.0,-0.0297999999999999,30000,0.1278502176306413,0.0038077620563105,2,0.9998001486092865
5 | 3,False,0.0,-0.01,30000,1.0896413637009497,0.0017159593249697,2,0.9995951030113496
6 | 4,False,0.0,-0.01,30000,1.0712069399496895,0.085835916834808,1,0.9996429880689348
7 | 5,False,0.0,-0.01,30000,0.8661881600527119,2.554632307572298e-05,1,0.9996932883674374
8 | 6,False,0.0,-0.01,30000,0.1225708589257898,0.0139979971854847,2,0.9997758306755856
9 | 7,False,0.0,-0.9604,30000,0.2754528346538609,0.0076829647096619,1,0.9999600857569494
10 | 8,False,0.0,-0.01,30000,1.7615002274484195,0.0552439197921312,1,0.999598278069634
11 | 9,False,0.0,-0.01,30000,1.5133222203725931,0.1273945273279201,2,0.9997853193656548
12 | 10,False,0.0,-0.01,30000,1.5002468807136646,2.6380295940306665e-05,1,0.9997212188109448
13 | 11,False,0.0,-0.0396999999999999,30000,0.1633806918537651,0.0013508943695215,1,0.999795513082905
14 | 12,False,0.0,-0.01,30000,1.043793449468295,0.0004852534818499,2,0.9997452502646372
15 | 13,False,0.0,-0.01,30000,0.1867059025212574,0.0022757296997554,2,0.9997575758039736
16 | 14,False,0.0,-0.01,30000,0.1651360985918654,0.0482472028034841,2,0.9996526573436372
17 | 15,False,0.0,-0.01,30000,0.2070144579998142,0.0070213709546484,1,0.9996215729338886
18 | 16,False,0.0,-0.0990999999999998,30000,0.9923896837900787,0.1056199344999185,1,0.9998951477891126
19 | 17,False,0.0,-0.01,30000,1.4201523331746466,0.0015458123147767,2,0.9997230074833044
20 | 18,False,0.0,-0.01,30000,1.7480348463032849,2.014815576209728e-05,1,0.9995371364283584
21 | 19,False,0.0,-0.0199,30000,1.8417113562031533,0.2154447653216558,1,0.99985706158186
22 | 20,False,0.0,-0.5445999999999995,30000,0.5970746888952899,0.0001218713769029,1,0.9998234467457447
23 | 21,False,0.0,-0.3070000000000005,30000,1.1827803259001586,0.0247895655778716,1,0.9999165844503128
24 | 22,False,0.0,-0.01,30000,0.1282933336489403,0.0466998624759277,1,0.9995931449082964
25 | 23,False,0.01,-0.2672000000000004,30000,0.4622827099556696,0.0021432397052196,1,0.9998740751853912
26 | 24,False,0.0,-0.01,30000,1.2977867224977024,0.0561111162867689,1,0.9996255510058668
27 | 25,False,0.0,-0.01,30000,0.1567895264753695,0.0165690023921189,1,0.9996237193594274
28 | 26,False,0.0,-0.01,30000,1.4727594079969806,0.0004150972159699,1,0.9997270378517816
29 | 27,False,0.0,-0.0199,30000,0.9182373810574538,0.0015362350728754,2,0.999739198852663
30 | 28,False,0.0,-0.01,30000,0.7434210257967419,0.001170155304365,2,0.9996202599603984
31 | 29,False,0.04,-0.92,30000,0.247895277202211,0.0002437194807502,1,0.9996984813681832
32 | 30,False,0.0,-1.0,30000,1.6122508191725604,0.0329602790122621,2,0.9998265615906032
33 | 31,False,0.0,-0.01,30000,1.173721736028401,1.5129672327953871e-05,2,0.9995861852433636
34 | 32,False,0.0,-1.0,30000,0.6992660103232805,0.0453870875743961,1,0.9995416645930334
35 | 33,False,0.0,-0.01,30000,0.1653453980074516,0.0707238056454616,1,0.9997029239321734
36 | 34,False,0.0,-0.01,30000,1.027998697712554,0.122835059742065,1,0.9996781856116186
37 | 35,True,0.71,0.42,5081,0.9290491174416302,0.008324696975339,1,0.9997782891020208
38 | 36,False,0.0,-0.0693999999999998,30000,0.4197324795305008,0.0001720748361111,2,0.9998019298510372
39 | 37,False,0.0,-0.0495999999999998,30000,0.3733165908582321,0.1212374949291871,2,0.9998862590476916
40 | 38,False,0.0,-0.6039999999999998,30000,0.1573622973702977,0.0035960750603891,1,0.999765686982886
41 | 39,False,0.0,-1.0,30000,0.2473455049724662,0.0378887863351928,2,0.99956675528091
42 | 40,False,0.0,-1.0,30000,0.1764780350453074,0.0001031075100616,2,0.9997205847528896
43 | 41,False,0.0,-0.0396999999999999,30000,0.4072720256706442,0.0070840569297821,1,0.9997780684078488
44 | 42,False,0.0,-0.0594999999999999,30000,1.7268542051153504,0.0048827754968105,2,0.9998067154411232
45 | 43,False,0.0,-0.01,30000,0.8495164604325529,0.0053360962329389,2,0.9996787880072384
46 | 44,False,0.0,-0.8713000000000001,30000,0.8646900133433162,1.4006101524620536e-05,1,0.9996289920450506
47 | 45,False,0.0,-0.01,30000,0.3793606189997634,0.0054999946824404,2,0.9997672713132628
48 | 46,False,0.02,-0.96,30000,0.5929944913180015,0.0003991362367752,2,0.9998398280483788
49 | 47,False,0.0,-0.0495999999999998,30000,0.1363438069180158,0.253873180398043,1,0.9998645676206388
50 | 48,False,0.0,-0.0297999999999999,30000,1.7455507839170006,0.0001459711892257,1,0.9998564577565828
51 | 49,False,0.0,-0.01,30000,0.1764179871049245,0.0025394200819503,1,0.9998398574939236
52 | 50,False,0.0,-0.01,30000,0.2872822760506955,0.1146244473995302,2,0.9995671909060307
53 | 51,False,0.03,-0.94,30000,0.1529637490250278,0.0044881029490943,2,0.9995889919222896
54 | 52,False,0.0,-0.01,30000,0.8781078384803666,0.0015079445240741,1,0.9995379728085784
55 | 53,False,0.0,-1.0,30000,0.2435036737900988,0.0434386006754544,1,0.9996608279719646
56 | 54,False,0.0,-0.01,30000,1.7624853324164442,0.0001049000522652,1,0.9997320302874484
57 | 55,False,0.0,-0.01,30000,0.1029824728427666,0.0645324560288929,1,0.9996948118334364
58 | 56,False,0.0,-0.01,30000,0.8249080134344877,0.000173328586113,1,0.999707189826582
59 | 57,False,0.0,-0.2476000000000003,30000,0.114258424353069,0.1783563851894506,1,0.9998624876682436
60 | 58,False,0.0,-0.01,30000,1.76851907345061,0.1733522664878826,2,0.9995974556033712
61 | 59,False,0.0,-0.0693999999999998,30000,0.4136452613713096,0.0882459180925758,1,0.9998796528202928
62 | 60,False,0.0,-0.3070000000000004,30000,1.6984180942346725,0.0013904745347936,2,0.999917056000489
63 | 61,False,0.0,-0.2080000000000002,30000,0.3581003809357341,0.048697114879218,1,0.999910488465492
64 | 62,False,0.0,-0.3961,30000,0.934022277197529,1.6417061742262208e-05,1,0.9996838403225784
65 | 63,False,0.0,-0.01,30000,0.8728713457173364,0.0905910978154328,2,0.9996910151888726
66 | 64,False,0.0,-0.01,30000,0.1916298481672728,0.0008895571978989,1,0.9997006918247698
67 | 65,False,0.0,-1.0,30000,0.4632149915967479,1.763914837333703e-05,2,0.9996287493268948
68 | 66,False,0.41,-0.18,30000,0.2677309847165141,3.583438994283214e-05,1,0.9996196260689872
69 | 67,False,0.0,-1.0,30000,0.8688559293028582,0.0139797099016574,1,0.9999844191947254
70 | 68,False,0.15,-0.7,30000,0.1931046440223698,0.0004632822261573,1,0.9999008309179324
71 | 69,False,0.16,-0.68,30000,0.1563756248922679,6.048453533896794e-05,1,0.999633302277556
72 | 70,False,0.0,-1.0,30000,0.348985690523884,9.973450560320763e-05,1,0.9995430566861742
73 | 71,False,0.0,-1.0,30000,0.1056654675880671,0.0027974320434799,2,0.9999176548313498
74 | 72,False,0.0,-1.0,30000,0.3074087589238471,0.0155020707632973,2,0.9999842516561064
75 | 73,False,0.03,-0.94,30000,0.1546564267346814,9.93342422253282e-05,2,0.9999203463049876
76 | 74,False,0.28,-0.44,30000,0.3162738638752472,0.0004800338315775,1,0.9999277422458096
77 | 75,False,0.02,-0.96,30000,0.2828673746129952,9.048801037284429e-05,2,0.9997972993675318
78 | 76,False,0.0,-0.6039999999999996,30000,1.5915177412269603,0.0778440917382973,1,0.9999446009038824
79 | 77,False,0.0,-1.0,30000,0.1242618394065656,0.00063302531668,2,0.9995621965349032
80 | 78,False,0.0,-0.01,30000,0.4608206303735836,0.2295263739992638,2,0.9997569438594714
81 | 79,False,0.0,-1.0,30000,0.2749294471106079,0.0016065331863113,2,0.9997794731736648
82 | 80,False,0.0,-0.01,30000,0.6212225918930153,0.1130091047209987,2,0.9995550533248264
83 | 81,False,0.0,-0.01,30000,0.3602010991674411,0.0851414730181986,2,0.999574528562404
84 | 82,False,0.0,-1.0,30000,0.6421513501812322,0.0001621933069186,1,0.9995803002618714
85 | 83,False,0.4,-0.2,30000,0.2850173581529486,0.0010002090231379,1,0.9997445099676456
86 | 84,False,0.08,-0.84,30000,0.100983557622482,0.0001067076275481,2,0.99953552724926
87 | 85,False,0.02,-0.96,30000,0.2435556513266287,0.002661728342529,1,0.999758479138454
88 | 86,False,0.0,-0.7525000000000002,30000,1.5030567251135971,0.1990498973597343,1,0.9999516952647388
89 | 87,False,0.0,-0.01,30000,0.1732110622001226,0.0037424042043013,2,0.9995978807984024
90 | 88,False,0.0,-0.9208000000000002,30000,0.2210277762652791,0.0283417056156158,2,0.9999631431740428
91 | 89,False,0.0,-0.01,30000,1.2363435124406372,0.0015610712409781,2,0.9995204402150571
92 | 90,False,0.0,-1.0,30000,0.4271359114067996,0.0151022327985865,1,0.999636335719432
93 | 91,False,0.0,-0.01,30000,0.7583286273947663,0.1222865247561305,2,0.9998012814765712
94 | 92,False,0.0,-0.01,30000,1.3652593949268037,0.1254053091325861,1,0.9997548352547312
95 | 93,False,0.0,-0.01,30000,1.246076004080192,0.1230026378197825,2,0.9995093891848044
96 | 94,False,0.0,-0.4555,30000,0.1403656232842004,0.0086860096010341,1,0.9997717049183344
97 | 95,False,0.0,-0.01,30000,1.5345245808230117,0.0015850033704645,2,0.9996743577736196
98 | 96,False,0.0,-1.0,30000,0.3688726659296873,0.0017085093430574,1,0.9995032685069192
99 | 97,False,0.0,-1.0,30000,0.2927250366610833,1.7712898673717144e-05,1,0.9997372345427918
100 | 98,False,0.0,-0.4555,30000,0.9641072950932572,0.0652305759095847,1,0.9998942222237154
101 | 99,False,0.0,-0.0693999999999998,30000,0.827293829484058,4.821232610393972e-05,1,0.999838770674412
102 | 100,False,0.0,-0.0990999999999998,30000,1.8489630306643132,0.0006665007713949,2,0.9998132139934904
103 | 101,False,0.0,-1.0,30000,0.8131687154441953,0.0015490470772516,2,0.9998404814641412
104 | 102,False,0.01,-0.98,30000,0.4175501508386886,0.0037450660080778,1,0.9999853290628032
105 | 103,False,0.01,-0.8513,30000,0.1224406285594123,3.23995734435952e-05,1,0.99989190354388
106 | 104,False,0.0,-1.0,30000,0.1989093004579079,0.0030819439728224,2,0.9999238417961404
107 | 105,False,0.03,-0.8212000000000002,30000,0.2698351090053247,0.0093102520489009,2,0.9997683109625122
108 | 106,False,0.0,-0.7822000000000001,30000,0.7679231502364715,0.0087134494030332,2,0.9999353081836212
109 | 107,False,0.0,-0.01,30000,0.5357687861652993,0.2494288305441379,1,0.9997191378956938
110 | 108,False,0.0,-1.0,30000,0.4216947617566411,0.0001468907882574,1,0.9997246670100448
111 | 109,False,0.11,-0.78,30000,0.253585884640042,2.815797771621376e-05,1,0.999790471858894
112 | 110,False,0.0,-0.1882000000000001,30000,0.2742842856642405,0.0054770047753832,1,0.9998626205440888
113 | 111,False,0.01,-0.98,30000,0.5011013684041216,0.0090731514832493,2,0.9997913341343893
114 | 112,False,0.01,-0.98,30000,0.1410562465371877,1.6969948901266786e-05,1,0.999972028450545
115 | 113,False,0.0,-0.01,30000,0.2390276056834814,0.3785721790510388,2,0.999704442437822
116 | 114,False,0.0,-0.0297999999999999,30000,1.1767288593959575,0.0066718731062661,2,0.9998579557261104
117 | 115,False,0.0,-0.01,30000,1.049595836497032,1.641136402320201e-05,1,0.9995394434056774
118 | 116,False,0.0,-0.0199,30000,0.1528049548842212,0.0002432643624448,1,0.999666318898786
119 | 117,False,0.0,-1.0,30000,1.004942815700538,0.0001036668288347,1,0.999723273467624
120 | 118,False,0.06,-0.88,30000,0.1113152183052778,0.0001489935272995,1,0.9996457099944768
121 | 119,False,0.0,-0.01,30000,0.1236944359712828,0.0533697887578857,2,0.9996897949919288
122 | 120,False,0.0,-1.0,30000,0.2542219582858407,5.9475469355802686e-05,2,0.9999738288631536
123 | 121,False,0.0,-1.0,30000,0.4812530835598186,0.0095977348930231,2,0.999763361263128
124 | 122,False,0.0,-0.7326999999999999,30000,0.1555210030751397,0.0022717248366247,1,0.9996506750393652
125 | 123,False,0.01,-0.9602,30000,0.3814656542401731,0.0001539739756366,2,0.9998189959803054
126 | 124,False,0.0,-0.1188999999999998,30000,0.6086110436806602,0.0017970177993131,1,0.9998442728255352
127 | 125,False,0.0,-0.01,30000,1.1974470449356451,1.806034163988675e-05,2,0.9996287681454716
128 | 126,False,0.04,-0.92,30000,0.1998974154718145,0.000625094829695,2,0.9995303782986686
129 | 127,False,0.11,-0.78,30000,0.2415105677606703,0.0144280109251387,2,0.9997104788474996
130 | 128,False,0.0,-0.01,30000,1.3706071726274904,0.0003946668555946,2,0.9995924294159184
131 | 129,False,0.0,-0.01,30000,1.7255441606182815,0.000864726021344,2,0.9996879766505772
132 | 130,False,0.0,-0.01,30000,1.3853807348232268,0.0155724131678131,1,0.9995470768132348
133 | 131,False,0.0,-0.01,30000,1.983272393937628,3.0549862970746466e-05,2,0.9997320274701303
134 | 132,False,0.0,-0.01,30000,0.2082208474114526,0.1139820713316224,2,0.9997805821600446
135 | 133,False,0.0,-0.01,30000,0.9261263514700856,0.0130307419924446,1,0.9996430823206288
136 | 134,False,0.0,-0.2674000000000002,30000,0.5132629697454023,0.0003982877290684,2,0.9998700352439036
137 | 135,False,0.0,-0.9109,15000,0.5444327848035303,1.8578367335631687e-05,2,0.9999266494882394
138 | 136,False,0.01,-0.98,30000,0.3487198566880853,0.002602537112272,1,0.9997628222091388
139 | 137,False,0.0,-1.0,30000,0.695916596406484,0.0548471321697416,2,0.9996909205314796
140 | 138,False,0.05,-0.9,30000,0.3631532553272026,5.12493179570414e-05,1,0.9997638600282388
141 | 139,False,0.36,-0.2602,30000,0.6411061541320067,1.331602334783476e-05,1,0.9996297518563424
142 | 140,False,0.03,-0.94,30000,0.178833972659598,0.0018197303708018,1,0.9999411358269298
143 | 141,False,0.0,-0.9802,15000,0.2855286451736493,0.0010119331361557,1,0.999964704489208
144 | 142,False,0.0,-1.0,15000,0.2228139045304387,0.0003225958253618,2,0.999571484968628
145 | 143,False,0.0,-0.4951,15000,0.6505560909197262,0.0544989023448025,1,0.9998728620721252
146 | 144,False,0.05,-0.9,30000,0.1176609532497377,1.9708523079728263e-05,2,0.9997699760339828
147 | 145,False,0.0,-0.7326999999999998,15000,0.2364637942701128,0.0017079191690217,2,0.9995315884365752
148 | 146,False,0.01,-0.98,15000,0.1228124108551648,0.0026631816126636,2,0.999565769986434
149 | 147,False,0.09,-0.82,30000,0.1563081523121499,0.0180635340699864,2,0.999750519585998
150 | 148,False,0.0,-0.9901,15000,1.7890956973019545,0.0002644387570891,1,0.999956651391072
151 | 149,False,0.0,-1.0,15000,1.8453549162305485,0.155827858415406,1,0.9999884512123388
152 |
--------------------------------------------------------------------------------
/results/auto_hp_tuning/results_scatter_plot.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/auto_hp_tuning/results_scatter_plot.gif
--------------------------------------------------------------------------------
/results/auto_hp_tuning/results_triangular_surface.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/auto_hp_tuning/results_triangular_surface.gif
--------------------------------------------------------------------------------
/results/classical_DQL/entropies.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL/entropies.jpg
--------------------------------------------------------------------------------
/results/classical_DQL/hyperparameters.txt:
--------------------------------------------------------------------------------
1 | gamma;0.8
2 | epochs;500
3 | max_steps;60
4 | learning_rate;0.002
5 | random_chance;0.99
6 | random_scaling;0.99
7 | window;40
8 | target_win_ratio;0.98
9 | min_steps_num;6
10 |
--------------------------------------------------------------------------------
/results/classical_DQL/trained_strategy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL/trained_strategy.jpg
--------------------------------------------------------------------------------
/results/classical_DQL/training_history.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL/training_history.jpg
--------------------------------------------------------------------------------
/results/classical_DQL/training_history_moving_average.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL/training_history_moving_average.jpg
--------------------------------------------------------------------------------
/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/entropies.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/entropies.jpg
--------------------------------------------------------------------------------
/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/hyperparameters.txt:
--------------------------------------------------------------------------------
1 | gamma;0.9
2 | epochs;30000
3 | max_steps;60
4 | learning_rate;0.0002
5 | non_random_chance;0.99
6 | random_scaling;0.9998
7 | window;40
8 | target_win_ratio;0.98
9 | min_steps_num;6
10 | n_hidden_layers;1
11 | activation_function;sigmoid
12 | global_seed;123456
13 |
--------------------------------------------------------------------------------
/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/trained_strategy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/trained_strategy.jpg
--------------------------------------------------------------------------------
/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/training_history.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/training_history.jpg
--------------------------------------------------------------------------------
/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/training_history_moving_average.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_DQL_sim_quantum/_BEST_1_layers_sigmoid_activation_longer/training_history_moving_average.jpg
--------------------------------------------------------------------------------
/results/classical_QL/hyperparameters.txt:
--------------------------------------------------------------------------------
1 | gamma;0.05
2 | epochs;200
3 | max_steps;400
4 | learning_rate;0.001
5 | random_chance;0.99
6 | random_scaling;0.95
7 | window;40
8 | target_win_ratio;0.98
9 | min_steps_num;6
10 |
--------------------------------------------------------------------------------
/results/classical_QL/trained_strategy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_QL/trained_strategy.jpg
--------------------------------------------------------------------------------
/results/classical_QL/training_history.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_QL/training_history.jpg
--------------------------------------------------------------------------------
/results/classical_QL/training_history_moving_average.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/classical_QL/training_history_moving_average.jpg
--------------------------------------------------------------------------------
/results/quantum/Quantum_DQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp_REWARD_NO20220426183607.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/quantum/Quantum_DQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp_REWARD_NO20220426183607.png
--------------------------------------------------------------------------------
/results/quantum/earlystopping_Quantum_DQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp_REWARD_NO20220425223058.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/quantum/earlystopping_Quantum_DQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp_REWARD_NO20220425223058.png
--------------------------------------------------------------------------------
/results/quantum/earlystopping_entropies.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/quantum/earlystopping_entropies.png
--------------------------------------------------------------------------------
/results/quantum/entropies.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qubit-science-club/quantum_reinforcement_learning/31f2daf7a2ce8493b8fd9fe85aaea0ebd754448f/results/quantum/entropies.png
--------------------------------------------------------------------------------
/scripts/1._Classical_QL.py:
--------------------------------------------------------------------------------
1 | import gym
2 | from gym.envs.registration import register
3 | import torch
4 |
5 | from tqdm import tqdm
6 | import numpy as np
7 | from src.visualizations import *
8 | import os
9 |
10 | # Parameters
11 | gamma = 0.05
12 | epochs = 200
13 | max_steps = 400
14 | learning_rate = 0.001
15 | random_chance = 0.99
16 | random_scaling = 0.95
17 | window = 40
18 | target_win_ratio = 0.98
19 | min_steps_num = 6
20 | global_seed = 42
21 |
22 | np.random.seed(global_seed)
23 | torch.manual_seed(global_seed)
24 |
25 | # register(
26 | # id='FrozenLake-v1',
27 | # entry_point='gym.envs.toy_text:FrozenLakeEnv',
28 | # kwargs={'map_name' : '4x4',
29 | # 'is_slippery': False})
30 | env = gym.make('FrozenLake-v1', is_slippery=False)
31 | #print(env.render(mode='ansi'))
32 |
33 |
34 | # Initialize Q
35 | number_of_states = env.observation_space.n
36 | number_of_actions = env.action_space.n
37 |
38 | print('number_of_states:', number_of_states, '\nnumber_of_actions:', number_of_actions)
39 |
40 | # At first Q is a zero tensor with action and observation space
41 | Q = torch.zeros([number_of_states, number_of_actions])
42 |
43 | steps_total = []
44 | rewards_total = []
45 | win_history = []
46 | random_params = []
47 | epoch_random_chance = random_chance
48 | for i_episode in tqdm(range(epochs)):
49 | state = env.reset()
50 | reward_all = 0
51 | epoch_random_chance*=random_scaling
52 |
53 | for step in range(max_steps):
54 | # action
55 | if torch.rand(1) < epoch_random_chance:
56 | Q_state = torch.rand(number_of_actions)
57 | else:
58 | Q_state = Q[state]
59 |
60 | action = torch.argmax(Q_state)
61 |
62 | #Take the best action
63 | new_state, reward, done, info = env.step(action.item())
64 | if reward==0:
65 | if done==True:
66 | reward=-1
67 | # else:
68 | # reward=-0.01
69 |
70 | #Update Q and state
71 | Q[state,action] = Q[state,action]+learning_rate*(reward + gamma * torch.max(Q[new_state])-Q[state,action])
72 | state = new_state
73 | reward_all += reward
74 |
75 | #env.render()
76 | if done or step==max_steps-1:
77 | steps_total.append(step+1)
78 | rewards_total.append(reward_all)
79 | win_history.append(1 if reward==1. else 0)
80 | random_params.append(epoch_random_chance)
81 | break
82 |
83 | if sum(win_history[-window:])/window>=target_win_ratio:
84 | break
85 |
86 | results_path = "../results/classical_QL"
87 |
88 | strategy = np.array([torch.argmax(Q_state).item() for Q_state in Q]).reshape((4,4))
89 | holes_indexes = np.array([5,7,11,12])
90 | strategy_save_path = os.path.join(results_path, "trained_strategy.jpg")
91 |
92 | plot_strategy(strategy, holes_indexes, strategy_save_path)
93 |
94 |
95 | moving_average_history_save_path = os.path.join(results_path, "training_history_moving_average.jpg")
96 | plot_rolling_window_history(steps_total, rewards_total, win_history, random_params, target_win_ratio, min_steps_num, moving_average_history_save_path, window=window)
97 | history_save_path = os.path.join(results_path, "training_history.jpg")
98 | plot_history(steps_total, rewards_total, win_history, random_params, target_win_ratio, min_steps_num, history_save_path)
99 |
100 |
101 | with open(os.path.join(results_path, "hyperparameters.txt"), "w+") as f:
102 | f.write(f'gamma;{gamma}\n')
103 | f.write(f'epochs;{epochs}\n')
104 | f.write(f'max_steps;{max_steps}\n')
105 | f.write(f'learning_rate;{learning_rate}\n')
106 | f.write(f'random_chance;{random_chance}\n')
107 | f.write(f'random_scaling;{random_scaling}\n')
108 | f.write(f'window;{window}\n')
109 | f.write(f'target_win_ratio;{target_win_ratio}\n')
110 | f.write(f'min_steps_num;{min_steps_num}\n')
111 |
--------------------------------------------------------------------------------
/scripts/2._Classical_DQL.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 | from torch import linalg as LA
4 |
5 | import numpy as np
6 | import pandas as pd
7 | import matplotlib.pyplot as plt
8 | import os
9 | from tqdm import tqdm
10 |
11 | from src.entropies import entanglement_entropy, classical_entropy
12 | from src.visualizations import *
13 |
14 | os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
15 |
16 | gamma = 0.8
17 | epochs = 500
18 | max_steps = 60
19 | learning_rate = 0.002
20 | random_chance = 0.99
21 | random_scaling = 0.99
22 | window = 40
23 | target_win_ratio = 0.98
24 | min_steps_num = 6
25 | global_seed = 42
26 |
27 | np.random.seed(global_seed)
28 | torch.manual_seed(global_seed)
29 |
30 | class Agent(torch.nn.Module):
31 | def __init__(self, location_space_size, action_space_size, hidden_layer_size):
32 | super(Agent, self).__init__()
33 | self.location_space_size = location_space_size
34 | self.action_space_size = action_space_size
35 |
36 | self.l1 = torch.nn.Linear(in_features=location_space_size, out_features=hidden_layer_size)
37 | self.l2 = torch.nn.Linear(in_features=hidden_layer_size, out_features=action_space_size) #action_space_size
38 | self.l1.weight.data.uniform_()
39 | self.l1.bias.data.fill_(-0.02)
40 | self.l2.weight.data.uniform_()
41 | self.l2.bias.data.fill_(-0.02)
42 |
43 | print("Set the neural network with \
44 | \n\tInput size: \t{inp}, \
45 | \n\tHidden layer size: \t{hidden} \
46 | \n\tOutput size: \t{outp}"\
47 | .format(inp=self.location_space_size, hidden=hidden_layer_size, outp=self.action_space_size))
48 |
49 | def forward(self, state):
50 | state_one_hot = torch.zeros(self.location_space_size)
51 | state_one_hot.scatter_(0,torch.tensor([int(state)]), 1.)
52 | out1 = torch.sigmoid(self.l1(state_one_hot))
53 | return self.l2(out1).view((-1)) # 1 x ACTION_SPACE_SIZE == 1 x 4 => 4
54 |
55 | class Trainer:
56 | def __init__(self):
57 | self.action_space_size = 4
58 | self.location_space_size = 16
59 | self.holes = 2
60 | self.agent = Agent(self.location_space_size, self.action_space_size, 16)
61 | self.optimizer = torch.optim.Adam(params=self.agent.parameters(), lr=learning_rate)
62 | self.location = 0
63 | temp_lake = torch.zeros(self.location_space_size)
64 |
65 | holes_indexes = np.random.randint(1, self.location_space_size-1, (self.holes,))
66 | while np.unique(holes_indexes).size<self.holes:
104 | if self.location>11:
105 | return self.location, self.lake[self.location]
106 | else:
107 | return self.location+4, self.lake[self.location+4]
108 | if step==3:
109 | if (self.location%4)==0:
110 | return self.location, self.lake[self.location]
111 | else:
112 | return self.location-1, self.lake[self.location-1]
113 |
114 | def choose_action(self):
115 | if np.random.rand(1) > (self.epsilon):
116 | action = torch.argmax(self.agent(self.location)) # pick the action with the highest Q value
117 | else:
118 | action = torch.tensor(np.random.randint(0, 4))
119 | return action
120 |
121 | def Qtable(self):
122 | return torch.stack([self.agent(i) for i in range(self.location_space_size)], dim=0)
123 |
124 | def Qstrategy(self):
125 | return [torch.argmax(self.agent(i)).item() for i in range(self.location_space_size)]
126 |
127 | def train(self, epochs):
128 | for x in (pbar := tqdm(range(epochs))):
129 | pbar.set_description(f'Success rate: {sum(self.success[-window:])/window:.2%} | Random chance: {self.epsilon:.2%}')
130 | j=0
131 | self.location = 0
132 | while j<max_steps:
165 | if x>100:
166 | if sum(self.success[-window:])/window>target_win_ratio:
167 | print("Network trained before epoch limit on {x} epoch".format(x=x))
168 | break
169 |
170 |
171 | if __name__ == "__main__":
172 | fl = Trainer()
173 | print("Setting deep Q-learning in FrozenLake environment",\
174 | "\nFrozenlake:")
175 | #print(fl.render())
176 |
177 | print("Train through {epochs} epochs". format(epochs=epochs))
178 | fl.train(epochs)
179 |
180 | plot_success_steps_history(fl.jInEpoch, fl.success)
181 |
182 | results_path = "../results/classical_DQL"
183 |
184 | strategy = np.array(fl.Qstrategy()).reshape((4,4))
185 | strategy_save_path = os.path.join(results_path, "trained_strategy.jpg")
186 | strategy_angles = ((strategy+3)%4)*90
187 | plot_strategy(strategy, fl.holes_indexes, strategy_save_path, custom_angles=strategy_angles)
188 |
189 | entropies = np.array(fl.entropies)
190 | cl_entropies = np.array(fl.cl_entropies)
191 | entropies_save_path = os.path.join(results_path, "entropies.jpg")
192 | plot_entropies(entropies, cl_entropies, entropies_save_path)
193 |
194 | moving_average_history_save_path = os.path.join(results_path, "training_history_moving_average.jpg")
195 | plot_rolling_window_history(fl.jInEpoch, fl.reward_list, fl.success, fl.epsilon_list, target_win_ratio, min_steps_num, moving_average_history_save_path, window=window)
196 | history_save_path = os.path.join(results_path, "training_history.jpg")
197 | plot_history(fl.jInEpoch, fl.reward_list, fl.success, fl.epsilon_list, target_win_ratio, min_steps_num, history_save_path)
198 |
199 |
200 | with open(os.path.join(results_path, "hyperparameters.txt"), "w+") as f:
201 | f.write(f'gamma;{gamma}\n')
202 | f.write(f'epochs;{epochs}\n')
203 | f.write(f'max_steps;{max_steps}\n')
204 | f.write(f'learning_rate;{learning_rate}\n')
205 | f.write(f'random_chance;{random_chance}\n')
206 | f.write(f'random_scaling;{random_scaling}\n')
207 | f.write(f'window;{window}\n')
208 | f.write(f'target_win_ratio;{target_win_ratio}\n')
209 | f.write(f'min_steps_num;{min_steps_num}\n')
210 |
211 | with open(os.path.join(results_path, "entropies.txt"), "w") as f:
212 | for ent in fl.entropies:
213 | f.write(str(ent)+";")
214 |
215 | with open(os.path.join(results_path, "cl_entropies.txt"), "w") as f:
216 | for ent in fl.cl_entropies:
217 | f.write(str(ent)+";")
--------------------------------------------------------------------------------
/scripts/3._Classical_DQL_sim_quant.py:
--------------------------------------------------------------------------------
1 | import gym
2 | import torch
3 | from torch import nn
4 | from torch.autograd import Variable
5 | from torch import optim
6 | from torch.nn import functional as F
7 | from torch import linalg as LA
8 |
9 | import numpy as np
10 | from tqdm import tqdm
11 |
12 | from src.entropies import entanglement_entropy, classical_entropy
13 | from src.visualizations import *
14 |
15 | import os
16 | os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
17 |
18 |
19 | global_seed = 123456
20 |
21 | torch.manual_seed(global_seed)
22 | np.random.seed(global_seed)
23 |
24 |
25 | gamma = 0.9
26 | epochs = 30000
27 | max_steps = 60
28 | learning_rate = 0.0002
29 | non_random_chance = 0.99
30 | random_scaling = 0.9998
31 | window = 40
32 | target_win_ratio = 0.98
33 | min_steps_num = 6
34 | activation_function = 'sigmoid'
35 | n_hidden_layers = 1
36 | results_folder = f'_BEST_{n_hidden_layers}_layers_{activation_function}_activation_longer'
37 | results_path = os.path.join('../results', 'classical_DQL_sim_quantum', results_folder)
38 |
39 | if not os.path.exists(results_path):
40 | os.mkdir(results_path)
41 |
42 |
43 | # ## 1. Create a FrozenLake environment
44 | lake = gym.make('FrozenLake-v1', is_slippery=False)
45 | lake.reset()
46 | print(lake.render(mode='ansi'))
47 |
48 |
49 | # ## 2. Define the one_hot encoding function and a uniform initializer for linear layers
50 | def one_hot(ids, nb_digits):
51 | """
52 | ids: (list, ndarray) shape:[batch_size]
53 | """
54 | if not isinstance(ids, (list, np.ndarray)):
55 | raise ValueError("ids must be 1-D list or array")
56 | batch_size = len(ids)
57 | ids = torch.LongTensor(ids).view(batch_size, 1)
58 | out_tensor = Variable(torch.FloatTensor(batch_size, nb_digits))
59 | out_tensor.data.zero_()
60 | out_tensor.data.scatter_(dim=1, index=ids, value=1.)
61 | return out_tensor
62 |
63 | def uniform_linear_layer(linear_layer):
64 | linear_layer.weight.data.uniform_()
65 | linear_layer.bias.data.fill_(-0.02)
66 |
67 |
68 | # ## 3. Define Agent model, basically for Q values
69 | class Agent(nn.Module):
70 | def __init__(self, observation_space_size, action_space_size, n_hidden_layers):
71 | super(Agent, self).__init__()
72 | self.observation_space_size = observation_space_size
73 | self.hidden_size = 2*self.observation_space_size
74 |
75 | self.l1 = nn.Linear(in_features=2*self.observation_space_size, out_features=self.hidden_size)
76 | self.hidden_layers = nn.ModuleList([
77 | nn.Linear(in_features=self.hidden_size, out_features=self.hidden_size)
78 | for i in range(n_hidden_layers)
79 | ])  # nn.ModuleList so the hidden layers' parameters are registered with the optimizer
80 | self.l2 = nn.Linear(in_features=self.hidden_size, out_features=32)
81 | self.activation = None
82 | if activation_function=='lrelu':
83 | self.activation = F.leaky_relu
84 | if activation_function=='sigmoid':
85 | self.activation = F.sigmoid
86 | if activation_function=='tanh':
87 | self.activation = F.tanh
88 |
89 | uniform_linear_layer(self.l1)
90 | for l in self.hidden_layers:
91 | uniform_linear_layer(l)
92 |
93 | uniform_linear_layer(self.l2)
94 |
95 | print('Set the neural network with:')
96 | print(f'\tInput size: \t{2*self.observation_space_size}')
97 | for i, l in enumerate(range(n_hidden_layers)):
98 | print(f'\tHidden {i+1}. layer size: \t{self.hidden_size}')
99 | print(f'\tOutput size: \t{32}')
100 |
101 | def forward(self, state):
102 | obs_emb = one_hot([int(2*state)], 2*self.observation_space_size)
103 | # first layer:
104 | out1 = self.activation(self.l1(obs_emb))
105 |
106 | # hidden layers:
107 | for l in self.hidden_layers:
108 | out1 = self.activation(l(out1))
109 |
110 | # output layers:
111 | out2 = self.activation(self.l2(out1))
112 |
113 | return out2.view((-1))
114 |
115 |
116 | # ## 4. Define the Trainer to optimize Agent model
117 | class Trainer:
118 | def __init__(self, n_hidden_layers):
119 | self.holes_indexes = np.array([5,7,11,12])
120 |
121 | self.agent = Agent(lake.observation_space.n, lake.action_space.n, n_hidden_layers)
122 | self.optimizer = optim.Adam(params=self.agent.parameters(), lr=learning_rate)
123 |
124 | self.epsilon = non_random_chance
125 | self.epsilon_growth_rate = random_scaling
126 | self.gamma = gamma
127 |
128 | self.epsilon_list = []
129 | self.success = []
130 | self.jList = []
131 | self.reward_list = []
132 |
133 | self.compute_entropy = True
134 | self.entropies = []
135 | self.cl_entropies = []
136 | self.entropies_episodes = [0]
137 |
138 | self.print = False
139 |
140 |
141 | def train(self, epoch):
142 | # entropies_episodes = [0] * (epoch+1)
143 | for i in (pbar := tqdm(range(epoch))):
144 | pbar.set_description(f'Success rate: {sum(self.success[-window:])/window:.2%} | Random chance: {self.epsilon:.2%}')
145 |
146 | s = lake.reset() # lake state (0-15); reset() returns state 0
147 | j = 0
148 | self.entropies_episodes.append(0)
149 | while j < max_steps:
150 | j += 1
151 | # perform chosen action
152 | a = self.choose_action(s)
153 | s1, r, d, _ = lake.step(int(a))
154 | if d == True and r == 0: r = -1
155 | elif d == True: r = 1
156 | elif r==0: r = -0.01
157 |
158 | # if self.print==False:
159 | # print(self.agent(s)[a])
160 | # self.print=True
161 |
162 | # calculate target and loss
163 | target_q = r + self.gamma * torch.max(self.calc_probabilities(s1).detach())
164 |
165 | loss = F.smooth_l1_loss(self.calc_probability(s, a), target_q)
166 | # update model to optimize Q
167 | self.optimizer.zero_grad()
168 | loss.backward()
169 | self.optimizer.step()
170 |
171 | # update state
172 | s = s1
173 | if(self.compute_entropy):
174 | with torch.inference_mode():
175 | self.entropies.append(entanglement_entropy(self.calc_statevector(s)))
176 | self.cl_entropies.append(classical_entropy(self.calc_statevector(s)))
177 | self.entropies_episodes[i] += 1
178 |
179 | if d == True: break
180 |
181 | # append results onto report lists
182 | if d == True and r > 0:
183 | self.success.append(1)
184 | else:
185 | self.success.append(0)
186 |
187 | self.reward_list.append(r)
188 | self.jList.append(j)
189 |
190 | if self.epsilon < 1.:
191 | self.epsilon *= self.epsilon_growth_rate
192 | self.epsilon_list.append(self.epsilon)
193 |
194 | if i>100:
195 | if sum(self.success[-window:])/window>target_win_ratio:
196 | print("Network trained before epoch limit on {i} epoch".format(i=i))
197 | break
198 |
199 | #print("last 100 epoches success rate: " + str(sum(self.success[-100:])/100) + "%")
200 |
201 | def choose_action(self, s):
202 | self.calc_probabilities(s)
203 | if np.random.rand(1) > self.epsilon :
204 | action = torch.argmax(self.calc_probabilities(s)) # pick the action with the highest probability
205 | else:
206 | action = torch.tensor(np.random.randint(0, 4))
207 | return action
208 |
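# The 32 real outputs of the network are read as 16 complex amplitudes of a 4-qubit
# statevector: even indices give the real parts, odd indices the imaginary parts.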
209 | def calc_statevector(self, s):
210 | return torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
211 |
212 | def calc_probability(self, s, a): # occupation probability of a qubit (0-3) for a given board state (0-15)
213 | statevector = torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
214 | probabilities = (statevector.abs()**2)
215 | probabilities = probabilities/probabilities.sum() # normalization
216 | prob_indexes = [
217 | [0,1,2,3,4,5,6,7],
218 | [0,1,2,3,8,9,10,11],
219 | [0,1,4,5,8,9,12,13],
220 | [0,2,4,6,8,10,12,14]
221 | ]
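# Each row k of prob_indexes lists the 8 basis-state indices (out of 16) whose k-th bit,
# counting from the most significant, is 0; the sum below is the total probability of
# those basis states.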
222 | return probabilities[prob_indexes[a]].sum()
223 |
224 | def calc_probabilities(self, s): # probabilities for each of the qubits for a given board state (0-15), returned as a tensor of shape (4,)
225 | raw_wavefunction = torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
226 | probabilities = (raw_wavefunction.abs()**2)
227 | probabilities = probabilities/probabilities.sum() # normalization
228 | probs_of_qubits = torch.tensor([
229 | probabilities[[0,1,2,3,4,5,6,7]].sum(),
230 | probabilities[[0,1,2,3,8,9,10,11]].sum(),
231 | probabilities[[0,1,4,5,8,9,12,13]].sum(),
232 | probabilities[[0,2,4,6,8,10,12,14]].sum()
233 | ])
234 | return probs_of_qubits
235 |
236 |
237 | def Q(self):
238 | Q = []
239 | for x in range(lake.observation_space.n):
240 | Qstate = self.agent(x).detach()
241 | Qstate /= LA.norm(Qstate)
242 | Q.append(Qstate)
243 | Q_out = torch.Tensor(lake.observation_space.n, lake.action_space.n)
244 | torch.cat(Q, out=Q_out)
245 | return Q_out
246 |
247 | def Qstate(self, state):
248 | Qstate = self.agent(state).detach()
249 | Qstate /= LA.norm(Qstate)
250 | return Qstate
251 |
252 | def Qstrategy(self):
253 | return [torch.argmax(self.calc_probabilities(state)).item() for state in range(lake.observation_space.n)]
254 |
255 |
256 | # ## 5. Initialize a trainer and train for the configured number of epochs
257 | fl = Trainer(n_hidden_layers)
258 |
259 | t = torch.Tensor(np.array([32], dtype=int))
260 | t = t.to(torch.int32)
261 |
262 | print("Train through {epochs} epochs". format(epochs=epochs))
263 | fl.train(epochs)
264 |
265 | plot_success_steps_history(fl.jList, fl.success)
266 |
267 | strategy = np.array(fl.Qstrategy()).reshape((4,4))
268 | strategy_save_path = os.path.join(results_path, "trained_strategy.jpg")
269 | plot_strategy(strategy, fl.holes_indexes, strategy_save_path)
270 |
271 | entropies = np.array(fl.entropies)
272 | cl_entropies = np.array(fl.cl_entropies)
273 | entropies_save_path = os.path.join(results_path, "entropies.jpg")
274 | plot_entropies(entropies, cl_entropies, entropies_save_path)
275 |
276 | moving_average_history_save_path = os.path.join(results_path, "training_history_moving_average.jpg")
277 | plot_rolling_window_history(fl.jList, fl.reward_list, fl.success, np.array(fl.epsilon_list), target_win_ratio, min_steps_num, moving_average_history_save_path, window=window)
278 | history_save_path = os.path.join(results_path, "training_history.jpg")
279 | plot_history(fl.jList, fl.reward_list, fl.success, np.array(fl.epsilon_list), target_win_ratio, min_steps_num, history_save_path)
280 |
281 |
282 | with open(os.path.join(results_path, "hyperparameters.txt"), "w+") as f:
283 | f.write(f'gamma;{gamma}\n')
284 | f.write(f'epochs;{epochs}\n')
285 | f.write(f'max_steps;{max_steps}\n')
286 | f.write(f'learning_rate;{learning_rate}\n')
287 | f.write(f'non_random_chance;{non_random_chance}\n')
288 | f.write(f'random_scaling;{random_scaling}\n')
289 | f.write(f'window;{window}\n')
290 | f.write(f'target_win_ratio;{target_win_ratio}\n')
291 | f.write(f'min_steps_num;{min_steps_num}\n')
292 | f.write(f'n_hidden_layers;{n_hidden_layers}\n')
293 | f.write(f'activation_function;{activation_function}\n')
294 | f.write(f'global_seed;{global_seed}\n')
295 |
296 | with open(os.path.join(results_path, "entropies.txt"), "w") as f:
297 | for ent in fl.entropies:
298 | f.write(str(ent)+";")
299 |
300 | with open(os.path.join(results_path, "cl_entropies.txt"), "w") as f:
301 | for ent in fl.cl_entropies:
302 | f.write(str(ent)+";")
303 |
304 |
305 |
306 |
307 |
308 | # at the beginning, instead of a one-hot, the network is supposed to take something like 32 numbers
309 | # for now there is a function that converts a number into a plain binary representation
310 | # it is then used in the quantum circuit to rotate the qubits and switch them on
311 | # e.g. when standing on tile 13 (1,1,0,1) we switch on all qubits except the 3rd one
312 | # this is the input of the quantum network
313 | # in our case it has to be 32 numbers
314 |
315 | # in our case, the location has to select one of the 16 qubit combinations and give it the amplitude it would have if it were switched on (after checking in the quantum code how state preparation is done, feed it into a quantum-circuit simulator and read off the amplitude, i.e. (-1+0i))
316 | # hence everywhere the one-hots will be of the form: a single [-1, 0] pair with zeros elsewhere
317 | # this will change if a different state preparation is used
318 |
319 | def decimalToBinaryFixLength(_length, _decimal):
320 | binNum = bin(int(_decimal))[2:]
321 | outputNum = [int(item) for item in binNum]
322 | if len(outputNum) < _length:
323 | outputNum = np.concatenate((np.zeros((_length-len(outputNum),)),np.array(outputNum)))
324 | else:
325 | outputNum = np.array(outputNum)
326 | return outputNum
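
# Minimal sanity check of the helper above (an illustrative sketch only; it is not
# used by the training code). Tile 13 yields the bit pattern from the notes above:
if __name__ == "__main__":
    example_bits = decimalToBinaryFixLength(4, 13)
    print("tile 13 ->", example_bits)  # expected: [1 1 0 1]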
327 |
328 |
329 |
330 |
331 |
332 |
--------------------------------------------------------------------------------
/scripts/3b._Classical_DQL_sim_quant_grid_search.py:
--------------------------------------------------------------------------------
1 | from itertools import product
2 |
3 | activations = ['lrelu', 'sigmoid', 'tanh']
4 |
5 | n_layers = list(range(1,7))
6 | global_seed = 123456
7 |
8 |
9 | for i, (activation_function, n_hidden_layers) in enumerate(product(activations, n_layers)):
10 | print(f'Experiment: {i+1}/{len(activations)*len(n_layers)}')
11 | print("Activation function: ", activation_function)
12 | print("Number of hidden layers: ", n_hidden_layers)
13 |
14 | import gym
15 | import torch
16 | from torch import nn
17 | from torch.autograd import Variable
18 | from torch import optim
19 | from torch.nn import functional as F
20 | from torch import linalg as LA
21 |
22 | import numpy as np
23 | from tqdm import tqdm
24 |
25 | from src.entropies import entanglement_entropy, classical_entropy
26 | from src.visualizations import *
27 |
28 | import os
29 | os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
30 |
31 | torch.manual_seed(global_seed)
32 | np.random.seed(global_seed)
33 |
34 |
35 | gamma = 0.9
36 | epochs = 20000
37 | max_steps = 60
38 | learning_rate = 0.0002
39 | non_random_chance = 0.99
40 | random_scaling = 0.9998
41 | window = 40
42 | target_win_ratio = 0.98
43 | min_steps_num = 6
44 | #activation_function = 'sigmoid'
45 | #n_hidden_layers = 1
46 | results_folder = f'{n_hidden_layers}_layers_{activation_function}_activation'
47 | results_path = os.path.join('../results', 'classical_DQL_sim_quantum', results_folder)
48 |
49 | if not os.path.exists(results_path):
50 | os.mkdir(results_path)
51 |
52 |
53 | # ## 1. Create a FrozenLake environment
54 | lake = gym.make('FrozenLake-v1', is_slippery=False)
55 | lake.reset()
56 | print(lake.render(mode='ansi'))
57 |
58 |
59 | # ## 2. Define the one_hot encoding function and a uniform initializer for linear layers
60 | def one_hot(ids, nb_digits):
61 | """
62 | ids: (list, ndarray) shape:[batch_size]
63 | """
64 | if not isinstance(ids, (list, np.ndarray)):
65 | raise ValueError("ids must be 1-D list or array")
66 | batch_size = len(ids)
67 | ids = torch.LongTensor(ids).view(batch_size, 1)
68 | out_tensor = Variable(torch.FloatTensor(batch_size, nb_digits))
69 | out_tensor.data.zero_()
70 | out_tensor.data.scatter_(dim=1, index=ids, value=1.)
71 | return out_tensor
72 |
73 | def uniform_linear_layer(linear_layer):
74 | linear_layer.weight.data.uniform_()
75 | linear_layer.bias.data.fill_(-0.02)
76 |
77 |
78 | # ## 3. Define Agent model, basically for Q values
79 | class Agent(nn.Module):
80 | def __init__(self, observation_space_size, action_space_size, n_hidden_layers):
81 | super(Agent, self).__init__()
82 | self.observation_space_size = observation_space_size
83 | self.hidden_size = 2*self.observation_space_size
84 |
85 | self.l1 = nn.Linear(in_features=2*self.observation_space_size, out_features=self.hidden_size)
86 | self.hidden_layers = nn.ModuleList([
87 | nn.Linear(in_features=self.hidden_size, out_features=self.hidden_size)
88 | for i in range(n_hidden_layers)
89 | ])  # nn.ModuleList so the hidden layers' parameters are registered with the optimizer
90 | self.l2 = nn.Linear(in_features=self.hidden_size, out_features=32)
91 | self.activation = None
92 | if activation_function=='lrelu':
93 | self.activation = F.leaky_relu
94 | if activation_function=='sigmoid':
95 | self.activation = F.sigmoid
96 | if activation_function=='tanh':
97 | self.activation = F.tanh
98 |
99 | uniform_linear_layer(self.l1)
100 | for l in self.hidden_layers:
101 | uniform_linear_layer(l)
102 |
103 | uniform_linear_layer(self.l2)
104 |
105 | print('Set the neural network with:')
106 | print(f'\tInput size: \t{2*self.observation_space_size}')
107 | for i, l in enumerate(range(n_hidden_layers)):
108 | print(f'\tHidden {i+1}. layer size: \t{self.hidden_size}')
109 | print(f'\tOutput size: \t{32}')
110 |
111 | def forward(self, state):
112 | obs_emb = one_hot([int(2*state)], 2*self.observation_space_size)
113 | # first layer:
114 | out1 = self.activation(self.l1(obs_emb))
115 |
116 | # hidden layers:
117 | for l in self.hidden_layers:
118 | out1 = self.activation(l(out1))
119 |
120 | # output layers:
121 | out2 = self.activation(self.l2(out1))
122 |
123 | return out2.view((-1))
124 |
125 |
126 | # ## 4. Define the Trainer to optimize Agent model
127 | class Trainer:
128 | def __init__(self, n_hidden_layers):
129 | self.holes_indexes = np.array([5,7,11,12])
130 |
131 | self.agent = Agent(lake.observation_space.n, lake.action_space.n, n_hidden_layers)
132 | self.optimizer = optim.Adam(params=self.agent.parameters(), lr=learning_rate)
133 |
134 | self.epsilon = non_random_chance
135 | self.epsilon_growth_rate = random_scaling
136 | self.gamma = gamma
137 |
138 | self.epsilon_list = []
139 | self.success = []
140 | self.jList = []
141 | self.reward_list = []
142 |
143 | self.compute_entropy = True
144 | self.entropies = []
145 | self.cl_entropies = []
146 | self.entropies_episodes = [0]
147 |
148 | self.print = False
149 |
150 |
151 | def train(self, epoch):
152 | # entropies_episodes = [0] * (epoch+1)
153 | for i in (pbar := tqdm(range(epoch))):
154 | pbar.set_description(f'Success rate: {sum(self.success[-window:])/window:.2%} | Random chance: {self.epsilon:.2%}')
155 |
156 | s = lake.reset() # lake state (0-15); reset() returns state 0
157 | j = 0
158 | self.entropies_episodes.append(0)
159 | while j < max_steps:
160 | j += 1
161 | # perform chosen action
162 | a = self.choose_action(s)
163 | s1, r, d, _ = lake.step(int(a))
164 | if d == True and r == 0: r = -1
165 | elif d == True: r = 1
166 | elif r==0: r = -0.01
167 |
168 | # if self.print==False:
169 | # print(self.agent(s)[a])
170 | # self.print=True
171 |
172 | # calculate target and loss
173 | target_q = r + self.gamma * torch.max(self.calc_probabilities(s1).detach())
174 |
175 | loss = F.smooth_l1_loss(self.calc_probability(s, a), target_q)
176 | # update model to optimize Q
177 | self.optimizer.zero_grad()
178 | loss.backward()
179 | self.optimizer.step()
180 |
181 | # update state
182 | s = s1
183 | if(self.compute_entropy):
184 | with torch.inference_mode():
185 | self.entropies.append(entanglement_entropy(self.calc_statevector(s)))
186 | self.cl_entropies.append(classical_entropy(self.calc_statevector(s)))
187 | self.entropies_episodes[i] += 1
188 |
189 | if d == True: break
190 |
191 | # append results onto report lists
192 | if d == True and r > 0:
193 | self.success.append(1)
194 | else:
195 | self.success.append(0)
196 |
197 | self.reward_list.append(r)
198 | self.jList.append(j)
199 |
200 | if self.epsilon < 1.:
201 | self.epsilon *= self.epsilon_growth_rate
202 | self.epsilon_list.append(self.epsilon)
203 |
204 | if i>100:
205 | if sum(self.success[-window:])/window>target_win_ratio:
206 | print("Network trained before epoch limit on {i} epoch".format(i=i))
207 | break
208 |
209 | #print("last 100 epoches success rate: " + str(sum(self.success[-100:])/100) + "%")
210 |
211 | def choose_action(self, s):
212 | self.calc_probabilities(s)
213 | if np.random.rand(1) > self.epsilon :
214 | action = torch.argmax(self.calc_probabilities(s)) # pick the action with the highest probability
215 | else:
216 | action = torch.tensor(np.random.randint(0, 4))
217 | return action
218 |
219 | def calc_statevector(self, s):
220 | return torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
221 |
222 | def calc_probability(self, s, a): # occupation probability of a qubit (0-3) for a given board state (0-15)
223 | statevector = torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
224 | probabilities = (statevector.abs()**2)
225 | probabilities = probabilities/probabilities.sum() # normalization
226 | prob_indexes = [
227 | [0,1,2,3,4,5,6,7],
228 | [0,1,2,3,8,9,10,11],
229 | [0,1,4,5,8,9,12,13],
230 | [0,2,4,6,8,10,12,14]
231 | ]
232 | return probabilities[prob_indexes[a]].sum()
233 |
234 | def calc_probabilities(self, s): # probabilities for each of the qubits for a given board state (0-15), returned as a tensor of shape (4,)
235 | raw_wavefunction = torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
236 | probabilities = (raw_wavefunction.abs()**2)
237 | probabilities = probabilities/probabilities.sum() # normalization
238 | probs_of_qubits = torch.tensor([
239 | probabilities[[0,1,2,3,4,5,6,7]].sum(),
240 | probabilities[[0,1,2,3,8,9,10,11]].sum(),
241 | probabilities[[0,1,4,5,8,9,12,13]].sum(),
242 | probabilities[[0,2,4,6,8,10,12,14]].sum()
243 | ])
244 | return probs_of_qubits
245 |
246 |
247 | def Q(self):
248 | Q = []
249 | for x in range(lake.observation_space.n):
250 | Qstate = self.agent(x).detach()
251 | Qstate /= LA.norm(Qstate)
252 | Q.append(Qstate)
253 | Q_out = torch.Tensor(lake.observation_space.n, lake.action_space.n)
254 | torch.cat(Q, out=Q_out)
255 | return Q_out
256 |
257 | def Qstate(self, state):
258 | Qstate = self.agent(state).detach()
259 | Qstate /= LA.norm(Qstate)
260 | return Qstate
261 |
262 | def Qstrategy(self):
263 | return [torch.argmax(self.calc_probabilities(state)).item() for state in range(lake.observation_space.n)]
264 |
265 |
266 | # ## 5. Initialize a trainer and train for the configured number of epochs
267 | fl = Trainer(n_hidden_layers)
268 |
269 | t = torch.Tensor(np.array([32], dtype=int))
270 | t = t.to(torch.int32)
271 |
272 | print("Train through {epochs} epochs". format(epochs=epochs))
273 | fl.train(epochs)
274 |
275 | plot_success_steps_history(fl.jList, fl.success)
276 |
277 | strategy = np.array(fl.Qstrategy()).reshape((4,4))
278 | strategy_save_path = os.path.join(results_path, "trained_strategy.jpg")
279 | plot_strategy(strategy, fl.holes_indexes, strategy_save_path)
280 |
281 | entropies = np.array(fl.entropies)
282 | cl_entropies = np.array(fl.cl_entropies)
283 | entropies_save_path = os.path.join(results_path, "entropies.jpg")
284 | plot_entropies(entropies, cl_entropies, entropies_save_path)
285 |
286 | moving_average_history_save_path = os.path.join(results_path, "training_history_moving_average.jpg")
287 | plot_rolling_window_history(fl.jList, fl.reward_list, fl.success, np.array(fl.epsilon_list), target_win_ratio, min_steps_num, moving_average_history_save_path, window=window)
288 | history_save_path = os.path.join(results_path, "training_history.jpg")
289 | plot_history(fl.jList, fl.reward_list, fl.success, np.array(fl.epsilon_list), target_win_ratio, min_steps_num, history_save_path)
290 |
291 |
292 | with open(os.path.join(results_path, "hyperparameters.txt"), "w+") as f:
293 | f.write(f'gamma;{gamma}\n')
294 | f.write(f'epochs;{epochs}\n')
295 | f.write(f'max_steps;{max_steps}\n')
296 | f.write(f'learning_rate;{learning_rate}\n')
297 | f.write(f'non_random_chance;{non_random_chance}\n')
298 | f.write(f'random_scaling;{random_scaling}\n')
299 | f.write(f'window;{window}\n')
300 | f.write(f'target_win_ratio;{target_win_ratio}\n')
301 | f.write(f'min_steps_num;{min_steps_num}\n')
302 | f.write(f'n_hidden_layers;{n_hidden_layers}\n')
303 | f.write(f'activation_function;{activation_function}\n')
304 | f.write(f'global_seed;{global_seed}\n')
305 |
306 | with open(os.path.join(results_path, "entropies.txt"), "w") as f:
307 | for ent in fl.entropies:
308 | f.write(str(ent)+";")
309 |
310 | with open(os.path.join(results_path, "cl_entropies.txt"), "w") as f:
311 | for ent in fl.cl_entropies:
312 | f.write(str(ent)+";")
313 |
--------------------------------------------------------------------------------
/scripts/QML/QML_DQN_FROZEN_LAKE.py:
--------------------------------------------------------------------------------
1 | # QML as Q Learning function approximator
2 | # Need to specify STATE input format
3 | # Computational Basis Encoding
4 | # action output is a 4-element Q-value vector [a_0, a_1, a_2, a_3] (no softmax is applied)
5 | # Deep Q-Learning DQN
6 | # Experience Replay (for i.i.d. sampling)
7 | # Target Network (updated every C episodes) ==> Another Circuit Parameter Set
8 |
9 | # This version is enhanced with PyTorch
10 | # Adapt some code from
11 | # PyTorch tutorial on deep reinforcement learning
12 | # and
13 | # Xanadu AI github repository
14 | # Environment: OpenAI gym FrozenLake
15 |
16 | ##
17 | import pennylane as qml
18 | from pennylane import numpy as np
19 | from pennylane.optimize import NesterovMomentumOptimizer
20 |
21 | import torch
22 | import torch.nn as nn
23 | from torch.autograd import Variable
24 |
25 | import matplotlib.pyplot as plt
26 | from datetime import datetime
27 | import pickle
28 |
29 | import gym
30 | import time
31 | import random
32 | from collections import namedtuple
33 | from copy import deepcopy
34 |
35 | from scipy.linalg import logm
36 |
37 | from ShortestPathFrozenLake import ShortestPathFrozenLake
38 |
39 | # ENTANGLEMENT ENTROPY
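# Builds the pure-state density matrix rho = |psi><psi|, then for every bipartition
# (first d qubits vs. the remaining ones) traces out the first d qubits and evaluates
# the von Neumann entropy S(rho_B) = -Tr[rho_B log2(rho_B)]; the returned value is the
# mean entropy over all cuts d = 1 .. num_wires-1.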
40 | def entanglement_entropy(state):
41 | state = np.array(state.detach(), ndmin=2)
42 | ket = state.T
43 | bra = state.conj()
44 | rho_final = np.outer(ket, bra)
45 | num_wires = int(np.log2(state.size))
46 | S = []
47 | for d in range(1, num_wires):
48 | Ia = np.identity(2 ** d)
49 | Ib = np.identity(2 ** (num_wires - d))
50 | Tr_a = np.empty(
51 | [2 ** d, 2 ** (num_wires - d), 2 ** (num_wires - d)], dtype=complex
52 | )
53 | for i in range(2 ** d):
54 | ai = np.array(Ia[i], ndmin=2).T
55 | Tr_a[i] = np.kron(ai.conj().T, Ib).dot(rho_final).dot(np.kron(ai, Ib))
56 | rho_b = Tr_a.sum(axis=0)
57 | rho_b_l2 = logm(rho_b) / np.log(2.0)
58 | S_rho_b = -rho_b.dot(rho_b_l2).trace()
59 | S.append(S_rho_b)
60 | return np.array(S).numpy().mean()
61 |
62 |
63 | # ENTROPY
64 | def classical_entropy(state):
65 | state_temp = state.detach()
66 | ket_2 = np.abs(state_temp) ** 2
67 | return -torch.sum(ket_2 * np.log2(ket_2)) # returns a one-element torch tensor
68 |
69 |
70 | ## Definition of Replay Memory
71 | ## Each stored Transition carries a 'done' flag
72 | ## marking terminal states
73 |
74 |
75 | Transition = namedtuple(
76 | "Transition", ("state", "action", "reward", "next_state", "done")
77 | )
78 |
79 |
80 | class ReplayMemory(object):
81 | def __init__(self, capacity):
82 | self.capacity = capacity
83 | self.memory = []
84 | self.position = 0
85 |
86 | def push(self, *args):
87 | """Saves a transition."""
88 | if len(self.memory) < self.capacity:
89 | self.memory.append(None)
90 | self.memory[self.position] = Transition(*args)
91 | self.position = (self.position + 1) % self.capacity
92 |
93 | def sample(self, batch_size):
94 | return random.sample(self.memory, batch_size)
95 |
96 | def output_all(self):
97 | return self.memory
98 |
99 | def __len__(self):
100 | return len(self.memory)
101 |
102 |
103 | ####
104 |
105 |
106 | ## Plotting Function ##
107 | """
108 | Note: the plotting code originates from Yang, Chao-Han Huck, et al. "Enhanced Adversarial Strategically-Timed Attacks Against Deep Reinforcement Learning."
109 | ## ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2020.
110 | If you use the code in your research, please cite the original reference.
111 | """
112 |
113 |
114 | def plotTrainingResultCombined(
115 | _iter_index, _iter_reward, _iter_total_steps, _fileTitle
116 | ):
117 | fig, ax = plt.subplots()
118 | # plt.yscale('log')
119 | ax.plot(_iter_index, _iter_reward, "-b", label="Reward")
120 | ax.plot(_iter_index, _iter_total_steps, "-r", label="Total Steps")
121 | leg = ax.legend()
122 |
123 | ax.set(xlabel="Iteration Index", title=_fileTitle)
124 | fig.savefig(_fileTitle + "_" + datetime.now().strftime("NO%Y%m%d%H%M%S") + ".png")
125 |
126 |
127 | def plotTrainingResultReward(_iter_index, _iter_reward, _iter_total_steps, _fileTitle):
128 | fig, ax = plt.subplots()
129 | # plt.yscale('log')
130 | ax.plot(_iter_index, _iter_reward, "-b", label="Reward")
131 | # ax.plot(_iter_index, _iter_total_steps, '-r', label='Total Steps')
132 | leg = ax.legend()
133 |
134 | ax.set(xlabel="Iteration Index", title=_fileTitle)
135 | fig.savefig(
136 | _fileTitle
137 | + "_REWARD"
138 | + "_"
139 | + datetime.now().strftime("NO%Y%m%d%H%M%S")
140 | + ".png"
141 | )
142 |
143 |
144 | ########################################
145 |
146 |
147 | def decimalToBinaryFixLength(_length, _decimal):
148 | binNum = bin(int(_decimal))[2:]
149 | outputNum = [int(item) for item in binNum]
150 | if len(outputNum) < _length:
151 | outputNum = np.concatenate(
152 | (np.zeros((_length - len(outputNum),)), np.array(outputNum))
153 | )
154 | else:
155 | outputNum = np.array(outputNum)
156 | return outputNum
157 |
158 |
159 | ## PennyLane Part ##
160 |
161 | # Specify the datatype of the Torch tensor
162 | dtype = torch.DoubleTensor
163 |
164 | ## Define a FOUR qubit system
165 | dev = qml.device("default.qubit", wires=4)
166 | # dev = qml.device('qiskit.basicaer', wires=4)
167 | def statepreparation(a):
168 |
169 | """Quantum circuit to encode the input vector into variational params
170 |
171 | Args:
172 | a: feature vector of rad and rad_square => np.array([rad_X_0, rad_X_1, rad_square_X_0, rad_square_X_1])
173 | """
174 |
175 | # Rot to computational basis encoding
176 | # a = [a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8]
177 |
178 | for ind in range(len(a)):
179 | qml.RX(np.pi * a[ind], wires=ind)
180 | qml.RZ(np.pi * a[ind], wires=ind)
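# Worked example of this encoding (illustrative): for state s = 13,
# decimalToBinaryFixLength(4, 13) gives bits [1, 1, 0, 1], so wires 0, 1 and 3
# receive RX(pi) and RZ(pi) rotations while wire 2 only gets identity rotations (angle 0).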
181 |
182 |
183 | def layer(W):
184 | """ Single layer of the variational classifier.
185 |
186 | Args:
187 | W (array[float]): 2-d array of variables for one layer
188 | """
189 |
190 | qml.CNOT(wires=[0, 1])
191 | qml.CNOT(wires=[1, 2])
192 | qml.CNOT(wires=[2, 3])
193 |
194 | qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
195 | qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
196 | qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
197 | qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)
198 |
199 |
200 | @qml.qnode(dev, interface="torch")
201 | def circuit(weights, angles=None):
202 | """The circuit of the variational classifier."""
203 | # Can consider different expectation value
204 | # PauliX , PauliY , PauliZ , Identity
205 |
206 | statepreparation(angles)
207 |
208 | for W in weights:
209 | layer(W)
210 |
211 | return [qml.expval(qml.PauliZ(ind)) for ind in range(4)]
212 |
213 |
214 | def variational_classifier(var_Q_circuit, var_Q_bias, angles=None):
215 | """The variational classifier."""
216 |
217 | # Change to SoftMax???
218 |
219 | weights = var_Q_circuit
220 | # bias_1 = var_Q_bias[0]
221 | # bias_2 = var_Q_bias[1]
222 | # bias_3 = var_Q_bias[2]
223 | # bias_4 = var_Q_bias[3]
224 | # bias_5 = var_Q_bias[4]
225 | # bias_6 = var_Q_bias[5]
226 |
227 | # raw_output = circuit(weights, angles=angles) + np.array([bias_1,bias_2,bias_3,bias_4,bias_5,bias_6])
228 | raw_output = circuit(weights, angles=angles) + var_Q_bias
229 |
230 | # We are approximating Q Value
231 | # Maybe softmax is no need
232 | # softMaxOutPut = np.exp(raw_output) / np.exp(raw_output).sum()
233 |
234 | return raw_output
235 |
236 |
237 | def square_loss(labels, predictions):
238 | """ Square loss function
239 |
240 | Args:
241 | labels (array[float]): 1-d array of labels
242 | predictions (array[float]): 1-d array of predictions
243 | Returns:
244 | float: square loss
245 | """
246 | loss = 0
247 | for l, p in zip(labels, predictions):
248 | loss = loss + (l - p) ** 2
249 | loss = loss / len(labels)
250 | # print("LOSS")
251 |
252 | # print(loss)
253 |
254 | # output = torch.abs(predictions - labels)**2
255 | # output = torch.sum(output) / len(labels)
256 |
257 | # loss = nn.MSELoss()
258 | # output = loss(labels.double(), predictions.double())
259 |
260 | return loss
261 |
262 |
263 | # def square_loss(labels, predictions):
264 | # """ Square loss function
265 |
266 | # Args:
267 | # labels (array[float]): 1-d array of labels
268 | # predictions (array[float]): 1-d array of predictions
269 | # Returns:
270 | # float: square loss
271 | # """
272 | # # In Deep Q Learning
273 | # # labels = target_action_value_Q
274 | # # predictions = action_value_Q
275 |
276 | # # loss = 0
277 | # # for l, p in zip(labels, predictions):
278 | # # loss = loss + (l - p) ** 2
279 | # # loss = loss / len(labels)
280 |
281 | # # loss = nn.MSELoss()
282 | # output = torch.abs(predictions - labels)**2
283 | # output = torch.sum(output) / len(labels)
284 | # # output = loss(torch.tensor(predictions), torch.tensor(labels))
285 | # # print("LOSS OUTPUT")
286 | # # print(output)
287 |
288 | # return output
289 |
290 |
291 | def abs_loss(labels, predictions):
292 | """ Square loss function
293 |
294 | Args:
295 | labels (array[float]): 1-d array of labels
296 | predictions (array[float]): 1-d array of predictions
297 | Returns:
298 | float: absolute loss
299 | """
300 | # In Deep Q Learning
301 | # labels = target_action_value_Q
302 | # predictions = action_value_Q
303 |
304 | # loss = 0
305 | # for l, p in zip(labels, predictions):
306 | # loss = loss + (l - p) ** 2
307 | # loss = loss / len(labels)
308 |
309 | # loss = nn.MSELoss()
310 | output = torch.abs(predictions - labels)
311 | output = torch.sum(output) / len(labels)
312 | # output = loss(torch.tensor(predictions), torch.tensor(labels))
313 | # print("LOSS OUTPUT")
314 | # print(output)
315 |
316 | return output
317 |
318 |
319 | def huber_loss(labels, predictions):
320 | """ Square loss function
321 |
322 | Args:
323 | labels (array[float]): 1-d array of labels
324 | predictions (array[float]): 1-d array of predictions
325 | Returns:
326 | float: Huber loss
327 | """
328 | # In Deep Q Learning
329 | # labels = target_action_value_Q
330 | # predictions = action_value_Q
331 |
332 | # loss = 0
333 | # for l, p in zip(labels, predictions):
334 | # loss = loss + (l - p) ** 2
335 | # loss = loss / len(labels)
336 |
337 | # loss = nn.MSELoss()
338 | loss = nn.SmoothL1Loss()
339 | # output = loss(torch.tensor(predictions), torch.tensor(labels))
340 | # print("LOSS OUTPUT")
341 | # print(output)
342 |
343 | return loss(labels, predictions)
344 |
345 |
346 | def cost(var_Q_circuit, var_Q_bias, features, labels):
347 | """Cost (error) function to be minimized."""
348 |
349 | # predictions = [variational_classifier(weights, angles=f) for f in features]
350 | # Torch data type??
351 |
352 | predictions = [
353 | variational_classifier(
354 | var_Q_circuit=var_Q_circuit,
355 | var_Q_bias=var_Q_bias,
356 | angles=decimalToBinaryFixLength(4, item.state),
357 | )[item.action]
358 | for item in features
359 | ]
360 | # predictions = torch.tensor(predictions,requires_grad=True)
361 | # labels = torch.tensor(labels)
362 | # print("PRIDICTIONS:")
363 | # print(predictions)
364 | # print("LABELS:")
365 | # print(labels)
366 |
367 | return square_loss(labels, predictions)
368 |
369 |
370 | #############################
371 |
372 |
373 | def epsilon_greedy(var_Q_circuit, var_Q_bias, epsilon, n_actions, s, train=False):
374 | """
375 | @param Q Q values state x action -> value
376 | @param epsilon for exploration
377 | @param s number of states
378 | @param train if true then no random actions selected
379 | """
380 |
381 | # Modify to incorporate with Variational Quantum Classifier
382 | # epsilon should change along training
383 | # In the beginning => More Exploration
384 | # In the end => More Exploitation
385 |
386 | # More Random
387 | # np.random.seed(int(datetime.now().strftime("%S%f")))
388 |
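# The greedy branch below is taken with probability (1 - epsilon) + epsilon / n_actions;
# a uniformly random action is taken with the remaining probability epsilon * (n_actions - 1) / n_actions.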
389 | if train or np.random.rand() < ((epsilon / n_actions) + (1 - epsilon)):
390 | # action = np.argmax(Q[s, :])
391 | # variational classifier output is torch tensor
392 | # action = np.argmax(variational_classifier(var_Q_circuit = var_Q_circuit, var_Q_bias = var_Q_bias, angles = decimalToBinaryFixLength(9,s)))
393 | action = torch.argmax(
394 | variational_classifier(
395 | var_Q_circuit=var_Q_circuit,
396 | var_Q_bias=var_Q_bias,
397 | angles=decimalToBinaryFixLength(4, s),
398 | )
399 | )
400 | # after circuit() dev.state changes so we can compute entropy
401 | # if(compute_entropy):
402 | # S = entanglement_entropy(dev.state) #dev is global variable
403 | # entropies.append(S)
404 | else:
405 | # need to be torch tensor
406 | action = torch.tensor(np.random.randint(0, n_actions))
407 | return action
408 |
409 |
410 | def deep_Q_Learning(
411 | alpha,
412 | gamma,
413 | epsilon,
414 | episodes,
415 | max_steps,
416 | n_tests,
417 | render=False,
418 | test=False,
419 | early_stopping_threshold=100,
420 | ):
421 | """
422 | @param alpha learning rate
423 | @param gamma decay factor
424 | @param epsilon for exploration
425 | @param max_steps for max step in each episode
426 | @param n_tests number of test episodes
427 | """
428 |
429 | env = ShortestPathFrozenLake(is_slippery=False)
430 |
431 | n_states, n_actions = env.observation_space.n, env.action_space.n
432 | print("NUMBER OF STATES:" + str(n_states))
433 | print("NUMBER OF ACTIONS:" + str(n_actions))
434 |
435 | # Initialize Q function approximator variational quantum circuit
436 | # initialize weight layers
437 |
438 | num_qubits = 4
439 | num_layers = 2
440 | # var_init = (0.01 * np.random.randn(num_layers, num_qubits, 3), 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
441 |
442 | var_init_circuit = Variable(
443 | torch.tensor(
444 | 0.01 * np.random.randn(num_layers, num_qubits, 3), device="cpu"
445 | ).type(dtype),
446 | requires_grad=True,
447 | )
448 | var_init_bias = Variable(
449 | torch.tensor([0.0, 0.0, 0.0, 0.0], device="cpu").type(dtype), requires_grad=True
450 | )
451 |
452 | # Define the two Q value function initial parameters
453 | # Use np copy() function to DEEP COPY the numpy array
454 | var_Q_circuit = var_init_circuit
455 | var_Q_bias = var_init_bias
456 | # print("INIT PARAMS")
457 | # print(var_Q_circuit)
458 |
459 | var_target_Q_circuit = var_Q_circuit.clone().detach()
460 | var_target_Q_bias = var_Q_bias.clone().detach()
461 |
462 | ##########################
463 | # Optimization method => random select train batch from replay memory
464 | # and opt
465 |
466 | # opt = NesterovMomentumOptimizer(0.01)
467 |
468 | # opt = torch.optim.Adam([var_Q_circuit, var_Q_bias], lr = 0.1)
469 | # opt = torch.optim.SGD([var_Q_circuit, var_Q_bias], lr=0.1, momentum=0.9)
470 | opt = torch.optim.RMSprop(
471 | [var_Q_circuit, var_Q_bias],
472 | lr=0.01,
473 | alpha=0.99,
474 | eps=1e-08,
475 | weight_decay=0,
476 | momentum=0,
477 | centered=False,
478 | )
479 |
480 | ## Need to move out of the function
481 | TARGET_UPDATE = 20
482 | batch_size = 5
483 | OPTIMIZE_STEPS = 5
484 | ##
485 |
486 | target_update_counter = 0
487 |
488 | iter_index = []
489 | iter_reward = []
490 | iter_total_steps = []
491 |
492 | cost_list = []
493 |
494 | timestep_reward = []
495 |
496 | # Demo of generating an ACTION
497 | # Output a numpy array of value for each action
498 |
499 | # Define the replay memory
500 | # Each transition:
501 | # (s_t_0, a_t_0, r_t, s_t_1, 'DONE')
502 |
503 | memory = ReplayMemory(80)
504 |
505 | # Input Angle = decimalToBinaryFixLength(9, stateInd)
506 | # Input Angle is a numpy array
507 |
508 | # stateVector = decimalToBinaryFixLength(9, stateInd)
509 |
510 | # q_val_s_t = variational_classifier(var_Q, angles=stateVector)
511 | # # action_t = q_val_s_t.argmax()
512 | # action_t = epsilon_greedy(var_Q, epsilon, n_actions, s)
513 | # q_val_target_s_t = variational_classifier(var_target_Q, angles=stateVector)
514 |
515 | # train the variational classifier
516 |
517 | episode = 0
518 | episodes_since_last_reward_change = 0
519 | last_total_reward = -np.inf # negative infinity
520 |
521 | # for episode in range(episodes):
522 | while (
523 | episode < episodes
524 | and episodes_since_last_reward_change <= early_stopping_threshold
525 | ):
526 | print(f"Episode: {episode}")
527 | # Output s in decimal format
528 | s = env.reset()
529 | # Doing epsilon-greedy action selection
530 | # With var_Q
531 | a = epsilon_greedy(
532 | var_Q_circuit=var_Q_circuit,
533 | var_Q_bias=var_Q_bias,
534 | epsilon=epsilon,
535 | n_actions=n_actions,
536 | s=s,
537 | ).item()
538 | t = 0
539 | total_reward = 0
540 | done = False
541 |
542 | while t < max_steps:
543 | if render:
544 | print("###RENDER###")
545 | env.render()
546 | print("###RENDER###")
547 | t += 1
548 |
549 | target_update_counter += 1
550 |
551 | # Execute the action
552 | s_, reward, done, info = env.step(a)
553 | # print("Reward : " + str(reward))
554 | # print("Done : " + str(done))
555 | total_reward += reward
556 | # a_ = np.argmax(Q[s_, :])
557 | a_ = epsilon_greedy(
558 | var_Q_circuit=var_Q_circuit,
559 | var_Q_bias=var_Q_bias,
560 | epsilon=epsilon,
561 | n_actions=n_actions,
562 | s=s_,
563 | ).item()
564 |
565 | # print("ACTION:")
566 | # print(a_)
567 |
568 | memory.push(s, a, reward, s_, done)
569 |
570 | if len(memory) > batch_size:
571 |
572 | # Sampling Mini_Batch from Replay Memory
573 |
574 | batch_sampled = memory.sample(batch_size=batch_size)
575 |
576 | # Transition = (s_t, a_t, r_t, s_t+1, done(True / False))
577 |
578 | # item.state => state
579 | # item.action => action taken at state s
580 | # item.reward => reward given based on (s,a)
581 | # item.next_state => state arrived based on (s,a)
582 |
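# Bellman targets computed from the frozen target circuit:
#   y = r + (1 - done) * gamma * max_a' Q_target(s', a')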
583 | Q_target = [
584 | item.reward
585 | + (1 - int(item.done))
586 | * gamma
587 | * torch.max(
588 | variational_classifier(
589 | var_Q_circuit=var_target_Q_circuit,
590 | var_Q_bias=var_target_Q_bias,
591 | angles=decimalToBinaryFixLength(4, item.next_state),
592 | )
593 | )
594 | for item in batch_sampled
595 | ]
596 | # Q_prediction = [variational_classifier(var_Q, angles=decimalToBinaryFixLength(9,item.state))[item.action] for item in batch_sampled ]
597 |
598 | # Gradient Descent
599 | # cost(weights, features, labels)
600 | # square_loss_training = square_loss(labels = Q_target, Q_predictions)
601 | # print("UPDATING PARAMS...")
602 |
603 | # CHANGE TO TORCH OPTIMIZER
604 |
605 | # var_Q = opt.step(lambda v: cost(v, batch_sampled, Q_target), var_Q)
606 | # opt.zero_grad()
607 | # loss = cost(var_Q_circuit = var_Q_circuit, var_Q_bias = var_Q_bias, features = batch_sampled, labels = Q_target)
608 | # print(loss)
609 | # FIX this gradient error
610 | # loss.backward()
611 | # opt.step(loss)
612 |
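# RMSprop's step() accepts a closure that re-evaluates the loss; here the closure is
# also used to record the simulator state's entanglement and classical entropies
# before the loss evaluation, after it, and after backpropagation.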
613 | def closure():
614 | opt.zero_grad()
615 | entropies[0].append(entanglement_entropy(dev.state))
616 | loss = cost(
617 | var_Q_circuit=var_Q_circuit,
618 | var_Q_bias=var_Q_bias,
619 | features=batch_sampled,
620 | labels=Q_target,
621 | )
622 | entropies[1].append(entanglement_entropy(dev.state))
623 | # print(loss)
624 | loss.backward()
625 | entropies[2].append(entanglement_entropy(dev.state))
626 | entropies[3].append(classical_entropy(dev.state))
627 | return loss
628 |
629 | opt.step(closure)
630 |
631 | # print("UPDATING PARAMS COMPLETED")
632 | current_replay_memory = memory.output_all()
633 | current_target_for_replay_memory = [
634 | item.reward
635 | + (1 - int(item.done))
636 | * gamma
637 | * torch.max(
638 | variational_classifier(
639 | var_Q_circuit=var_target_Q_circuit,
640 | var_Q_bias=var_target_Q_bias,
641 | angles=decimalToBinaryFixLength(4, item.next_state),
642 | )
643 | )
644 | for item in current_replay_memory
645 | ]
646 | # current_target_for_replay_memory = [item.reward + (1 - int(item.done)) * gamma * np.max(variational_classifier(var_target_Q, angles=decimalToBinaryFixLength(9,item.next_state))) for item in current_replay_memory]
647 |
648 | # if t%5 == 0:
649 | # cost_ = cost(var_Q_circuit = var_Q_circuit, var_Q_bias = var_Q_bias, features = current_replay_memory, labels = current_target_for_replay_memory)
650 | # print("Cost: ")
651 | # print(cost_.item())
652 | # cost_list.append(cost_)
653 |
654 | if target_update_counter > TARGET_UPDATE:
655 | print("UPDATING TARGET CIRCUIT...")
656 |
657 | var_target_Q_circuit = var_Q_circuit.clone().detach()
658 | var_target_Q_bias = var_Q_bias.clone().detach()
659 |
660 | target_update_counter = 0
661 |
662 | s, a = s_, a_
663 |
664 | if done:
665 | if render:
666 | print("###FINAL RENDER###")
667 | env.render()
668 | print("###FINAL RENDER###")
669 | print(f"This episode took {t} timesteps and reward: {total_reward}")
670 | epsilon = epsilon / ((episode / 100) + 1)
671 | # print("Q Circuit Params:")
672 | # print(var_Q_circuit)
673 | print(f"This episode took {t} timesteps and reward: {total_reward}")
674 | timestep_reward.append(total_reward)
675 | iter_index.append(episode)
676 | iter_reward.append(total_reward)
677 | iter_total_steps.append(t)
678 | break
679 |
680 | ########################################################################
681 | # EARLY STOPPING
682 | ########################################################################
683 | if total_reward != last_total_reward:
684 | last_total_reward = total_reward
685 | episodes_since_last_reward_change = 0
686 | elif (
687 | total_reward == last_total_reward and total_reward > 0
688 | ): # Increment the counter only if the reward was positive
689 | episodes_since_last_reward_change += 1
690 | ########################################################################
691 |
692 | # Increment the episode's index
693 | episode += 1
694 |
695 | if render:
696 | print(f"Here are the Q values:\n{var_Q_circuit}\nTesting now:")
697 | if test:
698 | test_agent(var_Q_circuit, var_Q_bias, env, n_tests, n_actions)
699 | return (
700 | timestep_reward,
701 | iter_index,
702 | iter_reward,
703 | iter_total_steps,
704 | var_Q_circuit,
705 | var_Q_bias,
706 | )
707 |
708 |
709 | def test_agent(var_Q_circuit, var_Q_bias, env, n_tests, n_actions, delay=1):
710 | for test in range(n_tests):
711 | print(f"Test #{test}")
712 | s = env.reset()
713 | done = False
714 | epsilon = 0
715 | while True:
716 | time.sleep(delay)
717 | env.render()
718 | a = epsilon_greedy(
719 | var_Q_circuit, var_Q_bias, epsilon, n_actions, s, train=False
720 | ).item()
721 | print(f"Chose action {a} for state {s}")
722 | s, reward, done, info = env.step(a)
723 | if done:
724 | if reward > 0:
725 | print("Reached goal!")
726 | else:
727 | print("Shit! dead x_x")
728 | time.sleep(3)
729 | break
730 |
731 |
732 | # Should add plotting function and KeyboardInterrupt Handler
733 |
734 | # run circuit with trained params
735 | def run(n_tests):
736 | env = gym.make("Deterministic-ShortestPath-4x4-FrozenLake-v0")
737 | # env = gym.make('Deterministic-4x4-FrozenLake-v0')
738 | n_states, n_actions = env.observation_space.n, env.action_space.n
739 | print("NUMBER OF STATES:" + str(n_states))
740 | print("NUMBER OF ACTIONS:" + str(n_actions))
741 |
742 | # Initialize Q function approximator variational quantum circuit
743 | # initialize weight layers
744 |
745 | with open(
746 | "VQDQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSPropNO20190628142021_var_Q_circuit.txt",
747 | "rb",
748 | ) as fp:
749 | var_Q_circuit = pickle.load(fp)
750 |
751 | with open(
752 | "VQDQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSPropNO20190628142021_var_Q_bias"
753 | + ".txt",
754 | "rb",
755 | ) as fp:
756 | var_Q_bias = pickle.load(fp)
757 |
758 | # with open("_iter_reward" + ".txt", "rb") as fp:
759 | # iter_reward = pickle.load(fp)
760 |
761 | test_agent(var_Q_circuit, var_Q_bias, env, n_tests, n_actions)
762 |
763 |
764 | if __name__ == "__main__":
765 | alpha = 0.4
766 | gamma = 0.999
767 | epsilon = 0.999
768 | episodes = 10000
769 | max_steps = 2500
770 | n_tests = 10
771 | early_stopping_threshold = 20
772 |
773 | compute_entropy = True
774 | if compute_entropy:
775 | entropies = [[], [], [], []]
776 | train = True # Training from scratch
777 | evaluate = False # Evaluation
778 | if train:
779 | # timestep_reward, iter_index, iter_reward, iter_total_steps , var_Q_circuit, var_Q_bias = deep_Q_Learning(alpha, gamma, epsilon, episodes, max_steps, n_tests, render=False, test=False)
780 | (
781 | timestep_reward,
782 | iter_index,
783 | iter_reward,
784 | iter_total_steps,
785 | var_Q_circuit,
786 | var_Q_bias,
787 | ) = deep_Q_Learning(
788 | alpha,
789 | gamma,
790 | epsilon,
791 | episodes,
792 | max_steps,
793 | n_tests,
794 | render=False,
795 | test=False,
796 | early_stopping_threshold=early_stopping_threshold,
797 | )
798 |
799 | print(timestep_reward)
800 |
801 | ## Drawing Training Result ##
802 | file_title = "VQDQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp" + datetime.now().strftime(
803 | "NO%Y%m%d%H%M%S"
804 | )
805 |
806 | plotTrainingResultReward(
807 | _iter_index=iter_index,
808 | _iter_reward=iter_reward,
809 | _iter_total_steps=iter_total_steps,
810 | _fileTitle="Quantum_DQN_Frozen_Lake_NonSlip_Dynamic_Epsilon_RMSProp",
811 | )
812 | # plt.show()
813 | ## Saving the model
814 | with open(file_title + "_var_Q_circuit" + ".txt", "wb") as fp:
815 | pickle.dump(var_Q_circuit, fp)
816 |
817 | with open(file_title + "_var_Q_bias" + ".txt", "wb") as fp:
818 | pickle.dump(var_Q_bias, fp)
819 |
820 | with open(file_title + "_iter_reward" + ".txt", "wb") as fp:
821 | pickle.dump(iter_reward, fp)
822 | if compute_entropy:
823 | with open("entropies.txt", "wb") as fp:
824 | entropies = np.array(entropies)
825 | np.savetxt(fp, entropies)
826 |
827 | fig, ax = plt.subplots()
828 | ax.plot(entropies[0].real, label="before loss")
829 | ax.plot(entropies[1].real, label="after loss")
830 | ax.plot(entropies[2].real, ":", label="after .backward")
831 | ax.plot(entropies[3].real, ":", label="classical")
832 | ax.set_title("Entropies")
833 | fig.legend()
834 | fig.savefig("entropies.png")
835 | # plt.show()
836 |
837 | if evaluate:
838 | run(n_tests)
839 |
--------------------------------------------------------------------------------
/scripts/QML/README_ENTANGLEMENT.md:
--------------------------------------------------------------------------------
1 | Files required to run: "QML_DQN_FROZEN_LAKE.py" and "ShortestPathFrozenLake.py" (they are located in the "RUN_QML" folder; it is easiest to copy the whole folder and work inside it).
2 |
3 | In QML_DQN_... you can change the number of episodes.
4 |
5 | Run the script "QML_DQN_FROZEN_LAKE.py"
6 |
7 | and collect all of the output files it produces.
8 |
9 | Required libraries:
10 | gym
11 | numpy
12 | pennylane
13 | pytorch
14 | scipy
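
A minimal install-and-run sketch (assuming pip; note that the PyPI package for pytorch is `torch`):

    pip install gym numpy pennylane torch scipy
    python3 QML_DQN_FROZEN_LAKE.py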
--------------------------------------------------------------------------------
/scripts/QML/ShortestPathFrozenLake.py:
--------------------------------------------------------------------------------
1 | import gym
2 | from gym.envs import toy_text
3 | import sys
4 |
5 | class ShortestPathFrozenLake(toy_text.frozen_lake.FrozenLakeEnv):
6 | def __init__(self, **kwargs):
7 | super(ShortestPathFrozenLake, self).__init__(**kwargs)
8 |
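# Reward shaping: every transition is rewritten below so that falling into a hole gives -0.2,
# reaching the goal gives +1, and any other step costs -0.01, which nudges the agent toward
# the shortest successful path.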
9 | for state in range(self.observation_space.start, self.observation_space.n): # for all states
10 | for action in range(self.action_space.start, self.action_space.n): # for all actions
11 | my_transitions = []
12 | for (prob, next_state, _, is_terminal) in self.P[state][action]:
13 | row = next_state // self.ncol
14 | col = next_state - row * self.ncol
15 | tile_type = self.desc[row, col]
16 | if tile_type == b'H':
17 | reward = -0.2
18 | elif tile_type == b'G':
19 | reward = 1.
20 | else:
21 | reward = -0.01
22 |
23 | my_transitions.append((prob, next_state, reward, is_terminal))
24 | self.P[state][action] = my_transitions
25 |
--------------------------------------------------------------------------------
/scripts/QML/run_job.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #SBATCH -N1
3 | #SBATCH -c8
4 | #SBATCH --mem=4gb
5 | #SBATCH --time=168:00:00
6 |
7 | #SBATCH --job-name=QML
8 |
9 | python3 QML_DQN_FROZEN_LAKE.py
10 |
--------------------------------------------------------------------------------
/scripts/auto_hp_tuning_visuals.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import matplotlib.pyplot as plt
3 | from mpl_toolkits import mplot3d
4 | import numpy as np
5 | import os
6 | import matplotlib.ticker as mticker
7 |
8 |
9 | def log_tick_formatter(val, pos=None):
10 | return f"$10^{{{int(val)}}}$"
11 |
12 |
13 | base_results_df_path = os.path.join("..", "results", "auto_hp_tuning", "results.csv")
14 |
15 | res_df = pd.read_csv(base_results_df_path, index_col=0)
16 | res_df = res_df[['if_trained', 'win_ratio', 'episode_reward_mean', 'steps','config/gamma',
17 | 'config/lr', 'config/n_hidden_layers', 'config/random_scaling']]
18 |
19 |
20 | ax = plt.axes(projection='3d')
21 |
22 | # Data for three-dimensional scattered points
23 | zdata = res_df['win_ratio']
24 | xdata = np.log10(res_df['config/gamma'])
25 | ydata = np.log10(res_df['config/lr'])
26 |
27 | ax.plot_trisurf(xdata, ydata, zdata, cmap='viridis')
28 |
29 | ax.set_xlabel("gamma")
30 | ax.set_ylabel("learning rate")
31 | ax.set_zlabel("win ratio")
32 |
33 | ax.yaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
34 | ax.yaxis.set_major_locator(mticker.MaxNLocator(integer=True))
35 |
36 | ax.xaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
37 | ax.xaxis.set_major_locator(mticker.MaxNLocator(integer=True))
38 |
39 | plt.suptitle('Win ratio distribution in experiments:')
40 | plt.show()
41 |
42 |
43 | ax = plt.axes(projection='3d')
44 |
45 | ax.scatter3D(xdata, ydata, zdata, c=res_df['config/n_hidden_layers'], cmap='viridis')
46 |
47 | ax.set_xlabel("gamma")
48 | ax.set_ylabel("learning rate")
49 | ax.set_zlabel("win ratio")
50 |
51 | ax.yaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
52 | ax.yaxis.set_major_locator(mticker.MaxNLocator(integer=True))
53 |
54 | ax.xaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
55 | ax.xaxis.set_major_locator(mticker.MaxNLocator(integer=True))
56 |
57 | plt.suptitle('Win ratio across experiments (coloured by number of hidden layers)')
58 | plt.show()
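59 | 
60 | 
61 | # results/auto_hp_tuning also contains animated .gif versions of these plots. A minimal
62 | # sketch of how such a rotating animation could be produced (assumes matplotlib's
63 | # Pillow writer is available; not necessarily how the original gifs were generated):
64 | #
65 | # from matplotlib.animation import FuncAnimation, PillowWriter
66 | #
67 | # fig = plt.figure()
68 | # ax = plt.axes(projection='3d')
69 | # ax.scatter3D(xdata, ydata, zdata, c=res_df['config/n_hidden_layers'], cmap='viridis')
70 | #
71 | # def rotate(angle):
72 | #     ax.view_init(elev=25, azim=angle)
73 | #
74 | # anim = FuncAnimation(fig, rotate, frames=range(0, 360, 2))
75 | # anim.save('results_scatter_plot.gif', writer=PillowWriter(fps=20))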
--------------------------------------------------------------------------------
/scripts/src/DQL/quant_sim.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch import nn
3 | from torch import optim
4 | from torch.nn import functional as F
5 | from torch import linalg as LA
6 |
7 | import numpy as np
8 | from tqdm import tqdm
9 |
10 | from src.entropies import entanglement_entropy, classical_entropy
11 | from src.visualizations import *
12 | from src.utils import one_hot, uniform_linear_layer
13 |
14 |
15 | class Agent(nn.Module):
16 | def __init__(self, observation_space_size, n_hidden_layers, activation_function):
17 | super(Agent, self).__init__()
18 | self.observation_space_size = observation_space_size
19 | self.hidden_size = 2*self.observation_space_size
20 |
21 | self.l1 = nn.Linear(in_features=2*self.observation_space_size, out_features=self.hidden_size)
22 |         self.hidden_layers = nn.ModuleList([
23 |             nn.Linear(in_features=self.hidden_size, out_features=self.hidden_size)
24 |             for i in range(n_hidden_layers)
25 |         ])  # ModuleList registers the hidden layers so the optimizer sees their parameters
26 | self.l2 = nn.Linear(in_features=self.hidden_size, out_features=32)
27 | self.activation = None
28 | if activation_function=='lrelu':
29 | self.activation = F.leaky_relu
30 | if activation_function=='sigmoid':
31 | self.activation = F.sigmoid
32 | if activation_function=='tanh':
33 | self.activation = F.tanh
34 |
35 | uniform_linear_layer(self.l1)
36 | for l in self.hidden_layers:
37 | uniform_linear_layer(l)
38 |
39 | uniform_linear_layer(self.l2)
40 |
41 |         print('Set up the neural network with:')
42 |         print(f'\tInput size: \t{2*self.observation_space_size}')
43 |         for i in range(n_hidden_layers):
44 |             print(f'\tHidden layer {i+1} size: \t{self.hidden_size}')
45 |         print(f'\tOutput size: \t{32}')
46 |
47 | def forward(self, state):
48 | obs_emb = one_hot([int(2*state)], 2*self.observation_space_size)
49 | # first layer:
50 | out1 = self.activation(self.l1(obs_emb))
51 |
52 | # hidden layers:
53 | for l in self.hidden_layers:
54 | out1 = self.activation(l(out1))
55 |
56 | # output layers:
57 | out2 = self.activation(self.l2(out1))
58 |
59 | return out2.view((-1))
60 |
61 |
62 | class Trainer:
63 | def __init__(self, n_hidden_layers, lake, learning_rate, non_random_chance, random_scaling, gamma, activation_function):
64 | self.holes_indexes = np.array([5,7,11,12])
65 |
66 | self.lake = lake
67 | self.agent = Agent(self.lake.observation_space.n, n_hidden_layers, activation_function)
68 | self.optimizer = optim.Adam(params=self.agent.parameters(), lr=learning_rate)
69 |
70 | self.epsilon = non_random_chance
71 | self.epsilon_growth_rate = random_scaling
72 | self.gamma = gamma
73 |
74 | self.epsilon_list = []
75 | self.success = []
76 | self.jList = []
77 | self.reward_list = []
78 |
79 | self.compute_entropy = True
80 | self.entropies = []
81 | self.cl_entropies = []
82 | self.entropies_episodes = [0]
83 |
84 | self.print = False
85 |
86 |
87 | def train(self, epoch, max_steps, window, target_win_ratio):
88 | # entropies_episodes = [0] * (epoch+1)
89 | for i in (pbar := tqdm(range(epoch))):
90 | pbar.set_description(f'Success rate: {sum(self.success[-window:])/window:.2%} | Random chance: {self.epsilon:.2%}')
91 |
92 |             s = self.lake.reset()  # board state (0-15); reset returns 0
93 | j = 0
94 | self.entropies_episodes.append(0)
95 | while j < max_steps:
96 | j += 1
97 | # perform chosen action
98 | a = self.choose_action(s)
99 | s1, r, d, _ = self.lake.step(int(a))
100 | if d == True and r == 0: r = -1
101 |                 elif d == True: r = 1
102 |                 elif r == 0: r = -0.01
103 |
104 | # if self.print==False:
105 | # print(self.agent(s)[a])
106 | # self.print=True
107 |
108 | # calculate target and loss
109 | target_q = r + self.gamma * torch.max(self.calc_probabilities(s1).detach())
110 |
111 | loss = F.smooth_l1_loss(self.calc_probability(s, a), target_q)
112 | # update model to optimize Q
113 | self.optimizer.zero_grad()
114 | loss.backward()
115 | self.optimizer.step()
116 |
117 | # update state
118 | s = s1
119 | if(self.compute_entropy):
120 | self.entropies.append(entanglement_entropy(self.calc_probabilities(s)))
121 | self.cl_entropies.append(classical_entropy(self.calc_probabilities(s)))
122 | self.entropies_episodes[i] += 1
123 |
124 | if d == True: break
125 |
126 | # append results onto report lists
127 | if d == True and r > 0:
128 | self.success.append(1)
129 | else:
130 | self.success.append(0)
131 |
132 | self.reward_list.append(r)
133 | self.jList.append(j)
134 |
135 | if self.epsilon < 1.:
136 | self.epsilon *= self.epsilon_growth_rate
137 | self.epsilon_list.append(self.epsilon)
138 |
139 | if i%10==0 and i>100:
140 | if sum(self.success[-window:])/window>target_win_ratio:
141 |                     print("Network trained before the epoch limit, at epoch {i}".format(i=i))
142 | break
143 |
144 | #print("last 100 epoches success rate: " + str(sum(self.success[-100:])/100) + "%")
145 |
146 | def choose_action(self, s):
147 | self.calc_probabilities(s)
148 | if np.random.rand(1) > self.epsilon :
149 |             action = torch.argmax(self.calc_probabilities(s))  # pick the action with the highest probability
150 | else:
151 | action = torch.tensor(np.random.randint(0, 4))
152 | return action
153 |
154 |     def calc_probability(self, s, a):  # occupation probability of a single qubit (0-3) for a given board state (0-15)
155 | raw_wavefunction = torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
156 | probabilities = (raw_wavefunction.abs()**2)
157 |         probabilities = probabilities/probabilities.sum()  # normalisation
158 | prob_indexes = [
159 | [0,1,2,3,4,5,6,7],
160 | [0,1,2,3,8,9,10,11],
161 | [0,1,4,5,8,9,12,13],
162 | [0,2,4,6,8,10,12,14]
163 | ]
164 | return probabilities[prob_indexes[a]].sum()
165 |
166 |     def calc_probabilities(self, s):  # occupation probabilities of all four qubits for a given board state (0-15), as a tensor of shape (4,)
167 | raw_wavefunction = torch.complex(self.agent(s)[0::2], self.agent(s)[1::2])
168 | probabilities = (raw_wavefunction.abs()**2)
169 |         probabilities = probabilities/probabilities.sum()  # normalisation
170 | probs_of_qubits = torch.tensor([
171 | probabilities[[0,1,2,3,4,5,6,7]].sum(),
172 | probabilities[[0,1,2,3,8,9,10,11]].sum(),
173 | probabilities[[0,1,4,5,8,9,12,13]].sum(),
174 | probabilities[[0,2,4,6,8,10,12,14]].sum()
175 | ])
176 | return probs_of_qubits
177 |
178 |
179 | def Q(self):
180 | Q = []
181 | for x in range(self.lake.observation_space.n):
182 | Qstate = self.agent(x).detach()
183 | Qstate /= LA.norm(Qstate)
184 | Q.append(Qstate)
185 | Q_out = torch.Tensor(self.lake.observation_space.n, self.lake.action_space.n)
186 | torch.cat(Q, out=Q_out)
187 | return Q_out
188 |
189 | def Qstate(self, state):
190 | Qstate = self.agent(state).detach()
191 | Qstate /= LA.norm(Qstate)
192 | return Qstate
193 |
194 | def Qstrategy(self):
195 | return [torch.argmax(self.calc_probabilities(state)).item() for state in range(self.lake.observation_space.n)]
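196 | 
197 | 
198 | if __name__ == "__main__":
199 |     # Illustrative decoding sketch (an assumption for demonstration, not the tuned
200 |     # configuration from results/): consecutive (even, odd) pairs of the 32 network
201 |     # outputs are read as real and imaginary parts of 16 amplitudes of a simulated
202 |     # 4-qubit state, mirroring calc_probabilities above. Run from the scripts/
203 |     # directory so the src imports resolve, e.g.  python -m src.DQL.quant_sim
204 |     agent = Agent(observation_space_size=16, n_hidden_layers=1, activation_function='sigmoid')
205 |     out = agent(0)                                    # 32 raw outputs for board state 0
206 |     amplitudes = torch.complex(out[0::2], out[1::2])  # 16 complex amplitudes
207 |     probs = amplitudes.abs() ** 2
208 |     probs = probs / probs.sum()                       # normalised measurement distribution
209 |     print(probs)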
--------------------------------------------------------------------------------
/scripts/src/entropies.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.linalg import logm
3 | from torch import sum as t_sum
4 |
5 |
6 | def entanglement_entropy(state):
7 | #state = np.array(state, ndmin=2)
8 | state = np.array(state.detach().numpy(), ndmin=2)
9 |
10 | ket = state.T
11 | bra = state.conj()
12 | rho_final = np.outer(ket,bra)
13 | num_wires = int(np.log2(state.size))
14 | S = []
15 | for d in range(1, num_wires):
16 | Ia = np.identity(2**d)
17 | Ib = np.identity(2**(num_wires-d))
18 | Tr_a = np.empty([2**d, 2**(num_wires-d), 2**(num_wires-d)], dtype=complex)
19 | for i in range(2**d):
20 | ai = np.array(Ia[i], ndmin=2).T
21 | Tr_a[i] = np.kron(ai.conj().T, Ib).dot(rho_final).dot(np.kron(ai,Ib))
22 | rho_b = Tr_a.sum(axis=0)
23 |         rho_b_l2 = logm(rho_b + 1e-6*np.identity(rho_b.shape[0]))/np.log(2.0)  # regularise so logm stays defined for (near-)singular rho_b
24 | S_rho_b = - rho_b.dot(rho_b_l2).trace()
25 | S.append(S_rho_b)
26 | return np.array(S).mean()
27 |
28 | def classical_entropy(state):
29 | state_temp = state.detach()
30 | ket_2 = np.abs(state_temp)**2
31 |     return - t_sum(ket_2 * np.log2(ket_2 + 1e-12))  # small offset guards against log2(0) for zero-probability amplitudes
32 |
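33 | 
34 | if __name__ == "__main__":
35 |     # Quick sanity check (illustrative, not used by the training scripts): a two-qubit
36 |     # Bell state carries one bit of entanglement entropy and one bit of classical
37 |     # (measurement) entropy.
38 |     import torch
39 |     bell = torch.tensor([1.0, 0.0, 0.0, 1.0]) / 2 ** 0.5
40 |     print(entanglement_entropy(bell))  # ~ (1+0j), up to numerical noise
41 |     print(classical_entropy(bell))     # ~ 1.0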
--------------------------------------------------------------------------------
/scripts/src/utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Variable
3 | import numpy as np
4 |
5 | def one_hot(ids, nb_digits):
6 | """
7 | ids: (list, ndarray) shape:[batch_size]
8 | """
9 | if not isinstance(ids, (list, np.ndarray)):
10 | raise ValueError("ids must be 1-D list or array")
11 | batch_size = len(ids)
12 | ids = torch.LongTensor(ids).view(batch_size, 1)
13 | out_tensor = Variable(torch.FloatTensor(batch_size, nb_digits))
14 | out_tensor.data.zero_()
15 | out_tensor.data.scatter_(dim=1, index=ids, value=1.)
16 | return out_tensor
17 |
18 | def uniform_linear_layer(linear_layer):
19 | linear_layer.weight.data.uniform_()
20 | linear_layer.bias.data.fill_(-0.02)
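21 | 
22 | 
23 | if __name__ == "__main__":
24 |     # Illustrative example (not used by the training scripts): one-hot encode index 3 of 8.
25 |     print(one_hot([3], 8))  # tensor([[0., 0., 0., 1., 0., 0., 0., 0.]])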
--------------------------------------------------------------------------------
/scripts/src/visualizations.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | import numpy as np
3 | import pandas as pd
4 |
5 | def plot_success_steps_history(steps_list, success_list):
6 | plt.plot(steps_list, label="Steps in epoch")
7 | plt.plot(success_list, label="If success")
8 | plt.legend()
9 | plt.title("Steps from epochs with success indicator")
10 | plt.show()
11 |
12 |
13 | def plot_strategy(strategy, holes_indexes, save_path, custom_angles=None):
14 | #just for the plot purposes
15 | strategy_angles = ((2-strategy)%4)*90
16 | if custom_angles is not None:
17 | strategy_angles = custom_angles
18 | fig, axs = plt.subplots(1, 1, figsize=(3.5, 3.5), sharex=True, sharey=True, tight_layout=True)
19 | axs.set_aspect(1)
20 | x,y = np.meshgrid(np.linspace(0,3,4), np.linspace(3,0,4))
21 | axs.quiver(x, y, np.ones((x.shape))*1.5,np.ones((x.shape))*1.5,angles=np.flip(strategy_angles, axis=0), pivot='middle', units='xy')
22 | axs.scatter( [0], [0], c="cornflowerblue", s=150, alpha=0.6, label="start")
23 | axs.scatter( holes_indexes%4, holes_indexes//4, c="firebrick", s=150, alpha=0.6, label="hole")
24 | axs.scatter( [3], [3], c="mediumseagreen", s=150, alpha=0.6, label="goal")
25 | major_ticks = np.arange(0, 4, 1)
26 | axs.set_xticks(major_ticks)
27 | axs.set_yticks(major_ticks)
28 | axs.set_title("Move strategy from Qtable")
29 | axs.grid(which="major", alpha=0.4)
30 | axs.legend()
31 | plt.savefig(save_path, dpi=900)
32 | plt.show()
33 |
34 |
35 | def plot_entropies(entropies, cl_entropies, save_path):
36 | fig, ax = plt.subplots()
37 |     ax.plot(entropies, label="entanglement entropy")
38 |     ax.plot(cl_entropies, color='red', label="classical entropy", alpha=0.4)
39 | ax.legend()
40 | plt.savefig(save_path, dpi=900)
41 | plt.show()
42 |
43 |
44 | def plot_rolling_window_history(steps_list, reward_list, success_list, epsilon_list, target_win_ratio, min_steps_num, save_path, window=40):
45 | plt.figure(figsize=[9,16])
46 | plt.subplot(411)
47 | plt.plot(pd.Series(steps_list).rolling(window).mean())
48 | plt.title('Step Moving Average ({}-episode window)'.format(window))
49 | plt.ylabel('Moves')
50 | plt.xlabel('Episode')
51 | plt.axhline(y=min_steps_num, color='g', linestyle='-', label=f'Optimal number of steps: {min_steps_num}')
52 | plt.ylim(bottom=0)
53 | plt.legend()
54 | plt.grid()
55 |
56 | plt.subplot(412)
57 | plt.plot(pd.Series(reward_list).rolling(window).mean())
58 | plt.title('Reward Moving Average ({}-episode window)'.format(window))
59 | plt.ylabel('Reward')
60 | plt.xlabel('Episode')
61 | plt.ylim(-1.1, 1.1)
62 | plt.grid()
63 |
64 | plt.subplot(413)
65 | plt.plot(pd.Series(success_list).rolling(window).mean())
66 | plt.title('Wins Moving Average ({}-episode window)'.format(window))
67 | plt.ylabel('If won')
68 | plt.axhline(y=target_win_ratio, color='r', linestyle='-', label=f'Early stop condition: {target_win_ratio*100:.2f}%')
69 | plt.legend()
70 | plt.xlabel('Episode')
71 | plt.ylim(-0.1, 1.1)
72 | plt.grid()
73 |
74 | plt.subplot(414)
75 | plt.plot(np.array(epsilon_list))
76 | plt.title('Random Action Parameter')
77 | plt.ylabel('Chance Random Action')
78 | plt.xlabel('Episode')
79 | plt.ylim(-0.1, 1.1)
80 | plt.grid()
81 |
82 | plt.tight_layout(pad=2)
83 | plt.savefig(save_path, dpi=450)
84 | plt.show()
85 |
86 |
87 | def plot_history(steps_list, reward_list, success_list, epsilon_list, target_win_ratio, min_steps_num, save_path):
88 | plt.figure(figsize=[9,16])
89 | plt.subplot(411)
90 | plt.plot(pd.Series(steps_list))
91 | plt.title('Steps per training episode')
92 | plt.ylabel('Moves')
93 | plt.xlabel('Episode')
94 | plt.axhline(y=min_steps_num, color='g', linestyle='-', label=f'Optimal number of steps: {min_steps_num}')
95 | plt.ylim(bottom=0)
96 | plt.legend()
97 | plt.grid()
98 |
99 | plt.subplot(412)
100 | plt.plot(pd.Series(reward_list))
101 | plt.title('Reward per training episode')
102 | plt.ylabel('Reward')
103 | plt.xlabel('Episode')
104 | plt.ylim(-1.1, 1.1)
105 | plt.grid()
106 |
107 | plt.subplot(413)
108 | plt.plot(pd.Series(success_list))
109 | plt.title('Wins per training episode')
110 | plt.ylabel('If won')
111 | plt.axhline(y=target_win_ratio, color='r', linestyle='-', label=f'Early stop condition: {target_win_ratio*100:.2f}%')
112 | plt.legend()
113 | plt.xlabel('Episode')
114 | plt.ylim(-0.1, 1.1)
115 | plt.grid()
116 |
117 | plt.subplot(414)
118 | plt.plot(np.array(epsilon_list))
119 | plt.title('Random Action Parameter')
120 | plt.ylabel('Chance Random Action')
121 | plt.xlabel('Episode')
122 | plt.ylim(-0.1, 1.1)
123 | plt.grid()
124 |
125 | plt.tight_layout(pad=2)
126 | plt.savefig(save_path, dpi=450)
127 | plt.show()
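128 | 
129 | 
130 | if __name__ == "__main__":
131 |     # Illustrative call with dummy data (an assumption, not a trained strategy):
132 |     # plot a random 4x4 move strategy with the standard FrozenLake hole positions.
133 |     dummy_strategy = np.random.randint(0, 4, size=(4, 4))
134 |     plot_strategy(dummy_strategy, holes_indexes=np.array([5, 7, 11, 12]), save_path="dummy_strategy.jpg")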
--------------------------------------------------------------------------------
/tutorials/training_example.py:
--------------------------------------------------------------------------------
1 | import pennylane as qml
2 | import numpy as np
3 | import torch
4 | from torch.autograd import Variable
5 |
6 | np.random.seed(42)
7 |
8 | # we generate a three-dimensional random vector by sampling
9 | # each entry from a standard normal distribution
10 | v = np.random.normal(0, 1, 3)
11 |
12 | # purity of the target state
13 | purity = 0.66
14 |
15 | # create a random Bloch vector with the specified purity
16 | bloch_v = np.sqrt(2 * purity - 1) * v / np.sqrt(np.sum(v ** 2))
17 | # bloch_v = np.array([0.0, 0.0, 1.0])
18 |
19 | # array of Pauli matrices (will be useful later)
20 | Paulis = np.zeros((3, 2, 2), dtype=complex)
21 | Paulis[0] = [[0, 1], [1, 0]]
22 | Paulis[1] = [[0, -1j], [1j, 0]]
23 | Paulis[2] = [[1, 0], [0, -1]]
24 |
25 | # number of qubits in the circuit
26 | nr_qubits = 3
27 | # number of layers in the circuit
28 | nr_layers = 2
29 |
30 | # randomly initialize parameters from a normal distribution. Below
31 | # - first parameter (equal to 0) is the “centre” of the distribution,
32 | # - second parameter (equal to pi) is the standard deviation of the distribution
33 | params = np.random.normal(0, np.pi, (nr_qubits, nr_layers, 3))
34 | # params = np.zeros((nr_qubits, nr_layers, 3))
35 | params = Variable(torch.tensor(params), requires_grad=True)
36 |
37 | # a layer of the circuit ansatz
38 | def layer(params, j):
39 | for i in range(nr_qubits):
40 | qml.RX(params[i, j, 0], wires=i)
41 | qml.RY(params[i, j, 1], wires=i)
42 | qml.RZ(params[i, j, 2], wires=i)
43 |
44 | qml.CNOT(wires=[0, 1])
45 | qml.CNOT(wires=[0, 2])
46 | qml.CNOT(wires=[1, 2])
47 |
48 | dev = qml.device("default.qubit", wires=3)
49 |
50 | @qml.qnode(dev, interface="torch")
51 | def circuit(params, A=None):
52 |
53 | # repeatedly apply each layer in the circuit
54 | for j in range(nr_layers):
55 | layer(params, j)
56 |
57 | # returns the expectation of the input matrix A on the first qubit
58 | return qml.expval(qml.Hermitian(A, wires=0))
59 |
60 | # cost function
61 | def cost_fn(params):
62 | cost = 0
63 | for k in range(3):
64 | cost += torch.abs(circuit(params, A=Paulis[k]) - bloch_v[k])
65 |
66 | return cost
67 |
68 | # set up the optimizer
69 | opt = torch.optim.Adam([params], lr=0.1)
70 |
71 | # number of steps in the optimization routine
72 | steps = 200
73 |
74 | # the final stage of optimization isn't always the best, so we keep track of
75 | # the best parameters along the way
76 | best_cost = cost_fn(params)
77 | best_params = np.zeros((nr_qubits, nr_layers, 3))
78 |
79 | print("Cost after 0 steps is {:.4f}".format(cost_fn(params)))
80 |
81 | # optimization begins
82 | for n in range(steps):
83 | opt.zero_grad()
84 | loss = cost_fn(params)
85 | loss.backward()
86 | opt.step()
87 |
88 | # keeps track of best parameters
89 |     if loss < best_cost:
90 |         best_cost, best_params = loss.item(), params.detach().clone()
91 |
92 | # Keep track of progress every 10 steps
93 | if n % 10 == 9 or n == steps - 1:
94 | print("Cost after {} steps is {:.4f}".format(n + 1, loss))
95 |
96 | # calculate the Bloch vector of the output state
97 | output_bloch_v = np.zeros(3)
98 | for l in range(3):
99 | output_bloch_v[l] = circuit(best_params, A=Paulis[l])
100 |
101 | print("Target Bloch vector = ", bloch_v)
102 | print("Output Bloch vector = ", output_bloch_v)
103 | print(circuit.draw())
104 |
105 | # This code would give such an output:
106 | # Cost after 0 steps is 1.0179
107 | # Cost after 10 steps is 0.1467
108 | # Cost after 20 steps is 0.0768
109 | # ...
110 | # Cost after 190 steps is 0.0502
111 | # Cost after 200 steps is 0.0573
112 | # Target Bloch vector = [ 0.33941241 -0.09447812 0.44257553]
113 | # Output Bloch vector = [ 0.3070773 -0.07421859 0.47392787]
114 | # Found circuit:
115 | # 0: ──RX(4.974)───RY(-0.739)──RZ(-0.358)──╭C──╭C───RX(4.6)──RY(2.739)───RZ(-1.297)──────────────╭C──╭C──────┤ ⟨H0⟩
116 | # 1: ──RX(1.927)───RY(-1.859)──RZ(-1.008)──╰X──│───╭C────────RX(0.375)───RY(-6.204)──RZ(-5.583)──╰X──│───╭C──┤
117 | # 2: ──RX(-2.027)──RY(-3.447)──RZ(1.425)───────╰X──╰X────────RX(-2.378)──RY(-4.139)──RZ(4.284)───────╰X──╰X──┤
118 | # H0 =
119 | # [[ 1.+0.j 0.+0.j]
120 | # [ 0.+0.j -1.+0.j]]
121 |
--------------------------------------------------------------------------------
/tutorials/tutorial_state_preparation.py:
--------------------------------------------------------------------------------
1 | r"""
2 | .. _state_preparation:
3 |
4 | Training a quantum circuit with PyTorch
5 | =======================================
6 |
7 | .. meta::
8 | :property="og:description": Build and optimize a circuit to prepare
9 | arbitrary single-qubit states, including mixed states, with PyTorch
10 | and PennyLane.
11 | :property="og:image": https://pennylane.ai/qml/_images/NOON.png
12 |
13 | .. related::
14 |
15 | tutorial_qubit_rotation Basic tutorial: qubit rotation
16 | pytorch_noise PyTorch and noisy devices
17 | tutorial_isingmodel_PyTorch 3-qubit Ising model in PyTorch
18 |
19 | In this notebook, we build and optimize a circuit to prepare arbitrary
20 | single-qubit states, including mixed states. Along the way, we also show
21 | how to:
22 |
23 | 1. Construct compact expressions for circuits composed of many layers.
24 | 2. Succinctly evaluate expectation values of many observables.
25 | 3. Estimate expectation values from repeated measurements, as in real
26 | hardware.
27 |
28 | """
29 |
30 | ##############################################################################
31 | # The most general state of a qubit is represented in terms of a positive
32 | # semi-definite density matrix :math:`\rho` with unit trace. The density
33 | # matrix can be uniquely described in terms of its three-dimensional
34 | # *Bloch vector* :math:`\vec{a}=(a_x, a_y, a_z)` as:
35 | #
36 | # .. math:: \rho=\frac{1}{2}(\mathbb{1}+a_x\sigma_x+a_y\sigma_y+a_z\sigma_z),
37 | #
38 | # where :math:`\sigma_x, \sigma_y, \sigma_z` are the Pauli matrices. Any
39 | # Bloch vector corresponds to a valid density matrix as long as
40 | # :math:`\|\vec{a}\|\leq 1`.
41 | #
42 | # The *purity* of a state is defined as :math:`p=\text{Tr}(\rho^2)`, which
43 | # for a qubit is bounded as :math:`1/2\leq p\leq 1`. The state is pure if
44 | # :math:`p=1` and maximally mixed if :math:`p=1/2`. In this example, we
45 | # select the target state by choosing a random Bloch vector and
46 | # renormalizing it to have a specified purity.
47 | #
48 | # To start, we import PennyLane, NumPy, and PyTorch for the optimization:
49 |
50 | import pennylane as qml
51 | import numpy as np
52 | import torch
53 | from torch._C import dtype
54 | from torch.autograd import Variable
55 |
56 | from scipy.linalg import logm
57 | import matplotlib.pyplot as plt
58 |
59 | np.random.seed(42)
60 |
61 | # we generate a three-dimensional random vector by sampling
62 | # each entry from a standard normal distribution
63 | v = np.random.normal(0, 1, 3)
64 |
65 | # purity of the target state
66 | purity = 0.66
67 |
68 | # create a random Bloch vector with the specified purity
69 | bloch_v = np.sqrt(2 * purity - 1) * v / np.sqrt(np.sum(v ** 2))
70 |
71 | # array of Pauli matrices (will be useful later)
72 | Paulis = np.zeros((3, 2, 2), dtype=complex)
73 | Paulis[0] = [[0, 1], [1, 0]]
74 | Paulis[1] = [[0, -1j], [1j, 0]]
75 | Paulis[2] = [[1, 0], [0, -1]]
76 |
77 | ##############################################################################
78 | # Unitary operations map pure states to pure states. So how can we prepare
79 | # mixed states using unitary circuits? The trick is to introduce
80 | # additional qubits and perform a unitary transformation on this larger
81 | # system. By "tracing out" the ancilla qubits, we can prepare mixed states
82 | # in the target register. In this example, we introduce two additional
83 | # qubits, which suffices to prepare arbitrary states.
84 | #
85 | # The ansatz circuit is composed of repeated layers, each of which
86 | # consists of single-qubit rotations along the :math:`x, y,` and :math:`z`
87 | # axes, followed by three CNOT gates entangling all qubits. Initial gate
88 | # parameters are chosen at random from a normal distribution. Importantly,
89 | # when declaring the layer function, we introduce an input parameter
90 | # :math:`j`, which allows us to later call each layer individually.
91 |
92 | # number of qubits in the circuit
93 | nr_qubits = 3
94 | # number of layers in the circuit
95 | nr_layers = 2
96 |
97 | # randomly initialize parameters from a normal distribution
98 | params = np.random.normal(0, np.pi, (nr_qubits, nr_layers, 3))
99 | params = Variable(torch.tensor(params), requires_grad=True)
100 |
101 | # a layer of the circuit ansatz
102 | def layer(params, j):
103 | for i in range(nr_qubits):
104 | qml.RX(params[i, j, 0], wires=i)
105 | qml.RY(params[i, j, 1], wires=i)
106 | qml.RZ(params[i, j, 2], wires=i)
107 |
108 | qml.CNOT(wires=[0, 1])
109 | qml.CNOT(wires=[0, 2])
110 | qml.CNOT(wires=[1, 2])
111 |
112 |
113 | ##############################################################################
114 | # Here, we use the ``default.qubit`` device to perform the optimization, but this can be changed to
115 | # any other supported device.
116 |
117 | dev = qml.device("default.qubit", wires=3)
118 |
119 |
120 | # entanglement entropy
121 | def entanglement_entropy(state):
122 | state = np.array(state, ndmin=2)
123 | ket = state.T
124 | bra = state.conj()
125 | rho_final = np.outer(ket,bra)
126 | num_wires = int(np.log2(state.size))
127 | S = []
128 | for d in range(1, num_wires):
129 | Ia = np.identity(2**d)
130 | Ib = np.identity(2**(num_wires-d))
131 | Tr_a = np.empty([2**d, 2**(num_wires-d), 2**(num_wires-d)], dtype=complex)
132 | for i in range(2**d):
133 | ai = np.array(Ia[i], ndmin=2).T
134 | Tr_a[i] = np.kron(ai.conj().T, Ib).dot(rho_final).dot(np.kron(ai,Ib))
135 | rho_b = Tr_a.sum(axis=0)
136 | rho_b_l2 = logm(rho_b)/np.log(2.0)
137 | S_rho_b = - rho_b.dot(rho_b_l2).trace()
138 | S.append(S_rho_b)
139 | return np.array(S).mean()
140 |
141 | ##############################################################################
142 | # When defining the QNode, we introduce as input a Hermitian operator
143 | # :math:`A` that specifies the expectation value being evaluated. This
144 | # choice later allows us to easily evaluate several expectation values
145 | # without having to define a new QNode each time.
146 | #
147 | # Since we will be optimizing using PyTorch, we configure the QNode
148 | # to use the PyTorch interface:
149 |
150 |
151 | @qml.qnode(dev, interface="torch")
152 | def circuit(params, A=None):
153 |
154 | # repeatedly apply each layer in the circuit
155 | for j in range(nr_layers):
156 | layer(params, j)
157 |
158 | # returns the expectation of the input matrix A on the first qubit
159 | return qml.expval(qml.Hermitian(A, wires=0))
160 |
161 |
162 | ##############################################################################
163 | # Our goal is to prepare a state with the same Bloch vector as the target
164 | # state. Therefore, we define a simple cost function
165 | #
166 | # .. math:: C = \sum_{i=1}^3 \left|a_i-a'_i\right|,
167 | #
168 | # where :math:`\vec{a}=(a_1, a_2, a_3)` is the target vector and
169 | # :math:`\vec{a}'=(a'_1, a'_2, a'_3)` is the vector of the state prepared
170 | # by the circuit. Optimization is carried out using the Adam optimizer.
171 | # Finally, we compare the Bloch vectors of the target and output state.
172 |
173 | # cost function
174 | def cost_fn(params):
175 | cost = 0
176 | for k in range(3):
177 | cost += torch.abs(circuit(params, A=Paulis[k]) - bloch_v[k])
178 |
179 | return cost
180 |
181 |
182 |
183 | # set up the optimizer
184 | opt = torch.optim.Adam([params], lr=0.1)
185 |
186 | # number of steps in the optimization routine
187 | steps = 200
188 |
189 | # the final stage of optimization isn't always the best, so we keep track of
190 | # the best parameters along the way
191 | best_cost = cost_fn(params)
192 | best_params = np.zeros((nr_qubits, nr_layers, 3))
193 |
194 | print("Cost after 0 steps is {:.4f}".format(cost_fn(params)))
195 |
196 | entropy_list_cost = []
197 | entropy_list_backward = []
198 |
199 | # optimization begins
200 | for n in range(steps):
201 | opt.zero_grad()
202 | loss = cost_fn(params)
203 | entropy_list_cost.append(entanglement_entropy(dev.state))
204 | loss.backward()
205 | entropy_list_backward.append(entanglement_entropy(dev.state))
206 | opt.step()
207 |
208 | # keeps track of best parameters
209 |     if loss < best_cost:
210 |         best_cost, best_params = loss.item(), params.detach().clone()
211 |
212 | # Keep track of progress every 10 steps
213 | if n % 10 == 9 or n == steps - 1:
214 | print("Cost after {} steps is {:.4f}".format(n + 1, loss))
215 |
216 | # calculate the Bloch vector of the output state
217 | output_bloch_v = np.zeros(3)
218 | for l in range(3):
219 | output_bloch_v[l] = circuit(best_params, A=Paulis[l])
220 |
221 | # print results
222 | print("Target Bloch vector = ", bloch_v)
223 | print("Output Bloch vector = ", output_bloch_v)
224 |
225 | entropy_list_cost = np.array(entropy_list_cost)
226 | entropy_list_backward = np.array(entropy_list_backward)
227 |
228 | plt.plot(entropy_list_cost.real)
229 | plt.plot(entropy_list_backward.real)
230 | plt.show()
231 | # plt.plot(entropy_list_cost.imag)
232 | # plt.plot(entropy_list_backward.imag)
233 | # plt.show()
234 | print(entropy_list_cost)
235 | print(entropy_list_backward)
236 |
--------------------------------------------------------------------------------