├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Hybrid_jobs_for_community_detection.ipynb ├── LICENSE ├── Notebook_QBSolv_community_detection.ipynb ├── README.md ├── image └── Karate_DwaveQBSolv_4comm_modu0.4198.png ├── plots ├── .DS_Store └── whitepaper.mplstyle ├── requirements.txt └── src ├── .DS_Store ├── graph_community.py ├── hybrid_job_community_detection.py ├── qbsolv_community.py ├── qubo_community.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # PyCharm stuff 2 | .idea 3 | 4 | # Code output/results 5 | output 6 | data 7 | .ipynb_checkpoints 8 | data/synthetic 9 | src/__pycache__ 10 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 
38 |
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 |
42 |
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. Since our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45 |
46 |
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 |
52 |
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 |
56 |
57 | ## Licensing
58 |
59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this
4 | software and associated documentation files (the "Software"), to deal in the Software
5 | without restriction, including without limitation the rights to use, copy, modify,
6 | merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
7 | permit persons to whom the Software is furnished to do so.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
10 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
11 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
12 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
13 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
14 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Community Detection in Complex Networks using Hybrid Quantum Annealing on Amazon Braket
2 |
3 | This code package solves network community detection problems using hybrid quantum annealing on Amazon Braket. For more background, read our AWS quantum computing blog post series on community detection: [Part 1](https://aws.amazon.com/blogs/quantum-computing/community-detection-in-complex-networks-using-hybrid-quantum-annealing-on-amazon-braket-part-i/) and [Part 2](https://aws.amazon.com/blogs/quantum-computing/community-detection-using-hybrid-quantum-annealing-on-amazon-braket-part-2/).
4 |
5 | ### Quick Start
6 |
7 | The tutorial notebook [`Notebook_QBSolv_community_detection`](Notebook_QBSolv_community_detection.ipynb) provides a step-by-step guide on how to formulate community detection as a Quadratic Unconstrained Binary Optimization (QUBO) problem, similar to the work by [Negre et al.](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227538). We then demonstrate how to solve community detection problems on Amazon Braket with the open-source QBSolv library, whose quantum-classical hybrid solvers combine classical compute resources with D-Wave quantum annealers.
8 |
9 | **Table of Contents for the Tutorial Notebook**
10 | * Modularity-based community detection
11 | * Community detection as a QUBO problem
12 | * Datasets
13 | * List of key functions for community detection
14 | * Set up environment
15 | * Download graph data
16 | * Prepare graph data for community detection
17 | * Apply QBSolv for community detection
18 | * Detect communities for synthetic and real-world networks
19 |
20 | The Amazon Braket Hybrid Jobs notebook [`Hybrid_jobs_for_community_detection`](Hybrid_jobs_for_community_detection.ipynb) provides a step-by-step guide on how to use Amazon Braket Hybrid Jobs to manage and execute quantum annealing-based community detection tasks at scale. A minimal usage sketch of the supporting code in `src/` is included at the end of this README.
21 |
22 | **Table of Contents for the Amazon Braket Hybrid Jobs Notebook**
23 | * Set up environment
24 | * Prepare input data
25 | * Create an algorithm script
26 | * Specify hyperparameters
27 | * Submit a hybrid job
28 | * View results
29 | * Run hyperparameter tuning
30 |
31 |
32 | ### Datasets
33 |
34 | The graph datasets used for demonstration here can be downloaded from http://networkrepository.com/ under a Creative Commons Attribution-ShareAlike License. Dataset reference: Rossi, Ryan A. and Ahmed, Nesreen K. (2015). The Network Data Repository with Interactive Graph Analytics and Visualization. AAAI. https://networkrepository.com.
35 |
36 |
37 | ## Security
38 |
39 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
40 |
41 | ## License
42 |
43 | This library is licensed under the MIT-0 License. See the LICENSE file.
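### Example usage

The snippet below is a minimal sketch of how the helper classes in `src/` fit together, assuming the repository root is the working directory, the packages in `requirements.txt` are installed, and a cleaned edge-list file is already available locally (see `src/utils.py` for download and cleanup helpers). The graph name, file path, and community count are illustrative placeholders; the class and function names come from `src/graph_community.py` and `src/qbsolv_community.py`. The tutorial notebook remains the authoritative walkthrough.

```python
import matplotlib.pyplot as plt

from src.graph_community import CommunityGraphFactory, draw_graph_community
from src.qbsolv_community import CommunityDetectionSolver

# {graph_name: [edge_list_path, delimiter]} -- the path below is a placeholder,
# adapt it to the edge list you actually downloaded (None = whitespace delimiter)
graph_files = {"karate": ["./graph_data/karate/soc-karate.mtx", None]}

factory = CommunityGraphFactory(seed=1)
nx_G = factory.load_graph("karate", graph_files)

solver = CommunityDetectionSolver(
    graph_name="karate",
    graph=nx_G,
    num_comm=4,           # number of communities to search for (placeholder value)
    solver_limit=100,     # maximum number of variables per sub-QUBO
    num_repeats=1,        # QBSolv repeats looking for a better solution
    num_reads=1000,       # anneals per sub-QUBO
    seed=1,
    alpha=10,             # penalty enforcing one community per node
    mode="classical",     # "hybrid" additionally needs s3_folder and device_arn
)
results_graph, track = solver.run_community_detection(save=False)

# Color the nodes by the detected community assignment
communities = track["communities"][0][1]["comm"]
draw_graph_community(nx_G, communities)
plt.show()
```

Switching `mode` to `"hybrid"` submits the sub-QUBOs to a D-Wave QPU through Amazon Braket and incurs QPU usage charges; the solver prints a cost warning and asks for confirmation before running.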
44 | 45 | -------------------------------------------------------------------------------- /image/Karate_DwaveQBSolv_4comm_modu0.4198.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-braket-community-detection/3cf0f5ecfe214a23d5880acdea6691fe72ff6716/image/Karate_DwaveQBSolv_4comm_modu0.4198.png -------------------------------------------------------------------------------- /plots/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-braket-community-detection/3cf0f5ecfe214a23d5880acdea6691fe72ff6716/plots/.DS_Store -------------------------------------------------------------------------------- /plots/whitepaper.mplstyle: -------------------------------------------------------------------------------- 1 | figure.figsize : 10, 7 2 | savefig.dpi : 300 3 | axes.titlesize : 18 4 | axes.titlepad : 20 5 | font.size : 14 6 | axes.linewidth : 2 7 | lines.linewidth : 2 8 | axes.xmargin : 0.01 9 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | networkx==2.4 2 | scipy==1.5.2 3 | pandas==1.1.4 4 | matplotlib==3.2.2 5 | minorminer==0.2.5 6 | amazon-braket-sdk==1.14.0 7 | dwave-ocean-sdk==3.3.0 8 | amazon-braket-ocean-plugin==1.0.6 -------------------------------------------------------------------------------- /src/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-braket-community-detection/3cf0f5ecfe214a23d5880acdea6691fe72ff6716/src/.DS_Store -------------------------------------------------------------------------------- /src/graph_community.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import matplotlib.pyplot as plt 5 | import networkx as nx 6 | import numpy as np 7 | import os 8 | 9 | 10 | class CommunityGraphFactory(object): 11 | """ 12 | Create a graph for community detection from two options: 1) load from a local data file; 2) create a synthetic network via networkx's random partition graph 13 | """ 14 | 15 | def __init__(self, seed=1): 16 | """ 17 | Estbalish the seed for numpy's RNG downstream 18 | 19 | :param seed: int, seed value to use when constructing random graphs 20 | """ 21 | self.seed = seed 22 | 23 | @staticmethod 24 | def _sort_nodes(nx_G): 25 | """ 26 | Relabel node index in order of 0, 1, ...., N-1 27 | 28 | :param nx_G: networkx.classes.graph.Graph, Graph object to be sorted 29 | """ 30 | nodes = sorted(nx_G.nodes()) 31 | new_nodes = list(range(len(nodes))) 32 | nodes_mapping = dict(zip(nodes, new_nodes)) 33 | 34 | return nx.relabel_nodes(nx_G, nodes_mapping) 35 | 36 | def load_graph(self, graph_name, graph_file_dict): 37 | """ 38 | Load a graph by name from a dict: {graph_name: [data_file_path, delimiter]} 39 | 40 | :param graph_name: str, Identifier (name) of graph to be loaded in 41 | :param graph_file_dict: dict, Contains location and delimiter specification for known graphs 42 | """ 43 | 44 | nx_G = nx.read_weighted_edgelist( 45 | graph_file_dict[graph_name][0], 46 | delimiter=graph_file_dict[graph_name][1], 47 | create_using=nx.Graph(), 48 | nodetype=int 49 | ) 50 | print(nx.info(nx_G)) 51 | return self._sort_nodes(nx_G) 52 | 53 | def create_random_graph(self, total_nodes, num_comm, prob_in, prob_out, seed=None): 54 | """ 55 | Create a random partition graph with minimum degree >=1, save the group labels for 56 | graph nodes into a local file. If seed is provided, overwrite initialized seed value. 57 | 58 | :param total_nodes: int, Desired number of nodes in generated graph 59 | :param num_comm: int, Desired number of communities in graph 60 | :param prob_in: float, Probability of an edge between nodes in the same community 61 | :param prob_out: float, Probability of an edge between nodes in different communities 62 | :param seed: int, optional, If provided, use as seed value for generating random graphs 63 | """ 64 | 65 | if seed is not None: 66 | self.seed = seed 67 | 68 | np.random.seed(self.seed) 69 | community_size = np.random.multinomial(total_nodes, np.ones(num_comm)/num_comm, size=self.seed)[0] 70 | nx_G = nx.random_partition_graph(community_size, prob_in, prob_out, self.seed) 71 | 72 | out_degree = dict(nx_G.degree(weight='weight')) 73 | seed_ = 1 74 | while (min(out_degree.values()) == 0) & (seed_ <= 50): 75 | seed_ += 1 76 | print(f"iterating graph generation with random seed {seed_}") 77 | nx_G = nx.random_partition_graph(community_size, prob_in, prob_out, seed=seed_) 78 | out_degree = dict(nx_G.degree(weight='weight')) 79 | 80 | if min(out_degree.values()) == 0: 81 | raise ValueError("some nodes have ZERO degree! 
Change random graph input settings and re-generate a graph.") 82 | 83 | print(nx.info(nx_G)) 84 | 85 | # save node labels 86 | labels = [node_attributes['block'] for _, node_attributes in nx_G.nodes(data=True)] 87 | node_labels = list(enumerate(labels)) 88 | 89 | if not os.path.exists('./data/synthetic'): 90 | os.makedirs('./data/synthetic') 91 | 92 | label_outfile = os.path.join( 93 | "./data/synthetic", f"node_labels_{total_nodes}_{num_comm}_{prob_in}_{prob_out}.node_labels") 94 | 95 | # Store node-label mapping to csv-string file 96 | with open(label_outfile, 'w') as file: 97 | file.writelines(','.join(str(j) for j in i) + '\n' for i in node_labels) 98 | 99 | return self._sort_nodes(nx_G) 100 | 101 | def draw_graph(self, nx_G): 102 | """ 103 | Simply draw a graph for a graph with less than 200 nodes 104 | """ 105 | 106 | if len(nx_G.nodes()) <= 200: 107 | pos = nx.kamada_kawai_layout(nx_G) 108 | return nx.draw(nx_G, pos, with_labels=True) 109 | else: 110 | print("Too many nodes (>= 200) to display!") 111 | 112 | 113 | # for labeled networks 114 | def load_node_labels(file_path, delimiter=','): 115 | """ 116 | Load node label data from a local file 117 | 118 | :param file_path: str, a local file path to node label data for a graph 119 | :param delimiter: str, delimiter used to separate node records 120 | :return: label_nodes_dict, label_community, labels_array 121 | """ 122 | 123 | with open(file_path, 'r') as f: 124 | node_labels = [list(map(int, line.split(delimiter))) for line in f.read().splitlines()] 125 | 126 | # Sort according to node values 127 | node_labels_sorted = sorted(node_labels, key=lambda x: x[0]) 128 | node_labels = np.array(node_labels_sorted) 129 | 130 | # Center node and label IDs 131 | node_labels[:, 0] -= node_labels[:, 0].min() 132 | node_labels[:, 1] -= node_labels[:, 1].min() 133 | 134 | # Create one-hot encoding of labels (node classes) 135 | labels_array = np.zeros((node_labels.shape[0], len(set(node_labels[:, 1])))) 136 | labels_array[node_labels[:, 0], node_labels[:, 1]] = 1.0 137 | 138 | # Map {label_id: [nodes, with, that, label]} 139 | label_nodes_dict = {label: [node for node, label_ in node_labels if label_ == label] \ 140 | for label in set(node_labels[:, 1])} 141 | 142 | # Prep list of nodes under each community for networkX function 143 | label_community = list(label_nodes_dict.values()) 144 | 145 | return label_nodes_dict, label_community, labels_array 146 | 147 | 148 | # draw network graph with nodes colored by community groups 149 | def draw_graph_community(nx_G, communities, comm_order=None, color_map='rainbow', color_list=[], seed=42): 150 | """ 151 | Draw network with nodes colored based on community groups 152 | 153 | :param nx_G: networkX graph 154 | :param communities: list, a list with nodes grouped by communities, e.g., [{0, 1, 3}, {2, 4, 5}] 155 | :param comm_order: list, order of communities to map communities to colors 156 | :param color_map: str, one of the existing color map in matplotlib.pyplot 157 | :param color_list: list, a list of color names for specifying a color for a community, e.g., 158 | color_list = ['tab:blue', 'tab:red', 'tab:green', 'tab:purple', 'tab:orange', 'tab:cyan'] 159 | :param seed: int, random seed for networkX layout 160 | :return: a draw of network 161 | """ 162 | 163 | if comm_order is None: 164 | # re-order communities based on the sum of node index for consistent coloring: 165 | # we may still have inconsistent coloring among results from different number of communities 166 | sum_nodes = [sum(i) for i in 
communities] 167 | comm_order = sorted(range(len(sum_nodes)), key=lambda k: sum_nodes[k]) 168 | communities = [communities[i] for i in comm_order] 169 | 170 | class_map = {} 171 | 172 | if len(color_list) == 0: 173 | for cl in range(len(communities)): 174 | for n in communities[cl]: 175 | class_map.update({n: cl}) 176 | 177 | class_map = dict(sorted(class_map.items())) 178 | 179 | pos = nx.spring_layout(nx_G, seed=seed) 180 | nx.draw(nx_G, cmap=plt.get_cmap(color_map), pos=pos, node_color=list(class_map.values()), with_labels=True, 181 | font_color='white', node_size=500, font_size=10) 182 | else: 183 | assert len(color_list) >= len(communities), \ 184 | "Number of colors in color_list is less than the number of communities!!" 185 | for cl in range(len(communities)): 186 | for n in communities[cl]: 187 | class_map.update({n: color_list[cl]}) 188 | 189 | class_map = dict(sorted(class_map.items())) 190 | 191 | pos = nx.spring_layout(nx_G, seed=seed) 192 | nx.draw(nx_G, pos=pos, node_color=list(class_map.values()), with_labels=True, 193 | font_color='white', node_size=500, font_size=10) 194 | -------------------------------------------------------------------------------- /src/hybrid_job_community_detection.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import os 5 | from braket.jobs import save_job_result 6 | from braket.jobs.metrics import log_metric 7 | import json 8 | import networkx as nx 9 | 10 | # Load community detection specific library 11 | from src.qbsolv_community import QbsolvCommunity 12 | 13 | def main(): 14 | # Print statements can be viewed in cloudwatch 15 | print(os.environ) 16 | 17 | input_dir = os.environ["AMZN_BRAKET_INPUT_DIR"] 18 | hp_file = os.environ["AMZN_BRAKET_HP_FILE"] 19 | job_name = os.environ["AMZN_BRAKET_JOB_NAME"] 20 | s3_bucket = os.environ["AMZN_BRAKET_OUT_S3_BUCKET"] 21 | device_arn = os.environ["AMZN_BRAKET_DEVICE_ARN"] 22 | 23 | # Read the hyperparameters 24 | with open(hp_file, "r") as f: 25 | hyperparams = json.load(f) 26 | print(hyperparams) 27 | 28 | # Graph related parameters 29 | input_graph_file = str(hyperparams["input_graph_file"]).strip('"') 30 | k = int(hyperparams["num_community"]) 31 | 32 | # QBSolv related parameters 33 | solver_mode = str(hyperparams["solver_mode"]).strip('"') 34 | solver_limit = int(hyperparams["solver_limit"]) 35 | num_repeats = int(hyperparams["num_repeats"]) 36 | num_reads = int(hyperparams["num_reads"]) 37 | seed = int(hyperparams["seed"]) 38 | alpha = int(hyperparams["alpha"]) 39 | 40 | print(f"Load graph file from {input_dir}/input-graph/{input_graph_file}") 41 | nx_G = nx.read_weighted_edgelist( 42 | f"{input_dir}/input-graph/{input_graph_file}", 43 | delimiter=None, # check the input graph file and update the delimiter here 44 | create_using=nx.Graph(), 45 | nodetype=int) 46 | print(f"Input graph information: {nx.info(nx_G)}") 47 | 48 | # Initialize QbsolvCommunity class 49 | qbsolv_comm = QbsolvCommunity(nx_G, solver_limit, num_repeats, num_reads, seed, alpha) 50 | 51 | if solver_mode == "classical": 52 | print("Executing QBSolv Classical solver for community detection") 53 | comm_results, qbsolv_output = qbsolv_comm.solve_classical(k) 54 | elif solver_mode == "hybrid": 55 | # QBSolv Hybrid solver specific input 56 | s3_task_prefix = f"jobs/{job_name}/tasks" # the folder name in the S3 braket bucket to save QBSolv task output 57 | s3_folder = (s3_bucket, 
s3_task_prefix) 58 | 59 | print("Executing QBSolv Hybrid solver for community detection") 60 | comm_results, qbsolv_output = qbsolv_comm.solve_hybrid(k, s3_folder, device_arn, ack_QPUcost=True) 61 | else: 62 | raise ValueError(f"Invalid qbsolv solver mode {solver_mode}. Solver mode has to be in ['classical', 'hybrid']!") 63 | 64 | log_metric( 65 | metric_name="Modularity", 66 | value=comm_results["modularity"], 67 | ) 68 | 69 | # We're done with the job, so save the result. 70 | # This will be returned in job.result() 71 | print('Save results') 72 | save_job_result({"community_results": str(comm_results), "hyperparams": str(hyperparams), "qbsolv_output": str(qbsolv_output)}) 73 | 74 | if __name__ == "__main__": 75 | main() -------------------------------------------------------------------------------- /src/qbsolv_community.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import collections 5 | import datetime 6 | import minorminer 7 | import networkx as nx 8 | import numpy as np 9 | import os 10 | import pandas as pd 11 | import pickle 12 | import time 13 | import warnings 14 | 15 | from copy import deepcopy 16 | from dwave_qbsolv import QBSolv 17 | from dwave.system.composites import FixedEmbeddingComposite 18 | from braket.ocean_plugin import BraketDWaveSampler 19 | from collections import defaultdict 20 | from networkx.algorithms import community 21 | 22 | from src.qubo_community import qubo_matrix_community_sparse, qbsolv_response_to_community 23 | 24 | 25 | def create_qubo_dict(nx_G, k, alpha=5): 26 | """ 27 | Create a QUBO matrix in dict format for community detection 28 | 29 | :param nx_G: networkX graph 30 | :param k: int, number of communities to detect 31 | :param alpha: int, constraint coefficient to force a solution with one node for one community assignment only 32 | :return: QUBO sparse matrix in dict format 33 | """ 34 | 35 | qubo_matrix_sparse = qubo_matrix_community_sparse(nx_G, k, alpha=alpha) 36 | 37 | indices = list(zip(qubo_matrix_sparse.row, qubo_matrix_sparse.col)) 38 | values = qubo_matrix_sparse.data 39 | 40 | qubo_sparse = defaultdict(int) 41 | for idx, val in zip(indices, values): 42 | qubo_sparse[idx] = val 43 | 44 | print(f"The size of the QUBO matrix in dictionary format for {k}-community is {len(qubo_sparse)}") 45 | 46 | return qubo_sparse 47 | 48 | 49 | class QbsolvCommunity(object): 50 | """ 51 | Use QBSolv to solve community detection via either the classical solver or the hybrid solver 52 | """ 53 | def __init__(self, graph, solver_limit=40, num_repeats=1, num_reads=1000, seed=1, alpha=5): 54 | 55 | """ 56 | Set hyperparameter values for QBSolv 57 | 58 | :param graph: a networkX graph 59 | :param solver_limit: int, the maximum number of variables (n) for sub-QUBOs 60 | :param num_repeats: int, the maximum iterations to repeat QBSolv solver execution to discover a new best solution 61 | :param num_reads: int, the number of times the annealing to be performed 62 | :param seed: int, random seed value 63 | :param alpha: int, the penalty coefficient to enforce assigning only one community to each node 64 | """ 65 | 66 | self.graph = graph 67 | self.solver_limit = solver_limit 68 | self.num_repeats = num_repeats 69 | self.num_reads = num_reads 70 | self.seed = seed 71 | self.alpha = alpha 72 | 73 | def solve_classical(self, num_comm): 74 | """ 75 | Call QUBO classical solver for community detection 76 | 77 | 
:param num_comm: int, number of communities to solve for 78 | :return: dict, two dictionaries for graph's community results and QBSolv reponse results 79 | """ 80 | t0 = time.time() 81 | 82 | q_dict = create_qubo_dict(self.graph, num_comm, self.alpha) 83 | 84 | # execute optimization task using QBSolv classical (run on your local notebook instance) 85 | response_classical = QBSolv().sample_qubo( 86 | q_dict, num_repeats=self.num_repeats, solver_limit=self.solver_limit, seed=self.seed 87 | ) 88 | print(f"Mode: Classical, time spent is {round(time.time()-t0, 2)} seconds for {self.num_repeats} repetitions") 89 | print(response_classical) 90 | 91 | # extract the best solution that has the lowest energy 92 | sample_classical = np.array(list(response_classical.first.sample.values())) 93 | comm_classical = qbsolv_response_to_community(self.graph, sample_classical, num_comm) 94 | 95 | return comm_classical, response_classical 96 | 97 | def solve_hybrid(self, num_comm, s3_folder, device_arn, ack_QPUcost=False): 98 | """ 99 | Call QUBO hybrid solver for community detection 100 | 101 | :param num_comm: int, number of communities to solve for 102 | :param s3_folder: str, the Amazon Braket S3 path to store solver response files 103 | :param device_arn: str, D-Wave QPU Device ARN (only needed for QBSolv Hybrid solver) 104 | :param ack_QPUcost: bool, acknowledge QPU cost associated with the QBSolv Hybrid solver for price awareness 105 | :return: dict, two dictionaries for graph's community results and QBSolv reponse results 106 | """ 107 | q_dict = create_qubo_dict(self.graph, num_comm, self.alpha) 108 | self.qpu_cost_warning() 109 | 110 | if ack_QPUcost: 111 | execution = "y" 112 | else: 113 | try: 114 | execution = input("Continue to execute QBSolv Hybrid job: Y or N?") 115 | except EOFError as error: 116 | print("\n WARNING: User didn't acknowledge QPU cost. Hybrid job execution is declined!") 117 | raise error 118 | 119 | if execution.lower() in ["y", "yes"]: 120 | t0 = time.time() 121 | system = BraketDWaveSampler(s3_folder, device_arn) 122 | 123 | # find embedding of subproblem-sized complete graph to the QPU 124 | G_sub = nx.complete_graph(self.solver_limit) 125 | embedding = minorminer.find_embedding(G_sub.edges, system.edgelist) 126 | 127 | # use the FixedEmbeddingComposite() method with a fixed embedding 128 | solver = FixedEmbeddingComposite(system, embedding) 129 | 130 | # execute optimization task using QBSolv hybrid on D-Wave QPU 131 | response_hybrid = QBSolv().sample_qubo(q_dict, solver=solver, num_repeats=self.num_repeats, 132 | solver_limit=self.solver_limit, num_reads=self.num_reads, 133 | seed=self.seed) 134 | print(f"Mode: Hybrid, time spent is {round(time.time()-t0, 2)} seconds for {self.num_repeats} repeats") 135 | print(response_hybrid) 136 | 137 | # extract the best solution that has the lowest energy 138 | sample_hybrid = np.array(list(response_hybrid.first.sample.values())) 139 | comm_hybrid = qbsolv_response_to_community(self.graph, sample_hybrid, num_comm) 140 | 141 | return comm_hybrid, response_hybrid 142 | else: 143 | raise ValueError("Hybrid job execution declined by the user!") 144 | 145 | def qpu_cost_warning(self): 146 | 147 | def custom_formatwarning(msg, *args, **kwargs): 148 | return str(msg) + '\n' 149 | 150 | warnings.formatwarning = custom_formatwarning 151 | stmt = f"\033[91mWARNING:\033[0m Additional cost for using D-Wave QPU. 
" \ 152 | f"Please evaluate potential cost before executing this QBSolv hybrid job" 153 | warnings.warn(stmt) 154 | 155 | 156 | class CommunityDetectionSolver(object): 157 | """ 158 | Call QUBO classical/hybrid solver to find community structure of a graph and save results 159 | """ 160 | def __init__(self, graph_name, graph, num_comm, solver_limit=100, num_repeats=1, num_reads=1000, 161 | seed=1, alpha=10, mode='classical', s3_folder='N/A', device_arn = 'N/A'): 162 | """ 163 | Input for graph and QBSolv hyperparameter values 164 | 165 | :param graph_name: str, the name of a graph for result saving 166 | :param graph: a networkX graph 167 | :param num_comm: int, number of communities to solve for 168 | :param solver_limit: int, the maximum number of variables (n) for sub-QUBOs 169 | :param num_repeats: int, the maximum iterations to repeat QBSolv solver execution to discover a new best solution 170 | :param num_reads: int, the number of times the annealing to be performed 171 | :param seed: int, random seed value 172 | :param alpha: int, the penalty coefficient to enforce assigning only one community to each node 173 | :param mode, str, must be either 'classical' or 'hybrid'. Determines whether the classical or hybrid solver is called 174 | :param s3_folder, str, Amazon Braket S3 folder path (only needed for QBSolv Hybrid solver) 175 | :param device_arn: str, D-Wave QPU Device ARN (only needed for QBSolv Hybrid solver) 176 | 177 | """ 178 | self.graph_name = graph_name 179 | self.graph = graph 180 | self.num_comm = num_comm 181 | self.solver_limit = solver_limit 182 | self.num_repeats = num_repeats 183 | self.num_reads = num_reads 184 | self.seed = seed 185 | self.alpha = alpha 186 | self.mode = mode 187 | self.s3_folder = s3_folder 188 | self.device_arn = device_arn 189 | 190 | def _solve_single_graph(self): 191 | """ 192 | Call QUBO classical/hybrid solver to process a single graph for community detection 193 | 194 | :param graph: a networkX graph 195 | :param num_comm: int, number of communities to solve for 196 | """ 197 | 198 | QbsolvComm = QbsolvCommunity( 199 | self.graph, self.solver_limit, self.num_repeats, self.num_reads, self.seed, self.alpha) 200 | 201 | t0 = time.time() 202 | if self.mode == 'classical': 203 | comm_results, response = QbsolvComm.solve_classical(self.num_comm) 204 | elif self.mode == 'hybrid': 205 | comm_results, response = QbsolvComm.solve_hybrid(self.num_comm, self.s3_folder, self.device_arn) 206 | else: 207 | raise ValueError(f"Invalid qbsolv mode {self.mode}. Mode has to be in ['classical', 'hybrid']!") 208 | 209 | time_spent = round(time.time()-t0, 2) 210 | 211 | return comm_results, response, time_spent 212 | 213 | def run_community_detection(self, save=False): 214 | """ 215 | Call QUBO classical/hybrid solver for community detection and save results 216 | 217 | :param save: boolean, True or False to set whether to save results locally or not 218 | :return: dict, two dictionaries: the first one 'results_graph' contains graph-level results about graph, qbsolv seetings, modularity values, 219 | and execution time; the second one 'track' contains node-level results about graph edge connections and community assignment. 
220 | """ 221 | results_graph = collections.defaultdict(list) 222 | timenow = str(datetime.datetime.now())[:19].replace(' ', '_') 223 | date_today = str(datetime.date.today()) 224 | track = {'graphs': [], 'responses': [], 'communities': []} 225 | 226 | output_parent = f'./output/{date_today}' 227 | result_file = f"result_{self.graph_name}_{timenow}_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.csv" 228 | 229 | if save and (not os.path.exists(output_parent)): 230 | print(f'Creating parent folder(s): {output_parent}') 231 | print('Will create required sub-directories quietly') 232 | os.makedirs(output_parent) 233 | 234 | # run QBSolv for community detection 235 | comm_results, response, time_spent = self._solve_single_graph() 236 | print(f"Modularity from QBSolv with {comm_results['num_comm']} communities is {round(comm_results['modularity'], 4)}") 237 | 238 | # Save graph information 239 | results_graph['graph_name'].append(self.graph_name) 240 | results_graph['total_nodes'].append(self.graph.number_of_nodes()) 241 | results_graph['num_edge'].append(self.graph.number_of_edges()) 242 | results_graph['num_comm'].append(self.num_comm) 243 | 244 | # Save results from qbsolv 245 | results_graph['modu_qbsolv'].append(comm_results['modularity']) 246 | results_graph['num_comm_qbsolv'].append(comm_results['num_comm']) 247 | results_graph['wall_time_s'].append(time_spent) 248 | results_graph['solver_limit'].append(self.solver_limit) 249 | results_graph['num_repeats'].append(self.num_repeats) 250 | results_graph['num_reads'].append(self.num_reads) 251 | results_graph['seed'].append(self.seed) 252 | results_graph['alpha'].append(self.alpha) 253 | results_graph['mode'].append(self.mode) 254 | results_graph['s3_folder'].append(self.s3_folder) 255 | 256 | graph_file = f"graph_{self.graph_name}_QBS_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.gpickle" 257 | response_file = f"response_{self.graph_name}_QBS_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.p" 258 | comm_file = f"modu_{self.graph_name}_QBS_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.p" 259 | 260 | track['graphs'].append((graph_file, deepcopy(self.graph))) 261 | track['responses'].append((response_file, deepcopy(response))) 262 | track['communities'].append((comm_file, deepcopy(comm_results))) 263 | 264 | if save: 265 | self._save_results(track, results_graph, output_parent, result_file) 266 | 267 | return results_graph, track 268 | 269 | @staticmethod 270 | def _save_results(track_dict, results_graph, output_parent, result_file): 271 | 272 | results_df = pd.DataFrame.from_dict(results_graph) 273 | for top_key, item_list in track_dict.items(): 274 | output_subdir = f'{output_parent}/{top_key}' 275 | if not os.path.exists(output_subdir): 276 | os.makedirs(output_subdir) 277 | for item in item_list: 278 | out_loc = f'{output_subdir}/{item[0]}' 279 | print(f'Writing to disk: {out_loc}') 280 | if top_key == 'graphs': 281 | nx.write_gpickle(item[1], out_loc) 282 | else: 283 | pickle.dump(item[1], open(out_loc, 'wb')) 284 | 285 | print(f'Writing to disk: {output_parent}/{result_file}') 286 | results_df.to_csv(f'{output_parent}/{result_file}', index=False) -------------------------------------------------------------------------------- /src/qubo_community.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import collections 5 | import networkx as nx 6 | import numpy as np 7 | import scipy.sparse as sp 8 | 9 | from networkx.algorithms import community 10 | 11 | 12 | def modularity_mx(nx_G): 13 | """ 14 | Create a sparse matrix for the modularity of a graph nx_G 15 | 16 | :param nx_G: networkX graph 17 | :return: scipy coo sparse matrix, the modularity matrix for nx_G with weighted edges 18 | """ 19 | 20 | # Generate a sparse adjacency matrix using networkx 21 | adj = nx.adjacency_matrix(nx_G, nodelist=sorted(list(nx_G)), weight='weight') 22 | 23 | # Create a numpy array for node's degree. 24 | # Here we assumed an undirected graph, so `degrees` refers to both the in-degree and out-degree 25 | sorted_degrees = sorted(nx_G.degree(weight='weight')) 26 | degrees = np.array([degree[1] for degree in sorted_degrees]) 27 | m = sum(degrees) / 2 28 | 29 | # Calculate the expected number of edges between two nodes for a null model. 30 | # Note we use sparse matrix format here but this degree matrix is a dense one by definition 31 | degrees = np.expand_dims(degrees.squeeze(), axis=0) 32 | degree_mx = sp.csr_matrix(degrees.T).multiply(sp.csr_matrix(degrees)) / (2 * m) 33 | 34 | # Create a modularity matrix and convert it into coo sparse matrix format (Torch sparse tensor compatible) 35 | modu_mx_sparse = (sp.csr_matrix(adj) - degree_mx) / (2 * m) 36 | modu_mx_sparse = sp.coo_matrix(modu_mx_sparse) 37 | 38 | return modu_mx_sparse 39 | 40 | 41 | def qubo_matrix_community_sparse(nx_G, k, alpha=5): 42 | """ 43 | Create a sparse matrix as a QUBO matrix for a graph nx_G with k-community detection 44 | 45 | :param nx_G: networkX graph 46 | :param k: int, the number of communities to detect for the graph nx_G 47 | :param alpha: float, the penalty coefficient for the constraint term in the QBUO matrix 48 | :return: scipy coo sparse matrix, the QUBO matrix to minimize for a graph nx_G with k-community detection 49 | """ 50 | 51 | # get the number of nodes for a networkx graph 52 | num_nodes = nx_G.number_of_nodes() 53 | 54 | # create the modularity matrix in coo sparse format 55 | modu_mx_sparse = modularity_mx(nx_G) 56 | 57 | # define the coefficient value for the QUBO constraint term that a node can only be in one community 58 | gamma_v = alpha / num_nodes 59 | 60 | # create sparse diagonal matrix for the linear constraint term in the QUBO matrix 61 | gamma_mx = sp.eye(num_nodes) * gamma_v 62 | 63 | # create a block diagonal matrix for k-commnuity problem where k > 2 64 | # this block diagonal matrix is for the linear constraint term in the QUBO matrix 65 | gamma_blockmatrix_sparse = sp.block_diag([gamma_mx] * k) 66 | 67 | # create a k x k sparse block matrix with each block being a diagonal matrix 68 | # this block matrix is for the quadratic constraint term in the QUBO matrix 69 | constraint_mx = [[gamma_mx] * k] * k 70 | constraint_blockmatrix_sparse = sp.bmat(constraint_mx) 71 | 72 | # create a sparse block diagonal matrix with the diagonal value equal to the modularity matrix 73 | # this is the modularity matrix for k communities in QUBO format 74 | modu_mx_sparse_k = [modu_mx_sparse] * k 75 | modu_block_sparse = sp.block_diag(modu_mx_sparse_k) 76 | 77 | # create a QUBO sparse matrix (for minimization) by combinding the modularity matrix and the constraint 78 | # term matrix for a k-community problem 79 | q_blockmatrix_sparse = -1 * modu_block_sparse + constraint_blockmatrix_sparse - 2 * gamma_blockmatrix_sparse 80 | q_blockmatrix_sparse = 
sp.coo_matrix(q_blockmatrix_sparse) 81 | 82 | return q_blockmatrix_sparse 83 | 84 | def qbsolv_response_to_community(nx_G, response_sample, k): 85 | """ 86 | Extract communities from QBSolv responses and calculate its modularity 87 | 88 | :param nx_G: networkX graph 89 | :param response_sample: QBSolv responses 90 | :param k: int, the number of communities to detect for the graph nx_G 91 | :return: dict, a dictionary of node sets as community groups and the graph modularity 92 | """ 93 | 94 | num_nodes = nx_G.number_of_nodes() 95 | 96 | # Split result out into binary assignments within each community 97 | result = response_sample.squeeze() 98 | result = result.reshape(k, num_nodes) 99 | 100 | # Extract node IDs belonging to each community, based on results 101 | communities = [] 102 | for i in range(k): 103 | node_c = np.where(result[i] == 1)[0] 104 | if len(node_c) > 0: 105 | communities.append(set(node_c)) 106 | 107 | # Check if there is multi-community assignment for a node or a node without community assignment 108 | for i in range(num_nodes): 109 | if result[:, i].sum() > 1: 110 | raise ValueError('Multi-community assignment!') 111 | break 112 | if result[:, i].sum() == 0: 113 | raise ValueError('Node without community assignment!') 114 | break 115 | 116 | # Order communities according to lowest-ID nodes in each set, ascending order 117 | communities.sort(key=min) 118 | 119 | modu = community.modularity(nx_G, communities) 120 | k = len(communities) # in case any communities returned no hits 121 | 122 | return {"modularity": modu, "num_comm": k, "comm": communities} 123 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import os 5 | import shutil 6 | from urllib.request import urlopen 7 | from zipfile import ZipFile 8 | from io import BytesIO 9 | 10 | def download_graphs(graph_url, data_dir = "./graph_data"): 11 | """ 12 | Download graph .zip files from web URL 13 | 14 | :param graph_url: dict, with a format of {'graph_name': 'url'} 15 | :param data_dir: str, the directory path to store graph data 16 | """ 17 | 18 | if not os.path.exists(data_dir): 19 | os.makedirs(data_dir) 20 | print("Created ./graph_data directory in local machine to store graph data.") 21 | 22 | for graph_name in graph_url.keys(): 23 | url = graph_url[graph_name] 24 | with urlopen(url) as zr: 25 | with ZipFile(BytesIO(zr.read())) as zf: 26 | zf.extractall(data_dir) 27 | 28 | def clean_graph_data(graph_files, data_dir = "./graph_data"): 29 | """ 30 | Clean graph data by removing header lines 31 | 32 | :param graph_files: dict, with a format of {'graph_name': {'file': str, 'lines_to_skip': int}} 33 | :param data_dir: str, the directory path to graph data 34 | """ 35 | 36 | for graph_name in graph_files.keys(): 37 | 38 | # create a subfolder for each graph and save its file with header lines removed 39 | graph_folder = os.path.join(data_dir, graph_name) 40 | if not os.path.exists(graph_folder): 41 | os.makedirs(graph_folder) 42 | 43 | raw_file = os.path.join(data_dir, graph_files[graph_name]['file']) 44 | new_file = os.path.join(graph_folder, graph_files[graph_name]['file']) 45 | 46 | with open(raw_file, 'r') as f_raw: 47 | data = f_raw.read().splitlines(True) 48 | with open(new_file, 'w') as f_new: 49 | f_new.writelines(data[graph_files[graph_name]['lines_to_skip']:]) --------------------------------------------------------------------------------
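For reference, a sketch of how the two helpers in `src/utils.py` are meant to be combined. The graph name, download URL, archive file name, and header-line count are hypothetical placeholders only; substitute the values for the dataset you actually pick from networkrepository.com.

```python
from src.utils import download_graphs, clean_graph_data

# Hypothetical inputs -- replace the URL, file name, and header count with real values.
graph_url = {"karate": "https://example.com/soc-karate.zip"}
graph_files = {"karate": {"file": "soc-karate.mtx", "lines_to_skip": 2}}

download_graphs(graph_url, data_dir="./graph_data")     # download and unzip the edge list
clean_graph_data(graph_files, data_dir="./graph_data")  # strip header lines into ./graph_data/karate/
```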