├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Hybrid_jobs_for_community_detection.ipynb ├── LICENSE ├── Notebook_QBSolv_community_detection.ipynb ├── README.md ├── image └── Karate_DwaveQBSolv_4comm_modu0.4198.png ├── plots ├── .DS_Store └── whitepaper.mplstyle ├── requirements.txt └── src ├── .DS_Store ├── graph_community.py ├── hybrid_job_community_detection.py ├── qbsolv_community.py ├── qubo_community.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # PyCharm stuff 2 | .idea 3 | 4 | # Code output/results 5 | output 6 | data 7 | .ipynb_checkpoints 8 | data/synthetic 9 | src/__pycache__ 10 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 
38 |
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 |
42 |
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. Since our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45 |
46 |
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 |
52 |
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 |
56 |
57 | ## Licensing
58 |
59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this
4 | software and associated documentation files (the "Software"), to deal in the Software
5 | without restriction, including without limitation the rights to use, copy, modify,
6 | merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
7 | permit persons to whom the Software is furnished to do so.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
10 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
11 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
12 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
13 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
14 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Community Detection in Complex Networks using Hybrid Quantum Annealing on Amazon Braket
2 |
3 | This code package solves network community detection problems using hybrid quantum annealing on Amazon Braket. For more background, read our AWS quantum computing blog post series on community detection: [Part 1](https://aws.amazon.com/blogs/quantum-computing/community-detection-in-complex-networks-using-hybrid-quantum-annealing-on-amazon-braket-part-i/) and [Part 2](https://aws.amazon.com/blogs/quantum-computing/community-detection-using-hybrid-quantum-annealing-on-amazon-braket-part-2/).
4 |
5 | ### Quick Start
6 |
7 | The tutorial notebook [`Notebook_QBSolv_community_detection`](Notebook_QBSolv_community_detection.ipynb) provides a step-by-step guide on how to formulate community detection as a Quadratic Unconstrained Binary Optimization (QUBO) problem, similar to the work by [Negre et al.](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227538). We then demonstrate how to solve community detection problems on Amazon Braket with the open-source QBSolv library, whose quantum-classical hybrid solvers combine classical compute resources with D-Wave quantum annealers.
8 |
9 | **Table of Contents for the Tutorial Notebook**
10 | * Modularity-based community detection
11 | * Community detection as a QUBO problem
12 | * Datasets
13 | * List of key functions for community detection
14 | * Set up environment
15 | * Download graph data
16 | * Prepare graph data for community detection
17 | * Apply QBSolv for community detection
18 | * Detect communities for synthetic and real-world networks
19 |
20 | The Amazon Braket Hybrid Jobs notebook [`Hybrid_jobs_for_community_detection`](Hybrid_jobs_for_community_detection.ipynb) provides a step-by-step guide on how to use Amazon Braket Hybrid Jobs to manage and execute quantum annealing-based community detection tasks at scale. A minimal usage sketch of the supporting code in `src/` is included at the end of this README.
21 |
22 | **Table of Contents for the Amazon Braket Hybrid Jobs Notebook**
23 | * Set up environment
24 | * Prepare input data
25 | * Create an algorithm script
26 | * Specify hyperparameters
27 | * Submit a hybrid job
28 | * View results
29 | * Run hyperparameter tuning
30 |
31 |
32 | ### Datasets
33 |
34 | The graph datasets used for demonstration here can be downloaded from http://networkrepository.com/ under a Creative Commons Attribution-ShareAlike License. Dataset reference: Rossi, Ryan A. and Ahmed, Nesreen K. (2015). The Network Data Repository with Interactive Graph Analytics and Visualization. AAAI. https://networkrepository.com.
35 |
36 |
37 | ## Security
38 |
39 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
40 |
41 | ## License
42 |
43 | This library is licensed under the MIT-0 License. See the LICENSE file.
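### Example usage

The snippet below is a minimal sketch of how the helper classes in `src/` fit together, assuming the repository root is the working directory, the packages in `requirements.txt` are installed, and a cleaned edge-list file is already available locally (see `src/utils.py` for download and cleanup helpers). The graph name, file path, and community count are illustrative placeholders; the class and function names come from `src/graph_community.py` and `src/qbsolv_community.py`. The tutorial notebook remains the authoritative walkthrough.

```python
import matplotlib.pyplot as plt

from src.graph_community import CommunityGraphFactory, draw_graph_community
from src.qbsolv_community import CommunityDetectionSolver

# {graph_name: [edge_list_path, delimiter]} -- the path below is a placeholder,
# adapt it to the edge list you actually downloaded (None = whitespace delimiter)
graph_files = {"karate": ["./graph_data/karate/soc-karate.mtx", None]}

factory = CommunityGraphFactory(seed=1)
nx_G = factory.load_graph("karate", graph_files)

solver = CommunityDetectionSolver(
    graph_name="karate",
    graph=nx_G,
    num_comm=4,           # number of communities to search for (placeholder value)
    solver_limit=100,     # maximum number of variables per sub-QUBO
    num_repeats=1,        # QBSolv repeats looking for a better solution
    num_reads=1000,       # anneals per sub-QUBO
    seed=1,
    alpha=10,             # penalty enforcing one community per node
    mode="classical",     # "hybrid" additionally needs s3_folder and device_arn
)
results_graph, track = solver.run_community_detection(save=False)

# Color the nodes by the detected community assignment
communities = track["communities"][0][1]["comm"]
draw_graph_community(nx_G, communities)
plt.show()
```

Switching `mode` to `"hybrid"` submits the sub-QUBOs to a D-Wave QPU through Amazon Braket and incurs QPU usage charges; the solver prints a cost warning and asks for confirmation before running.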
44 | 45 | -------------------------------------------------------------------------------- /image/Karate_DwaveQBSolv_4comm_modu0.4198.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-braket-community-detection/3cf0f5ecfe214a23d5880acdea6691fe72ff6716/image/Karate_DwaveQBSolv_4comm_modu0.4198.png -------------------------------------------------------------------------------- /plots/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-braket-community-detection/3cf0f5ecfe214a23d5880acdea6691fe72ff6716/plots/.DS_Store -------------------------------------------------------------------------------- /plots/whitepaper.mplstyle: -------------------------------------------------------------------------------- 1 | figure.figsize : 10, 7 2 | savefig.dpi : 300 3 | axes.titlesize : 18 4 | axes.titlepad : 20 5 | font.size : 14 6 | axes.linewidth : 2 7 | lines.linewidth : 2 8 | axes.xmargin : 0.01 9 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | networkx==2.4 2 | scipy==1.5.2 3 | pandas==1.1.4 4 | matplotlib==3.2.2 5 | minorminer==0.2.5 6 | amazon-braket-sdk==1.14.0 7 | dwave-ocean-sdk==3.3.0 8 | amazon-braket-ocean-plugin==1.0.6 -------------------------------------------------------------------------------- /src/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-braket-community-detection/3cf0f5ecfe214a23d5880acdea6691fe72ff6716/src/.DS_Store -------------------------------------------------------------------------------- /src/graph_community.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import matplotlib.pyplot as plt 5 | import networkx as nx 6 | import numpy as np 7 | import os 8 | 9 | 10 | class CommunityGraphFactory(object): 11 | """ 12 | Create a graph for community detection from two options: 1) load from a local data file; 2) create a synthetic network via networkx's random partition graph 13 | """ 14 | 15 | def __init__(self, seed=1): 16 | """ 17 | Estbalish the seed for numpy's RNG downstream 18 | 19 | :param seed: int, seed value to use when constructing random graphs 20 | """ 21 | self.seed = seed 22 | 23 | @staticmethod 24 | def _sort_nodes(nx_G): 25 | """ 26 | Relabel node index in order of 0, 1, ...., N-1 27 | 28 | :param nx_G: networkx.classes.graph.Graph, Graph object to be sorted 29 | """ 30 | nodes = sorted(nx_G.nodes()) 31 | new_nodes = list(range(len(nodes))) 32 | nodes_mapping = dict(zip(nodes, new_nodes)) 33 | 34 | return nx.relabel_nodes(nx_G, nodes_mapping) 35 | 36 | def load_graph(self, graph_name, graph_file_dict): 37 | """ 38 | Load a graph by name from a dict: {graph_name: [data_file_path, delimiter]} 39 | 40 | :param graph_name: str, Identifier (name) of graph to be loaded in 41 | :param graph_file_dict: dict, Contains location and delimiter specification for known graphs 42 | """ 43 | 44 | nx_G = nx.read_weighted_edgelist( 45 | graph_file_dict[graph_name][0], 46 | delimiter=graph_file_dict[graph_name][1], 47 | create_using=nx.Graph(), 48 | nodetype=int 49 | ) 50 | print(nx.info(nx_G)) 51 | return self._sort_nodes(nx_G) 52 | 53 | def create_random_graph(self, total_nodes, num_comm, prob_in, prob_out, seed=None): 54 | """ 55 | Create a random partition graph with minimum degree >=1, save the group labels for 56 | graph nodes into a local file. If seed is provided, overwrite initialized seed value. 57 | 58 | :param total_nodes: int, Desired number of nodes in generated graph 59 | :param num_comm: int, Desired number of communities in graph 60 | :param prob_in: float, Probability of an edge between nodes in the same community 61 | :param prob_out: float, Probability of an edge between nodes in different communities 62 | :param seed: int, optional, If provided, use as seed value for generating random graphs 63 | """ 64 | 65 | if seed is not None: 66 | self.seed = seed 67 | 68 | np.random.seed(self.seed) 69 | community_size = np.random.multinomial(total_nodes, np.ones(num_comm)/num_comm, size=self.seed)[0] 70 | nx_G = nx.random_partition_graph(community_size, prob_in, prob_out, self.seed) 71 | 72 | out_degree = dict(nx_G.degree(weight='weight')) 73 | seed_ = 1 74 | while (min(out_degree.values()) == 0) & (seed_ <= 50): 75 | seed_ += 1 76 | print(f"iterating graph generation with random seed {seed_}") 77 | nx_G = nx.random_partition_graph(community_size, prob_in, prob_out, seed=seed_) 78 | out_degree = dict(nx_G.degree(weight='weight')) 79 | 80 | if min(out_degree.values()) == 0: 81 | raise ValueError("some nodes have ZERO degree! 
Change random graph input settings and re-generate a graph.") 82 | 83 | print(nx.info(nx_G)) 84 | 85 | # save node labels 86 | labels = [node_attributes['block'] for _, node_attributes in nx_G.nodes(data=True)] 87 | node_labels = list(enumerate(labels)) 88 | 89 | if not os.path.exists('./data/synthetic'): 90 | os.makedirs('./data/synthetic') 91 | 92 | label_outfile = os.path.join( 93 | "./data/synthetic", f"node_labels_{total_nodes}_{num_comm}_{prob_in}_{prob_out}.node_labels") 94 | 95 | # Store node-label mapping to csv-string file 96 | with open(label_outfile, 'w') as file: 97 | file.writelines(','.join(str(j) for j in i) + '\n' for i in node_labels) 98 | 99 | return self._sort_nodes(nx_G) 100 | 101 | def draw_graph(self, nx_G): 102 | """ 103 | Simply draw a graph for a graph with less than 200 nodes 104 | """ 105 | 106 | if len(nx_G.nodes()) <= 200: 107 | pos = nx.kamada_kawai_layout(nx_G) 108 | return nx.draw(nx_G, pos, with_labels=True) 109 | else: 110 | print("Too many nodes (>= 200) to display!") 111 | 112 | 113 | # for labeled networks 114 | def load_node_labels(file_path, delimiter=','): 115 | """ 116 | Load node label data from a local file 117 | 118 | :param file_path: str, a local file path to node label data for a graph 119 | :param delimiter: str, delimiter used to separate node records 120 | :return: label_nodes_dict, label_community, labels_array 121 | """ 122 | 123 | with open(file_path, 'r') as f: 124 | node_labels = [list(map(int, line.split(delimiter))) for line in f.read().splitlines()] 125 | 126 | # Sort according to node values 127 | node_labels_sorted = sorted(node_labels, key=lambda x: x[0]) 128 | node_labels = np.array(node_labels_sorted) 129 | 130 | # Center node and label IDs 131 | node_labels[:, 0] -= node_labels[:, 0].min() 132 | node_labels[:, 1] -= node_labels[:, 1].min() 133 | 134 | # Create one-hot encoding of labels (node classes) 135 | labels_array = np.zeros((node_labels.shape[0], len(set(node_labels[:, 1])))) 136 | labels_array[node_labels[:, 0], node_labels[:, 1]] = 1.0 137 | 138 | # Map {label_id: [nodes, with, that, label]} 139 | label_nodes_dict = {label: [node for node, label_ in node_labels if label_ == label] \ 140 | for label in set(node_labels[:, 1])} 141 | 142 | # Prep list of nodes under each community for networkX function 143 | label_community = list(label_nodes_dict.values()) 144 | 145 | return label_nodes_dict, label_community, labels_array 146 | 147 | 148 | # draw network graph with nodes colored by community groups 149 | def draw_graph_community(nx_G, communities, comm_order=None, color_map='rainbow', color_list=[], seed=42): 150 | """ 151 | Draw network with nodes colored based on community groups 152 | 153 | :param nx_G: networkX graph 154 | :param communities: list, a list with nodes grouped by communities, e.g., [{0, 1, 3}, {2, 4, 5}] 155 | :param comm_order: list, order of communities to map communities to colors 156 | :param color_map: str, one of the existing color map in matplotlib.pyplot 157 | :param color_list: list, a list of color names for specifying a color for a community, e.g., 158 | color_list = ['tab:blue', 'tab:red', 'tab:green', 'tab:purple', 'tab:orange', 'tab:cyan'] 159 | :param seed: int, random seed for networkX layout 160 | :return: a draw of network 161 | """ 162 | 163 | if comm_order is None: 164 | # re-order communities based on the sum of node index for consistent coloring: 165 | # we may still have inconsistent coloring among results from different number of communities 166 | sum_nodes = [sum(i) for i in 
communities] 167 | comm_order = sorted(range(len(sum_nodes)), key=lambda k: sum_nodes[k]) 168 | communities = [communities[i] for i in comm_order] 169 | 170 | class_map = {} 171 | 172 | if len(color_list) == 0: 173 | for cl in range(len(communities)): 174 | for n in communities[cl]: 175 | class_map.update({n: cl}) 176 | 177 | class_map = dict(sorted(class_map.items())) 178 | 179 | pos = nx.spring_layout(nx_G, seed=seed) 180 | nx.draw(nx_G, cmap=plt.get_cmap(color_map), pos=pos, node_color=list(class_map.values()), with_labels=True, 181 | font_color='white', node_size=500, font_size=10) 182 | else: 183 | assert len(color_list) >= len(communities), \ 184 | "Number of colors in color_list is less than the number of communities!!" 185 | for cl in range(len(communities)): 186 | for n in communities[cl]: 187 | class_map.update({n: color_list[cl]}) 188 | 189 | class_map = dict(sorted(class_map.items())) 190 | 191 | pos = nx.spring_layout(nx_G, seed=seed) 192 | nx.draw(nx_G, pos=pos, node_color=list(class_map.values()), with_labels=True, 193 | font_color='white', node_size=500, font_size=10) 194 | -------------------------------------------------------------------------------- /src/hybrid_job_community_detection.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import os 5 | from braket.jobs import save_job_result 6 | from braket.jobs.metrics import log_metric 7 | import json 8 | import networkx as nx 9 | 10 | # Load community detection specific library 11 | from src.qbsolv_community import QbsolvCommunity 12 | 13 | def main(): 14 | # Print statements can be viewed in cloudwatch 15 | print(os.environ) 16 | 17 | input_dir = os.environ["AMZN_BRAKET_INPUT_DIR"] 18 | hp_file = os.environ["AMZN_BRAKET_HP_FILE"] 19 | job_name = os.environ["AMZN_BRAKET_JOB_NAME"] 20 | s3_bucket = os.environ["AMZN_BRAKET_OUT_S3_BUCKET"] 21 | device_arn = os.environ["AMZN_BRAKET_DEVICE_ARN"] 22 | 23 | # Read the hyperparameters 24 | with open(hp_file, "r") as f: 25 | hyperparams = json.load(f) 26 | print(hyperparams) 27 | 28 | # Graph related parameters 29 | input_graph_file = str(hyperparams["input_graph_file"]).strip('"') 30 | k = int(hyperparams["num_community"]) 31 | 32 | # QBSolv related parameters 33 | solver_mode = str(hyperparams["solver_mode"]).strip('"') 34 | solver_limit = int(hyperparams["solver_limit"]) 35 | num_repeats = int(hyperparams["num_repeats"]) 36 | num_reads = int(hyperparams["num_reads"]) 37 | seed = int(hyperparams["seed"]) 38 | alpha = int(hyperparams["alpha"]) 39 | 40 | print(f"Load graph file from {input_dir}/input-graph/{input_graph_file}") 41 | nx_G = nx.read_weighted_edgelist( 42 | f"{input_dir}/input-graph/{input_graph_file}", 43 | delimiter=None, # check the input graph file and update the delimiter here 44 | create_using=nx.Graph(), 45 | nodetype=int) 46 | print(f"Input graph information: {nx.info(nx_G)}") 47 | 48 | # Initialize QbsolvCommunity class 49 | qbsolv_comm = QbsolvCommunity(nx_G, solver_limit, num_repeats, num_reads, seed, alpha) 50 | 51 | if solver_mode == "classical": 52 | print("Executing QBSolv Classical solver for community detection") 53 | comm_results, qbsolv_output = qbsolv_comm.solve_classical(k) 54 | elif solver_mode == "hybrid": 55 | # QBSolv Hybrid solver specific input 56 | s3_task_prefix = f"jobs/{job_name}/tasks" # the folder name in the S3 braket bucket to save QBSolv task output 57 | s3_folder = (s3_bucket, 
s3_task_prefix) 58 | 59 | print("Executing QBSolv Hybrid solver for community detection") 60 | comm_results, qbsolv_output = qbsolv_comm.solve_hybrid(k, s3_folder, device_arn, ack_QPUcost=True) 61 | else: 62 | raise ValueError(f"Invalid qbsolv solver mode {solver_mode}. Solver mode has to be in ['classical', 'hybrid']!") 63 | 64 | log_metric( 65 | metric_name="Modularity", 66 | value=comm_results["modularity"], 67 | ) 68 | 69 | # We're done with the job, so save the result. 70 | # This will be returned in job.result() 71 | print('Save results') 72 | save_job_result({"community_results": str(comm_results), "hyperparams": str(hyperparams), "qbsolv_output": str(qbsolv_output)}) 73 | 74 | if __name__ == "__main__": 75 | main() -------------------------------------------------------------------------------- /src/qbsolv_community.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import collections 5 | import datetime 6 | import minorminer 7 | import networkx as nx 8 | import numpy as np 9 | import os 10 | import pandas as pd 11 | import pickle 12 | import time 13 | import warnings 14 | 15 | from copy import deepcopy 16 | from dwave_qbsolv import QBSolv 17 | from dwave.system.composites import FixedEmbeddingComposite 18 | from braket.ocean_plugin import BraketDWaveSampler 19 | from collections import defaultdict 20 | from networkx.algorithms import community 21 | 22 | from src.qubo_community import qubo_matrix_community_sparse, qbsolv_response_to_community 23 | 24 | 25 | def create_qubo_dict(nx_G, k, alpha=5): 26 | """ 27 | Create a QUBO matrix in dict format for community detection 28 | 29 | :param nx_G: networkX graph 30 | :param k: int, number of communities to detect 31 | :param alpha: int, constraint coefficient to force a solution with one node for one community assignment only 32 | :return: QUBO sparse matrix in dict format 33 | """ 34 | 35 | qubo_matrix_sparse = qubo_matrix_community_sparse(nx_G, k, alpha=alpha) 36 | 37 | indices = list(zip(qubo_matrix_sparse.row, qubo_matrix_sparse.col)) 38 | values = qubo_matrix_sparse.data 39 | 40 | qubo_sparse = defaultdict(int) 41 | for idx, val in zip(indices, values): 42 | qubo_sparse[idx] = val 43 | 44 | print(f"The size of the QUBO matrix in dictionary format for {k}-community is {len(qubo_sparse)}") 45 | 46 | return qubo_sparse 47 | 48 | 49 | class QbsolvCommunity(object): 50 | """ 51 | Use QBSolv to solve community detection via either the classical solver or the hybrid solver 52 | """ 53 | def __init__(self, graph, solver_limit=40, num_repeats=1, num_reads=1000, seed=1, alpha=5): 54 | 55 | """ 56 | Set hyperparameter values for QBSolv 57 | 58 | :param graph: a networkX graph 59 | :param solver_limit: int, the maximum number of variables (n) for sub-QUBOs 60 | :param num_repeats: int, the maximum iterations to repeat QBSolv solver execution to discover a new best solution 61 | :param num_reads: int, the number of times the annealing to be performed 62 | :param seed: int, random seed value 63 | :param alpha: int, the penalty coefficient to enforce assigning only one community to each node 64 | """ 65 | 66 | self.graph = graph 67 | self.solver_limit = solver_limit 68 | self.num_repeats = num_repeats 69 | self.num_reads = num_reads 70 | self.seed = seed 71 | self.alpha = alpha 72 | 73 | def solve_classical(self, num_comm): 74 | """ 75 | Call QUBO classical solver for community detection 76 | 77 | 
:param num_comm: int, number of communities to solve for 78 | :return: dict, two dictionaries for graph's community results and QBSolv reponse results 79 | """ 80 | t0 = time.time() 81 | 82 | q_dict = create_qubo_dict(self.graph, num_comm, self.alpha) 83 | 84 | # execute optimization task using QBSolv classical (run on your local notebook instance) 85 | response_classical = QBSolv().sample_qubo( 86 | q_dict, num_repeats=self.num_repeats, solver_limit=self.solver_limit, seed=self.seed 87 | ) 88 | print(f"Mode: Classical, time spent is {round(time.time()-t0, 2)} seconds for {self.num_repeats} repetitions") 89 | print(response_classical) 90 | 91 | # extract the best solution that has the lowest energy 92 | sample_classical = np.array(list(response_classical.first.sample.values())) 93 | comm_classical = qbsolv_response_to_community(self.graph, sample_classical, num_comm) 94 | 95 | return comm_classical, response_classical 96 | 97 | def solve_hybrid(self, num_comm, s3_folder, device_arn, ack_QPUcost=False): 98 | """ 99 | Call QUBO hybrid solver for community detection 100 | 101 | :param num_comm: int, number of communities to solve for 102 | :param s3_folder: str, the Amazon Braket S3 path to store solver response files 103 | :param device_arn: str, D-Wave QPU Device ARN (only needed for QBSolv Hybrid solver) 104 | :param ack_QPUcost: bool, acknowledge QPU cost associated with the QBSolv Hybrid solver for price awareness 105 | :return: dict, two dictionaries for graph's community results and QBSolv reponse results 106 | """ 107 | q_dict = create_qubo_dict(self.graph, num_comm, self.alpha) 108 | self.qpu_cost_warning() 109 | 110 | if ack_QPUcost: 111 | execution = "y" 112 | else: 113 | try: 114 | execution = input("Continue to execute QBSolv Hybrid job: Y or N?") 115 | except EOFError as error: 116 | print("\n WARNING: User didn't acknowledge QPU cost. Hybrid job execution is declined!") 117 | raise error 118 | 119 | if execution.lower() in ["y", "yes"]: 120 | t0 = time.time() 121 | system = BraketDWaveSampler(s3_folder, device_arn) 122 | 123 | # find embedding of subproblem-sized complete graph to the QPU 124 | G_sub = nx.complete_graph(self.solver_limit) 125 | embedding = minorminer.find_embedding(G_sub.edges, system.edgelist) 126 | 127 | # use the FixedEmbeddingComposite() method with a fixed embedding 128 | solver = FixedEmbeddingComposite(system, embedding) 129 | 130 | # execute optimization task using QBSolv hybrid on D-Wave QPU 131 | response_hybrid = QBSolv().sample_qubo(q_dict, solver=solver, num_repeats=self.num_repeats, 132 | solver_limit=self.solver_limit, num_reads=self.num_reads, 133 | seed=self.seed) 134 | print(f"Mode: Hybrid, time spent is {round(time.time()-t0, 2)} seconds for {self.num_repeats} repeats") 135 | print(response_hybrid) 136 | 137 | # extract the best solution that has the lowest energy 138 | sample_hybrid = np.array(list(response_hybrid.first.sample.values())) 139 | comm_hybrid = qbsolv_response_to_community(self.graph, sample_hybrid, num_comm) 140 | 141 | return comm_hybrid, response_hybrid 142 | else: 143 | raise ValueError("Hybrid job execution declined by the user!") 144 | 145 | def qpu_cost_warning(self): 146 | 147 | def custom_formatwarning(msg, *args, **kwargs): 148 | return str(msg) + '\n' 149 | 150 | warnings.formatwarning = custom_formatwarning 151 | stmt = f"\033[91mWARNING:\033[0m Additional cost for using D-Wave QPU. 
" \ 152 | f"Please evaluate potential cost before executing this QBSolv hybrid job" 153 | warnings.warn(stmt) 154 | 155 | 156 | class CommunityDetectionSolver(object): 157 | """ 158 | Call QUBO classical/hybrid solver to find community structure of a graph and save results 159 | """ 160 | def __init__(self, graph_name, graph, num_comm, solver_limit=100, num_repeats=1, num_reads=1000, 161 | seed=1, alpha=10, mode='classical', s3_folder='N/A', device_arn = 'N/A'): 162 | """ 163 | Input for graph and QBSolv hyperparameter values 164 | 165 | :param graph_name: str, the name of a graph for result saving 166 | :param graph: a networkX graph 167 | :param num_comm: int, number of communities to solve for 168 | :param solver_limit: int, the maximum number of variables (n) for sub-QUBOs 169 | :param num_repeats: int, the maximum iterations to repeat QBSolv solver execution to discover a new best solution 170 | :param num_reads: int, the number of times the annealing to be performed 171 | :param seed: int, random seed value 172 | :param alpha: int, the penalty coefficient to enforce assigning only one community to each node 173 | :param mode, str, must be either 'classical' or 'hybrid'. Determines whether the classical or hybrid solver is called 174 | :param s3_folder, str, Amazon Braket S3 folder path (only needed for QBSolv Hybrid solver) 175 | :param device_arn: str, D-Wave QPU Device ARN (only needed for QBSolv Hybrid solver) 176 | 177 | """ 178 | self.graph_name = graph_name 179 | self.graph = graph 180 | self.num_comm = num_comm 181 | self.solver_limit = solver_limit 182 | self.num_repeats = num_repeats 183 | self.num_reads = num_reads 184 | self.seed = seed 185 | self.alpha = alpha 186 | self.mode = mode 187 | self.s3_folder = s3_folder 188 | self.device_arn = device_arn 189 | 190 | def _solve_single_graph(self): 191 | """ 192 | Call QUBO classical/hybrid solver to process a single graph for community detection 193 | 194 | :param graph: a networkX graph 195 | :param num_comm: int, number of communities to solve for 196 | """ 197 | 198 | QbsolvComm = QbsolvCommunity( 199 | self.graph, self.solver_limit, self.num_repeats, self.num_reads, self.seed, self.alpha) 200 | 201 | t0 = time.time() 202 | if self.mode == 'classical': 203 | comm_results, response = QbsolvComm.solve_classical(self.num_comm) 204 | elif self.mode == 'hybrid': 205 | comm_results, response = QbsolvComm.solve_hybrid(self.num_comm, self.s3_folder, self.device_arn) 206 | else: 207 | raise ValueError(f"Invalid qbsolv mode {self.mode}. Mode has to be in ['classical', 'hybrid']!") 208 | 209 | time_spent = round(time.time()-t0, 2) 210 | 211 | return comm_results, response, time_spent 212 | 213 | def run_community_detection(self, save=False): 214 | """ 215 | Call QUBO classical/hybrid solver for community detection and save results 216 | 217 | :param save: boolean, True or False to set whether to save results locally or not 218 | :return: dict, two dictionaries: the first one 'results_graph' contains graph-level results about graph, qbsolv seetings, modularity values, 219 | and execution time; the second one 'track' contains node-level results about graph edge connections and community assignment. 
220 | """ 221 | results_graph = collections.defaultdict(list) 222 | timenow = str(datetime.datetime.now())[:19].replace(' ', '_') 223 | date_today = str(datetime.date.today()) 224 | track = {'graphs': [], 'responses': [], 'communities': []} 225 | 226 | output_parent = f'./output/{date_today}' 227 | result_file = f"result_{self.graph_name}_{timenow}_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.csv" 228 | 229 | if save and (not os.path.exists(output_parent)): 230 | print(f'Creating parent folder(s): {output_parent}') 231 | print('Will create required sub-directories quietly') 232 | os.makedirs(output_parent) 233 | 234 | # run QBSolv for community detection 235 | comm_results, response, time_spent = self._solve_single_graph() 236 | print(f"Modularity from QBSolv with {comm_results['num_comm']} communities is {round(comm_results['modularity'], 4)}") 237 | 238 | # Save graph information 239 | results_graph['graph_name'].append(self.graph_name) 240 | results_graph['total_nodes'].append(self.graph.number_of_nodes()) 241 | results_graph['num_edge'].append(self.graph.number_of_edges()) 242 | results_graph['num_comm'].append(self.num_comm) 243 | 244 | # Save results from qbsolv 245 | results_graph['modu_qbsolv'].append(comm_results['modularity']) 246 | results_graph['num_comm_qbsolv'].append(comm_results['num_comm']) 247 | results_graph['wall_time_s'].append(time_spent) 248 | results_graph['solver_limit'].append(self.solver_limit) 249 | results_graph['num_repeats'].append(self.num_repeats) 250 | results_graph['num_reads'].append(self.num_reads) 251 | results_graph['seed'].append(self.seed) 252 | results_graph['alpha'].append(self.alpha) 253 | results_graph['mode'].append(self.mode) 254 | results_graph['s3_folder'].append(self.s3_folder) 255 | 256 | graph_file = f"graph_{self.graph_name}_QBS_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.gpickle" 257 | response_file = f"response_{self.graph_name}_QBS_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.p" 258 | comm_file = f"modu_{self.graph_name}_QBS_sl{self.solver_limit}_rp{self.num_repeats}_shot{self.num_reads}_seed{self.seed}.p" 259 | 260 | track['graphs'].append((graph_file, deepcopy(self.graph))) 261 | track['responses'].append((response_file, deepcopy(response))) 262 | track['communities'].append((comm_file, deepcopy(comm_results))) 263 | 264 | if save: 265 | self._save_results(track, results_graph, output_parent, result_file) 266 | 267 | return results_graph, track 268 | 269 | @staticmethod 270 | def _save_results(track_dict, results_graph, output_parent, result_file): 271 | 272 | results_df = pd.DataFrame.from_dict(results_graph) 273 | for top_key, item_list in track_dict.items(): 274 | output_subdir = f'{output_parent}/{top_key}' 275 | if not os.path.exists(output_subdir): 276 | os.makedirs(output_subdir) 277 | for item in item_list: 278 | out_loc = f'{output_subdir}/{item[0]}' 279 | print(f'Writing to disk: {out_loc}') 280 | if top_key == 'graphs': 281 | nx.write_gpickle(item[1], out_loc) 282 | else: 283 | pickle.dump(item[1], open(out_loc, 'wb')) 284 | 285 | print(f'Writing to disk: {output_parent}/{result_file}') 286 | results_df.to_csv(f'{output_parent}/{result_file}', index=False) -------------------------------------------------------------------------------- /src/qubo_community.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import collections 5 | import networkx as nx 6 | import numpy as np 7 | import scipy.sparse as sp 8 | 9 | from networkx.algorithms import community 10 | 11 | 12 | def modularity_mx(nx_G): 13 | """ 14 | Create a sparse matrix for the modularity of a graph nx_G 15 | 16 | :param nx_G: networkX graph 17 | :return: scipy coo sparse matrix, the modularity matrix for nx_G with weighted edges 18 | """ 19 | 20 | # Generate a sparse adjacency matrix using networkx 21 | adj = nx.adjacency_matrix(nx_G, nodelist=sorted(list(nx_G)), weight='weight') 22 | 23 | # Create a numpy array for node's degree. 24 | # Here we assumed an undirected graph, so `degrees` refers to both the in-degree and out-degree 25 | sorted_degrees = sorted(nx_G.degree(weight='weight')) 26 | degrees = np.array([degree[1] for degree in sorted_degrees]) 27 | m = sum(degrees) / 2 28 | 29 | # Calculate the expected number of edges between two nodes for a null model. 30 | # Note we use sparse matrix format here but this degree matrix is a dense one by definition 31 | degrees = np.expand_dims(degrees.squeeze(), axis=0) 32 | degree_mx = sp.csr_matrix(degrees.T).multiply(sp.csr_matrix(degrees)) / (2 * m) 33 | 34 | # Create a modularity matrix and convert it into coo sparse matrix format (Torch sparse tensor compatible) 35 | modu_mx_sparse = (sp.csr_matrix(adj) - degree_mx) / (2 * m) 36 | modu_mx_sparse = sp.coo_matrix(modu_mx_sparse) 37 | 38 | return modu_mx_sparse 39 | 40 | 41 | def qubo_matrix_community_sparse(nx_G, k, alpha=5): 42 | """ 43 | Create a sparse matrix as a QUBO matrix for a graph nx_G with k-community detection 44 | 45 | :param nx_G: networkX graph 46 | :param k: int, the number of communities to detect for the graph nx_G 47 | :param alpha: float, the penalty coefficient for the constraint term in the QBUO matrix 48 | :return: scipy coo sparse matrix, the QUBO matrix to minimize for a graph nx_G with k-community detection 49 | """ 50 | 51 | # get the number of nodes for a networkx graph 52 | num_nodes = nx_G.number_of_nodes() 53 | 54 | # create the modularity matrix in coo sparse format 55 | modu_mx_sparse = modularity_mx(nx_G) 56 | 57 | # define the coefficient value for the QUBO constraint term that a node can only be in one community 58 | gamma_v = alpha / num_nodes 59 | 60 | # create sparse diagonal matrix for the linear constraint term in the QUBO matrix 61 | gamma_mx = sp.eye(num_nodes) * gamma_v 62 | 63 | # create a block diagonal matrix for k-commnuity problem where k > 2 64 | # this block diagonal matrix is for the linear constraint term in the QUBO matrix 65 | gamma_blockmatrix_sparse = sp.block_diag([gamma_mx] * k) 66 | 67 | # create a k x k sparse block matrix with each block being a diagonal matrix 68 | # this block matrix is for the quadratic constraint term in the QUBO matrix 69 | constraint_mx = [[gamma_mx] * k] * k 70 | constraint_blockmatrix_sparse = sp.bmat(constraint_mx) 71 | 72 | # create a sparse block diagonal matrix with the diagonal value equal to the modularity matrix 73 | # this is the modularity matrix for k communities in QUBO format 74 | modu_mx_sparse_k = [modu_mx_sparse] * k 75 | modu_block_sparse = sp.block_diag(modu_mx_sparse_k) 76 | 77 | # create a QUBO sparse matrix (for minimization) by combinding the modularity matrix and the constraint 78 | # term matrix for a k-community problem 79 | q_blockmatrix_sparse = -1 * modu_block_sparse + constraint_blockmatrix_sparse - 2 * gamma_blockmatrix_sparse 80 | q_blockmatrix_sparse = 
sp.coo_matrix(q_blockmatrix_sparse) 81 | 82 | return q_blockmatrix_sparse 83 | 84 | def qbsolv_response_to_community(nx_G, response_sample, k): 85 | """ 86 | Extract communities from QBSolv responses and calculate its modularity 87 | 88 | :param nx_G: networkX graph 89 | :param response_sample: QBSolv responses 90 | :param k: int, the number of communities to detect for the graph nx_G 91 | :return: dict, a dictionary of node sets as community groups and the graph modularity 92 | """ 93 | 94 | num_nodes = nx_G.number_of_nodes() 95 | 96 | # Split result out into binary assignments within each community 97 | result = response_sample.squeeze() 98 | result = result.reshape(k, num_nodes) 99 | 100 | # Extract node IDs belonging to each community, based on results 101 | communities = [] 102 | for i in range(k): 103 | node_c = np.where(result[i] == 1)[0] 104 | if len(node_c) > 0: 105 | communities.append(set(node_c)) 106 | 107 | # Check if there is multi-community assignment for a node or a node without community assignment 108 | for i in range(num_nodes): 109 | if result[:, i].sum() > 1: 110 | raise ValueError('Multi-community assignment!') 111 | break 112 | if result[:, i].sum() == 0: 113 | raise ValueError('Node without community assignment!') 114 | break 115 | 116 | # Order communities according to lowest-ID nodes in each set, ascending order 117 | communities.sort(key=min) 118 | 119 | modu = community.modularity(nx_G, communities) 120 | k = len(communities) # in case any communities returned no hits 121 | 122 | return {"modularity": modu, "num_comm": k, "comm": communities} 123 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import os 5 | import shutil 6 | from urllib.request import urlopen 7 | from zipfile import ZipFile 8 | from io import BytesIO 9 | 10 | def download_graphs(graph_url, data_dir = "./graph_data"): 11 | """ 12 | Download graph .zip files from web URL 13 | 14 | :param graph_url: dict, with a format of {'graph_name': 'url'} 15 | :param data_dir: str, the directory path to store graph data 16 | """ 17 | 18 | if not os.path.exists(data_dir): 19 | os.makedirs(data_dir) 20 | print("Created ./graph_data directory in local machine to store graph data.") 21 | 22 | for graph_name in graph_url.keys(): 23 | url = graph_url[graph_name] 24 | with urlopen(url) as zr: 25 | with ZipFile(BytesIO(zr.read())) as zf: 26 | zf.extractall(data_dir) 27 | 28 | def clean_graph_data(graph_files, data_dir = "./graph_data"): 29 | """ 30 | Clean graph data by removing header lines 31 | 32 | :param graph_files: dict, with a format of {'graph_name': {'file': str, 'lines_to_skip': int}} 33 | :param data_dir: str, the directory path to graph data 34 | """ 35 | 36 | for graph_name in graph_files.keys(): 37 | 38 | # create a subfolder for each graph and save its file with header lines removed 39 | graph_folder = os.path.join(data_dir, graph_name) 40 | if not os.path.exists(graph_folder): 41 | os.makedirs(graph_folder) 42 | 43 | raw_file = os.path.join(data_dir, graph_files[graph_name]['file']) 44 | new_file = os.path.join(graph_folder, graph_files[graph_name]['file']) 45 | 46 | with open(raw_file, 'r') as f_raw: 47 | data = f_raw.read().splitlines(True) 48 | with open(new_file, 'w') as f_new: 49 | f_new.writelines(data[graph_files[graph_name]['lines_to_skip']:]) --------------------------------------------------------------------------------
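For reference, a sketch of how the two helpers in `src/utils.py` are meant to be combined. The graph name, download URL, archive file name, and header-line count are hypothetical placeholders only; substitute the values for the dataset you actually pick from networkrepository.com.

```python
from src.utils import download_graphs, clean_graph_data

# Hypothetical inputs -- replace the URL, file name, and header count with real values.
graph_url = {"karate": "https://example.com/soc-karate.zip"}
graph_files = {"karate": {"file": "soc-karate.mtx", "lines_to_skip": 2}}

download_graphs(graph_url, data_dir="./graph_data")     # download and unzip the edge list
clean_graph_data(graph_files, data_dir="./graph_data")  # strip header lines into ./graph_data/karate/
```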