├── .github └── ISSUE_TEMPLATE │ ├── bug_report.md │ └── improvement-request.md ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Graph_Sampling.egg-info ├── PKG-INFO ├── SOURCES.txt ├── dependency_links.txt └── top_level.txt ├── Graph_Sampling ├── ForestFire.py ├── MHRW.py ├── SRW_RWF_ISRW.py ├── Snowball.py ├── TIES.py ├── __init__.py └── __pycache__ │ ├── ForestFire.cpython-37.pyc │ ├── MHRW.cpython-37.pyc │ ├── SRW_RWF_ISRW.cpython-37.pyc │ ├── Snowball.cpython-37.pyc │ ├── TIES.cpython-37.pyc │ └── __init__.cpython-37.pyc ├── LICENSE ├── README.md ├── _config.yml ├── build └── lib │ └── Graph_Sampling │ ├── ForestFire.py │ ├── MHRW.py │ ├── SRW_RWF_ISRW.py │ ├── Snowball.py │ ├── TIES.py │ └── __init__.py ├── dist ├── Graph Sampling-0.0.1.tar.gz └── Graph_Sampling-0.0.1-py3-none-any.whl ├── fb.txt ├── setup.py └── test.py /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | 5 | --- 6 | 7 | **Describe the bug** 8 | A clear and concise description of what the bug is. 9 | 10 | **To Reproduce** 11 | Steps to reproduce the behavior: 12 | 1. Go to '...' 13 | 2. Click on '....' 14 | 3. Scroll down to '....' 15 | 4. See error 16 | 17 | **Expected behavior** 18 | A clear and concise description of what you expected to happen. 19 | 20 | **Screenshots** 21 | If applicable, add screenshots to help explain your problem. 22 | 23 | **Desktop (please complete the following information):** 24 | - OS: [e.g. iOS] 25 | - Browser [e.g. chrome, safari] 26 | - Version [e.g. 22] 27 | 28 | **Smartphone (please complete the following information):** 29 | - Device: [e.g. iPhone6] 30 | - OS: [e.g. iOS8.1] 31 | - Browser [e.g. stock browser, safari] 32 | - Version [e.g. 22] 33 | 34 | **Additional context** 35 | Add any other context about the problem here. 
36 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/improvement-request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Improvement Request 3 | about: Suggest ideas or improvements for this project 4 | 5 | --- 6 | 7 | **Is your Improvement Request related to a problem? Please describe.** 8 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 9 | 10 | **Describe the solution you'd like** 11 | A clear and concise description of what you want to happen. 12 | 13 | **Describe alternatives you've considered** 14 | A clear and concise description of any alternative solutions or features you've considered. 15 | 16 | **Additional context** 17 | Add any other context or screenshots about the feature request here. 18 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 
6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 
34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at ashish.mcs16.du@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | Anyone who would like to contribute to this graph sampling package can add a new sampling technique or improve an existing one. 
2 | -------------------------------------------------------------------------------- /Graph_Sampling.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 2.1 2 | Name: Graph-Sampling 3 | Version: 0.0.1 4 | Summary: Graph Sampling Package 5 | Home-page: UNKNOWN 6 | Authors: [Kirti Jain,Ashish Aggarwal] 7 | Authors-email: [kirtij.mcs16.du@gmail.com,ashish.mcs16.du@gmail.com] 8 | License: UNKNOWN 9 | Description: A Graph Sampling package containing various approaches which samples the original graph according to different sample sizes 10 | Platform: UNKNOWN 11 | Classifier: Programming Language :: Python :: 3 12 | Classifier: License :: OSI Approved :: MIT License 13 | Classifier: Operating System :: OS Independent 14 | Description-Content-Type: text/markdown 15 | -------------------------------------------------------------------------------- /Graph_Sampling.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | README.md 2 | setup.py 3 | Graph_Sampling/ForestFire.py 4 | Graph_Sampling/MHRW.py 5 | Graph_Sampling/SRW_RWF_ISRW.py 6 | Graph_Sampling/Snowball.py 7 | Graph_Sampling/TIES.py 8 | Graph_Sampling/__init__.py 9 | Graph_Sampling.egg-info/PKG-INFO 10 | Graph_Sampling.egg-info/SOURCES.txt 11 | Graph_Sampling.egg-info/dependency_links.txt 12 | Graph_Sampling.egg-info/top_level.txt -------------------------------------------------------------------------------- /Graph_Sampling.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Graph_Sampling.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | Graph_Sampling 2 | -------------------------------------------------------------------------------- /Graph_Sampling/ForestFire.py: 
-------------------------------------------------------------------------------- 1 | import random 2 | import networkx as nx 3 | # import matplotlib.pyplot as plt 4 | # import time 5 | 6 | 7 | # G : Original Graph 8 | # size : size of the sampled graph 9 | class ForestFire(): 10 | def __init__(self): 11 | self.G1 = nx.Graph() 12 | 13 | def forestfire(self, G, size): 14 | list_nodes = list(G.nodes()) 15 | # print(len(G)) 16 | dictt = set() # nodes that have already burned 17 | random_node = random.choice(list_nodes) 18 | # print(random_node) 19 | q = set() # q = set contains the distinct values 20 | q.add(random_node) 21 | while(len(self.G1.nodes()) < size): 22 | if(len(q) > 0): 23 | initial_node = q.pop() 24 | if(initial_node not in dictt): 25 | # print(initial_node) 26 | dictt.add(initial_node) 27 | neighbours = list(G.neighbors(initial_node)) 28 | # print(list(G.neighbors(initial_node))) 29 | np = random.randint(1, len(neighbours)) if neighbours else 0 # an isolated node burns no links 30 | # print(np) 31 | # print(neighbours[:np]) 32 | for x in neighbours[:np]: 33 | if(len(self.G1.nodes()) < size): 34 | self.G1.add_edge(initial_node, x) 35 | q.add(x) 36 | else: 37 | break 38 | else: 39 | continue 40 | else: 41 | random_node = random.choice(list(set(list_nodes) - dictt)) # restart from a node that has not burned yet 42 | q.add(random_node) 43 | q.clear() 44 | return self.G1 45 | -------------------------------------------------------------------------------- /Graph_Sampling/MHRW.py: -------------------------------------------------------------------------------- 1 | import random 2 | # import time 3 | import networkx as nx 4 | # import matplotlib.pyplot as plt 5 | 6 | 7 | class MHRW(): 8 | def __init__(self): 9 | self.G1 = nx.Graph() 10 | 11 | def mhrw(self, G, node, size): 12 | dictt = {} 13 | node_list = set() 14 | node_list.add(node) 15 | parent_node = node_list.pop() 16 | dictt[parent_node] = parent_node 17 | degree_p = G.degree(parent_node) 18 | related_list = list(G.neighbors(parent_node)) 19 | node_list.update(related_list) 20 | 21 | while(len(self.G1.nodes()) < size): 22 | 
if(len(node_list) > 0): 23 | child_node = node_list.pop() 24 | p = round(random.uniform(0, 1), 4) 25 | if(child_node not in dictt): 26 | related_listt = list(G.neighbors(child_node)) 27 | degree_c = G.degree(child_node) 28 | dictt[child_node] = child_node 29 | if(p <= min(1, degree_p / degree_c) and child_node in list(G.neighbors(parent_node))): 30 | self.G1.add_edge(parent_node, child_node) 31 | parent_node = child_node 32 | degree_p = degree_c 33 | node_list.clear() 34 | node_list.update(related_listt) 35 | else: 36 | del dictt[child_node] 37 | 38 | # node_list set becomes empty or size is not reached 39 | # insert some random nodes into the set for next processing 40 | else: 41 | node_list.update(random.sample(list(set(G.nodes()) - set(self.G1.nodes())), 3)) 42 | parent_node = node_list.pop() 43 | degree_p = G.degree(parent_node) # keep the degree in step with the new parent 44 | related_list = list(G.neighbors(parent_node)) 45 | node_list.clear() 46 | node_list.update(related_list) 47 | return self.G1 48 | 49 | def induced_mhrw(self, G, size, node): 50 | sampled_graph = self.mhrw(G, node, size) 51 | induced_graph = G.subgraph(sampled_graph.nodes()) 52 | return induced_graph 53 | -------------------------------------------------------------------------------- /Graph_Sampling/SRW_RWF_ISRW.py: -------------------------------------------------------------------------------- 1 | import random 2 | # import time 3 | # import datetime 4 | # import io 5 | # import array, re, itertools 6 | import numpy as np 7 | import networkx as nx 8 | # import matplotlib.pyplot as plt 9 | # from itertools import groupby 10 | 11 | 12 | class SRW_RWF_ISRW: 13 | 14 | def __init__(self): 15 | self.growth_size = 2 16 | self.T = 100 # number of iterations 17 | # with a probability (1-fly_back_prob) select a neighbor node 18 | # with a probability fly_back_prob go back to the initial vertex 19 | self.fly_back_prob = 0.15 20 | 21 | def random_walk_sampling_simple(self, complete_graph, nodes_to_sample): 22 | complete_graph = 
nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True) 23 | # giving unique id to every node same as built-in function id 24 | for n, data in complete_graph.nodes(data=True): 25 | complete_graph.nodes[n]['id'] = n 26 | 27 | nr_nodes = len(complete_graph.nodes()) 28 | upper_bound_nr_nodes_to_sample = nodes_to_sample 29 | index_of_first_random_node = random.randint(0, nr_nodes - 1) 30 | sampled_graph = nx.Graph() 31 | 32 | sampled_graph.add_node(complete_graph.nodes[index_of_first_random_node]['id']) 33 | 34 | iteration = 1 35 | edges_before_t_iter = 0 36 | curr_node = index_of_first_random_node 37 | while sampled_graph.number_of_nodes() != upper_bound_nr_nodes_to_sample: 38 | edges = [n for n in complete_graph.neighbors(curr_node)] 39 | index_of_edge = random.randint(0, len(edges) - 1) 40 | chosen_node = edges[index_of_edge] 41 | sampled_graph.add_node(chosen_node) 42 | sampled_graph.add_edge(curr_node, chosen_node) 43 | curr_node = chosen_node 44 | iteration = iteration + 1 45 | 46 | if iteration % self.T == 0: 47 | if ((sampled_graph.number_of_edges() - edges_before_t_iter) < self.growth_size): 48 | curr_node = random.randint(0, nr_nodes - 1) 49 | edges_before_t_iter = sampled_graph.number_of_edges() 50 | return sampled_graph 51 | 52 | def random_walk_sampling_with_fly_back(self, complete_graph, nodes_to_sample, fly_back_prob): 53 | complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True) 54 | # giving unique id to every node same as built-in function id 55 | for n, data in complete_graph.nodes(data=True): 56 | complete_graph.nodes[n]['id'] = n 57 | 58 | nr_nodes = len(complete_graph.nodes()) 59 | upper_bound_nr_nodes_to_sample = nodes_to_sample 60 | 61 | index_of_first_random_node = random.randint(0, nr_nodes - 1) 62 | sampled_graph = nx.Graph() 63 | 64 | sampled_graph.add_node(complete_graph.nodes[index_of_first_random_node]['id']) 65 | 66 | iteration = 1 67 | edges_before_t_iter = 0 68 | curr_node = 
index_of_first_random_node 69 | while sampled_graph.number_of_nodes() != upper_bound_nr_nodes_to_sample: 70 | edges = [n for n in complete_graph.neighbors(curr_node)] 71 | index_of_edge = random.randint(0, len(edges) - 1) 72 | chosen_node = edges[index_of_edge] 73 | sampled_graph.add_node(chosen_node) 74 | sampled_graph.add_edge(curr_node, chosen_node) 75 | choice = np.random.choice(['prev', 'neigh'], 1, p=[fly_back_prob, 1 - fly_back_prob]) 76 | if choice == 'neigh': 77 | curr_node = chosen_node 78 | iteration = iteration + 1 79 | 80 | if iteration % self.T == 0: 81 | if ((sampled_graph.number_of_edges() - edges_before_t_iter) < self.growth_size): 82 | curr_node = random.randint(0, nr_nodes - 1) 83 | print("Choosing another random node to continue random walk ") 84 | edges_before_t_iter = sampled_graph.number_of_edges() 85 | 86 | return sampled_graph 87 | 88 | def random_walk_induced_graph_sampling(self, complete_graph, nodes_to_sample): 89 | complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True) 90 | # giving unique id to every node same as built-in function id 91 | for n, data in complete_graph.nodes(data=True): 92 | complete_graph.nodes[n]['id'] = n 93 | 94 | nr_nodes = len(complete_graph.nodes()) 95 | upper_bound_nr_nodes_to_sample = nodes_to_sample 96 | index_of_first_random_node = random.randint(0, nr_nodes - 1) 97 | 98 | Sampled_nodes = set([complete_graph.nodes[index_of_first_random_node]['id']]) 99 | 100 | iteration = 1 101 | nodes_before_t_iter = 0 102 | curr_node = index_of_first_random_node 103 | while len(Sampled_nodes) != upper_bound_nr_nodes_to_sample: 104 | edges = [n for n in complete_graph.neighbors(curr_node)] 105 | index_of_edge = random.randint(0, len(edges) - 1) 106 | chosen_node = edges[index_of_edge] 107 | Sampled_nodes.add(complete_graph.nodes[chosen_node]['id']) 108 | curr_node = chosen_node 109 | iteration = iteration + 1 110 | 111 | if iteration % self.T == 0: 112 | if ((len(Sampled_nodes) - 
nodes_before_t_iter) < self.growth_size): 113 | curr_node = random.randint(0, nr_nodes - 1) 114 | nodes_before_t_iter = len(Sampled_nodes) 115 | 116 | sampled_graph = complete_graph.subgraph(Sampled_nodes) 117 | 118 | return sampled_graph 119 | -------------------------------------------------------------------------------- /Graph_Sampling/Snowball.py: -------------------------------------------------------------------------------- 1 | # import json 2 | # import sys 3 | import random 4 | # import math 5 | # import time 6 | import networkx as nx 7 | # import matplotlib.pyplot as plt 8 | # from collections import defaultdict 9 | 10 | 11 | class Queue(): 12 | # Constructor creates a list 13 | def __init__(self): 14 | self.queue = list() 15 | 16 | # Adding elements to queue 17 | def enqueue(self, data): 18 | # Checking to avoid duplicate entry (not mandatory) 19 | if data not in self.queue: 20 | self.queue.insert(0, data) 21 | return True 22 | return False 23 | 24 | # Removing the last element from the queue 25 | def dequeue(self): 26 | if len(self.queue) > 0: 27 | return self.queue.pop() 28 | else: 29 | # plt.show() 30 | exit() 31 | 32 | # Getting the size of the queue 33 | def size(self): 34 | return len(self.queue) 35 | 36 | # printing the elements of the queue 37 | def printQueue(self): 38 | return self.queue 39 | 40 | 41 | class Snowball(): 42 | 43 | def __init__(self): 44 | self.G1 = nx.Graph() 45 | 46 | def snowball(self, G, size, k): 47 | q = Queue() 48 | list_nodes = list(G.nodes()) 49 | m = k 50 | dictt = set() 51 | while(m): 52 | id = random.sample(list(G.nodes()), 1)[0] 53 | q.enqueue(id) 54 | m = m - 1 55 | # print(q.printQueue()) 56 | while(len(self.G1.nodes()) <= size): 57 | if(q.size() > 0): 58 | id = q.dequeue() 59 | self.G1.add_node(id) 60 | if(id not in dictt): 61 | dictt.add(id) 62 | list_neighbors = list(G.neighbors(id)) 63 | if(len(list_neighbors) > k): 64 | for x in list_neighbors[:k]: 65 | q.enqueue(x) 66 | self.G1.add_edge(id, x) 67 | 
elif(len(list_neighbors) <= k and len(list_neighbors) > 0): 68 | for x in list_neighbors: 69 | q.enqueue(x) 70 | self.G1.add_edge(id, x) 71 | else: 72 | continue 73 | else: 74 | initial_nodes = random.sample(list(set(G.nodes()) - dictt), k) # reseed from nodes not visited yet 75 | no_of_nodes = len(initial_nodes) 76 | for id in initial_nodes: 77 | q.enqueue(id) 78 | return self.G1 79 | -------------------------------------------------------------------------------- /Graph_Sampling/TIES.py: -------------------------------------------------------------------------------- 1 | import random 2 | import networkx as nx 3 | # import matplotlib.pyplot as plt 4 | import math 5 | # import time 6 | # import csv 7 | # from datetime import datetime 8 | 9 | 10 | class TIES(): 11 | def __init__(self): 12 | self.G1 = nx.Graph() 13 | 14 | def ties(self, G, size, phi): 15 | V = G.nodes() 16 | # V is the node view of graph G; len(V) is the number of nodes 17 | Vs = [] 18 | # Vs collects the sampled nodes 19 | phi = round((phi * 0.01), 2) 20 | while (len(Vs)) <= math.floor(phi * len(V)): 21 | # Loop runs until the sampled fraction phi of the nodes in V has been collected. 22 | edges_sample = random.sample(list(G.edges()), 1) 23 | # Randomly samples one edge from the graph at a time 24 | for a1, a2 in edges_sample: 25 | # Both end nodes of the sampled edge are added to graph G1 26 | self.G1.add_edge(a1, a2) 27 | if (a1 not in Vs): 28 | Vs.append(a1) 29 | if (a2 not in Vs): 30 | Vs.append(a2) 31 | # Statement written just to have a check of a program 32 | 33 | for x in self.G1.nodes(): 34 | neigh = (set(self.G1.nodes()) & set(list(G.neighbors(x)))) 35 | # Check the neighbours of each sampled node; if a neighbour is also in the sampled set, the edge between them is included. 
36 | for y in neigh: 37 | # Check for every node's neighbour in sample set of nodes 38 | self.G1.add_edge(x, y) 39 | # Add edge between the sampled nodes 40 | return self.G1 41 | -------------------------------------------------------------------------------- /Graph_Sampling/__init__.py: -------------------------------------------------------------------------------- 1 | from Graph_Sampling.SRW_RWF_ISRW import SRW_RWF_ISRW 2 | from Graph_Sampling.Snowball import Snowball, Queue 3 | from Graph_Sampling.ForestFire import ForestFire 4 | from Graph_Sampling.MHRW import MHRW 5 | from Graph_Sampling.TIES import TIES -------------------------------------------------------------------------------- /Graph_Sampling/__pycache__/ForestFire.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/ForestFire.cpython-37.pyc -------------------------------------------------------------------------------- /Graph_Sampling/__pycache__/MHRW.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/MHRW.cpython-37.pyc -------------------------------------------------------------------------------- /Graph_Sampling/__pycache__/SRW_RWF_ISRW.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/SRW_RWF_ISRW.cpython-37.pyc -------------------------------------------------------------------------------- /Graph_Sampling/__pycache__/Snowball.cpython-37.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/Snowball.cpython-37.pyc -------------------------------------------------------------------------------- /Graph_Sampling/__pycache__/TIES.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/TIES.cpython-37.pyc -------------------------------------------------------------------------------- /Graph_Sampling/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Ashish Aggarwal 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Graph Sampling Package 2 | 3 | [Social Network Analysis](https://en.wikipedia.org/wiki/Social_network_analysis) (SNA) has recently been gaining more and more popularity in various domains. Unfortunately, performing SNA is not always an easy task, due to the volume of data which translates to huge network/graph, it is very time consuming and [computationally expensive](https://en.wikipedia.org/wiki/Computational_complexity) to perform analysis on these graphs. Depending on the type of task, handling graphs with even just dozens of thousands of nodes can be unfeasible, as some studies show. An intuitive solution to deal with this situation, just as in any scenario where we have a massive amount of data, is to sample the graph and then perform relevant simulation/analysis on obtained [sub-graph](https://en.wikipedia.org/wiki/Subgraph). 4 |
Graph sampling is a technique to pick a subset of vertices or edges from original graph. The biggest advantage of sampling methods are their execution efficiency so that the graph transformation procedure won’t take longer time than straightforward computation on original graph. This is a simple sampling repo that helps you find a representative sample of the original graph via different Sampling Techniques.
5 | 6 | ### 1. Sampling by Exploration 7 | Exploration or traversal (also called topology-based) approaches are based on the idea of randomly selecting one node and then exploring its neighborhood. Sampling algorithms based on this technique are: 8 | 9 | - **Simple Random Walk Sampling (SRW) :** Pick a starting node uniformly at random and then simulate a [random walk](https://people.math.osu.edu/husen.1/teaching/571/random_walks.pdf) (at each step, move to a neighboring node chosen uniformly at random) on the graph. The random walk continues until the required sample size is reached. 10 |<br/>In a disconnected graph, the walk can exhaust its component, so that no node in the component is left to add to the sample. To handle this we define a time period (`T`, 100 iterations by default) and an expected growth size for that period (`growth_size`, 2 by default). After every `T` iterations we check whether the sample has grown enough; if it has not, we pick a new node at random and continue the random walk from there. This way the sample can still reach the required size.
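The walk and its stall check can be sketched in a few lines. This is a minimal illustration, not the package's implementation: it uses a plain adjacency dict instead of a NetworkX graph so that it runs without dependencies, and the names `simple_random_walk_sample`, `check_every` and `min_growth` are hypothetical stand-ins for the time period and expected growth size described above.

```python
import random


def simple_random_walk_sample(adj, n_samples, check_every=100, min_growth=2, seed=None):
    """Collect n_samples distinct nodes by a simple random walk.

    adj maps each node to a list of its neighbours (undirected graph).
    check_every and min_growth are illustrative names for the "time period"
    and the "expected growth size".
    """
    rng = random.Random(seed)
    nodes = list(adj)
    if not 0 < n_samples <= len(nodes):
        raise ValueError("n_samples must be between 1 and the number of nodes")

    current = rng.choice(nodes)              # uniform random start
    sampled_nodes = {current}
    sampled_edges = set()
    steps = 0
    size_at_last_check = len(sampled_nodes)

    while len(sampled_nodes) < n_samples:
        if adj[current]:
            nxt = rng.choice(adj[current])   # uniform choice among neighbours
            sampled_nodes.add(nxt)
            sampled_edges.add(frozenset((current, nxt)))
            current = nxt
        else:
            # An isolated node has nowhere to go, so jump immediately.
            current = rng.choice(nodes)
            sampled_nodes.add(current)
        steps += 1

        # Every check_every steps, restart elsewhere if growth has stalled.
        if steps % check_every == 0:
            if len(sampled_nodes) - size_at_last_check < min_growth:
                current = rng.choice(nodes)
            size_at_last_check = len(sampled_nodes)

    return sampled_nodes, sampled_edges
```

Only the edges the walk actually traverses are recorded, which is why the sampled graph tends to be sparser than the original.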
11 | ```sh 12 | sampled_graph = random_walk_sampling_simple(complete_graph, nodes_to_sample) 13 | ``` 14 | 15 | - **Random Walk Sampling with Fly Back Probability (RWF) :** In *SRW*, at every stage we choose only one of the neighboring nodes to continue the random walk. Choosing only one neighbor affects graph properties such as the average degree, which in turn affects many properties that depend on it. 16 |<br/>RWF is a variation of the random walk intended to improve on this. The fly-back probability (p) makes it possible to sample more than one neighbor of an already sampled node. RWF picks a node uniformly at random as the starting point and begins a walk. At each step, with probability 1-p it selects one of the neighbors of the current node uniformly at random and moves to it; if that neighbor or the corresponding edge is not yet in the sampled graph, it is added. With probability p, the walk flies back to the starting point. This ensures that the neighborhood of a selected node is explored sufficiently. The higher the fly-back probability, the more the random walk resembles Breadth First Search.
17 |To avoid being stuck we defined a time period and an expected growth size in that period and after iterations check whether the sample growth is large enough and if not, select again the node randomly to continue random walk. This way we ensure that the sample will reach the required size.
18 | ```sh 19 | sampled_graph = random_walk_sampling_with_fly_back(complete_graph,nodes_to_sample,p) 20 | ``` 21 | - **Induced Subgraph Random Walk Sampling (ISRW) :** We observed that *SRW* and *RWF* fundamentally bias the structure of the sampled subgraph, because at every step we choose only one neighbor, uniformly at random, of the node sampled in the previous iteration. When a node is selected for inclusion in the sample, it is unlikely that all of its neighbors will also be included, so sampled degrees tend to be smaller than the original degrees. Because a random walk moves in a linear fashion, connectivity in the sampled subgraph is also quite sparse: edges are under-sampled, which leads to overestimated shortest path lengths in the sampled subgraph. Hence the conventional wisdom of selecting nodes in an unbiased manner (e.g., uniformly at random) may not yield representative subgraphs that match the properties of the original graph. 22 |<br/>We therefore use a further sampling strategy, Induced Subgraph Random Walk Sampling (ISRW), which tries to overcome the under-sampling of edges in SRW. It applies a graph induction step to SRW: every edge of the original graph that joins two sampled nodes is added to the sample, with the aim of restoring connectivity and bringing the structure closer to that of the original graph.
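The effect of the induction step is easiest to see on a tiny example. The sketch below is illustrative only: it assumes a plain adjacency dict, and `induce_edges` is a hypothetical helper name (with NetworkX, the package obtains the same result through `complete_graph.subgraph(...)`).

```python
def induce_edges(adj, sampled_nodes):
    """Return every edge of the original graph whose endpoints are both sampled."""
    induced = set()
    for u in sampled_nodes:
        for v in adj[u]:
            if v in sampled_nodes:
                induced.add(frozenset((u, v)))
    return induced


# A 4-cycle 0-1-2-3-0 with a chord between 0 and 2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}

# Suppose a walk visited 0 -> 1 -> 2, so it traversed only two edges.
walk_nodes = {0, 1, 2}
walk_edges = {frozenset((0, 1)), frozenset((1, 2))}

# Induction restores the chord 0-2 that the walk never traversed.
induced = induce_edges(adj, walk_nodes)
missing_from_walk = induced - walk_edges
```

Here the walk alone reports node 0 with degree 1, while the induced subgraph gives it degree 2, closer to its role in the original graph.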
23 | ```sh 24 | sampled_graph = random_walk_induced_graph_sampling(complete_graph, nodes_to_sample) 25 | ``` 26 | - **Snowball Sampling (SB) :** Snowball Sampling is a variant of [Breadth First Search](https://en.wikipedia.org/wiki/Breadth-first_search) in which there is a limit k on the number of neighbors of each node that are added to the sample. It begins from a random set of k nodes. The (at most k) neighbors taken from each of these nodes then form the second sampling stage. This continues until the sample size is reached. 27 |<br/>Snowball sampling is a type of sampling by exploration in which each individual in the sample is asked to name k different individuals in the population, where k is a specified integer; for example, each individual may be asked to name his "k colleagues". The individuals who were not in the random sample already but were named by individuals in it form the first stage. Each of the individuals in the first stage is then asked to name k different individuals. The individuals who were not in the random sample nor in the first stage but were named by individuals who were in the first stage form the second stage. Each of the individuals in the second stage is then asked to name k different individuals. The individuals who were not in the random sample nor in the first or second stages but were named by individuals who were in the second stage form the third stage. This procedure is continued until each of the individuals in the d-th stage has been asked to name k different individuals.
28 |Snowball Sampling starts with a set of nodes, say k. For each node, exactly k of it's neighbours are extracted. If the neighbours of a node is less than k , then all the neighbours of the node are extracted. This process continues until the required sample size is reached. In this way, a sampled graph is extracted from the original graph.
29 | 30 | ```sh 31 | sampled_graph = snowball(complete_graph, nodes_to_sample, k) 32 | ``` 33 | - **ForestFire Sampling (FF) :** Randomly pick a seed node and begin “burning” outgoing links and the corresponding nodes. If a link gets burned, the node at the other endpoint gets a chance to burn its own links. This process is repeated recursively for each burnt neighbor until no new node is selected; a new random node is then chosen to restart the process, and this continues until we obtain the desired sample size. 34 | ```sh 35 | sampled_graph = forestfire(complete_graph, nodes_to_sample) 36 | ``` 37 | - **Metropolis Hastings Random Walk Sampling (MHRW) :** This is very similar to random walk sampling. Initially, a user-selected node *v* with non-zero degree is set as the seed. We define the proposal function as *Q(v) = k_v*, the degree of node *v*. From the neighbors of node *v*, *MHRW* randomly chooses a node *w* and then generates a random number *p* from the uniform distribution *U(0, 1)*. If *p ≤ Q(v)/Q(w)*, the proposal is accepted and the sampling process moves to *w*; otherwise, it stays at node *v*. MHRW stops when the budget is reached. 38 | ```sh 39 | sampled_graph = mhrw(complete_graph, initial_seed_node, nodes_to_sample) 40 | ``` 41 | - **Induced Metropolis Hastings Random Walk Sampling (Induced-MHRW) :** This improves on MHRW sampling by applying an [induction](https://en.wikipedia.org/wiki/Induced_subgraph) step to add additional edges. 42 | ```sh 43 | sampled_graph = induced_mhrw(complete_graph, nodes_to_sample, initial_seed_node) 44 | ``` 45 | 46 | ### 2. Edge Sampling 47 | Edge sampling focuses on the selection of edges rather than nodes to populate the sample. Thus, the node selection step in an edge sampling algorithm proceeds by just sampling edges and including both end nodes whenever a particular edge is sampled. 
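Edge-based node selection, combined with the same kind of graph induction step used for the random walk variants, might be sketched as below. This is an illustration under assumptions, not the package's code: the graph is given as a plain list of `(u, v)` edge tuples, the function name is hypothetical, and because each sampled edge contributes two nodes the result can overshoot the target by one node.

```python
import random


def edge_sample_with_induction(edges, target_nodes, seed=None):
    """Pick random edges until target_nodes nodes are collected, then induce.

    edges is a list of (u, v) tuples describing an undirected graph.
    """
    rng = random.Random(seed)
    edges = list(edges)
    all_nodes = {v for edge in edges for v in edge}
    target_nodes = min(target_nodes, len(all_nodes))

    # Node selection: sample edges and keep both endpoints of each one.
    sampled_nodes = set()
    while len(sampled_nodes) < target_nodes:
        u, v = rng.choice(edges)
        sampled_nodes.update((u, v))

    # Induction: keep every original edge with both endpoints in the sample.
    induced_edges = [(u, v) for u, v in edges
                     if u in sampled_nodes and v in sampled_nodes]
    return sampled_nodes, induced_edges
```

Sampling edges rather than nodes favours high-degree nodes, since a node is reached in proportion to the number of edges it has.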
48 | - **Total Induction Edge Sampling (TIES) :** The algorithm runs iteratively: in each iteration it picks an edge at random from the original graph and adds both of its endpoints to the sampled node set, as in the classic [edge sampling](https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2743&context=cstech) approach. It stops adding nodes once a target fraction *φ* of the nodes has been collected. The algorithm then proceeds to the graph induction step, where it walks through all the edges in the original graph and forms the induced graph by adding every edge whose end-points are both already in the sampled node set. 49 | ```python 50 | sampled_graph = ties(complete_graph, nodes_to_sample, φ) 51 | ``` 52 | 53 | 54 | ### Prerequisites 55 | The Graph Sampling package requires [Python](https://www.python.org/downloads/) 3.x. If Python is not already installed on your system, follow the Python link to download it. This package also requires [NetworkX](https://networkx.github.io/documentation/latest/install.html) 2.1 or newer, which is used to create and manipulate graphs. 56 | 57 | ### Installing the development version 58 | If you have Git installed on your system, you can install the development version of the Graph Sampling package by running these commands in your terminal: 59 | ```sh 60 | $ git clone https://github.com/Ashish7129/Graph_Sampling.git 61 | $ cd Graph_Sampling 62 | $ pip install -e . 63 | ``` 64 | Alternatively, you can install the current release of the Graph Sampling package with pip.
Download the zip file, place it in the current folder, and then run the following commands to build and install the Graph Sampling package on your system: 65 | ```sh 66 | $ python setup.py sdist bdist_wheel 67 | $ pip install dist/Graph_Sampling-0.0.1-py3-none-any.whl 68 | ``` 69 | 70 | ### Usage 71 | 72 | After installing the package, import it as follows: 73 | ```python 74 | >>> import Graph_Sampling 75 | ``` 76 | ### Example 77 | See the file test.py, which shows how to call the different functions, including the type and number of arguments each one takes. For example, the snowball sampling function is called as follows: 78 | ```python 79 | >>> sampler = Graph_Sampling.Snowball() 80 | >>> sampled_subgraph = sampler.snowball(G,size,k) 81 | ``` 82 | *Here `sampler` is an instance of the class Snowball. Its snowball function takes 3 parameters:* 83 | - G : Original Graph / Whole Graph, 84 | - size: number of nodes to sample and 85 | - k: initial set of k nodes.
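To make the roles of the three parameters more concrete, here is a minimal sketch of the snowball procedure as this README describes it. It is not the package's `Snowball` class: it works on a plain adjacency dictionary instead of a NetworkX graph, and the name `snowball_sketch`, the `seed` argument, and the restart rule used when the frontier runs dry are assumptions made for this example.

```python
import random
from collections import deque


def snowball_sketch(adjacency, size, k, seed=None):
    """Hypothetical sketch: start from k seeds, take up to k neighbours each.

    adjacency -- dict mapping each node to a list of its neighbours
    size      -- number of nodes to sample (capped at the graph size)
    k         -- number of seed nodes and neighbours taken per node
    """
    rng = random.Random(seed)
    nodes = list(adjacency)
    size = min(size, len(nodes))

    queue = deque(rng.sample(nodes, min(k, len(nodes))))  # initial k seeds
    sampled_nodes, sampled_edges, expanded = set(), set(), set()

    while len(sampled_nodes) < size:
        if not queue:
            # Assumption: if the frontier is empty, restart from a random
            # node that has not been sampled yet.
            remaining = [n for n in nodes if n not in sampled_nodes]
            queue.append(rng.choice(remaining))
        node = queue.popleft()
        if node in expanded:
            continue
        expanded.add(node)
        sampled_nodes.add(node)

        neighbours = list(adjacency[node])
        # Take all neighbours when there are at most k, otherwise exactly k.
        chosen = neighbours if len(neighbours) <= k else rng.sample(neighbours, k)
        for neighbour in chosen:
            if len(sampled_nodes) >= size:
                break
            sampled_nodes.add(neighbour)
            sampled_edges.add(frozenset((node, neighbour)))
            queue.append(neighbour)

    return sampled_nodes, sampled_edges


if __name__ == "__main__":
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    print(snowball_sketch(ring, size=6, k=2, seed=1))
```

In this sketch `size` bounds the number of sampled nodes, while `k` controls both how many seeds start the process and how many neighbours each node contributes.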
86 | 87 | 88 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-cayman -------------------------------------------------------------------------------- /build/lib/Graph_Sampling/ForestFire.py: -------------------------------------------------------------------------------- 1 | import random 2 | import networkx as nx 3 | import matplotlib.pyplot as plt 4 | import time 5 | 6 | # G : Original Graph 7 | # size : size of the sampled graph 8 | class ForestFire(): 9 | def __init__(self): 10 | self.G1 = nx.Graph() 11 | 12 | def forestfire(self,G,size): 13 | list_nodes=list(G.nodes()) 14 | #print(len(G)) 15 | dictt = set() 16 | random_node = random.choice(list_nodes) # random.sample() no longer accepts a set (Python 3.11+) 17 | #print(random_node) 18 | q = set() #q = set contains the distinct values 19 | q.add(random_node) 20 | while(len(self.G1.nodes())