├── .github
    └── ISSUE_TEMPLATE
    │   ├── bug_report.md
    │   └── improvement-request.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── Graph_Sampling.egg-info
    ├── PKG-INFO
    ├── SOURCES.txt
    ├── dependency_links.txt
    └── top_level.txt
├── Graph_Sampling
    ├── ForestFire.py
    ├── MHRW.py
    ├── SRW_RWF_ISRW.py
    ├── Snowball.py
    ├── TIES.py
    ├── __init__.py
    └── __pycache__
    │   ├── ForestFire.cpython-37.pyc
    │   ├── MHRW.cpython-37.pyc
    │   ├── SRW_RWF_ISRW.cpython-37.pyc
    │   ├── Snowball.cpython-37.pyc
    │   ├── TIES.cpython-37.pyc
    │   └── __init__.cpython-37.pyc
├── LICENSE
├── README.md
├── _config.yml
├── build
    └── lib
    │   └── Graph_Sampling
    │       ├── ForestFire.py
    │       ├── MHRW.py
    │       ├── SRW_RWF_ISRW.py
    │       ├── Snowball.py
    │       ├── TIES.py
    │       └── __init__.py
├── dist
    ├── Graph Sampling-0.0.1.tar.gz
    └── Graph_Sampling-0.0.1-py3-none-any.whl
├── fb.txt
├── setup.py
└── test.py


/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Bug report
 3 | about: Create a report to help us improve
 4 | 
 5 | ---
 6 | 
 7 | **Describe the bug**
 8 | A clear and concise description of what the bug is.
 9 | 
10 | **To Reproduce**
11 | Steps to reproduce the behavior:
12 | 1. Go to '...'
13 | 2. Click on '....'
14 | 3. Scroll down to '....'
15 | 4. See error
16 | 
17 | **Expected behavior**
18 | A clear and concise description of what you expected to happen.
19 | 
20 | **Screenshots**
21 | If applicable, add screenshots to help explain your problem.
22 | 
23 | **Desktop (please complete the following information):**
24 |  - OS: [e.g. iOS]
 25 |  - Browser: [e.g. chrome, safari]
 26 |  - Version: [e.g. 22]
27 | 
28 | **Smartphone (please complete the following information):**
29 |  - Device: [e.g. iPhone6]
30 |  - OS: [e.g. iOS8.1]
 31 |  - Browser: [e.g. stock browser, safari]
 32 |  - Version: [e.g. 22]
33 | 
34 | **Additional context**
35 | Add any other context about the problem here.
36 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/improvement-request.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Improvement Request
 3 | about: Suggest ideas or improvements for this project
 4 | 
 5 | ---
 6 | 
 7 | **Is your improvement request related to a problem? Please describe.**
 8 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
 9 | 
10 | **Describe the solution you'd like**
11 | A clear and concise description of what you want to happen.
12 | 
13 | **Describe alternatives you've considered**
14 | A clear and concise description of any alternative solutions or features you've considered.
15 | 
16 | **Additional context**
17 | Add any other context or screenshots about the feature request here.
18 | 


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Contributor Covenant Code of Conduct
 2 | 
 3 | ## Our Pledge
 4 | 
 5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
 6 | 
 7 | ## Our Standards
 8 | 
 9 | Examples of behavior that contributes to creating a positive environment include:
10 | 
11 | * Using welcoming and inclusive language
12 | * Being respectful of differing viewpoints and experiences
13 | * Gracefully accepting constructive criticism
14 | * Focusing on what is best for the community
15 | * Showing empathy towards other community members
16 | 
17 | Examples of unacceptable behavior by participants include:
18 | 
19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances
20 | * Trolling, insulting/derogatory comments, and personal or political attacks
21 | * Public or private harassment
22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission
23 | * Other conduct which could reasonably be considered inappropriate in a professional setting
24 | 
25 | ## Our Responsibilities
26 | 
27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
28 | 
29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
30 | 
31 | ## Scope
32 | 
33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
34 | 
35 | ## Enforcement
36 | 
37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at ashish.mcs16.du@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
38 | 
39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
40 | 
41 | ## Attribution
42 | 
43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
44 | 
45 | [homepage]: http://contributor-covenant.org
46 | [version]: http://contributor-covenant.org/version/1/4/
47 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | Contributions are welcome: you can add a new graph sampling technique or improve an existing one.
2 | 


--------------------------------------------------------------------------------
/Graph_Sampling.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
 1 | Metadata-Version: 2.1
 2 | Name: Graph-Sampling
 3 | Version: 0.0.1
 4 | Summary: Graph Sampling Package
 5 | Home-page: UNKNOWN
 6 | Authors: [Kirti Jain,Ashish Aggarwal]
 7 | Authors-email: [kirtij.mcs16.du@gmail.com,ashish.mcs16.du@gmail.com]
 8 | License: UNKNOWN
 9 | Description: A Graph Sampling package containing various approaches which samples the original graph according to different sample sizes
10 | Platform: UNKNOWN
11 | Classifier: Programming Language :: Python :: 3
12 | Classifier: License :: OSI Approved :: MIT License
13 | Classifier: Operating System :: OS Independent
14 | Description-Content-Type: text/markdown
15 | 


--------------------------------------------------------------------------------
/Graph_Sampling.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
 1 | README.md
 2 | setup.py
 3 | Graph_Sampling/ForestFire.py
 4 | Graph_Sampling/MHRW.py
 5 | Graph_Sampling/SRW_RWF_ISRW.py
 6 | Graph_Sampling/Snowball.py
 7 | Graph_Sampling/TIES.py
 8 | Graph_Sampling/__init__.py
 9 | Graph_Sampling.egg-info/PKG-INFO
10 | Graph_Sampling.egg-info/SOURCES.txt
11 | Graph_Sampling.egg-info/dependency_links.txt
12 | Graph_Sampling.egg-info/top_level.txt


--------------------------------------------------------------------------------
/Graph_Sampling.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/Graph_Sampling.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | Graph_Sampling
2 | 


--------------------------------------------------------------------------------
/Graph_Sampling/ForestFire.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | import networkx as nx
 3 | # import matplotlib.pyplot as plt
 4 | # import time
 5 | 
 6 | 
 7 | # G : Original Graph
 8 | # size : size of the sampled graph
 9 | class ForestFire():
10 |     def __init__(self):
11 |         self.G1 = nx.Graph()
12 | 
13 |     def forestfire(self, G, size):
14 |         list_nodes = list(G.nodes())
15 |         # print(len(G))
16 |         dictt = set()
 17 |         random_node = random.choice(list_nodes)
18 |         # print(random_node)
19 |         q = set()   # q = set contains the distinct values
20 |         q.add(random_node)
21 |         while(len(self.G1.nodes()) < size):
22 |             if(len(q) > 0):
23 |                 initial_node = q.pop()
24 |                 if(initial_node not in dictt):
25 |                     # print(initial_node)
26 |                     dictt.add(initial_node)
27 |                     neighbours = list(G.neighbors(initial_node))
28 |                     # print(list(G.neighbors(initial_node)))
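 29 |                     # number of neighbours to burn: at least one, or none for an isolated node
 30 |                     n_burn = random.randint(1, len(neighbours)) if neighbours else 0
 31 |                     # burning takes the first n_burn neighbours in adjacency order
 32 |                     for x in neighbours[:n_burn]: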
33 |                         if(len(self.G1.nodes()) < size):
34 |                             self.G1.add_edge(initial_node, x)
35 |                             q.add(x)
36 |                         else:
37 |                             break
38 |                 else:
39 |                     continue
40 |             else:
 41 |                 random_node = random.choice(list(set(list_nodes) - dictt))  # restart from an unvisited node
42 |                 q.add(random_node)
43 |         q.clear()
44 |         return self.G1
45 | 


--------------------------------------------------------------------------------
/Graph_Sampling/MHRW.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | # import time
 3 | import networkx as nx
 4 | # import matplotlib.pyplot as plt
 5 | 
 6 | 
 7 | class MHRW():
 8 |     def __init__(self):
 9 |         self.G1 = nx.Graph()
10 | 
11 |     def mhrw(self, G, node, size):
12 |         dictt = {}
13 |         node_list = set()
14 |         node_list.add(node)
15 |         parent_node = node_list.pop()
16 |         dictt[parent_node] = parent_node
17 |         degree_p = G.degree(parent_node)
18 |         related_list = list(G.neighbors(parent_node))
19 |         node_list.update(related_list)
20 | 
21 |         while(len(self.G1.nodes()) < size):
22 |             if(len(node_list) > 0):
23 |                 child_node = node_list.pop()
24 |                 p = round(random.uniform(0, 1), 4)
25 |                 if(child_node not in dictt):
26 |                     related_listt = list(G.neighbors(child_node))
27 |                     degree_c = G.degree(child_node)
28 |                     dictt[child_node] = child_node
29 |                     if(p <= min(1, degree_p / degree_c) and child_node in list(G.neighbors(parent_node))):
30 |                         self.G1.add_edge(parent_node, child_node)
31 |                         parent_node = child_node
32 |                         degree_p = degree_c
33 |                         node_list.clear()
34 |                         node_list.update(related_listt)
35 |                     else:
36 |                         del dictt[child_node]
37 | 
38 |             # node_list set becomes empty or size is not reached
39 |             # insert some random nodes into the set for next processing
40 |             else:
 41 |                 node_list.update(random.sample(list(set(G.nodes()) - set(self.G1.nodes())), 3))
 42 |                 parent_node = node_list.pop()
 43 |                 degree_p = G.degree(parent_node)  # keep the acceptance ratio in sync with the new parent
44 |                 related_list = list(G.neighbors(parent_node))
45 |                 node_list.clear()
46 |                 node_list.update(related_list)
47 |         return self.G1
48 | 
49 |     def induced_mhrw(self, G, size, node):
 50 |         sampled_graph = self.mhrw(G, node, size)
51 |         induced_graph = G.subgraph(sampled_graph.nodes())
52 |         return induced_graph
53 | 


--------------------------------------------------------------------------------
/Graph_Sampling/SRW_RWF_ISRW.py:
--------------------------------------------------------------------------------
  1 | import random
  2 | # import time
  3 | # import datetime
  4 | # import io
  5 | # import array, re, itertools
  6 | import numpy as np
  7 | import networkx as nx
  8 | # import matplotlib.pyplot as plt
  9 | # from itertools import groupby
 10 | 
 11 | 
 12 | class SRW_RWF_ISRW:
 13 | 
 14 |     def __init__(self):
 15 |         self.growth_size = 2
 16 |         self.T = 100    # number of iterations
 17 |         # with a probability (1-fly_back_prob) select a neighbor node
 18 |         # with a probability fly_back_prob go back to the initial vertex
 19 |         self.fly_back_prob = 0.15
 20 | 
 21 |     def random_walk_sampling_simple(self, complete_graph, nodes_to_sample):
 22 |         complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True)
 23 |         # giving unique id to every node same as built-in function id
 24 |         for n, data in complete_graph.nodes(data=True):
 25 |             complete_graph.nodes[n]['id'] = n
 26 | 
 27 |         nr_nodes = len(complete_graph.nodes())
 28 |         upper_bound_nr_nodes_to_sample = nodes_to_sample
 29 |         index_of_first_random_node = random.randint(0, nr_nodes - 1)
 30 |         sampled_graph = nx.Graph()
 31 | 
 32 |         sampled_graph.add_node(complete_graph.nodes[index_of_first_random_node]['id'])
 33 | 
 34 |         iteration = 1
 35 |         edges_before_t_iter = 0
 36 |         curr_node = index_of_first_random_node
 37 |         while sampled_graph.number_of_nodes() < upper_bound_nr_nodes_to_sample:
 38 |             edges = [n for n in complete_graph.neighbors(curr_node)]
 39 |             index_of_edge = random.randint(0, len(edges) - 1)
 40 |             chosen_node = edges[index_of_edge]
 41 |             sampled_graph.add_node(chosen_node)
 42 |             sampled_graph.add_edge(curr_node, chosen_node)
 43 |             curr_node = chosen_node
 44 |             iteration = iteration + 1
 45 | 
 46 |             if iteration % self.T == 0:
 47 |                 if ((sampled_graph.number_of_edges() - edges_before_t_iter) < self.growth_size):
 48 |                     curr_node = random.randint(0, nr_nodes - 1)
 49 |                 edges_before_t_iter = sampled_graph.number_of_edges()
 50 |         return sampled_graph
 51 | 
 52 |     def random_walk_sampling_with_fly_back(self, complete_graph, nodes_to_sample, fly_back_prob):
 53 |         complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True)
 54 |         # giving unique id to every node same as built-in function id
 55 |         for n, data in complete_graph.nodes(data=True):
 56 |             complete_graph.nodes[n]['id'] = n
 57 | 
 58 |         nr_nodes = len(complete_graph.nodes())
 59 |         upper_bound_nr_nodes_to_sample = nodes_to_sample
 60 | 
 61 |         index_of_first_random_node = random.randint(0, nr_nodes - 1)
 62 |         sampled_graph = nx.Graph()
 63 | 
 64 |         sampled_graph.add_node(complete_graph.nodes[index_of_first_random_node]['id'])
 65 | 
 66 |         iteration = 1
 67 |         edges_before_t_iter = 0
 68 |         curr_node = index_of_first_random_node
 69 |         while sampled_graph.number_of_nodes() < upper_bound_nr_nodes_to_sample:
 70 |             edges = [n for n in complete_graph.neighbors(curr_node)]
 71 |             index_of_edge = random.randint(0, len(edges) - 1)
 72 |             chosen_node = edges[index_of_edge]
 73 |             sampled_graph.add_node(chosen_node)
 74 |             sampled_graph.add_edge(curr_node, chosen_node)
 75 |             choice = np.random.choice(['prev', 'neigh'], 1, p=[fly_back_prob, 1 - fly_back_prob])
 76 |             if choice == 'neigh':
 77 |                 curr_node = chosen_node
 78 |             iteration = iteration + 1
 79 | 
 80 |             if iteration % self.T == 0:
 81 |                 if ((sampled_graph.number_of_edges() - edges_before_t_iter) < self.growth_size):
 82 |                     curr_node = random.randint(0, nr_nodes - 1)
 83 |                     print("Choosing another random node to continue random walk ")
 84 |                 edges_before_t_iter = sampled_graph.number_of_edges()
 85 | 
 86 |         return sampled_graph
 87 | 
 88 |     def random_walk_induced_graph_sampling(self, complete_graph, nodes_to_sample):
 89 |         complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True)
 90 |         # giving unique id to every node same as built-in function id
 91 |         for n, data in complete_graph.nodes(data=True):
 92 |             complete_graph.nodes[n]['id'] = n
 93 | 
 94 |         nr_nodes = len(complete_graph.nodes())
 95 |         upper_bound_nr_nodes_to_sample = nodes_to_sample
 96 |         index_of_first_random_node = random.randint(0, nr_nodes - 1)
 97 | 
 98 |         Sampled_nodes = set([complete_graph.nodes[index_of_first_random_node]['id']])
 99 | 
100 |         iteration = 1
101 |         nodes_before_t_iter = 0
102 |         curr_node = index_of_first_random_node
103 |         while len(Sampled_nodes) != upper_bound_nr_nodes_to_sample:
104 |             edges = [n for n in complete_graph.neighbors(curr_node)]
105 |             index_of_edge = random.randint(0, len(edges) - 1)
106 |             chosen_node = edges[index_of_edge]
107 |             Sampled_nodes.add(complete_graph.nodes[chosen_node]['id'])
108 |             curr_node = chosen_node
109 |             iteration = iteration + 1
110 | 
111 |             if iteration % self.T == 0:
112 |                 if ((len(Sampled_nodes) - nodes_before_t_iter) < self.growth_size):
113 |                     curr_node = random.randint(0, nr_nodes - 1)
114 |                 nodes_before_t_iter = len(Sampled_nodes)
115 | 
116 |         sampled_graph = complete_graph.subgraph(Sampled_nodes)
117 | 
118 |         return sampled_graph
119 | 


--------------------------------------------------------------------------------
/Graph_Sampling/Snowball.py:
--------------------------------------------------------------------------------
 1 | # import json
 2 | # import sys
 3 | import random
 4 | # import math
 5 | # import time
 6 | import networkx as nx
 7 | # import matplotlib.pyplot as plt
 8 | # from collections import defaultdict
 9 | 
10 | 
11 | class Queue():
12 |     # Constructor creates a list
13 |     def __init__(self):
14 |         self.queue = list()
15 | 
16 |     # Adding elements to queue
17 |     def enqueue(self, data):
18 |         # Checking to avoid duplicate entry (not mandatory)
19 |         if data not in self.queue:
20 |             self.queue.insert(0, data)
21 |             return True
22 |         return False
23 | 
24 |     # Removing the last element from the queue
25 |     def dequeue(self):
26 |         if len(self.queue) > 0:
27 |             return self.queue.pop()
28 |         else:
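 29 |             # raise instead of calling exit(), which would terminate the interpreter
 30 |             raise IndexError("dequeue from an empty Queue")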
31 | 
32 |     # Getting the size of the queue
33 |     def size(self):
34 |         return len(self.queue)
35 | 
36 |     # printing the elements of the queue
37 |     def printQueue(self):
38 |         return self.queue
39 | 
40 | 
41 | class Snowball():
42 | 
43 |     def __init__(self):
44 |         self.G1 = nx.Graph()
45 | 
46 |     def snowball(self, G, size, k):
47 |         q = Queue()
48 |         list_nodes = list(G.nodes())
49 |         m = k
50 |         dictt = set()
51 |         while(m):
52 |             id = random.sample(list(G.nodes()), 1)[0]
53 |             q.enqueue(id)
54 |             m = m - 1
55 |         # print(q.printQueue())
 56 |         while(len(self.G1.nodes()) < size):
57 |             if(q.size() > 0):
58 |                 id = q.dequeue()
59 |                 self.G1.add_node(id)
60 |                 if(id not in dictt):
61 |                     dictt.add(id)
62 |                     list_neighbors = list(G.neighbors(id))
63 |                     if(len(list_neighbors) > k):
64 |                         for x in list_neighbors[:k]:
65 |                             q.enqueue(x)
66 |                             self.G1.add_edge(id, x)
67 |                     elif(len(list_neighbors) <= k and len(list_neighbors) > 0):
68 |                         for x in list_neighbors:
69 |                             q.enqueue(x)
70 |                             self.G1.add_edge(id, x)
71 |                 else:
72 |                     continue
73 |             else:
 74 |                 unvisited = list(set(G.nodes()) - dictt)  # reseed only from nodes not yet expanded
 75 |                 initial_nodes = random.sample(unvisited, min(k, len(unvisited)))
76 |                 for id in initial_nodes:
77 |                     q.enqueue(id)
78 |         return self.G1
79 | 


--------------------------------------------------------------------------------
/Graph_Sampling/TIES.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | import networkx as nx
 3 | # import matplotlib.pyplot as plt
 4 | import math
 5 | # import time
 6 | # import csv
 7 | # from datetime import datetime
 8 | 
 9 | 
10 | class TIES():
11 |     def __init__(self):
12 |         self.G1 = nx.Graph()
13 | 
14 |     def ties(self, G, size, phi):
 15 |         V = G.nodes()  # nodes of the original graph G
 16 |         edge_list = list(G.edges())  # random.choice needs a sequence, not an EdgeView
 17 |         Vs = []  # nodes sampled so far
 18 |         # phi is a percentage, e.g. phi=20 samples about 20% of the nodes
 19 |         phi = round((phi * 0.01), 2)
 20 |         while (len(Vs)) <= math.floor(phi * len(V)):
 21 |             # Keep sampling edges until about phi * |V| nodes have been collected
 22 |             edges_sample = [random.choice(edge_list)]
23 |             # Randomly samples one edge from a graph at a time
24 |             for a1, a2 in edges_sample:
25 |             # Nodes corresponding to sample edge are retrieved and added in Graph G1
26 |                 self.G1.add_edge(a1, a2)
27 |                 if (a1 not in Vs):
28 |                     Vs.append(a1)
29 |                 if (a2 not in Vs):
30 |                     Vs.append(a2)
 31 |         # Graph induction: add every original edge between two sampled nodes
32 | 
33 |         for x in self.G1.nodes():
34 |             neigh = (set(self.G1.nodes()) & set(list(G.neighbors(x))))
 35 |             # Neighbours of x in G that are also in the sampled set; an edge is added to each of them.
36 |             for y in neigh:
37 |             # Check for every node's neighbour in sample set of nodes
38 |                 self.G1.add_edge(x, y)
39 |                 # Add edge between the sampled nodes
40 |         return self.G1
41 | 


--------------------------------------------------------------------------------
/Graph_Sampling/__init__.py:
--------------------------------------------------------------------------------
1 | from Graph_Sampling.SRW_RWF_ISRW import SRW_RWF_ISRW
2 | from Graph_Sampling.Snowball import Snowball, Queue
3 | from Graph_Sampling.ForestFire import ForestFire
4 | from Graph_Sampling.MHRW import MHRW
5 | from Graph_Sampling.TIES import TIES


--------------------------------------------------------------------------------
/Graph_Sampling/__pycache__/ForestFire.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/ForestFire.cpython-37.pyc


--------------------------------------------------------------------------------
/Graph_Sampling/__pycache__/MHRW.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/MHRW.cpython-37.pyc


--------------------------------------------------------------------------------
/Graph_Sampling/__pycache__/SRW_RWF_ISRW.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/SRW_RWF_ISRW.cpython-37.pyc


--------------------------------------------------------------------------------
/Graph_Sampling/__pycache__/Snowball.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/Snowball.cpython-37.pyc


--------------------------------------------------------------------------------
/Graph_Sampling/__pycache__/TIES.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/TIES.cpython-37.pyc


--------------------------------------------------------------------------------
/Graph_Sampling/__pycache__/__init__.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/Graph_Sampling/__pycache__/__init__.cpython-37.pyc


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Ashish Aggarwal
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Graph Sampling Package
 2 | 
 3 | [Social Network Analysis](https://en.wikipedia.org/wiki/Social_network_analysis) (SNA) has been gaining popularity in many domains. Unfortunately, performing SNA is not always easy: large volumes of data translate into huge networks/graphs, and analysing them is time-consuming and [computationally expensive](https://en.wikipedia.org/wiki/Computational_complexity). Depending on the task, even graphs with tens of thousands of nodes can be infeasible to handle, as some studies show. An intuitive solution, as in any scenario with a massive amount of data, is to sample the graph and then run the relevant simulation/analysis on the resulting [sub-graph](https://en.wikipedia.org/wiki/Subgraph).
 4 | <p> <a href="https://en.wikipedia.org/wiki/Sampling_(statistics)">Graph sampling</a> is a technique for picking a subset of vertices or edges from the original graph. The main advantage of sampling methods is their efficiency: obtaining the sample should take less time than running the computation directly on the original graph. This repository helps you find a representative sample of the original graph using different <a href="https://cs.stanford.edu/~jure/pubs/sampling-kdd06.pdf">Sampling Techniques</a>.</p> 
 5 | 
 6 | ### 1. Sampling by Exploration
 7 | Exploration or traversal (also called topology-based) approaches randomly select one node and then explore its neighborhood. Sampling algorithms based on this technique are:
 8 | 
 9 | - **Simple Random Walk Sampling (SRW) :** Pick a starting node uniformly at random and then simulate a [random walk](https://people.math.osu.edu/husen.1/teaching/571/random_walks.pdf) (move to a neighboring node chosen uniformly at random) on the graph. The random walk continues until the required sample size is reached.
 10 |   <p> In a disconnected graph, the walk can get stuck in a component that has no more nodes to add to the sample. To handle this, we define a time period <em>T</em> (100 iterations by default) and an expected growth size for that period (2 edges by default). Every <em>T</em> iterations we check whether the sample has grown enough and, if not, restart the random walk from a randomly selected node, so the walk can escape small components and keep growing toward the required size.</p>
 11 | ```python
 12 | sampled_graph = SRW_RWF_ISRW().random_walk_sampling_simple(complete_graph, nodes_to_sample)
13 | ```
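The restart rule above can be illustrated with a small, dependency-free sketch. It is not the package's `random_walk_sampling_simple`: it assumes the graph is a plain adjacency dict (node → list of neighbours), and `srw_sample`, `T` and `growth_size` are illustrative names (the defaults 100 and 2 mirror `SRW_RWF_ISRW.__init__`). Unlike the package, it also jumps to a random node when the walk reaches a node without neighbours, and it counts every node the walk lands on, including restart targets, as sampled, so each step adds at most one node and exactly `n_sample` nodes are returned.

```python
import random


def srw_sample(adj, n_sample, T=100, growth_size=2, seed=None):
    """Simple random walk sample of an undirected graph.

    adj maps each node to a list of its neighbours (an assumed input format).
    Returns (sampled_nodes, sampled_edges); each edge is a frozenset {u, v}.
    """
    if not 1 <= n_sample <= len(adj):
        raise ValueError("n_sample must be between 1 and the number of nodes")
    rng = random.Random(seed)
    nodes = list(adj)
    curr = rng.choice(nodes)
    sampled_nodes, sampled_edges = {curr}, set()
    edges_before, iteration = 0, 0
    while len(sampled_nodes) < n_sample:
        iteration += 1
        # every T steps: did the sample gain at least growth_size new edges?
        stalled = (iteration % T == 0
                   and len(sampled_edges) - edges_before < growth_size)
        if stalled or not adj[curr]:
            # too little growth, or a dead end: restart at a uniformly random node
            curr = rng.choice(nodes)
        else:
            nxt = rng.choice(adj[curr])  # neighbour chosen uniformly at random
            sampled_edges.add(frozenset((curr, nxt)))
            curr = nxt
        # curr was either just reached or jumped to; at most one new node per step
        sampled_nodes.add(curr)
        if iteration % T == 0:
            edges_before = len(sampled_edges)
    return sampled_nodes, sampled_edges


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: [5], 5: [4]}  # two components
nodes, edges = srw_sample(adj, 5, seed=42)
print(sorted(nodes), len(edges))
```

Because the example graph has two components, reaching five nodes requires at least one stalled-growth restart into the other component.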
14 | 
 15 | - **Random Walk Sampling with Fly Back Probability (RWF) :** In *SRW*, at every stage we choose only one of the neighboring nodes to continue the random walk. Choosing only one neighboring node distorts graph properties such as the average degree, which in turn affects the many properties related to it.
 16 |   <p> <em>RWF</em> is a variation of the random walk designed to address this. The fly-back probability <em>(p)</em> makes it possible to sample more than one neighbor of an already sampled node. <em>RWF</em> picks a start node uniformly at random and begins a walk. At each step, with probability <em>1-p</em> it selects one of the current node's neighbors uniformly at random and moves to it, adding the neighbor and the corresponding edge to the sample graph if they are not already there; with probability <em>p</em> it flies back to the starting point. This lets the neighborhood of a selected node be explored more thoroughly. The higher the fly-back probability, the more the random walk resembles a <a href="https://en.wikipedia.org/wiki/Breadth-first_search">Breadth First Search</a>. In <code>random_walk_sampling_with_fly_back</code>, a neighbor and its edge are added at every step, and with probability <em>p</em> the walk stays at the current node instead of moving, so the next step samples another neighbor of the same node.</p>
 17 |   <p>To avoid getting stuck, we use the same time period and expected growth size as in <em>SRW</em>: after each period we check whether the sample has grown enough and, if not, restart the random walk from a randomly selected node.</p>
18 | ```python
19 |   sampled_graph = random_walk_sampling_with_fly_back(complete_graph, nodes_to_sample, p)
20 | ```
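A minimal plain-Python sketch of the RWF rule as described above. It assumes an undirected adjacency dict `{node: [neighbors]}`, uses a hypothetical function name `random_walk_fly_back_sample`, and borrows the restart rule with the `T`/`growth_size` defaults from `SRW_RWF_ISRW.py`. It is not the package's `random_walk_sampling_with_fly_back`; here a fly-back returns the walk to the node it started from, as the text describes.

```python
import random


def random_walk_fly_back_sample(adj, n_samples, fly_back_prob=0.15, T=100,
                                growth_size=2, seed=None):
    """Hypothetical RWF sketch: returns (sampled_nodes, sampled_edges).

    adj: undirected graph as {node: [neighbors]} (assumed input format).
    """
    if n_samples > len(adj):
        raise ValueError("cannot sample more nodes than the graph contains")
    if not 0 <= fly_back_prob < 1:
        raise ValueError("fly_back_prob must be in [0, 1)")
    rng = random.Random(seed)
    nodes = list(adj)
    start = current = rng.choice(nodes)  # uniformly random starting point
    sampled = {start}
    edges = set()
    size_at_checkpoint = len(sampled)
    step = 0
    while len(sampled) < n_samples:
        step += 1
        if step % T == 0:
            stalled = len(sampled) - size_at_checkpoint < growth_size
            size_at_checkpoint = len(sampled)
            if stalled:
                # stuck: choose a new starting point and continue from there
                start = current = rng.choice(nodes)
                sampled.add(start)
                continue
        if rng.random() < fly_back_prob or not adj[current]:
            # with probability p (or at a dead end) fly back to the starting point
            current = start
            continue
        # with probability 1 - p, move to a uniformly random neighbor
        nxt = rng.choice(adj[current])
        sampled.add(nxt)
        edges.add(frozenset((current, nxt)))
        current = nxt
    return sampled, edges
```

With `fly_back_prob=0` this reduces to a simple random walk; values close to 1 keep the walk near `start`, which is why a high fly-back probability makes RWF behave more like BFS.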
21 | - **Induced Subgraph Random Walk Sampling (ISRW) :** We observed that *SRW* and *RWF* fundamentally bias the structure of the sampled subgraph, because at every step they choose only one neighbor, uniformly at random, of the node sampled in the previous iteration. When a node is selected for inclusion in the sample, it is unlikely that all of its neighbors will also be included, so the sampled degrees of nodes tend to be smaller than their original degrees. Because the random walk moves linearly, connectivity in the sampled subgraph is also quite sparse owing to the under-sampling of edges, and this under-sampling causes shortest path lengths in the sampled subgraphs to be overestimated. Hence, the conventional approach of selecting nodes in an unbiased manner (e.g., uniformly at random) may not yield representative subgraphs that match the properties of the original graph.
22 |   <p>We therefore present a new sampling strategy, <em>Induced Subgraph Random Walk Sampling (ISRW)</em>, which tries to overcome the under-sampling of edges in <em>SRW</em>. We apply a graph <a href="https://en.wikipedia.org/wiki/Induced_subgraph">induction</a> step to <em>SRW</em>: every edge of the original graph whose endpoints are both sampled is added, with the aim of restoring connectivity and bringing the structure closer to that of the original graph.</p>
23 | ```python
24 |   sampled_graph = random_walk_induced_graph_sampling(complete_graph, nodes_to_sample)
25 | ```
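The ISRW idea, collecting nodes with a random walk and then inducing all edges among them, can be sketched as below. Assumptions: an undirected adjacency dict `{node: [neighbors]}` as input, a hypothetical function name `induced_random_walk_sample`, and restart defaults borrowed from `SRW_RWF_ISRW.py`; the package itself performs the induction with NetworkX's `subgraph`.

```python
import random


def induced_random_walk_sample(adj, n_samples, T=100, growth_size=2, seed=None):
    """Hypothetical ISRW sketch: returns (sampled_nodes, induced_edges).

    adj: undirected graph as {node: [neighbors]} (assumed input format).
    """
    if n_samples > len(adj):
        raise ValueError("cannot sample more nodes than the graph contains")
    rng = random.Random(seed)
    nodes = list(adj)
    current = rng.choice(nodes)
    sampled = {current}
    size_at_checkpoint = len(sampled)
    step = 0
    # 1) node selection: a simple random walk that only records nodes
    while len(sampled) < n_samples:
        step += 1
        stuck = not adj[current]
        if step % T == 0:
            stuck = stuck or len(sampled) - size_at_checkpoint < growth_size
            size_at_checkpoint = len(sampled)
        if stuck:
            current = rng.choice(nodes)  # restart from a random node
        else:
            current = rng.choice(adj[current])
        sampled.add(current)
    # 2) induction: keep every original edge whose endpoints were both sampled
    induced = {frozenset((u, v)) for u in sampled for v in adj[u] if v in sampled}
    return sampled, induced
```

On a complete graph with four nodes, a walk that stops at three nodes can traverse at most two distinct edges, yet the induction step returns all three edges of the triangle.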
26 | - **Snowball Sampling (SB) :** Snowball Sampling is a variant of [Breadth First Search](https://en.wikipedia.org/wiki/Breadth-first_search) that limits the number of neighbors, <em>k</em>, added to the sample from each node. It begins from a random set of <em>k</em> nodes; up to <em>k</em> neighbors of each of these nodes are then added, forming the second sampling stage. This continues until the sample reaches the required size.
27 |   <p>Snowball sampling is a form of sampling by exploration in which each individual in the sample is asked to name <em>k</em> different individuals in the population, where <em>k</em> is a specified integer; for example, each individual may be asked to name their <em>k</em> colleagues. The individuals who were not already in the random sample but were named by individuals in it form the first stage. Each individual in the first stage is then asked to name <em>k</em> different individuals. The individuals who were neither in the random sample nor in the first stage but were named by individuals in the first stage form the second stage. Each individual in the second stage is then asked to name <em>k</em> different individuals, and those named who are not already in the random sample or in the first or second stages form the third stage. This procedure continues until each individual in the <em>d</em>-th stage has been asked to name <em>k</em> different individuals.</p>
28 |   <p>In graph terms, Snowball Sampling starts with a set of <em>k</em> nodes. For each node, exactly <em>k</em> of its neighbors are extracted; if a node has fewer than <em>k</em> neighbors, all of them are extracted. This process continues until the required sample size is reached, yielding a sampled graph extracted from the original graph.</p>
29 | 
30 | ```python
31 |   sampled_graph = snowball(complete_graph, nodes_to_sample, k) 
32 | ```
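A minimal plain-Python sketch of k-limited snowball sampling, assuming an undirected adjacency dict `{node: [neighbors]}` as input; `snowball_sample` is a hypothetical name. Two choices here are assumptions rather than the package's behavior: each node names up to `k` neighbors chosen at random (`Snowball.snowball` takes the first `k` in adjacency order), and when the queue runs dry, up to `k` fresh unsampled seeds are drawn.

```python
import random
from collections import deque


def snowball_sample(adj, n_samples, k, seed=None):
    """Hypothetical snowball sketch: k-limited breadth-first exploration.

    adj: undirected graph as {node: [neighbors]} (assumed input format).
    Returns (sampled_nodes, sampled_edges); each edge links a node to the
    already-sampled node that named it.
    """
    if n_samples > len(adj):
        raise ValueError("cannot sample more nodes than the graph contains")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    sampled, edges = set(), set()
    queue = deque()                      # FIFO of (node, node_that_named_it)
    while len(sampled) < n_samples:
        if not queue:
            # start (or restart) the snowball from up to k random unsampled seeds
            unsampled = [v for v in adj if v not in sampled]
            seeds = rng.sample(unsampled, min(k, len(unsampled)))
            queue.extend((v, None) for v in seeds)
        node, parent = queue.popleft()
        if node in sampled:
            continue
        sampled.add(node)
        if parent is not None:
            edges.add(frozenset((parent, node)))
        # the new node "names" at most k of its neighbors for the next stage
        named = rng.sample(adj[node], min(k, len(adj[node])))
        queue.extend((nb, node) for nb in named if nb not in sampled)
    return sampled, edges
```

Because each sampled node records only the edge to the node that named it, the result is a forest; an induction step, as in ISRW, would recover the remaining edges between sampled nodes.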
33 | - **ForestFire Sampling (FF) :** Randomly pick a seed node and begin “burning” its outgoing links and the corresponding nodes. If a link gets burned, the node at the other endpoint gets a chance to burn its own links. This process is repeated recursively for each burnt neighbor until no new node is burnt; a new random node is then chosen to restart the fire, and this continues until the desired sample size is reached. In this implementation, each burnt node burns the links to its first <em>n</em> neighbors, where <em>n</em> is drawn uniformly at random between 1 and the node's degree.
34 | ```python
35 |   sampled_graph = forestfire(complete_graph, nodes_to_sample) 
36 | ```
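The burning process can be sketched as follows, assuming an undirected adjacency dict `{node: [neighbors]}` as input; `forest_fire_sample` is a hypothetical name. As in `ForestFire.py`, the number of links a burnt node burns is drawn uniformly at random starting from 1, but this sketch draws it among the node's still-unburnt neighbors and picks which ones at random, so it is an illustration rather than a copy of the package code.

```python
import random


def forest_fire_sample(adj, n_samples, seed=None):
    """Hypothetical forest-fire sketch: returns (burnt_nodes, burnt_edges).

    adj: undirected graph as {node: [neighbors]} (assumed input format).
    """
    if n_samples > len(adj):
        raise ValueError("cannot sample more nodes than the graph contains")
    rng = random.Random(seed)
    burnt, edges = set(), set()
    to_spread = []                       # burnt nodes whose links may still burn
    while len(burnt) < n_samples:
        if not to_spread:
            # the fire died out: ignite a new random unburnt node
            start = rng.choice([v for v in adj if v not in burnt])
            burnt.add(start)
            to_spread.append(start)
            continue
        node = to_spread.pop()
        unburnt = [v for v in adj[node] if v not in burnt]
        if not unburnt:
            continue
        # burn a uniformly random number (1..len(unburnt)) of the unburnt neighbors
        for nb in rng.sample(unburnt, rng.randint(1, len(unburnt))):
            if len(burnt) == n_samples:
                break
            burnt.add(nb)
            edges.add(frozenset((node, nb)))
            to_spread.append(nb)
    return burnt, edges
```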
37 | - **Metropolis Hastings Random Walk Sampling (MHRW) :** This is very similar to random walk sampling. Initially, a user-selected node *v* with non-zero degree is set as the seed. We define the proposal function as *Q(v) = k<sub>v</sub>*, the degree of node *v*. From node *v*'s neighbors, *MHRW* randomly chooses a node *w* and then draws a random number *p* from the uniform distribution *U(0, 1)*. If *p ≤ Q(v)/Q(w)*, the proposal is accepted and the sampling process moves to *w*; otherwise, it stays at node *v*. MHRW stops when the sampling budget (the required number of nodes) is reached.
38 | ```python
39 |   sampled_graph = mhrw(complete_graph, nodes_to_sample, initial_seed_node) 
40 | ```
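As a worked example of the acceptance rule: moving from a hub with degree 10 to a leaf with degree 1 gives *Q(v)/Q(w) = 10*, so the move is always accepted, while moving from the leaf back to the hub is accepted only with probability 1/10. This penalizes moves toward high-degree nodes and counteracts the plain random walk's bias toward them. The sketch below implements the rule in plain Python, assuming an undirected adjacency dict `{node: [neighbors]}` as input; `mhrw_sample` and its `max_steps` safety cap are hypothetical, not part of the package.

```python
import random


def mhrw_sample(adj, n_samples, seed_node, max_steps=100_000, seed=None):
    """Hypothetical MHRW sketch: returns (sampled_nodes, traversed_edges).

    adj: undirected graph as {node: [neighbors]} (assumed input format).
    Stops when n_samples nodes are collected or after max_steps proposals.
    """
    if not adj[seed_node]:
        raise ValueError("the seed node must have non-zero degree")
    rng = random.Random(seed)
    current = seed_node
    sampled, edges = {current}, set()
    for _ in range(max_steps):
        if len(sampled) >= n_samples:
            break
        candidate = rng.choice(adj[current])      # proposal w from v's neighbors
        # accept if p <= Q(v)/Q(w), i.e. with probability min(1, deg(v)/deg(w))
        if rng.random() <= len(adj[current]) / len(adj[candidate]):
            edges.add(frozenset((current, candidate)))
            sampled.add(candidate)
            current = candidate
        # otherwise the walk stays at the current node
    return sampled, edges
```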
41 | - **Induced Metropolis Hastings Random Walk Sampling (Induced-MHRW) :** This improves MHRW sampling by applying an [induction](https://en.wikipedia.org/wiki/Induced_subgraph) step: every edge of the original graph whose endpoints are both in the MHRW sample is added to the sampled graph.
42 | ```python
43 |   sampled_graph = induced_mhrw(complete_graph, nodes_to_sample, initial_seed_node) 
44 | ```
45 | 
46 | ### 2. Edge Sampling 
47 | Edge sampling selects edges, rather than nodes, to populate the sample: nodes enter the sample only through sampled edges, and whenever an edge is sampled, both of its endpoints are included.
48 | - **Total Induction Edge Sampling (TIES) :** The algorithm runs iteratively: in each iteration it picks an edge at random from the original graph and adds both of its endpoints to the sampled node set, as in the classic [edge sampling](https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2743&context=cstech) approach. It stops adding nodes once a target fraction *φ* of the nodes has been collected. The algorithm then proceeds to the graph induction step, where it walks through all the edges of the graph and forms the induced graph by adding every edge whose endpoints are both in the sampled node set. (In `TIES.py`, the *φ* argument is multiplied by 0.01, i.e. it is interpreted as a percentage, and the `size` argument is currently unused.)
49 | ```python
50 |   sampled_graph = ties(complete_graph, nodes_to_sample, φ)
51 | ```
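A minimal plain-Python sketch of TIES, taking an edge list as input; `ties_sample` is a hypothetical name. Here *φ* is a fraction in (0, 1], following the description above, and the node set is taken from the edge list, so isolated nodes are ignored (edge sampling can never reach them anyway).

```python
import math
import random


def ties_sample(edge_list, phi, seed=None):
    """Hypothetical TIES sketch: returns (sampled_nodes, induced_edges).

    edge_list: list of (u, v) pairs of an undirected graph (assumed input).
    phi: target fraction of nodes to collect, 0 < phi <= 1.
    """
    if not 0 < phi <= 1:
        raise ValueError("phi must be in (0, 1]")
    rng = random.Random(seed)
    nodes = {n for edge in edge_list for n in edge}
    target = math.ceil(phi * len(nodes))
    sampled = set()
    # 1) edge sampling: add both endpoints of random edges until the target is met
    while len(sampled) < target:
        u, v = rng.choice(edge_list)
        sampled.update((u, v))
    # 2) total induction: keep every edge whose endpoints are both sampled
    induced = {frozenset((u, v)) for u, v in edge_list
               if u in sampled and v in sampled}
    return sampled, induced
```

Since each sampled edge can add two nodes at once, the node set may overshoot the target by one.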
52 |   
53 |   
54 | ### Pre-requisite
55 | The Graph Sampling package requires [Python](https://www.python.org/downloads/) 3.x. If Python is not already installed on your system, download it from the Python link. The package also requires [NetworkX](https://networkx.github.io/documentation/latest/install.html) 2.1 or newer, which is used to create and manipulate the graphs. The sampling modules also import NumPy and Matplotlib, so these must be installed as well.
56 | 
57 | ### Installing the development version
58 | If you have Git installed on your system, you can also install the development version of the Graph Sampling package by running these commands in your terminal:
59 | ```sh
60 | $ git clone https://github.com/Ashish7129/Graph_Sampling.git
61 | $ cd Graph_Sampling
62 | $ pip install -e .
63 | ```
64 | Alternatively, you can build and install the current release with pip: download the repository as a zip file, extract it into the current folder, and run the following commands from inside it:
65 | ```sh
66 | $ python setup.py sdist bdist_wheel
67 | $ pip install dist/Graph_Sampling-0.0.1-py3-none-any.whl
68 | ```
69 |  
70 | ### Usage
71 | 
72 | After installing the package, import it with:
73 | ```python
74 | >>> import Graph_Sampling 
75 | ```
76 | ### Example
77 | See the file `test.py` for how to call each sampling function and which arguments it expects. For example, snowball sampling is run as follows:
78 | ```python
79 | >>> sampler = Graph_Sampling.Snowball()
80 | >>> sampled_subgraph = sampler.snowball(G, size, k)
81 | ```
82 | *Here `sampler` is an instance of the class `Snowball`, whose `snowball` method takes three parameters:*
83 |   - G: the original (whole) graph,
84 |   - size: the number of nodes to sample, and
85 |   - k: the number of initial seed nodes, which is also the maximum number of neighbors taken from each sampled node.
86 | 
87 | 
88 | 


--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman


--------------------------------------------------------------------------------
/build/lib/Graph_Sampling/ForestFire.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | import networkx as nx
 3 | import matplotlib.pyplot as plt
 4 | import time
 5 | 
 6 | # G : Original Graph
 7 | # size : size of the sampled graph
 8 | class ForestFire():
 9 |     def __init__(self):
10 |         self.G1 = nx.Graph()
11 | 
12 |     def forestfire(self,G,size):
13 |         list_nodes=list(G.nodes())
14 |         #print(len(G))
15 |         dictt = set()
16 |         random_node = random.choice(list_nodes)
17 |         #print(random_node)
18 |         q = set() #q = set contains the distinct values
19 |         q.add(random_node)
20 |         while(len(self.G1.nodes())<size):
21 |             if(len(q)>0):
22 |                 initial_node = q.pop()
23 |                 if(initial_node not in dictt):
24 |                     #print(initial_node)
25 |                     dictt.add(initial_node)
26 |                     neighbours = list(G.neighbors(initial_node))
27 |                     #print(list(G.neighbors(initial_node)))
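28 |                     # burn a random number (1..degree) of the neighbours; isolated nodes burn none
29 |                     n_burn = random.randint(1,len(neighbours)) if neighbours else 0
30 |                     #print(neighbours[:n_burn])
31 |                     for x in neighbours[:n_burn]: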
32 |                         if(len(self.G1.nodes())<size):
33 |                             self.G1.add_edge(initial_node,x)
34 |                             q.add(x)
35 |                         else:
36 |                             break
37 |                 else:
38 |                     continue
39 |             else:
40 |                 random_node = random.choice(list(set(list_nodes) - dictt))  # restart from an unvisited node
41 |                 q.add(random_node)
42 |         q.clear()
43 |         return self.G1
44 | 
45 | 
46 | 
47 | 
48 | 


--------------------------------------------------------------------------------
/build/lib/Graph_Sampling/MHRW.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | import time
 3 | import networkx as nx
 4 | import matplotlib.pyplot as plt
 5 | 
 6 | class MHRW():
 7 |     def __init__(self):
 8 |         self.G1 = nx.Graph()
 9 | 
10 |     def mhrw(self,G,size,node):
11 |         dictt = {}
12 |         node_list = set()
13 |         node_list.add(node)
14 |         parent_node = node_list.pop()
15 |         dictt[parent_node] = parent_node
16 |         degree_p = G.degree(parent_node)
17 |         related_list = list(G.neighbors(parent_node))
18 |         node_list.update(related_list)
19 | 
20 |         while(len(self.G1.nodes()) < size):
21 |             if(len(node_list) > 0):
22 |                 child_node = node_list.pop()
23 |                 p =  round(random.uniform(0,1),4)
24 |                 if(child_node not in dictt):
25 |                     related_listt = list(G.neighbors(child_node))
26 |                     degree_c = G.degree(child_node)
27 |                     dictt[child_node] = child_node
28 |                     if(p <= min(1,degree_p/degree_c) and child_node in list(G.neighbors(parent_node))):
29 |                         self.G1.add_edge(parent_node,child_node)
30 |                         parent_node = child_node
31 |                         degree_p = degree_c
32 |                         node_list.clear()
33 |                         node_list.update(related_listt)
34 |                     else:
35 |                         del dictt[child_node]
36 | 
37 | 
38 |             # node_list set becomes empty or size is not reached 
39 |             # insert some random nodes into the set for next processing
40 |             else:
41 |                 node_list.update(random.sample(list(set(G.nodes())-set(self.G1.nodes())),3))
42 |                 parent_node = node_list.pop()
43 |                 degree_p = G.degree(parent_node)  # refresh the degree of the new parent node
44 |                 related_list = list(G.neighbors(parent_node))
45 |                 node_list.clear()
46 |                 node_list.update(related_list)
47 |         return self.G1
48 |     
49 |     def induced_mhrw(self,G,size,node):
50 |         sampled_graph = self.mhrw(G,size=size,node=node)
51 |         induced_graph = G.subgraph(sampled_graph.nodes())
52 |         return induced_graph
53 | 


--------------------------------------------------------------------------------
/build/lib/Graph_Sampling/SRW_RWF_ISRW.py:
--------------------------------------------------------------------------------
  1 | import random
  2 | import time
  3 | import datetime
  4 | import io
  5 | import array,re,itertools
  6 | import numpy as np
  7 | import networkx as nx
  8 | import matplotlib.pyplot as plt
  9 | from itertools import groupby
 10 | 
 11 | class SRW_RWF_ISRW:
 12 | 
 13 |     def __init__(self):
 14 |         self.growth_size = 2
 15 |         self.T = 100 #number of iterations
 16 |         #with a probability (1-fly_back_prob) select a neighbor node
 17 |         #with a probability fly_back_prob go back to the initial vertex
 18 |         self.fly_back_prob = 0.15
 19 | 
 20 |     def random_walk_sampling_simple(self,complete_graph, nodes_to_sample):
 21 |         complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True)
 22 |         # giving unique id to every node same as built-in function id
 23 |         for n, data in complete_graph.nodes(data=True):
 24 |             complete_graph.nodes[n]['id'] = n
 25 | 
 26 |         nr_nodes = len(complete_graph.nodes())
 27 |         upper_bound_nr_nodes_to_sample = nodes_to_sample
 28 |         index_of_first_random_node = random.randint(0, nr_nodes-1)
 29 |         sampled_graph = nx.Graph()
 30 | 
 31 |         sampled_graph.add_node(complete_graph.nodes[index_of_first_random_node]['id'])
 32 | 
 33 |         iteration = 1
 34 |         edges_before_t_iter = 0
 35 |         curr_node = index_of_first_random_node
 36 |         while sampled_graph.number_of_nodes() < upper_bound_nr_nodes_to_sample:
 37 |             edges = [n for n in complete_graph.neighbors(curr_node)]
 38 |             index_of_edge = random.randint(0, len(edges) - 1)
 39 |             chosen_node = edges[index_of_edge]
 40 |             sampled_graph.add_node(chosen_node)
 41 |             sampled_graph.add_edge(curr_node, chosen_node)
 42 |             curr_node = chosen_node
 43 |             iteration = iteration+1
 44 | 
 45 |             if iteration % self.T == 0:
 46 |                 if ((sampled_graph.number_of_edges() - edges_before_t_iter) < self.growth_size):
 47 |                     curr_node = random.randint(0, nr_nodes-1)
 48 |                 edges_before_t_iter = sampled_graph.number_of_edges()
 49 |         return sampled_graph
 50 | 
 51 |     def random_walk_sampling_with_fly_back(self,complete_graph, nodes_to_sample, fly_back_prob):
 52 |         complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True)
 53 |         # giving unique id to every node same as built-in function id
 54 |         for n, data in complete_graph.nodes(data=True):
 55 |             complete_graph.nodes[n]['id'] = n
 56 | 
 57 |         nr_nodes = len(complete_graph.nodes())
 58 |         upper_bound_nr_nodes_to_sample = nodes_to_sample
 59 | 
 60 |         index_of_first_random_node = random.randint(0, nr_nodes-1)
 61 |         sampled_graph = nx.Graph()
 62 | 
 63 |         sampled_graph.add_node(complete_graph.nodes[index_of_first_random_node]['id'])
 64 | 
 65 |         iteration = 1
 66 |         edges_before_t_iter = 0
 67 |         curr_node = index_of_first_random_node
 68 |         while sampled_graph.number_of_nodes() < upper_bound_nr_nodes_to_sample:
 69 |             edges = [n for n in complete_graph.neighbors(curr_node)]
 70 |             index_of_edge = random.randint(0, len(edges) - 1)
 71 |             chosen_node = edges[index_of_edge]
 72 |             sampled_graph.add_node(chosen_node)
 73 |             sampled_graph.add_edge(curr_node, chosen_node)
 74 |             choice = np.random.choice(['prev','neigh'], 1, p=[fly_back_prob,1-fly_back_prob])
 75 |             if choice == 'neigh':
 76 |                 curr_node = chosen_node
 77 |             iteration=iteration+1
 78 | 
 79 |             if iteration % self.T == 0:
 80 |                 if ((sampled_graph.number_of_edges() - edges_before_t_iter) < self.growth_size):
 81 |                     curr_node = random.randint(0, nr_nodes-1)
 82 |                     print ("Choosing another random node to continue random walk ")
 83 |                 edges_before_t_iter = sampled_graph.number_of_edges()
 84 | 
 85 |         return sampled_graph
 86 | 
 87 |     def random_walk_induced_graph_sampling(self, complete_graph, nodes_to_sample):
 88 |         complete_graph = nx.convert_node_labels_to_integers(complete_graph, 0, 'default', True)
 89 |         # giving unique id to every node same as built-in function id
 90 |         for n, data in complete_graph.nodes(data=True):
 91 |             complete_graph.nodes[n]['id'] = n
 92 |             
 93 |         nr_nodes = len(complete_graph.nodes())
 94 |         upper_bound_nr_nodes_to_sample = nodes_to_sample
 95 |         index_of_first_random_node = random.randint(0, nr_nodes - 1)
 96 | 
 97 |         Sampled_nodes = set([complete_graph.nodes[index_of_first_random_node]['id']])
 98 | 
 99 |         iteration = 1
100 |         nodes_before_t_iter = 0
101 |         curr_node = index_of_first_random_node
102 |         while len(Sampled_nodes) != upper_bound_nr_nodes_to_sample:
103 |             edges = [n for n in complete_graph.neighbors(curr_node)]
104 |             index_of_edge = random.randint(0, len(edges) - 1)
105 |             chosen_node = edges[index_of_edge]
106 |             Sampled_nodes.add(complete_graph.nodes[chosen_node]['id'])
107 |             curr_node = chosen_node
108 |             iteration=iteration+1
109 | 
110 |             if iteration % self.T == 0:
111 |                 if ((len(Sampled_nodes) - nodes_before_t_iter) < self.growth_size):
112 |                     curr_node = random.randint(0, nr_nodes - 1)
113 |                 nodes_before_t_iter = len(Sampled_nodes)
114 | 
115 |         sampled_graph = complete_graph.subgraph(Sampled_nodes)
116 | 
117 |         return sampled_graph
118 | 


--------------------------------------------------------------------------------
/build/lib/Graph_Sampling/Snowball.py:
--------------------------------------------------------------------------------
 1 | import json
 2 | import sys
 3 | import random
 4 | import math 
 5 | import time
 6 | import networkx as nx
 7 | import matplotlib.pyplot as plt
 8 | from collections import defaultdict
 9 | 
10 | 
11 | class Queue():
12 |     #Constructor creates a list
13 |     def __init__(self):
14 |         self.queue = list()
15 |     
16 |     #Adding elements to queue
17 |     def enqueue(self,data):
18 |         #Checking to avoid duplicate entry (not mandatory)
19 |         if data not in self.queue:
20 |             self.queue.insert(0,data)
21 |             return True
22 |         return False
23 |     
24 |     #Removing the last element from the queue
25 |     def dequeue(self):
26 |         if len(self.queue)>0:
27 |             return self.queue.pop()
28 |         else:
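29 |             #queue is empty: signal the caller instead of terminating the interpreter
30 |             raise IndexError("dequeue from an empty queue")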
31 |     
32 |     #Getting the size of the queue
33 |     def size(self):
34 |         return len(self.queue)
35 |     
36 |     #printing the elements of the queue
37 |     def printQueue(self):
38 |         return self.queue
39 | 
40 | class Snowball():
41 | 
42 |     def __init__(self):
43 |         self.G1 = nx.Graph()
44 | 
45 |     def snowball(self,G,size,k):
46 |         q=Queue() 
47 |         list_nodes=list(G.nodes())
48 |         m = k
49 |         dictt = set()
50 |         while(m):
51 |             id = random.sample(list(G.nodes()),1)[0]
52 |             q.enqueue(id)
53 |             m = m - 1
54 |         #print(q.printQueue())
55 |         while(len(self.G1.nodes()) < size):
56 |             if(q.size() > 0):
57 |                 id = q.dequeue()
58 |                 self.G1.add_node(id)
59 |                 if(id not in dictt):
60 |                     dictt.add(id)
61 |                     list_neighbors = list(G.neighbors(id))
62 |                     if(len(list_neighbors) > k):
63 |                         for x in list_neighbors[:k]:
64 |                             q.enqueue(x)
65 |                             self.G1.add_edge(id,x)
66 |                     elif(len(list_neighbors) <= k and len(list_neighbors) > 0):
67 |                         for x in list_neighbors:
68 |                             q.enqueue(x)
69 |                             self.G1.add_edge(id,x)
70 |                 else:
71 |                     continue
72 |             else:
73 |                 unvisited = list(set(G.nodes()) - dictt)  #restart from nodes that have not been expanded yet
74 |                 initial_nodes = random.sample(unvisited,min(k,len(unvisited)))
75 |                 for id in initial_nodes:
76 |                     q.enqueue(id) 
77 |         return self.G1
78 | 
79 | 
80 | 


--------------------------------------------------------------------------------
/build/lib/Graph_Sampling/TIES.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | import networkx as nx
 3 | import matplotlib.pyplot as plt
 4 | import math
 5 | import time
 6 | import csv
 7 | from datetime import datetime
 8 | 
 9 | class TIES():
10 |     def __init__(self):
11 |         self.G1 = nx.Graph()
12 | 
13 |     def ties(self,G,size,phi):
14 |         V = G.nodes()
15 |         #V: nodes of the original graph G
16 |         Vs = []
17 |         #Vs: sampled node set
18 |         edge_list = list(G.edges())  #random.sample needs a sequence, not an EdgeView
19 |         phi = phi * 0.01  #phi is given as a percentage of the nodes of G
20 |         while (len(Vs)) <= math.floor(phi * len(V)):
21 |             #Randomly sample one edge from the graph at a time
22 |             edges_sample = random.sample(edge_list, 1)
23 |             for a1, a2 in edges_sample:													
24 |             #Nodes corresponding to sample edge are retrieved and added in Graph G1
25 |                 self.G1.add_edge(a1, a2)
26 |                 if (a1 not in Vs):
27 |                     Vs.append(a1)
28 |                 if (a2 not in Vs):
29 |                     Vs.append(a2)															
30 |         #Induction step: add every edge of G whose endpoints are both in the sample
31 |         
32 |         for x in self.G1.nodes():
33 |             neigh = (set(self.G1.nodes()) & set(list(G.neighbors(x))))						
34 |             #Check the neighbours of each sampled node; if a neighbour is also in the sampled set, the edge between them is included.
35 |             for y in neigh:																
36 |             #Check for every node's neighbour in sample set of nodes
37 |                 self.G1.add_edge(x, y)														
38 |                 #Add edge between the sampled nodes
39 |         return self.G1
40 |         


--------------------------------------------------------------------------------
/build/lib/Graph_Sampling/__init__.py:
--------------------------------------------------------------------------------
1 | from Graph_Sampling.SRW_RWF_ISRW import SRW_RWF_ISRW
2 | from Graph_Sampling.Snowball import Snowball, Queue
3 | from Graph_Sampling.ForestFire import ForestFire
4 | from Graph_Sampling.MHRW import MHRW
5 | from Graph_Sampling.TIES import TIES


--------------------------------------------------------------------------------
/dist/Graph Sampling-0.0.1.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/dist/Graph Sampling-0.0.1.tar.gz


--------------------------------------------------------------------------------
/dist/Graph_Sampling-0.0.1-py3-none-any.whl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashish7129/Graph_Sampling/d5f16bf45d0dfe89f0b281d71b38cb70648d3065/dist/Graph_Sampling-0.0.1-py3-none-any.whl


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | import setuptools
 2 | 
 3 | with open("README.md", "r",encoding="utf8") as fh:
 4 |     long_description = fh.read()
 5 | 
 6 | setuptools.setup(
 7 |     name="Graph Sampling",
 8 |     version="0.0.1",
 9 |     author="Kirti Jain, Ashish Aggarwal",
10 |     author_email="kirtij.mcs16.du@gmail.com, ashish.mcs16.du@gmail.com",
11 |     description="Graph Sampling Package",
12 |     long_description=long_description,
13 |     long_description_content_type="text/markdown",
14 |     packages=setuptools.find_packages(),
15 |     classifiers=(
16 |         "Programming Language :: Python :: 3",
17 |         "License :: OSI Approved :: MIT License",
18 |         "Operating System :: OS Independent",
19 |     ),
20 | )
21 | 


--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
 1 | import Graph_Sampling 
 2 | import networkx as nx
 3 | 
 4 | # read facebook network edge list "fb.txt" and return a graph g. 
 5 | g = nx.read_edgelist("fb.txt", create_using= nx.Graph(),nodetype=int)
 6 | 
 7 | # make an object and call function SRW
 8 | object1=Graph_Sampling.SRW_RWF_ISRW()
 9 | sample1 = object1.random_walk_sampling_simple(g,100) # graph, number of nodes to sample
10 | print("Simple Random Walk Sampling:")
11 | print("Number of nodes sampled=",len(sample1.nodes()))
12 | print("Number of edges sampled=",len(sample1.edges()))
13 | 
14 | # make an object and call function RWF
15 | object2=Graph_Sampling.SRW_RWF_ISRW()
16 | sample2= object2.random_walk_sampling_with_fly_back(g,110,0.2)  # graph, number of nodes to sample, fly-back probability
17 | print("Random Walk Sampling with flyback:")
18 | print("Number of nodes sampled=",len(sample2.nodes()))
19 | print("Number of edges sampled=",len(sample2.edges()))
20 | 
21 | # make an object and call function ISRW
22 | object3=Graph_Sampling.SRW_RWF_ISRW()
23 | sample3= object3.random_walk_induced_graph_sampling(g,120)  # graph, number of nodes to sample
24 | print("Induced Subgraph Random Walk Sampling:")
25 | print ("Number of nodes sampled=",len(sample3.nodes()))
26 | print ("Number of edges sampled=",len(sample3.edges()))
27 | 
28 | # make an object and call function SB
29 | object3=Graph_Sampling.Snowball()
30 | sample3 = object3.snowball(g,28000,25) # graph, number of nodes to sample , k set
31 | print("Snowball Sampling:")
32 | print("Number of nodes sampled=",len(sample3.nodes()))
33 | print("Number of edges sampled=",len(sample3.edges()))
34 | 
35 | # make an object and call function FF
36 | object4=Graph_Sampling.ForestFire()
37 | sample4 = object4.forestfire(g,28000) # graph, number of nodes to sample
38 | print("Forest Fire Sampling")
39 | print("Number of nodes sampled=",len(sample4.nodes()))
40 | print("Number of edges sampled=",len(sample4.edges()))
41 | 
42 | # make an object and call function MHRW
43 | object5=Graph_Sampling.MHRW()
44 | sample5 = object5.mhrw(g,28000,30) # graph, number of nodes to sample, initial seed node
45 | print("Metropolis Hasting Random Walk Sampling:")
46 | print("Number of nodes sampled=",len(sample5.nodes()))
47 | print("Number of edges sampled=",len(sample5.edges()))
48 | 
49 | # make an object and call function TIES
50 | object6=Graph_Sampling.TIES()
51 | sample6 = object6.ties(g,10,0.01) # graph, number of nodes to sample, phi
52 | print("TIES:")
53 | print("Number of nodes sampled=",len(sample6.nodes()))
54 | print("Number of edges sampled=",len(sample6.edges()))
55 | 


--------------------------------------------------------------------------------