├── .gitignore ├── ABC.png ├── README.md ├── examples ├── 01.simple.py ├── asn_timeline.py ├── bro_graph.py ├── diff.py ├── fs_graph.py ├── linked_in_csv.py ├── mergers.py ├── requirements.txt ├── sample │ ├── cc-20140624-2200-DE.json │ ├── cc-20140703-2200-DE.json │ ├── http.log │ └── http.log.readme ├── shodan_graph.py ├── ssv.py └── vcdb.py ├── semanticnet ├── DiGraph.py ├── Graph.py ├── __init__.py ├── algorithms.py └── operators.py ├── setup.py └── test ├── conftest.py ├── fixtures ├── test_output_correct.json └── test_output_correct_plaintext.json ├── test_algorithms.py ├── test_cache.py ├── test_graph.py └── test_operators.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.state* 3 | *.json 4 | *.DS_Store 5 | *__pycache__ 6 | *.coverage 7 | *.note 8 | bro/ 9 | tags 10 | *.swp 11 | *.notes 12 | build/ 13 | dist/ 14 | *.egg-info/ 15 | -------------------------------------------------------------------------------- /ABC.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ThibaultReuille/semanticnet/6c818ac2d0ddea1f2a0ec8c0823c52e91c092f8f/ABC.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | semanticnet 2 | ============ 3 | 4 | semanticnet is a small python library to create semantic graphs in JSON. 5 | Those created datasets can then be visualized with the 3D graph engine. 6 | 7 | ## A quick example 8 | ![A quick example](/ABC.png "A Quick Example") 9 | 10 | To generate and save the graph represented by this image, you would write 11 | 12 | ```python 13 | import semanticnet as sn 14 | 15 | g = sn.Graph() 16 | 17 | a = g.add_node({ "label" : "A" }) 18 | b = g.add_node({ "label" : "B" }) 19 | c = g.add_node({ "label" : "C" }) 20 | 21 | g.add_edge(a, b, { "type" : "belongs" }) 22 | g.add_edge(b, c, { "type" : "owns" }) 23 | g.add_edge(c, a, { "type" : "has" }) 24 | 25 | g.save_json("output.json") 26 | ``` 27 | 28 | which would save the graph to a file `output.json`, which could be used by 29 | [OpenGraphiti](https://github.com/ThibaultReuille/graphiti). 30 | 31 | There are several other example scripts included in this repo to demonstrate usage 32 | of SemanticNet. Each example is documented in the wiki. 33 | 34 | ## JSON representation 35 | When saving graph objects as JSON, the graph is represented internally as one might expect. 36 | Suppose you have a graph G = (V, E), where 37 | 38 | V = {0, 1, 2} and 39 | E = {(0, 1), (0, 2), (1, 2)} 40 | 41 | Suppose further that: 42 | 43 | 1. Vertex 0 has the attributes: `{"type": "A", "id": 0}` 44 | 2. Vertex 1 has the attributes: `{"type": "B", "id": 1}` 45 | 3. Vertex 2 has the attributes: `{"type": "C", "id": 2}` 46 | 4. Edge (0, 1) has the attributes: `{'src': 0, 'dst': 1, 'type': 'normal', 'id': 0}` 47 | 5. Edge (0, 2) has the attributes: `{'src': 0, 'dst': 2, 'type': 'normal', 'id': 1}` 48 | 6. 
Edge (1, 2) has the attributes: `{'src': 1, 'dst': 2, 'type': 'irregular', 'id': 2}` 49 | 50 | then in JSON format, it would look like: 51 | 52 | ```json 53 | { 54 | "timeline": [], 55 | "nodes": [ 56 | { 57 | "type": "A", 58 | "id": 0 59 | }, 60 | { 61 | "type": "B", 62 | "id": 1 63 | }, 64 | { 65 | "type": "C", 66 | "id": 2 67 | } 68 | ], 69 | "meta": {}, 70 | "edges": [ 71 | { 72 | "src": 0, 73 | "dst": 1, 74 | "type": "normal", 75 | "id": 0 76 | }, 77 | { 78 | "src": 0, 79 | "dst": 2, 80 | "type": "normal", 81 | "id": 1 82 | }, 83 | { 84 | "src": 1, 85 | "dst": 2, 86 | "type": "irregular", 87 | "id": 2 88 | } 89 | ] 90 | } 91 | ``` 92 | 93 | As you can see, there is a list of `"node"` objects, each of which contains the node's attributes and ID, 94 | as well as a list of `"edge"` objects, each of which has the edge's attributes, and the fields `"src"` and `"dst"`, 95 | which indicate the source and destination vertices, respectively. 96 | 97 | Unless defined by the user, the `"id"` fields default to randomly-generated 98 | [UUIDs](http://en.wikipedia.org/wiki/Globally_unique_identifier), 99 | although they can be any hashable type. 100 | 101 | ## Caching 102 | Should you come across a use case where you'd like quick references to nodes or edges by more than just the ID, 103 | semanticnet provides a mechanism to cache nodes and edges by any of their attributes. For example, suppose you make 104 | the following graph: 105 | 106 | ```python 107 | >>> g = sn.Graph() 108 | >>> a = g.add_node({"type": "server"}) 109 | >>> b = g.add_node({"type": "server"}) 110 | >>> c = g.add_node({"type": "client"}) 111 | >>> g.add_edge(a, b, {"method": "GET", "port": 80}) 112 | UUID('eeb41fd0-9229-47eb-84f0-08ae37a341b2') 113 | >>> g.add_edge(a, c, {"method": "GET", "port": 80}) 114 | UUID('d490157e-621f-4e4d-ba93-68e83f3230dc') 115 | >>> g.add_edge(b, c, {"method": "POST", "port": 443}) 116 | UUID('9b2bcaf3-7af7-45a4-871e-d453e1ae8f2c') 117 | ``` 118 | 119 | Suppose further that you want to access the nodes by their `"type"` attribute. 
You can tell semanticnet to 120 | cache the nodes by the `"type"` attribute, and access them like so: 121 | 122 | ```python 123 | >>> g.cache_nodes_by("type") 124 | >>> g.get_nodes_by_attr("type") 125 | {'client': [{'type': 'client', 'id': UUID('8ccbcf75-603e-4a53-83a8-ccb0c4680f57')}], 'server': [{'type': 'server', 'id': UUID('125eb4a5-705f-420d-839c-59f15f2238d5')}, {'type': 'server', 'id': UUID('df0ac3ba-920d-4c46-9da8-748cf17b7e45')}]} 126 | ``` 127 | 128 | Similarly, you could get a list of all connections by port: 129 | 130 | ```python 131 | >>> g.cache_edges_by("port") 132 | >>> g.get_edges_by_attr("port") 133 | {80: [{'port': 80, 'src': UUID('df0ac3ba-920d-4c46-9da8-748cf17b7e45'), 'dst': UUID('8ccbcf75-603e-4a53-83a8-ccb0c4680f57'), 'id': UUID('d490157e-621f-4e4d-ba93-68e83f3230dc'), 'method': 'GET'}, {'port': 80, 'src': UUID('df0ac3ba-920d-4c46-9da8-748cf17b7e45'), 'dst': UUID('125eb4a5-705f-420d-839c-59f15f2238d5'), 'id': UUID('eeb41fd0-9229-47eb-84f0-08ae37a341b2'), 'method': 'GET'}], 443: [{'port': 443, 'src': UUID('125eb4a5-705f-420d-839c-59f15f2238d5'), 'dst': UUID('8ccbcf75-603e-4a53-83a8-ccb0c4680f57'), 'id': UUID('9b2bcaf3-7af7-45a4-871e-d453e1ae8f2c'), 'method': 'POST'}]} 134 | ``` 135 | 136 | and you can specify the attribute value as well, to return the list of connections by, say, port 80: 137 | 138 | ```python 139 | >>> g.get_edges_by_attr("port", 80) 140 | [{'port': 80, 'src': UUID('df0ac3ba-920d-4c46-9da8-748cf17b7e45'), 'dst': UUID('8ccbcf75-603e-4a53-83a8-ccb0c4680f57'), 'id': UUID('d490157e-621f-4e4d-ba93-68e83f3230dc'), 'method': 'GET'}, {'port': 80, 'src': UUID('df0ac3ba-920d-4c46-9da8-748cf17b7e45'), 'dst': UUID('125eb4a5-705f-420d-839c-59f15f2238d5'), 'id': UUID('eeb41fd0-9229-47eb-84f0-08ae37a341b2'), 'method': 'GET'}] 141 | ``` 142 | 143 | The cache is managed automatically. Any time you add or remove a node/edge with an attribute that you are 144 | caching, or modify an attribute of a node/edge, semanticnet updates the cache. 145 | 146 | ## Installation 147 | To install, you can simply run 148 | 149 | ```sh 150 | pip install semanticnet 151 | ``` 152 | 153 | ### Manual installation 154 | ```sh 155 | git clone https://github.com/ThibaultReuille/semanticnet.git 156 | cd semanticnet 157 | ./setup.py install 158 | ``` 159 | 160 | ### Tests 161 | If you wish to run the test suite, it uses `py.test`. 
Install it with: 162 | 163 | ```sh 164 | pip install pytest 165 | ``` 166 | 167 | and run the tests with: 168 | 169 | ```sh 170 | py.test -v ./test 171 | ``` 172 | -------------------------------------------------------------------------------- /examples/01.simple.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | 3 | import sys 4 | import semanticnet as sn 5 | 6 | graph = sn.Graph() 7 | 8 | a = graph.add_node({"label" : "A"}) 9 | b = graph.add_node({"label" : "B"}) 10 | c = graph.add_node({"label" : "C"}) 11 | 12 | graph.add_edge(a, b, {"type" : "belongs"}) 13 | graph.add_edge(b, c, {"type" : "owns"}) 14 | graph.add_edge(c, a, {"type" : "has"}) 15 | 16 | graph.save_json("output.json") 17 | -------------------------------------------------------------------------------- /examples/asn_timeline.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import semanticnet as sn 5 | 6 | def set_invisible(g): 7 | for nid in g.get_node_ids(): 8 | g.set_node_attribute(nid, 'og:space:lod', 0.0 ) 9 | for eid in g.get_edge_ids(): 10 | g.set_edge_attribute(eid, 'og:space:lod', 0.0 ) 11 | 12 | def timeline_set_node_visible(g, nid): 13 | '''Add an event to set the node visible and increment counter.''' 14 | global timeline_counter 15 | global timeline_delta 16 | g.add_event(timeline_counter, "graph:set_node_attribute", 17 | { 18 | 'id': nid, 19 | 'name': 'og:space:lod', 20 | 'type': 'float', 21 | 'value': '1.0' 22 | } 23 | ) 24 | g.set_node_attribute(nid, 'og:space:lod', 1.0) 25 | timeline_counter += timeline_delta 26 | 27 | def timeline_set_edge_visible(g, eid): 28 | if edge_is_visible(g, eid): 29 | return 30 | 31 | global timeline_counter 32 | global timeline_delta_small 33 | g.add_event(timeline_counter, "graph:set_link_attribute", 34 | { 35 | 'id': eid, 36 | 'name': 'og:space:lod', 37 | 'type': 'float', 38 | 'value': '1.0' 39 | } 40 | ) 41 | g.set_edge_attribute(eid, 'og:space:lod', 1.0) 42 | timeline_counter += timeline_delta_small 43 | 44 | def node_is_visible(g, nid): 45 | return g.get_node_attribute(nid, 'og:space:lod') != 0.0 46 | 47 | def edge_is_visible(g, eid): 48 | return g.get_edge_attribute(eid, 'og:space:lod') != 0.0 49 | 50 | if __name__ == "__main__": 51 | parser = argparse.ArgumentParser("asn_timeline") 52 | parser.add_argument('asn_graph', help="The JSON file of the ASN graph.") 53 | args = parser.parse_args() 54 | 55 | global timeline_counter 56 | global timeline_delta 57 | global timeline_delta_small 58 | 59 | timeline_counter = 100 60 | timeline_delta = 100 61 | timeline_delta_small = 50 62 | 63 | queue = {} # maps node IDs to their queue of edges 64 | g = sn.DiGraph(json_file=args.asn_graph) # the JSON graph 65 | set_invisible(g) # set the nodes/edges to invisible initially 66 | 67 | for nid, attrs in sorted(g.get_nodes().iteritems(), key=lambda n: n[1]['registration'] 68 | if 'registration' in n[1] else n[1]['label']): 69 | timeline_set_node_visible(g, nid) 70 | if nid in queue: 71 | for eid in queue[nid]: 72 | timeline_set_edge_visible(g, eid) 73 | queue.pop(nid) # remove from the queue, once visible 74 | for neighbor in dict(g.neighbors(nid).items() + g.predecessors(nid).items()): 75 | # get all edges between this node and its neighbor 76 | edges = [ eid for eid, attrs in g.get_edges_between(nid, neighbor).iteritems() ] 77 | if node_is_visible(g, neighbor): 78 | for eid in edges: 79 | timeline_set_edge_visible(g, eid) 80 | else: 81 | 
queue[neighbor] = edges 82 | 83 | set_invisible(g) 84 | g.save_json('timeline.json') 85 | 86 | -------------------------------------------------------------------------------- /examples/bro_graph.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import argparse 3 | import csv 4 | import os 5 | import pprint 6 | import re 7 | import semanticnet as sn 8 | import sys 9 | import urlparse 10 | 11 | delim_pattern = '(#separator.+)' 12 | var_line_pattern = '(\#.+)' 13 | var_pattern = '((?<=\#).+)' 14 | 15 | def log_delim(log): 16 | global delim_pattern 17 | delim_match = re.search(delim_pattern, log) 18 | if delim_match == None: 19 | print("No separator in log. Exiting.") 20 | sys.exit(-1) 21 | else: 22 | delim_line = delim_match.groups()[0] 23 | 24 | return delim_line.split(' ')[1].decode("string-escape") 25 | 26 | def extract_vars(log): 27 | global var_pattern 28 | var_lines = [ line for line in re.findall(var_pattern, log) if re.match("^separator", line) == None ] 29 | vars_ = dict([ tuple(line.split(delim, 1)) for line in var_lines ]) 30 | vars_['fields'] = vars_['fields'].split(delim) 31 | vars_['types'] = vars_['types'].split(delim) 32 | 33 | return vars_ 34 | 35 | def extract_data(fields, log, limit=0): 36 | data = {} 37 | global var_line_pattern 38 | data_lines_matches = re.subn(var_line_pattern + "\n", '', log) 39 | 40 | if data_lines_matches == None: 41 | return {} 42 | 43 | data_lines = data_lines_matches[0] 44 | data_lines = data_lines.split("\n") 45 | bro_reader = csv.DictReader(data_lines, fieldnames=fields, delimiter=delim) 46 | 47 | if limit == 0: 48 | limit = len(data_lines) 49 | 50 | # for row in bro_reader: 51 | num_rows = 0 52 | for row in bro_reader: 53 | if num_rows >= limit: 54 | break 55 | else: 56 | num_rows += 1 57 | 58 | if row['id.orig_h'] in data: 59 | data[row['id.orig_h']].append(row) 60 | else: 61 | data[row['id.orig_h']] = [ row ] 62 | 63 | return data 64 | 65 | def print_data(data): 66 | for ip, ip_data in data.items(): 67 | for item in ip_data: 68 | for key, val in item.items(): 69 | print("{}: {}".format(key, val)) 70 | print 71 | 72 | def get_node(graph, field, log_entry, label=""): 73 | if label == "": 74 | try: 75 | field_value = log_entry[field] 76 | except KeyError: 77 | return None 78 | 79 | if field_value == "-": 80 | return None 81 | else: 82 | field_value = label 83 | 84 | node = graph.get_nodes_by_attr("label", field_value, nosingleton=True) 85 | 86 | if not node: 87 | return graph.add_node({"label": field_value, "type": field}) 88 | 89 | return node['id'] 90 | 91 | def connect(graph, src, dst, attrs): 92 | edges = graph.get_edges_between(src, dst) 93 | 94 | if len(edges) != 0: 95 | combine_edge_id = None 96 | for id_, edge_attrs in edges.items(): 97 | if all(item in edge_attrs.items() for item in attrs.items()): 98 | combine_edge_id = id_ 99 | break 100 | if combine_edge_id != None: 101 | if "raindance:space:activity" in edges[combine_edge_id]: 102 | edges[combine_edge_id]["raindance:space:activity"] += .005 103 | else: 104 | edges[combine_edge_id]["raindance:space:activity"] = .005 105 | else: 106 | graph.add_edge(src, dst, attrs) 107 | else: 108 | graph.add_edge(src, dst, attrs) 109 | 110 | 111 | if __name__ == "__main__": 112 | parser = argparse.ArgumentParser("bro_graph") 113 | parser.add_argument('capture_filename') 114 | parser.add_argument('-v', '--verbose', action='store_true', default=False) 115 | args = parser.parse_args() 116 | 117 | logfilename = args.capture_filename 118 | 
print("Opening {}".format(logfilename)) 119 | 120 | with open(logfilename) as logfile: 121 | log = logfile.read() 122 | 123 | print("Building graph...") 124 | 125 | delim = log_delim(log) 126 | vars_ = extract_vars(log) 127 | data = extract_data(vars_['fields'], log, 500) 128 | 129 | if args.verbose: 130 | print_data(data) 131 | 132 | graph = sn.DiGraph() 133 | 134 | for ip, items in data.items(): 135 | graph.clear_node_cache() 136 | graph.cache_nodes_by("label", build=False) 137 | for item in items: 138 | src_ip_node = get_node(graph, 'id.orig_h', item) 139 | user_agent_node = get_node(graph, 'user_agent', item) 140 | 141 | if src_ip_node == None or user_agent_node == None: 142 | continue 143 | 144 | connect(graph, src_ip_node, user_agent_node, {"type": "user_agent"}) 145 | 146 | label = item['host'] + item['uri'] 147 | ext = os.path.splitext(item['uri'])[1] 148 | host_node = get_node(graph, "host:{}".format(ext), item, label) 149 | 150 | if host_node == None: 151 | continue 152 | 153 | connect(graph, user_agent_node, host_node, {"type": item['method']}) 154 | 155 | try: 156 | ref_url = item['referrer'] 157 | except KeyError: 158 | continue 159 | 160 | if ref_url == "-": 161 | continue 162 | 163 | ref = urlparse.urlparse(ref_url) 164 | ref = ref.netloc + ref.path 165 | 166 | ref_node = get_node(graph, 'referrer', item, ref) 167 | 168 | if ref_node == None: 169 | continue 170 | 171 | connect(graph, ref_node, host_node, {"type": "ref"}) 172 | 173 | path = os.path.dirname(os.path.abspath(logfilename)) 174 | file_ext = os.path.splitext(logfilename) 175 | save_json_name = os.path.join(path, os.path.basename(file_ext[0]) + ".json") 176 | print("Writing results to {}".format(save_json_name)) 177 | graph.save_json(save_json_name) 178 | -------------------------------------------------------------------------------- /examples/diff.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import json 4 | import semanticnet as sn 5 | import argparse 6 | import sys 7 | import os 8 | 9 | def attr_to_id(j, attr): 10 | node_ids_to_attrs = {} 11 | for node in j['nodes']: 12 | if attr in node: 13 | node_ids_to_attrs[node['id']] = node[attr] 14 | node['id'] = node[attr] 15 | else: 16 | raise "attribute {} not in node {}".format(attr, node) 17 | for edge in j['edges']: 18 | edge['src'] = node_ids_to_attrs[edge['src']] 19 | edge['dst'] = node_ids_to_attrs[edge['dst']] 20 | edge['id'] = edge['src'] + "|" + edge['dst'] 21 | 22 | if __name__ == "__main__": 23 | parser = argparse.ArgumentParser("diff.py") 24 | parser.add_argument('-a', '--attr', type=str, 25 | help="If you used an attribute to identify nodes and edges, pass the attribute here.") 26 | parser.add_argument('-o', '--outfile', type=str, help="Output file path") 27 | parser.add_argument('-c', '--context', action="store_true", default=False, 28 | help="Only keep relevant unchanged nodes/edges. Cleans up clutter.") 29 | parser.add_argument('-m', '--modifications', action="store_true", default=False, 30 | help="Check for attribute modifications.") 31 | parser.add_argument('-u', '--undirected', action="store_true", default=False, 32 | help='Build undirected graphs for the changesets. 
Uses directed graphs by default.') 33 | parser.add_argument('old_graph', type=str) 34 | parser.add_argument('new_graph', type=str) 35 | args = parser.parse_args() 36 | 37 | if not args.outfile: 38 | old_base = os.path.splitext(os.path.basename(args.old_graph))[0] 39 | new_base = os.path.splitext(os.path.basename(args.new_graph))[0] 40 | args.outfile = old_base + "-" + new_base + "-diff" 41 | 42 | if args.context: 43 | args.outfile += "-c" 44 | 45 | if args.modifications: 46 | args.outfile += "-m" 47 | 48 | args.outfile += ".json" 49 | 50 | if args.attr: 51 | a_obj = json.load(open(args.old_graph, 'r')) 52 | b_obj = json.load(open(args.new_graph, 'r')) 53 | print("Converting old graph...") 54 | attr_to_id(a_obj, args.attr) 55 | print("Converting new graph...") 56 | attr_to_id(b_obj, args.attr) 57 | else: 58 | a_obj = args.old_graph 59 | b_obj = args.new_graph 60 | 61 | A = sn.Graph() if args.undirected else sn.DiGraph() 62 | A.load_json(a_obj) 63 | 64 | B = sn.Graph() if args.undirected else sn.DiGraph() 65 | B.load_json(b_obj) 66 | 67 | print("Performing diff...") 68 | if args.context: 69 | print("and filtering out clutter...") 70 | print 71 | 72 | D = sn.diff(A, B, args.context, args.modifications) 73 | print("Nodes added: {}".format(len([n for n, attrs in D.get_nodes().items() if attrs['diffstatus'] == 'added']))) 74 | print("Nodes removed: {}".format(len([n for n, attrs in D.get_nodes().items() if attrs['diffstatus'] == 'removed']))) 75 | 76 | if args.modifications: 77 | print("Nodes modified: {}".format(len([n for n, attrs in D.get_nodes().items() if attrs['diffstatus'] == 'modified']))) 78 | 79 | print("Edges added: {}".format(len([e for e, attrs in D.get_edges().items() if attrs['diffstatus'] == 'added']))) 80 | print("Edges removed: {}".format(len([e for e, attrs in D.get_edges().items() if attrs['diffstatus'] == 'removed']))) 81 | 82 | if args.modifications: 83 | print("Edges modified: {}".format(len([e for e, attrs in D.get_edges().items() if attrs['diffstatus'] == 'modified']))) 84 | 85 | print("Writing results to {}".format(args.outfile)) 86 | D.save_json(args.outfile) 87 | -------------------------------------------------------------------------------- /examples/fs_graph.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import sys 4 | import os 5 | import argparse 6 | import semanticnet as sn 7 | 8 | def add_node(graph, root, label, node_type): 9 | data = {} 10 | 11 | if os.path.islink(root): 12 | data['type'] = 'link' 13 | 14 | path = os.path.join(root, label) 15 | 16 | if not graph.has_node(path): 17 | graph.add_node({"type": node_type, "label": label}, path) 18 | 19 | graph.add_edge(root, path, data) 20 | 21 | if __name__ == "__main__": 22 | if len(sys.argv) < 1: 23 | print("Need a starting dir") 24 | sys.exit(-1) 25 | 26 | start = sys.argv[1] 27 | graph = sn.Graph() 28 | 29 | for root, dirs, files in os.walk(start, followlinks=True): 30 | print(root) 31 | 32 | if not graph.has_node(root): 33 | graph.add_node({'label': root, 'type': 'dir', 'depth': 0}, root) 34 | 35 | for d in dirs: 36 | add_node(graph, root, d, "dir") 37 | for f in files: 38 | add_node(graph, root, f, os.path.splitext(f)[1]) 39 | 40 | graph.save_json("fs.json") 41 | 42 | -------------------------------------------------------------------------------- /examples/linked_in_csv.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import csv 5 | import os 6 | 
import semanticnet as sn 7 | import networkx as nx 8 | import sys 9 | 10 | class Contact(object): 11 | def __init__(self, name, company): 12 | self.name = name 13 | self.company = company 14 | # self.title = title 15 | 16 | def add_node(graph, attrs): 17 | try: 18 | node = graph.get_node(attrs["label"]) 19 | except KeyError: 20 | return graph.add_node(attrs, attrs["label"]) 21 | 22 | return node["id"] 23 | 24 | def process_contact(graph, contact): 25 | nodes = [] 26 | 27 | # add all nodes 28 | for key, val in vars(contact).items(): 29 | if val == "": 30 | continue 31 | if key == "name": 32 | name_node = add_node(graph, {"label": val, "type": key, "depth": 0}) 33 | else: 34 | nodes.append(add_node(graph, {"label": val, "type": key})) 35 | 36 | # connect this contact's name node to every other node for this contact 37 | for node in nodes: 38 | graph.add_edge(name_node, node, id_="{}-{}".format(name_node, node)) 39 | 40 | def process_csv_file(graph, filename, limit): 41 | processed = 0 42 | with open(filename, 'rU') as f: 43 | reader = csv.DictReader(f, dialect="excel") 44 | for row in reader: 45 | contact = Contact( 46 | " ".join([row["First Name"], row["Last Name"]]), 47 | row["Company"] 48 | # row["Job Title"] 49 | ) 50 | 51 | process_contact(graph, contact) 52 | 53 | processed += 1 54 | 55 | if limit > -1 and processed >= limit: 56 | break 57 | 58 | if __name__ == "__main__": 59 | parser = argparse.ArgumentParser("linked_in_csv") 60 | parser.add_argument('-i', '--intersection', type=str, 61 | help='Take the intersection of the input graph with the graph given on this argument.') 62 | parser.add_argument('-l', '--limit', type=int, default=-1, 63 | help='Load only this many contacts from each CSV file.') 64 | parser.add_argument('contact_list_filename') 65 | args = parser.parse_args() 66 | 67 | graph = sn.Graph() 68 | 69 | process_csv_file(graph, args.contact_list_filename, args.limit) 70 | 71 | if args.intersection: 72 | ugraph = sn.Graph() 73 | process_csv_file(ugraph, args.intersection, args.limit) 74 | 75 | g1 = graph.networkx_graph() 76 | g2 = ugraph.networkx_graph() 77 | 78 | gi = g1.copy() 79 | gi.remove_nodes_from(n for n in g1.nodes() if n not in g2.nodes()) 80 | gi.add_edges_from((src, dst) for src, dst in g2.edges() if src in g1.nodes() and dst in g1.nodes()) 81 | 82 | # remove any remaining nodes that are not connected to anything 83 | for id_ in gi.nodes(): 84 | if gi.degree(id_) <= 0: 85 | gi.remove_node(id_) 86 | 87 | graph = sn.Graph() 88 | graph.load_networkx_graph(gi) 89 | 90 | path = os.path.dirname(os.path.abspath(args.contact_list_filename)) 91 | file_ext = os.path.splitext(args.contact_list_filename) 92 | save_json_name = os.path.join(path, os.path.basename(file_ext[0]) + ".json") 93 | print("Writing results to {}".format(save_json_name)) 94 | graph.save_json(save_json_name) 95 | -------------------------------------------------------------------------------- /examples/mergers.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import csv 5 | import semanticnet as sn 6 | import math 7 | 8 | def get_node(graph, id_, attrs): 9 | if graph.has_node(id_): 10 | return graph.get_node(id_)['id'] 11 | return graph.add_node(attrs, id_=id_) 12 | 13 | def process_csv_file(graph, filename): 14 | global max_amt 15 | with open(filename, 'rU') as f: 16 | reader = csv.DictReader(f, dialect="excel") 17 | for row in reader: 18 | acquirer = get_node(graph, row['Acquirer'], 19 | {'type': 'acquirer', 
'label': row['Acquirer'], 'depth': 0} 20 | ) 21 | target = get_node(graph, row['Target'], 22 | {'type': 'target', 'label': row['Target']} 23 | ) 24 | if 'Total Deal Amt.' in row and row['Total Deal Amt.'] != "": 25 | amt = row['Total Deal Amt.'][1:].replace(',', '') 26 | try: 27 | amt = float(amt) 28 | max_amt = max(amt, max_amt) 29 | data = {"graphiti:space:activity": amt} 30 | except ValueError: 31 | data = {} 32 | else: 33 | data = {} 34 | 35 | graph.add_edge(acquirer, target, data) 36 | 37 | def scale_deal_amts(graph): 38 | global max_amt 39 | for nid, attrs in graph.get_edges().iteritems(): 40 | if 'graphiti:space:activity' in attrs: 41 | attrs['graphiti:space:activity'] = math.log( 42 | attrs['graphiti:space:activity'], max_amt 43 | ) 44 | 45 | if __name__ == "__main__": 46 | parser = argparse.ArgumentParser("merger.py") 47 | parser.add_argument('csv_file', type=str) 48 | args = parser.parse_args() 49 | 50 | global max_amt 51 | max_amt = 0 52 | 53 | g = sn.DiGraph() 54 | process_csv_file(g, args.csv_file) 55 | scale_deal_amts(g) 56 | g.save_json("mergers.json") 57 | -------------------------------------------------------------------------------- /examples/requirements.txt: -------------------------------------------------------------------------------- 1 | shodan 2 | -------------------------------------------------------------------------------- /examples/sample/http.log: -------------------------------------------------------------------------------- 1 | #separator \x09 2 | #set_separator , 3 | #empty_field (empty) 4 | #unset_field - 5 | #path http 6 | #open 2014-06-16-19-58-34 7 | #fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p trans_depth method host uri referrer user_agent request_body_len response_body_len status_code status_msg info_code info_msg filename tags username password proxied orig_fuids orig_mime_types resp_fuids resp_mime_types 8 | #types time string addr port addr port count string string string string string count count count string count string string set[enum] string string set[string] vector[string] vector[string] vector[string] vector[string] 9 | 1401994125.159406 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 1 GET www.wranglerforum.com / http://www.google.com/url?url=http://www.wranglerforum.com/&rct=j&frm=1&q=&esrc=s&sa=U&ei=e7uQU6WmH86bqAadzIDwCg&ved=0CBgQFjAA&usg=AFQjCNHvrfBHQ5Zs4UkA5xm4HQwOmweeow Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 370077 200 OK - - - (empty) - - - - - F3Jf56V8XDImnoLzb text/html 10 | 1401994126.349676 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 2 GET www.wranglerforum.com /forumrunner/detect.js http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 2897 200 OK - - - (empty) - - - - - F4czdjH9WsuwmHuYe text/plain 11 | 1401994126.370724 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 3 GET www.wranglerforum.com /clientscript/vbulletin_menu.js?v=387 http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 9441 200 OK - - - (empty) - - - - - FkwPwJPVnayaUaAg text/plain 12 | 1401994126.725248 Cljs1p1K1WMbMped23 172.16.165.133 49297 75.102.9.195 80 1 GET frozerry.com 
/piNfcJgSomh3ws.js?pNY6ZT4=3ef0a6cb541dab381d00f5a18ef23c http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 122 200 OK - - - (empty) - - - - - FTCHuD2hRPReuuW0tk text/plain 13 | 1401994127.464649 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 4 GET www.wranglerforum.com /images/misc/navbits_start.gif http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 1004 200 OK - - - (empty) - - - - - FWg7tLlkiPBKWVqgj image/gif 14 | 1401994127.658610 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 5 GET www.wranglerforum.com /images/misc/join_now.png http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 7756 200 OK - - - (empty) - - - - - FtfXtk2se9QodcR425 image/png 15 | 1401994127.735481 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 6 GET www.wranglerforum.com /images/misc/forumhome_ad_left.png http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 5646 200 OK - - - (empty) - - - - - F8S1Gr3gFHKONkew45 image/png 16 | 1401994127.794318 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 7 GET www.wranglerforum.com /images/statusicon/subforum_old.gif http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 541 200 OK - - - (empty) - - - - - FnbtCj1037ihqcnCZ3 image/gif 17 | 1401994127.859960 CMmoNC3u1gFllUEJK 172.16.165.133 49288 108.168.252.27 80 8 GET www.wranglerforum.com /images/clear.gif http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 0 301 Moved Permanently - - - (empty) - - - - - - - 18 | 1401994127.569616 CXHxcT1imXnFHBgXAe 172.16.165.133 49304 64.202.116.151 80 1 GET dogintoo.in.ua /v20idaf/2 http://www.wranglerforum.com/ Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 19250 200 OK - - - (empty) - - - - - FYlsFv1pc3uca0gUvk text/html 19 | 1401994128.470389 CXHxcT1imXnFHBgXAe 172.16.165.133 49304 64.202.116.151 80 2 GET dogintoo.in.ua /v20idaf/106b45df758a527142475059060e5750080604590057545d04030f5256575507;112202;228 http://dogintoo.in.ua/v20idaf/2 Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 9999 200 OK - - - (empty) - - - - - FT0JTUhFNZIwonSW7 application/x-shockwave-flash 20 | 1401994128.566016 CAXJEm4gsj10WV6Iv4 172.16.165.133 49326 64.202.116.151 80 1 GET dogintoo.in.ua /v20idaf/58c006617a2067d85c5d0f0b020d05070c0e510b0454060a000b5a0052540750 http://dogintoo.in.ua/v20idaf/2 Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 13851 200 OK - - - (empty) - - - - - FalkCIONFvbTU3R79 text/html 21 | 
1401994128.696494 CXHxcT1imXnFHBgXAe 172.16.165.133 49304 64.202.116.151 80 3 GET dogintoo.in.ua /v20idaf/44ef81d1ab53e4674758135d0a0a57070d02575d0c53540a01075c565a535550;4060129 - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 11458 200 OK - - - (empty) - - - - - FLJRrJ1Qk3xRH5ufth application/zip 22 | 1401994128.779315 CVrZd42UHuda86suig 172.16.165.133 49333 64.202.116.151 80 1 GET dogintoo.in.ua /v20idaf/66e7367e8e92642255460e0c010d04530f00570c0754075e03055c0751540604;6 - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 135990 200 OK - - - (empty) - - - - - FBhkSC2G7IMcWCsZDb - 23 | 1401994129.329074 CAXJEm4gsj10WV6Iv4 172.16.165.133 49326 64.202.116.151 80 2 GET dogintoo.in.ua /v20idaf/66e7367e8e92642255460e0c010d04530f00570c0754075e03055c0751540604;6;1 - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 0 200 OK - - - (empty) - - - - - - - 24 | 1401994130.923605 CXHxcT1imXnFHBgXAe 172.16.165.133 49304 64.202.116.151 80 4 GET dogintoo.in.ua /v20idaf/7e345e3c8e9264225415580f075e00550e53010f0107035802560a0457070202;4 - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 135990 200 OK - - - (empty) - - - - - FAQcy2BIBQbTdsWo3 - 25 | 1401994131.478848 CVrZd42UHuda86suig 172.16.165.133 49333 64.202.116.151 80 2 GET dogintoo.in.ua /v20idaf/7e345e3c8e9264225415580f075e00550e53010f0107035802560a0457070202;4;1 - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 0 200 OK - - - (empty) - - - - - - - 26 | 1401994132.803011 CAXJEm4gsj10WV6Iv4 172.16.165.133 49326 64.202.116.151 80 3 GET dogintoo.in.ua /v20idaf/55f0074a8e92642256450d0b020c07570c03540b0455045a00065f0052550500;5 - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 135990 200 OK - - - (empty) - - - - - Ff9UZx32jSz3Nj7pZj - 27 | 1401994133.543224 CXHxcT1imXnFHBgXAe 172.16.165.133 49304 64.202.116.151 80 5 GET dogintoo.in.ua /v20idaf/55f0074a8e92642256450d0b020c07570c03540b0455045a00065f0052550500;5;1 - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 0 0 200 OK - - - (empty) - - - - - - - 28 | 1401994138.025590 CyJTwz4MTE6ITXxYTc 172.16.165.133 49386 195.2.253.38 80 1 POST 195.2.253.38 / - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 17194 666 502 Connection refused - - - (empty) - - - FLygVE4ORf7pFw7RY9 text/plain F7MMgq129n1IgWOEw1 text/html 29 | 1401994138.952036 C6aWCQ3SxaYemRylF2 172.16.165.133 49387 195.2.253.38 80 1 POST 195.2.253.38 / - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 17212 666 502 Connection refused - - - (empty) - - - FtiY6m4xPTvqwGrONk text/plain FvORKb1dQtMQsjSnad text/html 30 | 1401994140.888813 CJmuPF1W6F2aBchzKl 
172.16.165.133 49388 195.2.253.38 80 1 POST 195.2.253.38 / - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 17236 666 502 Connection refused - - - (empty) - - - FaJyq72ubTXBxlZhgf text/plain FkVER61r0dXQIxhfk5 text/html 31 | 1401994142.813364 COWDjo3kkinTBB5uX9 172.16.165.133 49401 64.202.116.151 80 1 GET dogintoo.in.ua /v20idaf/04efdc142d2c94445146045d565802020902575d5001010f05075c5606010055 - Mozilla/4.0 (Windows 7 6.1) Java/1.6.0_25 0 4766 200 OK - - - (empty) - - - - - FchytA3qV4a7mLFruk application/java-archive 32 | 1401994145.109331 C1K1WS0GQ64qYaWc8 172.16.165.133 49410 64.202.116.151 80 1 GET dogintoo.in.ua /v20idaf/3d89ab2645995b84501c5d02535901000a520a025500020d0657010903000357;1;2 - Mozilla/4.0 (Windows 7 6.1) Java/1.6.0_25 0 135948 200 OK - - - (empty) - - - - - FhOPhq4yyrloaXxcG - 33 | 1401994145.635879 C1K1WS0GQ64qYaWc8 172.16.165.133 49410 64.202.116.151 80 2 GET dogintoo.in.ua /v20idaf/3d89ab2645995b84501c5d02535901000a520a025500020d0657010903000357;1;2;1 - Mozilla/4.0 (Windows 7 6.1) Java/1.6.0_25 0 0 200 OK - - - (empty) - - - - - - - 34 | 1401994151.281455 CFxAMvuQLWySupdri 172.16.165.133 49424 195.2.253.38 80 1 POST 195.2.253.38 / - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0) 17248 666 502 Connection refused - - - (empty) - - - FfukF14TxbXtvUkaig text/plain FsbPiKYVL3WfKmNxk text/html 35 | #close 2014-06-16-19-58-34 36 | -------------------------------------------------------------------------------- /examples/sample/http.log.readme: -------------------------------------------------------------------------------- 1 | The provided bro log was parsed from a packet capture published on malware-traffic-analysis.net 2 | -------------------------------------------------------------------------------- /examples/shodan_graph.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import pprint 5 | import semanticnet as sn 6 | import shodan 7 | import sys 8 | 9 | def get_node(data, field): 10 | if field not in data: 11 | return None 12 | 13 | field_value = str(data[field]) 14 | 15 | node = graph.get_nodes_by_attr("label", field_value, nosingleton=True) 16 | 17 | if not node: 18 | return graph.add_node({"label": field_value, "type": field}) 19 | 20 | return node["id"] 21 | 22 | def connect(graph, src, dst): 23 | if src != None and dst != None: 24 | graph.add_edge(src, dst) 25 | 26 | if __name__ == "__main__": 27 | parser = argparse.ArgumentParser('shodan_graph') 28 | parser.add_argument('-k', '--api-key', type=str, required=True) 29 | parser.add_argument('-s', '--search', type=str, required=True) 30 | parser.add_argument('-v', '--verbose', action='store_true', default=False) 31 | args = parser.parse_args() 32 | 33 | sho = shodan.Shodan(args.api_key) 34 | 35 | try: 36 | search_results = sho.search(args.search) 37 | except shodan.APIError, e: 38 | print("Error during request: {}".format(e)) 39 | sys.exit(-1) 40 | 41 | if args.verbose: 42 | pprint.pprint(search_results) 43 | 44 | if not search_results['matches']: 45 | print("No search results.") 46 | sys.exit(0) 47 | 48 | graph = sn.Graph() 49 | graph.cache_nodes_by("label") # all labels will be unique 50 | 51 | for match in search_results['matches']: 52 | ip_node = get_node(match, 'ip_str') 53 | asn_node = get_node(match, 
'asn') 54 | port_node = get_node(match, 'port') 55 | title_node = get_node(match, 'title') 56 | 57 | if 'location' in match and 'country_name' in match['location']: 58 | country_name = match['location']['country_name'] 59 | country_node = graph.get_nodes_by_attr("label", country_name, nosingleton=True) 60 | if not country_node: 61 | country_node = graph.add_node({"type": "country_name", 62 | "label": country_name, 63 | "depth": 0 64 | }) 65 | else: 66 | try: 67 | country_node = country_node["id"] 68 | except KeyError: 69 | country_node = None 70 | else: 71 | country_node = None 72 | 73 | connect(graph, asn_node, country_node) 74 | connect(graph, ip_node, asn_node) 75 | connect(graph, ip_node, port_node) 76 | connect(graph, ip_node, title_node) 77 | 78 | postfix = args.search.replace(" ", "_") 79 | save_filename = "shodan_{}.json".format(postfix) 80 | print("Saving results to {}".format(save_filename)) 81 | graph.save_json(save_filename) 82 | -------------------------------------------------------------------------------- /examples/ssv.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import csv 5 | import semanticnet as sn 6 | import sys 7 | 8 | def add_edge_ifn(graph, src, dst): 9 | # get the edges between src and dst 10 | edges = graph.get_edges_between(src, dst) 11 | # if there are no edges, add one and return it 12 | if not edges: 13 | return graph.add_edge(src, dst) 14 | # if there is, return the first edge's ID 15 | return edges.items()[0][0] 16 | 17 | if __name__ == "__main__": 18 | parser = argparse.ArgumentParser("ssv.py") 19 | parser.add_argument("input_filename", type=str) 20 | parser.add_argument("output_filename", type=str) 21 | args = parser.parse_args() 22 | 23 | g = sn.DiGraph() 24 | with open(args.input_filename, "rU") as infile: 25 | reader = csv.reader(infile, delimiter=' ') 26 | for row in reader: 27 | previous = None 28 | for elem in row: 29 | current = g.get_or_add_node(elem, {"label": elem}) 30 | if previous != None: 31 | add_edge_ifn(g, previous['id'], current['id']) 32 | previous = current 33 | 34 | g.save_json(args.output_filename) 35 | -------------------------------------------------------------------------------- /examples/vcdb.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import argparse 3 | import json 4 | import os 5 | import semanticnet as sn 6 | 7 | def add_node(g, nid, attrs={}): 8 | try: 9 | return g.get_node(nid)['id'] 10 | except KeyError: 11 | return g.add_node(attrs, id_=nid) 12 | 13 | def add_edge(g, src, dst, attrs={}): 14 | edges = g.get_edges_between(src, dst) 15 | if edges: 16 | return edges.items()[0][0] 17 | 18 | return g.add_edge(src, dst, attrs) 19 | 20 | def load_action(a, g): 21 | actions = [] 22 | for action_type, data in a.iteritems(): 23 | try: 24 | variety = data.get('variety') 25 | except: 26 | continue 27 | if variety != None: 28 | for v in variety: 29 | actions.append(add_node(g, v, {'label': v, 'type': 'action'})) 30 | return actions 31 | 32 | def load_asset(a, g): 33 | assets = [] 34 | for asset in a: 35 | variety = asset.get('variety') 36 | if variety != None: 37 | assets.append(add_node(g, variety, {'label': variety, 'type': 'asset'})) 38 | return assets 39 | 40 | def connect_items(g, list1, list2, attrs={}): 41 | for item1 in list1: 42 | for item2 in list2: 43 | add_edge(g, item1, item2, attrs) 44 | for item2 in list2: 45 | for item1 in list1: 46 | add_edge(g, item1, 
item2, attrs) 47 | 48 | if __name__ == "__main__": 49 | parser = argparse.ArgumentParser("vcdb.py") 50 | parser.add_argument("-i", "--input", type=str) 51 | args = parser.parse_args() 52 | 53 | vcdb_dir = os.environ['VCDB_DATA'] 54 | g = sn.Graph() 55 | files = [] 56 | if args.input: 57 | files.append(args.input) 58 | else: 59 | files = [ os.path.join(vcdb_dir, f) for f in os.listdir(vcdb_dir) if os.path.splitext(f)[1] == '.json' ] 60 | 61 | for f in files: 62 | j = json.load(open(f, 'rU')) 63 | actions = load_action(j['action'], g) 64 | try: 65 | assets = load_asset(j['asset']['assets'], g) 66 | except KeyError: 67 | continue 68 | try: 69 | victim = add_node( 70 | g, 71 | j['victim']['victim_id'].encode('utf-8', 'ignore'), 72 | {'label': j['victim']['victim_id'].encode('utf-8', 'ignore'), 'type': 'victim', 'depth': 0} 73 | ) 74 | except KeyError: 75 | victim = None 76 | 77 | connect_items(g, assets, actions) 78 | 79 | if victim != None: 80 | for asset in assets: 81 | add_edge(g, victim, asset) 82 | for action in actions: 83 | add_edge(g, victim, action) 84 | 85 | g.save_json("vcdb.json") 86 | 87 | -------------------------------------------------------------------------------- /semanticnet/DiGraph.py: -------------------------------------------------------------------------------- 1 | import networkx as nx 2 | from semanticnet import Graph 3 | 4 | class DiGraph(Graph): 5 | 6 | def __init__(self, verbose=False, json_file=""): 7 | # don't pass in json_file so it doesn't call load_json() twice 8 | super(DiGraph, self).__init__(verbose, "") 9 | self._g = nx.MultiDiGraph() 10 | if json_file: 11 | self.load_json(json_file) 12 | 13 | def remove_node(self, id_): 14 | '''Removes node id_.''' 15 | id_ = self._extract_id(id_) 16 | if self._g.has_node(id_): 17 | # for DiGraph, remove predecessors AND successors 18 | for successor in self._g.successors(id_): 19 | # need to iterate over items() (which copies the dict) because we are 20 | # removing items from the edges dict as we are iterating over it 21 | for edge in self._g.edge[id_][successor].items(): 22 | self.remove_edge(self._g.edge[id_][successor][edge[0]]["id"]) # edge[0] is the edge's ID 23 | for predecessor in self._g.predecessors(id_): 24 | for edge in self._g.edge[predecessor][id_].items(): 25 | self.remove_edge(self._g.edge[predecessor][id_][edge[0]]["id"]) 26 | 27 | self._remove_node_from_cache(id_) 28 | self._g.remove_node(id_) 29 | else: 30 | raise GraphException("Node ID not found.") 31 | 32 | def predecessors(self, id_): 33 | return dict([(nid, self.get_node(nid)) for nid in self._g.predecessors(id_)]) 34 | 35 | -------------------------------------------------------------------------------- /semanticnet/Graph.py: -------------------------------------------------------------------------------- 1 | import networkx as nx 2 | import json 3 | import uuid 4 | import copy 5 | from itertools import chain 6 | 7 | class GraphException(Exception): 8 | """Generic Semantic Graph Exception""" 9 | def __init__(self, msg): 10 | Exception.__init__(self, msg) 11 | self.msg = msg 12 | 13 | def __str__(self): 14 | return repr(self.msg) 15 | 16 | class ReservedAttributeException(GraphException): 17 | '''An exception for when the user attempts to set a reserved attribute.''' 18 | def __init__(self, reserved_attr_name): 19 | self.reserved_attr_name = reserved_attr_name 20 | msg = 'Attribute {} is reserved.'.format(self.reserved_attr_name) 21 | GraphException.__init__(self, msg) 22 | 23 | def __str__(self): 24 | return repr(self.msg) 25 | 26 | class 
Event(object): 27 | def __init__(self, timecode, name, attributes): 28 | self.timecode = timecode 29 | self.name = name 30 | self.attributes = attributes 31 | 32 | class CacheMeta(object): 33 | def __init__(self, get_func, get_items_func, cache_func, cache): 34 | self.get_func = get_func 35 | self.get_items_func = get_items_func 36 | self.cache_func = cache_func 37 | self.cache = cache 38 | 39 | class Graph(object): 40 | '''A simple Graph structure which lets you focus on the data.''' 41 | 42 | def __init__(self, verbose=False, json_file=""): 43 | self._g = nx.MultiGraph() 44 | self._edges = {} 45 | 46 | self._node_cache = {} 47 | self._edge_cache = {} 48 | self._cache_meta = { 49 | "node": CacheMeta(self.get_node, self.get_nodes, self._cache_node, self._node_cache), 50 | "edge": CacheMeta(self.get_edge, self.get_edges, self._cache_edge, self._edge_cache) 51 | } 52 | 53 | self.meta = {} 54 | self.timeline = [] 55 | 56 | self.verbose = verbose 57 | self.attr_reserved = ["id", "src", "dst"] 58 | 59 | if json_file: 60 | self.load_json(json_file) 61 | 62 | def _create_uuid(self): 63 | '''Create a random UUID for a new node or edge. Checks for collisions.''' 64 | id_ = uuid.uuid4() 65 | while self._g.has_node(id_) or id_ in self._edges: 66 | id_ = uuid.uuid4() 67 | return id_ 68 | 69 | def _extract_id(self, id_): 70 | '''Parse a UUID out of the string id_.''' 71 | if id_.__class__.__name__ == 'UUID': 72 | return id_ 73 | 74 | # convert to a UUID if possible 75 | try: 76 | id_ = uuid.UUID(id_) 77 | # if it's not a UUID, just return what was sent 78 | except: 79 | pass 80 | 81 | return id_ 82 | 83 | def _cache_item(self, item_type, attr_name, attr_values): 84 | # if we have not cached anything by this attr before, 85 | # create an empty dict for it 86 | if attr_name not in self._cache_meta[item_type].cache: 87 | self._cache_meta[item_type].cache[attr] = {} 88 | 89 | # if we haven't seen this attr value before, make an empty list for it 90 | if attr_values[attr_name] not in self._cache_meta[item_type].cache[attr_name]: 91 | self._cache_meta[item_type].cache[attr_name][attr_values[attr_name]] = [] 92 | 93 | # add it to the cache 94 | self._cache_meta[item_type].cache[attr_name][attr_values[attr_name]].append(attr_values) 95 | 96 | def _cache_node(self, attr, node): 97 | '''Cache a node in self._node_cache''' 98 | self._cache_item("node", attr, node) 99 | 100 | def _cache_edge(self, attr, edge): 101 | '''Cache an edge in self._edge_cache''' 102 | self._cache_item("edge", attr, edge) 103 | 104 | def _cache_new(self, item_type, attrs): 105 | for key in self._cache_meta[item_type].cache: 106 | if key in attrs: 107 | self._cache_meta[item_type].cache_func(key, attrs) 108 | 109 | def _cache_new_node(self, attrs): 110 | '''Checks a new node's attributes and caches it if we are caching by one or more 111 | of its attributes.''' 112 | self._cache_new("node", attrs) 113 | 114 | def _cache_new_edge(self, attrs): 115 | '''Checks a new edge's attributes and caches it if we are caching by one or more 116 | of its attributes.''' 117 | self._cache_new("edge", attrs) 118 | 119 | def _remove_item_from_cache(self, item_type, id_): 120 | item = self._cache_meta[item_type].get_func(id_) 121 | for attr, val in item.iteritems(): 122 | try: 123 | self._cache_meta[item_type].cache[attr][val].remove(item) 124 | except KeyError: 125 | pass 126 | 127 | def _remove_node_from_cache(self, id_): 128 | '''Removes node id_ from all places it occurs in the cache, if anywhere.''' 129 | self._remove_item_from_cache("node", id_) 130 | 
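    # Explanatory note: each cache built by _cache_item() above is a nested dict of
    # the form { attr_name: { attr_value: [item attribute dicts] } }, so the removal
    # helpers below look up each of the item's attribute/value pairs and drop the
    # item's dict from the matching bucket.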
131 | def _remove_edge_from_cache(self, id_): 132 | '''Removes edge id_ from all places it occurs in the cache, if anywhere.''' 133 | self._remove_item_from_cache("edge", id_) 134 | 135 | def _update_item_cache(self, item_type, id_, attr_name): 136 | # if we are not caching by this attribute, there is nothing to do 137 | if attr_name not in self._cache_meta[item_type].cache: 138 | return 139 | 140 | # remove any nodes that are in the wrong place in the cache 141 | for key, nodes in self._cache_meta[item_type].cache[attr_name].items(): 142 | for node in nodes: 143 | if key not in node: 144 | self._cache_meta[item_type].cache[attr_name][key].remove(node) 145 | break # should only happen once 146 | 147 | self._cache_meta[item_type].cache_func(attr_name, self._cache_meta[item_type].get_func(id_)) 148 | 149 | def _update_node_cache(self, id_, attr_name): 150 | '''Update the cache for the given node with ID id_ and attribute attr_name 151 | 152 | IMPORTANT: Assumes that the attribute has already been set with the new value! 153 | ''' 154 | self._update_item_cache("node", id_, attr_name) 155 | 156 | def _update_edge_cache(self, id_, attr_name): 157 | '''Update the cache for the given edge with ID id_ and attribute attr_name 158 | 159 | IMPORTANT: Assumes that the attribute has already been set with the new value! 160 | ''' 161 | self._update_item_cache("edge", id_, attr_name) 162 | 163 | def log(self, line): 164 | '''Print the message line to standard output.''' 165 | if self.verbose: 166 | print("[SemanticNet] " + line) 167 | 168 | def _check_reserved_attrs(self, data): 169 | if type(data) is str: 170 | if data in self.attr_reserved: 171 | raise ReservedAttributeException(data) 172 | elif type(data) is dict: 173 | for k, v in data.iteritems(): 174 | if k in self.attr_reserved: 175 | raise ReservedAttributeException(k) 176 | 177 | def add_node(self, data={}, id_=None): 178 | '''Add a node to the graph, with an optional dict of attributes, data. 179 | 180 | By default, providing an ID is unnecessary; a random UUID is generated for each node. 181 | However, if you wish to key by something else, you can do so with the id_ parameter. 182 | If you choose to do this, please note that adding a node with the same ID twice 183 | overwrites the node that was previously there, so you must check for the presence 184 | of a node with an ID manually, if you wish to avoid this. 185 | ''' 186 | self._check_reserved_attrs(data) 187 | if id_ == None: 188 | id_ = self._create_uuid() 189 | else: 190 | id_ = self._extract_id(id_) 191 | 192 | data['id'] = id_ # add the ID to the attributes 193 | self.log("add_node " + str(data) + " = " + str(id_)) 194 | self._g.add_node(id_, data) 195 | self._cache_new_node(data) 196 | return id_ 197 | 198 | def add_nodes(self, nodes): 199 | '''Adds the nodes from the given parameter nodes, where nodes 200 | is EITHER: 201 | 202 | 1. a dictionary that maps node IDs to attributes, e.g.: 203 | 204 | { 205 | uuid.UUID('3caaa8c09148493dbdf02c574b95526c'): { 206 | 'type': "A" 207 | }, 208 | uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'): { 209 | 'type': "B" 210 | }, 211 | etc... 212 | } 213 | 214 | OR 215 | 216 | 2. a list that contains the attributes of the nodes to add, e.g. 217 | 218 | [ {'type': "A"}, {'type': "B"} ] 219 | 220 | where with this option, the unique IDs will be generated automatically, 221 | and it will return a list of the IDs in the respective order given. 
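        For example, a minimal added sketch of the list form (the generated UUIDs
        will differ from run to run):

            >>> g = Graph()
            >>> ids = g.add_nodes([{'type': 'A'}, {'type': 'B'}])
            >>> len(ids)
            2
            >>> g.get_node(ids[0])['type']
            'A'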
222 | ''' 223 | if type(nodes) is dict: 224 | for id_, data_orig in nodes.items(): 225 | # copy the attributes so they don't 226 | # get deleted from the source data 227 | data = copy.deepcopy(data_orig) 228 | del data['id'] 229 | self.add_node(data, id_) 230 | elif type(nodes) is list: 231 | ids = [] 232 | for data_orig in nodes: 233 | data = copy.deepcopy(data_orig) 234 | id_ = data.get('id') 235 | if id_ != None: 236 | del data['id'] 237 | ids.append(self.add_node(data, id_)) 238 | return ids 239 | 240 | def remove_node(self, id_): 241 | '''Removes node id_.''' 242 | id_ = self._extract_id(id_) 243 | if self._g.has_node(id_): 244 | # remove all edges incident on this node 245 | for neighbor in self._g.neighbors(id_): 246 | # need to iterate over items() (which copies the dict) because we are 247 | # removing items from the edges dict as we are iterating over it 248 | for edge in self._g.edge[id_][neighbor].items(): 249 | # edge[0] is the edge's ID 250 | self.remove_edge(self._g.edge[id_][neighbor][edge[0]]["id"]) 251 | self._remove_node_from_cache(id_) 252 | self._g.remove_node(id_) 253 | else: 254 | raise GraphException("Node ID not found.") 255 | 256 | def remove_nodes(self, ids): 257 | map(self.remove_node, ids) 258 | 259 | def add_edge(self, src, dst, data={}, id_=None): 260 | '''Add an edge from src to dst, with an optional dict of attributes, data. 261 | 262 | By default, providing an ID is unnecessary; a random UUID is generated for each edge. 263 | However, if you wish to key by something else, you can do so with the id_ parameter. 264 | If you choose to do this, please note that adding an edge with the same ID twice 265 | overwrites the edge that was previously there, so you must check for the presence 266 | of an edge with an ID manually, if you wish to avoid this. 267 | ''' 268 | self._check_reserved_attrs(data) 269 | src = self._extract_id(src) 270 | dst = self._extract_id(dst) 271 | 272 | if id_ == None: 273 | id_ = self._create_uuid() 274 | else: 275 | id_ = self._extract_id(id_) 276 | 277 | if self._g.has_node(src) and self._g.has_node(dst): 278 | self.log("add_edge " + str(src) + ", " + str(dst) + ", " + str(data) + " = " + str(id_)) 279 | self._g.add_edge(src, dst, id_, 280 | dict(chain( 281 | data.items(), 282 | { 283 | "id": id_, 284 | "src": src, 285 | "dst": dst 286 | }.items()) 287 | ) 288 | ) 289 | self._edges[id_] = self._g.edge[src][dst][id_] 290 | self._cache_new_edge(self._edges[id_]) 291 | return id_ 292 | else: 293 | raise GraphException("Node ID not found.") 294 | 295 | def add_edges(self, edges): 296 | '''Adds the edges in the parameter edges, where edges is EITHER: 297 | 298 | 1. a list of tuples of the form (src, dst, data) or (src, dst, data, id_), 299 | where src and dst are the IDs of the nodes, 300 | data is the dictionary of the edge's attributes, 301 | and id_ is the unique ID of the edge, e.g.: 302 | 303 | [ (, , {'type': 'normal'}), etc... ] or 304 | [ (, , {'type': 'normal'}, ), etc... ] 305 | 306 | The two different forms of tuples may be combined, if desired. 307 | 308 | OR 309 | 310 | 2. a dictionary that maps edge IDs to their attributes, where attributes 311 | MUST contain at least the two attributes 'src' and 'dst', which are the 312 | unique IDs of the source and destination nodes, respectively. 313 | 314 | WARNING: If either 'src' or 'dst' is missing from an edge's attributes, 315 | it will be silently ignored! 
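        For example, a minimal added sketch of the list form (node IDs here are the
        UUIDs returned by add_nodes; add_edges itself returns nothing):

            >>> g = Graph()
            >>> a, b, c = g.add_nodes([{'label': 'A'}, {'label': 'B'}, {'label': 'C'}])
            >>> g.add_edges([(a, b, {'type': 'normal'}), (b, c, {'type': 'irregular'})])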
316 | ''' 317 | if type(edges) is list: 318 | def _remove_reserved(tup, key): 319 | if key in tup[2]: 320 | del tup[2][key] 321 | 322 | for tup in edges: 323 | # make sure the tuple is of the correct form 324 | if len(tup) != 3 and len(tup) != 4: 325 | raise GraphException('Given tuple {} is of the wrong form.'.format(tup)) 326 | 327 | # make a deep copy, in case the tuple contains references 328 | # to data the user doesn't want modified 329 | tup = copy.deepcopy(tup) 330 | 331 | # remove reserved attributes from data 332 | for r in self.attr_reserved: 333 | _remove_reserved(tup, r) 334 | 335 | self.add_edge(*tup) 336 | elif type(edges) is dict: 337 | for id_, orig_attrs in edges.items(): 338 | # copy the attributes so we don't remove the reserved 339 | # attributes from the edges! 340 | attrs = copy.deepcopy(orig_attrs) 341 | try: 342 | # extract and remove the reserved attributes 343 | src = attrs.get('src') 344 | dst = attrs.get('dst') 345 | attr_id = attrs.get('id') 346 | if src != None: 347 | del attrs['src'] 348 | if dst != None: 349 | del attrs['dst'] 350 | if attr_id != None: 351 | del attrs['id'] 352 | 353 | self.add_edge(src, dst, attrs, id_) 354 | except KeyError: 355 | continue 356 | 357 | def remove_edge(self, id_): 358 | '''Removes edge id_.''' 359 | id_ = self._extract_id(id_) 360 | if id_ in self._edges: 361 | edge = self._edges[id_] 362 | self._g.remove_edge(edge["src"], edge["dst"], id_) 363 | self._remove_edge_from_cache(id_) 364 | del self._edges[id_] 365 | else: 366 | raise GraphException("Node ID not found.") 367 | 368 | def remove_edges(self, ids): 369 | map(self.remove_edge, ids) 370 | 371 | def set_graph_attribute(self, attr_name, value): 372 | self._g.graph[attr_name] = value 373 | 374 | def get_graph_attribute(self, attr_name): 375 | return self._g.graph[attr_name] 376 | 377 | def set_node_attribute(self, id_, attr_name, value): 378 | '''Sets the attribute attr_name to value for node id_.''' 379 | id_ = self._extract_id(id_) 380 | 381 | if self._g.has_node(id_): 382 | self._check_reserved_attrs(attr_name) 383 | self._g.node[id_][attr_name] = value 384 | self._update_node_cache(id_, attr_name) 385 | else: 386 | raise GraphException("Node id not found, can't set attribute.") 387 | 388 | def get_nodes(self): 389 | '''Returns a dict of all nodes in the graph, keyed by their unique ID.''' 390 | return dict([ (id_, self._g.node[id_]) for id_ in self._g.nodes() ]) 391 | 392 | def get_node_ids(self): 393 | '''Returns a list of the IDs of all nodes in the graph.''' 394 | return self._g.nodes() 395 | 396 | def get_node(self, id_): 397 | '''Get the node with the given ID.''' 398 | id_ = self._extract_id(id_) 399 | return self._g.node[id_] 400 | 401 | def get_or_add_node(self, id_, data={}): 402 | '''Get the node with the given ID if it exists. If not, create it 403 | with the given data attributes. For consistency, 404 | this method will always return the attributes of the node, whether it was 405 | already in the graph, or was created. 
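        For example, a minimal added sketch using a hypothetical string ID:

            >>> g = Graph()
            >>> g.get_or_add_node('host-a', {'label': 'host-a'})['id']
            'host-a'
            >>> g.get_or_add_node('host-a')['label']  # second call finds the existing node
            'host-a'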
406 | ''' 407 | id_ = self._extract_id(id_) 408 | if self.has_node(id_): 409 | return self.get_node(id_) 410 | self.add_node(data, id_=id_) 411 | return self.get_node(id_) # return the attributes, instead of the ID 412 | 413 | def has_node(self, id_): 414 | id_ = self._extract_id(id_) 415 | return self._g.has_node(id_) 416 | 417 | def get_node_attribute(self, id_, attr_name): 418 | '''Returns the attribute attr_name of node id_.''' 419 | id_ = self._extract_id(id_) 420 | if self._g.has_node(id_): 421 | return self._g.node[id_][attr_name] 422 | else: 423 | raise GraphException("Node ID not found, can't get attribute") 424 | 425 | def get_node_attributes(self, id_): 426 | '''Returns all attributes of node id_.''' 427 | id_ = self._extract_id(id_) 428 | if self._g.has_node(id_): 429 | return self._g.node[id_] 430 | else: 431 | raise GraphException("Node ID not found, can't get attribute") 432 | 433 | def get_edges(self): 434 | '''Returns all edges in the graph.''' 435 | return self._edges 436 | 437 | def get_edge_ids(self): 438 | return [ id_ for id_ in self._edges ] 439 | 440 | def get_edge(self, id_): 441 | '''Returns edge id_.''' 442 | id_ = self._extract_id(id_) 443 | if id_ in self._edges: 444 | return self._edges[id_] 445 | else: 446 | raise GraphException('Node ID not found.') 447 | 448 | def get_edges_between(self, src, dst): 449 | '''Returns all edges between src and dst and between dst and src''' 450 | src = self._extract_id(src) 451 | dst = self._extract_id(dst) 452 | edges_src_dst = {} 453 | if self._g.has_node(src) and self._g.has_node(dst): 454 | if self._g.has_edge(src, dst): 455 | edges_src_dst = dict(edges_src_dst.items() + self._g.edge[src][dst].items()) 456 | 457 | # for DiGraphs, add edges in the other direction too 458 | if type(self) is not Graph and self._g.has_edge(dst, src): 459 | edges_src_dst = dict(edges_src_dst.items() + self._g.edge[dst][src].items()) 460 | 461 | return edges_src_dst 462 | 463 | def has_edge(self, id_): 464 | id_ = self._extract_id(id_) 465 | return id_ in self._edges 466 | 467 | def has_edge_between(self, src, dst): 468 | src = self._extract_id(src) 469 | dst = self._extract_id(dst) 470 | return self._g.has_node(src) and self._g.has_node(dst) and self._g.has_edge(src, dst) 471 | 472 | def set_edge_attribute(self, id_, attr_name, value): 473 | '''Sets the attribute attr_name to value for edge id_.''' 474 | id_ = self._extract_id(id_) 475 | if id_ in self._edges: 476 | self._check_reserved_attrs(attr_name) 477 | self._edges[id_][attr_name] = value 478 | self._update_edge_cache(id_, attr_name) 479 | else: 480 | raise GraphException("Edge id '" + str(id_) + "' not found!") 481 | 482 | def get_edge_attributes(self, id_): 483 | '''Returns all attributes for edge id_.''' 484 | id_ = self._extract_id(id_) 485 | if id_ in self._edges: 486 | return self._edges[id_] 487 | else: 488 | raise GraphException("Edge id '" + str(id_) + "' not found!") 489 | 490 | def get_edge_attribute(self, id_, attr_name): 491 | '''Returns the attribute attr_name for edge id_.''' 492 | id_ = self._extract_id(id_) 493 | if id_ in self._edges: 494 | if attr_name in self._edges[id_]: 495 | return self._edges[id_][attr_name] 496 | else: 497 | return None 498 | else: 499 | raise GraphException("Edge id '" + str(id_) + "' not found!") 500 | 501 | def add_event(self, timecode, name, attributes): 502 | self.timeline.append(Event(timecode, name, attributes)) 503 | 504 | def _cache_by(self, item_type, attr, build): 505 | # If we ARE already caching by this value, do nothing 506 | if attr 
in self._cache_meta[item_type].cache: 507 | return 508 | 509 | # If we are NOT already caching by this value, initialize the dict for it. 510 | # This is also done in _cache_node/edge(), but this is needed for cases 511 | # where the user decides to start caching by an attribute, and they haven't 512 | # added any nodes/edges with that attribute yet 513 | if attr not in self._cache_meta[item_type].cache: 514 | self._cache_meta[item_type].cache[attr] = {} 515 | 516 | if not build: 517 | return 518 | 519 | for id_, item_attrs in self._cache_meta[item_type].get_items_func().items(): 520 | if attr in item_attrs: 521 | self._cache_meta[item_type].cache_func(attr, item_attrs) 522 | 523 | def cache_nodes_by(self, attr, build=True): 524 | '''Tells SemanticNet to cache nodes by the given attribute attr. 525 | 526 | After a call to this method, nodes will be accessible by calls to the method get_nodes_by_attr(). 527 | See the docs for that function for more detail. 528 | 529 | Optionally, if the user wishes to tell SemanticNet to start caching NEW nodes of type attr, but not 530 | to build a cache from the existing nodes, they may set the 'build' flag to False. 531 | ''' 532 | self._cache_by("node", attr, build) 533 | 534 | def cache_edges_by(self, attr, build=True): 535 | '''Tells SemanticNet to cache edges by the given attribute attr. 536 | 537 | After a call to this method, edges will be accessible by calls to the method get_edges_by_attr(). 538 | See the docs for that function for more detail. 539 | 540 | Optionally, if the user wishes to tell SemanticNet to start caching NEW edges of type attr, but not 541 | to build a cache from the existing edges, they may set the 'build' flag to False. 542 | ''' 543 | self._cache_by("edge", attr, build) 544 | 545 | def _clear_item_cache(self, item_type, attr): 546 | if attr == "": 547 | for key, val in self._cache_meta[item_type].cache.items(): 548 | del self._cache_meta[item_type].cache[key] 549 | elif attr in self._cache_meta[item_type].cache: 550 | self._cache_meta[item_type].cache[attr] = {} 551 | 552 | def clear_node_cache(self, attr=""): 553 | '''Delete the node cache. If attr is given, delete the cache for that attribute.''' 554 | self._clear_item_cache("node", attr) 555 | 556 | def clear_edge_cache(self, attr=""): 557 | '''Delete the edge cache. If attr is given, delete the cache for that attribute.''' 558 | self._clear_item_cache("edge", attr) 559 | 560 | def _get_items_by_attr(self, item_type, attr, val, nosingleton): 561 | items = self._cache_meta[item_type].cache.get(attr) 562 | 563 | # if the attribute doesn't exist, return an empty dict 564 | if items == None: 565 | return {} 566 | 567 | # if no value was specified for the attribute, return the whole dict 568 | # of items keyed by attr 569 | if val == None: 570 | return items 571 | 572 | # if there are no items with the given attribute and value, return an empty list 573 | if val not in items: 574 | return [] 575 | 576 | # if user set nosingleton to true, and there is only a single node with this value, 577 | # just return the node, rather than a singleton list 578 | if nosingleton and len(items[val]) == 1: 579 | return items[val][0] 580 | 581 | # otherwise, return all nodes with the attribute attr and the value val 582 | return items[val] 583 | 584 | def get_nodes_by_attr(self, attr, val=None, nosingleton=False): 585 | '''Gets all nodes with the given attribute attr and value val. 586 | 587 | If val is not specified, returns a dict of all nodes, keyed by attr.
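For example, assuming the graph's nodes carry a "type" attribute (the values "server" and "client" are illustrative only):

    g.cache_nodes_by("type")
    g.get_nodes_by_attr("type")            # -> {"server": [...], "client": [...]}
    g.get_nodes_by_attr("type", "server")  # -> list of node attribute dicts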
588 | 589 | If there are no nodes with the given attr or val, returns an empty list. 590 | 591 | Optionally, if the nosingleton parameter is set to True, and there is only one node in a list, 592 | the method will only return that single node, rather than a singleton list. This is useful, 593 | for instance, if the user knows all nodes with attributes of a certain type will be unique, 594 | and wishes to simply use attr as the node key. 595 | ''' 596 | return self._get_items_by_attr("node", attr, val, nosingleton) 597 | 598 | def get_edges_by_attr(self, attr, val=None, nosingleton=False): 599 | '''Gets all edges with the given attribute attr and value val. 600 | 601 | If val is not specified, returns a dict of all edges, keyed by attr. 602 | 603 | If there are no edges with the given attr or val, returns an empty list. 604 | 605 | Optionally, if the nosingleton parameter is set to True, and there is only one edge in a list, 606 | the method will only return that single edge, rather than a singleton list. This is useful, 607 | for instance, if the user knows all edges with attributes of a certain type will be unique, 608 | and wishes to simply use attr as the edge key. 609 | ''' 610 | return self._get_items_by_attr("edge", attr, val, nosingleton) 611 | 612 | def neighbors(self, id_): 613 | return dict([(nid, self.get_node(nid)) for nid in self._g.neighbors(id_)]) 614 | 615 | def _get_export_id_str(self, id_): 616 | if id_.__class__.__name__ == "UUID": 617 | return id_.hex 618 | return id_ 619 | 620 | def _hexify_attrs(self, attrs): 621 | for key, val in attrs.iteritems(): 622 | if key in ['src', 'dst', 'id']: 623 | attrs[key] = self._get_export_id_str(attrs[key]) 624 | return attrs 625 | 626 | def save_json(self, filename): 627 | '''Exports the graph to a JSON file for use in the Gaia visualizer.''' 628 | with open(filename, 'w') as outfile: 629 | graph = dict() 630 | graph["meta"] = self.meta 631 | graph["nodes"] = [ dict(chain(self._g.node[id_].items(), {"id": self._get_export_id_str(id_)}.items())) for id_ in self._g.nodes() ] 632 | graph["edges"] = [ 633 | dict( 634 | chain( 635 | self._g.edge[i][j][key].items(), 636 | { "src": self._get_export_id_str(i), "dst": self._get_export_id_str(j), "id": self._get_export_id_str(key)}.items() 637 | ) 638 | ) 639 | for i, j in self._g.edges() 640 | for key in self._g.edge[i][j] 641 | ] 642 | graph["timeline"] = [ [c.timecode, c.name, self._hexify_attrs( c.attributes )] for c in self.timeline ] 643 | json.dump(graph, outfile, indent=True) 644 | 645 | def load_json(self, j): 646 | '''Generates a graph from the given JSON file j. 
j may be the filename string, or a JSON object.''' 647 | if type(j) is str: 648 | jfile = open(j, 'r') 649 | graph = json.load(jfile) 650 | else: 651 | graph = j 652 | 653 | self.meta = graph["meta"] 654 | self.timeline = graph["timeline"] 655 | 656 | for node in graph["nodes"]: 657 | id_ = self._extract_id(node.get("id")) 658 | if id_ != None: 659 | del node["id"] 660 | self.add_node(node, id_) 661 | 662 | for edge in graph["edges"]: 663 | src = self._extract_id(edge["src"]) 664 | dst = self._extract_id(edge["dst"]) 665 | id_ = self._extract_id(edge["id"]) if edge["id"] != None else self._create_uuid() 666 | self.add_edge( 667 | src, 668 | dst, 669 | dict([item for item in edge.items() 670 | if (item[0] != "src" and item[0] != "dst" and item[0] != "id")] ), 671 | id_ 672 | ) 673 | self._g.edge[src][dst][id_]["id"] = id_ 674 | 675 | def copy(self): 676 | return copy.deepcopy(self) 677 | 678 | def _check_key_presence(self, d, key, val): 679 | try: 680 | d[key] 681 | except KeyError: 682 | d[key] = val 683 | 684 | def networkx_graph(self): 685 | return copy.deepcopy(self._g) 686 | 687 | def load_networkx_graph(self, nxgraph): 688 | self._g = nxgraph 689 | 690 | # add id fields on nodes that don't have them 691 | for id_ in self._g.nodes(): 692 | self._check_key_presence(self._g.node[id_], "id", id_) 693 | 694 | for src, dst in nxgraph.edges(): 695 | for key in nxgraph.edge[src][dst]: 696 | attrs = nxgraph.edge[src][dst][key] 697 | self._edges[key] = attrs 698 | self._check_key_presence(self._edges[key], "id", key) 699 | self._check_key_presence(self._edges[key], "src", src) 700 | self._check_key_presence(self._edges[key], "dst", dst) 701 | 702 | if __name__ == "__main__": 703 | print("Please import this module !") 704 | -------------------------------------------------------------------------------- /semanticnet/__init__.py: -------------------------------------------------------------------------------- 1 | from Graph import * 2 | from DiGraph import * 3 | from operators import * 4 | from algorithms import * 5 | -------------------------------------------------------------------------------- /semanticnet/algorithms.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | import semanticnet as sn 3 | 4 | def _mark_incident_edges_as(U, G, status): 5 | for nid in G.get_node_ids(): 6 | # for every successor of the removed node rnid 7 | for neighbor in U._g.neighbors(nid): 8 | edges_nid_neighbor = [eid for eid, attrs in U.get_edges_between(nid, neighbor).items()] 9 | map(lambda reid: U.set_edge_attribute(reid, 'diffstatus', status), edges_nid_neighbor) 10 | if type(U) is sn.DiGraph: 11 | for predecessor in U._g.predecessors(nid): 12 | edges_predecessor_nid = [eid for eid, attrs in U.get_edges_between(nid, predecessor).items()] 13 | map(lambda reid: U.set_edge_attribute(reid, 'diffstatus', status), edges_predecessor_nid) 14 | 15 | def _mark_nodes_edges_as(U, G, status): 16 | map(lambda nid: U.set_node_attribute(nid, 'diffstatus', status), G.get_node_ids()) 17 | map(lambda eid: U.set_edge_attribute(eid, 'diffstatus', status), G.get_edges()) 18 | 19 | # I is the graph of nodes/edges which are in both A and B 20 | def _check_changed_edges(A, B, AB, I): 21 | # Go through the remaining edges that haven't been marked with a 'diffstatus'. 
22 | # These are edges that are incident on, to, or from two nodes which are in 23 | # both A and B 24 | for eid in [ eid for eid, attrs in AB.get_edges().iteritems() if 'diffstatus' not in attrs ]: 25 | # removed edges 26 | if eid in A.get_edges() and eid not in B.get_edges(): 27 | AB.set_edge_attribute(eid, 'diffstatus', 'removed') 28 | # added edges 29 | elif eid in B.get_edges() and eid not in A.get_edges(): 30 | AB.set_edge_attribute(eid, 'diffstatus', 'added') 31 | 32 | def _clear_clutter(U): 33 | '''Clears up some clutter, so only relevant unchanged nodes/edges 34 | remain in the graph. 35 | ''' 36 | same = [n for n, attrs in U.get_nodes().iteritems() if attrs['diffstatus'] == 'same'] 37 | for n in same: 38 | # get the list of edges incident to or from this node 39 | successors = U._g.neighbors(n) 40 | predecessors = U._g.predecessors(n) if type(U) is sn.DiGraph else [] 41 | edges = {} 42 | map(lambda adj: edges.update(U.get_edges_between(n, adj)), successors + predecessors) 43 | 44 | # a node is "relevant" if any of its incident edges has been changed, OR 45 | # it is connected to a changed node AND has an in-degree > 0 46 | changed = [ 47 | # the edge itself has been changed, or 48 | attrs['diffstatus'] != 'same' or 49 | # it is connected to a changed node 50 | U.get_node_attribute(attrs['dst'], 'diffstatus') != 'same' or 51 | U.get_node_attribute(attrs['src'], 'diffstatus') != 'same' 52 | for e, attrs in edges.iteritems() 53 | ] 54 | if not any(changed): 55 | U.remove_node(n) 56 | 57 | def _check_mods(A, B, AB, S): 58 | '''Check the 'same' graph for differences in attributes ''' 59 | for nid, attrs in S.get_nodes().iteritems(): 60 | if attrs != A.get_node(nid) or attrs != B.get_node(nid): 61 | AB.set_node_attribute(nid, 'diffstatus', 'modified') 62 | for eid, attrs in S.get_edges().iteritems(): 63 | if attrs != A.get_edge(eid) or attrs != B.get_edge(eid): 64 | AB.set_edge_attribute(eid, 'diffstatus', 'modified') 65 | 66 | def diff(A, B, context=False, mods=False): 67 | '''Given two graphs A and B, where it is generally assumed that B is a "newer" version of A, 68 | returns a new graph which captures information about which nodes and edges of A were 69 | removed, added, and remain the same in B. 70 | 71 | Specifically, it returns A ∪ B, such that: 72 | 1. Nodes in A - B are given the "diffstatus" attribute "removed" 73 | 2. Nodes in B - A are given the "diffstatus" attribute "added" 74 | 3. Nodes in A ∩ B are given the "diffstatus" attribute "same" 75 | 76 | Notice that the union of 1 - 3 equals A ∪ B. 77 | 78 | The optional parameter context, when true, will prune the graph so that nodes/edges which are 79 | the same are only present in the diff graph if: 80 | 1. An edge incident on/to/from it has been changed, or 81 | 2. it is connected to a changed node. 82 | 83 | The optional parameter mods, when true, will check for attribute modifications on nodes and 84 | edges, in addition to new/removed nodes. Any nodes/edges that have had their attributes 85 | changed between A and B are marked with the "diffstatus" attribute as "modified." 86 | 87 | WARNING: Currently, this method only works if both A and B were generated with unique IDs in a 88 | deterministic fashion; i.e., two identical nodes are given the same ID at both points in time. 89 | This means that diff() will not work on graphs which were generated with automatic random UUIDs. 
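A minimal sketch with deterministic string IDs (the names "x", "y", "z", and "x-y" are arbitrary):

    import semanticnet as sn

    A = sn.Graph()
    A.add_node({"label": "x"}, "x")
    A.add_node({"label": "y"}, "y")
    A.add_edge("x", "y", {"type": "link"}, "x-y")

    B = A.copy()
    B.remove_node("y")               # also drops the edge "x-y"
    B.add_node({"label": "z"}, "z")

    D = sn.diff(A, B)
    D.get_node_attribute("x", "diffstatus")    # -> 'same'
    D.get_node_attribute("y", "diffstatus")    # -> 'removed'
    D.get_node_attribute("z", "diffstatus")    # -> 'added'
    D.get_edge_attribute("x-y", "diffstatus")  # -> 'removed'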
90 | ''' 91 | # must take their union first, then mark appropriate nodes/edges 92 | AB = sn.union(A, B) 93 | 94 | # any edges incident on, to, or from the removed nodes will not be in the removed graph, 95 | # since we cannot have edges incident on, to, or from non-existent nodes 96 | removed = sn.difference(A, B) 97 | _mark_nodes_edges_as(AB, removed, 'removed') 98 | _mark_incident_edges_as(AB, removed, 'removed') 99 | 100 | added = sn.difference(B, A) 101 | _mark_nodes_edges_as(AB, added, 'added') 102 | _mark_incident_edges_as(AB, added, 'added') 103 | 104 | same = sn.intersection(B, A) 105 | _mark_nodes_edges_as(AB, same, 'same') 106 | _check_changed_edges(A, B, AB, same) 107 | 108 | if mods: 109 | _check_mods(A, B, AB, same) 110 | 111 | if context: 112 | _clear_clutter(AB) 113 | 114 | return AB 115 | -------------------------------------------------------------------------------- /semanticnet/operators.py: -------------------------------------------------------------------------------- 1 | import semanticnet as sn 2 | 3 | ### Convenience lambdas ### 4 | def node_in(nid, G): 5 | '''Returns true if the node n is in the graph G.''' 6 | return G._g.has_node(nid) 7 | 8 | def edge_in(eid, G): 9 | return eid in G.get_edges() 10 | 11 | ### Operators 12 | def _inter(A, B, node_cond, edge_cond): 13 | '''Generic internal helper for building a new graph, starting with the nodes in A 14 | and excluding members based on a condition on B. 15 | Generates and returns a new graph C = (V, E), where 16 | 17 | V = {v in V(A) | node_cond(v, B)} 18 | E = {e in E(A) | edge_cond(e, B)} 19 | 20 | ''' 21 | C = A.copy() 22 | C.remove_nodes(n for n in C.get_node_ids() if not node_cond(n, B)) 23 | C.remove_edges(e for e in C.get_edge_ids() if not edge_cond(e, B)) 24 | return C 25 | 26 | def difference(A, B, node_is_member=node_in, edge_is_member=edge_in): 27 | '''Returns a new graph which contains the nodes and edges in A, but not in B. 28 | 29 | User may pass in a lambda which defines what it means for an element (node or edge) 30 | to be a member of a graph. By default, it uses the unique IDs. The lambda must 31 | be of the form: 32 | 33 | lambda id_, G: (expression which determines if the element id_ is "in" the graph G) 34 | ''' 35 | return _inter(A, B, lambda n, new_G: not node_is_member(n, new_G), lambda e, new_G: not edge_is_member(e, new_G)) 36 | 37 | def intersection(A, B, node_is_member=node_in, edge_is_member=edge_in): 38 | '''Returns a new graph which contains the nodes and edges which are in BOTH A and B. 39 | 40 | User may pass in a lambda which defines what it means for an element (node or edge) 41 | to be a member of a graph. By default, it uses the unique IDs. The lambda must 42 | be of the form: 43 | 44 | lambda id_, G: (expression which determines if the element id_ is "in" the graph G) 45 | ''' 46 | return _inter(A, B, node_is_member, edge_is_member) 47 | 48 | def union(A, B, node_is_member=node_in, edge_is_member=edge_in): 49 | '''Returns a new graph which contains the nodes and edges in EITHER A or B. 50 | 51 | User may pass in a lambda which defines what it means for an element (node or edge) 52 | to be a member of a graph. By default, it uses the unique IDs. 
The lambda must 53 | be of the form: 54 | 55 | lambda id_, G: (expression which determines if the element id_ is "in" the graph G) 56 | ''' 57 | # copy A and B, and combine all their nodes and edges based on ID first, to create 58 | # a universal set AB to use in building the union 59 | AB = B.copy() 60 | AB.add_nodes(dict((nid, attrs) for nid, attrs in A.get_nodes().iteritems() if nid not in AB.get_node_ids())) 61 | AB.add_edges(dict((eid, attrs) for eid, attrs in A.get_edges().iteritems() if eid not in AB.get_edges())) 62 | 63 | # then use the universal set AB to build the union, based on the lambdas 64 | C = type(AB)() 65 | C.add_nodes( 66 | dict( 67 | (nid, attrs) 68 | for nid, attrs in AB.get_nodes().items() 69 | if node_is_member(nid, A) or node_is_member(nid, B) 70 | ) 71 | ) 72 | C.add_edges( 73 | dict( 74 | (eid, attrs) 75 | for eid, attrs in AB.get_edges().items() 76 | if edge_is_member(eid, A) or edge_is_member(eid, B) 77 | ) 78 | ) 79 | return C 80 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from setuptools import setup 4 | 5 | setup( 6 | name='semanticnet', 7 | packages=['semanticnet'], 8 | version='0.2.0', 9 | description='Small python library to create semantic graphs in JSON.', 10 | author='Thibault Reuille', 11 | author_email='thibault@opendns.com', 12 | url="https://github.com/ThibaultReuille/semanticnet", 13 | install_requires=['networkx'] 14 | ) 15 | -------------------------------------------------------------------------------- /test/conftest.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pytest 4 | import semanticnet as sn 5 | import networkx as nx 6 | import uuid 7 | 8 | @pytest.fixture 9 | def fixture_dir(): 10 | return os.path.join(os.path.dirname(os.path.realpath(__file__)), "fixtures") 11 | 12 | @pytest.fixture 13 | def uuid_str(): 14 | return '3caaa8c09148493dbdf02c574b95526c' 15 | 16 | @pytest.fixture 17 | def uuid_obj(): 18 | return uuid.UUID('3caaa8c09148493dbdf02c574b95526c') 19 | 20 | @pytest.fixture 21 | def graph(): 22 | return sn.Graph() 23 | 24 | @pytest.fixture 25 | def populated_graph(): 26 | g = sn.Graph() 27 | a = g.add_node({"type": "A"}, '3caaa8c09148493dbdf02c574b95526c') 28 | b = g.add_node({"type": "B"}, '2cdfebf3bf9547f19f0412ccdfbe03b7') 29 | c = g.add_node({"type": "C"}, '3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 30 | g.add_edge(a, b, {"type": "normal"}, '5f5f44ec7c0144e29c5b7d513f92d9ab') 31 | g.add_edge(a, c, {"type": "normal"}, '7eb91be54d3746b89a61a282bcc207bb') 32 | g.add_edge(b, c, {"type": "irregular"}, 'c172a3599b7d4ef3bbb688277276b763') 33 | return g 34 | 35 | @pytest.fixture 36 | def populated_digraph(): 37 | g = sn.DiGraph() 38 | a = g.add_node({"type": "A"}, '3caaa8c09148493dbdf02c574b95526c') 39 | b = g.add_node({"type": "B"}, '2cdfebf3bf9547f19f0412ccdfbe03b7') 40 | c = g.add_node({"type": "C"}, '3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 41 | g.add_edge(a, b, {"type": "normal"}, '5f5f44ec7c0144e29c5b7d513f92d9ab') 42 | g.add_edge(b, a, {"type": "normal"}, 'f3674fcc691848ebbd478b1bfb3e84c3') 43 | g.add_edge(a, c, {"type": "normal"}, '7eb91be54d3746b89a61a282bcc207bb') 44 | g.add_edge(b, c, {"type": "irregular"}, 'c172a3599b7d4ef3bbb688277276b763') 45 | return g 46 | 47 | @pytest.fixture 48 | def correct_output_graph_plaintext(): 49 | graph = sn.Graph() 50 | 51 | a = graph.add_node({"label" : "A"}, 'a') 
52 | b = graph.add_node({"label" : "B"}, 'b') 53 | c = graph.add_node({"label" : "C"}, 'c') 54 | 55 | graph.add_edge(a, b, {"type" : "belongs"}, 'belongs') 56 | graph.add_edge(b, c, {"type" : "owns"}, 'owns') 57 | graph.add_edge(c, a, {"type" : "has"}, 'has') 58 | 59 | return graph 60 | 61 | @pytest.fixture 62 | def correct_output_graph_plaintext_from_file(fixture_dir): 63 | g = sn.Graph() 64 | g.load_json(os.path.join(fixture_dir, "test_output_correct_plaintext.json")) 65 | return g 66 | 67 | @pytest.fixture 68 | def test_output_plaintext(correct_output_graph_plaintext, fixture_dir): 69 | correct_output_graph_plaintext.save_json(os.path.join(fixture_dir, "test_output_plaintext.json")) 70 | 71 | with open(os.path.join(fixture_dir, "test_output_plaintext.json")) as f: 72 | jsonObj = json.load(f) 73 | 74 | os.remove(os.path.join(fixture_dir, "test_output_plaintext.json")) 75 | 76 | return jsonObj 77 | 78 | @pytest.fixture 79 | def test_output_plaintext_correct(fixture_dir): 80 | with open(os.path.join(fixture_dir, "test_output_correct_plaintext.json")) as f: 81 | jsonObj = json.load(f) 82 | 83 | return jsonObj 84 | 85 | @pytest.fixture 86 | def test_output(fixture_dir): 87 | graph = sn.Graph() 88 | 89 | a = graph.add_node({"label" : "A"}, '6cf546f71efe47578f7a1400871ef6b8') 90 | b = graph.add_node({"label" : "B"}, 'bcb388bb24a74d978fa2006ed278b2fe') 91 | c = graph.add_node({"label" : "C"}, 'd6523f4f9d5240d2a92e341f4ca00a78') 92 | 93 | graph.add_edge(a, b, {"type" : "belongs"}, 'ff8a8a8093cf436aa3b0127c71ddc11d') 94 | graph.add_edge(b, c, {"type" : "owns"}, '081369f6197b467abe97b3efe8cc4640') 95 | graph.add_edge(c, a, {"type" : "has"}, 'b3a245098d5d482f893c6d63606c7e91') 96 | 97 | graph.save_json(os.path.join(fixture_dir, "test_output.json")) 98 | 99 | with open(os.path.join(fixture_dir, "test_output.json")) as f: 100 | jsonObj = json.load(f) 101 | 102 | return jsonObj 103 | 104 | @pytest.fixture 105 | def correct_output_filename(): 106 | return "test_output_correct.json" 107 | 108 | @pytest.fixture 109 | def correct_output(fixture_dir, correct_output_filename): 110 | with open(os.path.join(fixture_dir, correct_output_filename)) as f: 111 | jsonObj = json.load(f) 112 | 113 | return jsonObj 114 | 115 | @pytest.fixture 116 | def correct_output_graph(fixture_dir, correct_output_filename): 117 | g = sn.Graph() 118 | g.load_json(os.path.join(fixture_dir, correct_output_filename)) 119 | return g 120 | 121 | @pytest.fixture 122 | def netx_graph(): 123 | g = nx.MultiGraph() 124 | g.add_node(0, {"type": "A"}) 125 | g.add_node(1, {"type": "B"}) 126 | g.add_node(2, {"type": "C"}) 127 | g.add_edge(0, 1, 0, {"type": "normal"}) 128 | g.add_edge(0, 2, 1, {"type": "normal"}) 129 | g.add_edge(1, 2, 2, {"type": "irregular"}) 130 | return g 131 | -------------------------------------------------------------------------------- /test/fixtures/test_output_correct.json: -------------------------------------------------------------------------------- 1 | { 2 | "timeline": [], 3 | "nodes": [ 4 | { 5 | "id": "d6523f4f9d5240d2a92e341f4ca00a78", 6 | "label": "C" 7 | }, 8 | { 9 | "id": "bcb388bb24a74d978fa2006ed278b2fe", 10 | "label": "B" 11 | }, 12 | { 13 | "id": "6cf546f71efe47578f7a1400871ef6b8", 14 | "label": "A" 15 | } 16 | ], 17 | "meta": {}, 18 | "edges": [ 19 | { 20 | "src": "d6523f4f9d5240d2a92e341f4ca00a78", 21 | "dst": "6cf546f71efe47578f7a1400871ef6b8", 22 | "type": "has", 23 | "id": "b3a245098d5d482f893c6d63606c7e91" 24 | }, 25 | { 26 | "src": "bcb388bb24a74d978fa2006ed278b2fe", 27 | "dst": 
"d6523f4f9d5240d2a92e341f4ca00a78", 28 | "type": "owns", 29 | "id": "081369f6197b467abe97b3efe8cc4640" 30 | }, 31 | { 32 | "src": "6cf546f71efe47578f7a1400871ef6b8", 33 | "dst": "bcb388bb24a74d978fa2006ed278b2fe", 34 | "type": "belongs", 35 | "id": "ff8a8a8093cf436aa3b0127c71ddc11d" 36 | } 37 | ] 38 | } -------------------------------------------------------------------------------- /test/fixtures/test_output_correct_plaintext.json: -------------------------------------------------------------------------------- 1 | { 2 | "timeline": [], 3 | "nodes": [ 4 | { 5 | "id": "c", 6 | "label": "C" 7 | }, 8 | { 9 | "id": "b", 10 | "label": "B" 11 | }, 12 | { 13 | "id": "a", 14 | "label": "A" 15 | } 16 | ], 17 | "meta": {}, 18 | "edges": [ 19 | { 20 | "src": "c", 21 | "dst": "a", 22 | "type": "has", 23 | "id": "has" 24 | }, 25 | { 26 | "src": "b", 27 | "dst": "c", 28 | "type": "owns", 29 | "id": "owns" 30 | }, 31 | { 32 | "src": "a", 33 | "dst": "b", 34 | "type": "belongs", 35 | "id": "belongs" 36 | } 37 | ] 38 | } -------------------------------------------------------------------------------- /test/test_algorithms.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import semanticnet as sn 3 | import uuid 4 | 5 | def test_diff(populated_digraph): 6 | A = populated_digraph 7 | B = A.copy() 8 | 9 | # remove node C. Will consequently also remove edges (A, C) and (B, C) 10 | B.remove_node('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 11 | 12 | node_d = B.add_node({"type": "D"}, 'da30015efe3c44dbb0b3b3862cef704a') # add a new node of type D 13 | da = B.add_edge(node_d, '3caaa8c09148493dbdf02c574b95526c', {"type": "irregular"}) # add an edge from D to A 14 | B.remove_edge('5f5f44ec7c0144e29c5b7d513f92d9ab') # remove (A, B) 15 | # change node A to type Z to check for modifications 16 | B.set_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'type', 'Z') 17 | 18 | D = sn.diff(A, B, mods=True) # compute the diff graph D 19 | 20 | correct_nodes = { 21 | uuid.UUID('3caaa8c09148493dbdf02c574b95526c'): { 22 | 'id': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 23 | 'type': "Z", 24 | 'diffstatus': 'modified' 25 | }, 26 | uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'): { 27 | 'id': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 28 | 'type': "B", 29 | 'diffstatus': 'same' 30 | }, 31 | uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'): { 32 | 'id': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 33 | 'type': "C", 34 | 'diffstatus': 'removed' 35 | }, 36 | node_d: { 37 | 'id': node_d, 38 | 'type': "D", 39 | 'diffstatus': 'added' 40 | } 41 | } 42 | assert D.get_nodes() == correct_nodes 43 | 44 | correct_edges = { 45 | # (A, B) 46 | uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'): { 47 | 'id': uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'), 48 | 'src': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 49 | 'dst': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 50 | 'type': 'normal', 51 | 'diffstatus': 'removed' 52 | }, 53 | # (B, A) 54 | uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'): { 55 | 'id': uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'), 56 | 'src': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 57 | 'dst': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 58 | 'type': 'normal', 59 | 'diffstatus': 'same' 60 | }, 61 | # (A, C) 62 | uuid.UUID('7eb91be54d3746b89a61a282bcc207bb'): { 63 | 'id': uuid.UUID('7eb91be54d3746b89a61a282bcc207bb'), 64 | 'src': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 65 | 'dst': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 66 | 'type': 'normal', 67 | 
'diffstatus': 'removed' 68 | }, 69 | # (B, C) 70 | uuid.UUID('c172a3599b7d4ef3bbb688277276b763'): { 71 | 'id': uuid.UUID('c172a3599b7d4ef3bbb688277276b763'), 72 | 'src': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 73 | 'dst': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 74 | 'type': 'irregular', 75 | 'diffstatus': 'removed' 76 | }, 77 | # (D, A) 78 | da: { 79 | 'id': da, 80 | 'src': uuid.UUID('da30015efe3c44dbb0b3b3862cef704a'), 81 | 'dst': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 82 | 'type': 'irregular', 83 | 'diffstatus': 'added' 84 | }, 85 | } 86 | assert D.get_edges() == correct_edges 87 | -------------------------------------------------------------------------------- /test/test_cache.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import uuid 3 | 4 | def test_cache_nodes_by(populated_graph): 5 | # add another node with the same type to make sure it works for multiple nodes 6 | # with the same attribute 7 | populated_graph.add_node({"type": "A"}, '2b673235a0b94935ab8b6b9de178d341') 8 | 9 | # cache by the attribute "type" 10 | populated_graph.cache_nodes_by("type") 11 | 12 | in_cache = populated_graph._node_cache 13 | out_cache = { 14 | "type": { 15 | "B": [{ 16 | "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 17 | 'type': 'B' 18 | }], 19 | "A": [ 20 | { 21 | "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 22 | "type": "A" 23 | }, 24 | { 25 | "id": uuid.UUID('2b673235a0b94935ab8b6b9de178d341'), 26 | "type": "A" 27 | } 28 | ], 29 | "C": [{ 30 | "id": uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 31 | 'type': 'C' 32 | }] 33 | } 34 | } 35 | assert in_cache == out_cache 36 | 37 | # calling cache_nodes_by() more than once on the same attribute should be ignored 38 | populated_graph.cache_nodes_by("type") 39 | in_cache = populated_graph._node_cache 40 | assert in_cache == out_cache 41 | 42 | 43 | def test_cache_edges_by(populated_graph): 44 | populated_graph.cache_edges_by("type") 45 | 46 | assert ( 47 | populated_graph._edge_cache == 48 | { 49 | "type": { 50 | "normal": [ 51 | { 52 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 53 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 54 | 'type': 'normal', 55 | 'id': uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb') 56 | }, 57 | { 58 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 59 | 'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 60 | 'type': 'normal', 61 | 'id': uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab') 62 | } 63 | ], 64 | "irregular":[ 65 | { 66 | 'src': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 67 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 68 | 'type': 'irregular', 69 | 'id': uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763') 70 | } 71 | ] 72 | } 73 | } 74 | ) 75 | 76 | def test_get_edges_by_attr(populated_graph): 77 | populated_graph.cache_edges_by("type") 78 | 79 | type_edges = populated_graph.get_edges_by_attr("type") 80 | correct_output = { 81 | "normal": [ 82 | { 83 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 84 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 85 | 'type': 'normal', 86 | 'id': uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb') 87 | }, 88 | { 89 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 90 | 'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 91 | 'type': 'normal', 92 | 'id': uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab') 93 | } 94 | ], 95 | "irregular":[ 96 | { 97 | 'src': 
uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 98 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 99 | 'type': 'irregular', 100 | 'id': uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763') 101 | } 102 | ] 103 | } 104 | assert type_edges == correct_output 105 | 106 | normal_edges = populated_graph.get_edges_by_attr("type", "normal") 107 | correct_output = [ 108 | { 109 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 110 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 111 | 'type': 'normal', 112 | 'id': uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb') 113 | }, 114 | { 115 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 116 | 'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 117 | 'type': 'normal', 118 | 'id': uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab') 119 | } 120 | ] 121 | assert normal_edges == correct_output 122 | 123 | irregular_edges = populated_graph.get_edges_by_attr("type", "irregular") 124 | correct_output = [ 125 | { 126 | 'src': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 127 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 128 | 'type': 'irregular', 129 | 'id': uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763') 130 | } 131 | ] 132 | assert irregular_edges == correct_output 133 | 134 | # should return a single item, rather than a singleton list 135 | irregular_edges = populated_graph.get_edges_by_attr("type", "irregular", nosingleton=True) 136 | correct_output = { 137 | 'src': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 138 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 139 | 'type': 'irregular', 140 | 'id': uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763') 141 | } 142 | assert irregular_edges == correct_output 143 | 144 | def test_cache_edges_by_build_false(populated_graph): 145 | populated_graph.cache_edges_by("type", build=False) 146 | 147 | assert ( 148 | populated_graph._edge_cache == 149 | { 150 | "type": {} 151 | } 152 | ) 153 | 154 | def test_add_edge_with_cache(populated_graph): 155 | test_cache_edges_by(populated_graph) # builds the cache 156 | 157 | populated_graph.add_edge('3caaa8c09148493dbdf02c574b95526c', '2cdfebf3bf9547f19f0412ccdfbe03b7', 158 | {"type": "irregular"}, 'c332692fcce54ea2ae85ece6788f7f05') 159 | 160 | assert ( 161 | populated_graph._edge_cache == 162 | { 163 | "type": { 164 | "normal": [ 165 | { 166 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 167 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 168 | 'type': 'normal', 169 | 'id': uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb') 170 | }, 171 | { 172 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 173 | 'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 174 | 'type': 'normal', 175 | 'id': uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab') 176 | } 177 | ], 178 | "irregular":[ 179 | { 180 | 'src': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 181 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 182 | 'type': 'irregular', 183 | 'id': uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763') 184 | }, 185 | { 186 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 187 | 'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 188 | 'type': 'irregular', 189 | 'id': uuid.UUID('c332692fcce54ea2ae85ece6788f7f05') 190 | } 191 | ] 192 | } 193 | } 194 | ) 195 | 196 | # add an edge with an attribute we are not tracking 197 | # should NOT be in the cache 198 | populated_graph.add_edge('2cdfebf3bf9547f19f0412ccdfbe03b7', '3cd197c2cf5e42dc9ccd0c2adcaf4bc2', 
199 | {"label": "test"}, 'a0d5f731322e428ca1549296ad1c5f66') 200 | 201 | assert "label" not in populated_graph._edge_cache 202 | 203 | def test_cache_by_build_false(populated_graph): 204 | populated_graph.cache_nodes_by("type", build=False) 205 | 206 | assert ( 207 | populated_graph._node_cache == 208 | { 209 | "type": {} 210 | } 211 | ) 212 | 213 | def test_clear_node_cache(populated_graph): 214 | # add a node with a different attribute 215 | populated_graph.add_node( 216 | {"label": "test"}, 217 | '13624b67282444cb9e038ccd8038e644' 218 | ) 219 | 220 | populated_graph.cache_nodes_by("type") 221 | populated_graph.cache_nodes_by("label") 222 | print("node_cache: {}".format(populated_graph._node_cache)) 223 | assert populated_graph._node_cache["type"] # cache is not empty 224 | assert populated_graph._node_cache["label"] # cache is not empty 225 | 226 | populated_graph.clear_node_cache('type') 227 | assert not populated_graph._node_cache["type"] # cache IS empty 228 | assert populated_graph._node_cache["label"] # cache is not empty 229 | 230 | populated_graph.clear_node_cache() 231 | assert not populated_graph._node_cache # entire cache is gone 232 | 233 | def test_add_node_with_cache(populated_graph): 234 | populated_graph.cache_nodes_by("type") 235 | 236 | ### get all nodes with attr "type" 237 | input_ = populated_graph.get_nodes_by_attr("type") 238 | output = { 239 | "A": [ 240 | { 241 | "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 242 | "type": "A" 243 | } 244 | ], 245 | "B": [{ 246 | "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 247 | 'type': 'B' 248 | }], 249 | "C": [{ 250 | "id": uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 251 | 'type': 'C' 252 | }] 253 | } 254 | 255 | assert input_ == output 256 | 257 | # Add a new node of the same type as another. Should be in the cache. 258 | populated_graph.add_node({"type": "A"}, '2b673235a0b94935ab8b6b9de178d341') 259 | input_ = populated_graph.get_nodes_by_attr("type", "A") 260 | output = [ 261 | { 262 | "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 263 | "type": "A" 264 | }, 265 | { 266 | "id": uuid.UUID('2b673235a0b94935ab8b6b9de178d341'), 267 | "type": "A" 268 | } 269 | ] 270 | 271 | assert input_ == output 272 | 273 | # add a new node with an attribute that we are not caching. 
274 | # Should NOT be in the cache 275 | populated_graph.add_node({"label": "test"}, 'fa02d5e82ed54baf828558c70317f20e') 276 | input_ = populated_graph._node_cache 277 | output = { 278 | "type": { 279 | "A": [ 280 | { 281 | "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 282 | "type": "A" 283 | }, 284 | { 285 | "id": uuid.UUID('2b673235a0b94935ab8b6b9de178d341'), 286 | "type": "A" 287 | } 288 | ], 289 | "B": [{ 290 | "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 291 | 'type': 'B' 292 | }], 293 | "C": [{ 294 | "id": uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 295 | 'type': 'C' 296 | }] 297 | } 298 | } 299 | 300 | assert input_ == output 301 | 302 | def test_set_node_attribute_with_cache(populated_graph): 303 | populated_graph.cache_nodes_by("type") 304 | populated_graph.set_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'type', 'B') 305 | 306 | a_nodes = populated_graph.get_nodes_by_attr("type", "A") 307 | b_nodes = populated_graph.get_nodes_by_attr("type", "B") 308 | 309 | node_a = populated_graph.get_node('3caaa8c09148493dbdf02c574b95526c') 310 | 311 | assert node_a not in a_nodes 312 | assert node_a in b_nodes 313 | 314 | def test_set_edge_attribute_with_cache(populated_graph): 315 | populated_graph.cache_edges_by("type") 316 | populated_graph.set_edge_attribute('7eb91be54d3746b89a61a282bcc207bb', 317 | 'type', 'irregular') 318 | 319 | normal_edges = populated_graph.get_edges_by_attr("type", "normal") 320 | irregular_edges = populated_graph.get_edges_by_attr("type", "irregular") 321 | 322 | edge_a_b = populated_graph.get_edge('5f5f44ec7c0144e29c5b7d513f92d9ab') 323 | edge_a_c = populated_graph.get_edge('7eb91be54d3746b89a61a282bcc207bb') 324 | 325 | assert edge_a_b in normal_edges 326 | assert edge_a_c not in normal_edges 327 | assert edge_a_c in irregular_edges 328 | 329 | def test_remove_node_with_cache(populated_graph): 330 | populated_graph.add_node({"type": "A"}, '2b673235a0b94935ab8b6b9de178d341') 331 | populated_graph.cache_nodes_by("type") 332 | populated_graph.remove_node('3caaa8c09148493dbdf02c574b95526c') 333 | 334 | input_ = populated_graph.get_nodes_by_attr("type", "A") 335 | output = [{ 336 | "id": uuid.UUID('2b673235a0b94935ab8b6b9de178d341'), 337 | "type": "A" 338 | }] 339 | 340 | assert input_ == output 341 | 342 | def test_remove_edge_with_cache(populated_graph): 343 | populated_graph.cache_edges_by("type") 344 | 345 | populated_graph.add_edge('3caaa8c09148493dbdf02c574b95526c', '2cdfebf3bf9547f19f0412ccdfbe03b7', 346 | {"type": "irregular"}, 'c332692fcce54ea2ae85ece6788f7f05') 347 | 348 | populated_graph.remove_edge('c172a359-9b7d-4ef3-bbb6-88277276b763') 349 | 350 | assert ( 351 | populated_graph.get_edges_by_attr("type", "irregular") == 352 | [ 353 | { 354 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 355 | 'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 356 | 'type': 'irregular', 357 | 'id': uuid.UUID('c332692fcce54ea2ae85ece6788f7f05') 358 | } 359 | ] 360 | ) 361 | -------------------------------------------------------------------------------- /test/test_graph.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pytest 3 | import semanticnet as sn 4 | import time 5 | import uuid 6 | 7 | def test_json_constructor(fixture_dir, correct_output_filename, correct_output_graph): 8 | g = sn.DiGraph(json_file=os.path.join(fixture_dir, correct_output_filename)) 9 | assert g.get_nodes() == correct_output_graph.get_nodes() 10 | assert g.get_edges() == 
correct_output_graph.get_edges() 11 | 12 | def test__create_uuid(graph): 13 | id_ = graph._create_uuid() 14 | assert id_.__class__.__name__ == 'UUID' 15 | 16 | def test__extract_id(graph, uuid_str, uuid_obj): 17 | assert graph._extract_id(uuid_str) == uuid_obj 18 | 19 | def test_add_event(graph): 20 | t = time.time() 21 | e = sn.Event(t, "Generic Event", {"type": "test"}) 22 | graph.add_event(e.timecode, e.name, e.attributes) 23 | in_timeline = graph.timeline[0] 24 | assert e.timecode == in_timeline.timecode 25 | assert e.name == in_timeline.name 26 | assert e.attributes == in_timeline.attributes 27 | 28 | def test_add_node(graph): 29 | a = graph.add_node({"type": "A"}) 30 | 31 | nodes = graph.get_nodes() 32 | assert a in nodes 33 | 34 | node = graph.get_node_attributes(a) 35 | assert "type" in node 36 | assert node["type"] == "A" 37 | 38 | # adding a node with reserved attributes should throw an exception 39 | with pytest.raises(sn.ReservedAttributeException): 40 | graph.add_node({"id": 0}) 41 | 42 | def test_add_nodes(): 43 | dg = sn.DiGraph() 44 | nodes = { 45 | uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'): { 46 | "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 47 | 'type': 'B' 48 | }, 49 | uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'): { 50 | "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 51 | 'type': 'A' 52 | }, 53 | uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'): { 54 | "id": uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 55 | 'type': 'C' 56 | } 57 | } 58 | dg.add_nodes(nodes) 59 | assert dg.get_nodes() == nodes 60 | 61 | dg = sn.DiGraph() 62 | nodes = [ {'type': 'A'}, {'type': 'B'}, {'type': 'C'} ] 63 | ids = dg.add_nodes(nodes) 64 | 65 | dg_nodes = dg.get_nodes() 66 | assert dg_nodes[ids[0]]['type'] == 'A' 67 | assert dg_nodes[ids[1]]['type'] == 'B' 68 | assert dg_nodes[ids[2]]['type'] == 'C' 69 | 70 | def test_has_node(populated_graph): 71 | assert populated_graph.has_node('3caaa8c09148493dbdf02c574b95526c') 72 | assert populated_graph.has_node('2cdfebf3bf9547f19f0412ccdfbe03b7') 73 | assert populated_graph.has_node('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 74 | 75 | def test_get_node(populated_graph): 76 | assert ( 77 | populated_graph.get_node('3caaa8c09148493dbdf02c574b95526c') == 78 | { 79 | "type": "A", 80 | "id": uuid.UUID('3caaa8c09148493dbdf02c574b95526c') 81 | } 82 | ) 83 | 84 | def test_get_or_add_node(): 85 | dg = sn.DiGraph() 86 | key = 'test_key' 87 | 88 | # node is not in before we add it 89 | with pytest.raises(KeyError): 90 | dg.get_node('test_key') 91 | 92 | # node is in after we add it 93 | dg.get_or_add_node(key, data={"type": "test"}) 94 | 95 | correct_node = { 96 | "id": key, 97 | "type": "test" 98 | } 99 | test_node = dg.get_node(key) 100 | assert test_node == correct_node 101 | 102 | # if we call again with the same key, it does not add a node, it returns it 103 | # Will also discard the argument data 104 | dg.get_or_add_node(key, data={"type": "test2"}) 105 | correct_nodes = { 106 | key: { 107 | "id": key, 108 | "type": "test" # not test2 109 | } 110 | } 111 | assert dg.get_nodes() == correct_nodes 112 | 113 | def test_get_nodes(populated_graph): 114 | output = { 115 | uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'): { 116 | "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 117 | 'type': 'B' 118 | }, 119 | uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'): { 120 | "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 121 | 'type': 'A' 122 | }, 123 | uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'): { 124 | "id": 
uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 125 | 'type': 'C' 126 | } 127 | } 128 | assert populated_graph.get_nodes() == output 129 | 130 | def test_get_node_ids(populated_graph): 131 | correct_output = [ 132 | uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 133 | uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 134 | uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2') 135 | ] 136 | assert populated_graph.get_node_ids() == correct_output 137 | 138 | def test_get_node_attribute(populated_graph): 139 | assert populated_graph.get_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'type') == 'A' 140 | assert populated_graph.get_node_attribute('2cdfebf3bf9547f19f0412ccdfbe03b7', 'type') == 'B' 141 | assert populated_graph.get_node_attribute('3cd197c2cf5e42dc9ccd0c2adcaf4bc2', 'type') == 'C' 142 | 143 | with pytest.raises(sn.GraphException): 144 | populated_graph.get_node_attribute('3caaa8c09148493dbdf02c57deadbeef', 'type') 145 | 146 | def test_get_node_attributes(populated_graph): 147 | assert ( populated_graph.get_node_attributes('3caaa8c09148493dbdf02c574b95526c') == 148 | { 149 | "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 150 | 'type': 'A' 151 | } 152 | ) 153 | with pytest.raises(sn.GraphException): 154 | populated_graph.get_node_attributes('3caaa8c09148493dbdf02c57deadbeef') 155 | 156 | def test_neighbors(populated_digraph): 157 | neighbors = populated_digraph.neighbors(uuid.UUID( '3caaa8c09148493dbdf02c574b95526c') ) 158 | correct_neighbors = { 159 | uuid.UUID( '2cdfebf3bf9547f19f0412ccdfbe03b7' ): { 160 | 'id': uuid.UUID( '2cdfebf3bf9547f19f0412ccdfbe03b7' ), 161 | 'type': 'B', 162 | }, 163 | uuid.UUID( '3cd197c2cf5e42dc9ccd0c2adcaf4bc2' ): { 164 | 'id': uuid.UUID( '3cd197c2cf5e42dc9ccd0c2adcaf4bc2' ), 165 | 'type': 'C', 166 | } 167 | } 168 | assert neighbors == correct_neighbors 169 | 170 | def test_predecessors(populated_digraph): 171 | predecessors = populated_digraph.predecessors(uuid.UUID( '3cd197c2cf5e42dc9ccd0c2adcaf4bc2') ) 172 | correct_predecessors = { 173 | uuid.UUID( '2cdfebf3bf9547f19f0412ccdfbe03b7' ): { 174 | 'id': uuid.UUID( '2cdfebf3bf9547f19f0412ccdfbe03b7' ), 175 | 'type': 'B', 176 | }, 177 | uuid.UUID( '3caaa8c09148493dbdf02c574b95526c' ): { 178 | 'id': uuid.UUID( '3caaa8c09148493dbdf02c574b95526c' ), 179 | 'type': 'A', 180 | } 181 | } 182 | assert predecessors == correct_predecessors 183 | 184 | def test_set_node_attribute(populated_graph): 185 | populated_graph.set_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'depth', 5) 186 | assert populated_graph.get_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'depth') == 5 187 | 188 | populated_graph.set_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'type', 'D') 189 | assert populated_graph.get_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'type') != 'A' 190 | assert populated_graph.get_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'type') == 'D' 191 | 192 | # set non-existant edge 193 | with pytest.raises(sn.GraphException): 194 | populated_graph.set_node_attribute('3caaa8c09148493dbdf02c57deadbeef', 'depht', 5) 195 | 196 | # set reserved attribute 197 | with pytest.raises(sn.ReservedAttributeException): 198 | populated_graph.set_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'id', 199 | '3caaa8c09148493dbdf02c57deadbeef') 200 | 201 | def test_add_edge(graph): 202 | a = graph.add_node({"type": "A"}) 203 | b = graph.add_node({"type": "B"}) 204 | e = graph.add_edge(a, b, {"type": "normal"}) 205 | 206 | assert len(graph.get_edges()) != 0 207 | assert e in graph.get_edges() 
208 | 209 | attrs = graph.get_edge_attributes(e) 210 | assert "type" in attrs 211 | assert attrs["type"] == "normal" 212 | 213 | # use a non-existant node. Should throw an exception 214 | with pytest.raises(sn.GraphException): 215 | graph.add_edge(a, '3caaa8c09148493dbdf02c57deadbeef') 216 | 217 | # adding an edge with reserved attributes should throw an exception 218 | with pytest.raises(sn.ReservedAttributeException): 219 | graph.add_edge(a, b, {"id": 0}) 220 | 221 | with pytest.raises(sn.ReservedAttributeException): 222 | graph.add_edge(a, b, {"src": a}) 223 | 224 | with pytest.raises(sn.ReservedAttributeException): 225 | graph.add_edge(a, b, {"dst": b}) 226 | 227 | with pytest.raises(sn.ReservedAttributeException): 228 | graph.add_edge(a, b, {"src": a, "dst": b, "id": 0}) 229 | 230 | def test_add_edges(populated_digraph): 231 | g = sn.DiGraph() 232 | g.add_nodes(populated_digraph.get_nodes()) 233 | edges = [ 234 | ( 235 | '3caaa8c09148493dbdf02c574b95526c', 236 | '2cdfebf3bf9547f19f0412ccdfbe03b7', 237 | {"type": "normal"}, 238 | '5f5f44ec7c0144e29c5b7d513f92d9ab' 239 | ), 240 | ( 241 | '2cdfebf3bf9547f19f0412ccdfbe03b7', 242 | '3caaa8c09148493dbdf02c574b95526c', 243 | {"type": "normal"}, 244 | 'f3674fcc691848ebbd478b1bfb3e84c3' 245 | ), 246 | ( 247 | '3caaa8c09148493dbdf02c574b95526c', 248 | '3cd197c2cf5e42dc9ccd0c2adcaf4bc2', 249 | {"type": "normal"}, 250 | '7eb91be54d3746b89a61a282bcc207bb' 251 | ), 252 | ( 253 | '2cdfebf3bf9547f19f0412ccdfbe03b7', 254 | '3cd197c2cf5e42dc9ccd0c2adcaf4bc2', 255 | # include some dummy reserved attributes 256 | # to make sure they correctly get removed 257 | { 258 | "type": "irregular", "id": "foo", 259 | "src": "bar", "dst": "baz" 260 | }, 261 | 'c172a3599b7d4ef3bbb688277276b763' 262 | ), 263 | # (b, c). leave out id to test that mixing tuple variations works 264 | ( 265 | '3cd197c2cf5e42dc9ccd0c2adcaf4bc2', 266 | '2cdfebf3bf9547f19f0412ccdfbe03b7', 267 | {"type": "irregular"} 268 | ) 269 | ] 270 | g.add_edges(edges) 271 | gedges = g.get_edges() 272 | # make sure all the edges from populated_digraph are present 273 | for eid, attrs in populated_digraph.get_edges().iteritems(): 274 | assert eid in gedges 275 | assert gedges[eid] == attrs 276 | 277 | # make sure the last edge in the list was added 278 | assert g.get_edges_between('3cd197c2cf5e42dc9ccd0c2adcaf4bc2', '2cdfebf3bf9547f19f0412ccdfbe03b7') 279 | 280 | g = sn.DiGraph() 281 | g.add_node({"type": "A"}, '3caaa8c09148493dbdf02c574b95526c') 282 | g.add_node({"type": "B"}, '2cdfebf3bf9547f19f0412ccdfbe03b7') 283 | g.add_node({"type": "C"}, '3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 284 | edges = populated_digraph.get_edges() 285 | g.add_edges(edges) 286 | assert g.get_edges() == populated_digraph.get_edges() 287 | 288 | def test_get_edge(populated_graph): 289 | assert ( populated_graph.get_edge('7eb91be5-4d37-46b8-9a61-a282bcc207bb') == 290 | { 291 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 292 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 293 | 'type': 'normal', 294 | 'id': uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb') 295 | } 296 | ) 297 | 298 | with pytest.raises(sn.GraphException): 299 | populated_graph.get_edge('7eb91be5-4d37-46b8-9a61-a282deadbeef') 300 | 301 | def test_has_edge(populated_graph): 302 | assert populated_graph.has_edge('5f5f44ec7c0144e29c5b7d513f92d9ab') 303 | assert populated_graph.has_edge('7eb91be54d3746b89a61a282bcc207bb') 304 | assert populated_graph.has_edge('c172a3599b7d4ef3bbb688277276b763') 305 | 306 | def 
test_has_edge_between(populated_graph): 307 | assert populated_graph.has_edge_between('3caaa8c09148493dbdf02c574b95526c', '2cdfebf3bf9547f19f0412ccdfbe03b7') 308 | assert populated_graph.has_edge_between('3caaa8c09148493dbdf02c574b95526c', '3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 309 | assert populated_graph.has_edge_between('2cdfebf3bf9547f19f0412ccdfbe03b7', '3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 310 | 311 | def test_get_edges_between(populated_graph): 312 | populated_graph.add_node(id_='261b076580434c299361f4a3c05db55d') 313 | populated_graph.add_edge('3caaa8c09148493dbdf02c574b95526c', '2cdfebf3bf9547f19f0412ccdfbe03b7', 314 | {"type": "irregular"}, '9ad0b719d681459584f7e2c962910526') 315 | 316 | edges_a_b = populated_graph.get_edges_between('3caaa8c09148493dbdf02c574b95526c', '2cdfebf3bf9547f19f0412ccdfbe03b7') 317 | correct = { 318 | uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'): { 319 | "src": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 320 | "dst": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 321 | "type": "normal", 322 | "id": uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab') 323 | }, 324 | uuid.UUID('9ad0b719d681459584f7e2c962910526'): { 325 | "src": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 326 | "dst": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 327 | "type": "irregular", 328 | "id": uuid.UUID('9ad0b719d681459584f7e2c962910526') 329 | } 330 | } 331 | assert edges_a_b == correct 332 | 333 | assert ( 334 | not populated_graph.get_edges_between('3caaa8c09148493dbdf02c574b95526c', 335 | '261b076580434c299361f4a3c05db55d') 336 | ) 337 | 338 | def test_get_edges_between_digraph(populated_digraph): 339 | edges_a_b = populated_digraph.get_edges_between('3caaa8c09148493dbdf02c574b95526c', 340 | '2cdfebf3bf9547f19f0412ccdfbe03b7') 341 | 342 | # for digraphs, should return all edges in both directions 343 | correct_edges = { 344 | uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'): { 345 | "id": uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'), 346 | "src": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 347 | "dst": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 348 | "type": "normal" 349 | }, 350 | uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'): { 351 | "id": uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'), 352 | "src": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 353 | "dst": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 354 | "type": "normal" 355 | } 356 | } 357 | assert edges_a_b == correct_edges 358 | 359 | def test_get_edges(populated_graph): 360 | output = { 361 | uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb'): { 362 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 363 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 364 | 'type': 'normal', 365 | 'id': uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb') 366 | }, 367 | uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab'): { 368 | 'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 369 | 'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 370 | 'type': 'normal', 371 | 'id': uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab') 372 | }, 373 | uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763'): { 374 | 'src': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 375 | 'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 376 | 'type': 'irregular', 377 | 'id': uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763') 378 | } 379 | } 380 | assert populated_graph.get_edges() == output 381 | 382 | def test_get_edge_ids(populated_graph): 383 | correct_edge_ids = [ 384 | uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb'), 385 | 
uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab'), 386 | uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763') 387 | ] 388 | assert populated_graph.get_edge_ids() == correct_edge_ids 389 | 390 | def test_get_node(populated_graph): 391 | assert ( 392 | populated_graph.get_node('3caaa8c09148493dbdf02c574b95526c') == 393 | { 394 | "type": "A", 395 | "id": uuid.UUID('3caaa8c09148493dbdf02c574b95526c') 396 | } 397 | ) 398 | 399 | def test_get_edge_attribute(populated_graph): 400 | assert populated_graph.get_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'type') == 'normal' 401 | 402 | # get nonexistent edge attribute on a valid edge 403 | assert populated_graph.get_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'weight') == None 404 | 405 | # nonexistent id 406 | with pytest.raises(sn.GraphException): 407 | populated_graph.get_edge_attribute('5f5f44ec7c0144e29c5b7d51deadbeef', 'type') 408 | 409 | def test_get_edge_attributes(populated_graph): 410 | assert ( populated_graph.get_edge_attributes('5f5f44ec7c0144e29c5b7d513f92d9ab') == 411 | { 412 | "type": "normal", 413 | "src": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 414 | "dst": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 415 | "id": uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab') 416 | } 417 | ) 418 | 419 | # nonexistent id 420 | with pytest.raises(sn.GraphException): 421 | populated_graph.get_edge_attributes('5f5f44ec7c0144e29c5b7d51deadbeef') 422 | 423 | def test_set_edge_attribute(populated_graph): 424 | populated_graph.set_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'weight', 5) 425 | assert populated_graph.get_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'weight') == 5 426 | 427 | populated_graph.set_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'type', 'irregular') 428 | assert populated_graph.get_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'type') != 'normal' 429 | assert populated_graph.get_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'type') == 'irregular' 430 | 431 | # nonexistent id 432 | with pytest.raises(sn.GraphException): 433 | populated_graph.set_edge_attribute('5f5f44ec7c0144e29c5b7d51deadbeef', 'weight', 5) 434 | 435 | # set reserved attribute 436 | with pytest.raises(sn.ReservedAttributeException): 437 | populated_graph.set_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'id', 438 | '5f5f44ec7c0144e29c5b7d51deadbeef') 439 | 440 | with pytest.raises(sn.ReservedAttributeException): 441 | populated_graph.set_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'src', 442 | '3caaa8c09148493dbdf02c57deadbeef') 443 | 444 | with pytest.raises(sn.ReservedAttributeException): 445 | populated_graph.set_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'dst', 446 | '3caaa8c09148493dbdf02c57deadbeef') 447 | 448 | def test_remove_edge(populated_graph): 449 | populated_graph.remove_edge('5f5f44ec7c0144e29c5b7d513f92d9ab') 450 | assert uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab') not in populated_graph.get_edges() 451 | with pytest.raises(sn.GraphException): 452 | populated_graph.remove_edge('5f5f44ec7c0144e29c5b7d51deadbeef') 453 | 454 | def test_remove_edges(populated_digraph): 455 | # remove the edges (a, b) and (a, c) 456 | populated_digraph.remove_edges(['5f5f44ec7c0144e29c5b7d513f92d9ab', '7eb91be54d3746b89a61a282bcc207bb']) 457 | 458 | correct_edges = { 459 | uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'): { 460 | "id": uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'), 461 | "src": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 462 | "dst": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 463 | 
"type": "normal" 464 | }, 465 | uuid.UUID('c172a3599b7d4ef3bbb688277276b763'): { 466 | "id": uuid.UUID('c172a3599b7d4ef3bbb688277276b763'), 467 | "src": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 468 | "dst": uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 469 | "type": "irregular" 470 | } 471 | } 472 | assert populated_digraph.get_edges() == correct_edges 473 | 474 | def test_remove_node(populated_graph): 475 | node_a_id = uuid.UUID('3caaa8c09148493dbdf02c574b95526c') 476 | node_b_id = uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7') 477 | node_c_id = uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 478 | 479 | edge_a_b_id = uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab') 480 | edge_a_c_id = uuid.UUID('7eb91be54d3746b89a61a282bcc207bb') 481 | edge_b_c_id = uuid.UUID('c172a3599b7d4ef3bbb688277276b763') 482 | 483 | populated_graph.remove_node('3caaa8c09148493dbdf02c574b95526c') 484 | 485 | # make sure a is gone, and b and c are not 486 | assert node_a_id not in populated_graph.get_nodes() 487 | assert node_b_id in populated_graph.get_nodes() 488 | assert node_c_id in populated_graph.get_nodes() 489 | 490 | # make sure the edges from a to b and a to c are gone 491 | # but the edge from b to c is not 492 | edges = populated_graph.get_edges() 493 | assert edge_a_b_id not in edges 494 | assert edge_a_c_id not in edges 495 | assert edge_b_c_id in edges 496 | 497 | with pytest.raises(sn.GraphException): 498 | populated_graph.remove_node('3caaa8c09148493dbdf02c57deadbeef') 499 | 500 | def test_remove_nodes(populated_graph): 501 | # remove A and B 502 | populated_graph.remove_nodes(['3caaa8c09148493dbdf02c574b95526c', '2cdfebf3bf9547f19f0412ccdfbe03b7']) 503 | 504 | # only C should remain 505 | correct_nodes = { 506 | uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'): { 507 | "type": "C", 508 | "id": uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 509 | } 510 | } 511 | assert populated_graph.get_nodes() == correct_nodes 512 | 513 | def test_remove_digraph_node(populated_digraph): 514 | node_a_id = uuid.UUID('3caaa8c09148493dbdf02c574b95526c') 515 | node_b_id = uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7') 516 | node_c_id = uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 517 | 518 | edge_a_b_id = uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab') 519 | edge_b_a_id = uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3') 520 | edge_a_c_id = uuid.UUID('7eb91be54d3746b89a61a282bcc207bb') 521 | edge_b_c_id = uuid.UUID('c172a3599b7d4ef3bbb688277276b763') 522 | 523 | populated_digraph.remove_node('3caaa8c09148493dbdf02c574b95526c') 524 | 525 | # make sure a is gone, and b and c are not 526 | assert node_a_id not in populated_digraph.get_nodes() 527 | assert node_b_id in populated_digraph.get_nodes() 528 | assert node_c_id in populated_digraph.get_nodes() 529 | 530 | # make sure edges (a,b), (b,a), (a,c) are gone but (b,c) is not 531 | edges = populated_digraph.get_edges() 532 | assert edge_a_b_id not in edges 533 | assert edge_b_a_id not in edges 534 | assert edge_a_c_id not in edges 535 | assert edge_b_c_id in edges 536 | 537 | def test_save_json(fixture_dir, test_output, correct_output): 538 | assert test_output["timeline"] == correct_output["timeline"] 539 | assert test_output["meta"] == correct_output["meta"] 540 | 541 | for node in test_output["nodes"]: 542 | assert node in correct_output["nodes"] 543 | 544 | for edge in test_output["edges"]: 545 | # for an undirected edge, reversing src and dst is valid 546 | try: 547 | assert edge in correct_output["edges"] 548 | except AssertionError: 549 | edge["src"], edge["dst"] = 
edge["dst"], edge["src"] 550 | assert edge in correct_output["edges"] 551 | 552 | os.remove(os.path.join(fixture_dir, "test_output.json")) 553 | 554 | def test_save_json_plaintext(test_output_plaintext, test_output_plaintext_correct): 555 | assert test_output_plaintext["timeline"] == test_output_plaintext_correct["timeline"] 556 | assert test_output_plaintext["meta"] == test_output_plaintext_correct["meta"] 557 | 558 | for node in test_output_plaintext["nodes"]: 559 | assert node in test_output_plaintext_correct["nodes"] 560 | 561 | for edge in test_output_plaintext["edges"]: 562 | # for an undirected edge, reversing src and dst is valid 563 | try: 564 | assert edge in test_output_plaintext_correct["edges"] 565 | except AssertionError: 566 | edge["src"], edge["dst"] = edge["dst"], edge["src"] 567 | assert edge in test_output_plaintext_correct["edges"] 568 | 569 | def test_load_json(correct_output_graph): 570 | nodes = { 571 | uuid.UUID('6cf546f71efe47578f7a1400871ef6b8'): { 572 | 'id': uuid.UUID('6cf546f71efe47578f7a1400871ef6b8'), 573 | 'label': 'A' 574 | }, 575 | uuid.UUID('bcb388bb24a74d978fa2006ed278b2fe'): { 576 | 'id': uuid.UUID('bcb388bb24a74d978fa2006ed278b2fe'), 577 | 'label': 'B' 578 | }, 579 | uuid.UUID('d6523f4f9d5240d2a92e341f4ca00a78'): { 580 | 'id': uuid.UUID('d6523f4f9d5240d2a92e341f4ca00a78'), 581 | 'label': 'C' 582 | } 583 | } 584 | 585 | assert correct_output_graph.get_nodes() == nodes 586 | 587 | edges = { 588 | uuid.UUID('081369f6197b467abe97b3efe8cc4640'): { 589 | 'src': uuid.UUID('bcb388bb24a74d978fa2006ed278b2fe'), 590 | 'dst': uuid.UUID('d6523f4f9d5240d2a92e341f4ca00a78'), 591 | 'type': 'owns', 592 | 'id': uuid.UUID('081369f6197b467abe97b3efe8cc4640') 593 | }, 594 | uuid.UUID('b3a245098d5d482f893c6d63606c7e91'): { 595 | 'src': uuid.UUID('d6523f4f9d5240d2a92e341f4ca00a78'), 596 | 'dst': uuid.UUID('6cf546f71efe47578f7a1400871ef6b8'), 597 | 'type': 'has', 598 | 'id': uuid.UUID('b3a245098d5d482f893c6d63606c7e91') 599 | }, 600 | uuid.UUID('ff8a8a8093cf436aa3b0127c71ddc11d'): { 601 | 'src': uuid.UUID('6cf546f71efe47578f7a1400871ef6b8'), 602 | 'dst': uuid.UUID('bcb388bb24a74d978fa2006ed278b2fe'), 603 | 'type': 'belongs', 604 | 'id': uuid.UUID('ff8a8a8093cf436aa3b0127c71ddc11d') 605 | } 606 | } 607 | 608 | assert correct_output_graph.get_edges() == edges 609 | 610 | def test_load_json_with_object(correct_output): 611 | g = sn.Graph() 612 | g.load_json(correct_output) # load graph with json object, instead of string 613 | test_load_json(g) 614 | 615 | def test_load_json_plaintext(correct_output_graph_plaintext_from_file): 616 | nodes = { 617 | 'a': { 618 | 'id': 'a', 619 | 'label': 'A' 620 | }, 621 | 'b': { 622 | 'id': 'b', 623 | 'label': 'B' 624 | }, 625 | 'c': { 626 | 'id': 'c', 627 | 'label': 'C' 628 | } 629 | } 630 | 631 | assert correct_output_graph_plaintext_from_file.get_nodes() == nodes 632 | 633 | edges = { 634 | 'owns': { 635 | 'src': 'b', 636 | 'dst': 'c', 637 | 'type': 'owns', 638 | 'id': 'owns' 639 | }, 640 | 'has': { 641 | 'src': 'c', 642 | 'dst': 'a', 643 | 'type': 'has', 644 | 'id': 'has' 645 | }, 646 | 'belongs': { 647 | 'src': 'a', 648 | 'dst': 'b', 649 | 'type': 'belongs', 650 | 'id': 'belongs' 651 | } 652 | } 653 | 654 | assert correct_output_graph_plaintext_from_file.get_edges() == edges 655 | 656 | def test_networkx_graph(populated_graph): 657 | nx_graph = populated_graph.networkx_graph() 658 | 659 | # make sure all edges and nodes are the same 660 | for id_, attr in populated_graph.get_edges().iteritems(): 661 | assert 
nx_graph.edge[attr["src"]][attr["dst"]][id_] == attr 662 | 663 |     for id_, attr in populated_graph.get_nodes().iteritems(): 664 |         assert nx_graph.node[id_] == attr 665 | 666 | 667 |     # but that it is not the same object 668 |     assert nx_graph is not populated_graph._g 669 | 670 | def test_load_networkx_graph(netx_graph): 671 |     graph = sn.Graph() 672 |     graph.load_networkx_graph(netx_graph) 673 | 674 |     correct_nodes = { 675 |         0: { 676 |             "type": "A", 677 |             "id": 0 678 |         }, 679 |         1: { 680 |             "type": "B", 681 |             "id": 1 682 |         }, 683 |         2: { 684 |             "type": "C", 685 |             "id": 2 686 |         }, 687 |     } 688 |     assert graph.get_nodes() == correct_nodes 689 | 690 |     correct_edges = { 691 |         0: { 692 |             "src": 0, 693 |             "dst": 1, 694 |             "type": "normal", 695 |             "id": 0 696 |         }, 697 |         1: { 698 |             "src": 0, 699 |             "dst": 2, 700 |             "type": "normal", 701 |             "id": 1 702 |         }, 703 |         2: { 704 |             "src": 1, 705 |             "dst": 2, 706 |             "type": "irregular", 707 |             "id": 2 708 |         }, 709 |     } 710 |     assert graph.get_edges() == correct_edges 711 | 712 | def test_cache_by_empty(graph): 713 |     graph.cache_nodes_by("type") 714 |     graph.add_node({"type": "A"}, '8a09b47f77284348878c745741a326aa') 715 |     cache = graph.get_nodes_by_attr("type", "A", nosingleton=True) 716 |     assert ( 717 |         cache == 718 |         { 719 |             "id": uuid.UUID('8a09b47f77284348878c745741a326aa'), 720 |             "type": "A" 721 |         } 722 |     ) 723 | 724 | def test_get_nodes_by_attr(populated_graph): 725 |     populated_graph.add_node({"type": "A"}, '2b673235a0b94935ab8b6b9de178d341') 726 | 727 |     # a non-existent attr should return an empty dict 728 |     assert populated_graph.get_nodes_by_attr("label") == {} 729 | 730 |     # cache by the attribute "type" 731 |     populated_graph.cache_nodes_by("type") 732 | 733 |     ### get all nodes with attr "type" 734 |     input_ = populated_graph.get_nodes_by_attr("type") 735 |     output = { 736 |         "B": [{ 737 |             "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 738 |             'type': 'B' 739 |         }], 740 |         "A": [ 741 |             { 742 |                 "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 743 |                 "type": "A" 744 |             }, 745 |             { 746 |                 "id": uuid.UUID('2b673235a0b94935ab8b6b9de178d341'), 747 |                 "type": "A" 748 |             } 749 |         ], 750 |         "C": [{ 751 |             "id": uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 752 |             'type': 'C' 753 |         }] 754 |     } 755 | 756 |     assert input_ == output 757 | 758 |     ### get all nodes of "type" "B" 759 |     input_ = populated_graph.get_nodes_by_attr("type", "B") 760 |     output = [{ 761 |         "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 762 |         'type': 'B' 763 |     }] 764 | 765 |     assert input_ == output 766 | 767 |     ### if user specifies 'nosingleton=True' and only one node matches, return the bare node rather than a singleton list 768 |     input_ = populated_graph.get_nodes_by_attr("type", "B", nosingleton=True) 769 |     output = { 770 |         "id": uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 771 |         'type': 'B' 772 |     } 773 | 774 |     assert input_ == output 775 | 776 |     ### get all nodes of "type" "A" 777 |     input_ = populated_graph.get_nodes_by_attr("type", "A") 778 |     output = [ 779 |         { 780 |             "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 781 |             "type": "A" 782 |         }, 783 |         { 784 |             "id": uuid.UUID('2b673235a0b94935ab8b6b9de178d341'), 785 |             "type": "A" 786 |         } 787 |     ] 788 | 789 |     assert input_ == output 790 | 791 |     ### if user specifies 'nosingleton=True', but there is more than one, 792 |     ### should still return the same list, having no effect on the output 793 |     input_ = populated_graph.get_nodes_by_attr("type", "A", nosingleton=True) 794 |     output = [ 795 |         { 796 |             "id": uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 797 |             "type": "A" 798 |         }, 799 |         { 
800 | "id": uuid.UUID('2b673235a0b94935ab8b6b9de178d341'), 801 | "type": "A" 802 | } 803 | ] 804 | 805 | assert input_ == output 806 | 807 | ### if the attr is in the cache, but the value is not, return [] 808 | assert populated_graph.get_nodes_by_attr("type", "D") == [] 809 | -------------------------------------------------------------------------------- /test/test_operators.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import semanticnet as sn 3 | import uuid 4 | 5 | def test_difference(populated_digraph): 6 | A = populated_digraph 7 | B = populated_digraph.copy() 8 | 9 | # remove node C. Consequently, also removes edges (A, C) and (B, C) 10 | B.remove_node('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 11 | d = B.add_node({"type": "D"}, 'da30015efe3c44dbb0b3b3862cef704a') # add another node D 12 | B.add_edge(d, '3caaa8c09148493dbdf02c574b95526c') # add an edge from D to A 13 | 14 | e = A.add_node({"type": "E"}, 'b1b1c6bbbce74a6fb40ee2486cebef26') # add another node 15 | f = A.add_node({"type": "F"}, '3a668c22b43e4521b3c9f042fb2380c2') # add another node 16 | A.add_edge(e, f, {"type": "irregular"}, 'a216de41cca8412fa4b3f432b5d3b0e4') # add edge between the two new nodes 17 | 18 | C = sn.difference(A, B) # A - B 19 | 20 | correct_nodes = { 21 | uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'): { 22 | "type": "C", 23 | "id": uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 24 | }, 25 | uuid.UUID('b1b1c6bbbce74a6fb40ee2486cebef26'): { 26 | "type": "E", 27 | "id": uuid.UUID('b1b1c6bbbce74a6fb40ee2486cebef26') 28 | }, 29 | uuid.UUID('3a668c22b43e4521b3c9f042fb2380c2'): { 30 | "type": "F", 31 | "id": uuid.UUID('3a668c22b43e4521b3c9f042fb2380c2') 32 | } 33 | } 34 | assert C.get_nodes() == correct_nodes 35 | 36 | correct_edges = { 37 | # e,f 38 | uuid.UUID('a216de41cca8412fa4b3f432b5d3b0e4'): { 39 | "type": "irregular", 40 | "src": uuid.UUID('b1b1c6bbbce74a6fb40ee2486cebef26'), 41 | "dst": uuid.UUID('3a668c22b43e4521b3c9f042fb2380c2'), 42 | "id": uuid.UUID('a216de41cca8412fa4b3f432b5d3b0e4') 43 | } 44 | } 45 | assert C.get_edges() == correct_edges 46 | 47 | def test_difference_custom_lambda(populated_digraph): 48 | # add some attributes 49 | populated_digraph.set_node_attribute('3caaa8c09148493dbdf02c574b95526c', 'depth', 0) 50 | populated_digraph.set_node_attribute('2cdfebf3bf9547f19f0412ccdfbe03b7', 'depth', 0) 51 | populated_digraph.set_node_attribute('3cd197c2cf5e42dc9ccd0c2adcaf4bc2', 'depth', 1) 52 | 53 | # add a node of type D 54 | d = populated_digraph.add_node({'type': 'D', 'depth': 1}, '63cf70d2762043c29eb5e3e958383f4a') 55 | 56 | populated_digraph.set_edge_attribute('5f5f44ec7c0144e29c5b7d513f92d9ab', 'weight', 1) 57 | populated_digraph.set_edge_attribute('f3674fcc691848ebbd478b1bfb3e84c3', 'weight', 2) # (B, A) 58 | populated_digraph.set_edge_attribute('7eb91be54d3746b89a61a282bcc207bb', 'weight', 3) 59 | populated_digraph.set_edge_attribute('c172a3599b7d4ef3bbb688277276b763', 'weight', 5) # (B, C) 60 | 61 | # make a copy to change around 62 | new_populated_digraph = populated_digraph.copy() 63 | 64 | # add an edge between the new node d and node B with weight 8 65 | new_populated_digraph.add_edge(d, '2cdfebf3bf9547f19f0412ccdfbe03b7', {'weight': 8}, 66 | '8ccd176a48284915828e5ac7e13bc43a') 67 | 68 | # remove node C. 
Will remove the edges (B, C) and (A, C) as well 69 |     new_populated_digraph.remove_node('3cd197c2cf5e42dc9ccd0c2adcaf4bc2') 70 |     new_populated_digraph.remove_edge('f3674fcc691848ebbd478b1bfb3e84c3') # remove edge (B, A) 71 | 72 |     # custom lambda that defines membership in the usual way, but only for nodes with 73 |     # a depth greater than 0, and edges with a weight greater than 2 74 |     node_depth_gt_0 = lambda nid, G: sn.node_in(nid, G) and G.get_node_attribute(nid, 'depth') > 0 75 |     edge_weight_gt_2 = lambda eid, G: sn.edge_in(eid, G) and G.get_edge_attribute(eid, 'weight') > 2 76 | 77 |     dg = sn.difference(populated_digraph, new_populated_digraph, node_depth_gt_0, edge_weight_gt_2) 78 | 79 |     # the first two nodes are "not in" the new graph as we have defined it because their 80 |     # depth is not greater than 0 81 |     correct_nodes = { 82 |         uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'): { 83 |             'depth': 0, 84 |             'id': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 85 |             'type': 'B' 86 |         }, 87 |         uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'): { 88 |             'depth': 0, 89 |             'id': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 90 |             'type': 'A' 91 |         }, 92 |         uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'): { 93 |             "id": uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 94 |             "type": "C", 95 |             "depth": 1 96 |         } 97 |     } 98 |     assert dg.get_nodes() == correct_nodes 99 | 100 |     correct_edges = { 101 |         # the new graph DOES have edge (A, B), but its weight is 1, which is not > 2, 102 |         # so by our definition, (A, B) is "not in" the new graph 103 |         uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab'): { 104 |             'dst': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 105 |             'id': uuid.UUID('5f5f44ec-7c01-44e2-9c5b-7d513f92d9ab'), 106 |             'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 107 |             'type': 'normal', 108 |             'weight': 1 109 |         }, 110 |         # the new graph removed edge (B, A) 111 |         uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'): { 112 |             'id': uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'), 113 |             'src': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 114 |             'dst': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 115 |             'weight': 2, 116 |             'type': 'normal' 117 |         }, 118 |         # the new graph removed edge (A, C) 119 |         uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb'): { 120 |             'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 121 |             'id': uuid.UUID('7eb91be5-4d37-46b8-9a61-a282bcc207bb'), 122 |             'src': uuid.UUID('3caaa8c0-9148-493d-bdf0-2c574b95526c'), 123 |             'type': 'normal', 124 |             'weight': 3 125 |         }, 126 |         # new graph removed edge (B, C) 127 |         uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763'): { 128 |             'dst': uuid.UUID('3cd197c2-cf5e-42dc-9ccd-0c2adcaf4bc2'), 129 |             'id': uuid.UUID('c172a359-9b7d-4ef3-bbb6-88277276b763'), 130 |             'src': uuid.UUID('2cdfebf3-bf95-47f1-9f04-12ccdfbe03b7'), 131 |             'type': 'irregular', 132 |             'weight': 5 133 |         } 134 |     } 135 |     assert dg.get_edges() == correct_edges 136 | 137 | def test_intersection(populated_digraph): 138 |     another_digraph = sn.DiGraph() 139 |     a = another_digraph.add_node({"type": "A"}, '3caaa8c09148493dbdf02c574b95526c') 140 |     b = another_digraph.add_node({"type": "B"}, '2cdfebf3bf9547f19f0412ccdfbe03b7') 141 |     d = another_digraph.add_node({"type": "D"}, 'da30015efe3c44dbb0b3b3862cef704a') 142 |     another_digraph.add_edge(a, b, {"type": "normal"}, '5f5f44ec7c0144e29c5b7d513f92d9ab') 143 |     another_digraph.add_edge(b, a, {"type": "normal"}, 'f3674fcc691848ebbd478b1bfb3e84c3') 144 |     another_digraph.add_edge(a, d, {"type": "normal"}, 'f3674fcc691848ebbd478b1bfb3e84c3') 145 | 
another_digraph.add_edge(d, b, {"type": "irregular"}, 'f3674fcc691848ebbd478b1bfb3e84c3') 146 | 147 | I = sn.intersection(populated_digraph, another_digraph) 148 | 149 | correct_nodes = { 150 | uuid.UUID('3caaa8c09148493dbdf02c574b95526c'): { 151 | "type": "A", 152 | "id": uuid.UUID('3caaa8c09148493dbdf02c574b95526c') 153 | }, 154 | uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'): { 155 | "type": "B", 156 | "id": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7') 157 | } 158 | } 159 | assert I.get_nodes() == correct_nodes 160 | 161 | correct_edges = { 162 | # a,b 163 | uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'): { 164 | "type": "normal", 165 | "src": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 166 | "dst": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 167 | "id": uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab') 168 | }, 169 | # b,a 170 | uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'): { 171 | "type": "normal", 172 | "src": uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 173 | "dst": uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 174 | "id": uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3') 175 | } 176 | } 177 | assert I.get_edges() == correct_edges 178 | 179 | def test_union_disjoint(populated_digraph): 180 | g1 = populated_digraph 181 | 182 | g2 = sn.DiGraph() 183 | 184 | a = g2.add_node({"label" : "A"}, 'a') 185 | b = g2.add_node({"label" : "B"}, 'b') 186 | c = g2.add_node({"label" : "C"}, 'c') 187 | 188 | ab = g2.add_edge(a, b, {"type" : "belongs"}, 'belongs') 189 | bc = g2.add_edge(b, c, {"type" : "owns"}, 'owns') 190 | ca = g2.add_edge(c, a, {"type" : "has"}, 'has') 191 | 192 | gu = sn.union(g1, g2) 193 | 194 | correct_nodes = { 195 | uuid.UUID('3caaa8c09148493dbdf02c574b95526c'): { 196 | 'id': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 197 | 'type': 'A' 198 | }, 199 | uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'): { 200 | 'id': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 201 | 'type': 'B' 202 | }, 203 | uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'): { 204 | 'id': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 205 | 'type': 'C' 206 | }, 207 | 'a': { 208 | 'id': 'a', 209 | 'label': 'A' 210 | }, 211 | 'b': { 212 | 'id': 'b', 213 | 'label': 'B' 214 | }, 215 | 'c': { 216 | 'id': 'c', 217 | 'label': 'C' 218 | } 219 | } 220 | assert gu.get_nodes() == correct_nodes 221 | 222 | correct_edges = { 223 | uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'): { 224 | 'id': uuid.UUID('5f5f44ec7c0144e29c5b7d513f92d9ab'), 225 | 'src': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 226 | 'dst': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 227 | 'type': 'normal' 228 | }, 229 | uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'): { 230 | 'id': uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'), 231 | 'src': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 232 | 'dst': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 233 | 'type': 'normal' 234 | }, 235 | uuid.UUID('7eb91be54d3746b89a61a282bcc207bb'): { 236 | 'id': uuid.UUID('7eb91be54d3746b89a61a282bcc207bb'), 237 | 'src': uuid.UUID('3caaa8c09148493dbdf02c574b95526c'), 238 | 'dst': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 239 | 'type': 'normal' 240 | }, 241 | uuid.UUID('c172a3599b7d4ef3bbb688277276b763'): { 242 | 'id': uuid.UUID('c172a3599b7d4ef3bbb688277276b763'), 243 | 'src': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 244 | 'dst': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 245 | 'type': 'irregular' 246 | }, 247 | ab: { 248 | 'id': ab, 249 | 'src': a, 250 | 'dst': b, 251 | 'type': 'belongs' 252 | }, 253 | bc: { 254 | 'id': bc, 255 | 'src': b, 256 | 'dst': c, 
257 |             'type': 'owns' 258 |         }, 259 |         ca: { 260 |             'id': ca, 261 |             'src': c, 262 |             'dst': a, 263 |             'type': 'has' 264 |         } 265 |     } 266 |     assert gu.get_edges() == correct_edges 267 | 268 | def test_union_partial(populated_digraph): 269 |     g1 = populated_digraph 270 |     g2 = sn.DiGraph() 271 |     a = g2.add_node({"type": "A"}, '3caaa8c09148493dbdf02c574b95526c') 272 |     b = g2.add_node({"type": "B"}, '2cdfebf3bf9547f19f0412ccdfbe03b7') 273 |     d = g2.add_node({"type": "D"}) 274 |     ab = g2.add_edge(a, b, {"type": "normal"}, '5f5f44ec7c0144e29c5b7d513f92d9ab') 275 |     bd = g2.add_edge(b, d, {"type": "irregular"}) 276 | 277 |     gu = sn.union(g1, g2) 278 | 279 |     correct_nodes = { 280 |         a: { 281 |             'id': a, 282 |             'type': 'A' 283 |         }, 284 |         b: { 285 |             'id': b, 286 |             'type': 'B' 287 |         }, 288 |         uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'): { 289 |             'id': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 290 |             'type': 'C' 291 |         }, 292 |         d: { 293 |             'id': d, 294 |             'type': 'D' 295 |         } 296 |     } 297 |     assert gu.get_nodes() == correct_nodes 298 | 299 |     correct_edges = { 300 |         ab: { 301 |             'id': ab, 302 |             'src': a, 303 |             'dst': b, 304 |             'type': 'normal' 305 |         }, 306 |         uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'): { 307 |             'id': uuid.UUID('f3674fcc691848ebbd478b1bfb3e84c3'), 308 |             'src': b, 309 |             'dst': a, 310 |             'type': 'normal' 311 |         }, 312 |         uuid.UUID('7eb91be54d3746b89a61a282bcc207bb'): { 313 |             'id': uuid.UUID('7eb91be54d3746b89a61a282bcc207bb'), 314 |             'src': a, 315 |             'dst': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 316 |             'type': 'normal' 317 |         }, 318 |         uuid.UUID('c172a3599b7d4ef3bbb688277276b763'): { 319 |             'id': uuid.UUID('c172a3599b7d4ef3bbb688277276b763'), 320 |             'src': uuid.UUID('2cdfebf3bf9547f19f0412ccdfbe03b7'), 321 |             'dst': uuid.UUID('3cd197c2cf5e42dc9ccd0c2adcaf4bc2'), 322 |             'type': 'irregular' 323 |         }, 324 |         bd: { 325 |             'id': bd, 326 |             'src': b, 327 |             'dst': d, 328 |             'type': 'irregular' 329 |         } 330 |     } 331 |     assert gu.get_edges() == correct_edges 332 | 333 | def test_union_idempotent(populated_digraph): 334 |     g1 = populated_digraph 335 |     g2 = g1.copy() 336 |     gu = sn.union(g1, g2) 337 |     assert gu.get_nodes() == g1.get_nodes() 338 |     assert gu.get_nodes() == g2.get_nodes() 339 |     assert gu.get_edges() == g1.get_edges() 340 |     assert gu.get_edges() == g2.get_edges() 341 | --------------------------------------------------------------------------------
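
For reference, here is a minimal, illustrative sketch of the set operators that `test_operators.py` exercises above. It is not part of the repository: the node and edge IDs (`'a'`, `'ab'`, `'xy'`, ...) are arbitrary plaintext strings chosen for readability, and the expected results in the comments simply restate what the tests above assert about `sn.union`, `sn.intersection`, and `sn.difference`.

```python
import semanticnet as sn

# Two directed graphs that share nodes a, b and the edge (a, b);
# g1 additionally has nodes x, y and the edge (x, y).
g1 = sn.DiGraph()
a = g1.add_node({"type": "A"}, 'a')
b = g1.add_node({"type": "B"}, 'b')
x = g1.add_node({"type": "X"}, 'x')
y = g1.add_node({"type": "Y"}, 'y')
g1.add_edge(a, b, {"type": "normal"}, 'ab')
g1.add_edge(x, y, {"type": "irregular"}, 'xy')

g2 = sn.DiGraph()
g2.add_node({"type": "A"}, 'a')
g2.add_node({"type": "B"}, 'b')
g2.add_edge('a', 'b', {"type": "normal"}, 'ab')

print(sorted(sn.union(g1, g2).get_nodes()))         # expected: ['a', 'b', 'x', 'y']
print(sorted(sn.intersection(g1, g2).get_nodes()))  # expected: ['a', 'b']
print(sorted(sn.difference(g1, g2).get_edges()))    # expected: ['xy'], the edge unique to g1
```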