├── .gitignore ├── .gitmodules ├── LICENSE ├── README.md └── code ├── common ├── Makefile ├── build.py ├── cmd_args.py ├── dnn.py ├── functions │ ├── __init__.py │ └── custom_func.py ├── graph_embedding.py ├── modules │ ├── __init__.py │ └── custom_mod.py ├── src │ ├── custom_kernel.cu │ ├── custom_kernel.h │ ├── my_lib.c │ ├── my_lib.h │ ├── my_lib_cuda.c │ └── my_lib_cuda.h └── test.py ├── data_generator ├── data_util.py ├── gen_er_components.py └── pkl_dump.sh ├── graph_attack ├── collect_rl_results.py ├── dqn.py ├── er_trivial_attack.py ├── genetic_algorithm.py ├── grad_attack.py ├── nstep_replay_mem.py ├── plot_dqn.py ├── plot_dqn.sh ├── q_net.py ├── rl_common.py ├── run_dqn.sh ├── run_ga.sh ├── run_grad.sh └── run_trivial.sh ├── graph_classification ├── er_components.py ├── graph_common.py ├── run_er_components.sh └── test_er_comp.sh ├── node_attack ├── exhaust_attack.py ├── node_attack_common.py ├── node_dqn.py ├── node_genetic.py ├── node_grad_attack.py ├── node_rand_attack.py ├── plot_grad.sh ├── plot_node_grad_attack.py ├── q_net_node.py ├── run_attack.sh ├── run_exhaust.sh ├── run_genetic.sh ├── run_grad.sh └── run_rand.sh └── node_classification ├── gcn.py ├── gcn_modules.py ├── node_utils.py └── run_gcn.sh /.gitignore: -------------------------------------------------------------------------------- 1 | dropbox/ 2 | dropbox 3 | 4 | *.pyc 5 | _ext/ 6 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "pytorch_structure2vec"] 2 | path = pytorch_structure2vec 3 | url = git@github.com:Hanjun-Dai/pytorch_structure2vec 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Dai, Hanjun and Li, Hui and Tian, Tian and Huang, Xin and Wang, Lin and Zhu, Jun and Song, Le 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # graph_adversarial_attack 2 | Adversarial Attack on Graph Structured Data (https://arxiv.org/abs/1806.02371, to appear in ICML 2018). 3 | This repo contains the code, data and results reported in the paper. 4 | 5 | ### 1. 
Download the repo and data 6 | 7 | First clone the repo recursively, since it depends on another repo (https://github.com/Hanjun-Dai/pytorch_structure2vec): 8 | 9 | git clone git@github.com:Hanjun-Dai/graph_adversarial_attack --recursive 10 | 11 | (If you have trouble cloning it due to permission issues, please see [this issue](https://github.com/Hanjun-Dai/graph_adversarial_attack/issues/2).) 12 | 13 | Then download the data from the following Dropbox link: 14 | 15 | https://www.dropbox.com/sh/mu8odkd36x54rl3/AABg8ABiMqwcMEM5qKIY97nla?dl=0 16 | 17 | Put everything under a folder named 'dropbox', or create a symbolic link named 'dropbox': 18 | 19 | ln -s /path/to/your/downloaded/files dropbox 20 | 21 | Finally, the folder structure should look like this: 22 | 23 | graph_adversarial_attack (project root) 24 | |__ README.md 25 | |__ code 26 | |__ pytorch_structure2vec 27 | |__ dropbox 28 | |__ |__ data 29 | | |__ scratch 30 | |...... 31 | 32 | Optionally, you can use the data generator under ``code/data_generator`` to generate the synthetic data yourself. 33 | 34 | ### 2. Install dependencies and build the C++ backend 35 | 36 | The code depends on PyTorch 0.3.1, cffi, and CUDA 9.1. Install them with the following commands (on Linux): 37 | 38 | pip install http://download.pytorch.org/whl/cu91/torch-0.3.1-cp27-cp27mu-linux_x86_64.whl 39 | pip install cffi==1.11.2 40 | 41 | The C++ code needs to be built first: 42 | 43 | cd pytorch_structure2vec/s2v_lib 44 | make 45 | cd code/common 46 | make 47 | 48 | ### 3. Train the graph classification and node classification models (our attack targets) 49 | 50 | If you want to retrain the target model, go to either ``code/graph_classification`` or ``code/node_classification`` and run the script in train mode. For example: 51 | 52 | cd code/graph_classification 53 | ./run_er_components.sh -phase train 54 | 55 | You can also use the pretrained models from the paper, found under ``dropbox/scratch/results``. 56 | 57 | ### 4. Attack the trained models 58 | 59 | In the paper, we present five different attack approaches, covering both graph-level and node-level classification tasks. The code for the attacks can be found under ``code/graph_attack`` and ``code/node_attack``, respectively. 
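Besides the Q-learning example below, the other attacks appear to have analogous launchers. The mapping here is only inferred from the script names in the tree above, and the shared `-phase` flag is an assumption carried over from `run_dqn.sh`:

    cd code/graph_attack
    ./run_grad.sh -phase train     # gradient-based attack
    ./run_ga.sh -phase train       # genetic algorithm
    ./run_trivial.sh -phase train  # random/trivial baseline

The node-level counterparts live in ``code/node_attack`` (e.g. ``run_genetic.sh``, ``run_rand.sh``, and ``run_exhaust.sh`` for exhaustive search).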
60 | 61 | For example, to use Q-learning to attack the graph classification model, do the following: 62 | 63 | cd code/graph_attack 64 | ./run_dqn.sh -phase train 65 | 66 | ### Reference 67 | 68 | @article{dai2018adversarial, 69 | title={Adversarial Attack on Graph Structured Data}, 70 | author={Dai, Hanjun and Li, Hui and Tian, Tian and Huang, Xin and Wang, Lin and Zhu, Jun and Song, Le}, 71 | journal={arXiv preprint arXiv:1806.02371}, 72 | year={2018} 73 | } 74 | 75 | 76 | -------------------------------------------------------------------------------- /code/common/Makefile: -------------------------------------------------------------------------------- 1 | dir_guard = @mkdir -p $(@D) 2 | 3 | #INTEL_ROOT := /opt/intel 4 | MKL_ROOT = $(INTEL_ROOT)/mkl 5 | TBB_ROOT = $(INTEL_ROOT)/tbb 6 | 7 | FIND := find 8 | CXX := g++ 9 | CXXFLAGS += -Wall -O3 -std=c++11 10 | LDFLAGS += -lm -lmkl_rt -ltbb 11 | 12 | #CUDA_HOME := /usr/local/cuda 13 | NVCC := $(CUDA_HOME)/bin/nvcc 14 | NVCCFLAGS += --default-stream per-thread 15 | LDFLAGS += -L$(CUDA_HOME)/lib64 -lcudart -lcublas -lcurand -lcusparse 16 | 17 | CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \ 18 | -gencode arch=compute_35,code=sm_35 \ 19 | -gencode arch=compute_50,code=sm_50 \ 20 | -gencode arch=compute_50,code=compute_50 21 | 22 | build_root = _ext 23 | obj_build_root = $(build_root) 24 | 25 | include_dirs = $(CUDA_HOME)/include $(MKL_ROOT)/include $(TBB_ROOT)/include include 26 | CXXFLAGS += $(addprefix -I,$(include_dirs)) 27 | CXXFLAGS += -fPIC 28 | 29 | NVCCFLAGS += $(addprefix -I,$(include_dirs)) 30 | NVCCFLAGS += -std=c++11 --use_fast_math --compiler-options '-fPIC' 31 | cu_files = $(shell $(FIND) src/ -name "*.cu" -printf "%P\n") 32 | cu_obj_files = $(subst .cu,.o,$(cu_files)) 33 | objs = $(addprefix $(obj_build_root)/,$(cu_obj_files)) 34 | 35 | DEPS = ${objs:.o=.d} 36 | mylib = _ext/my_lib/_my_lib.so 37 | 38 | all: $(objs) $(mylib) 39 | 40 | $(obj_build_root)/%.o: src/%.cu 41 | $(dir_guard) 42 | $(NVCC) $(NVCCFLAGS) $(CUDA_ARCH) -M $< -o ${@:.o=.d} -odir $(@D) 43 | $(NVCC) $(NVCCFLAGS) $(CUDA_ARCH) -c $< -o $@ 44 | 45 | $(mylib): src/*.c src/*.h src/*.cu 46 | python build.py 47 | 48 | clean: 49 | rm -f $(obj_build_root)/*.o 50 | rm -f $(obj_build_root)/*.d 51 | rm -rf _ext 52 | rm -f functions/*.pyc 53 | rm -f modules/*.pyc 54 | -include $(DEPS) 55 | -------------------------------------------------------------------------------- /code/common/build.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | from torch.utils.ffi import create_extension 4 | 5 | this_file = os.path.dirname(__file__) 6 | 7 | sources = ['src/my_lib.c'] 8 | headers = ['src/my_lib.h'] 9 | defines = [] 10 | with_cuda = False 11 | 12 | if torch.cuda.is_available(): 13 | print('Including CUDA code.') 14 | sources += ['src/my_lib_cuda.c'] 15 | headers += ['src/my_lib_cuda.h'] 16 | defines += [('WITH_CUDA', None)] 17 | with_cuda = True 18 | 19 | this_file = os.path.dirname(os.path.realpath(__file__)) 20 | extra_objects = ['_ext/custom_kernel.o'] 21 | extra_objects = [os.path.join(this_file, fname) for fname in extra_objects] 22 | 23 | ffi = create_extension( 24 | '_ext.my_lib', 25 | headers=headers, 26 | sources=sources, 27 | define_macros=defines, 28 | relative_to=__file__, 29 | with_cuda=with_cuda, 30 | extra_objects=extra_objects, 31 | extra_compile_args=['-fopenmp'], 32 | extra_link_args=['-lgomp'] 33 | ) 34 | 35 | if __name__ == '__main__': 36 | ffi.build() 37 | 
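Note that ``build.py`` above relies on the old ``torch.utils.ffi`` extension mechanism (removed in later PyTorch releases), which is why PyTorch 0.3.1 is pinned in the README. Once ``make`` succeeds, the compiled library is importable as ``_ext.my_lib``, which is exactly how ``functions/custom_func.py`` below consumes it. A minimal smoke test of the CPU path, assuming a successful build (the tensor sizes here are illustrative only):

    import torch
    from _ext import my_lib  # produced by `python build.py` via the Makefile

    # Two jagged rows, flattened: entries 0..9 form row 0, entries 10..19 form row 1.
    logits = torch.rand(20)
    prefix_sum = torch.LongTensor([10, 20])  # cumulative row lengths
    output = logits.new()
    my_lib.jagged_log_softmax_forward(logits, prefix_sum, output)
    print(output.view(2, 10).exp().sum(dim=1))  # each row should sum to ~1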
-------------------------------------------------------------------------------- /code/common/cmd_args.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import cPickle as cp 3 | 4 | cmd_opt = argparse.ArgumentParser(description='Argparser for graph adversarial attack') 5 | cmd_opt.add_argument('-data_folder', type=str, default=None, help='data folder') 6 | cmd_opt.add_argument('-saved_model', type=str, default=None, help='saved model') 7 | cmd_opt.add_argument('-save_dir', type=str, default=None, help='save folder') 8 | cmd_opt.add_argument('-ctx', type=str, default='cpu', help='cpu/gpu') 9 | cmd_opt.add_argument('-phase', type=str, default='test', help='train/test') 10 | cmd_opt.add_argument('-logfile', type=str, default=None, help='log') 11 | 12 | cmd_opt.add_argument('-batch_size', type=int, default=50, help='minibatch size') 13 | cmd_opt.add_argument('-seed', type=int, default=1, help='seed') 14 | cmd_opt.add_argument('-min_n', type=int, default=0, help='min #nodes') 15 | cmd_opt.add_argument('-max_n', type=int, default=0, help='max #nodes') 16 | cmd_opt.add_argument('-min_c', type=int, default=0, help='min #components') 17 | cmd_opt.add_argument('-max_c', type=int, default=0, help='max #components') 18 | cmd_opt.add_argument('-er_p', type=float, default=0, help='parameter of er graphs') 19 | cmd_opt.add_argument('-n_graphs', type=int, default=0, help='number of graphs') 20 | cmd_opt.add_argument('-gm', default='mean_field', help='mean_field/loopy_bp/gcn') 21 | cmd_opt.add_argument('-latent_dim', type=int, default=64, help='dimension of latent layers') 22 | cmd_opt.add_argument('-out_dim', type=int, default=0, help='s2v output size') 23 | cmd_opt.add_argument('-hidden', type=int, default=32, help='dimension of classification') 24 | cmd_opt.add_argument('-max_lv', type=int, default=2, help='max rounds of message passing') 25 | 26 | cmd_opt.add_argument('-num_epochs', type=int, default=1000, help='number of epochs') 27 | cmd_opt.add_argument('-learning_rate', type=float, default=0.001, help='init learning_rate') 28 | cmd_opt.add_argument('-weight_decay', type=float, default=5e-4, help='weight_decay') 29 | cmd_opt.add_argument('-dropout', type=float, default=0.5, help='dropout rate') 30 | 31 | # for node classification 32 | cmd_opt.add_argument('-dataset', type=str, default=None, help='citeseer/cora/pubmed') 33 | cmd_opt.add_argument('-feature_dim', type=int, default=None, help='node feature dim') 34 | cmd_opt.add_argument('-num_class', type=int, default=None, help='# classes') 35 | cmd_opt.add_argument('-adj_norm', type=int, default=1, help='normalize the adj or not') 36 | 37 | # for bio graph classification 38 | cmd_opt.add_argument('-feat_dim', type=int, default=0, help='dimension of node feature') 39 | cmd_opt.add_argument('-fold', type=int, default=1, help='fold (1..10)') 40 | 41 | # for attack 42 | cmd_opt.add_argument('-idx_start', type=int, default=None, help='id of graph or node index') 43 | cmd_opt.add_argument('-num_instances', type=int, default=None, help='num of samples for attack, in genetic algorithm') 44 | cmd_opt.add_argument('-num_steps', type=int, default=100000, help='rl training steps') 45 | cmd_opt.add_argument('-targeted', type=int, default=0, help='0/1 target attack or not') 46 | cmd_opt.add_argument('-frac_meta', type=float, default=0, help='fraction for meta rl learning') 47 | cmd_opt.add_argument('-meta_test', type=int, default=0, help='for meta rl learning') 48 | cmd_opt.add_argument('-rand_att_type', type=str, 
default=None, help='random/exhaust') 49 | cmd_opt.add_argument('-reward_type', type=str, default=None, help='binary/nll') 50 | cmd_opt.add_argument('-base_model_dump', type=str, default=None, help='saved base model') 51 | cmd_opt.add_argument('-num_mod', type=int, default=1, help='number of modifications allowed') 52 | 53 | # for genetic algorithm 54 | cmd_opt.add_argument('-population_size', type=int, default=100, help='population size') 55 | cmd_opt.add_argument('-cross_rate', type=float, default=0.1, help='cross_rate') 56 | cmd_opt.add_argument('-mutate_rate', type=float, default=0.2, help='mutate rate') 57 | cmd_opt.add_argument('-rounds', type=int, default=10, help='rounds of evolution') 58 | 59 | # for node attack 60 | cmd_opt.add_argument('-bilin_q', type=int, default=0, help='bilinear q or not') 61 | cmd_opt.add_argument('-mlp_hidden', type=int, default=64, help='mlp hidden layer size') 62 | cmd_opt.add_argument('-n_hops', type=int, default=2, help='attack range') 63 | 64 | # for defence 65 | cmd_opt.add_argument('-del_rate', type=float, default=0, help='rate of deleting edge') 66 | 67 | cmd_args, _ = cmd_opt.parse_known_args() 68 | 69 | print(cmd_args) 70 | 71 | def build_kwargs(keys, arg_dict): 72 | st = '' 73 | for key in keys: 74 | st += '%s-%s' % (key, str(arg_dict[key])) 75 | return st 76 | 77 | def save_args(fout, args): 78 | with open(fout, 'wb') as f: 79 | cp.dump(args, f, cp.HIGHEST_PROTOCOL) -------------------------------------------------------------------------------- /code/common/dnn.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import random 8 | from torch.autograd import Variable 9 | from torch.nn.parameter import Parameter 10 | import torch.nn as nn 11 | import torch.nn.functional as F 12 | import torch.optim as optim 13 | from tqdm import tqdm 14 | import networkx as nx 15 | 16 | from cmd_args import cmd_args 17 | from graph_embedding import EmbedMeanField, EmbedLoopyBP 18 | 19 | sys.path.append('%s/../../pytorch_structure2vec/s2v_lib' % os.path.dirname(os.path.realpath(__file__))) 20 | from pytorch_util import weights_init 21 | 22 | class MLPRegression(nn.Module): 23 | def __init__(self, input_size, hidden_size): 24 | super(MLPRegression, self).__init__() 25 | 26 | self.h1_weights = nn.Linear(input_size, hidden_size) 27 | self.h2_weights = nn.Linear(hidden_size, 1) 28 | 29 | weights_init(self) 30 | 31 | def forward(self, x, y = None): 32 | h1 = self.h1_weights(x) 33 | h1 = F.relu(h1) 34 | 35 | pred = self.h2_weights(h1) 36 | 37 | if y is not None: 38 | y = Variable(y) 39 | mse = F.mse_loss(pred, y) 40 | mae = F.l1_loss(pred, y) 41 | return pred, mae, mse 42 | else: 43 | return pred 44 | 45 | class MLPClassifier(nn.Module): 46 | def __init__(self, input_size, hidden_size, num_class): 47 | super(MLPClassifier, self).__init__() 48 | self.hidden_size = hidden_size 49 | if hidden_size > 0: 50 | self.h1_weights = nn.Linear(input_size, hidden_size) 51 | self.last_weights = nn.Linear(hidden_size, num_class) 52 | else: 53 | self.last_weights = nn.Linear(input_size, num_class) 54 | 55 | weights_init(self) 56 | 57 | def forward(self, x, y = None): 58 | if self.hidden_size: 59 | x = self.h1_weights(x) 60 | x = F.relu(x) 61 | 62 | logits = self.last_weights(x) 63 | logits = F.log_softmax(logits, dim=1) 64 | 65 | if y is not None: 66 | y = Variable(y) 67 | loss = F.nll_loss(logits, y) 68 | 69 | pred = logits.data.max(1, 
keepdim=True)[1] 70 | acc = pred.eq(y.data.view_as(pred)).cpu() 71 | return logits, loss, acc 72 | else: 73 | return logits 74 | 75 | class GraphClassifier(nn.Module): 76 | def __init__(self, label_map, **kwargs): 77 | super(GraphClassifier, self).__init__() 78 | self.label_map = label_map 79 | if kwargs['gm'] == 'mean_field': 80 | model = EmbedMeanField 81 | elif kwargs['gm'] == 'loopy_bp': 82 | model = EmbedLoopyBP 83 | else: 84 | print('unknown gm %s' % kwargs['gm']) 85 | sys.exit() 86 | if 'feat_dim' in kwargs: 87 | self.feat_dim = kwargs['feat_dim'] 88 | else: 89 | self.feat_dim = 0 90 | self.s2v = model(latent_dim=kwargs['latent_dim'], 91 | output_dim=kwargs['out_dim'], 92 | num_node_feats=kwargs['feat_dim'], 93 | num_edge_feats=0, 94 | max_lv=kwargs['max_lv']) 95 | out_dim = kwargs['out_dim'] 96 | if out_dim == 0: 97 | out_dim = kwargs['latent_dim'] 98 | self.mlp = MLPClassifier(input_size=out_dim, hidden_size=kwargs['hidden'], num_class=len(label_map)) 99 | 100 | def PrepareFeatureLabel(self, batch_graph): 101 | labels = torch.LongTensor(len(batch_graph)) 102 | n_nodes = 0 103 | concat_feat = [] 104 | for i in range(len(batch_graph)): 105 | labels[i] = self.label_map[batch_graph[i].label] 106 | n_nodes += batch_graph[i].num_nodes 107 | if batch_graph[i].node_tags is not None: 108 | concat_feat += batch_graph[i].node_tags 109 | if len(concat_feat): 110 | node_feat = torch.zeros(n_nodes, self.feat_dim) 111 | concat_feat = torch.LongTensor(concat_feat).view(-1, 1) 112 | node_feat.scatter_(1, concat_feat, 1) 113 | else: 114 | node_feat = torch.ones(n_nodes, 1) 115 | if cmd_args.ctx == 'gpu': 116 | node_feat = node_feat.cuda() 117 | return node_feat, None, labels 118 | 119 | def forward(self, batch_graph): 120 | node_feat, edge_feat, labels = self.PrepareFeatureLabel(batch_graph) 121 | if cmd_args.ctx == 'gpu': 122 | node_feat = node_feat.cuda() 123 | labels = labels.cuda() 124 | 125 | _, embed = self.s2v(batch_graph, node_feat, edge_feat, pool_global=True) 126 | 127 | return self.mlp(embed, labels) 128 | -------------------------------------------------------------------------------- /code/common/functions/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hanjun-Dai/graph_adversarial_attack/f2aaad73efd142bcc20d5e8c43117e5359f9aa8e/code/common/functions/__init__.py -------------------------------------------------------------------------------- /code/common/functions/custom_func.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.autograd import Function 3 | from _ext import my_lib 4 | import sys 5 | 6 | class JaggedLogSoftmax(Function): 7 | def forward(self, logits, prefix_sum): 8 | self.save_for_backward(prefix_sum) 9 | 10 | assert len(prefix_sum.size()) == 1 11 | output = logits.new() 12 | if not logits.is_cuda: 13 | my_lib.jagged_log_softmax_forward(logits, prefix_sum, output) 14 | else: 15 | my_lib.jagged_log_softmax_forward_cuda(logits, prefix_sum, output) 16 | 17 | self.save_for_backward(prefix_sum, output) 18 | return output 19 | 20 | def backward(self, grad_output): 21 | prefix_sum, output = self.saved_variables 22 | grad_input = grad_output.new() 23 | if not grad_output.is_cuda: 24 | my_lib.jagged_log_softmax_backward(output.data, grad_output, prefix_sum.data, grad_input) 25 | else: 26 | my_lib.jagged_log_softmax_backward_cuda(output.data, grad_output, prefix_sum.data, grad_input) 27 | return grad_input, None 28 | 29 | class 
JaggedArgmax(Function): 30 | def forward(self, values, prefix_sum): 31 | assert len(prefix_sum.size()) == 1 32 | output = prefix_sum.new() 33 | if not values.is_cuda: 34 | my_lib.jagged_argmax_forward(values, prefix_sum, output) 35 | else: 36 | my_lib.jagged_argmax_forward_cuda(values, prefix_sum, output) 37 | 38 | return output 39 | 40 | def backward(self, grad_output): 41 | assert False 42 | 43 | class JaggedMax(Function): 44 | def forward(self, values, prefix_sum): 45 | assert len(prefix_sum.size()) == 1 46 | idxes = prefix_sum.new() 47 | vmax = values.new() 48 | if not values.is_cuda: 49 | my_lib.jagged_max_forward(values, prefix_sum, vmax, idxes) 50 | else: 51 | my_lib.jagged_max_forward_cuda(values, prefix_sum, vmax, idxes) 52 | 53 | return vmax, idxes 54 | 55 | def backward(self, grad_output): 56 | assert False 57 | 58 | def GraphLaplacianNorm(raw_adj): 59 | ones = torch.ones(raw_adj.size()[0], 1) 60 | if raw_adj.is_cuda: 61 | ones = ones.cuda() 62 | norm = torch.mm(raw_adj, ones) ** 0.5 63 | indices = raw_adj._indices() 64 | values = raw_adj._values() 65 | if not values.is_cuda: 66 | my_lib.graph_laplacian_norm(indices, values, norm) 67 | else: 68 | my_lib.graph_laplacian_norm_cuda(indices, values, norm) 69 | 70 | def GraphDegreeNorm(raw_adj): 71 | ones = torch.ones(raw_adj.size()[0], 1) 72 | if raw_adj.is_cuda: 73 | ones = ones.cuda() 74 | norm = torch.mm(raw_adj, ones) 75 | indices = raw_adj._indices() 76 | values = raw_adj._values() 77 | if not values.is_cuda: 78 | my_lib.graph_degree_norm(indices, values, norm) 79 | else: 80 | my_lib.graph_degree_norm_cuda(indices, values, norm) 81 | -------------------------------------------------------------------------------- /code/common/graph_embedding.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import random 8 | from torch.autograd import Variable 9 | from torch.nn.parameter import Parameter 10 | import torch.nn as nn 11 | import torch.nn.functional as F 12 | import torch.optim as optim 13 | from tqdm import tqdm 14 | import networkx as nx 15 | 16 | sys.path.append('%s/../../pytorch_structure2vec/s2v_lib' % os.path.dirname(os.path.realpath(__file__))) 17 | from s2v_lib import S2VLIB 18 | from pytorch_util import weights_init 19 | 20 | class S2VGraph(object): 21 | def __init__(self, g, label, node_tags=None): 22 | self.num_nodes = len(g) 23 | self.node_tags = node_tags 24 | x, y = zip(*g.edges()) 25 | self.num_edges = len(x) 26 | self.label = label 27 | 28 | self.edge_pairs = np.ndarray(shape=(self.num_edges, 2), dtype=np.int32) 29 | self.edge_pairs[:, 0] = x 30 | self.edge_pairs[:, 1] = y 31 | self.edge_pairs = self.edge_pairs.flatten() 32 | 33 | def to_networkx(self): 34 | edges = np.reshape(self.edge_pairs, (self.num_edges, 2)) 35 | g = nx.Graph() 36 | g.add_edges_from(edges) 37 | return g 38 | 39 | class MySpMM(torch.autograd.Function): 40 | 41 | @staticmethod 42 | def forward(ctx, sp_mat, dense_mat): 43 | ctx.save_for_backward(sp_mat, dense_mat) 44 | 45 | return torch.mm(sp_mat, dense_mat) 46 | 47 | @staticmethod 48 | def backward(ctx, grad_output): 49 | sp_mat, dense_mat = ctx.saved_variables 50 | grad_matrix1 = grad_matrix2 = None 51 | 52 | if ctx.needs_input_grad[0]: 53 | grad_matrix1 = Variable(torch.mm(grad_output.data, dense_mat.data.t())) 54 | if ctx.needs_input_grad[1]: 55 | grad_matrix2 = Variable(torch.mm(sp_mat.data.t(), grad_output.data)) 56 | 57 | return 
grad_matrix1, grad_matrix2 58 | 59 | def gnn_spmm(sp_mat, dense_mat): 60 | return MySpMM.apply(sp_mat, dense_mat) 61 | 62 | 63 | class EmbedMeanField(nn.Module): 64 | def __init__(self, latent_dim, output_dim, num_node_feats, num_edge_feats, max_lv = 3): 65 | super(EmbedMeanField, self).__init__() 66 | self.latent_dim = latent_dim 67 | self.output_dim = output_dim 68 | self.num_node_feats = num_node_feats 69 | self.num_edge_feats = num_edge_feats 70 | 71 | self.max_lv = max_lv 72 | 73 | self.w_n2l = nn.Linear(num_node_feats, latent_dim) 74 | if num_edge_feats > 0: 75 | self.w_e2l = nn.Linear(num_edge_feats, latent_dim) 76 | if output_dim > 0: 77 | self.out_params = nn.Linear(latent_dim, output_dim) 78 | 79 | self.conv_params = nn.Linear(latent_dim, latent_dim) 80 | weights_init(self) 81 | 82 | def forward(self, graph_list, node_feat, edge_feat, pool_global=True, n2n_grad=False, e2n_grad=False): 83 | n2n_sp, e2n_sp, subg_sp = S2VLIB.PrepareMeanField(graph_list) 84 | if type(node_feat) is torch.cuda.FloatTensor: 85 | n2n_sp = n2n_sp.cuda() 86 | e2n_sp = e2n_sp.cuda() 87 | subg_sp = subg_sp.cuda() 88 | node_feat = Variable(node_feat) 89 | if edge_feat is not None: 90 | edge_feat = Variable(edge_feat) 91 | n2n_sp = Variable(n2n_sp, requires_grad=n2n_grad) 92 | e2n_sp = Variable(e2n_sp, requires_grad=e2n_grad) 93 | subg_sp = Variable(subg_sp) 94 | 95 | h = self.mean_field(node_feat, edge_feat, n2n_sp, e2n_sp, subg_sp, pool_global) 96 | 97 | if n2n_grad or e2n_grad: 98 | sp_dict = {'n2n' : n2n_sp, 'e2n' : e2n_sp} 99 | return h, sp_dict 100 | else: 101 | return h 102 | 103 | def mean_field(self, node_feat, edge_feat, n2n_sp, e2n_sp, subg_sp, pool_global): 104 | input_node_linear = self.w_n2l(node_feat) 105 | input_message = input_node_linear 106 | if edge_feat is not None: 107 | input_edge_linear = self.w_e2l(edge_feat) 108 | e2npool_input = gnn_spmm(e2n_sp, input_edge_linear) 109 | input_message += e2npool_input 110 | input_potential = F.relu(input_message) 111 | 112 | lv = 0 113 | cur_message_layer = input_potential 114 | while lv < self.max_lv: 115 | n2npool = gnn_spmm(n2n_sp, cur_message_layer) 116 | node_linear = self.conv_params( n2npool ) 117 | merged_linear = node_linear + input_message 118 | 119 | cur_message_layer = F.relu(merged_linear) 120 | lv += 1 121 | if self.output_dim > 0: 122 | out_linear = self.out_params(cur_message_layer) 123 | reluact_fp = F.relu(out_linear) 124 | else: 125 | reluact_fp = cur_message_layer 126 | 127 | if pool_global: 128 | y_potential = gnn_spmm(subg_sp, reluact_fp) 129 | return reluact_fp, F.relu(y_potential) 130 | else: 131 | return reluact_fp 132 | 133 | class EmbedLoopyBP(nn.Module): 134 | def __init__(self, latent_dim, output_dim, num_node_feats, num_edge_feats, max_lv = 3): 135 | super(EmbedLoopyBP, self).__init__() 136 | self.latent_dim = latent_dim 137 | self.max_lv = max_lv 138 | 139 | self.w_n2l = nn.Linear(num_node_feats, latent_dim) 140 | self.w_e2l = nn.Linear(num_edge_feats, latent_dim) 141 | self.out_params = nn.Linear(latent_dim, output_dim) 142 | 143 | self.conv_params = nn.Linear(latent_dim, latent_dim) 144 | weights_init(self) 145 | 146 | def forward(self, graph_list, node_feat, edge_feat): 147 | n2e_sp, e2e_sp, e2n_sp, subg_sp = S2VLIB.PrepareLoopyBP(graph_list) 148 | if type(node_feat) is torch.cuda.FloatTensor: 149 | n2e_sp = n2e_sp.cuda() 150 | e2e_sp = e2e_sp.cuda() 151 | e2n_sp = e2n_sp.cuda() 152 | subg_sp = subg_sp.cuda() 153 | node_feat = Variable(node_feat) 154 | edge_feat = Variable(edge_feat) 155 | n2e_sp = Variable(n2e_sp) 
156 | e2e_sp = Variable(e2e_sp) 157 | e2n_sp = Variable(e2n_sp) 158 | subg_sp = Variable(subg_sp) 159 | 160 | h = self.loopy_bp(node_feat, edge_feat, n2e_sp, e2e_sp, e2n_sp, subg_sp) 161 | 162 | return h 163 | 164 | def loopy_bp(self, node_feat, edge_feat, n2e_sp, e2e_sp, e2n_sp, subg_sp): 165 | input_node_linear = self.w_n2l(node_feat) 166 | input_edge_linear = self.w_e2l(edge_feat) 167 | 168 | n2epool_input = gnn_spmm(n2e_sp, input_node_linear) 169 | 170 | input_message = input_edge_linear + n2epool_input 171 | input_potential = F.relu(input_message) 172 | 173 | lv = 0 174 | cur_message_layer = input_potential 175 | while lv < self.max_lv: 176 | e2epool = gnn_spmm(e2e_sp, cur_message_layer) 177 | edge_linear = self.conv_params(e2epool) 178 | merged_linear = edge_linear + input_message 179 | 180 | cur_message_layer = F.relu(merged_linear) 181 | lv += 1 182 | 183 | e2npool = gnn_spmm(e2n_sp, cur_message_layer) 184 | hidden_msg = F.relu(e2npool) 185 | out_linear = self.out_params(hidden_msg) 186 | reluact_fp = F.relu(out_linear) 187 | 188 | y_potential = gnn_spmm(subg_sp, reluact_fp) 189 | 190 | return F.relu(y_potential) -------------------------------------------------------------------------------- /code/common/modules/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hanjun-Dai/graph_adversarial_attack/f2aaad73efd142bcc20d5e8c43117e5359f9aa8e/code/common/modules/__init__.py -------------------------------------------------------------------------------- /code/common/modules/custom_mod.py: -------------------------------------------------------------------------------- 1 | from torch.nn.modules.module import Module 2 | from functions.custom_func import JaggedLogSoftmax, JaggedArgmax, JaggedMax 3 | import networkx as nx 4 | import numpy as np 5 | 6 | class JaggedLogSoftmaxModule(Module): 7 | def forward(self, logits, prefix_sum): 8 | return JaggedLogSoftmax()(logits, prefix_sum) 9 | 10 | class JaggedArgmaxModule(Module): 11 | def forward(self, values, prefix_sum): 12 | return JaggedArgmax()(values, prefix_sum) 13 | 14 | class JaggedMaxModule(Module): 15 | def forward(self, values, prefix_sum): 16 | return JaggedMax()(values, prefix_sum) 17 | 18 | -------------------------------------------------------------------------------- /code/common/src/custom_kernel.cu: -------------------------------------------------------------------------------- 1 | 2 | #include 3 | #include 4 | #include 5 | 6 | #include "custom_kernel.h" 7 | 8 | struct SharedMem 9 | { 10 | __device__ double *getPointer() { 11 | extern __shared__ double s_double[]; 12 | return s_double; 13 | } 14 | }; 15 | 16 | 17 | struct MaxFloat 18 | { 19 | __device__ __forceinline__ double operator()(double max, float v) const { 20 | return max > static_cast(v) ? max : static_cast(v); 21 | } 22 | }; 23 | 24 | struct Max 25 | { 26 | __device__ __forceinline__ double operator()(double x, double y) const { 27 | return x > y ? 
x : y; 28 | } 29 | }; 30 | 31 | struct Add 32 | { 33 | __device__ __forceinline__ double operator()(double x, double y) const { 34 | return x + y; 35 | } 36 | }; 37 | 38 | struct AddFloat 39 | { 40 | __device__ __forceinline__ double operator()(double sum, float v) const { 41 | return sum + v; 42 | } 43 | }; 44 | 45 | struct SumExpFloat 46 | { 47 | __device__ __forceinline__ SumExpFloat(double v) 48 | : max_k(v) {} 49 | 50 | __device__ __forceinline__ double operator()(double sum, float v) const { 51 | return sum + static_cast(exp((double)v - max_k)); 52 | } 53 | 54 | const double max_k; 55 | }; 56 | 57 | template 58 | __device__ __forceinline__ double 59 | blockReduce(double* smem, double val, 60 | const Reduction& r, 61 | double defaultVal) 62 | { 63 | // To avoid RaW races from chaining blockReduce calls together, we need a sync here 64 | __syncthreads(); 65 | 66 | smem[threadIdx.x] = val; 67 | 68 | __syncthreads(); 69 | 70 | double warpVal = defaultVal; 71 | 72 | // First warp will perform per-warp reductions for the remaining warps 73 | if (threadIdx.x < 32) { 74 | int lane = threadIdx.x % 32; 75 | if (lane < blockDim.x / 32) { 76 | #pragma unroll 77 | for (int i = 0; i < 32; ++i) { 78 | warpVal = r(warpVal, smem[lane * 32 + i]); 79 | } 80 | smem[lane] = warpVal; 81 | } 82 | } 83 | 84 | __syncthreads(); 85 | 86 | // First thread will perform a reduction of the above per-warp reductions 87 | double blockVal = defaultVal; 88 | 89 | if (threadIdx.x == 0) { 90 | for (int i = 0; i < blockDim.x / 32; ++i) { 91 | blockVal = r(blockVal, smem[i]); 92 | } 93 | smem[0] = blockVal; 94 | } 95 | 96 | // Sync and broadcast 97 | __syncthreads(); 98 | return smem[0]; 99 | } 100 | 101 | 102 | template 103 | __device__ __forceinline__ double 104 | ilpReduce(float* data, 105 | int size, 106 | const Reduction& r, 107 | double defaultVal) 108 | { 109 | double threadVal = defaultVal; 110 | int offset = threadIdx.x; 111 | 112 | int last = size % (ILP * blockDim.x); 113 | 114 | // Body (unroll by ILP times) 115 | for (; offset < size - last; offset += blockDim.x * ILP) { 116 | float tmp[ILP]; 117 | 118 | #pragma unroll 119 | for (int j = 0; j < ILP; ++j) 120 | tmp[j] = data[offset + j * blockDim.x]; 121 | 122 | #pragma unroll 123 | for (int j = 0; j < ILP; ++j) 124 | threadVal = r(threadVal, tmp[j]); 125 | } 126 | 127 | // Epilogue 128 | for (; offset < size; offset += blockDim.x) 129 | threadVal = r(threadVal, data[offset]); 130 | 131 | return threadVal; 132 | } 133 | 134 | template 135 | __global__ void cunn_SoftMaxForward(float *output, float *input, long* ps) 136 | { 137 | SharedMem smem; 138 | double *buffer = smem.getPointer(); 139 | // forward pointers to batch[blockIdx.x] 140 | // each block handles a sample in the mini-batch 141 | long ofs = (blockIdx.x == 0) ? 
0 : ps[blockIdx.x - 1]; 142 | long n_ele = ps[blockIdx.x] - ofs; 143 | input += ofs; 144 | output += ofs; 145 | 146 | // find the max 147 | double threadMax = ilpReduce(input, n_ele, MaxFloat(), -DBL_MAX); 148 | 149 | double max_k = blockReduce(buffer, threadMax, Max(), -DBL_MAX); 150 | // float max_k_non_accum = static_cast(max_k); 151 | 152 | // reduce all values 153 | double threadExp = ilpReduce(input, n_ele, SumExpFloat(max_k), static_cast(0)); 154 | 155 | double sumAll = blockReduce(buffer, threadExp, Add(), static_cast(0)); 156 | 157 | // Epilogue epilogue(max_k_non_accum, sumAll); 158 | // float logsum = max_k_non_accum + static_cast(log(sumAll)); 159 | double logsum = max_k + log(sumAll); 160 | 161 | int offset = threadIdx.x; 162 | int last = n_ele % (ILP * blockDim.x); 163 | for (; offset < n_ele - last; offset += blockDim.x * ILP) { 164 | float tmp[ILP]; 165 | 166 | #pragma unroll 167 | for (int j = 0; j < ILP; ++j) 168 | tmp[j] = input[offset + j * blockDim.x]; 169 | 170 | #pragma unroll 171 | for (int j = 0; j < ILP; ++j) 172 | output[offset + j * blockDim.x] = (double)tmp[j] - logsum; 173 | } 174 | 175 | for (; offset < n_ele; offset += blockDim.x) 176 | output[offset] = (double)input[offset] - logsum; 177 | } 178 | 179 | template 180 | __global__ void cunn_SoftMaxBackward(float *gradInput, float *output, float *gradOutput, long* ps) 181 | { 182 | SharedMem smem; 183 | double *buffer = smem.getPointer(); 184 | long ofs = (blockIdx.x == 0) ? 0 : ps[blockIdx.x - 1]; 185 | long n_ele = ps[blockIdx.x] - ofs; 186 | 187 | gradInput += ofs; 188 | output += ofs; 189 | gradOutput += ofs; 190 | 191 | double threadSum = ilpReduce( 192 | gradOutput, n_ele, AddFloat(), double(0)); 193 | double sum_k = blockReduce( 194 | buffer, threadSum, Add(), double(0)); 195 | 196 | int offset = threadIdx.x; 197 | int last = n_ele % (ILP * blockDim.x); 198 | for (; offset < n_ele - last; offset += blockDim.x * ILP) { 199 | float tmpGradOutput[ILP]; 200 | float tmpOutput[ILP]; 201 | 202 | #pragma unroll 203 | for (int j = 0; j < ILP; ++j) { 204 | tmpGradOutput[j] = gradOutput[offset + j * blockDim.x]; 205 | tmpOutput[j] = output[offset + j * blockDim.x]; 206 | } 207 | 208 | #pragma unroll 209 | for (int j = 0; j < ILP; ++j) 210 | gradInput[offset + j * blockDim.x] = tmpGradOutput[j] - exp((double)tmpOutput[j]) * sum_k; 211 | } 212 | 213 | for (; offset < n_ele; offset += blockDim.x) 214 | gradInput[offset] = gradOutput[offset] - exp((double)output[offset]) * sum_k; 215 | } 216 | 217 | 218 | void HostSoftMaxForward(cudaStream_t stream, float *input, float *output, long* ps, int bsize) 219 | { 220 | // This kernel spawns a block of 1024 threads per each element in the batch. 221 | // XXX: it assumes that inner_size == 1 222 | 223 | dim3 grid(bsize); 224 | dim3 block(1024); 225 | 226 | cunn_SoftMaxForward<2> 227 | <<>>( 228 | output, input, ps 229 | ); 230 | 231 | // THCudaCheck(cudaGetLastError()); 232 | } 233 | 234 | void HostSoftMaxBackward(cudaStream_t stream, float *gradOutput, float *gradInput, float *output, long* ps, int bsize) 235 | { 236 | dim3 grid(bsize); 237 | dim3 block(1024); 238 | 239 | cunn_SoftMaxBackward<2> 240 | <<>>( 241 | gradInput, output, gradOutput, ps 242 | ); 243 | 244 | // THCudaCheck(cudaGetLastError()); 245 | } 246 | 247 | __global__ void JaggedArgmaxKernel(long* dst, float *orig_ptr, long* ps) 248 | { 249 | __shared__ long buffer[256]; 250 | 251 | long ofs = (blockIdx.x == 0) ? 
0 : ps[blockIdx.x - 1]; 252 | long cols = ps[blockIdx.x] - ofs; 253 | 254 | float* row_ptr = orig_ptr + ofs; 255 | 256 | int i_start = threadIdx.x; 257 | int i_end = cols; 258 | int i_step = blockDim.x; 259 | if (i_start < cols) 260 | buffer[threadIdx.x] = i_start; 261 | for (int i = i_start + i_step; i < i_end; i += i_step) 262 | { 263 | if (row_ptr[i] > row_ptr[buffer[threadIdx.x]]) 264 | buffer[threadIdx.x] = i; 265 | } 266 | __syncthreads(); 267 | 268 | int shift; 269 | for (int i = 8 - 1; i >= 0; --i) 270 | { 271 | shift = 1 << i; 272 | if (threadIdx.x < shift && threadIdx.x + shift < cols) 273 | { 274 | if (row_ptr[buffer[threadIdx.x + shift]] > row_ptr[buffer[threadIdx.x]]) 275 | buffer[threadIdx.x] = buffer[threadIdx.x + shift]; 276 | } 277 | __syncthreads(); 278 | } 279 | if (threadIdx.x == 0) 280 | dst[blockIdx.x] = buffer[0]; 281 | } 282 | 283 | void HostArgmaxForward(cudaStream_t stream, float *input, long *output, long* ps, int bsize) 284 | { 285 | dim3 grid(bsize); 286 | dim3 block(256); 287 | 288 | JaggedArgmaxKernel<<>>(output, input, ps); 289 | } 290 | 291 | __global__ void JaggedMaxKernel(float* vmax, long* idxes, float *orig_ptr, long* ps) 292 | { 293 | __shared__ long buffer[256]; 294 | long ofs = (blockIdx.x == 0) ? 0 : ps[blockIdx.x - 1]; 295 | long cols = ps[blockIdx.x] - ofs; 296 | 297 | float* row_ptr = orig_ptr + ofs; 298 | 299 | int i_start = threadIdx.x; 300 | int i_end = cols; 301 | int i_step = blockDim.x; 302 | if (i_start < cols) 303 | buffer[threadIdx.x] = i_start; 304 | for (int i = i_start + i_step; i < i_end; i += i_step) 305 | { 306 | if (row_ptr[i] > row_ptr[buffer[threadIdx.x]]) 307 | buffer[threadIdx.x] = i; 308 | } 309 | __syncthreads(); 310 | 311 | int shift; 312 | for (int i = 8 - 1; i >= 0; --i) 313 | { 314 | shift = 1 << i; 315 | if (threadIdx.x < shift && threadIdx.x + shift < cols) 316 | { 317 | if (row_ptr[buffer[threadIdx.x + shift]] > row_ptr[buffer[threadIdx.x]]) 318 | buffer[threadIdx.x] = buffer[threadIdx.x + shift]; 319 | } 320 | __syncthreads(); 321 | } 322 | if (threadIdx.x == 0) 323 | { 324 | idxes[blockIdx.x] = buffer[0]; 325 | vmax[blockIdx.x] = row_ptr[buffer[0]]; 326 | } 327 | } 328 | 329 | void HostMaxForward(cudaStream_t stream, float *input, float* vmax, long *idxes, long* ps, int bsize) 330 | { 331 | dim3 grid(bsize); 332 | dim3 block(256); 333 | 334 | JaggedMaxKernel<<>>(vmax, idxes, input, ps); 335 | } 336 | 337 | #define min(x, y) (x < y ? 
x : y) 338 | 339 | __global__ void GLapNormKernel(long* row_indices, long* col_indices, float* p_v, float* p_norm, int nnz) 340 | { 341 | int i = blockDim.x * blockIdx.x + threadIdx.x; 342 | 343 | if(i < nnz) 344 | { 345 | float norm = p_norm[ row_indices[i] ] * p_norm[ col_indices[i] ]; 346 | p_v[i] /= norm; 347 | } 348 | } 349 | 350 | void HostGLapNorm(cudaStream_t stream, long* row_indices, long* col_indices, float* p_v, float* p_norm, int nnz) 351 | { 352 | int thread_num = min(1024, nnz); 353 | int blocksPerGrid = (nnz + thread_num - 1) / thread_num; 354 | 355 | GLapNormKernel<<>> (row_indices, col_indices, p_v, p_norm, nnz); 356 | } 357 | 358 | __global__ void GDegreeNormKernel(long* row_indices, float* p_v, float* p_norm, int nnz) 359 | { 360 | int i = blockDim.x * blockIdx.x + threadIdx.x; 361 | 362 | if(i < nnz) 363 | { 364 | float norm = p_norm[ row_indices[i] ]; 365 | p_v[i] /= norm; 366 | } 367 | } 368 | 369 | void HostGDegreeNorm(cudaStream_t stream, long* row_indices, float* p_v, float* p_norm, int nnz) 370 | { 371 | int thread_num = min(1024, nnz); 372 | int blocksPerGrid = (nnz + thread_num - 1) / thread_num; 373 | 374 | GDegreeNormKernel<<>> (row_indices, p_v, p_norm, nnz); 375 | } -------------------------------------------------------------------------------- /code/common/src/custom_kernel.h: -------------------------------------------------------------------------------- 1 | #ifndef JAGGED_SOFTMAX_KERNEL_H 2 | #define JAGGED_SOFTMAX_KERNEL_H 3 | 4 | #ifdef __cplusplus 5 | extern "C" { 6 | #endif 7 | 8 | void HostSoftMaxForward(cudaStream_t stream, float *input, float *output, long* ps, int bsize); 9 | 10 | void HostSoftMaxBackward(cudaStream_t stream, float *gradOutput, float *gradInput, float *output, long* ps, int bsize); 11 | 12 | void HostArgmaxForward(cudaStream_t stream, float *input, long *output, long* ps, int bsize); 13 | 14 | void HostMaxForward(cudaStream_t stream, float *input, float* vmax, long *idxes, long* ps, int bsize); 15 | 16 | void HostGLapNorm(cudaStream_t stream, long* row_indices, long* col_indices, float* p_v, float* p_norm, int nnz); 17 | 18 | void HostGDegreeNorm(cudaStream_t stream, long* row_indices, float* p_v, float* p_norm, int nnz); 19 | 20 | #ifdef __cplusplus 21 | } 22 | #endif 23 | 24 | #endif 25 | -------------------------------------------------------------------------------- /code/common/src/my_lib.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | int jagged_argmax_forward(THFloatTensor *values, THLongTensor *prefix_sum, THLongTensor *output) 5 | { 6 | values = THFloatTensor_newContiguous(values); 7 | THLongTensor_resizeAs(output, prefix_sum); 8 | 9 | float *input_data_base = values->storage->data + values->storageOffset;; 10 | long *ps = prefix_sum->storage->data + prefix_sum->storageOffset; 11 | long *p_out = output->storage->data + output->storageOffset; 12 | long bsize = (long)prefix_sum->size[0]; 13 | long i, d; 14 | 15 | #pragma omp parallel for private(i, d) 16 | for (i = 0; i < bsize; i++) 17 | { 18 | long offset = (i == 0) ? 
0 : ps[i - 1]; 19 | long n_ele = ps[i] - offset; 20 | 21 | float* input_data = input_data_base + offset; 22 | 23 | float max_input = -FLT_MAX; 24 | long max_id = -1; 25 | for (d = 0; d < n_ele; d++) 26 | if (input_data[d] > max_input) 27 | { 28 | max_input = input_data[d]; 29 | max_id = d; 30 | } 31 | assert(max_id >= 0); 32 | p_out[i] = max_id; 33 | } 34 | 35 | THFloatTensor_free(values); 36 | return 1; 37 | } 38 | 39 | int jagged_max_forward(THFloatTensor *values, THLongTensor *prefix_sum, THFloatTensor *vmax, THLongTensor *idxes) 40 | { 41 | int64_t inputsize = prefix_sum->size[0]; 42 | 43 | values = THFloatTensor_newContiguous(values); 44 | THLongTensor_resize1d(idxes, inputsize); 45 | THFloatTensor_resize1d(vmax, inputsize); 46 | 47 | float *input_data_base = values->storage->data + values->storageOffset; 48 | long *ps = prefix_sum->storage->data + prefix_sum->storageOffset; 49 | float *p_maxv = vmax->storage->data + vmax->storageOffset; 50 | long *p_i = idxes->storage->data + idxes->storageOffset; 51 | 52 | long bsize = (long)prefix_sum->size[0]; 53 | long i, d; 54 | 55 | #pragma omp parallel for private(i, d) 56 | for (i = 0; i < bsize; i++) 57 | { 58 | long offset = (i == 0) ? 0 : ps[i - 1]; 59 | long n_ele = ps[i] - offset; 60 | 61 | float* input_data = input_data_base + offset; 62 | 63 | float max_input = -FLT_MAX; 64 | long max_id = -1; 65 | for (d = 0; d < n_ele; d++) 66 | if (input_data[d] > max_input) 67 | { 68 | max_input = input_data[d]; 69 | max_id = d; 70 | } 71 | assert(max_id >= 0); 72 | p_i[i] = max_id; 73 | p_maxv[i] = max_input; 74 | } 75 | 76 | THFloatTensor_free(values); 77 | return 1; 78 | } 79 | 80 | int jagged_log_softmax_forward(THFloatTensor *logits, THLongTensor *prefix_sum, THFloatTensor *output) 81 | { 82 | logits = THFloatTensor_newContiguous(logits); 83 | THFloatTensor_resizeAs(output, logits); 84 | float *input_data_base = logits->storage->data + logits->storageOffset;// THTensor_(data)(logits); 85 | long *ps = prefix_sum->storage->data + prefix_sum->storageOffset; 86 | float *output_data_base = output->storage->data + output->storageOffset; 87 | uint64_t bsize = (uint64_t)prefix_sum->size[0]; 88 | uint64_t i, d; 89 | 90 | #pragma omp parallel for private(i, d) 91 | for (i = 0; i < bsize; i++) 92 | { 93 | long offset = (i == 0) ? 
0 : ps[i - 1]; 94 | 95 | float* input_data = input_data_base + offset; 96 | float* output_data = output_data_base + offset; 97 | 98 | long n_ele = ps[i] - offset; 99 | float max_input = -FLT_MAX; 100 | for (d = 0; d < n_ele; d++) 101 | max_input = THMax(max_input, input_data[d]); 102 | 103 | double logsum = 0; 104 | for (d = 0; d < n_ele; d++) 105 | logsum += exp(input_data[d] - max_input); 106 | logsum = max_input + log(logsum); 107 | 108 | for (d = 0; d < n_ele; d++) 109 | output_data[d] = input_data[d] - logsum; 110 | } 111 | 112 | THFloatTensor_free(logits); 113 | return 1; 114 | } 115 | 116 | int jagged_log_softmax_backward(THFloatTensor *output, THFloatTensor *grad_output, THLongTensor *prefix_sum, THFloatTensor *grad_input) 117 | { 118 | grad_output = THFloatTensor_newContiguous(grad_output); 119 | output = THFloatTensor_newContiguous(output); 120 | THFloatTensor_resizeAs(grad_input, grad_output); 121 | 122 | float *output_data_base = output->storage->data + output->storageOffset; 123 | float *gradOutput_data_base = grad_output->storage->data + grad_output->storageOffset; 124 | long *ps = prefix_sum->storage->data + prefix_sum->storageOffset; 125 | float *gradInput_data_base = grad_input->storage->data + grad_input->storageOffset; 126 | 127 | uint64_t bsize = (uint64_t)prefix_sum->size[0]; 128 | uint64_t i, d; 129 | #pragma omp parallel for private(i, d) 130 | for (i = 0; i < bsize; i++) 131 | { 132 | long offset = (i == 0) ? 0 : ps[i - 1]; 133 | float *gradInput_data = gradInput_data_base + offset; 134 | float *output_data = output_data_base + offset; 135 | float *gradOutput_data = gradOutput_data_base + offset; 136 | 137 | double sum = 0; 138 | long n_ele = ps[i] - offset; 139 | for (d = 0; d < n_ele; d++) 140 | sum += gradOutput_data[d]; 141 | 142 | for (d = 0; d < n_ele; d++) 143 | gradInput_data[d] = gradOutput_data[d] - exp(output_data[d]) * sum; 144 | } 145 | 146 | THFloatTensor_free(grad_output); 147 | THFloatTensor_free(output); 148 | return 1; 149 | } 150 | 151 | int graph_laplacian_norm(THLongTensor *indices, THFloatTensor *values, THFloatTensor *norm) 152 | { 153 | uint64_t nnz = (uint64_t)values->size[0]; 154 | long *row_indices = indices->storage->data + indices->storageOffset; 155 | long *col_indices = row_indices + indices->stride[0]; 156 | float *p_v = values->storage->data + values->storageOffset; 157 | float *p_norm = norm->storage->data + norm->storageOffset; 158 | 159 | uint64_t i; 160 | #pragma omp parallel for private(i) 161 | for (i = 0; i < nnz; i++) 162 | { 163 | float norm = p_norm[ row_indices[i] ] * p_norm[ col_indices[i] ]; 164 | p_v[i] /= norm; 165 | } 166 | 167 | return 1; 168 | } 169 | 170 | int graph_degree_norm(THLongTensor *indices, THFloatTensor *values, THFloatTensor *norm) 171 | { 172 | uint64_t nnz = (uint64_t)values->size[0]; 173 | long *row_indices = indices->storage->data + indices->storageOffset; 174 | float *p_v = values->storage->data + values->storageOffset; 175 | float *p_norm = norm->storage->data + norm->storageOffset; 176 | 177 | uint64_t i; 178 | #pragma omp parallel for private(i) 179 | for (i = 0; i < nnz; i++) 180 | { 181 | float norm = p_norm[ row_indices[i] ]; 182 | p_v[i] /= norm; 183 | } 184 | 185 | return 1; 186 | } -------------------------------------------------------------------------------- /code/common/src/my_lib.h: -------------------------------------------------------------------------------- 1 | int jagged_log_softmax_forward(THFloatTensor *logits, THLongTensor *prefix_sum, THFloatTensor *output); 2 | 3 | int 
jagged_log_softmax_backward(THFloatTensor *output, THFloatTensor *grad_output, THLongTensor *prefix_sum, THFloatTensor *grad_input); 4 | 5 | int jagged_argmax_forward(THFloatTensor *values, THLongTensor *prefix_sum, THLongTensor *output); 6 | 7 | int jagged_max_forward(THFloatTensor *values, THLongTensor *prefix_sum, THFloatTensor *vmax, THLongTensor *idxes); 8 | 9 | int graph_laplacian_norm(THLongTensor *indices, THFloatTensor *values, THFloatTensor *norm); 10 | 11 | int graph_degree_norm(THLongTensor *indices, THFloatTensor *values, THFloatTensor *norm); -------------------------------------------------------------------------------- /code/common/src/my_lib_cuda.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include "custom_kernel.h" 4 | 5 | // this symbol will be resolved automatically from PyTorch libs 6 | extern THCState *state; 7 | 8 | int jagged_log_softmax_forward_cuda(THCudaTensor *logits, THCudaLongTensor *prefix_sum, THCudaTensor *output) 9 | { 10 | logits = THCudaTensor_newContiguous(state, logits); 11 | THCudaTensor_resizeAs(state, output, logits); 12 | 13 | float *input_data_base = THCudaTensor_data(state, logits); 14 | long* ps = THCudaLongTensor_data(state, prefix_sum); 15 | float *output_data_base = THCudaTensor_data(state, output); 16 | 17 | int bsize = (int)prefix_sum->size[0]; 18 | cudaStream_t stream = THCState_getCurrentStream(state); 19 | HostSoftMaxForward(stream, input_data_base, output_data_base, ps, bsize); 20 | 21 | THCudaTensor_free(state, logits); 22 | return 1; 23 | } 24 | 25 | int jagged_log_softmax_backward_cuda(THCudaTensor *output, THCudaTensor *grad_output, THCudaLongTensor *prefix_sum, THCudaTensor *grad_input) 26 | { 27 | output = THCudaTensor_newContiguous(state, output); 28 | grad_output = THCudaTensor_newContiguous(state, grad_output); 29 | 30 | THCudaTensor_resizeAs(state, grad_input, grad_output); 31 | float *output_data_base = THCudaTensor_data(state, output); 32 | float *gradOutput_data_base = THCudaTensor_data(state, grad_output); 33 | long* ps = THCudaLongTensor_data(state, prefix_sum); 34 | float *gradInput_data_base = THCudaTensor_data(state, grad_input); 35 | 36 | int bsize = (int)prefix_sum->size[0]; 37 | cudaStream_t stream = THCState_getCurrentStream(state); 38 | HostSoftMaxBackward(stream, gradOutput_data_base, gradInput_data_base, output_data_base, ps, bsize); 39 | THCudaTensor_free(state, grad_output); 40 | THCudaTensor_free(state, output); 41 | return 1; 42 | } 43 | 44 | int jagged_argmax_forward_cuda(THCudaTensor *values, THCudaLongTensor *prefix_sum, THCudaLongTensor *output) 45 | { 46 | values = THCudaTensor_newContiguous(state, values); 47 | THCudaLongTensor_resizeAs(state, output, prefix_sum); 48 | 49 | float *input_data_base = THCudaTensor_data(state, values); 50 | long* ps = THCudaLongTensor_data(state, prefix_sum); 51 | long *output_data_base = THCudaLongTensor_data(state, output); 52 | 53 | int bsize = (int)prefix_sum->size[0]; 54 | cudaStream_t stream = THCState_getCurrentStream(state); 55 | HostArgmaxForward(stream, input_data_base, output_data_base, ps, bsize); 56 | 57 | THCudaTensor_free(state, values); 58 | return 1; 59 | } 60 | 61 | int jagged_max_forward_cuda(THCudaTensor *values, THCudaLongTensor *prefix_sum, THCudaTensor *vmax, THCudaLongTensor *idxes) 62 | { 63 | int64_t inputsize = prefix_sum->size[0]; 64 | values = THCudaTensor_newContiguous(state, values); 65 | THCudaLongTensor_resize1d(state, idxes, inputsize); 66 | THCudaTensor_resize1d(state, vmax, 
inputsize); 67 | 68 | float *input_data_base = THCudaTensor_data(state, values); 69 | long* ps = THCudaLongTensor_data(state, prefix_sum); 70 | long *p_i = THCudaLongTensor_data(state, idxes); 71 | float *p_maxv = THCudaTensor_data(state, vmax); 72 | 73 | int bsize = (int)prefix_sum->size[0]; 74 | cudaStream_t stream = THCState_getCurrentStream(state); 75 | HostMaxForward(stream, input_data_base, p_maxv, p_i, ps, bsize); 76 | 77 | THCudaTensor_free(state, values); 78 | return 1; 79 | } 80 | 81 | int graph_laplacian_norm_cuda(THCudaLongTensor *indices, THCudaTensor *values, THCudaTensor *norm) 82 | { 83 | uint64_t nnz = (uint64_t)values->size[0]; 84 | long *row_indices = THCudaLongTensor_data(state, indices); 85 | long *col_indices = row_indices + THCudaLongTensor_stride(state, indices, 0); 86 | float *p_v = THCudaTensor_data(state, values); 87 | float *p_norm = THCudaTensor_data(state, norm); 88 | 89 | cudaStream_t stream = THCState_getCurrentStream(state); 90 | HostGLapNorm(stream, row_indices, col_indices, p_v, p_norm, nnz); 91 | return 1; 92 | } 93 | 94 | int graph_degree_norm_cuda(THCudaLongTensor *indices, THCudaTensor *values, THCudaTensor *norm) 95 | { 96 | uint64_t nnz = (uint64_t)values->size[0]; 97 | long *row_indices = THCudaLongTensor_data(state, indices); 98 | float *p_v = THCudaTensor_data(state, values); 99 | float *p_norm = THCudaTensor_data(state, norm); 100 | 101 | cudaStream_t stream = THCState_getCurrentStream(state); 102 | HostGDegreeNorm(stream, row_indices, p_v, p_norm, nnz); 103 | return 1; 104 | } -------------------------------------------------------------------------------- /code/common/src/my_lib_cuda.h: -------------------------------------------------------------------------------- 1 | int jagged_log_softmax_forward_cuda(THCudaTensor *logits, THCudaLongTensor *prefix_sum, THCudaTensor *output); 2 | 3 | int jagged_log_softmax_backward_cuda(THCudaTensor *output, THCudaTensor *grad_output, THCudaLongTensor *prefix_sum, THCudaTensor *grad_input); 4 | 5 | int jagged_argmax_forward_cuda(THCudaTensor *values, THCudaLongTensor *prefix_sum, THCudaLongTensor *output); 6 | 7 | int jagged_max_forward_cuda(THCudaTensor *values, THCudaLongTensor *prefix_sum, THCudaTensor *vmax, THCudaLongTensor *idxes); 8 | 9 | int graph_laplacian_norm_cuda(THCudaLongTensor *indices, THCudaTensor *values, THCudaTensor *norm); 10 | 11 | int graph_degree_norm_cuda(THCudaLongTensor *indices, THCudaTensor *values, THCudaTensor *norm); -------------------------------------------------------------------------------- /code/common/test.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.autograd import Function 4 | import torch.nn.functional as F 5 | from torch.autograd import Variable 6 | import numpy as np 7 | from modules.custom_mod import JaggedLogSoftmaxModule, JaggedArgmaxModule, JaggedMaxModule 8 | import sys 9 | 10 | def cpu_test(): 11 | mod = JaggedLogSoftmaxModule() 12 | for i in range(10): 13 | a = torch.rand(10000, 10) 14 | b = torch.from_numpy(np.array([ (i + 1) * int(a.size()[1]) for i in range(a.size()[0])])) 15 | c = mod(Variable(a), Variable(b)) 16 | c2 = F.log_softmax(Variable(a), dim=1) 17 | print(torch.sum(torch.abs(c - c2))) 18 | 19 | a = torch.rand(100, 30) 20 | b = torch.from_numpy(np.array([ (i + 1) * 30 for i in range(100)])) 21 | va = Variable(a, requires_grad=True) 22 | vb = Variable(b) 23 | c = mod(va, vb) 24 | t = F.torch.mean(c) 25 | t.backward() 26 | b1 = va.grad 27 | 28 | va = 
Variable(a, requires_grad=True) 29 | c = F.log_softmax(va, dim=1) 30 | t = F.torch.mean(c) 31 | t.backward() 32 | b2 = va.grad 33 | 34 | print(torch.sum(torch.abs(b1 - b2))) 35 | 36 | def gpu_test(): 37 | mod = JaggedLogSoftmaxModule() 38 | for i in range(10): 39 | a = torch.rand(10000, 10).cuda() 40 | b = torch.from_numpy(np.array([ (i + 1) * int(a.size()[1]) for i in range(a.size()[0])])).cuda() 41 | c1 = mod(Variable(a), Variable(b)) 42 | c2 = F.log_softmax(Variable(a), dim=1) 43 | c3 = F.log_softmax(Variable(a.cpu()), dim=1).cuda() 44 | print(torch.sum(torch.abs(c3 - c2)).data[0], torch.sum(torch.abs(c3 - c1)).data[0], torch.sum(torch.abs(c2 - c1)).data[0]) 45 | 46 | a = torch.rand(1000, 100).cuda() 47 | b = torch.from_numpy(np.array([ (i + 1) * int(a.size()[1]) for i in range(a.size()[0])])).cuda() 48 | va = Variable(a, requires_grad=True) 49 | vb = Variable(b) 50 | c = mod(va, vb) 51 | t = F.torch.sum(c) 52 | t.backward() 53 | b1 = va.grad 54 | 55 | va = Variable(a, requires_grad=True) 56 | c = F.log_softmax(va, dim=1) 57 | t = F.torch.sum(c) 58 | t.backward() 59 | b2 = va.grad 60 | 61 | va = Variable(a.cpu(), requires_grad=True) 62 | c = F.log_softmax(va, dim=1) 63 | t = F.torch.sum(c) 64 | t.backward() 65 | b3 = va.grad.cuda() 66 | print(torch.sum(torch.abs(b3 - b2)).data[0], torch.sum(torch.abs(b3 - b1)).data[0], torch.sum(torch.abs(b2 - b1)).data[0]) 67 | 68 | def argmax(): 69 | torch.manual_seed(1) 70 | mod = JaggedArgmaxModule() 71 | 72 | a = torch.rand(10, 4).cuda() 73 | print(a) 74 | b = torch.from_numpy(np.array([ (i + 1) * int(a.size()[1]) for i in range(a.size()[0])])).cuda() 75 | c = mod(Variable(a), Variable(b)) 76 | print(c) 77 | 78 | a = torch.randn(10).cuda() 79 | print(a) 80 | b = torch.LongTensor([2, 5, 9, 10]).cuda() 81 | c = mod(Variable(a), Variable(b)) 82 | print(c) 83 | 84 | torch.manual_seed(1) 85 | mod = JaggedMaxModule() 86 | 87 | a = torch.rand(10, 4).cuda() 88 | print(a) 89 | b = torch.from_numpy(np.array([ (i + 1) * int(a.size()[1]) for i in range(a.size()[0])])).cuda() 90 | c1, c2 = mod(Variable(a), Variable(b)) 91 | print(c1) 92 | print(c2) 93 | 94 | a = torch.randn(10).cuda() 95 | print(a) 96 | b = torch.LongTensor([2, 5, 9, 10]).cuda() 97 | c = mod(Variable(a), Variable(b)) 98 | print(c[0], c[1]) -------------------------------------------------------------------------------- /code/data_generator/data_util.py: -------------------------------------------------------------------------------- 1 | import cPickle as cp 2 | import networkx as nx 3 | 4 | def load_pkl(fname, num_graph): 5 | g_list = [] 6 | with open(fname, 'rb') as f: 7 | for i in range(num_graph): 8 | g = cp.load(f) 9 | g_list.append(g) 10 | return g_list 11 | 12 | def g2txt(g, label, fid): 13 | fid.write('%d %d\n' % (len(g), label)) 14 | for i in range(len(g)): 15 | fid.write('%d' % len(g.neighbors(i))) 16 | for j in g.neighbors(i): 17 | fid.write(' %d' % j) 18 | fid.write('\n') -------------------------------------------------------------------------------- /code/data_generator/gen_er_components.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import cPickle as cp 4 | import random 5 | import numpy as np 6 | import networkx as nx 7 | import time 8 | from tqdm import tqdm 9 | 10 | 11 | def get_component(): 12 | cur_n = np.random.randint(max_n - min_n + 1) + min_n 13 | g = nx.erdos_renyi_graph(n = cur_n, p = p) 14 | 15 | comps = [c for c in nx.connected_component_subgraphs(g)] 16 | random.shuffle(comps) 17 | for i in 
range(1, len(comps)): 18 | x = random.choice(comps[i - 1].nodes()) 19 | y = random.choice(comps[i].nodes()) 20 | g.add_edge(x, y) 21 | assert nx.is_connected(g) 22 | return g 23 | 24 | if __name__ == '__main__': 25 | save_dir = None 26 | max_n = None 27 | min_n = None 28 | num_graph = None 29 | p = None 30 | n_comp = None 31 | for i in range(1, len(sys.argv), 2): 32 | if sys.argv[i] == '-save_dir': 33 | save_dir = sys.argv[i + 1] 34 | if sys.argv[i] == '-max_n': 35 | max_n = int(sys.argv[i + 1]) 36 | if sys.argv[i] == '-min_n': 37 | min_n = int(sys.argv[i + 1]) 38 | if sys.argv[i] == '-num_graph': 39 | num_graph = int(sys.argv[i + 1]) 40 | if sys.argv[i] == '-p': 41 | p = float(sys.argv[i + 1]) 42 | if sys.argv[i] == '-n_comp': 43 | n_comp = int(sys.argv[i + 1]) 44 | 45 | assert save_dir is not None 46 | assert max_n is not None 47 | assert min_n is not None 48 | assert num_graph is not None 49 | assert p is not None 50 | assert n_comp is not None 51 | 52 | fout_name = '%s/ncomp-%d-nrange-%d-%d-n_graph-%d-p-%.2f.pkl' % (save_dir, n_comp, min_n, max_n, num_graph, p) 53 | print('Final Output: ' + fout_name) 54 | print("Generating graphs...") 55 | min_n = min_n // n_comp 56 | max_n = max_n // n_comp 57 | 58 | for i in tqdm(range(num_graph)): 59 | 60 | for j in range(n_comp): 61 | g = get_component() 62 | 63 | if j == 0: 64 | g_all = g 65 | else: 66 | g_all = nx.disjoint_union(g_all, g) 67 | assert nx.number_connected_components(g_all) == n_comp 68 | 69 | with open(fout_name, 'ab') as fout: 70 | cp.dump(g_all, fout, cp.HIGHEST_PROTOCOL) 71 | -------------------------------------------------------------------------------- /code/data_generator/pkl_dump.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | min_n=90 4 | max_n=100 5 | p=0.02 6 | output_root=../../dropbox/data/components 7 | 8 | if [ ! -e $output_root ]; 9 | then 10 | mkdir -p $output_root 11 | fi 12 | 13 | for t_c in 1 2 3 4 5; do 14 | 15 | n_comp=$t_c 16 | 17 | python gen_er_components.py \ 18 | -save_dir $output_root \ 19 | -max_n $max_n \ 20 | -min_n $min_n \ 21 | -num_graph 5000 \ 22 | -p $p \ 23 | -n_comp $n_comp 24 | 25 | done 26 | -------------------------------------------------------------------------------- /code/graph_attack/collect_rl_results.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | if __name__ == '__main__': 4 | result_root = '../../dropbox/scratch/results/graph_classification/components' 5 | targets = os.listdir(result_root) 6 | targets = sorted(targets) 7 | for fname in targets: 8 | if fname[0] == '.': 9 | continue 10 | configs = os.listdir(result_root + '/' + fname) 11 | best_num = 100 12 | best_config = None 13 | 14 | for config in configs: 15 | if config[0] == '.' 
or 'epoch-best' in config: 16 | continue 17 | if '0.1' in config: 18 | continue 19 | result = result_root + '/' + fname + '/' + config + '/epoch-best.txt' 20 | with open(result, 'r') as f: 21 | num = float(f.readline().strip()) 22 | if num < best_num: 23 | best_config = config 24 | best_num = num 25 | print fname, best_config, best_num 26 | -------------------------------------------------------------------------------- /code/graph_attack/dqn.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | from q_net import NStepQNet, QNet, greedy_actions 18 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 19 | from cmd_args import cmd_args 20 | 21 | from rl_common import GraphEdgeEnv, local_args, load_graphs, test_graphs, load_base_model, attackable, get_supervision 22 | from nstep_replay_mem import NstepReplayMem 23 | 24 | sys.path.append('%s/../graph_classification' % os.path.dirname(os.path.realpath(__file__))) 25 | from graph_common import loop_dataset 26 | 27 | class Agent(object): 28 | def __init__(self, g_list, test_g_list, env): 29 | self.g_list = g_list 30 | if test_g_list is None: 31 | self.test_g_list = g_list 32 | else: 33 | self.test_g_list = test_g_list 34 | self.mem_pool = NstepReplayMem(memory_size=50000, n_steps=2) 35 | self.env = env 36 | # self.net = QNet() 37 | self.net = NStepQNet(2) 38 | self.old_net = NStepQNet(2) 39 | if cmd_args.ctx == 'gpu': 40 | self.net = self.net.cuda() 41 | self.old_net = self.old_net.cuda() 42 | self.eps_start = 1.0 43 | self.eps_end = 1.0 44 | self.eps_step = 10000 45 | self.burn_in = 100 46 | self.step = 0 47 | 48 | self.best_eval = None 49 | self.pos = 0 50 | self.sample_idxes = list(range(len(g_list))) 51 | random.shuffle(self.sample_idxes) 52 | self.take_snapshot() 53 | 54 | def take_snapshot(self): 55 | self.old_net.load_state_dict(self.net.state_dict()) 56 | 57 | def make_actions(self, time_t, greedy=False): 58 | self.eps = self.eps_end + max(0., (self.eps_start - self.eps_end) 59 | * (self.eps_step - max(0., self.step)) / self.eps_step) 60 | 61 | if random.random() < self.eps and not greedy: 62 | actions = self.env.uniformRandActions() 63 | else: 64 | cur_state = self.env.getStateRef() 65 | actions, _, _ = self.net(time_t, cur_state, None, greedy_acts=True) 66 | actions = list(actions.cpu().numpy()) 67 | 68 | return actions 69 | 70 | def run_simulation(self): 71 | if (self.pos + 1) * cmd_args.batch_size > len(self.sample_idxes): 72 | self.pos = 0 73 | random.shuffle(self.sample_idxes) 74 | 75 | selected_idx = self.sample_idxes[self.pos * cmd_args.batch_size : (self.pos + 1) * cmd_args.batch_size] 76 | self.pos += 1 77 | self.env.setup([self.g_list[idx] for idx in selected_idx]) 78 | 79 | t = 0 80 | while not env.isTerminal(): 81 | list_at = self.make_actions(t) 82 | list_st = self.env.cloneState() 83 | self.env.step(list_at) 84 | 85 | assert (env.rewards is not None) == env.isTerminal() 86 | if env.isTerminal(): 87 | rewards = env.rewards 88 | s_prime = None 89 | else: 90 | rewards = np.zeros(len(list_at), dtype=np.float32) 91 | s_prime = self.env.cloneState() 92 | 93 | 
self.mem_pool.add_list(list_st, list_at, rewards, s_prime, [env.isTerminal()] * len(list_at), t) 94 | t += 1 95 | 96 | def eval(self): 97 | self.env.setup(deepcopy(self.test_g_list)) 98 | t = 0 99 | while not self.env.isTerminal(): 100 | list_at = self.make_actions(t, greedy=True) 101 | self.env.step(list_at) 102 | t += 1 103 | test_loss = loop_dataset(self.env.g_list, self.env.classifier, list(range(len(self.env.g_list)))) 104 | print('\033[93m average test: loss %.5f acc %.5f\033[0m' % (test_loss[0], test_loss[1])) 105 | 106 | if cmd_args.phase == 'train' and (self.best_eval is None or test_loss[1] < self.best_eval): # save only in train phase, and only when this is the best attack rate so far; 'and' binds tighter than 'or', so the parentheses are required 107 | print('----saving to best attacker since this is the best attack rate so far.----') 108 | torch.save(self.net.state_dict(), cmd_args.save_dir + '/epoch-best.model') 109 | with open(cmd_args.save_dir + '/epoch-best.txt', 'w') as f: 110 | f.write('%.4f\n' % test_loss[1]) 111 | self.best_eval = test_loss[1] 112 | 113 | reward = np.mean(self.env.rewards) 114 | print(reward) 115 | return reward, test_loss[1] 116 | 117 | def train(self): 118 | log_out = open(cmd_args.logfile, 'w', 0) 119 | pbar = tqdm(range(self.burn_in), unit='batch') 120 | for p in pbar: 121 | self.run_simulation() 122 | pbar = tqdm(range(local_args.num_steps), unit='steps') 123 | optimizer = optim.Adam(self.net.parameters(), lr=cmd_args.learning_rate) 124 | for self.step in pbar: 125 | 126 | self.run_simulation() 127 | 128 | if self.step % 100 == 0: 129 | self.take_snapshot() 130 | if self.step % 100 == 0: 131 | r, acc = self.eval() 132 | log_out.write('%d %.6f %.6f\n' % (self.step, r, acc)) 133 | 134 | cur_time, list_st, list_at, list_rt, list_s_primes, list_term = self.mem_pool.sample(batch_size=cmd_args.batch_size) 135 | 136 | list_target = torch.Tensor(list_rt) 137 | if cmd_args.ctx == 'gpu': 138 | list_target = list_target.cuda() 139 | 140 | cleaned_sp = [] 141 | nonterms = [] 142 | for i in range(len(list_st)): 143 | if not list_term[i]: 144 | cleaned_sp.append(list_s_primes[i]) 145 | nonterms.append(i) 146 | 147 | if len(cleaned_sp): 148 | _, _, banned = zip(*cleaned_sp) 149 | _, q_t_plus_1, prefix_sum_prime = self.old_net(cur_time + 1, cleaned_sp, None) 150 | _, q_rhs = greedy_actions(q_t_plus_1, prefix_sum_prime, banned) 151 | list_target[nonterms] = q_rhs 152 | 153 | # list_target = get_supervision(self.env.classifier, list_st, list_at) 154 | list_target = Variable(list_target.view(-1, 1)) 155 | 156 | _, q_sa, _ = self.net(cur_time, list_st, list_at) 157 | 158 | loss = F.mse_loss(q_sa, list_target) 159 | optimizer.zero_grad() 160 | loss.backward() 161 | optimizer.step() 162 | pbar.set_description('exp: %.5f, loss: %0.5f' % (self.eps, loss) ) 163 | 164 | log_out.close() 165 | if __name__ == '__main__': 166 | random.seed(cmd_args.seed) 167 | np.random.seed(cmd_args.seed) 168 | torch.manual_seed(cmd_args.seed) 169 | 170 | label_map, _, g_list = load_graphs() 171 | random.shuffle(g_list) 172 | base_classifier = load_base_model(label_map, g_list) 173 | env = GraphEdgeEnv(base_classifier, n_edges = 1) 174 | 175 | if cmd_args.frac_meta > 0: 176 | num_train = int( len(g_list) * (1 - cmd_args.frac_meta) ) 177 | agent = Agent(g_list[:num_train], g_list[num_train:], env) 178 | else: 179 | agent = Agent(g_list, None, env) 180 | 181 | if cmd_args.phase == 'train': 182 | agent.train() 183 | else: 184 | agent.net.load_state_dict(torch.load(cmd_args.save_dir + '/epoch-best.model')) 185 | agent.eval() 186 | # env.setup([g_list[idx] for idx in selected_idx]) 187 | # t = 0 188 | # while not env.isTerminal(): 189 | # policy_net = 
net_list[t] 190 | # t += 1 191 | # batch_graph, picked_nodes = env.getState() 192 | # log_probs, prefix_sum = policy_net(batch_graph, picked_nodes) 193 | # actions = env.sampleActions(torch.exp(log_probs).data.cpu().numpy(), prefix_sum.data.cpu().numpy(), greedy=True) 194 | # env.step(actions) 195 | 196 | # test_loss = loop_dataset(env.g_list, base_classifier, list(range(len(env.g_list)))) 197 | # print('\033[93maverage test: loss %.5f acc %.5f\033[0m' % (test_loss[0], test_loss[1])) 198 | 199 | # print(np.mean(avg_rewards), np.mean(env.rewards)) -------------------------------------------------------------------------------- /code/graph_attack/er_trivial_attack.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | from q_net import NStepQNet, QNet, greedy_actions 18 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 19 | from cmd_args import cmd_args 20 | from graph_embedding import S2VGraph 21 | 22 | from rl_common import GraphEdgeEnv, load_graphs, test_graphs, attackable, load_base_model 23 | 24 | sys.path.append('%s/../graph_classification' % os.path.dirname(os.path.realpath(__file__))) 25 | from graph_common import loop_dataset, load_er_data 26 | 27 | def propose_attack(model, s2v_g, num_added=1): 28 | g = s2v_g.to_networkx() 29 | comps = [c for c in nx.connected_component_subgraphs(g)] 30 | set_id = {} 31 | 32 | for i in range(len(comps)): 33 | for j in comps[i].nodes(): 34 | set_id[j] = i 35 | 36 | cand = [] 37 | for i in range(len(g) - 1): 38 | for j in range(i + 1, len(g)): 39 | if set_id[i] != set_id[j] or i == j: 40 | continue 41 | cand.append('%d %d' % (i, j)) 42 | 43 | if cmd_args.rand_att_type == 'random': 44 | added = np.random.choice(cand, num_added) 45 | added = [(int(w.split()[0]), int(w.split()[1])) for w in added] 46 | g.add_edges_from(added) 47 | return S2VGraph(g, s2v_g.label) 48 | elif cmd_args.rand_att_type == 'exhaust': 49 | g_list = [] 50 | for c in cand: 51 | x, y = [int(w) for w in c.split()] 52 | g2 = g.copy() 53 | g2.add_edge(x, y) 54 | g_list.append(S2VGraph(g2, s2v_g.label)) 55 | _, _, acc = model(g_list) 56 | ans = g_list[0] 57 | for i in range(len(g_list)): 58 | if acc.numpy()[i] < 1: 59 | ans = g_list[i] 60 | break 61 | return ans 62 | else: 63 | raise NotImplementedError 64 | 65 | if __name__ == '__main__': 66 | random.seed(cmd_args.seed) 67 | np.random.seed(cmd_args.seed) 68 | torch.manual_seed(cmd_args.seed) 69 | 70 | label_map, train_glist, test_glist = load_er_data() 71 | 72 | base_classifier = load_base_model(label_map, test_glist) 73 | 74 | new_test_list = [] 75 | for g in tqdm(test_glist): 76 | new_test_list.append(propose_attack(base_classifier, g)) 77 | 78 | test_graphs(base_classifier, new_test_list) -------------------------------------------------------------------------------- /code/graph_attack/genetic_algorithm.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd 
import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 18 | from cmd_args import cmd_args 19 | from graph_embedding import S2VGraph 20 | 21 | from rl_common import local_args, load_graphs, load_base_model, attackable 22 | 23 | class GeneticAgent(object): 24 | def __init__(self, classifier, s2v_g, n_edges_attack): 25 | self.s2v_g = s2v_g 26 | self.n_edges_attack = n_edges_attack 27 | self.classifier = classifier 28 | g = s2v_g.to_networkx() 29 | comps = [c for c in nx.connected_component_subgraphs(g)] 30 | self.comps = comps 31 | self.set_id = {} 32 | self.solution = None 33 | for i in range(len(comps)): 34 | for j in comps[i].nodes(): 35 | self.set_id[j] = i 36 | 37 | self.population = [] 38 | for k in range(cmd_args.population_size): 39 | added = [] 40 | for k in range(n_edges_attack): 41 | while True: 42 | i = np.random.randint(len(g)) 43 | j = np.random.randint(len(g)) 44 | if self.set_id[i] != self.set_id[j] or i == j or (i, j) in added: 45 | continue 46 | break 47 | added.append((i, j)) 48 | self.population.append(added) 49 | 50 | def rand_action(self, i): 51 | region = self.comps[self.set_id[i]].nodes() 52 | assert len(region) > 1 53 | while True: 54 | j = region[np.random.randint(len(region))] 55 | if j == i: 56 | continue 57 | assert self.set_id[i] == self.set_id[j] 58 | break 59 | return j 60 | 61 | def get_fitness(self): 62 | g_list = [] 63 | g = self.s2v_g.to_networkx() 64 | for edges in self.population: 65 | g2 = g.copy() 66 | g2.add_edge(edges[0][0], edges[0][1]) 67 | # g2.add_edges_from(edges) 68 | assert nx.number_connected_components(g2) == self.s2v_g.label 69 | g_list.append(S2VGraph(g2, self.s2v_g.label)) 70 | 71 | log_ll, _, acc = self.classifier(g_list) 72 | acc = acc.cpu().double().numpy() 73 | if self.solution is None: 74 | for i in range(len(self.population)): 75 | if acc[i] < 1.0: 76 | self.solution = self.population[i] 77 | break 78 | nll = -log_ll[:, self.classifier.label_map[self.s2v_g.label]] 79 | return nll 80 | 81 | def select(self, fitness): 82 | scores = torch.exp(fitness).cpu().data.numpy() 83 | max_args = np.argsort(-scores) 84 | 85 | result = [] 86 | for i in range(cmd_args.population_size - cmd_args.population_size // 2): 87 | result.append(deepcopy(self.population[max_args[i]])) 88 | 89 | idx = np.random.choice(np.arange(cmd_args.population_size), 90 | size=cmd_args.population_size // 2, 91 | replace=True, 92 | p=scores/scores.sum()) 93 | for i in idx: 94 | result.append(deepcopy(self.population[i])) 95 | 96 | return result 97 | 98 | def crossover(self, parent, pop): 99 | if np.random.rand() < cmd_args.cross_rate: 100 | another = pop[ np.random.randint(len(pop)) ] 101 | if len(parent) != self.n_edges_attack: 102 | return another[:] 103 | if len(another) != self.n_edges_attack: 104 | return parent[:] 105 | t = [] 106 | for i in range(self.n_edges_attack): 107 | if np.random.rand() < 0.5: 108 | t.append(parent[i]) 109 | else: 110 | t.append(another[i]) 111 | return t 112 | else: 113 | return parent[:] 114 | 115 | def mutate(self, child): 116 | if len(child) != self.n_edges_attack: 117 | return child 118 | for i in range(self.n_edges_attack): 119 | if np.random.rand() < cmd_args.mutate_rate: 120 | e = child[i] 121 | if np.random.rand() < 0.5: 122 | e = (e[0], self.rand_action(e[0])) 123 | else: 124 | e = 
(self.rand_action(e[1]), e[1]) 125 | child[i] = e 126 | return child 127 | 128 | def evolve(self): 129 | fitness = self.get_fitness() 130 | if self.solution is not None: 131 | return 132 | pop = self.select(fitness) 133 | new_pop_list = [] 134 | for parent in pop: 135 | child = self.crossover(parent, pop) 136 | child = self.mutate(child) 137 | new_pop_list.append(child) 138 | 139 | self.population = new_pop_list 140 | 141 | if __name__ == '__main__': 142 | random.seed(cmd_args.seed) 143 | np.random.seed(cmd_args.seed) 144 | torch.manual_seed(cmd_args.seed) 145 | 146 | label_map, _, g_list = load_graphs() 147 | base_classifier = load_base_model(label_map, g_list) 148 | 149 | if cmd_args.idx_start + cmd_args.num_instances > len(g_list): 150 | instances = g_list[cmd_args.idx_start : ] 151 | else: 152 | instances = g_list[cmd_args.idx_start : cmd_args.idx_start + cmd_args.num_instances] 153 | 154 | attacked = 0.0 155 | pbar = tqdm(instances) 156 | idx = cmd_args.idx_start 157 | for g in pbar: 158 | agent = GeneticAgent(base_classifier, g, cmd_args.num_mod) 159 | if len(agent.population) == 0: 160 | continue 161 | for i in range(cmd_args.rounds): 162 | agent.evolve() 163 | if agent.solution is not None: 164 | attacked += 1 165 | break 166 | with open('%s/sol-%d.txt' % (cmd_args.save_dir, idx), 'w') as f: 167 | f.write('%d: [' % idx) 168 | if agent.solution is not None: 169 | for e in agent.solution: 170 | f.write('(%d, %d)' % e) 171 | f.write('] succ: ') 172 | if agent.solution is not None: 173 | f.write('1\n') 174 | else: 175 | f.write('0\n') 176 | pbar.set_description('cur_attack: %.2f' % (attacked) ) 177 | idx += 1 178 | print('\n\nacc: %.4f\n' % ((len(instances) - attacked) / float(len(instances))) ) 179 | -------------------------------------------------------------------------------- /code/graph_attack/grad_attack.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | from q_net import NStepQNet, QNet, greedy_actions 18 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 19 | from cmd_args import cmd_args 20 | from graph_embedding import S2VGraph 21 | 22 | from rl_common import GraphEdgeEnv, load_graphs, test_graphs, attackable, load_base_model 23 | 24 | sys.path.append('%s/../graph_classification' % os.path.dirname(os.path.realpath(__file__))) 25 | from graph_common import loop_dataset, load_er_data 26 | 27 | def propose_attack(model, s2v_g, num_added=1): 28 | g = s2v_g.to_networkx() 29 | comps = [c for c in nx.connected_component_subgraphs(g)] 30 | set_id = {} 31 | 32 | for i in range(len(comps)): 33 | for j in comps[i].nodes(): 34 | set_id[j] = i 35 | 36 | node_feat, edge_feat, labels = model.PrepareFeatureLabel([s2v_g]) 37 | if cmd_args.ctx == 'gpu': 38 | node_feat = node_feat.cuda() 39 | labels = labels.cuda() 40 | 41 | cand_list = [s2v_g] 42 | for l in range( len(model.label_map) ): 43 | if l == s2v_g.label: 44 | continue 45 | labels[0] = l 46 | model.zero_grad() 47 | (_, embed), sp_dict = model.s2v([s2v_g], node_feat, edge_feat, pool_global=True, n2n_grad=True) 48 | _, loss, _ = model.mlp(embed, labels) 49 | 
loss.backward() 50 | grad = sp_dict['n2n'].grad.data.numpy().flatten() 51 | idxes = np.argsort(grad) 52 | added = [] 53 | 54 | for p in idxes: 55 | x = p // s2v_g.num_nodes 56 | y = p % s2v_g.num_nodes 57 | if set_id[x] != set_id[y] or x == y or grad[p] > 0: 58 | continue 59 | added.append((x, y)) 60 | if len(added) >= num_added: 61 | break 62 | if len(added) == 0: 63 | continue 64 | g2 = g.copy() 65 | g2.add_edges_from(added) 66 | 67 | cand_list.append( S2VGraph(g2, s2v_g.label) ) 68 | 69 | _, _, acc = model(cand_list) 70 | acc = acc.double().cpu().numpy() 71 | for i in range(len(cand_list)): 72 | if acc[i] < 1.0: 73 | return cand_list[i] 74 | return cand_list[0] 75 | 76 | if __name__ == '__main__': 77 | random.seed(cmd_args.seed) 78 | np.random.seed(cmd_args.seed) 79 | torch.manual_seed(cmd_args.seed) 80 | 81 | label_map, train_glist, test_glist = load_er_data() 82 | 83 | base_classifier = load_base_model(label_map, test_glist) 84 | 85 | new_test_list = [] 86 | for g in tqdm(test_glist): 87 | new_test_list.append(propose_attack(base_classifier, g)) 88 | 89 | test_graphs(base_classifier, new_test_list) -------------------------------------------------------------------------------- /code/graph_attack/nstep_replay_mem.py: -------------------------------------------------------------------------------- 1 | import random 2 | import numpy as np 3 | 4 | class NstepReplaySubMemCell(object): 5 | def __init__(self, memory_size): 6 | self.memory_size = memory_size 7 | 8 | self.actions = [None] * self.memory_size 9 | self.rewards = [None] * self.memory_size 10 | self.states = [None] * self.memory_size 11 | self.s_primes = [None] * self.memory_size 12 | self.terminals = [None] * self.memory_size 13 | 14 | self.count = 0 15 | self.current = 0 16 | 17 | def add(self, s_t, a_t, r_t, s_prime, terminal): 18 | self.actions[self.current] = a_t 19 | self.rewards[self.current] = r_t 20 | self.states[self.current] = s_t 21 | self.s_primes[self.current] = s_prime 22 | self.terminals[self.current] = terminal 23 | 24 | self.count = max(self.count, self.current + 1) 25 | self.current = (self.current + 1) % self.memory_size 26 | 27 | def add_list(self, list_st, list_at, list_rt, list_sp, list_term): 28 | for i in range(len(list_st)): 29 | if list_sp is None: 30 | sp = (None, None, None) 31 | else: 32 | sp = list_sp[i] 33 | self.add(list_st[i], list_at[i], list_rt[i], sp, list_term[i]) 34 | 35 | def sample(self, batch_size): 36 | assert self.count >= batch_size 37 | 38 | list_st = [] 39 | list_at = [] 40 | list_rt = [] 41 | list_s_primes = [] 42 | list_term = [] 43 | 44 | for i in range(batch_size): 45 | idx = random.randint(0, self.count - 1) 46 | list_st.append(self.states[idx]) 47 | list_at.append(self.actions[idx]) 48 | list_rt.append(float(self.rewards[idx])) 49 | list_s_primes.append(self.s_primes[idx]) 50 | list_term.append(self.terminals[idx]) 51 | 52 | return list_st, list_at, list_rt, list_s_primes, list_term 53 | 54 | def hash_state_action(s_t, a_t): 55 | key = s_t[0] 56 | mult = 1000003; base = 179424673 # the multiplier must differ from the modulus: (key * base) % base is always 0, which would erase all earlier terms and collapse the hash to the last value mixed in 57 | for e in s_t[1].directed_edges: 58 | key = (key * mult + e[0]) % base 59 | key = (key * mult + e[1]) % base 60 | if s_t[2] is not None: 61 | key = (key * mult + s_t[2]) % base 62 | else: 63 | key = (key * mult) % base 64 | 65 | key = (key * mult + a_t) % base 66 | return key 67 | 68 | class NstepReplayMemCell(object): 69 | def __init__(self, memory_size, balance_sample = False): 70 | self.sub_list = [] 71 | self.balance_sample = balance_sample 72 | self.sub_list.append(NstepReplaySubMemCell(memory_size)) 73 | if 
balance_sample: 74 | self.sub_list.append(NstepReplaySubMemCell(memory_size)) 75 | self.state_set = set() 76 | 77 | def add(self, s_t, a_t, r_t, s_prime, terminal): 78 | if not self.balance_sample or r_t < 0: 79 | self.sub_list[0].add(s_t, a_t, r_t, s_prime, terminal) 80 | else: 81 | assert r_t > 0 82 | key = hash_state_action(s_t, a_t) 83 | if key in self.state_set: 84 | return 85 | self.state_set.add(key) 86 | self.sub_list[1].add(s_t, a_t, r_t, s_prime, terminal) 87 | 88 | def sample(self, batch_size): 89 | if not self.balance_sample or self.sub_list[1].count < batch_size: 90 | return self.sub_list[0].sample(batch_size) 91 | 92 | list_st, list_at, list_rt, list_s_primes, list_term = self.sub_list[0].sample(batch_size // 2) 93 | list_st2, list_at2, list_rt2, list_s_primes2, list_term2 = self.sub_list[1].sample(batch_size - batch_size // 2) 94 | 95 | return list_st + list_st2, list_at + list_at2, list_rt + list_rt2, list_s_primes + list_s_primes2, list_term + list_term2 96 | 97 | class NstepReplayMem(object): 98 | def __init__(self, memory_size, n_steps, balance_sample = False): 99 | self.mem_cells = [] 100 | for i in range(n_steps - 1): 101 | self.mem_cells.append(NstepReplayMemCell(memory_size, False)) 102 | self.mem_cells.append(NstepReplayMemCell(memory_size, balance_sample)) 103 | 104 | self.n_steps = n_steps 105 | self.memory_size = memory_size 106 | 107 | def add(self, s_t, a_t, r_t, s_prime, terminal, t): 108 | assert t >= 0 and t < self.n_steps 109 | if t == self.n_steps - 1: 110 | assert terminal 111 | else: 112 | assert not terminal 113 | self.mem_cells[t].add(s_t, a_t, r_t, s_prime, terminal) 114 | 115 | def add_list(self, list_st, list_at, list_rt, list_sp, list_term, t): 116 | for i in range(len(list_st)): 117 | if list_sp is None: 118 | sp = (None, None, None) 119 | else: 120 | sp = list_sp[i] 121 | self.add(list_st[i], list_at[i], list_rt[i], sp, list_term[i], t) 122 | 123 | def sample(self, batch_size, t = None): 124 | if t is None: 125 | t = np.random.randint(self.n_steps) 126 | list_st, list_at, list_rt, list_s_primes, list_term = self.mem_cells[t].sample(batch_size) 127 | return t, list_st, list_at, list_rt, list_s_primes, list_term -------------------------------------------------------------------------------- /code/graph_attack/plot_dqn.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | from q_net import NStepQNet, QNet, greedy_actions 18 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 19 | from cmd_args import cmd_args 20 | 21 | from rl_common import GraphEdgeEnv, local_args, load_graphs, test_graphs, load_base_model, attackable, get_supervision 22 | from nstep_replay_mem import NstepReplayMem 23 | 24 | sys.path.append('%s/../graph_classification' % os.path.dirname(os.path.realpath(__file__))) 25 | from graph_common import loop_dataset 26 | 27 | class Agent(object): 28 | def __init__(self, g_list, test_g_list, env): 29 | self.g_list = g_list 30 | if test_g_list is None: 31 | self.test_g_list = g_list 32 | else: 33 | self.test_g_list = test_g_list 34 | self.mem_pool = 
NstepReplayMem(memory_size=50000, n_steps=2) 35 | self.env = env 36 | # self.net = QNet() 37 | self.net = NStepQNet(2) 38 | self.old_net = NStepQNet(2) 39 | if cmd_args.ctx == 'gpu': 40 | self.net = self.net.cuda() 41 | self.old_net = self.old_net.cuda() 42 | self.eps_start = 1.0 43 | self.eps_end = 1.0 44 | self.eps_step = 10000 45 | self.burn_in = 100 46 | self.step = 0 47 | 48 | self.best_eval = None 49 | self.pos = 0 50 | self.sample_idxes = list(range(len(g_list))) 51 | random.shuffle(self.sample_idxes) 52 | self.take_snapshot() 53 | 54 | def take_snapshot(self): 55 | self.old_net.load_state_dict(self.net.state_dict()) 56 | 57 | def make_actions(self, time_t, greedy=False): 58 | self.eps = self.eps_end + max(0., (self.eps_start - self.eps_end) 59 | * (self.eps_step - max(0., self.step)) / self.eps_step) 60 | 61 | if random.random() < self.eps and not greedy: 62 | actions = self.env.uniformRandActions() 63 | else: 64 | cur_state = self.env.getStateRef() 65 | actions, _, _ = self.net(time_t, cur_state, None, greedy_acts=True) 66 | actions = list(actions.cpu().numpy()) 67 | 68 | return actions 69 | 70 | def run_simulation(self): 71 | if (self.pos + 1) * cmd_args.batch_size > len(self.sample_idxes): 72 | self.pos = 0 73 | random.shuffle(self.sample_idxes) 74 | 75 | selected_idx = self.sample_idxes[self.pos * cmd_args.batch_size : (self.pos + 1) * cmd_args.batch_size] 76 | self.pos += 1 77 | self.env.setup([self.g_list[idx] for idx in selected_idx]) 78 | 79 | t = 0 80 | while not env.isTerminal(): 81 | list_at = self.make_actions(t) 82 | list_st = self.env.cloneState() 83 | self.env.step(list_at) 84 | 85 | assert (env.rewards is not None) == env.isTerminal() 86 | if env.isTerminal(): 87 | rewards = env.rewards 88 | s_prime = None 89 | else: 90 | rewards = np.zeros(len(list_at), dtype=np.float32) 91 | s_prime = self.env.cloneState() 92 | 93 | self.mem_pool.add_list(list_st, list_at, rewards, s_prime, [env.isTerminal()] * len(list_at), t) 94 | t += 1 95 | 96 | def eval(self): 97 | self.env.setup(deepcopy(self.test_g_list)) 98 | t = 0 99 | while not self.env.isTerminal(): 100 | list_at = self.make_actions(t, greedy=True) 101 | self.env.step(list_at) 102 | t += 1 103 | test_loss = loop_dataset(env.g_list, env.classifier, list(range(len(env.g_list)))) 104 | print('\033[93m average test: loss %.5f acc %.5f\033[0m' % (test_loss[0], test_loss[1])) 105 | with open('%s/edge_added.txt' % cmd_args.save_dir, 'w') as f: 106 | for i in range(len(self.test_g_list)): 107 | f.write('%d %d ' % (self.test_g_list[i].label, env.pred[i] + 1)) 108 | f.write('%d %d\n' % env.added_edges[i]) 109 | reward = np.mean(self.env.rewards) 110 | print(reward) 111 | return reward, test_loss[1] 112 | 113 | if __name__ == '__main__': 114 | random.seed(cmd_args.seed) 115 | np.random.seed(cmd_args.seed) 116 | torch.manual_seed(cmd_args.seed) 117 | 118 | label_map, _, g_list = load_graphs() 119 | # random.shuffle(g_list) 120 | base_classifier = load_base_model(label_map, g_list) 121 | env = GraphEdgeEnv(base_classifier, n_edges = 1) 122 | 123 | if cmd_args.frac_meta > 0: 124 | num_train = int( len(g_list) * (1 - cmd_args.frac_meta) ) 125 | agent = Agent(g_list[:num_train], g_list[num_train:], env) 126 | else: 127 | agent = Agent(g_list, None, env) 128 | 129 | assert cmd_args.phase == 'test' 130 | agent.net.load_state_dict(torch.load(cmd_args.save_dir + '/epoch-best.model')) 131 | agent.eval() 132 | # env.setup([g_list[idx] for idx in selected_idx]) 133 | # t = 0 134 | # while not env.isTerminal(): 135 | # policy_net = net_list[t] 
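# NOTE: the commented-out block around this point (duplicated at the bottom of dqn.py) appears to be an earlier evaluation rollout that sampled actions from a softmax policy network; Agent.eval() above performs the same environment rollout with greedy actions from the trained Q-network, which is what this script actually reports.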
136 | # t += 1 137 | # batch_graph, picked_nodes = env.getState() 138 | # log_probs, prefix_sum = policy_net(batch_graph, picked_nodes) 139 | # actions = env.sampleActions(torch.exp(log_probs).data.cpu().numpy(), prefix_sum.data.cpu().numpy(), greedy=True) 140 | # env.step(actions) 141 | 142 | # test_loss = loop_dataset(env.g_list, base_classifier, list(range(len(env.g_list)))) 143 | # print('\033[93maverage test: loss %.5f acc %.5f\033[0m' % (test_loss[0], test_loss[1])) 144 | 145 | # print(np.mean(avg_rewards), np.mean(env.rewards)) -------------------------------------------------------------------------------- /code/graph_attack/plot_dqn.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | 5 | min_n=40 6 | max_n=50 7 | p=0.05 8 | min_c=1 9 | max_c=3 10 | base_lv=4 11 | data_folder=$dropbox/data/components 12 | save_fold=nodes-${min_n}-${max_n}-p-${p}-c-${min_c}-${max_c}-lv-${base_lv} 13 | base_model_dump=$dropbox/scratch/results/graph_classification/components/$save_fold/epoch-best 14 | 15 | lr=0.001 16 | max_lv=5 17 | frac_meta=0 18 | 19 | output_base=$dropbox/scratch/results/graph_classification/components/$save_fold 20 | 21 | output_root=$output_base/lv-${max_lv}-frac-${frac_meta} 22 | 23 | python plot_dqn.py \ 24 | -data_folder $data_folder \ 25 | -save_dir $output_root \ 26 | -max_n $max_n \ 27 | -min_n $min_n \ 28 | -max_lv $max_lv \ 29 | -frac_meta $frac_meta \ 30 | -min_c $min_c \ 31 | -max_c $max_c \ 32 | -n_graphs 5000 \ 33 | -er_p $p \ 34 | -learning_rate $lr \ 35 | -base_model_dump $base_model_dump \ 36 | -logfile $output_root/log.txt \ 37 | $@ 38 | -------------------------------------------------------------------------------- /code/graph_attack/q_net.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | 16 | sys.path.append('%s/../../pytorch_structure2vec/s2v_lib' % os.path.dirname(os.path.realpath(__file__))) 17 | from pytorch_util import weights_init 18 | 19 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 20 | from graph_embedding import EmbedMeanField, EmbedLoopyBP 21 | from cmd_args import cmd_args 22 | from modules.custom_mod import JaggedMaxModule 23 | 24 | from rl_common import local_args 25 | 26 | def greedy_actions(q_values, v_p, banned_list): 27 | actions = [] 28 | offset = 0 29 | banned_acts = [] 30 | prefix_sum = v_p.data.cpu().numpy() 31 | for i in range(len(prefix_sum)): 32 | n_nodes = prefix_sum[i] - offset 33 | 34 | if banned_list is not None and banned_list[i] is not None: 35 | for j in banned_list[i]: 36 | banned_acts.append(offset + j) 37 | offset = prefix_sum[i] 38 | 39 | q_values = q_values.data.clone() 40 | if len(banned_acts): 41 | q_values[banned_acts, :] = np.finfo(np.float64).min 42 | jmax = JaggedMaxModule() 43 | values, actions = jmax(Variable(q_values), v_p) 44 | 45 | return actions.data, values.data 46 | 47 | class QNet(nn.Module): 48 | def __init__(self, s2v_module = None): 49 | super(QNet, self).__init__() 50 | if cmd_args.gm == 'mean_field': 51 | model = EmbedMeanField 52 | elif cmd_args.gm == 'loopy_bp': 53 | model = 
EmbedLoopyBP 54 | else: 55 | print('unknown gm %s' % cmd_args.gm) 56 | sys.exit() 57 | 58 | if cmd_args.out_dim == 0: 59 | embed_dim = cmd_args.latent_dim 60 | else: 61 | embed_dim = cmd_args.out_dim 62 | if local_args.mlp_hidden: 63 | self.linear_1 = nn.Linear(embed_dim * 2, local_args.mlp_hidden) 64 | self.linear_out = nn.Linear(local_args.mlp_hidden, 1) 65 | else: 66 | self.linear_out = nn.Linear(embed_dim * 2, 1) 67 | weights_init(self) 68 | 69 | if s2v_module is None: 70 | self.s2v = model(latent_dim=cmd_args.latent_dim, 71 | output_dim=cmd_args.out_dim, 72 | num_node_feats=2, 73 | num_edge_feats=0, 74 | max_lv=cmd_args.max_lv) 75 | else: 76 | self.s2v = s2v_module 77 | 78 | def PrepareFeatures(self, batch_graph, picked_nodes): 79 | n_nodes = 0 80 | prefix_sum = [] 81 | picked_ones = [] 82 | for i in range(len(batch_graph)): 83 | if picked_nodes is not None and picked_nodes[i] is not None: 84 | assert picked_nodes[i] >= 0 and picked_nodes[i] < batch_graph[i].num_nodes 85 | picked_ones.append(n_nodes + picked_nodes[i]) 86 | n_nodes += batch_graph[i].num_nodes 87 | prefix_sum.append(n_nodes) 88 | 89 | node_feat = torch.zeros(n_nodes, 2) 90 | node_feat[:, 0] = 1.0 91 | 92 | if len(picked_ones): 93 | node_feat.numpy()[picked_ones, 1] = 1.0 94 | node_feat.numpy()[picked_ones, 0] = 0.0 95 | 96 | return node_feat, torch.LongTensor(prefix_sum) 97 | 98 | def add_offset(self, actions, v_p): 99 | prefix_sum = v_p.data.cpu().numpy() 100 | 101 | shifted = [] 102 | for i in range(len(prefix_sum)): 103 | if i > 0: 104 | offset = prefix_sum[i - 1] 105 | else: 106 | offset = 0 107 | shifted.append(actions[i] + offset) 108 | 109 | return shifted 110 | 111 | def rep_global_embed(self, graph_embed, v_p): 112 | prefix_sum = v_p.data.cpu().numpy() 113 | 114 | rep_idx = [] 115 | for i in range(len(prefix_sum)): 116 | if i == 0: 117 | n_nodes = prefix_sum[i] 118 | else: 119 | n_nodes = prefix_sum[i] - prefix_sum[i - 1] 120 | rep_idx += [i] * n_nodes 121 | 122 | rep_idx = Variable(torch.LongTensor(rep_idx)) 123 | if cmd_args.ctx == 'gpu': 124 | rep_idx = rep_idx.cuda() 125 | graph_embed = torch.index_select(graph_embed, 0, rep_idx) 126 | return graph_embed 127 | 128 | def forward(self, time_t, states, actions, greedy_acts = False): 129 | batch_graph, picked_nodes, banned_list = zip(*states) 130 | 131 | node_feat, prefix_sum = self.PrepareFeatures(batch_graph, picked_nodes) 132 | 133 | if cmd_args.ctx == 'gpu': 134 | node_feat = node_feat.cuda() 135 | prefix_sum = prefix_sum.cuda() 136 | prefix_sum = Variable(prefix_sum) 137 | 138 | embed, graph_embed = self.s2v(batch_graph, node_feat, None, pool_global=True) 139 | 140 | if actions is None: 141 | graph_embed = self.rep_global_embed(graph_embed, prefix_sum) 142 | else: 143 | shifted = self.add_offset(actions, prefix_sum) 144 | embed = embed[shifted, :] 145 | 146 | embed_s_a = torch.cat((embed, graph_embed), dim=1) 147 | 148 | if local_args.mlp_hidden: 149 | embed_s_a = F.relu( self.linear_1(embed_s_a) ) 150 | 151 | raw_pred = self.linear_out(embed_s_a) 152 | 153 | if greedy_acts: 154 | actions, _ = greedy_actions(raw_pred, prefix_sum, banned_list) 155 | 156 | return actions, raw_pred, prefix_sum 157 | 158 | class NStepQNet(nn.Module): 159 | def __init__(self, num_steps, s2v_module = None): 160 | super(NStepQNet, self).__init__() 161 | 162 | list_mod = [QNet(s2v_module)] 163 | 164 | for i in range(1, num_steps): 165 | list_mod.append(QNet(list_mod[0].s2v)) 166 | 167 | self.list_mod = nn.ModuleList(list_mod) 168 | 169 | self.num_steps = num_steps 170 | 171 | def 
forward(self, time_t, states, actions, greedy_acts = False): 172 | assert time_t >= 0 and time_t < self.num_steps 173 | 174 | return self.list_mod[time_t](time_t, states, actions, greedy_acts) 175 | -------------------------------------------------------------------------------- /code/graph_attack/rl_common.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import sys 4 | import numpy as np 5 | import torch 6 | import networkx as nx 7 | import random 8 | from torch.autograd import Variable 9 | from torch.nn.parameter import Parameter 10 | import torch.nn as nn 11 | import torch.nn.functional as F 12 | import torch.optim as optim 13 | from tqdm import tqdm 14 | from copy import deepcopy 15 | import cPickle as cp 16 | cmd_opt = argparse.ArgumentParser(description='Argparser locally') 17 | cmd_opt.add_argument('-mlp_hidden', type=int, default=64, help='mlp hidden layer size') 18 | cmd_opt.add_argument('-att_embed_dim', type=int, default=64, help='att_embed_dim') 19 | cmd_opt.add_argument('-num_steps', type=int, default=100000, help='# fits') 20 | local_args, _ = cmd_opt.parse_known_args() 21 | 22 | print(local_args) 23 | 24 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 25 | from graph_embedding import S2VGraph 26 | from cmd_args import cmd_args 27 | from dnn import GraphClassifier 28 | 29 | sys.path.append('%s/../data_generator' % os.path.dirname(os.path.realpath(__file__))) 30 | from data_util import load_pkl 31 | 32 | sys.path.append('%s/../graph_classification' % os.path.dirname(os.path.realpath(__file__))) 33 | from graph_common import loop_dataset 34 | 35 | class GraphEdgeEnv(object): 36 | def __init__(self, classifier, n_edges): 37 | self.n_edges = n_edges 38 | self.classifier = classifier 39 | 40 | def setup(self, g_list): 41 | self.g_list = g_list 42 | self.n_steps = 0 43 | self.first_nodes = None 44 | self.rewards = None 45 | self.banned_list = None 46 | self.prefix_sum = [] 47 | n_nodes = 0 48 | for i in range(len(g_list)): 49 | n_nodes += g_list[i].num_nodes 50 | self.prefix_sum.append(n_nodes) 51 | self.added_edges = [] 52 | 53 | def bannedActions(self, g, node_x): 54 | comps = [c for c in nx.connected_component_subgraphs(g)] 55 | set_id = {} 56 | for i in range(len(comps)): 57 | for j in comps[i].nodes(): 58 | set_id[j] = i 59 | 60 | banned_actions = set() 61 | for i in range(len(g)): 62 | if set_id[i] != set_id[node_x] or i == node_x: 63 | banned_actions.add(i) 64 | return banned_actions 65 | 66 | def step(self, actions): 67 | if self.first_nodes is None: # pick the first node of edge 68 | assert self.n_steps % 2 == 0 69 | self.first_nodes = actions 70 | self.banned_list = [] 71 | for i in range(len(self.g_list)): 72 | self.banned_list.append(self.bannedActions(self.g_list[i].to_networkx(), self.first_nodes[i])) 73 | else: # edge picked 74 | self.added_edges = [] 75 | for i in range(len(self.g_list)): 76 | g = self.g_list[i].to_networkx() 77 | g.add_edge(self.first_nodes[i], actions[i]) 78 | self.added_edges.append((self.first_nodes[i], actions[i])) 79 | self.g_list[i] = S2VGraph(g, label = self.g_list[i].label) 80 | self.first_nodes = None 81 | self.banned_list = None 82 | self.n_steps += 1 83 | 84 | if self.isTerminal(): 85 | logits, _, acc = self.classifier(self.g_list) 86 | pred = logits.data.max(1, keepdim=True)[1] 87 | self.pred = pred.view(-1).cpu().numpy() 88 | self.rewards = (acc.view(-1).numpy() * -2.0 + 1.0).astype(np.float32) 89 | 90 | def 
uniformRandActions(self): 91 | act_list = [] 92 | offset = 0 93 | for i in range(len(self.prefix_sum)): 94 | n_nodes = self.prefix_sum[i] - offset 95 | 96 | if self.first_nodes is None: 97 | act_list.append(np.random.randint(n_nodes)) 98 | else: 99 | banned_actions = self.banned_list[i] 100 | cands = list(set(range(n_nodes)) - banned_actions) 101 | act_list.append(random.choice(cands)) 102 | offset = self.prefix_sum[i] 103 | return act_list 104 | 105 | def sampleActions(self, probs, greedy=False): 106 | offset = 0 107 | act_list = [] 108 | for i in range(len(self.prefix_sum)): 109 | p_vec = probs[offset : self.prefix_sum[i], 0].astype(np.float64) 110 | 111 | if self.first_nodes is not None: 112 | banned_actions = self.banned_list[i] 113 | for j in banned_actions: 114 | p_vec[j] = 0.0 115 | assert len(banned_actions) < len(p_vec) 116 | 117 | p_vec = p_vec / sum(p_vec) 118 | if greedy: 119 | action = np.argmax(p_vec) 120 | else: 121 | action = np.argmax(np.random.multinomial(1, p_vec)) 122 | act_list.append(action) 123 | offset = self.prefix_sum[i] 124 | 125 | return act_list 126 | 127 | def isTerminal(self): 128 | if self.n_steps == 2 * self.n_edges: 129 | return True 130 | return False 131 | 132 | def getStateRef(self): 133 | cp_first = [None] * len(self.g_list) 134 | if self.first_nodes is not None: 135 | cp_first = self.first_nodes 136 | b_list = [None] * len(self.g_list) 137 | if self.banned_list is not None: 138 | b_list = self.banned_list 139 | return zip(self.g_list, cp_first, b_list) 140 | 141 | def cloneState(self): 142 | cp_first = [None] * len(self.g_list) 143 | if self.first_nodes is not None: 144 | cp_first = self.first_nodes[:] 145 | b_list = [None] * len(self.g_list) 146 | if self.banned_list is not None: 147 | b_list = self.banned_list[:] 148 | 149 | return zip(deepcopy(self.g_list), cp_first, b_list) 150 | 151 | def load_graphs(): 152 | frac_train = 0.9 153 | pattern = 'nrange-%d-%d-n_graph-%d-p-%.2f' % (cmd_args.min_n, cmd_args.max_n, cmd_args.n_graphs, cmd_args.er_p) 154 | 155 | num_train = int(frac_train * cmd_args.n_graphs) 156 | 157 | train_glist = [] 158 | test_glist = [] 159 | label_map = {} 160 | for i in range(cmd_args.min_c, cmd_args.max_c + 1): 161 | cur_list = load_pkl('%s/ncomp-%d-%s.pkl' % (cmd_args.data_folder, i, pattern), cmd_args.n_graphs) 162 | assert len(cur_list) == cmd_args.n_graphs 163 | train_glist += [S2VGraph(cur_list[j], i) for j in range(num_train)] 164 | test_glist += [S2VGraph(cur_list[j], i) for j in range(num_train, len(cur_list))] 165 | label_map[i] = i - cmd_args.min_c 166 | 167 | print('# train:', len(train_glist), ' # test:', len(test_glist)) 168 | 169 | return label_map, train_glist, test_glist 170 | 171 | def test_graphs(classifier, test_glist): 172 | test_loss = loop_dataset(test_glist, classifier, list(range(len(test_glist)))) 173 | print('\033[93maverage test: loss %.5f acc %.5f\033[0m' % (test_loss[0], test_loss[1])) 174 | 175 | def load_base_model(label_map, test_glist = None): 176 | assert cmd_args.base_model_dump is not None 177 | with open('%s-args.pkl' % cmd_args.base_model_dump, 'rb') as f: 178 | base_args = cp.load(f) 179 | 180 | classifier = GraphClassifier(label_map, **vars(base_args)) 181 | if cmd_args.ctx == 'gpu': 182 | classifier = classifier.cuda() 183 | 184 | classifier.load_state_dict(torch.load(cmd_args.base_model_dump + '.model')) 185 | if test_glist is not None: 186 | test_graphs(classifier, test_glist) 187 | 188 | return classifier 189 | 190 | def attackable(classifier, s2v_g, x = None, y = None): 191 | g = 
s2v_g.to_networkx() 192 | comps = [c for c in nx.connected_component_subgraphs(g)] 193 | set_id = {} 194 | 195 | for i in range(len(comps)): 196 | for j in comps[i].nodes(): 197 | set_id[j] = i 198 | 199 | if x is not None: 200 | r_i = [x] 201 | else: 202 | r_i = range(len(g) - 1) 203 | 204 | g_list = [] 205 | for i in r_i: 206 | if y is not None: 207 | assert x is not None 208 | r_j = [y] 209 | else: 210 | if x is not None: 211 | r_j = range(len(g) - 1) 212 | else: 213 | r_j = range(i + 1, len(g)) 214 | for j in r_j: 215 | if set_id[i] != set_id[j]: 216 | continue 217 | g2 = g.copy() 218 | g2.add_edge(i, j) 219 | assert nx.number_connected_components(g2) == s2v_g.label 220 | g_list.append(S2VGraph(g2, s2v_g.label)) 221 | if len(g_list) == 0: 222 | print(x, y) 223 | print(g.edges(), s2v_g.label) 224 | print(set_id) 225 | assert len(g_list) 226 | _, _, acc = classifier(g_list) 227 | 228 | return np.sum(acc.view(-1).numpy()) < len(g_list) 229 | 230 | def get_supervision(classifier, list_st, list_at): 231 | list_target = torch.zeros(len(list_st)) 232 | for i in range(len(list_st)): 233 | g, x, _ = list_st[i] 234 | if x is not None: 235 | att = attackable(classifier, g, x, list_at[i]) 236 | else: 237 | att = attackable(classifier, g, list_at[i]) 238 | 239 | if att: 240 | list_target[i] = 1.0 241 | else: 242 | list_target[i] = -1.0 243 | 244 | return list_target 245 | -------------------------------------------------------------------------------- /code/graph_attack/run_dqn.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | 5 | min_n=90 6 | max_n=100 7 | p=0.02 8 | min_c=1 9 | max_c=3 10 | base_lv=2 11 | data_folder=$dropbox/data/components 12 | save_fold=nodes-${min_n}-${max_n}-p-${p}-c-${min_c}-${max_c}-lv-${base_lv} 13 | base_model_dump=$dropbox/scratch/results/graph_classification/components/$save_fold/epoch-best 14 | 15 | lr=0.001 16 | max_lv=5 17 | frac_meta=0.1 18 | 19 | output_base=$HOME/scratch/results/graph_classification/components/$save_fold 20 | 21 | output_root=$output_base/lv-${max_lv}-frac-${frac_meta} 22 | 23 | if [ ! -e $output_root ]; 24 | then 25 | mkdir -p $output_root 26 | fi 27 | 28 | python dqn.py \ 29 | -data_folder $data_folder \ 30 | -save_dir $output_root \ 31 | -max_n $max_n \ 32 | -min_n $min_n \ 33 | -max_lv $max_lv \ 34 | -frac_meta $frac_meta \ 35 | -min_c $min_c \ 36 | -max_c $max_c \ 37 | -n_graphs 5000 \ 38 | -er_p $p \ 39 | -learning_rate $lr \ 40 | -base_model_dump $base_model_dump \ 41 | -logfile $output_root/log.txt \ 42 | $@ 43 | -------------------------------------------------------------------------------- /code/graph_attack/run_ga.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | 5 | min_n=90 6 | max_n=100 7 | p=0.02 8 | min_c=1 9 | max_c=3 10 | base_lv=2 11 | data_folder=$dropbox/data/components 12 | save_fold=nodes-${min_n}-${max_n}-p-${p}-c-${min_c}-${max_c}-lv-${base_lv} 13 | base_model_dump=$dropbox/scratch/results/graph_classification/components/$save_fold/epoch-best 14 | 15 | idx_start=0 16 | num=2000 17 | pop=50 18 | cross=0.1 19 | mutate=0.2 20 | rounds=10 21 | 22 | output_base=$HOME/scratch/results/graph_classification/components/$save_fold 23 | output_root=$output_base/ga-p-${pop}-c-${cross}-m-${mutate}-r-${rounds} 24 | 25 | if [ ! 
-e $output_root ]; 26 | then 27 | mkdir -p $output_root 28 | fi 29 | 30 | python genetic_algorithm.py \ 31 | -data_folder $data_folder \ 32 | -save_dir $output_root \ 33 | -idx_start $idx_start \ 34 | -population_size $pop \ 35 | -cross_rate $cross \ 36 | -mutate_rate $mutate \ 37 | -rounds $rounds \ 38 | -num_instances $num \ 39 | -max_n $max_n \ 40 | -min_n $min_n \ 41 | -min_c $min_c \ 42 | -max_c $max_c \ 43 | -n_graphs 5000 \ 44 | -er_p $p \ 45 | -base_model_dump $base_model_dump \ 46 | $@ 47 | -------------------------------------------------------------------------------- /code/graph_attack/run_grad.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | min_n=90 4 | max_n=100 5 | p=0.02 6 | dropbox=../../dropbox/ 7 | data_folder=$dropbox/data/components 8 | min_c=1 9 | max_c=3 10 | max_lv=2 11 | rand=random 12 | 13 | save_fold=nodes-${min_n}-${max_n}-p-${p}-c-${min_c}-${max_c}-lv-${max_lv} 14 | base_model_dump=$dropbox/scratch/results/graph_classification/components/$save_fold/epoch-best 15 | 16 | output_root=./saved 17 | 18 | if [ ! -e $output_root ]; 19 | then 20 | mkdir -p $output_root 21 | fi 22 | 23 | python grad_attack.py \ 24 | -data_folder $data_folder \ 25 | -save_dir $output_root \ 26 | -max_n $max_n \ 27 | -min_n $min_n \ 28 | -rand_att_type $rand \ 29 | -min_c $min_c \ 30 | -max_c $max_c \ 31 | -base_model_dump $base_model_dump \ 32 | -n_graphs 5000 \ 33 | -er_p $p \ 34 | $@ 35 | -------------------------------------------------------------------------------- /code/graph_attack/run_trivial.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | min_n=15 4 | max_n=20 5 | p=0.15 6 | dropbox=../../dropbox/ 7 | data_folder=$dropbox/data/components 8 | min_c=1 9 | max_c=3 10 | max_lv=4 11 | rand=exhaust 12 | 13 | save_fold=nodes-${min_n}-${max_n}-p-${p}-c-${min_c}-${max_c}-lv-${max_lv} 14 | base_model_dump=$dropbox/scratch/results/graph_classification/components/$save_fold/epoch-best 15 | 16 | output_root=./saved 17 | 18 | if [ ! 
-e $output_root ]; 19 | then 20 | mkdir -p $output_root 21 | fi 22 | 23 | python er_trivial_attack.py \ 24 | -data_folder $data_folder \ 25 | -save_dir $output_root \ 26 | -max_n $max_n \ 27 | -min_n $min_n \ 28 | -max_lv $max_lv \ 29 | -rand_att_type $rand \ 30 | -min_c $min_c \ 31 | -max_c $max_c \ 32 | -base_model_dump $base_model_dump \ 33 | -n_graphs 5000 \ 34 | -er_p $p \ 35 | $@ 36 | -------------------------------------------------------------------------------- /code/graph_classification/er_components.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import random 8 | from torch.autograd import Variable 9 | from torch.nn.parameter import Parameter 10 | import torch.nn as nn 11 | import torch.nn.functional as F 12 | import torch.optim as optim 13 | from tqdm import tqdm 14 | import cPickle as cp 15 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 16 | from cmd_args import cmd_args, save_args 17 | from dnn import GraphClassifier 18 | from graph_embedding import S2VGraph 19 | 20 | sys.path.append('%s/../data_generator' % os.path.dirname(os.path.realpath(__file__))) 21 | from data_util import load_pkl 22 | 23 | from graph_common import loop_dataset, load_er_data 24 | 25 | if __name__ == '__main__': 26 | random.seed(cmd_args.seed) 27 | np.random.seed(cmd_args.seed) 28 | torch.manual_seed(cmd_args.seed) 29 | 30 | label_map, train_glist, test_glist = load_er_data() 31 | 32 | if cmd_args.saved_model is not None and cmd_args.saved_model != '': 33 | print('loading model from %s' % cmd_args.saved_model) 34 | with open('%s-args.pkl' % cmd_args.saved_model, 'rb') as f: 35 | base_args = cp.load(f) 36 | classifier = GraphClassifier(label_map, **vars(base_args)) 37 | classifier.load_state_dict(torch.load(cmd_args.saved_model + '.model')) 38 | else: 39 | classifier = GraphClassifier(label_map, **vars(cmd_args)) 40 | 41 | if cmd_args.ctx == 'gpu': 42 | classifier = classifier.cuda() 43 | if cmd_args.phase == 'test': 44 | test_loss = loop_dataset(test_glist, classifier, list(range(len(test_glist)))) 45 | print('\033[93maverage test: loss %.5f acc %.5f\033[0m' % (test_loss[0], test_loss[1])) 46 | 47 | if cmd_args.phase == 'train': 48 | optimizer = optim.Adam(classifier.parameters(), lr=cmd_args.learning_rate) 49 | 50 | train_idxes = list(range(len(train_glist))) 51 | best_loss = None 52 | for epoch in range(cmd_args.num_epochs): 53 | random.shuffle(train_idxes) 54 | avg_loss = loop_dataset(train_glist, classifier, train_idxes, optimizer=optimizer) 55 | print('\033[92maverage training of epoch %d: loss %.5f acc %.5f\033[0m' % (epoch, avg_loss[0], avg_loss[1])) 56 | 57 | test_loss = loop_dataset(test_glist, classifier, list(range(len(test_glist)))) 58 | print('\033[93maverage test of epoch %d: loss %.5f acc %.5f\033[0m' % (epoch, test_loss[0], test_loss[1])) 59 | 60 | if best_loss is None or test_loss[0] < best_loss: 61 | best_loss = test_loss[0] 62 | print('----saving to best model since this is the best valid loss so far.----') 63 | torch.save(classifier.state_dict(), cmd_args.save_dir + '/epoch-best.model') 64 | save_args(cmd_args.save_dir + '/epoch-best-args.pkl', cmd_args) -------------------------------------------------------------------------------- /code/graph_classification/graph_common.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 
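# NOTE on loop_dataset() below: total_iters = (len(sample_idxes) + (bsize - 1) * (optimizer is None)) // bsize
# relies on the boolean multiplying as 0/1. With an optimizer (training) it floor-divides, silently dropping a
# ragged final batch; at test time (optimizer is None) it ceiling-divides so every graph is scored exactly once,
# which is what the closing assert n_samples == len(sample_idxes) checks. Illustrative numbers: 105 samples with
# bsize 50 give 2 training iterations (100 graphs seen) but 3 test iterations (all 105 scored).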
2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | 16 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 17 | from cmd_args import cmd_args 18 | from graph_embedding import S2VGraph 19 | 20 | sys.path.append('%s/../data_generator' % os.path.dirname(os.path.realpath(__file__))) 21 | from data_util import load_pkl 22 | 23 | def loop_dataset(g_list, classifier, sample_idxes, optimizer=None, bsize=cmd_args.batch_size): 24 | total_loss = [] 25 | total_iters = (len(sample_idxes) + (bsize - 1) * (optimizer is None)) // bsize 26 | pbar = tqdm(range(total_iters), unit='batch') 27 | 28 | n_samples = 0 29 | for pos in pbar: 30 | selected_idx = sample_idxes[pos * bsize : (pos + 1) * bsize] 31 | 32 | batch_graph = [g_list[idx] for idx in selected_idx] 33 | _, loss, acc = classifier(batch_graph) 34 | 35 | acc = acc.sum() / float(acc.size()[0]) 36 | if optimizer is not None: 37 | optimizer.zero_grad() 38 | loss.backward() 39 | optimizer.step() 40 | 41 | loss = loss.data.cpu().numpy()[0] 42 | pbar.set_description('loss: %0.5f acc: %0.5f' % (loss, acc) ) 43 | 44 | total_loss.append( np.array([loss, acc]) * len(selected_idx)) 45 | 46 | n_samples += len(selected_idx) 47 | if optimizer is None: 48 | assert n_samples == len(sample_idxes) 49 | total_loss = np.array(total_loss) 50 | avg_loss = np.sum(total_loss, 0) / n_samples 51 | return avg_loss 52 | 53 | def load_er_data(): 54 | frac_train = 0.9 55 | pattern = 'nrange-%d-%d-n_graph-%d-p-%.2f' % (cmd_args.min_n, cmd_args.max_n, cmd_args.n_graphs, cmd_args.er_p) 56 | 57 | num_train = int(frac_train * cmd_args.n_graphs) 58 | 59 | train_glist = [] 60 | test_glist = [] 61 | label_map = {} 62 | for i in range(cmd_args.min_c, cmd_args.max_c + 1): 63 | cur_list = load_pkl('%s/ncomp-%d-%s.pkl' % (cmd_args.data_folder, i, pattern), cmd_args.n_graphs) 64 | assert len(cur_list) == cmd_args.n_graphs 65 | train_glist += [S2VGraph(cur_list[j], i) for j in range(num_train)] 66 | test_glist += [S2VGraph(cur_list[j], i) for j in range(num_train, len(cur_list))] 67 | label_map[i] = i - cmd_args.min_c 68 | cmd_args.num_class = len(label_map) 69 | cmd_args.feat_dim = 1 70 | print('# train:', len(train_glist), ' # test:', len(test_glist)) 71 | 72 | return label_map, train_glist, test_glist 73 | 74 | -------------------------------------------------------------------------------- /code/graph_classification/run_er_components.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | min_n=15 4 | max_n=20 5 | p=0.15 6 | dropbox=../../dropbox/ 7 | data_folder=$dropbox/data/components 8 | min_c=1 9 | max_c=3 10 | max_lv=5 11 | 12 | save_fold=nodes-${min_n}-${max_n}-p-${p}-c-${min_c}-${max_c}-lv-${max_lv} 13 | output_root=$HOME/scratch/results/graph_classification/components/$save_fold 14 | 15 | if [ ! 
-e $output_root ]; 16 | then 17 | mkdir -p $output_root 18 | fi 19 | 20 | python er_components.py \ 21 | -data_folder $data_folder \ 22 | -save_dir $output_root \ 23 | -max_n $max_n \ 24 | -min_n $min_n \ 25 | -max_lv $max_lv \ 26 | -min_c $min_c \ 27 | -max_c $max_c \ 28 | -n_graphs 5000 \ 29 | -er_p $p \ 30 | $@ 31 | -------------------------------------------------------------------------------- /code/graph_classification/test_er_comp.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | min_n=40 4 | max_n=50 5 | p=0.05 6 | dropbox=../../dropbox/ 7 | data_folder=$dropbox/data/components 8 | min_c=1 9 | max_c=3 10 | max_lv=4 11 | 12 | save_fold=nodes-${min_n}-${max_n}-p-${p}-c-${min_c}-${max_c}-lv-${max_lv} 13 | output_root=$HOME/scratch/results/graph_classification/components/$save_fold 14 | saved_model=$output_root/epoch-best 15 | 16 | if [ ! -e $output_root ]; 17 | then 18 | mkdir -p $output_root 19 | fi 20 | 21 | python er_components.py \ 22 | -data_folder $data_folder \ 23 | -save_dir $output_root \ 24 | -max_n $max_n \ 25 | -min_n $min_n \ 26 | -max_lv $max_lv \ 27 | -min_c $min_c \ 28 | -max_c $max_c \ 29 | -saved_model $saved_model \ 30 | -n_graphs 5000 \ 31 | -er_p $p \ 32 | $@ 33 | -------------------------------------------------------------------------------- /code/node_attack/exhaust_attack.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | import cPickle as cp 16 | import time 17 | 18 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 19 | from cmd_args import cmd_args 20 | 21 | sys.path.append('%s/../node_classification' % os.path.dirname(os.path.realpath(__file__))) 22 | from node_utils import load_txt_data, load_binary_data, run_test, load_raw_graph 23 | 24 | from node_attack_common import load_base_model, ModifiedGraph 25 | 26 | def check_attack_rate(gcn, features, labels, idx_test, list_of_modification): 27 | all_acc = torch.ones(len(idx_test), 1) 28 | pbar = tqdm(list_of_modification) 29 | _, _, orig_acc = gcn(features, Variable(gcn.norm_tool.normed_adj), idx_test, labels) 30 | attackable = {} 31 | ftxt = open('%s/%s-exaust.txt' % (cmd_args.save_dir, cmd_args.dataset), 'w', 0) 32 | 33 | for g in pbar: 34 | adj = gcn.norm_tool.norm_extra(g.get_extra_adj()) 35 | _, _, acc = gcn(features, Variable(adj), idx_test, labels) 36 | 37 | for i in range(len(idx_test)): 38 | if float(acc[i]) < float(orig_acc[i]): 39 | if not idx_test[i] in attackable: 40 | attackable[idx_test[i]] = [] 41 | attackable[idx_test[i]].append(g.directed_edges) 42 | ftxt.write('%d:' %idx_test[i]) 43 | for e in g.directed_edges: 44 | ftxt.write(' %d %d' % (e[0], e[1])) 45 | ftxt.write('\n') 46 | all_acc *= acc.float() 47 | 48 | cur_acc = all_acc.sum() / float(len(idx_test)) 49 | 50 | pbar.set_description('cur_acc: %0.5f' % (cur_acc) ) 51 | with open('%s/%s-exaust.pkl' % (cmd_args.save_dir, cmd_args.dataset), 'wb') as f: 52 | cp.dump(attackable, f, cp.HIGHEST_PROTOCOL) 53 | 54 | def gen_modified(dict_of_lists, mod_type): 55 | for i in range(len(dict_of_lists)): 56 | if mod_type == 'any' or mod_type == 'del': 57 | for j in 
dict_of_lists[i]: 58 | yield ModifiedGraph([(i, j)], [-1.0]) 59 | if mod_type == 'del': 60 | continue 61 | for j in range(i + 1, len(dict_of_lists)): 62 | if not j in dict_of_lists[i]: 63 | g = ModifiedGraph([(i, j)], [1.0]) 64 | yield g 65 | 66 | def recur_gen_edges(center, khop_neighbors, dict_of_lists, cur_list, n_edges): 67 | for j in khop_neighbors[center]: 68 | if not j in dict_of_lists[center] and j != center: 69 | new_list = cur_list + [(center, j)] 70 | if len(new_list) == n_edges: 71 | g = ModifiedGraph(new_list, [1.0] * n_edges) 72 | yield g 73 | else: 74 | for g in recur_gen_edges(center, khop_neighbors, dict_of_lists, new_list, n_edges): 75 | yield g 76 | 77 | def gen_khop_edges(khop_neighbors, dict_of_lists, n_edges): 78 | for i in range(len(dict_of_lists)): 79 | for g in recur_gen_edges(i, khop_neighbors, dict_of_lists, [], n_edges): 80 | yield g 81 | 82 | if __name__ == '__main__': 83 | random.seed(cmd_args.seed) 84 | np.random.seed(cmd_args.seed) 85 | torch.manual_seed(cmd_args.seed) 86 | 87 | features, labels, idx_train, idx_val, idx_test = load_txt_data(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 88 | if cmd_args.meta_test: 89 | idx_test = idx_val 90 | 91 | features = Variable( features ) 92 | labels = Variable( torch.LongTensor( np.argmax(labels, axis=1) ) ) 93 | if cmd_args.ctx == 'gpu': 94 | labels = labels.cuda() 95 | 96 | base_model = load_base_model() 97 | run_test(base_model, features, Variable( base_model.norm_tool.normed_adj ), idx_test, labels) 98 | 99 | dict_of_lists = load_raw_graph(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 100 | 101 | # freely add edges 102 | if cmd_args.n_hops <= 0: 103 | check_attack_rate(base_model, features, labels, idx_test, gen_modified(dict_of_lists, 'del')) 104 | else: 105 | # add edges within k-hop 106 | pass 107 | # khop_neighbors = get_khop_neighbors(dict_of_lists, cmd_args.n_hops) 108 | # nei_size = [] 109 | # for i in khop_neighbors: 110 | # nei_size.append(len(khop_neighbors[i])) 111 | # print(np.mean(nei_size), np.max(nei_size), np.min(nei_size)) 112 | # check_attack_rate(base_model, features, labels, idx_test, gen_khop_edges(khop_neighbors, dict_of_lists, 1)) 113 | -------------------------------------------------------------------------------- /code/node_attack/node_attack_common.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy as np 4 | import torch 5 | import networkx as nx 6 | import random 7 | from torch.autograd import Variable 8 | from torch.nn.parameter import Parameter 9 | import torch.nn as nn 10 | import torch.nn.functional as F 11 | import torch.optim as optim 12 | from tqdm import tqdm 13 | from copy import deepcopy 14 | import cPickle as cp 15 | 16 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 17 | from graph_embedding import S2VGraph 18 | from cmd_args import cmd_args 19 | 20 | sys.path.append('%s/../node_classification' % os.path.dirname(os.path.realpath(__file__))) 21 | from node_utils import load_txt_data, run_test, load_raw_graph, StaticGraph 22 | from gcn_modules import GCNModule, S2VNodeClassifier 23 | 24 | class ModifiedGraph(object): 25 | def __init__(self, directed_edges = None, weights = None): 26 | if directed_edges is not None: 27 | self.directed_edges = deepcopy(directed_edges) 28 | self.weights = deepcopy(weights) 29 | else: 30 | self.directed_edges = [] 31 | self.weights = [] 32 | 33 | def add_edge(self, x, y, z): 34 | assert x is not None 
and y is not None 35 | if x == y: 36 | return 37 | for e in self.directed_edges: 38 | if e[0] == x and e[1] == y: 39 | return 40 | if e[1] == x and e[0] == y: 41 | return 42 | self.directed_edges.append((x, y)) 43 | assert z < 0 44 | self.weights.append(-1.0) 45 | 46 | def get_extra_adj(self): 47 | if len(self.directed_edges): 48 | edges = np.array(self.directed_edges, dtype=np.int64) 49 | rev_edges = np.array([edges[:, 1], edges[:, 0]], dtype=np.int64) 50 | edges = np.hstack((edges.T, rev_edges)) 51 | 52 | idxes = torch.LongTensor(edges) 53 | values = torch.Tensor(self.weights + self.weights) 54 | 55 | added_adj = torch.sparse.FloatTensor(idxes, values, StaticGraph.get_gsize()) 56 | if cmd_args.ctx == 'gpu': 57 | added_adj = added_adj.cuda() 58 | return added_adj 59 | else: 60 | return None 61 | 62 | class NodeAttakEnv(object): 63 | def __init__(self, features, labels, all_targets, list_action_space, classifier): 64 | self.classifier = classifier 65 | self.list_action_space = list_action_space 66 | self.features = features 67 | self.labels = labels 68 | self.all_targets = all_targets 69 | 70 | def setup(self, target_nodes): 71 | self.target_nodes = target_nodes 72 | self.n_steps = 0 73 | self.first_nodes = None 74 | self.rewards = None 75 | self.binary_rewards = None 76 | self.modified_list = [] 77 | for i in range(len(self.target_nodes)): 78 | self.modified_list.append(ModifiedGraph()) 79 | 80 | self.list_acc_of_all = [] 81 | 82 | def step(self, actions): 83 | if self.first_nodes is None: # pick the first node of edge 84 | assert self.n_steps % 2 == 0 85 | self.first_nodes = actions[:] 86 | else: 87 | for i in range(len(self.target_nodes)): 88 | #assert self.first_nodes[i] != actions[i] 89 | self.modified_list[i].add_edge(self.first_nodes[i], actions[i], -1.0) 90 | self.first_nodes = None 91 | self.banned_list = None 92 | self.n_steps += 1 93 | 94 | if self.isTerminal(): 95 | acc_list = [] 96 | loss_list = [] 97 | for i in tqdm(range(len(self.target_nodes))): 98 | extra_adj = self.modified_list[i].get_extra_adj() 99 | adj = self.classifier.norm_tool.norm_extra(extra_adj) 100 | _, loss, acc = self.classifier(self.features, Variable(adj), self.all_targets, self.labels, avg_loss=False) 101 | cur_idx = self.all_targets.index(self.target_nodes[i]) 102 | acc = np.copy(acc.double().cpu().view(-1).numpy()) 103 | loss = loss.data.cpu().view(-1).numpy() 104 | self.list_acc_of_all.append(acc) 105 | acc_list.append(acc[cur_idx]) 106 | loss_list.append(loss[cur_idx]) 107 | self.binary_rewards = (np.array(acc_list) * -2.0 + 1.0).astype(np.float32) 108 | if cmd_args.reward_type == 'binary': 109 | self.rewards = (np.array(acc_list) * -2.0 + 1.0).astype(np.float32) 110 | else: 111 | assert cmd_args.reward_type == 'nll' 112 | self.rewards = np.array(loss_list).astype(np.float32) 113 | 114 | def sample_pos_rewards(self, num_samples): 115 | assert self.list_acc_of_all is not None 116 | cands = [] 117 | for i in range(len(self.list_acc_of_all)): 118 | succ = np.where( self.list_acc_of_all[i] < 0.9 )[0] 119 | for j in range(len(succ)): 120 | cands.append((i, self.all_targets[succ[j]])) 121 | if num_samples > len(cands): 122 | return cands 123 | random.shuffle(cands) 124 | return cands[0:num_samples] 125 | 126 | def uniformRandActions(self): 127 | act_list = [] 128 | offset = 0 129 | for i in range(len(self.target_nodes)): 130 | cur_node = self.target_nodes[i] 131 | region = self.list_action_space[cur_node] 132 | 133 | if self.first_nodes is not None and self.first_nodes[i] is not None: 134 | region = 
self.list_action_space[self.first_nodes[i]] 135 | 136 | if region is None: 137 | cur_action = np.random.randint(len(self.list_action_space)) 138 | else: 139 | cur_action = region[np.random.randint(len(region))] 140 | 141 | act_list.append(cur_action) 142 | return act_list 143 | 144 | def isTerminal(self): 145 | if self.n_steps == 2 * cmd_args.num_mod: 146 | return True 147 | return False 148 | 149 | def getStateRef(self): 150 | cp_first = [None] * len(self.target_nodes) 151 | if self.first_nodes is not None: 152 | cp_first = self.first_nodes 153 | 154 | return zip(self.target_nodes, self.modified_list, cp_first) 155 | 156 | def cloneState(self): 157 | cp_first = [None] * len(self.target_nodes) 158 | if self.first_nodes is not None: 159 | cp_first = self.first_nodes[:] 160 | 161 | return zip(self.target_nodes[:], deepcopy(self.modified_list), cp_first) 162 | 163 | def load_base_model(): 164 | assert cmd_args.saved_model is not None 165 | with open('%s-args.pkl' % cmd_args.saved_model, 'rb') as f: 166 | base_args = cp.load(f) 167 | if 'mean_field' in cmd_args.saved_model: 168 | mod = S2VNodeClassifier 169 | elif 'gcn' in cmd_args.saved_model: 170 | mod = GCNModule 171 | 172 | gcn = mod(**vars(base_args)) 173 | if cmd_args.ctx == 'gpu': 174 | gcn = gcn.cuda() 175 | gcn.load_state_dict(torch.load(cmd_args.saved_model+ '.model')) 176 | gcn.eval() 177 | return gcn 178 | 179 | def init_setup(): 180 | features, labels, _, idx_val, idx_test = load_txt_data(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 181 | features = Variable( features ) 182 | labels = Variable( torch.LongTensor( np.argmax(labels, axis=1) ) ) 183 | if cmd_args.ctx == 'gpu': 184 | labels = labels.cuda() 185 | 186 | base_model = load_base_model() 187 | run_test(base_model, features, Variable( base_model.norm_tool.normed_adj ), idx_test, labels) 188 | 189 | dict_of_lists = load_raw_graph(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 190 | 191 | return features, labels, idx_val, idx_test, base_model, dict_of_lists 192 | -------------------------------------------------------------------------------- /code/node_attack/node_dqn.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | from q_net_node import QNetNode, NStepQNetNode, node_greedy_actions 18 | from node_attack_common import load_base_model, NodeAttakEnv, init_setup 19 | 20 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 21 | from cmd_args import cmd_args 22 | 23 | sys.path.append('%s/../node_classification' % os.path.dirname(os.path.realpath(__file__))) 24 | from node_utils import run_test, load_raw_graph 25 | 26 | sys.path.append('%s/../graph_attack' % os.path.dirname(os.path.realpath(__file__))) 27 | from nstep_replay_mem import NstepReplayMem 28 | 29 | class Agent(object): 30 | def __init__(self, env, features, labels, idx_meta, idx_test, list_action_space, num_wrong = 0): 31 | self.features = features 32 | self.labels = labels 33 | self.idx_meta = idx_meta 34 | self.idx_test = idx_test 35 | self.num_wrong = num_wrong 36 | self.list_action_space = list_action_space 37 | 
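        # One attack episode lasts 2 * cmd_args.num_mod decision steps: the
        # agent names an edge by picking its two endpoints on consecutive
        # steps (see NodeAttakEnv.step).  The replay memory below is therefore
        # time-indexed, with one cell per step, and each cell stores
        # (state, action, reward, next_state, terminal) tuples, as in
        # run_simulation; balance_sample=True presumably keeps the rare
        # positive (successful-attack) transitions from being drowned out,
        # which only makes sense under the binary reward.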
self.mem_pool = NstepReplayMem(memory_size=500000, n_steps=2 * cmd_args.num_mod, balance_sample=cmd_args.reward_type == 'binary') 38 | self.env = env 39 | 40 | # self.net = QNetNode(features, labels, list_action_space) 41 | # self.old_net = QNetNode(features, labels, list_action_space) 42 | self.net = NStepQNetNode(2 * cmd_args.num_mod, features, labels, list_action_space) 43 | self.old_net = NStepQNetNode(2 * cmd_args.num_mod, features, labels, list_action_space) 44 | 45 | if cmd_args.ctx == 'gpu': 46 | self.net = self.net.cuda() 47 | self.old_net = self.old_net.cuda() 48 | 49 | self.eps_start = 1.0 50 | self.eps_end = 0.05 51 | self.eps_step = 100000 52 | self.burn_in = 10 53 | self.step = 0 54 | self.pos = 0 55 | self.best_eval = None 56 | self.take_snapshot() 57 | 58 | def take_snapshot(self): 59 | self.old_net.load_state_dict(self.net.state_dict()) 60 | 61 | def make_actions(self, time_t, greedy=False): 62 | self.eps = self.eps_end + max(0., (self.eps_start - self.eps_end) 63 | * (self.eps_step - max(0., self.step)) / self.eps_step) 64 | 65 | if random.random() < self.eps and not greedy: 66 | actions = self.env.uniformRandActions() 67 | else: 68 | cur_state = self.env.getStateRef() 69 | actions, values = self.net(time_t, cur_state, None, greedy_acts=True, is_inference=True) 70 | actions = list(actions.cpu().numpy()) 71 | 72 | return actions 73 | 74 | def run_simulation(self): 75 | if (self.pos + 1) * cmd_args.batch_size > len(self.idx_test): 76 | self.pos = 0 77 | random.shuffle(self.idx_test) 78 | 79 | selected_idx = self.idx_test[self.pos * cmd_args.batch_size : (self.pos + 1) * cmd_args.batch_size] 80 | self.pos += 1 81 | self.env.setup(selected_idx) 82 | 83 | t = 0 84 | list_of_list_st = [] 85 | list_of_list_at = [] 86 | 87 | while not self.env.isTerminal(): 88 | list_at = self.make_actions(t) 89 | list_st = self.env.cloneState() 90 | 91 | self.env.step(list_at) 92 | assert (self.env.rewards is not None) == self.env.isTerminal() 93 | if self.env.isTerminal(): 94 | rewards = self.env.rewards 95 | s_prime = None 96 | else: 97 | rewards = np.zeros(len(list_at), dtype=np.float32) 98 | s_prime = self.env.cloneState() 99 | 100 | self.mem_pool.add_list(list_st, list_at, rewards, s_prime, [self.env.isTerminal()] * len(list_at), t) 101 | list_of_list_st.append( deepcopy(list_st) ) 102 | list_of_list_at.append( deepcopy(list_at) ) 103 | 104 | t += 1 105 | 106 | if cmd_args.reward_type == 'nll': 107 | return 108 | T = t 109 | cands = self.env.sample_pos_rewards(len(selected_idx)) 110 | if len(cands): 111 | for c in cands: 112 | sample_idx, target = c 113 | doable = True 114 | for t in range(T): 115 | if self.list_action_space[target] is not None and (not list_of_list_at[t][sample_idx] in self.list_action_space[target]): 116 | doable = False 117 | break 118 | if not doable: 119 | continue 120 | for t in range(T): 121 | s_t = list_of_list_st[t][sample_idx] 122 | a_t = list_of_list_at[t][sample_idx] 123 | s_t = [target, deepcopy(s_t[1]), s_t[2]] 124 | if t + 1 == T: 125 | s_prime = (None, None, None) 126 | r = 1.0 127 | term = True 128 | else: 129 | s_prime = list_of_list_st[t + 1][sample_idx] 130 | s_prime = [target, deepcopy(s_prime[1]), s_prime[2]] 131 | r = 0.0 132 | term = False 133 | self.mem_pool.mem_cells[t].add(s_t, a_t, r, s_prime, term) 134 | 135 | def eval(self): 136 | self.env.setup(self.idx_meta) 137 | t = 0 138 | while not self.env.isTerminal(): 139 | list_at = self.make_actions(t, greedy=True) 140 | self.env.step(list_at) 141 | t += 1 142 | 143 | acc = 1 - (self.env.binary_rewards + 1.0) / 2.0 144 | 
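        # binary_rewards[i] is +1 when the attack flipped the prediction on
        # meta-test node i and -1 otherwise, so the line above recovers a
        # per-node accuracy in {0, 1}.  num_wrong (counted in __main__ below)
        # appears to pad the denominator with nodes that never made it into
        # idx_meta, keeping the printed accuracy comparable to the unattacked
        # test accuracy; lower is better for the attacker, which is why a
        # smaller acc is saved as the best snapshot.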
acc = np.sum(acc) / (len(self.idx_meta) + self.num_wrong) 145 | print('\033[93m average test: acc %.5f\033[0m' % (acc)) 146 | 147 | if cmd_args.phase == 'train' and (self.best_eval is None or acc < self.best_eval): 148 | print('----saving to best attacker since this is the best attack rate so far.----') 149 | torch.save(self.net.state_dict(), cmd_args.save_dir + '/epoch-best.model') 150 | with open(cmd_args.save_dir + '/epoch-best.txt', 'w') as f: 151 | f.write('%.4f\n' % acc) 152 | with open(cmd_args.save_dir + '/attack_solution.txt', 'w') as f: 153 | for i in range(len(self.idx_meta)): 154 | f.write('%d: [' % self.idx_meta[i]) 155 | for e in self.env.modified_list[i].directed_edges: 156 | f.write('(%d %d)' % e) 157 | f.write('] succ: %d\n' % (self.env.binary_rewards[i])) 158 | self.best_eval = acc 159 | 160 | def train(self): 161 | pbar = tqdm(range(self.burn_in), unit='batch') 162 | for p in pbar: 163 | self.run_simulation() 164 | 165 | pbar = tqdm(range(cmd_args.num_steps), unit='steps') 166 | optimizer = optim.Adam(self.net.parameters(), lr=cmd_args.learning_rate) 167 | 168 | for self.step in pbar: 169 | 170 | self.run_simulation() 171 | 172 | if self.step % 123 == 0: 173 | self.take_snapshot() 174 | if self.step % 500 == 0: 175 | self.eval() 176 | 177 | cur_time, list_st, list_at, list_rt, list_s_primes, list_term = self.mem_pool.sample(batch_size=cmd_args.batch_size) 178 | list_target = torch.Tensor(list_rt) 179 | if cmd_args.ctx == 'gpu': 180 | list_target = list_target.cuda() 181 | 182 | if not list_term[0]: 183 | target_nodes, _, picked_nodes = zip(*list_s_primes) 184 | _, q_t_plus_1 = self.old_net(cur_time + 1, list_s_primes, None) 185 | _, q_rhs = node_greedy_actions(target_nodes, picked_nodes, q_t_plus_1, self.old_net) 186 | list_target += q_rhs 187 | 188 | list_target = Variable(list_target.view(-1, 1)) 189 | 190 | _, q_sa = self.net(cur_time, list_st, list_at) 191 | q_sa = torch.cat(q_sa, dim=0) 192 | loss = F.mse_loss(q_sa, list_target) 193 | optimizer.zero_grad() 194 | loss.backward() 195 | optimizer.step() 196 | pbar.set_description('eps: %.5f, loss: %0.5f, q_val: %.5f' % (self.eps, loss, torch.mean(q_sa)[0]) ) 197 | 198 | if __name__ == '__main__': 199 | random.seed(cmd_args.seed) 200 | np.random.seed(cmd_args.seed) 201 | torch.manual_seed(cmd_args.seed) 202 | 203 | features, labels, idx_valid, idx_test, base_model, dict_of_lists = init_setup() 204 | _, _, acc_test = base_model(features, Variable(base_model.norm_tool.normed_adj), idx_test, labels) 205 | acc_test = acc_test.double().cpu().numpy() 206 | 207 | attack_list = [] 208 | for i in range(len(idx_test)): 209 | if acc_test[i] > 0 and len(dict_of_lists[idx_test[i]]): 210 | attack_list.append(idx_test[i]) 211 | 212 | if not cmd_args.meta_test: 213 | total = attack_list 214 | idx_valid = idx_test 215 | else: 216 | total = attack_list + idx_valid 217 | 218 | _, _, acc_test = base_model(features, Variable(base_model.norm_tool.normed_adj), idx_valid, labels) 219 | acc_test = acc_test.double().cpu().numpy() 220 | meta_list = [] 221 | num_wrong = 0 222 | for i in range(len(idx_valid)): 223 | if acc_test[i] > 0: 224 | if len(dict_of_lists[idx_valid[i]]): 225 | meta_list.append(idx_valid[i]) 226 | else: 227 | num_wrong += 1 228 | print( len(meta_list) / float(len(idx_valid))) 229 | 230 | env = NodeAttakEnv(features, labels, total, dict_of_lists, base_model) 231 | 232 | agent = Agent(env, features, labels, meta_list, attack_list, dict_of_lists, num_wrong = num_wrong) 233 | 234 | if cmd_args.phase == 'train': 235 | agent.train() 
236 | else: 237 | agent.net.load_state_dict(torch.load(cmd_args.save_dir + '/epoch-best.model')) 238 | agent.eval() 239 | -------------------------------------------------------------------------------- /code/node_attack/node_genetic.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 18 | from cmd_args import cmd_args 19 | 20 | from node_attack_common import ModifiedGraph, init_setup 21 | 22 | class NodeGeneticAgent(object): 23 | def __init__(self, features, labels, list_action_space, classifier, n_edges_attack, target_node): 24 | self.n_edges_attack = n_edges_attack 25 | self.classifier = classifier 26 | self.list_action_space = list_action_space 27 | self.features = features 28 | self.labels = labels 29 | self.target_node = target_node 30 | self.total_nodes = len(self.list_action_space) 31 | self.solution = None 32 | self.population = [] 33 | 34 | region = self.list_action_space[target_node] 35 | if len(set(region)) == 0: 36 | return 37 | if len(set(region)) == 1 and self.target_node in region: 38 | return 39 | 40 | for _ in range(cmd_args.population_size): 41 | added = ModifiedGraph() 42 | for _ in range(n_edges_attack): 43 | while True: 44 | x = self.rand_action(self.target_node) 45 | y = self.rand_action(x) 46 | if x == y: 47 | continue 48 | break 49 | added.add_edge(x, y, -1.0) 50 | self.population.append(added) 51 | 52 | def rand_action(self, x): 53 | region = self.list_action_space[x] 54 | y = region[np.random.randint(len(region))] 55 | return y 56 | 57 | def get_fitness(self): 58 | nll_list = [] 59 | for i in range(len(self.population)): 60 | adj = self.classifier.norm_tool.norm_extra( self.population[i].get_extra_adj() ) 61 | adj = Variable(adj, volatile=True) 62 | _, loss, acc = self.classifier(self.features, adj, [self.target_node], self.labels) 63 | nll_list.append(loss.cpu().data.numpy()[0]) 64 | # print(i, self.population[i].directed_edges, float(acc.cpu()[0])) 65 | if self.solution is None and float(acc.cpu()[0]) < 1.0: # succeeded 66 | self.solution = self.population[i] 67 | break 68 | return np.array(nll_list) 69 | 70 | def select(self, fitness): 71 | scores = np.exp(fitness) 72 | max_args = np.argsort(-scores) 73 | 74 | result = [] 75 | for i in range(cmd_args.population_size - cmd_args.population_size // 2): 76 | result.append(deepcopy(self.population[max_args[i]])) 77 | 78 | idx = np.random.choice(np.arange(cmd_args.population_size), 79 | size=cmd_args.population_size // 2, 80 | replace=True, 81 | p=scores/scores.sum()) 82 | for i in idx: 83 | result.append(deepcopy(self.population[i])) 84 | return result 85 | 86 | def crossover(self, parent, pop): 87 | if np.random.rand() < cmd_args.cross_rate: 88 | another = pop[ np.random.randint(len(pop)) ] 89 | if len(parent.directed_edges) == 0: 90 | return deepcopy(another) 91 | if len(another.directed_edges) == 0: 92 | return deepcopy(parent) 93 | new_graph = ModifiedGraph() 94 | for i in range(self.n_edges_attack): 95 | if np.random.rand() < 0.5: 96 | e = parent.directed_edges[i] 97 | new_graph.add_edge(e[0], e[1], 
parent.weights[i]) 98 | else: 99 | e = another.directed_edges[i] 100 | new_graph.add_edge(e[0], e[1], another.weights[i]) 101 | return new_graph 102 | else: 103 | return deepcopy(parent) 104 | 105 | def mutate(self, child): 106 | for i in range(self.n_edges_attack): 107 | if len(child.directed_edges) == 0: 108 | continue 109 | if np.random.rand() < cmd_args.mutate_rate: 110 | if np.random.rand() < 0.5: 111 | new_e = (child.directed_edges[i][0], self.rand_action(child.directed_edges[i][0])) 112 | child.directed_edges[i] = new_e 113 | else: 114 | new_e = (self.rand_action(child.directed_edges[i][1]), child.directed_edges[i][1]) 115 | child.directed_edges[i] = new_e 116 | 117 | def evolve(self): 118 | fitness = self.get_fitness() 119 | if self.solution is not None: 120 | return 121 | pop = self.select(fitness) 122 | new_pop_list = [] 123 | for parent in pop: 124 | child = self.crossover(parent, pop) 125 | self.mutate(child) 126 | new_pop_list.append(child) 127 | 128 | self.population = new_pop_list 129 | 130 | if __name__ == '__main__': 131 | random.seed(cmd_args.seed) 132 | np.random.seed(cmd_args.seed) 133 | torch.manual_seed(cmd_args.seed) 134 | 135 | features, labels, _, idx_test, base_model, khop_neighbors = init_setup() 136 | 137 | if cmd_args.idx_start + cmd_args.num_instances > len(idx_test): 138 | instances = idx_test[cmd_args.idx_start : ] 139 | else: 140 | instances = idx_test[cmd_args.idx_start : cmd_args.idx_start + cmd_args.num_instances] 141 | 142 | pbar = tqdm(instances) 143 | attacked = 0.0 144 | for g in pbar: 145 | agent = NodeGeneticAgent(features, labels, khop_neighbors, base_model, cmd_args.num_mod, g) 146 | if len(agent.population) == 0: 147 | continue 148 | for i in range(cmd_args.rounds): 149 | agent.evolve() 150 | if agent.solution is not None: 151 | attacked += 1 152 | break 153 | with open('%s/sol-%d.txt' % (cmd_args.save_dir, g), 'w') as f: 154 | f.write('%d: [' % g) 155 | if agent.solution is not None: 156 | for e in agent.solution.directed_edges: 157 | f.write('(%d, %d)' % e) 158 | f.write('] succ: ') 159 | if agent.solution is not None: 160 | f.write('1\n') 161 | else: 162 | f.write('0\n') 163 | pbar.set_description('cur_attack: %.2f' % (attacked) ) 164 | 165 | print('\n\nacc: %.4f\n' % ((len(instances) - attacked) / float(len(instances))) ) 166 | -------------------------------------------------------------------------------- /code/node_attack/node_grad_attack.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 18 | from cmd_args import cmd_args 19 | 20 | sys.path.append('%s/../node_classification' % os.path.dirname(os.path.realpath(__file__))) 21 | from node_utils import load_txt_data, load_binary_data, run_test, load_raw_graph, StaticGraph 22 | 23 | from node_attack_common import load_base_model, ModifiedGraph, init_setup 24 | 25 | def propose_add(grad): 26 | idxes = np.argsort(grad) 27 | added = [] 28 | 29 | mod = ModifiedGraph() 30 | for p in idxes: 31 | x = p // len(StaticGraph.graph) 32 | y = p % len(StaticGraph.graph) 33 | if x == y or x in 
dict_of_lists[y] or y in dict_of_lists[x]: 34 | continue 35 | if cmd_args.n_hops > 0 and not x in khop_neighbors[y]: 36 | continue 37 | assert cmd_args.n_hops <= 0 or (x in khop_neighbors[y] and y in khop_neighbors[x]) 38 | mod.add_edge(x, y, 1.0) 39 | if len(mod.directed_edges) >= cmd_args.num_mod: 40 | break 41 | if len(mod.directed_edges) < cmd_args.num_mod: 42 | extra = None 43 | else: 44 | extra = mod.get_extra_adj() 45 | adj = base_model.norm_tool.norm_extra(extra) 46 | _, _, acc = base_model(features, Variable(adj), [idx], labels) 47 | acc = acc.double().cpu().numpy() 48 | 49 | return acc[0] < 1.0, mod 50 | 51 | def propose_del(grad): 52 | idxes = np.argsort(-grad) 53 | added = [] 54 | 55 | mod = ModifiedGraph() 56 | for p in idxes: 57 | x = p // len(StaticGraph.graph) 58 | y = p % len(StaticGraph.graph) 59 | if x == y: 60 | continue 61 | if not x in dict_of_lists[y] or not y in dict_of_lists[x]: 62 | continue 63 | mod.add_edge(x, y, -1.0) 64 | if len(mod.directed_edges) >= cmd_args.num_mod: 65 | break 66 | if len(mod.directed_edges) < cmd_args.num_mod: 67 | extra = None 68 | else: 69 | extra = mod.get_extra_adj() 70 | adj = base_model.norm_tool.norm_extra(extra) 71 | _, _, acc = base_model(features, Variable(adj), [idx], labels) 72 | acc = acc.double().cpu().numpy() 73 | 74 | return acc[0] < 1.0, mod 75 | 76 | 77 | if __name__ == '__main__': 78 | random.seed(cmd_args.seed) 79 | np.random.seed(cmd_args.seed) 80 | torch.manual_seed(cmd_args.seed) 81 | 82 | features, labels, _, idx_test, base_model, khop_neighbors = init_setup() 83 | np_labels = labels.cpu().data.numpy() 84 | 85 | method = propose_del 86 | attacked = 0.0 87 | pbar = tqdm(range(len(idx_test))) 88 | 89 | ftxt = open('%s/%s-grad.txt' % (cmd_args.save_dir, cmd_args.dataset), 'w', 0) 90 | dict_of_lists = load_raw_graph(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 91 | 92 | _, _, all_acc = base_model(features, Variable(base_model.norm_tool.normed_adj), idx_test, labels) 93 | all_acc = all_acc.cpu().numpy() 94 | for pos in pbar: 95 | if all_acc[pos] < 1.0: 96 | attacked += 1 97 | continue 98 | idx = idx_test[pos] 99 | fake_labels = labels.clone() 100 | 101 | if cmd_args.targeted: 102 | for i in range(cmd_args.num_class): 103 | if i == np_labels[idx]: 104 | continue 105 | adj = Variable( base_model.norm_tool.normed_adj, requires_grad=True ) 106 | base_model.zero_grad() 107 | fake_labels[idx] = i 108 | _, loss, acc = base_model(features, adj, [idx], fake_labels) 109 | loss.backward() 110 | grad = adj.grad.data.cpu().numpy().flatten() 111 | 112 | if method(grad)[0]: 113 | attacked += 1 114 | break 115 | else: 116 | adj = Variable( base_model.norm_tool.normed_adj, requires_grad=True ) 117 | base_model.zero_grad() 118 | _, loss, acc = base_model(features, adj, [idx], labels) 119 | loss = -loss 120 | loss.backward() 121 | grad = adj.grad.data.cpu().numpy().flatten() 122 | succ, mod = method(grad) 123 | if succ: 124 | ftxt.write('%d: [%d %d]\n' % (idx, mod.directed_edges[0][0], mod.directed_edges[0][1])) 125 | attacked += 1 126 | 127 | pbar.set_description('cur_attack: %.2f' % (attacked) ) 128 | ftxt.close() 129 | print( '%.6f\n' % (1.0 - attacked / len(idx_test)) ) 130 | 131 | 132 | -------------------------------------------------------------------------------- /code/node_attack/node_rand_attack.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import 
networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 18 | from cmd_args import cmd_args 19 | 20 | sys.path.append('%s/../node_classification' % os.path.dirname(os.path.realpath(__file__))) 21 | from node_utils import load_txt_data, load_binary_data, run_test, load_raw_graph, StaticGraph 22 | 23 | from node_attack_common import load_base_model, ModifiedGraph, init_setup 24 | 25 | if __name__ == '__main__': 26 | random.seed(cmd_args.seed) 27 | np.random.seed(cmd_args.seed) 28 | torch.manual_seed(cmd_args.seed) 29 | 30 | features, labels, idx_valid, idx_test, base_model, khop_neighbors = init_setup() 31 | if cmd_args.meta_test: 32 | idx_test = idx_valid 33 | np_labels = labels.cpu().data.numpy() 34 | 35 | _, _, all_acc = base_model(features, Variable(base_model.norm_tool.normed_adj), idx_test, labels) 36 | all_acc = all_acc.cpu().numpy() 37 | print('acc before modification:', np.mean(all_acc)) 38 | 39 | attacked = 0.0 40 | pbar = tqdm(range(len(idx_test))) 41 | for pos in pbar: 42 | if all_acc[pos] < 1.0: 43 | attacked += 1 44 | continue 45 | idx = idx_test[pos] 46 | mod = ModifiedGraph() 47 | for i in range(cmd_args.num_mod): 48 | x = None 49 | y = None 50 | if len(set(khop_neighbors[idx])) == 0: 51 | continue 52 | if len(set(khop_neighbors[idx])) == 1 and idx in khop_neighbors[idx]: 53 | continue 54 | while True: 55 | region = khop_neighbors[idx] 56 | x = region[np.random.randint(len(region))] 57 | region = khop_neighbors[x] 58 | y = region[np.random.randint(len(region))] 59 | 60 | if x == y: 61 | continue 62 | assert x in khop_neighbors[y] and y in khop_neighbors[x] 63 | break 64 | if x is not None and y is not None: 65 | mod.add_edge(x, y, -1.0) 66 | if len(mod.directed_edges) != cmd_args.num_mod: 67 | continue 68 | adj = base_model.norm_tool.norm_extra(mod.get_extra_adj()) 69 | _, _, acc = base_model(features, Variable(adj), [idx], labels) 70 | acc = acc.double().cpu().numpy() 71 | if acc[0] < 1.0: 72 | attacked += 1 73 | 74 | pbar.set_description('cur_attack: %.2f' % (attacked) ) 75 | 76 | print( '%.6f\n' % (1.0 - attacked / len(idx_test)) ) 77 | 78 | -------------------------------------------------------------------------------- /code/node_attack/plot_grad.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | dataset=pubmed 5 | gm=gcn 6 | n_hops=2 7 | #del_rate=0.50 8 | data_folder=$dropbox/data 9 | saved_model=$dropbox/scratch/results/node_classification/$dataset/model-${gm}-epoch-best 10 | 11 | targeted=0 12 | num_mod=1 13 | output_base=$HOME/scratch/results/del_edge_attack/ 14 | 15 | save_fold=${dataset}-${gm} 16 | 17 | output_root=$output_base/$save_fold/grad-t-${targeted}-m-${num_mod} 18 | 19 | 20 | if [ ! 
-e $output_root ]; 21 | then 22 | mkdir -p $output_root 23 | fi 24 | 25 | python plot_node_grad_attack.py \ 26 | -num_mod $num_mod \ 27 | -targeted $targeted \ 28 | -data_folder $data_folder \ 29 | -n_hops $n_hops \ 30 | -dataset $dataset \ 31 | -save_dir $output_root \ 32 | -saved_model $saved_model \ 33 | $@ 34 | -------------------------------------------------------------------------------- /code/node_attack/plot_node_grad_attack.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | from copy import deepcopy 16 | 17 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 18 | from cmd_args import cmd_args 19 | 20 | sys.path.append('%s/../node_classification' % os.path.dirname(os.path.realpath(__file__))) 21 | from node_utils import load_txt_data, load_binary_data, run_test, load_raw_graph, StaticGraph 22 | 23 | from node_attack_common import load_base_model, ModifiedGraph, init_setup 24 | 25 | def propose_add(grad): 26 | idxes = np.argsort(grad) 27 | added = [] 28 | 29 | mod = ModifiedGraph() 30 | for p in idxes: 31 | x = p // len(StaticGraph.graph) 32 | y = p % len(StaticGraph.graph) 33 | if x == y or x in dict_of_lists[y] or y in dict_of_lists[x]: 34 | continue 35 | if cmd_args.n_hops > 0 and not x in khop_neighbors[y]: 36 | continue 37 | assert cmd_args.n_hops <= 0 or (x in khop_neighbors[y] and y in khop_neighbors[x]) 38 | mod.add_edge(x, y, 1.0) 39 | if len(mod.directed_edges) >= cmd_args.num_mod: 40 | break 41 | if len(mod.directed_edges) < cmd_args.num_mod: 42 | extra = None 43 | else: 44 | extra = mod.get_extra_adj() 45 | adj = base_model.norm_tool.norm_extra(extra) 46 | _, _, acc = base_model(features, Variable(adj), [idx], labels) 47 | acc = acc.double().cpu().numpy() 48 | 49 | return acc[0] < 1.0, mod 50 | 51 | def propose_del(grad): 52 | idxes = np.argsort(-grad) 53 | added = [] 54 | 55 | mod = ModifiedGraph() 56 | for p in idxes: 57 | x = p // len(StaticGraph.graph) 58 | y = p % len(StaticGraph.graph) 59 | if x == y: 60 | continue 61 | if not x in dict_of_lists[y] or not y in dict_of_lists[x]: 62 | continue 63 | mod.add_edge(x, y, -1.0) 64 | if len(mod.directed_edges) >= cmd_args.num_mod: 65 | break 66 | if len(mod.directed_edges) < cmd_args.num_mod: 67 | extra = None 68 | else: 69 | extra = mod.get_extra_adj() 70 | adj = base_model.norm_tool.norm_extra(extra) 71 | pred, _, acc = base_model(features, Variable(adj), [idx], labels) 72 | acc = acc.double().cpu().numpy() 73 | 74 | return acc[0] < 1.0, mod, pred.cpu().numpy() 75 | 76 | 77 | if __name__ == '__main__': 78 | random.seed(cmd_args.seed) 79 | np.random.seed(cmd_args.seed) 80 | torch.manual_seed(cmd_args.seed) 81 | 82 | features, labels, _, idx_test, base_model, khop_neighbors = init_setup() 83 | np_labels = labels.cpu().data.numpy() 84 | 85 | method = propose_del 86 | attacked = 0.0 87 | pbar = tqdm(range(len(idx_test))) 88 | 89 | dict_of_lists = load_raw_graph(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 90 | with open('%s/%s-grad-labels.txt' % (cmd_args.save_dir, cmd_args.dataset), 'w', 0) as f: 91 | for i in range(len(np_labels)): 92 | f.write('%d\n' % np_labels[i]) 93 | 94 
| _, _, all_acc = base_model(features, Variable(base_model.norm_tool.normed_adj), idx_test, labels) 95 | all_acc = all_acc.cpu().numpy() 96 | fsol = open('%s/%s-grad.txt' % (cmd_args.save_dir, cmd_args.dataset), 'w', 0) 97 | for pos in pbar: 98 | if all_acc[pos] < 1.0: 99 | attacked += 1 100 | continue 101 | idx = idx_test[pos] 102 | fake_labels = labels.clone() 103 | 104 | if cmd_args.targeted: 105 | for i in range(cmd_args.num_class): 106 | if i == np_labels[idx]: 107 | continue 108 | adj = Variable( base_model.norm_tool.normed_adj, requires_grad=True ) 109 | base_model.zero_grad() 110 | fake_labels[idx] = i 111 | _, loss, acc = base_model(features, adj, [idx], fake_labels) 112 | loss.backward() 113 | grad = adj.grad.data.cpu().numpy().flatten() 114 | 115 | if method(grad)[0]: 116 | attacked += 1 117 | break 118 | else: 119 | adj = Variable( base_model.norm_tool.normed_adj, requires_grad=True ) 120 | base_model.zero_grad() 121 | _, loss, acc = base_model(features, adj, [idx], labels) 122 | loss = -loss 123 | loss.backward() 124 | grad = adj.grad.data.cpu().numpy().flatten() 125 | succ, mod, pred = method(grad) 126 | if succ: 127 | fsol.write('%d: [%d %d]\n' % (idx, mod.directed_edges[0][0], mod.directed_edges[0][1])) 128 | ftxt = open('%s/%s-grad-%d.txt' % (cmd_args.save_dir, cmd_args.dataset, idx), 'w', 0) 129 | ftxt.write('origin: %d, pred: %d\n' % (np_labels[idx], pred[0])) 130 | seen = set() 131 | for i in dict_of_lists[idx]: 132 | for j in dict_of_lists[i]: 133 | if (i, j) in seen or (j, i) in seen: 134 | continue 135 | score = grad[i * len(StaticGraph.graph) + j] 136 | ftxt.write('%d %d %.4f\n' % (i, j, score)) 137 | score = grad[j * len(StaticGraph.graph) + i] 138 | ftxt.write('%d %d %.4f\n' % (j, i, score)) 139 | 140 | attacked += 1 141 | ftxt.close() 142 | 143 | pbar.set_description('cur_attack: %.2f' % (attacked) ) 144 | fsol.close() 145 | print( '%.6f\n' % (1.0 - attacked / len(idx_test)) ) 146 | 147 | 148 | -------------------------------------------------------------------------------- /code/node_attack/q_net_node.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | 16 | sys.path.append('%s/../../pytorch_structure2vec/s2v_lib' % os.path.dirname(os.path.realpath(__file__))) 17 | from pytorch_util import weights_init 18 | 19 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 20 | from cmd_args import cmd_args 21 | from graph_embedding import gnn_spmm 22 | 23 | sys.path.append('%s/../node_classification' % os.path.dirname(os.path.realpath(__file__))) 24 | from node_utils import GraphNormTool 25 | 26 | def node_greedy_actions(target_nodes, picked_nodes, list_q, net): 27 | assert len(target_nodes) == len(list_q) 28 | 29 | actions = [] 30 | values = [] 31 | for i in range(len(target_nodes)): 32 | region = net.list_action_space[target_nodes[i]] 33 | if picked_nodes is not None and picked_nodes[i] is not None: 34 | region = net.list_action_space[picked_nodes[i]] 35 | if region is None: 36 | assert list_q[i].size()[0] == net.total_nodes 37 | else: 38 | assert len(region) == list_q[i].size()[0] 39 | 40 | val, act = torch.max(list_q[i], dim=0) 41 | 
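        # torch.max over the q-vector of candidate actions gives the greedy
        # value and a *local* argmax; when the target node has a restricted
        # action region, the local index is translated back to a global node
        # id below before it is returned as the chosen action.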
values.append(val) 42 | if region is not None: 43 | act = region[act.data.cpu().numpy()[0]] 44 | act = Variable(torch.LongTensor([act])) 45 | actions.append( act ) 46 | else: 47 | actions.append(act) 48 | 49 | return torch.cat(actions, dim=0).data, torch.cat(values, dim=0).data 50 | 51 | class QNetNode(nn.Module): 52 | def __init__(self, node_features, node_labels, list_action_space): 53 | super(QNetNode, self).__init__() 54 | self.node_features = node_features 55 | self.node_labels = node_labels 56 | self.list_action_space = list_action_space 57 | self.total_nodes = len(list_action_space) 58 | 59 | embed_dim = cmd_args.latent_dim 60 | if cmd_args.bilin_q: 61 | last_wout = embed_dim 62 | else: 63 | last_wout = 1 64 | self.bias_target = Parameter(torch.Tensor(1, embed_dim)) 65 | 66 | if cmd_args.mlp_hidden: 67 | self.linear_1 = nn.Linear(embed_dim * 2, cmd_args.mlp_hidden) 68 | self.linear_out = nn.Linear(cmd_args.mlp_hidden, last_wout) 69 | else: 70 | self.linear_out = nn.Linear(embed_dim * 2, last_wout) 71 | 72 | self.w_n2l = Parameter(torch.Tensor(node_features.size()[1], embed_dim)) 73 | self.bias_n2l = Parameter(torch.Tensor(embed_dim)) 74 | self.bias_picked = Parameter(torch.Tensor(1, embed_dim)) 75 | self.conv_params = nn.Linear(embed_dim, embed_dim) 76 | self.norm_tool = GraphNormTool(cmd_args.adj_norm, cmd_args.gm) 77 | weights_init(self) 78 | 79 | def make_spmat(self, n_rows, n_cols, row_idx, col_idx): 80 | idxes = torch.LongTensor([[row_idx], [col_idx]]) 81 | values = torch.ones(1) 82 | 83 | sp = torch.sparse.FloatTensor(idxes, values, torch.Size([n_rows, n_cols])) 84 | if next(self.parameters()).is_cuda: 85 | sp = sp.cuda() 86 | return sp 87 | 88 | def forward(self, time_t, states, actions, greedy_acts = False, is_inference=False): 89 | if self.node_features.data.is_sparse: 90 | input_node_linear = gnn_spmm(self.node_features, self.w_n2l) 91 | else: 92 | input_node_linear = torch.mm(self.node_features, self.w_n2l) 93 | input_node_linear += self.bias_n2l 94 | 95 | target_nodes, batch_graph, picked_nodes = zip(*states) 96 | 97 | list_pred = [] 98 | prefix_sum = [] 99 | for i in range(len(batch_graph)): 100 | region = self.list_action_space[target_nodes[i]] 101 | 102 | node_embed = input_node_linear.clone() 103 | if picked_nodes is not None and picked_nodes[i] is not None: 104 | picked_sp = Variable( self.make_spmat(self.total_nodes, 1, picked_nodes[i], 0), volatile=is_inference ) 105 | node_embed += gnn_spmm(picked_sp, self.bias_picked) 106 | region = self.list_action_space[picked_nodes[i]] 107 | 108 | if not cmd_args.bilin_q: 109 | target_sp = Variable( self.make_spmat(self.total_nodes, 1, target_nodes[i], 0), volatile=is_inference) 110 | node_embed += gnn_spmm(target_sp, self.bias_target) 111 | 112 | adj = Variable( self.norm_tool.norm_extra( batch_graph[i].get_extra_adj() ), volatile=is_inference ) 113 | lv = 0 114 | input_message = node_embed 115 | node_embed = F.relu(input_message) 116 | while lv < cmd_args.max_lv: 117 | n2npool = gnn_spmm(adj, node_embed) 118 | node_linear = self.conv_params( n2npool ) 119 | merged_linear = node_linear + input_message 120 | node_embed = F.relu(merged_linear) 121 | lv += 1 122 | 123 | target_embed = node_embed[target_nodes[i], :].view(-1, 1) 124 | if region is not None: 125 | node_embed = node_embed[region] 126 | 127 | graph_embed = torch.mean(node_embed, dim=0, keepdim=True) 128 | 129 | if actions is None: 130 | graph_embed = graph_embed.repeat(node_embed.size()[0], 1) 131 | else: 132 | if region is not None: 133 | act_idx = 
region.index(actions[i]) 134 | else: 135 | act_idx = actions[i] 136 | node_embed = node_embed[act_idx, :].view(1, -1) 137 | 138 | embed_s_a = torch.cat((node_embed, graph_embed), dim=1) 139 | if cmd_args.mlp_hidden: 140 | embed_s_a = F.relu( self.linear_1(embed_s_a) ) 141 | raw_pred = self.linear_out(embed_s_a) 142 | 143 | if cmd_args.bilin_q: 144 | raw_pred = torch.mm(raw_pred, target_embed) 145 | list_pred.append(raw_pred) 146 | 147 | if greedy_acts: 148 | actions, _ = node_greedy_actions(target_nodes, picked_nodes, list_pred, self) 149 | 150 | return actions, list_pred 151 | 152 | class NStepQNetNode(nn.Module): 153 | def __init__(self, num_steps, node_features, node_labels, list_action_space): 154 | super(NStepQNetNode, self).__init__() 155 | self.node_features = node_features 156 | self.node_labels = node_labels 157 | self.list_action_space = list_action_space 158 | self.total_nodes = len(list_action_space) 159 | 160 | list_mod = [] 161 | 162 | for i in range(0, num_steps): 163 | list_mod.append(QNetNode(node_features, node_labels, list_action_space)) 164 | 165 | self.list_mod = nn.ModuleList(list_mod) 166 | 167 | self.num_steps = num_steps 168 | 169 | def forward(self, time_t, states, actions, greedy_acts = False, is_inference=False): 170 | assert time_t >= 0 and time_t < self.num_steps 171 | 172 | return self.list_mod[time_t](time_t, states, actions, greedy_acts, is_inference) 173 | -------------------------------------------------------------------------------- /code/node_attack/run_attack.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | data_folder=$dropbox/data 5 | 6 | dataset=pubmed 7 | base_gm=gcn 8 | del_rate=0.01 9 | saved_model=$dropbox/scratch/results/node_classification/$dataset/model-${base_gm}-epoch-best-${del_rate} 10 | 11 | lr=0.01 12 | max_lv=1 13 | num_epochs=200 14 | batch_size=10 15 | hidden=0 16 | n_hops=3 17 | bilin_q=1 18 | reward_type=binary 19 | gm=mean_field 20 | adj_norm=1 21 | meta_test=0 22 | num_mod=1 23 | 24 | output_base=$HOME/scratch/results/del_edge_attack/${dataset}-${base_gm}-${del_rate} 25 | save_fold=rl-lv-${max_lv}-q-${bilin_q}-meta-${meta_test} 26 | output_root=$output_base/$save_fold 27 | 28 | if [ ! -e $output_root ]; 29 | then 30 | mkdir -p $output_root 31 | fi 32 | 33 | #export CUDA_VISIBLE_DEVICES=1 34 | python node_dqn.py \ 35 | -meta_test $meta_test \ 36 | -num_mod $num_mod \ 37 | -data_folder $data_folder \ 38 | -dataset $dataset \ 39 | -reward_type $reward_type \ 40 | -bilin_q $bilin_q \ 41 | -n_hops $n_hops \ 42 | -gm $gm \ 43 | -adj_norm $adj_norm \ 44 | -hidden $hidden \ 45 | -batch_size $batch_size \ 46 | -save_dir $output_root \ 47 | -num_epochs $num_epochs \ 48 | -saved_model $saved_model \ 49 | -max_lv $max_lv \ 50 | -learning_rate $lr \ 51 | -num_steps 500000 \ 52 | -phase train \ 53 | $@ 54 | -------------------------------------------------------------------------------- /code/node_attack/run_exhaust.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | dataset=pubmed 5 | gm=gcn 6 | n_hops=0 7 | del_rate=0.01 8 | data_folder=$dropbox/data 9 | saved_model=$dropbox/scratch/results/node_classification/$dataset/model-${gm}-epoch-best-${del_rate} 10 | meta_test=0 11 | 12 | output_base=$HOME/scratch/results/del_edge_attack/ 13 | save_fold=${dataset}-${gm}-${del_rate} 14 | 15 | output_root=$output_base/$save_fold/exhaust-meta-${meta_test} 16 | 17 | if [ ! 
-e $output_root ]; 18 | then 19 | mkdir -p $output_root 20 | fi 21 | 22 | python exhaust_attack.py \ 23 | -meta_test $meta_test \ 24 | -data_folder $data_folder \ 25 | -n_hops $n_hops \ 26 | -dataset $dataset \ 27 | -save_dir $output_root \ 28 | -saved_model $saved_model \ 29 | $@ 30 | -------------------------------------------------------------------------------- /code/node_attack/run_genetic.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | dataset=pubmed 5 | 6 | data_folder=$dropbox/data 7 | base_gm=gcn 8 | del_rate=0.01 9 | saved_model=$dropbox/scratch/results/node_classification/$dataset/model-${base_gm}-epoch-best-${del_rate} 10 | 11 | lr=0.01 12 | max_lv=1 13 | num_epochs=200 14 | hidden=0 15 | n_hops=0 16 | idx_start=0 17 | num=1000000 18 | pop=50 19 | cross=0.1 20 | mutate=0.2 21 | rounds=5 22 | 23 | output_base=$HOME/scratch/results/del_edge_attack/${dataset}-${base_gm}-${del_rate} 24 | save_fold=ga-p-${pop}-c-${cross}-m-${mutate}-r-${rounds} 25 | output_root=$output_base/$save_fold 26 | 27 | if [ ! -e $output_root ]; 28 | then 29 | mkdir -p $output_root 30 | fi 31 | 32 | python node_genetic.py \ 33 | -data_folder $data_folder \ 34 | -dataset $dataset \ 35 | -idx_start $idx_start \ 36 | -population_size $pop \ 37 | -cross_rate $cross \ 38 | -mutate_rate $mutate \ 39 | -rounds $rounds \ 40 | -num_instances $num \ 41 | -n_hops $n_hops \ 42 | -hidden $hidden \ 43 | -save_dir $output_root \ 44 | -num_epochs $num_epochs \ 45 | -saved_model $saved_model \ 46 | -max_lv $max_lv \ 47 | -learning_rate $lr \ 48 | -instance_id 1968 \ 49 | $@ 50 | -------------------------------------------------------------------------------- /code/node_attack/run_grad.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | dataset=cora 5 | gm=gcn 6 | n_hops=2 7 | del_rate=0.00 8 | data_folder=$dropbox/data 9 | saved_model=$dropbox/scratch/results/node_classification/$dataset/model-${gm}-epoch-best-${del_rate} 10 | 11 | targeted=0 12 | num_mod=1 13 | output_base=$HOME/scratch/results/del_edge_attack/ 14 | 15 | save_fold=${dataset}-${gm} 16 | 17 | output_root=$output_base/$save_fold/grad-t-${targeted}-m-${num_mod} 18 | 19 | 20 | if [ ! 
-e $output_root ]; 21 | then 22 | mkdir -p $output_root 23 | fi 24 | 25 | python node_grad_attack.py \ 26 | -num_mod $num_mod \ 27 | -targeted $targeted \ 28 | -data_folder $data_folder \ 29 | -n_hops $n_hops \ 30 | -dataset $dataset \ 31 | -save_dir $output_root \ 32 | -saved_model $saved_model \ 33 | $@ 34 | -------------------------------------------------------------------------------- /code/node_attack/run_rand.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | dropbox=../../dropbox 4 | dataset=pubmed 5 | gm=gcn 6 | del_rate=0.50 7 | data_folder=$dropbox/data 8 | saved_model=$dropbox/scratch/results/node_classification/$dataset/model-${gm}-epoch-best-${del_rate} 9 | 10 | meta_test=0 11 | num_mod=1 12 | 13 | python node_rand_attack.py \ 14 | -meta_test $meta_test \ 15 | -num_mod $num_mod \ 16 | -data_folder $data_folder \ 17 | -dataset $dataset \ 18 | -saved_model $saved_model \ 19 | $@ 20 | -------------------------------------------------------------------------------- /code/node_classification/gcn.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | import cPickle as cp 16 | import time 17 | 18 | sys.path.append('%s/../../pytorch_structure2vec/s2v_lib' % os.path.dirname(os.path.realpath(__file__))) 19 | from pytorch_util import weights_init 20 | 21 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 22 | from cmd_args import cmd_args, save_args 23 | from gcn_modules import GCNModule, S2VNodeClassifier 24 | from node_utils import load_binary_data, load_txt_data, run_test, StaticGraph 25 | 26 | def adj_generator(): 27 | directed_edges = StaticGraph.graph.edges() 28 | 29 | while True: 30 | if cmd_args.del_rate > 0: 31 | random.shuffle(directed_edges) 32 | del_num = int(len(directed_edges) * cmd_args.del_rate) 33 | for i in range(len(directed_edges) // del_num): 34 | cur_edges = directed_edges[i * del_num : (i + 1) * del_num] 35 | edges = np.array(cur_edges, dtype=np.int64) 36 | rev_edges = np.array([edges[:, 1], edges[:, 0]], dtype=np.int64) 37 | edges = np.hstack((edges.T, rev_edges)) 38 | idxes = torch.LongTensor(edges) 39 | values = torch.ones(idxes.size()[1]) * -1.0 40 | 41 | added = torch.sparse.FloatTensor(idxes, values, StaticGraph.get_gsize()) 42 | if cmd_args.ctx == 'gpu': 43 | added = added.cuda() 44 | 45 | new_adj = gcn.norm_tool.norm_extra(added) 46 | yield Variable(new_adj) 47 | else: 48 | yield orig_adj 49 | 50 | if __name__ == '__main__': 51 | random.seed(cmd_args.seed) 52 | np.random.seed(cmd_args.seed) 53 | torch.manual_seed(cmd_args.seed) 54 | 55 | # features, labels, idx_train, idx_val, idx_test = load_binary_data(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 56 | features, labels, idx_train, idx_val, idx_test = load_txt_data(cmd_args.data_folder + '/' + cmd_args.dataset, cmd_args.dataset) 57 | features = Variable( features ) 58 | labels = Variable( torch.LongTensor( np.argmax(labels, axis=1) ) ) 59 | 60 | if cmd_args.gm == 'mean_field': 61 | mod = S2VNodeClassifier 62 | elif cmd_args.gm == 'gcn': 63 | mod = GCNModule 64 | if cmd_args.saved_model is not None: 65 | 
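        # Checkpoints in this repo come in pairs: '<prefix>-args.pkl' stores
        # the argparse namespace from training time and '<prefix>.model'
        # stores the state_dict, so the network is first rebuilt with the
        # saved hyper-parameters and only then loaded with the weights (the
        # same convention is used by er_components.py and load_base_model in
        # node_attack_common.py).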
print('loading model from %s' % cmd_args.saved_model) 66 | with open('%s-args.pkl' % cmd_args.saved_model, 'rb') as f: 67 | base_args = cp.load(f) 68 | gcn = mod(**vars(base_args)) 69 | gcn.load_state_dict(torch.load(cmd_args.saved_model + '.model')) 70 | else: 71 | gcn = mod(**vars(cmd_args)) 72 | 73 | orig_adj = Variable( gcn.norm_tool.normed_adj ) 74 | 75 | if cmd_args.ctx == 'gpu': 76 | gcn = gcn.cuda() 77 | labels = labels.cuda() 78 | 79 | if cmd_args.phase == 'test': 80 | run_test(gcn, features, orig_adj, idx_test, labels) 81 | sys.exit() 82 | 83 | optimizer = optim.Adam(gcn.parameters(), lr=cmd_args.learning_rate, weight_decay=cmd_args.weight_decay) 84 | best_val = None 85 | gen = adj_generator() 86 | for epoch in range(cmd_args.num_epochs): 87 | t = time.time() 88 | gcn.train() 89 | optimizer.zero_grad() 90 | cur_adj = next(gen) 91 | _, loss_train, acc_train = gcn(features, cur_adj, idx_train, labels) 92 | acc_train = acc_train.sum() / float(len(idx_train)) 93 | loss_train.backward() 94 | optimizer.step() 95 | 96 | gcn.eval() 97 | _, loss_val, acc_val = gcn(features, orig_adj, idx_val, labels) 98 | acc_val = acc_val.sum() / float(len(idx_val)) 99 | 100 | print('Epoch: {:04d}'.format(epoch+1), 101 | 'loss_train: {:.4f}'.format(loss_train.data[0]), 102 | 'acc_train: {:.4f}'.format(acc_train), 103 | 'loss_val: {:.4f}'.format(loss_val.data[0]), 104 | 'acc_val: {:.4f}'.format(acc_val), 105 | 'time: {:.4f}s'.format(time.time() - t)) 106 | 107 | if best_val is None or acc_val > best_val: 108 | best_val = acc_val 109 | print('----saving to best model since this is the best valid accuracy so far.----') 110 | torch.save(gcn.state_dict(), cmd_args.save_dir + '/model-%s-epoch-best-%.2f.model' % (cmd_args.gm, cmd_args.del_rate)) 111 | save_args(cmd_args.save_dir + '/model-%s-epoch-best-%.2f-args.pkl' % (cmd_args.gm, cmd_args.del_rate), cmd_args) 112 | 113 | run_test(gcn, features, orig_adj, idx_test, labels) 114 | # pred = gcn(features, adj) 115 | -------------------------------------------------------------------------------- /code/node_classification/gcn_modules.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import os 4 | import sys 5 | import numpy as np 6 | import torch 7 | import networkx as nx 8 | import random 9 | from torch.autograd import Variable 10 | from torch.nn.parameter import Parameter 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torch.optim as optim 14 | from tqdm import tqdm 15 | import cPickle as cp 16 | 17 | sys.path.append('%s/../../pytorch_structure2vec/s2v_lib' % os.path.dirname(os.path.realpath(__file__))) 18 | from pytorch_util import weights_init 19 | 20 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__))) 21 | from graph_embedding import gnn_spmm 22 | 23 | from node_utils import GraphNormTool 24 | 25 | class GraphConvolution(nn.Module): 26 | def __init__(self, in_features, out_features, bias=True): 27 | super(GraphConvolution, self).__init__() 28 | self.in_features = in_features 29 | self.out_features = out_features 30 | self.weight = Parameter(torch.Tensor(in_features, out_features)) 31 | if bias: 32 | self.bias = Parameter(torch.Tensor(out_features)) 33 | else: 34 | self.register_parameter('bias', None) 35 | 36 | weights_init(self) 37 | 38 | def forward(self, input, adj): 39 | if input.data.is_sparse: 40 | support = gnn_spmm(input, self.weight) 41 | else: 42 | support = torch.mm(input, self.weight) 43 | 44 | output = gnn_spmm(adj, support) 45 | if self.bias is not 
46 |             return output + self.bias
47 |         else:
48 |             return output
49 | 
50 |     def __repr__(self):
51 |         return self.__class__.__name__ + ' (' \
52 |                + str(self.in_features) + ' -> ' \
53 |                + str(self.out_features) + ')'
54 | 
55 | class GCNModule(nn.Module):
56 |     def __init__(self, **kwargs):
57 |         super(GCNModule, self).__init__()
58 |         self.gc1 = GraphConvolution(kwargs['feature_dim'], kwargs['latent_dim'])
59 |         self.gc2 = GraphConvolution(kwargs['latent_dim'], kwargs['num_class'])
60 |         self.dropout_rate = kwargs['dropout']
61 |         self.norm_tool = GraphNormTool(kwargs['adj_norm'], 'gcn')
62 | 
63 |     def forward(self, x, adj, node_selector=None, labels=None, avg_loss=True):
64 |         x = F.relu(self.gc1(x, adj))
65 |         x = F.dropout(x, self.dropout_rate, training=self.training)
66 |         x = self.gc2(x, adj)
67 |         logits = F.log_softmax(x, dim=1)
68 | 
69 |         if node_selector is not None:
70 |             logits = logits[node_selector]
71 | 
72 |         pred = logits.data.max(1, keepdim=True)[1]  # computed before branching, so it also exists when labels is None
73 |         if labels is not None:
74 |             if node_selector is not None:
75 |                 labels = labels[node_selector]
76 |             loss = F.nll_loss(logits, labels, reduce=avg_loss)
77 |             acc = pred.eq(labels.data.view_as(pred)).cpu()
78 |             return pred, loss, acc
79 |         else:
80 |             return pred
81 | 
82 | class S2VNodeClassifier(nn.Module):
83 |     def __init__(self, **kwargs):
84 |         super(S2VNodeClassifier, self).__init__()
85 |         self.w_n2l = Parameter(torch.Tensor(kwargs['feature_dim'], kwargs['latent_dim']))
86 |         self.bias_n2l = Parameter(torch.Tensor(kwargs['latent_dim']))
87 |         self.conv_params = nn.Linear(kwargs['latent_dim'], kwargs['latent_dim'])
88 |         self.max_lv = kwargs['max_lv']
89 |         self.dropout_rate = kwargs['dropout']
90 |         self.last_weights = nn.Linear(kwargs['latent_dim'], kwargs['num_class'])
91 |         self.norm_tool = GraphNormTool(kwargs['adj_norm'], 'mean_field')
92 | 
93 |         weights_init(self)
94 | 
95 |     def forward(self, x, adj, node_selector=None, labels=None, avg_loss=True):
96 |         if x.data.is_sparse:
97 |             input_node_linear = gnn_spmm(x, self.w_n2l)
98 |         else:
99 |             input_node_linear = torch.mm(x, self.w_n2l)
100 |         input_node_linear += self.bias_n2l
101 | 
102 |         input_potential = F.relu(input_node_linear)
103 |         lv = 0
104 |         node_embed = input_potential
105 |         while lv < self.max_lv:  # max_lv rounds of structure2vec-style message passing
106 |             n2npool = gnn_spmm(adj, node_embed)
107 |             node_linear = self.conv_params(n2npool)
108 |             merged_linear = node_linear + input_node_linear
109 |             node_embed = F.relu(merged_linear) + node_embed
110 |             lv += 1
111 | 
112 |         logits = self.last_weights(node_embed)
113 |         logits = F.log_softmax(logits, dim=1)
114 | 
115 |         if node_selector is not None:
116 |             logits = logits[node_selector]
117 | 
118 |         pred = logits.data.max(1, keepdim=True)[1]  # computed before branching, so it also exists when labels is None
119 |         if labels is not None:
120 |             if node_selector is not None:
121 |                 labels = labels[node_selector]
122 |             loss = F.nll_loss(logits, labels, reduce=avg_loss)
123 |             acc = pred.eq(labels.data.view_as(pred)).cpu()
124 |             return pred, loss, acc
125 |         else:
126 |             return pred
-------------------------------------------------------------------------------- /code/node_classification/node_utils.py: --------------------------------------------------------------------------------
1 | import numpy as np
2 | import pickle as pkl
3 | import networkx as nx
4 | import torch
5 | import scipy.sparse as sp
6 | from scipy.sparse.linalg.eigen.arpack import eigsh
7 | import sys
8 | import os
9 | 
10 | sys.path.append('%s/../common/functions' % os.path.dirname(os.path.realpath(__file__)))
11 | sys.path.append('%s/../common' % os.path.dirname(os.path.realpath(__file__)))
12 | from functions.custom_func import GraphLaplacianNorm, GraphDegreeNorm
13 | from cmd_args import cmd_args
14 | 
15 | class StaticGraph(object):
16 |     graph = None
17 | 
18 |     @staticmethod
19 |     def get_gsize():
20 |         return torch.Size((len(StaticGraph.graph), len(StaticGraph.graph)))
21 | 
22 | class GraphNormTool(object):
23 |     def __init__(self, adj_norm, gm):
24 |         self.adj_norm = adj_norm
25 |         self.gm = gm
26 |         g = StaticGraph.graph
27 | 
28 |         edges = np.array(g.edges(), dtype=np.int64)  # (num_edges, 2) directed edge list
29 |         rev_edges = np.array([edges[:, 1], edges[:, 0]], dtype=np.int64)  # reversed copies symmetrize the adjacency
30 |         self_edges = np.array([range(len(g)), range(len(g))], dtype=np.int64)  # self-loops
31 | 
32 |         edges = np.hstack((edges.T, rev_edges, self_edges))
33 | 
34 |         idxes = torch.LongTensor(edges)
35 |         values = torch.ones(idxes.size()[1])
36 | 
37 |         self.raw_adj = torch.sparse.FloatTensor(idxes, values, StaticGraph.get_gsize())
38 |         if cmd_args.ctx == 'gpu':
39 |             self.raw_adj = self.raw_adj.cuda()
40 | 
41 |         self.normed_adj = self.raw_adj.clone()
42 |         if self.adj_norm:
43 |             if self.gm == 'gcn':
44 |                 GraphLaplacianNorm(self.normed_adj)
45 |             else:
46 |                 GraphDegreeNorm(self.normed_adj)
47 | 
48 |     def norm_extra(self, added_adj=None):
49 |         if added_adj is None:
50 |             return self.normed_adj
51 | 
52 |         new_adj = self.raw_adj + added_adj
53 |         if self.adj_norm:
54 |             if self.gm == 'gcn':
55 |                 GraphLaplacianNorm(new_adj)
56 |             else:
57 |                 GraphDegreeNorm(new_adj)
58 |         return new_adj
59 | 
60 | def parse_index_file(filename):
61 |     """Parse index file."""
62 |     index = []
63 |     for line in open(filename):
64 |         index.append(int(line.strip()))
65 |     return index
66 | 
67 | def load_raw_graph(data_folder, dataset_str):
68 |     bin_file = "{}/ind.{}.{}".format(data_folder, dataset_str, 'graph')
69 |     if os.path.isfile(bin_file):
70 |         with open(bin_file, 'rb') as f:
71 |             if sys.version_info > (3, 0):
72 |                 graph = pkl.load(f, encoding='latin1')
73 |             else:
74 |                 graph = pkl.load(f)
75 |     else:
76 |         txt_file = data_folder + '/adj_list.txt'
77 |         graph = {}
78 |         with open(txt_file, 'r') as f:
79 |             cur_idx = 0
80 |             for row in f:
81 |                 row = row.strip().split()
82 |                 adjs = []
83 |                 for j in range(1, len(row)):
84 |                     adjs.append(int(row[j]))
85 |                 graph[cur_idx] = adjs
86 |                 cur_idx += 1
87 | 
88 |     return graph
89 | 
90 | def load_binary_data(data_folder, dataset_str):
91 |     """Load data."""
92 |     names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
93 |     objects = []
94 |     for i in range(len(names)):
95 |         with open("{}/ind.{}.{}".format(data_folder, dataset_str, names[i]), 'rb') as f:
96 |             if sys.version_info > (3, 0):
97 |                 objects.append(pkl.load(f, encoding='latin1'))
98 |             else:
99 |                 objects.append(pkl.load(f))
100 | 
101 |     x, y, tx, ty, allx, ally, graph = tuple(objects)
102 |     test_idx_reorder = parse_index_file("{}/ind.{}.test.index".format(data_folder, dataset_str))
103 |     test_idx_range = np.sort(test_idx_reorder)
104 | 
105 |     if dataset_str == 'citeseer':
106 |         # Fix citeseer dataset (there are some isolated nodes in the graph)
107 |         # Find isolated nodes, add them as zero-vecs into the right position
108 |         test_idx_range_full = range(min(test_idx_reorder), max(test_idx_reorder)+1)
109 |         tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))
110 |         tx_extended[test_idx_range-min(test_idx_range), :] = tx
111 |         tx = tx_extended
112 |         ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))
113 |         ty_extended[test_idx_range-min(test_idx_range), :] = ty
114 |         ty = ty_extended
115 | 
116 |     features = sp.vstack((allx, tx)).tolil()
117 |     features[test_idx_reorder, :] = features[test_idx_range, :]
118 | 
119 |     StaticGraph.graph = nx.from_dict_of_lists(graph)
120 | 
121 |     labels = np.vstack((ally, ty))
122 |     labels[test_idx_reorder, :] = labels[test_idx_range, :]
123 | 
124 |     idx_test = test_idx_range.tolist()
125 |     idx_train = range(len(y))
126 |     idx_val = range(len(y), len(y)+500)
127 | 
128 |     cmd_args.feature_dim = features.shape[1]
129 |     cmd_args.num_class = labels.shape[1]
130 |     return preprocess_features(features), labels, idx_train, idx_val, idx_test
131 | 
132 | def load_txt_data(data_folder, dataset_str):
133 |     idx_train = list(np.loadtxt(data_folder + '/train_idx.txt', dtype=int))
134 |     idx_val = list(np.loadtxt(data_folder + '/val_idx.txt', dtype=int))
135 |     idx_test = list(np.loadtxt(data_folder + '/test_idx.txt', dtype=int))
136 |     labels = np.loadtxt(data_folder + '/label.txt')
137 | 
138 |     with open(data_folder + '/meta.txt', 'r') as f:
139 |         num_nodes, cmd_args.num_class, cmd_args.feature_dim = [int(w) for w in f.readline().strip().split()]
140 | 
141 |     graph = load_raw_graph(data_folder, dataset_str)
142 |     assert len(graph) == num_nodes
143 |     StaticGraph.graph = nx.from_dict_of_lists(graph)
144 | 
145 |     # features.txt is libsvm-like: each row starts with its nnz count, followed by col:val pairs
146 |     row_ptr = []
147 |     col_idx = []
148 |     vals = []
149 |     with open(data_folder + '/features.txt', 'r') as f:
150 |         nnz = 0
151 |         for row in f:
152 |             row = row.strip().split()
153 |             row_ptr.append(nnz)
154 |             for i in range(1, len(row)):
155 |                 w = row[i].split(':')
156 |                 col_idx.append(int(w[0]))
157 |                 vals.append(float(w[1]))
158 |             nnz += int(row[0])
159 |         row_ptr.append(nnz)
160 |     assert len(col_idx) == len(vals) and len(vals) == nnz and len(row_ptr) == num_nodes + 1
161 | 
162 |     features = sp.csr_matrix((vals, col_idx, row_ptr), shape=(num_nodes, cmd_args.feature_dim))
163 | 
164 |     return preprocess_features(features), labels, idx_train, idx_val, idx_test
165 | 
166 | def sparse_to_tuple(sparse_mx):
167 |     """Convert sparse matrix to tuple representation."""
168 |     def to_tuple(mx):
169 |         if not sp.isspmatrix_coo(mx):
170 |             mx = mx.tocoo()
171 |         coords = np.vstack((mx.row, mx.col)).transpose()
172 |         values = mx.data
173 |         shape = mx.shape
174 |         return coords, values, shape
175 | 
176 |     if isinstance(sparse_mx, list):
177 |         for i in range(len(sparse_mx)):
178 |             sparse_mx[i] = to_tuple(sparse_mx[i])
179 |     else:
180 |         sparse_mx = to_tuple(sparse_mx)
181 | 
182 |     return sparse_mx
183 | 
184 | def preprocess_features(features):
185 |     """Row-normalize the feature matrix and convert it to a torch sparse tensor."""
186 |     rowsum = np.array(features.sum(1))
187 |     r_inv = np.power(rowsum, -1).flatten()
188 |     r_inv[np.isinf(r_inv)] = 0.  # rows that sum to zero stay zero
189 |     r_mat_inv = sp.diags(r_inv)
190 |     features = r_mat_inv.dot(features)
191 |     sp_tuple = sparse_to_tuple(features)
192 |     idxes = torch.LongTensor(sp_tuple[0]).transpose(0, 1).contiguous()
193 |     values = torch.Tensor(sp_tuple[1].astype(np.float32))
194 | 
195 |     mat = torch.sparse.FloatTensor(idxes, values, torch.Size(sp_tuple[2]))
196 |     if cmd_args.ctx == 'gpu':
197 |         mat = mat.cuda()
198 |     return mat
199 | 
200 | def chebyshev_polynomials(adj, k):
201 |     """Calculate Chebyshev polynomials up to order k. Return a list of sparse matrices (tuple representation)."""
202 |     print("Calculating Chebyshev polynomials up to order {}...".format(k))
203 | 
204 |     adj_normalized = normalize_adj(adj)
205 |     laplacian = sp.eye(adj.shape[0]) - adj_normalized
206 |     largest_eigval, _ = eigsh(laplacian, 1, which='LM')
207 |     scaled_laplacian = (2. / largest_eigval[0]) * laplacian - sp.eye(adj.shape[0])
208 | 
209 |     t_k = list()
210 |     t_k.append(sp.eye(adj.shape[0]))
211 |     t_k.append(scaled_laplacian)
212 | 
213 |     def chebyshev_recurrence(t_k_minus_one, t_k_minus_two, scaled_lap):
214 |         s_lap = sp.csr_matrix(scaled_lap, copy=True)
215 |         return 2 * s_lap.dot(t_k_minus_one) - t_k_minus_two
216 | 
217 |     for i in range(2, k+1):
218 |         t_k.append(chebyshev_recurrence(t_k[-1], t_k[-2], scaled_laplacian))
219 | 
220 |     return sparse_to_tuple(t_k)
221 | 
222 | # NOTE: normalize_adj is called by chebyshev_polynomials above but was not defined in this file;
223 | # the helper below is assumed to be the standard symmetric D^{-1/2} A D^{-1/2} normalization from the GCN reference utils.
224 | def normalize_adj(adj):
225 |     """Symmetrically normalize adjacency matrix."""
226 |     adj = sp.coo_matrix(adj)
227 |     rowsum = np.array(adj.sum(1))
228 |     d_inv_sqrt = np.power(rowsum, -0.5).flatten()
229 |     d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
230 |     d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
231 |     return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()
232 | 
233 | def run_test(gcn, features, adj, idx_test, labels):
234 |     gcn.eval()
235 |     _, loss_test, acc_test = gcn(features, adj, idx_test, labels)
236 |     acc_test = acc_test.sum() / float(len(idx_test))
237 |     print("Test set results:",
238 |           "loss= {:.4f}".format(loss_test.data[0]),
239 |           "accuracy= {:.4f}".format(acc_test))
-------------------------------------------------------------------------------- /code/node_classification/run_gcn.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | dropbox=../../dropbox
4 | dataset=pubmed
5 | 
6 | data_folder=$dropbox/data
7 | gm=gcn
8 | del_rate=0.00
9 | output_root=$dropbox/scratch/results/node_classification/$dataset
10 | 
11 | lr=0.01
12 | max_lv=2
13 | num_epochs=200
14 | hidden=0
15 | 
16 | python gcn.py \
17 |     -data_folder $data_folder \
18 |     -del_rate $del_rate \
19 |     -dataset $dataset \
20 |     -gm $gm \
21 |     -hidden $hidden \
22 |     -save_dir $output_root \
23 |     -num_epochs $num_epochs \
24 |     -max_lv $max_lv \
25 |     -learning_rate $lr \
26 |     "$@"  # quoted so extra flags (e.g. -phase test) are forwarded intact to gcn.py
--------------------------------------------------------------------------------
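A usage sketch (not part of the repo): because run_gcn.sh forwards extra arguments to gcn.py via "$@", the same script can evaluate a saved checkpoint instead of retraining. The flag names -phase and -saved_model below are assumed to match what cmd_args/gcn.py consume above; the checkpoint path follows the model-%s-epoch-best-%.2f naming used when saving, and gcn.py appends '.model' and '-args.pkl' itself:

    cd code/node_classification
    ./run_gcn.sh -phase test -saved_model ../../dropbox/scratch/results/node_classification/pubmed/model-gcn-epoch-best-0.00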