├── .gitignore ├── README.md ├── config.py ├── dataset └── AIDS │ ├── AIDS_A.txt │ ├── AIDS_edge_labels.txt │ ├── AIDS_graph_indicator.txt │ ├── AIDS_graph_labels.txt │ ├── AIDS_label_readme.txt │ ├── AIDS_node_attributes.txt │ └── AIDS_node_labels.txt ├── main ├── attack.py ├── benign.py └── example.sh ├── model ├── gat.py ├── gcn.py └── sage.py ├── trojan ├── GTA.py ├── __init__.py ├── input.py └── prop.py └── utils ├── batch.py ├── bkdcdd.py ├── datareader.py ├── graph.py └── mask.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Configuration files 2 | .vscode 3 | *.pyc 4 | 5 | # Temp files 6 | __pycache__ 7 | .ipynb_checkpoints 8 | 9 | # Scripts 10 | *.log 11 | archive 12 | config 13 | utils_org 14 | prepare 15 | save 16 | main/android 17 | trojan/transGTA.py 18 | # git rm -rf --cached . -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GraphBackdoor 2 | 3 | This is a light-weight implementation of our **USENIX Security'21** paper **[Graph Backdoor](https://arxiv.org/abs/2006.11890)**. To make it convenient for related projects, we simplify the following components for higher running efficiency: 4 | 5 | - **GNNs**: we now use a DGL-based framework to implement our GNNs, which gives lower memory usage and faster running speed. For more information about DGL, see **Useful resources**. 6 | - **graph encoding**: using a pretrained attention network incurs additional time cost. We find that directly aggregating input-space (feature/topology) matrices also leads to a good input representation. Please see `./trojan/input.py`. 7 | - **blending function**: re-searching a subgraph in which to blend the trigger is costly, especially on large graphs. Instead, one can always blend a generated trigger into a fixed region. 8 | - **optimization objective**: we find that output-end optimization (based on labels) achieves attack efficacy similar to optimizing over intermediate activations, while significantly simplifying the implementation. Thus we switch to a label-level objective. 9 | 10 | If you aim to compare your novel attacks against this work, or to develop a defense against it, feel free to use this release in your work: it is easier to access and more efficient to run. 11 | 12 | ## Guide 13 | 14 | We organize the structure of our files as follows: 15 | ```latex 16 | . 17 | ├── dataset/ # keep all original datasets you may use 18 | ├── main/ 19 | │   ├── attack.py # end-to-end attack codes 20 | │   ├── benign.py # benign training/evaluation codes 21 | │ └── example.sh # examples of running commands 22 | ├── model/ 23 | │   ├── gcn.py # dgl-based GCN 24 | │   └── sage.py # dgl-based GraphSAGE 25 | ├── save/ # temporary dir to save your trained models/perturbed data 26 | ├── utils/ 27 | │   ├── batch.py # collate_batch function 28 | │   ├── bkdcdd.py # codes to select victim graphs and trigger regions 29 | │ ├── datareader.py # data loader codes 30 | │   ├── graph.py # simple utility function(s) related to graph processing 31 | │   └── mask.py # the mask functions to scale graphs into same size or scale back 32 | └── config.py  # all configurations 33 | 34 | ``` 35 | 36 | ## Required packages 37 | - torch 1.5.1 38 | - dgl 0.4.2 39 | 40 | 41 | ## Useful resources 42 | - [TU graph set](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets): most of our datasets come from this source. In some cases we need to modify a graph set, e.g., removing classes that do not have enough instances or removing graphs with too few nodes. 43 | - [DGL](https://docs.dgl.ai): we use DGL to implement our GNNs in this released version because it provides efficient implementations such as [GCN](https://docs.dgl.ai/en/0.6.x/tutorials/models/1_gnn/1_gcn.html), [GAT](https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/9_gat.html), [GraphSAGE](https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/model.py). 44 | - [TU graph datareader](https://github.com/bknyaz/graph_nn/blob/master/graph_unet.py): this repo implements a data loader that processes TU graph datasets in their raw storage format. Our `./utils/datareader.py` and `./utils/batch.py` contain the modified code, and we appreciate the authors' efforts! 45 | 46 | 47 | ## Run the code 48 | You can directly run the attack by `python -u ./main/attack.py --use_org_node_attr --train_verbose --dataset <dataset_name> --target_class <target_class>`. We put some example commands in `./main/example.sh`.
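For reference, a minimal pair of commands (mirroring `./main/example.sh`; the `AIDS` dataset and target class `0` below are only example values) could look like:

```bash
cd main
# benign (clean) training and evaluation only
python -u benign.py --dataset AIDS --use_org_node_attr --train_verbose

# end-to-end backdoor attack; attack.py first trains a benign GNN internally
python -u attack.py --dataset AIDS --use_org_node_attr --train_verbose --target_class 0
```
All available flags are defined in `config.py`.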
49 | 50 | 51 | ## Cite 52 | Please cite our paper if it is helpful in your own work: 53 | ``` 54 | @inproceedings{xi2021graph, 55 | title={Graph backdoor}, 56 | author={Xi, Zhaohan and Pang, Ren and Ji, Shouling and Wang, Ting}, 57 | booktitle={30th $\{$USENIX$\}$ Security Symposium ($\{$USENIX$\}$ Security 21)}, 58 | year={2021} 59 | } 60 | ``` -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | def add_data_group(group): 4 | group.add_argument('--seed', type=int, default=123) 5 | group.add_argument('--dataset', type=str, default='AIDS', help="used dataset") 6 | group.add_argument('--data_path', type=str, default='../dataset', help="the directory used to save dataset") 7 | group.add_argument('--use_nlabel_asfeat', action='store_true', help="use node labels as (part of) node features") 8 | group.add_argument('--use_org_node_attr', action='store_true', help="use node attributes as (part of) node features") 9 | group.add_argument('--use_degree_asfeat', action='store_true', help="use node degrees as (part of) node features") 10 | group.add_argument('--data_verbose', action='store_true', help="print detailed dataset info") 11 | group.add_argument('--save_data', action='store_true') 12 | 13 | 14 | def add_model_group(group): 15 | group.add_argument('--model', type=str, default='gcn', help="used model") 16 | group.add_argument('--train_ratio', type=float, default=0.5, help="ratio of trainset from whole dataset") 17 | group.add_argument('--hidden_dim', nargs='+', default=[64, 16], type=int, help='hidden layer dimensions') 18 | group.add_argument('--num_head', type=int, default=2, help="GAT head number") 19 | 20 | group.add_argument('--batch_size', type=int, default=16) 21 | group.add_argument('--train_epochs', type=int, default=40) 22 | group.add_argument('--lr', type=float, default=0.01) 23 | group.add_argument('--lr_decay_steps', nargs='+', default=[25, 35], type=int) 24 | group.add_argument('--weight_decay', type=float, default=5e-4) 25 | group.add_argument('--dropout', type=float, default=0.5) 26 | group.add_argument('--train_verbose', action='store_true', help="print training details") 27 | group.add_argument('--log_every', type=int, default=1, help='print every x epoch') 28 | group.add_argument('--eval_every', type=int, default=5, help='evaluate every x epoch') 29 | 30
group.add_argument('--clean_model_save_path', type=str, default='../save/model/clean') 31 | group.add_argument('--save_clean_model', action='store_true') 32 | 33 | def add_atk_group(group): 34 | group.add_argument('--bkd_gratio_train', type=float, default=0.1, help="backdoor graph ratio in trainset") 35 | group.add_argument('--bkd_gratio_test', type=float, default=0.5, help="backdoor graph ratio in testset") 36 | group.add_argument('--bkd_num_pergraph', type=int, default=1, help="number of backdoor triggers per graph") 37 | group.add_argument('--bkd_size', type=int, default=5, help="number of nodes for each trigger") 38 | group.add_argument('--target_class', type=int, default=None, help="the targeted node/graph label") 39 | 40 | group.add_argument('--gtn_layernum', type=int, default=3, help="layer number of GraphTrojanNet") 41 | group.add_argument('--pn_rate', type=float, default=1, help="ratio between trigger-embedded graphs (positive) and benign ones (negative)") 42 | group.add_argument('--gtn_input_type', type=str, default='2hop', help="how to process org graphs before inputting to GTN") 43 | 44 | group.add_argument('--resample_steps', type=int, default=3, help="# iterations to re-select graph samples") 45 | group.add_argument('--bilevel_steps', type=int, default=4, help="# bi-level optimization iterations") 46 | group.add_argument('--gtn_lr', type=float, default=0.01) 47 | group.add_argument('--gtn_epochs', type=int, default=20, help="# attack epochs") 48 | group.add_argument('--topo_activation', type=str, default='sigmoid', help="activation function for topology generator") 49 | group.add_argument('--feat_activation', type=str, default='relu', help="activation function for feature generator") 50 | group.add_argument('--topo_thrd', type=float, default=0.5, help="threshold for topology generator") 51 | group.add_argument('--feat_thrd', type=float, default=0, help="threshold for feature generator (only useful for binary feature)") 52 | 53 | group.add_argument('--lambd', type=float, default=1, help="a hyperparameter to balance attack loss components") 54 | # group.add_argument('--atk_verbose', action='store_true', help="print attack details") 55 | group.add_argument('--save_bkd_model', action='store_true') 56 | group.add_argument('--bkd_model_save_path', type=str, default='../save/model/bkd') 57 | 58 | def parse_args(): 59 | parser = argparse.ArgumentParser() 60 | data_group = parser.add_argument_group(title="Data-related configuration") 61 | model_group = parser.add_argument_group(title="Model-related configuration") 62 | atk_group = parser.add_argument_group(title="Attack-related configuration") 63 | 64 | add_data_group(data_group) 65 | add_model_group(model_group) 66 | add_atk_group(atk_group) 67 | 68 | return parser.parse_args() 69 | -------------------------------------------------------------------------------- /dataset/AIDS/AIDS_graph_labels.txt: -------------------------------------------------------------------------------- 1 | 0 2 | 1 3 | 1 4 | 1 5 | 0 6 | 1 7 | 1 8 | 1 9 | 0 10 | 1 11 | 0 12 | 1 13 | 1 14 | 1 15 | 0 16 | 1 17 | 1 18 | 1 19 | 0 20 | 1 21 | 1 22 | 1 23 | 1 24 | 1 25 | 1 26 | 1 27 | 1 28 | 0 29 | 1 30 | 1 31 | 1 32 | 1 33 | 1 34 | 1 35 | 1 36 | 0 37 | 1 38 | 1 39 | 1 40 | 1 41 | 1 42 | 1 43 | 0 44 | 0 45 | 1 46 | 1 47 | 1 48 | 1 49 | 1 50 | 1 51 | 1 52 | 1 53 | 1 54 | 1 55 | 1 56 | 1 57 | 0 58 | 1 59 | 0 60 | 1 61 | 1 62 | 1 63 | 1 64 | 0 65 | 1 66 | 1 67 | 1 68 | 1 69 | 0 70 | 1 71 | 1 72 | 1 73 | 1 74 | 1 75 | 1 76 | 1 77 | 0 78 | 1 79 | 1 80 | 1 81 | 1 82 | 1 
83 | 1 84 | 1 85 | 1 86 | 1 87 | 1 88 | 1 89 | 1 90 | 1 91 | 1 92 | 1 93 | 1 94 | 1 95 | 1 96 | 1 97 | 1 98 | 1 99 | 0 100 | 1 101 | 1 102 | 1 103 | 1 104 | 1 105 | 1 106 | 1 107 | 1 108 | 1 109 | 1 110 | 1 111 | 1 112 | 0 113 | 1 114 | 0 115 | 1 116 | 1 117 | 1 118 | 1 119 | 1 120 | 1 121 | 0 122 | 1 123 | 1 124 | 1 125 | 1 126 | 0 127 | 1 128 | 1 129 | 1 130 | 1 131 | 1 132 | 0 133 | 1 134 | 1 135 | 1 136 | 1 137 | 1 138 | 0 139 | 1 140 | 0 141 | 1 142 | 0 143 | 1 144 | 1 145 | 1 146 | 0 147 | 1 148 | 1 149 | 1 150 | 1 151 | 0 152 | 1 153 | 1 154 | 1 155 | 1 156 | 0 157 | 1 158 | 1 159 | 1 160 | 0 161 | 1 162 | 1 163 | 0 164 | 1 165 | 1 166 | 1 167 | 0 168 | 0 169 | 1 170 | 1 171 | 1 172 | 0 173 | 0 174 | 0 175 | 1 176 | 1 177 | 1 178 | 1 179 | 1 180 | 1 181 | 1 182 | 1 183 | 1 184 | 1 185 | 0 186 | 1 187 | 1 188 | 0 189 | 0 190 | 1 191 | 1 192 | 1 193 | 1 194 | 1 195 | 0 196 | 1 197 | 1 198 | 1 199 | 1 200 | 1 201 | 1 202 | 1 203 | 1 204 | 0 205 | 1 206 | 0 207 | 1 208 | 1 209 | 0 210 | 1 211 | 1 212 | 1 213 | 1 214 | 1 215 | 1 216 | 1 217 | 1 218 | 1 219 | 1 220 | 1 221 | 0 222 | 1 223 | 1 224 | 1 225 | 1 226 | 1 227 | 1 228 | 0 229 | 1 230 | 1 231 | 0 232 | 1 233 | 1 234 | 1 235 | 1 236 | 1 237 | 0 238 | 1 239 | 0 240 | 1 241 | 1 242 | 0 243 | 1 244 | 1 245 | 1 246 | 0 247 | 1 248 | 1 249 | 1 250 | 1 251 | 1 252 | 1 253 | 1 254 | 1 255 | 1 256 | 1 257 | 1 258 | 1 259 | 0 260 | 1 261 | 1 262 | 0 263 | 1 264 | 1 265 | 1 266 | 1 267 | 0 268 | 1 269 | 1 270 | 1 271 | 1 272 | 1 273 | 1 274 | 1 275 | 1 276 | 1 277 | 1 278 | 1 279 | 1 280 | 1 281 | 1 282 | 1 283 | 1 284 | 1 285 | 1 286 | 1 287 | 0 288 | 1 289 | 1 290 | 1 291 | 0 292 | 1 293 | 1 294 | 1 295 | 1 296 | 1 297 | 1 298 | 1 299 | 1 300 | 1 301 | 1 302 | 1 303 | 1 304 | 1 305 | 1 306 | 0 307 | 0 308 | 1 309 | 1 310 | 1 311 | 1 312 | 1 313 | 1 314 | 1 315 | 1 316 | 1 317 | 1 318 | 1 319 | 1 320 | 1 321 | 1 322 | 1 323 | 1 324 | 1 325 | 1 326 | 1 327 | 1 328 | 1 329 | 1 330 | 1 331 | 0 332 | 1 333 | 1 334 | 0 335 | 1 336 | 1 337 | 1 338 | 1 339 | 1 340 | 0 341 | 1 342 | 1 343 | 0 344 | 0 345 | 1 346 | 1 347 | 1 348 | 1 349 | 1 350 | 1 351 | 1 352 | 1 353 | 1 354 | 1 355 | 1 356 | 0 357 | 1 358 | 1 359 | 1 360 | 1 361 | 0 362 | 1 363 | 1 364 | 0 365 | 0 366 | 1 367 | 1 368 | 1 369 | 1 370 | 1 371 | 1 372 | 1 373 | 1 374 | 1 375 | 1 376 | 1 377 | 1 378 | 1 379 | 1 380 | 1 381 | 1 382 | 1 383 | 0 384 | 0 385 | 1 386 | 1 387 | 1 388 | 1 389 | 1 390 | 1 391 | 0 392 | 1 393 | 0 394 | 1 395 | 1 396 | 1 397 | 1 398 | 1 399 | 0 400 | 1 401 | 1 402 | 1 403 | 1 404 | 1 405 | 0 406 | 1 407 | 0 408 | 0 409 | 1 410 | 1 411 | 0 412 | 1 413 | 1 414 | 1 415 | 1 416 | 1 417 | 0 418 | 1 419 | 1 420 | 1 421 | 1 422 | 0 423 | 1 424 | 1 425 | 0 426 | 1 427 | 1 428 | 1 429 | 1 430 | 0 431 | 1 432 | 1 433 | 1 434 | 0 435 | 1 436 | 0 437 | 1 438 | 1 439 | 1 440 | 1 441 | 1 442 | 1 443 | 1 444 | 0 445 | 1 446 | 1 447 | 1 448 | 1 449 | 1 450 | 1 451 | 1 452 | 1 453 | 1 454 | 0 455 | 1 456 | 0 457 | 1 458 | 1 459 | 0 460 | 1 461 | 1 462 | 0 463 | 1 464 | 1 465 | 1 466 | 1 467 | 1 468 | 1 469 | 1 470 | 0 471 | 0 472 | 1 473 | 0 474 | 1 475 | 1 476 | 1 477 | 1 478 | 1 479 | 1 480 | 1 481 | 1 482 | 1 483 | 1 484 | 1 485 | 1 486 | 1 487 | 1 488 | 1 489 | 1 490 | 0 491 | 1 492 | 1 493 | 0 494 | 1 495 | 1 496 | 1 497 | 1 498 | 1 499 | 1 500 | 1 501 | 0 502 | 1 503 | 1 504 | 1 505 | 0 506 | 1 507 | 1 508 | 1 509 | 0 510 | 1 511 | 1 512 | 1 513 | 1 514 | 1 515 | 1 516 | 1 517 | 1 518 | 0 519 | 1 520 | 1 521 | 1 522 | 0 523 | 1 524 | 1 525 | 1 526 | 1 527 | 1 528 | 0 529 
| 1 530 | 1 531 | 1 532 | 1 533 | 1 534 | 1 535 | 1 536 | 1 537 | 0 538 | 0 539 | 1 540 | 0 541 | 1 542 | 1 543 | 1 544 | 1 545 | 1 546 | 0 547 | 1 548 | 1 549 | 1 550 | 1 551 | 1 552 | 1 553 | 0 554 | 1 555 | 1 556 | 0 557 | 1 558 | 1 559 | 1 560 | 1 561 | 1 562 | 1 563 | 1 564 | 1 565 | 1 566 | 1 567 | 1 568 | 1 569 | 1 570 | 1 571 | 0 572 | 0 573 | 1 574 | 1 575 | 0 576 | 1 577 | 1 578 | 1 579 | 1 580 | 1 581 | 1 582 | 1 583 | 1 584 | 0 585 | 0 586 | 1 587 | 1 588 | 1 589 | 1 590 | 0 591 | 1 592 | 1 593 | 1 594 | 1 595 | 1 596 | 1 597 | 1 598 | 1 599 | 1 600 | 0 601 | 1 602 | 0 603 | 1 604 | 0 605 | 1 606 | 1 607 | 0 608 | 1 609 | 1 610 | 1 611 | 1 612 | 0 613 | 1 614 | 1 615 | 1 616 | 1 617 | 1 618 | 1 619 | 0 620 | 0 621 | 1 622 | 0 623 | 1 624 | 0 625 | 0 626 | 0 627 | 1 628 | 1 629 | 1 630 | 0 631 | 1 632 | 0 633 | 1 634 | 0 635 | 1 636 | 1 637 | 1 638 | 1 639 | 1 640 | 1 641 | 1 642 | 1 643 | 1 644 | 0 645 | 0 646 | 1 647 | 0 648 | 1 649 | 1 650 | 1 651 | 1 652 | 0 653 | 1 654 | 1 655 | 1 656 | 1 657 | 1 658 | 1 659 | 1 660 | 1 661 | 0 662 | 1 663 | 0 664 | 1 665 | 1 666 | 1 667 | 0 668 | 1 669 | 1 670 | 1 671 | 1 672 | 0 673 | 1 674 | 1 675 | 1 676 | 1 677 | 1 678 | 1 679 | 1 680 | 1 681 | 1 682 | 1 683 | 1 684 | 1 685 | 1 686 | 1 687 | 1 688 | 1 689 | 0 690 | 1 691 | 1 692 | 1 693 | 1 694 | 1 695 | 1 696 | 1 697 | 1 698 | 1 699 | 1 700 | 1 701 | 1 702 | 0 703 | 1 704 | 1 705 | 0 706 | 1 707 | 1 708 | 1 709 | 1 710 | 0 711 | 1 712 | 1 713 | 1 714 | 1 715 | 1 716 | 1 717 | 1 718 | 1 719 | 1 720 | 1 721 | 0 722 | 0 723 | 1 724 | 1 725 | 1 726 | 1 727 | 1 728 | 1 729 | 1 730 | 1 731 | 0 732 | 0 733 | 0 734 | 0 735 | 0 736 | 1 737 | 1 738 | 1 739 | 1 740 | 1 741 | 1 742 | 1 743 | 1 744 | 1 745 | 1 746 | 1 747 | 1 748 | 1 749 | 1 750 | 1 751 | 1 752 | 0 753 | 0 754 | 1 755 | 1 756 | 1 757 | 1 758 | 1 759 | 1 760 | 1 761 | 1 762 | 1 763 | 1 764 | 1 765 | 1 766 | 1 767 | 1 768 | 0 769 | 1 770 | 1 771 | 1 772 | 1 773 | 0 774 | 1 775 | 1 776 | 1 777 | 1 778 | 1 779 | 1 780 | 1 781 | 1 782 | 1 783 | 1 784 | 1 785 | 1 786 | 1 787 | 0 788 | 1 789 | 1 790 | 1 791 | 1 792 | 1 793 | 1 794 | 1 795 | 1 796 | 0 797 | 1 798 | 1 799 | 1 800 | 1 801 | 0 802 | 1 803 | 1 804 | 1 805 | 1 806 | 0 807 | 1 808 | 1 809 | 1 810 | 1 811 | 1 812 | 1 813 | 1 814 | 0 815 | 0 816 | 0 817 | 1 818 | 0 819 | 0 820 | 1 821 | 1 822 | 1 823 | 1 824 | 1 825 | 1 826 | 0 827 | 0 828 | 1 829 | 1 830 | 1 831 | 0 832 | 1 833 | 0 834 | 1 835 | 1 836 | 1 837 | 1 838 | 1 839 | 1 840 | 0 841 | 1 842 | 1 843 | 1 844 | 0 845 | 1 846 | 1 847 | 1 848 | 0 849 | 0 850 | 1 851 | 1 852 | 0 853 | 1 854 | 1 855 | 1 856 | 1 857 | 1 858 | 1 859 | 1 860 | 0 861 | 1 862 | 1 863 | 1 864 | 0 865 | 1 866 | 1 867 | 0 868 | 0 869 | 1 870 | 1 871 | 1 872 | 0 873 | 1 874 | 1 875 | 1 876 | 1 877 | 0 878 | 1 879 | 1 880 | 1 881 | 1 882 | 1 883 | 1 884 | 1 885 | 1 886 | 1 887 | 1 888 | 0 889 | 1 890 | 1 891 | 1 892 | 0 893 | 1 894 | 1 895 | 0 896 | 1 897 | 1 898 | 1 899 | 1 900 | 0 901 | 1 902 | 0 903 | 1 904 | 1 905 | 1 906 | 0 907 | 1 908 | 0 909 | 1 910 | 1 911 | 1 912 | 1 913 | 0 914 | 1 915 | 1 916 | 1 917 | 1 918 | 1 919 | 1 920 | 0 921 | 1 922 | 1 923 | 1 924 | 0 925 | 1 926 | 1 927 | 1 928 | 1 929 | 1 930 | 1 931 | 1 932 | 1 933 | 0 934 | 1 935 | 1 936 | 1 937 | 1 938 | 1 939 | 1 940 | 0 941 | 1 942 | 0 943 | 0 944 | 1 945 | 0 946 | 1 947 | 1 948 | 1 949 | 1 950 | 1 951 | 1 952 | 1 953 | 0 954 | 1 955 | 1 956 | 1 957 | 1 958 | 1 959 | 1 960 | 0 961 | 1 962 | 0 963 | 0 964 | 0 965 | 1 966 | 1 967 | 1 968 | 1 969 | 1 970 | 1 971 | 1 972 | 1 973 | 
1 974 | 1 975 | 1 976 | 0 977 | 1 978 | 1 979 | 0 980 | 1 981 | 1 982 | 1 983 | 0 984 | 1 985 | 1 986 | 1 987 | 1 988 | 1 989 | 1 990 | 1 991 | 1 992 | 1 993 | 1 994 | 1 995 | 1 996 | 0 997 | 1 998 | 1 999 | 0 1000 | 0 1001 | 1 1002 | 0 1003 | 1 1004 | 1 1005 | 1 1006 | 1 1007 | 1 1008 | 1 1009 | 1 1010 | 0 1011 | 0 1012 | 1 1013 | 1 1014 | 1 1015 | 1 1016 | 1 1017 | 1 1018 | 1 1019 | 1 1020 | 1 1021 | 1 1022 | 1 1023 | 1 1024 | 1 1025 | 0 1026 | 1 1027 | 1 1028 | 0 1029 | 1 1030 | 1 1031 | 1 1032 | 1 1033 | 1 1034 | 1 1035 | 1 1036 | 1 1037 | 1 1038 | 1 1039 | 1 1040 | 1 1041 | 1 1042 | 1 1043 | 1 1044 | 1 1045 | 1 1046 | 1 1047 | 1 1048 | 1 1049 | 1 1050 | 1 1051 | 1 1052 | 1 1053 | 1 1054 | 1 1055 | 0 1056 | 0 1057 | 0 1058 | 1 1059 | 1 1060 | 1 1061 | 1 1062 | 1 1063 | 1 1064 | 1 1065 | 1 1066 | 1 1067 | 1 1068 | 1 1069 | 1 1070 | 1 1071 | 1 1072 | 1 1073 | 1 1074 | 1 1075 | 1 1076 | 1 1077 | 0 1078 | 1 1079 | 1 1080 | 1 1081 | 1 1082 | 1 1083 | 0 1084 | 1 1085 | 1 1086 | 1 1087 | 1 1088 | 1 1089 | 0 1090 | 1 1091 | 1 1092 | 0 1093 | 0 1094 | 0 1095 | 1 1096 | 1 1097 | 1 1098 | 0 1099 | 1 1100 | 1 1101 | 0 1102 | 1 1103 | 1 1104 | 0 1105 | 1 1106 | 1 1107 | 0 1108 | 1 1109 | 1 1110 | 0 1111 | 1 1112 | 1 1113 | 1 1114 | 1 1115 | 1 1116 | 1 1117 | 0 1118 | 1 1119 | 0 1120 | 0 1121 | 1 1122 | 1 1123 | 1 1124 | 0 1125 | 1 1126 | 1 1127 | 1 1128 | 1 1129 | 1 1130 | 1 1131 | 1 1132 | 1 1133 | 1 1134 | 1 1135 | 1 1136 | 1 1137 | 1 1138 | 1 1139 | 0 1140 | 1 1141 | 1 1142 | 1 1143 | 1 1144 | 1 1145 | 1 1146 | 1 1147 | 1 1148 | 1 1149 | 0 1150 | 0 1151 | 1 1152 | 1 1153 | 1 1154 | 0 1155 | 1 1156 | 1 1157 | 1 1158 | 0 1159 | 1 1160 | 1 1161 | 1 1162 | 0 1163 | 1 1164 | 1 1165 | 1 1166 | 0 1167 | 1 1168 | 0 1169 | 1 1170 | 0 1171 | 0 1172 | 1 1173 | 1 1174 | 1 1175 | 1 1176 | 0 1177 | 1 1178 | 1 1179 | 1 1180 | 1 1181 | 1 1182 | 1 1183 | 1 1184 | 1 1185 | 1 1186 | 0 1187 | 1 1188 | 1 1189 | 1 1190 | 0 1191 | 1 1192 | 1 1193 | 1 1194 | 1 1195 | 0 1196 | 1 1197 | 1 1198 | 1 1199 | 0 1200 | 1 1201 | 1 1202 | 1 1203 | 1 1204 | 1 1205 | 1 1206 | 1 1207 | 1 1208 | 1 1209 | 1 1210 | 1 1211 | 0 1212 | 1 1213 | 1 1214 | 1 1215 | 1 1216 | 1 1217 | 1 1218 | 1 1219 | 1 1220 | 1 1221 | 1 1222 | 1 1223 | 1 1224 | 1 1225 | 0 1226 | 1 1227 | 1 1228 | 1 1229 | 0 1230 | 1 1231 | 1 1232 | 0 1233 | 1 1234 | 1 1235 | 1 1236 | 1 1237 | 1 1238 | 0 1239 | 1 1240 | 0 1241 | 1 1242 | 1 1243 | 1 1244 | 1 1245 | 0 1246 | 1 1247 | 1 1248 | 1 1249 | 1 1250 | 1 1251 | 1 1252 | 0 1253 | 0 1254 | 1 1255 | 1 1256 | 0 1257 | 1 1258 | 1 1259 | 0 1260 | 1 1261 | 1 1262 | 1 1263 | 1 1264 | 1 1265 | 0 1266 | 1 1267 | 1 1268 | 0 1269 | 1 1270 | 1 1271 | 0 1272 | 0 1273 | 1 1274 | 1 1275 | 1 1276 | 1 1277 | 1 1278 | 0 1279 | 1 1280 | 1 1281 | 1 1282 | 1 1283 | 1 1284 | 1 1285 | 1 1286 | 0 1287 | 1 1288 | 1 1289 | 1 1290 | 1 1291 | 1 1292 | 1 1293 | 1 1294 | 1 1295 | 0 1296 | 1 1297 | 1 1298 | 1 1299 | 0 1300 | 0 1301 | 0 1302 | 1 1303 | 1 1304 | 1 1305 | 1 1306 | 1 1307 | 1 1308 | 1 1309 | 1 1310 | 1 1311 | 1 1312 | 1 1313 | 1 1314 | 1 1315 | 0 1316 | 1 1317 | 1 1318 | 1 1319 | 1 1320 | 1 1321 | 1 1322 | 1 1323 | 1 1324 | 0 1325 | 1 1326 | 1 1327 | 1 1328 | 1 1329 | 1 1330 | 0 1331 | 1 1332 | 1 1333 | 0 1334 | 0 1335 | 0 1336 | 0 1337 | 1 1338 | 1 1339 | 1 1340 | 1 1341 | 1 1342 | 1 1343 | 1 1344 | 1 1345 | 1 1346 | 0 1347 | 1 1348 | 1 1349 | 1 1350 | 1 1351 | 1 1352 | 1 1353 | 0 1354 | 1 1355 | 1 1356 | 1 1357 | 1 1358 | 0 1359 | 1 1360 | 0 1361 | 1 1362 | 1 1363 | 1 1364 | 1 1365 | 1 1366 | 1 1367 | 0 1368 | 0 1369 | 1 1370 | 0 1371 
| 0 1372 | 0 1373 | 1 1374 | 1 1375 | 1 1376 | 0 1377 | 1 1378 | 1 1379 | 1 1380 | 1 1381 | 1 1382 | 1 1383 | 0 1384 | 1 1385 | 0 1386 | 1 1387 | 1 1388 | 1 1389 | 1 1390 | 1 1391 | 1 1392 | 1 1393 | 1 1394 | 1 1395 | 1 1396 | 1 1397 | 1 1398 | 1 1399 | 0 1400 | 1 1401 | 1 1402 | 1 1403 | 1 1404 | 1 1405 | 1 1406 | 1 1407 | 0 1408 | 1 1409 | 1 1410 | 1 1411 | 1 1412 | 1 1413 | 1 1414 | 1 1415 | 1 1416 | 1 1417 | 0 1418 | 1 1419 | 1 1420 | 1 1421 | 1 1422 | 1 1423 | 0 1424 | 1 1425 | 1 1426 | 1 1427 | 1 1428 | 1 1429 | 1 1430 | 1 1431 | 1 1432 | 1 1433 | 0 1434 | 1 1435 | 1 1436 | 1 1437 | 1 1438 | 0 1439 | 1 1440 | 1 1441 | 1 1442 | 0 1443 | 1 1444 | 1 1445 | 1 1446 | 1 1447 | 0 1448 | 1 1449 | 1 1450 | 1 1451 | 1 1452 | 1 1453 | 0 1454 | 1 1455 | 0 1456 | 1 1457 | 1 1458 | 0 1459 | 1 1460 | 1 1461 | 1 1462 | 1 1463 | 1 1464 | 1 1465 | 1 1466 | 0 1467 | 1 1468 | 1 1469 | 1 1470 | 1 1471 | 1 1472 | 1 1473 | 1 1474 | 0 1475 | 1 1476 | 1 1477 | 1 1478 | 1 1479 | 1 1480 | 0 1481 | 1 1482 | 0 1483 | 1 1484 | 1 1485 | 1 1486 | 1 1487 | 1 1488 | 1 1489 | 0 1490 | 1 1491 | 1 1492 | 0 1493 | 1 1494 | 0 1495 | 1 1496 | 1 1497 | 1 1498 | 1 1499 | 0 1500 | 1 1501 | 1 1502 | 1 1503 | 1 1504 | 1 1505 | 1 1506 | 0 1507 | 1 1508 | 1 1509 | 1 1510 | 1 1511 | 1 1512 | 1 1513 | 0 1514 | 0 1515 | 1 1516 | 0 1517 | 0 1518 | 1 1519 | 1 1520 | 1 1521 | 1 1522 | 1 1523 | 0 1524 | 1 1525 | 0 1526 | 1 1527 | 1 1528 | 0 1529 | 1 1530 | 1 1531 | 1 1532 | 0 1533 | 1 1534 | 1 1535 | 0 1536 | 1 1537 | 1 1538 | 0 1539 | 0 1540 | 1 1541 | 1 1542 | 0 1543 | 1 1544 | 1 1545 | 1 1546 | 1 1547 | 1 1548 | 1 1549 | 1 1550 | 1 1551 | 1 1552 | 0 1553 | 1 1554 | 1 1555 | 1 1556 | 0 1557 | 0 1558 | 1 1559 | 0 1560 | 1 1561 | 1 1562 | 1 1563 | 0 1564 | 0 1565 | 1 1566 | 0 1567 | 1 1568 | 1 1569 | 0 1570 | 1 1571 | 1 1572 | 0 1573 | 1 1574 | 1 1575 | 1 1576 | 0 1577 | 1 1578 | 1 1579 | 1 1580 | 1 1581 | 1 1582 | 1 1583 | 1 1584 | 0 1585 | 0 1586 | 0 1587 | 1 1588 | 1 1589 | 1 1590 | 1 1591 | 1 1592 | 1 1593 | 0 1594 | 1 1595 | 0 1596 | 1 1597 | 1 1598 | 1 1599 | 1 1600 | 1 1601 | 1 1602 | 0 1603 | 1 1604 | 1 1605 | 1 1606 | 1 1607 | 1 1608 | 0 1609 | 1 1610 | 1 1611 | 1 1612 | 1 1613 | 1 1614 | 1 1615 | 1 1616 | 0 1617 | 1 1618 | 1 1619 | 1 1620 | 1 1621 | 1 1622 | 1 1623 | 1 1624 | 1 1625 | 1 1626 | 1 1627 | 0 1628 | 1 1629 | 1 1630 | 1 1631 | 1 1632 | 1 1633 | 1 1634 | 0 1635 | 0 1636 | 1 1637 | 1 1638 | 1 1639 | 1 1640 | 1 1641 | 1 1642 | 1 1643 | 1 1644 | 0 1645 | 1 1646 | 1 1647 | 1 1648 | 1 1649 | 0 1650 | 1 1651 | 1 1652 | 1 1653 | 0 1654 | 1 1655 | 1 1656 | 1 1657 | 0 1658 | 1 1659 | 1 1660 | 1 1661 | 1 1662 | 0 1663 | 1 1664 | 1 1665 | 1 1666 | 1 1667 | 1 1668 | 1 1669 | 1 1670 | 1 1671 | 1 1672 | 1 1673 | 0 1674 | 0 1675 | 0 1676 | 1 1677 | 1 1678 | 0 1679 | 1 1680 | 1 1681 | 0 1682 | 1 1683 | 1 1684 | 0 1685 | 1 1686 | 1 1687 | 0 1688 | 1 1689 | 1 1690 | 1 1691 | 1 1692 | 0 1693 | 0 1694 | 1 1695 | 1 1696 | 1 1697 | 1 1698 | 1 1699 | 0 1700 | 1 1701 | 1 1702 | 0 1703 | 1 1704 | 1 1705 | 0 1706 | 1 1707 | 1 1708 | 1 1709 | 1 1710 | 1 1711 | 1 1712 | 1 1713 | 1 1714 | 1 1715 | 0 1716 | 1 1717 | 1 1718 | 1 1719 | 1 1720 | 1 1721 | 1 1722 | 0 1723 | 1 1724 | 1 1725 | 1 1726 | 1 1727 | 1 1728 | 1 1729 | 1 1730 | 1 1731 | 1 1732 | 1 1733 | 1 1734 | 1 1735 | 1 1736 | 1 1737 | 1 1738 | 1 1739 | 1 1740 | 0 1741 | 0 1742 | 1 1743 | 1 1744 | 1 1745 | 0 1746 | 1 1747 | 1 1748 | 1 1749 | 0 1750 | 1 1751 | 0 1752 | 1 1753 | 1 1754 | 1 1755 | 1 1756 | 1 1757 | 1 1758 | 1 1759 | 1 1760 | 1 1761 | 1 1762 | 1 1763 | 0 1764 | 1 1765 | 1 1766 
| 1 1767 | 1 1768 | 1 1769 | 1 1770 | 1 1771 | 0 1772 | 1 1773 | 1 1774 | 1 1775 | 1 1776 | 0 1777 | 1 1778 | 1 1779 | 1 1780 | 0 1781 | 1 1782 | 1 1783 | 0 1784 | 0 1785 | 0 1786 | 1 1787 | 1 1788 | 1 1789 | 1 1790 | 1 1791 | 1 1792 | 1 1793 | 0 1794 | 1 1795 | 1 1796 | 1 1797 | 0 1798 | 1 1799 | 1 1800 | 1 1801 | 1 1802 | 1 1803 | 1 1804 | 0 1805 | 1 1806 | 1 1807 | 1 1808 | 1 1809 | 1 1810 | 0 1811 | 1 1812 | 1 1813 | 1 1814 | 1 1815 | 1 1816 | 1 1817 | 0 1818 | 0 1819 | 1 1820 | 1 1821 | 1 1822 | 1 1823 | 1 1824 | 0 1825 | 1 1826 | 0 1827 | 1 1828 | 0 1829 | 1 1830 | 1 1831 | 1 1832 | 1 1833 | 0 1834 | 1 1835 | 1 1836 | 0 1837 | 1 1838 | 0 1839 | 1 1840 | 1 1841 | 1 1842 | 1 1843 | 1 1844 | 1 1845 | 1 1846 | 1 1847 | 1 1848 | 0 1849 | 0 1850 | 1 1851 | 1 1852 | 1 1853 | 1 1854 | 1 1855 | 1 1856 | 1 1857 | 1 1858 | 1 1859 | 0 1860 | 1 1861 | 1 1862 | 1 1863 | 0 1864 | 1 1865 | 1 1866 | 0 1867 | 1 1868 | 1 1869 | 1 1870 | 0 1871 | 1 1872 | 1 1873 | 0 1874 | 1 1875 | 1 1876 | 1 1877 | 0 1878 | 1 1879 | 1 1880 | 0 1881 | 1 1882 | 1 1883 | 1 1884 | 1 1885 | 1 1886 | 1 1887 | 1 1888 | 1 1889 | 0 1890 | 1 1891 | 0 1892 | 1 1893 | 1 1894 | 1 1895 | 1 1896 | 1 1897 | 0 1898 | 0 1899 | 1 1900 | 1 1901 | 1 1902 | 0 1903 | 1 1904 | 1 1905 | 1 1906 | 1 1907 | 1 1908 | 1 1909 | 1 1910 | 1 1911 | 1 1912 | 1 1913 | 0 1914 | 1 1915 | 0 1916 | 1 1917 | 1 1918 | 1 1919 | 0 1920 | 1 1921 | 1 1922 | 0 1923 | 1 1924 | 1 1925 | 1 1926 | 1 1927 | 1 1928 | 1 1929 | 1 1930 | 1 1931 | 1 1932 | 1 1933 | 1 1934 | 1 1935 | 1 1936 | 1 1937 | 1 1938 | 1 1939 | 1 1940 | 1 1941 | 1 1942 | 1 1943 | 0 1944 | 1 1945 | 1 1946 | 1 1947 | 1 1948 | 0 1949 | 0 1950 | 1 1951 | 1 1952 | 1 1953 | 1 1954 | 1 1955 | 1 1956 | 1 1957 | 1 1958 | 1 1959 | 0 1960 | 1 1961 | 1 1962 | 1 1963 | 1 1964 | 1 1965 | 0 1966 | 1 1967 | 1 1968 | 1 1969 | 1 1970 | 0 1971 | 1 1972 | 1 1973 | 1 1974 | 1 1975 | 1 1976 | 1 1977 | 1 1978 | 0 1979 | 0 1980 | 1 1981 | 0 1982 | 1 1983 | 1 1984 | 1 1985 | 0 1986 | 1 1987 | 1 1988 | 1 1989 | 1 1990 | 1 1991 | 1 1992 | 0 1993 | 1 1994 | 1 1995 | 0 1996 | 1 1997 | 1 1998 | 0 1999 | 0 2000 | 1 2001 | -------------------------------------------------------------------------------- /dataset/AIDS/AIDS_label_readme.txt: -------------------------------------------------------------------------------- 1 | Node labels: [symbol] 2 | 3 | Node attributes: [chem, charge, x, y] 4 | 5 | Edge labels: [valence] 6 | 7 | Node labels were converted to integer values using this map: 8 | 9 | Component 0: 10 | 0 C 11 | 1 O 12 | 2 N 13 | 3 Cl 14 | 4 F 15 | 5 S 16 | 6 Se 17 | 7 P 18 | 8 Na 19 | 9 I 20 | 10 Co 21 | 11 Br 22 | 12 Li 23 | 13 Si 24 | 14 Mg 25 | 15 Cu 26 | 16 As 27 | 17 B 28 | 18 Pt 29 | 19 Ru 30 | 20 K 31 | 21 Pd 32 | 22 Au 33 | 23 Te 34 | 24 W 35 | 25 Rh 36 | 26 Zn 37 | 27 Bi 38 | 28 Pb 39 | 29 Ge 40 | 30 Sb 41 | 31 Sn 42 | 32 Ga 43 | 33 Hg 44 | 34 Ho 45 | 35 Tl 46 | 36 Ni 47 | 37 Tb 48 | 49 | 50 | 51 | Edge labels were converted to integer values using this map: 52 | 53 | Component 0: 54 | 0 1 55 | 1 2 56 | 2 3 57 | 58 | 59 | 60 | Class labels were converted to integer values using this map: 61 | 62 | 0 a 63 | 1 i 64 | 65 | 66 | -------------------------------------------------------------------------------- /main/attack.py: -------------------------------------------------------------------------------- 1 | import sys, os 2 | sys.path.append(os.path.abspath('..')) 3 | 4 | import copy 5 | import numpy as np 6 | from tqdm import tqdm 7 | import torch 8 | import torch.nn as nn 9 | import torch.optim as optim 10 | import 
torch.nn.functional as F 11 | import torch.optim.lr_scheduler as lr_scheduler 12 | 13 | from utils.datareader import DataReader 14 | from utils.bkdcdd import select_cdd_graphs, select_cdd_nodes 15 | from utils.mask import gen_mask, recover_mask 16 | import main.benign as benign 17 | import trojan.GTA as gta 18 | from trojan.input import gen_input 19 | from trojan.prop import train_model, evaluate 20 | from config import parse_args 21 | 22 | class GraphBackdoor: 23 | def __init__(self, args) -> None: 24 | self.args = args 25 | 26 | assert torch.cuda.is_available(), 'no GPU available' 27 | self.cpu = torch.device('cpu') 28 | self.cuda = torch.device('cuda') 29 | 30 | def run(self): 31 | # train a benign GNN 32 | self.benign_dr, self.benign_model = benign.run(self.args) 33 | model = copy.deepcopy(self.benign_model).to(self.cuda) 34 | # pick up initial candidates 35 | bkd_gids_test, bkd_nids_test, bkd_nid_groups_test = self.bkd_cdd('test') 36 | 37 | nodenums = [adj.shape[0] for adj in self.benign_dr.data['adj_list']] 38 | nodemax = max(nodenums) 39 | featdim = np.array(self.benign_dr.data['features'][0]).shape[1] 40 | 41 | # init two generators for topo/feat 42 | toponet = gta.GraphTrojanNet(nodemax, self.args.gtn_layernum) 43 | featnet = gta.GraphTrojanNet(featdim, self.args.gtn_layernum) 44 | 45 | 46 | # init test data 47 | # NOTE: for data that can only add perturbation on features, only init the topo value 48 | init_dr_test = self.init_trigger( 49 | self.args, copy.deepcopy(self.benign_dr), bkd_gids_test, bkd_nid_groups_test, 0.0, 0.0) 50 | bkd_dr_test = copy.deepcopy(init_dr_test) 51 | 52 | topomask_test, featmask_test = gen_mask( 53 | init_dr_test, bkd_gids_test, bkd_nid_groups_test) 54 | Ainput_test, Xinput_test = gen_input(self.args, init_dr_test, bkd_gids_test) 55 | 56 | for rs_step in range(self.args.resample_steps): # for each step, choose different sample 57 | 58 | # randomly select new graph backdoor samples 59 | bkd_gids_train, bkd_nids_train, bkd_nid_groups_train = self.bkd_cdd('train') 60 | 61 | # positive/negtive sample set 62 | pset = bkd_gids_train 63 | nset = list(set(self.benign_dr.data['splits']['train'])-set(pset)) 64 | 65 | if self.args.pn_rate != None: 66 | if len(pset) > len(nset): 67 | repeat = int(np.ceil(len(pset)/(len(nset)*self.args.pn_rate))) 68 | nset = list(nset) * repeat 69 | else: 70 | repeat = int(np.ceil((len(nset)*self.args.pn_rate)/len(pset))) 71 | pset = list(pset) * repeat 72 | 73 | # init train data 74 | # NOTE: for data that can only add perturbation on features, only init the topo value 75 | init_dr_train = self.init_trigger( 76 | self.args, copy.deepcopy(self.benign_dr), bkd_gids_train, bkd_nid_groups_train, 0.0, 0.0) 77 | bkd_dr_train = copy.deepcopy(init_dr_train) 78 | 79 | topomask_train, featmask_train = gen_mask( 80 | init_dr_train, bkd_gids_train, bkd_nid_groups_train) 81 | Ainput_train, Xinput_train = gen_input(self.args, init_dr_train, bkd_gids_train) 82 | 83 | for bi_step in range(self.args.bilevel_steps): 84 | print("Resampling step %d, bi-level optimization step %d" % (rs_step, bi_step)) 85 | 86 | toponet, featnet = gta.train_gtn( 87 | self.args, model, toponet, featnet, 88 | pset, nset, topomask_train, featmask_train, 89 | init_dr_train, bkd_dr_train, Ainput_train, Xinput_train) 90 | 91 | # get new backdoor datareader for training based on well-trained generators 92 | for gid in bkd_gids_train: 93 | rst_bkdA = toponet( 94 | Ainput_train[gid], topomask_train[gid], self.args.topo_thrd, 95 | self.cpu, self.args.topo_activation, 'topo') 96 
| # rst_bkdA = recover_mask(nodenums[gid], topomask_train[gid], 'topo') 97 | # bkd_dr_train.data['adj_list'][gid] = torch.add(rst_bkdA, init_dr_train.data['adj_list'][gid]) 98 | bkd_dr_train.data['adj_list'][gid] = torch.add( 99 | rst_bkdA[:nodenums[gid], :nodenums[gid]].detach().cpu(), 100 | init_dr_train.data['adj_list'][gid]) 101 | 102 | rst_bkdX = featnet( 103 | Xinput_train[gid], featmask_train[gid], self.args.feat_thrd, 104 | self.cpu, self.args.feat_activation, 'feat') 105 | # rst_bkdX = recover_mask(nodenums[gid], featmask_train[gid], 'feat') 106 | # bkd_dr_train.data['features'][gid] = torch.add(rst_bkdX, init_dr_train.data['features'][gid]) 107 | bkd_dr_train.data['features'][gid] = torch.add( 108 | rst_bkdX[:nodenums[gid]].detach().cpu(), init_dr_train.data['features'][gid]) 109 | 110 | # train GNN 111 | train_model(self.args, bkd_dr_train, model, list(set(pset)), list(set(nset))) 112 | 113 | #----------------- Evaluation -----------------# 114 | for gid in bkd_gids_test: 115 | rst_bkdA = toponet( 116 | Ainput_test[gid], topomask_test[gid], self.args.topo_thrd, 117 | self.cpu, self.args.topo_activation, 'topo') 118 | # rst_bkdA = recover_mask(nodenums[gid], topomask_test[gid], 'topo') 119 | # bkd_dr_test.data['adj_list'][gid] = torch.add(rst_bkdA, 120 | # torch.as_tensor(copy.deepcopy(init_dr_test.data['adj_list'][gid]))) 121 | bkd_dr_test.data['adj_list'][gid] = torch.add( 122 | rst_bkdA[:nodenums[gid], :nodenums[gid]], 123 | torch.as_tensor(copy.deepcopy(init_dr_test.data['adj_list'][gid]))) 124 | 125 | rst_bkdX = featnet( 126 | Xinput_test[gid], featmask_test[gid], self.args.feat_thrd, 127 | self.cpu, self.args.feat_activation, 'feat') 128 | # rst_bkdX = recover_mask(nodenums[gid], featmask_test[gid], 'feat') 129 | # bkd_dr_test.data['features'][gid] = torch.add( 130 | # rst_bkdX, torch.as_tensor(copy.deepcopy(init_dr_test.data['features'][gid]))) 131 | bkd_dr_test.data['features'][gid] = torch.add( 132 | rst_bkdX[:nodenums[gid]], torch.as_tensor(copy.deepcopy(init_dr_test.data['features'][gid]))) 133 | 134 | # graph originally in target label 135 | yt_gids = [gid for gid in bkd_gids_test 136 | if self.benign_dr.data['labels'][gid]==self.args.target_class] 137 | # graph originally notin target label 138 | yx_gids = list(set(bkd_gids_test) - set(yt_gids)) 139 | clean_graphs_test = list(set(self.benign_dr.data['splits']['test'])-set(bkd_gids_test)) 140 | 141 | # feed into GNN, test success rate 142 | bkd_acc = evaluate(self.args, bkd_dr_test, model, bkd_gids_test) 143 | flip_rate = evaluate(self.args, bkd_dr_test, model,yx_gids) 144 | clean_acc = evaluate(self.args, bkd_dr_test, model, clean_graphs_test) 145 | 146 | # save gnn 147 | if rs_step == 0 and (bi_step==self.args.bilevel_steps-1 or abs(bkd_acc-100) <1e-4): 148 | if self.args.save_bkd_model: 149 | save_path = self.args.bkd_model_save_path 150 | os.makedirs(save_path, exist_ok=True) 151 | save_path = os.path.join(save_path, '%s-%s-%f.t7' % ( 152 | self.args.model, self.args.dataset, self.args.train_ratio, 153 | self.args.bkd_gratio_trainset, self.args.bkd_num_pergraph, self.args.bkd_size)) 154 | 155 | torch.save({'model': model.state_dict(), 156 | 'asr': bkd_acc, 157 | 'flip_rate': flip_rate, 158 | 'clean_acc': clean_acc, 159 | }, save_path) 160 | print("Trojaning model is saved at: ", save_path) 161 | 162 | if abs(bkd_acc-100) <1e-4: 163 | # bkd_dr_tosave = copy.deepcopy(bkd_dr_test) 164 | print("Early Termination for 100% Attack Rate") 165 | break 166 | print('Done') 167 | 168 | 169 | def bkd_cdd(self, subset: str): 
170 | # - subset: 'train', 'test' 171 | # find graphs to add trigger (not modify now) 172 | bkd_gids = select_cdd_graphs( 173 | self.args, self.benign_dr.data['splits'][subset], self.benign_dr.data['adj_list'], subset) 174 | # find trigger nodes per graph 175 | # same sequence with selected backdoored graphs 176 | bkd_nids, bkd_nid_groups = select_cdd_nodes( 177 | self.args, bkd_gids, self.benign_dr.data['adj_list']) 178 | 179 | assert len(bkd_gids)==len(bkd_nids)==len(bkd_nid_groups) 180 | 181 | return bkd_gids, bkd_nids, bkd_nid_groups 182 | 183 | 184 | @staticmethod 185 | def init_trigger(args, dr: DataReader, bkd_gids: list, bkd_nid_groups: list, init_edge: float, init_feat: float): 186 | if init_feat == None: 187 | init_feat = - 1 188 | print('init feat == None, transferred into -1') 189 | 190 | # (in place) datareader trigger injection 191 | for i in tqdm(range(len(bkd_gids)), desc="initializing trigger..."): 192 | gid = bkd_gids[i] 193 | for group in bkd_nid_groups[i] : 194 | # change adj in-place 195 | src, dst = [], [] 196 | for v1 in group: 197 | for v2 in group: 198 | if v1!=v2: 199 | src.append(v1) 200 | dst.append(v2) 201 | a = np.array(dr.data['adj_list'][gid]) 202 | a[src, dst] = init_edge 203 | dr.data['adj_list'][gid] = a.tolist() 204 | 205 | # change features in-place 206 | featdim = len(dr.data['features'][0][0]) 207 | a = np.array(dr.data['features'][gid]) 208 | a[group] = np.ones((len(group), featdim)) * init_feat 209 | dr.data['features'][gid] = a.tolist() 210 | 211 | # change graph labels 212 | assert args.target_class is not None 213 | dr.data['labels'][gid] = args.target_class 214 | 215 | return dr 216 | 217 | if __name__ == '__main__': 218 | args = parse_args() 219 | attack = GraphBackdoor(args) 220 | attack.run() -------------------------------------------------------------------------------- /main/benign.py: -------------------------------------------------------------------------------- 1 | import sys, os 2 | sys.path.append(os.path.abspath('..')) 3 | 4 | import time 5 | import pickle 6 | import numpy as np 7 | 8 | import torch 9 | import torch.optim as optim 10 | import torch.nn.functional as F 11 | from torch.utils.data import DataLoader 12 | import torch.optim.lr_scheduler as lr_scheduler 13 | 14 | from utils.datareader import GraphData, DataReader 15 | from utils.batch import collate_batch 16 | from model.gcn import GCN 17 | from model.gat import GAT 18 | from model.sage import GraphSAGE 19 | from config import parse_args 20 | 21 | def run(args): 22 | assert torch.cuda.is_available(), 'no GPU available' 23 | cpu = torch.device('cpu') 24 | cuda = torch.device('cuda') 25 | 26 | # load data into DataReader object 27 | dr = DataReader(args) 28 | 29 | loaders = {} 30 | for split in ['train', 'test']: 31 | if split=='train': 32 | gids = dr.data['splits']['train'] 33 | else: 34 | gids = dr.data['splits']['test'] 35 | gdata = GraphData(dr, gids) 36 | loader = DataLoader(gdata, 37 | batch_size=args.batch_size, 38 | shuffle=False, 39 | collate_fn=collate_batch) 40 | # data in loaders['train/test'] is saved as returned format of collate_batch() 41 | loaders[split] = loader 42 | print('train %d, test %d' % (len(loaders['train'].dataset), len(loaders['test'].dataset))) 43 | 44 | # prepare model 45 | in_dim = loaders['train'].dataset.num_features 46 | out_dim = loaders['train'].dataset.num_classes 47 | if args.model == 'gcn': 48 | model = GCN(in_dim, out_dim, hidden_dim=args.hidden_dim, dropout=args.dropout) 49 | elif args.model == 'gat': 50 | model = GAT(in_dim, 
out_dim, hidden_dim=args.hidden_dim, dropout=args.dropout, num_head=args.num_head) 51 | elif args.model=='sage': 52 | model = GraphSAGE(in_dim, out_dim, hidden_dim=args.hidden_dim, dropout=args.dropout) 53 | else: 54 | raise NotImplementedError(args.model) 55 | 56 | # print('\nInitialize model') 57 | # print(model) 58 | train_params = list(filter(lambda p: p.requires_grad, model.parameters())) 59 | # print('N trainable parameters:', np.sum([p.numel() for p in train_params])) 60 | 61 | # training 62 | loss_fn = F.cross_entropy 63 | predict_fn = lambda output: output.max(1, keepdim=True)[1].detach().cpu() 64 | optimizer = optim.Adam(train_params, lr=args.lr, weight_decay=args.weight_decay, betas=(0.5, 0.999)) 65 | scheduler = lr_scheduler.MultiStepLR(optimizer, args.lr_decay_steps, gamma=0.1) 66 | 67 | model.to(cuda) 68 | for epoch in range(args.train_epochs): 69 | model.train() 70 | start = time.time() 71 | train_loss, n_samples = 0, 0 72 | for batch_id, data in enumerate(loaders['train']): 73 | for i in range(len(data)): 74 | data[i] = data[i].to(cuda) 75 | # if args.use_cont_node_attr: 76 | # data[0] = norm_features(data[0]) 77 | optimizer.zero_grad() 78 | output = model(data) 79 | if len(output.shape)==1: 80 | output = output.unsqueeze(0) 81 | loss = loss_fn(output, data[4]) 82 | loss.backward() 83 | optimizer.step() 84 | scheduler.step() 85 | 86 | time_iter = time.time() - start 87 | train_loss += loss.item() * len(output) 88 | n_samples += len(output) 89 | 90 | if args.train_verbose and (epoch % args.log_every == 0 or epoch == args.train_epochs - 1): 91 | print('Train Epoch: %d\tLoss: %.4f (avg: %.4f) \tsec/iter: %.2f' % ( 92 | epoch + 1, loss.item(), train_loss / n_samples, time_iter / (batch_id + 1))) 93 | 94 | if (epoch + 1) % args.eval_every == 0 or epoch == args.train_epochs-1: 95 | model.eval() 96 | start = time.time() 97 | test_loss, correct, n_samples = 0, 0, 0 98 | for batch_id, data in enumerate(loaders['test']): 99 | for i in range(len(data)): 100 | data[i] = data[i].to(cuda) 101 | # if args.use_org_node_attr: 102 | # data[0] = norm_features(data[0]) 103 | output = model(data) 104 | if len(output.shape)==1: 105 | output = output.unsqueeze(0) 106 | loss = loss_fn(output, data[4], reduction='sum') 107 | test_loss += loss.item() 108 | n_samples += len(output) 109 | pred = predict_fn(output) 110 | 111 | correct += pred.eq(data[4].detach().cpu().view_as(pred)).sum().item() 112 | 113 | eval_acc = 100. 
* correct / n_samples 114 | print('Test set (epoch %d): Average loss: %.4f, Accuracy: %d/%d (%.2f%s) \tsec/iter: %.2f' % ( 115 | epoch + 1, test_loss / n_samples, correct, n_samples, 116 | eval_acc, '%', (time.time() - start) / len(loaders['test']))) 117 | 118 | model.to(cpu) 119 | 120 | if args.save_clean_model: 121 | save_path = args.clean_model_save_path 122 | os.makedirs(save_path, exist_ok=True) 123 | save_path = os.path.join(save_path, '%s-%s-%s.t7' % (args.model, args.dataset, str(args.train_ratio))) 124 | 125 | torch.save({ 126 | 'model': model.state_dict(), 127 | 'lr': args.lr, 128 | 'batch_size': args.batch_size, 129 | 'eval_acc': eval_acc, 130 | }, save_path) 131 | print('Clean trained GNN saved at: ', os.path.abspath(save_path)) 132 | 133 | return dr, model 134 | 135 | 136 | if __name__ == '__main__': 137 | args = parse_args() 138 | run(args) -------------------------------------------------------------------------------- /main/example.sh: -------------------------------------------------------------------------------- 1 | python benign.py --use_org_node_attr --train_verbose 2 | 3 | nohup python -u attack.py --use_org_node_attr --train_verbose --target_class 0 --train_epochs 20 > ../attack.log 2>&1 & -------------------------------------------------------------------------------- /model/gat.py: -------------------------------------------------------------------------------- 1 | import dgl 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | from utils.graph import numpy_to_graph 6 | 7 | # implemented from https://arxiv.org/abs/1710.10903 8 | 9 | class GATLayer(nn.Module): 10 | def __init__(self, in_dim, out_dim): 11 | super(GATLayer, self).__init__() 12 | # equation (1) 13 | self.fc = nn.Linear(in_dim, out_dim, bias=False) 14 | # equation (2) 15 | self.attn_fc = nn.Linear(2 * out_dim, 1, bias=False) 16 | 17 | def edge_attention(self, edges): 18 | # edge UDF for equation (2) 19 | z2 = torch.cat([edges.src['z'], edges.dst['z']], dim=1) 20 | a = self.attn_fc(z2) 21 | return {'e': F.leaky_relu(a)} 22 | 23 | def message_func(self, edges): 24 | # message UDF for equation (3) & (4) 25 | return {'z': edges.src['z'], 'e': edges.data['e']} 26 | 27 | def reduce_func(self, nodes): 28 | # reduce UDF for equation (3) & (4) 29 | # equation (3) 30 | alpha = F.softmax(nodes.mailbox['e'], dim=1) 31 | # equation (4) 32 | h = torch.sum(alpha * nodes.mailbox['z'], dim=1) 33 | return {'h': h} 34 | 35 | def forward(self, g, h): 36 | # equation (1) 37 | z = self.fc(h) 38 | g.ndata['z'] = z 39 | # equation (2) 40 | g.apply_edges(self.edge_attention) 41 | # equation (3) & (4) 42 | g.update_all(self.message_func, self.reduce_func) 43 | return g.ndata.pop('h') 44 | 45 | 46 | class MultiHeadGATLayer(nn.Module): 47 | def __init__(self, in_dim, out_dim, num_head, merge='cat'): 48 | super(MultiHeadGATLayer, self).__init__() 49 | self.heads = nn.ModuleList() 50 | for i in range(num_head): 51 | self.heads.append(GATLayer(in_dim, out_dim)) 52 | self.merge = merge 53 | 54 | def forward(self, g, h): 55 | head_outs = [attn_head(g, h) for attn_head in self.heads] 56 | if self.merge == 'cat': 57 | # concat on the output feature dimension (dim=1) 58 | return torch.cat(head_outs, dim=1) 59 | else: 60 | # merge using average 61 | return torch.mean(torch.stack(head_outs), dim=0) 62 | 63 | 64 | class GAT(nn.Module): 65 | def __init__(self, in_dim, out_dim, 66 | hidden_dim=[64, 32], 67 | dropout=0.2, 68 | num_head=2): 69 | super(GAT, self).__init__() 70 | 71 | self.layers = nn.ModuleList() 72 
| 73 | self.layers.append(MultiHeadGATLayer(in_dim, hidden_dim[0], num_head, merge='mean')) 74 | for i in range(len(hidden_dim) - 1): 75 | self.layers.append(MultiHeadGATLayer(hidden_dim[i], hidden_dim[i+1], num_head, merge='mean')) 76 | 77 | fc = [] 78 | if dropout > 0: 79 | fc.append(nn.Dropout(p=dropout)) 80 | fc.append(nn.Linear(hidden_dim[-1], out_dim)) 81 | self.fc = nn.Sequential(*fc) 82 | 83 | def forward(self, data): 84 | batch_g = [] 85 | for adj in data[1]: 86 | batch_g.append(numpy_to_graph(adj.cpu().detach().T.numpy(), to_cuda=adj.is_cuda)) 87 | batch_g = dgl.batch(batch_g) 88 | 89 | mask = data[2] 90 | if len(mask.shape) == 2: 91 | mask = mask.unsqueeze(2) # (B,N,1) 92 | 93 | B,N,F = data[0].shape[:3] 94 | x = data[0].reshape(B*N, F) 95 | mask = mask.reshape(B*N, 1) 96 | for layer in self.layers: 97 | x = layer(batch_g, x) 98 | x = x * mask 99 | 100 | F_prime = x.shape[-1] 101 | x = x.reshape(B, N, F_prime) 102 | x = torch.max(x, dim=1)[0].squeeze() # max pooling over nodes (usually performs better than average) 103 | # x = torch.mean(x, dim=1).squeeze() 104 | x = self.fc(x) 105 | return x -------------------------------------------------------------------------------- /model/gcn.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | import dgl 6 | import dgl.function as fn 7 | from utils.graph import numpy_to_graph 8 | 9 | gcn_msg = fn.copy_src(src='h', out='m') 10 | gcn_reduce = fn.sum(msg='m', out='h') 11 | 12 | # Used for inductive case (graph classification) by default. 13 | class GCNLayer(nn.Module): 14 | def __init__(self, in_feats, out_feats): 15 | super(GCNLayer, self).__init__() 16 | self.linear = nn.Linear(in_feats, out_feats) 17 | 18 | def forward(self, g, feature): 19 | # Creating a local scope so that all the stored ndata and edata 20 | # (such as the `'h'` ndata below) are automatically popped out 21 | # when the scope exits. 
22 | with g.local_scope(): 23 | g.ndata['h'] = feature 24 | g.update_all(gcn_msg, gcn_reduce) 25 | h = g.ndata['h'] 26 | return self.linear(h) 27 | 28 | 29 | # 2 layers by default 30 | class GCN(nn.Module): 31 | def __init__(self, in_dim, out_dim, 32 | hidden_dim=[64, 32], # GNN layers + 1 layer MLP 33 | dropout=0.2, 34 | activation=F.relu): 35 | super(GCN, self).__init__() 36 | self.layers = nn.ModuleList() 37 | 38 | self.layers.append(GCNLayer(in_dim, hidden_dim[0])) 39 | for i in range(len(hidden_dim) - 1): 40 | self.layers.append(GCNLayer(hidden_dim[i], hidden_dim[i+1])) 41 | 42 | fc = [] 43 | if dropout > 0: 44 | fc.append(nn.Dropout(p=dropout)) 45 | fc.append(nn.Linear(hidden_dim[-1], out_dim)) 46 | self.fc = nn.Sequential(*fc) 47 | 48 | 49 | def forward(self, data): 50 | batch_g = [] 51 | for adj in data[1]: 52 | batch_g.append(numpy_to_graph(adj.cpu().detach().T.numpy(), to_cuda=adj.is_cuda)) 53 | batch_g = dgl.batch(batch_g) 54 | 55 | mask = data[2] 56 | if len(mask.shape) == 2: 57 | mask = mask.unsqueeze(2) # (B,N,1) 58 | 59 | B,N,F = data[0].shape[:3] 60 | x = data[0].reshape(B*N, F) 61 | mask = mask.reshape(B*N, 1) 62 | for layer in self.layers: 63 | x = layer(batch_g, x) 64 | x = x * mask 65 | 66 | F_prime = x.shape[-1] 67 | x = x.reshape(B, N, F_prime) 68 | x = torch.max(x, dim=1)[0].squeeze() # max pooling over nodes (usually performs better than average) 69 | # x = torch.mean(x, dim=1).squeeze() 70 | x = self.fc(x) 71 | return x 72 | 73 | -------------------------------------------------------------------------------- /model/sage.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | import dgl 6 | from dgl import DGLGraph, transform 7 | from dgl.nn.pytorch.conv import SAGEConv 8 | from utils.graph import numpy_to_graph 9 | 10 | # Used for inductive case (graph classification) by default. 
11 | class GraphSAGE(nn.Module): 12 | def __init__(self, in_dim, out_dim, 13 | hidden_dim=[64, 32], # GNN layers + 1 layer MLP 14 | dropout=0.2, 15 | activation=F.relu, 16 | aggregator_type='gcn'): # mean/gcn/pool/lstm 17 | super(GraphSAGE, self).__init__() 18 | self.layers = nn.ModuleList() 19 | 20 | # input layer 21 | self.layers.append(SAGEConv(in_dim, hidden_dim[0], aggregator_type, feat_drop=dropout, activation=activation)) 22 | # hidden layers 23 | for i in range(len(hidden_dim) - 1): 24 | self.layers.append(SAGEConv(hidden_dim[i], hidden_dim[i+1], aggregator_type, feat_drop=dropout, activation=activation)) 25 | 26 | fc = [] 27 | if dropout > 0: 28 | fc.append(nn.Dropout(p=dropout)) 29 | fc.append(nn.Linear(hidden_dim[-1], out_dim)) 30 | self.fc = nn.Sequential(*fc) 31 | 32 | 33 | def forward(self, data): 34 | batch_g = [] 35 | for adj in data[1]: 36 | # cannot use tensor init DGLGraph 37 | batch_g.append(numpy_to_graph(adj.cpu().T.numpy(), to_cuda=adj.is_cuda)) 38 | batch_g = dgl.batch(batch_g) 39 | 40 | mask = data[2] 41 | if len(mask.shape) == 2: 42 | mask = mask.unsqueeze(2) # (B,N,1) 43 | 44 | B,N,F = data[0].shape[:3] 45 | x = data[0].reshape(B*N, F) 46 | mask = mask.reshape(B*N, 1) 47 | for layer in self.layers: 48 | x = layer(batch_g, x) 49 | x = x * mask 50 | 51 | F_prime = x.shape[-1] 52 | x = x.reshape(B, N, F_prime) 53 | x = torch.max(x, dim=1)[0].squeeze() # max pooling over nodes (usually performs better than average) 54 | # x = torch.mean(x, dim=1).squeeze() 55 | x = self.fc(x) 56 | return x -------------------------------------------------------------------------------- /trojan/GTA.py: -------------------------------------------------------------------------------- 1 | import sys, os 2 | from utils.datareader import DataReader 3 | sys.path.append(os.path.abspath('..')) 4 | 5 | import numpy as np 6 | from tqdm import tqdm 7 | import torch 8 | import torch.nn as nn 9 | import torch.optim as optim 10 | import torch.nn.functional as F 11 | 12 | from utils.mask import recover_mask 13 | from trojan.prop import forwarding 14 | 15 | class GradWhere(torch.autograd.Function): 16 | """ 17 | We can implement our own custom autograd Functions by subclassing 18 | torch.autograd.Function and implementing the forward and backward passes 19 | which operate on Tensors. 20 | """ 21 | 22 | @staticmethod 23 | def forward(ctx, input, thrd, device): 24 | """ 25 | In the forward pass we receive a Tensor containing the input and return 26 | a Tensor containing the output. ctx is a context object that can be used 27 | to stash information for backward computation. You can cache arbitrary 28 | objects for use in the backward pass using the ctx.save_for_backward method. 29 | """ 30 | ctx.save_for_backward(input) 31 | rst = torch.where(input>thrd, torch.tensor(1.0, device=device, requires_grad=True), 32 | torch.tensor(0.0, device=device, requires_grad=True)) 33 | return rst 34 | 35 | @staticmethod 36 | def backward(ctx, grad_output): 37 | """ 38 | In the backward pass we receive a Tensor containing the gradient of the loss 39 | with respect to the output, and we need to compute the gradient of the loss 40 | with respect to the input. 
41 | """ 42 | input, = ctx.saved_tensors 43 | grad_input = grad_output.clone() 44 | 45 | """ 46 | Return results number should corresponding with .forward inputs (besides ctx), 47 | for each input, return a corresponding backward grad 48 | """ 49 | return grad_input, None, None 50 | 51 | 52 | 53 | class GraphTrojanNet(nn.Module): 54 | def __init__(self, sq_dim, layernum=1, dropout=0.05): 55 | super(GraphTrojanNet, self).__init__() 56 | 57 | layers = [] 58 | if dropout > 0: 59 | layers.append(nn.Dropout(p=dropout)) 60 | for l in range(layernum-1): 61 | layers.append(nn.Linear(sq_dim, sq_dim)) 62 | layers.append(nn.ReLU(inplace=True)) 63 | if dropout > 0: 64 | layers.append(nn.Dropout(p=dropout)) 65 | layers.append(nn.Linear(sq_dim, sq_dim)) 66 | 67 | self.layers = nn.Sequential(*layers) 68 | 69 | def forward(self, input, mask, thrd, 70 | device=torch.device('cpu'), 71 | activation='relu', 72 | for_whom='topo', 73 | binaryfeat=False): 74 | 75 | """ 76 | "input", "mask" and "thrd", should already in cuda before sent to this function. 77 | If using sparse format, corresponding tensor should already in sparse format before 78 | sent into this function 79 | """ 80 | GW = GradWhere.apply 81 | 82 | bkdmat = self.layers(input) 83 | if activation=='relu': 84 | bkdmat = F.relu(bkdmat) 85 | elif activation=='sigmoid': 86 | bkdmat = torch.sigmoid(bkdmat) # nn.Functional.sigmoid is deprecated 87 | 88 | if for_whom == 'topo': # not consider direct yet 89 | bkdmat = torch.div(torch.add(bkdmat, bkdmat.transpose(0, 1)), 2.0) 90 | if for_whom == 'topo' or (for_whom == 'feat' and binaryfeat): 91 | bkdmat = GW(bkdmat, thrd, device) 92 | bkdmat = torch.mul(bkdmat, mask) 93 | 94 | return bkdmat 95 | 96 | 97 | def train_gtn(args, model, toponet: GraphTrojanNet, featnet: GraphTrojanNet, 98 | pset, nset, topomasks, featmasks, 99 | init_dr: DataReader, bkd_dr: DataReader, Ainputs, Xinputs): 100 | """ 101 | All matrix/array like inputs should already in torch.tensor format. 102 | All tensor parameters or models should initially stay in CPU when 103 | feeding into this function. 
104 | 105 | About inputs of this function: 106 | - pset/nset: gids in trainset 107 | - init_dr: init datareader, keep unmodified inside of each resampling 108 | - bkd_dr: store temp adaptive adj/features, get by init_dr + GTN(inputs) 109 | """ 110 | if torch.cuda.is_available(): 111 | cuda = torch.device('cuda') 112 | cpu = torch.device('cpu') 113 | 114 | init_As = init_dr.data['adj_list'] 115 | init_Xs = init_dr.data['features'] 116 | bkd_As = bkd_dr.data['adj_list'] 117 | bkd_Xs = bkd_dr.data['features'] 118 | 119 | nodenums = [len(adj) for adj in init_As] 120 | glabels = torch.LongTensor(init_dr.data['labels']).to(cuda) 121 | glabels[pset] = args.target_class 122 | allset = np.concatenate((pset, nset)) 123 | 124 | optimizer_topo = optim.Adam(toponet.parameters(), 125 | lr=args.gtn_lr, 126 | weight_decay=5e-4) 127 | optimizer_feat = optim.Adam(featnet.parameters(), 128 | lr=args.gtn_lr, 129 | weight_decay=5e-4) 130 | 131 | 132 | #----------- training topo generator -----------# 133 | toponet.to(cuda) 134 | model.to(cuda) 135 | topo_thrd = torch.tensor(args.topo_thrd).to(cuda) 136 | criterion = nn.CrossEntropyLoss() 137 | 138 | toponet.train() 139 | for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"): 140 | optimizer_topo.zero_grad() 141 | # generate new adj_list by dr.data['adj_list'] 142 | for gid in pset: 143 | SendtoCUDA(gid, [init_As, Ainputs, topomasks]) # only send the used graph items to cuda 144 | rst_bkdA = toponet( 145 | Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo') 146 | # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo') 147 | # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid]) 148 | bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid]) # only current position in cuda 149 | SendtoCPU(gid, [init_As, Ainputs, topomasks]) 150 | 151 | loss = forwarding(args, bkd_dr, model, allset, criterion) 152 | loss.backward() 153 | optimizer_topo.step() 154 | torch.cuda.empty_cache() 155 | 156 | toponet.eval() 157 | toponet.to(cpu) 158 | model.to(cpu) 159 | for gid in pset: 160 | SendtoCPU(gid, [bkd_dr.data['adj_list']]) 161 | del topo_thrd 162 | torch.cuda.empty_cache() 163 | 164 | 165 | #----------- training feat generator -----------# 166 | featnet.to(cuda) 167 | model.to(cuda) 168 | feat_thrd = torch.tensor(args.feat_thrd).to(cuda) 169 | criterion = nn.CrossEntropyLoss() 170 | 171 | featnet.train() 172 | for epoch in tqdm(range(args.gtn_epochs), desc="training feature generator"): 173 | optimizer_feat.zero_grad() 174 | # generate new features by dr.data['features'] 175 | for gid in pset: 176 | SendtoCUDA(gid, [init_Xs, Xinputs, featmasks]) # only send the used graph items to cuda 177 | rst_bkdX = featnet( 178 | Xinputs[gid], featmasks[gid], feat_thrd, cuda, args.feat_activation, 'feat') 179 | # rst_bkdX = recover_mask(nodenums[gid], featmasks[gid], 'feat') 180 | # bkd_dr.data['features'][gid] = torch.add(rst_bkdX, init_Xs[gid]) 181 | bkd_dr.data['features'][gid] = torch.add(rst_bkdX[:nodenums[gid]], init_Xs[gid]) # only current position in cuda 182 | SendtoCPU(gid, [init_Xs, Xinputs, featmasks]) 183 | 184 | # generate DataLoader 185 | loss = forwarding( 186 | args, bkd_dr, model, allset, criterion) 187 | loss.backward() 188 | optimizer_feat.step() 189 | torch.cuda.empty_cache() 190 | 191 | featnet.eval() 192 | featnet.to(cpu) 193 | model.to(cpu) 194 | for gid in pset: 195 | SendtoCPU(gid, [bkd_dr.data['features']]) 196 | del feat_thrd 197 | torch.cuda.empty_cache() 
198 | 199 | return toponet, featnet 200 | 201 | #---------------------------------------------------------------- 202 | def SendtoCUDA(gid, items): 203 | """ 204 | - items: a list of dict / full-graphs list, 205 | used as item[gid] in items 206 | - gid: int 207 | """ 208 | cuda = torch.device('cuda') 209 | for item in items: 210 | item[gid] = torch.as_tensor(item[gid], dtype=torch.float32).to(cuda) 211 | 212 | 213 | def SendtoCPU(gid, items): 214 | """ 215 | Used after SendtoCUDA, target object must be torch.tensor and already in cuda. 216 | 217 | - items: a list of dict / full-graphs list, 218 | used as item[gid] in items 219 | - gid: int 220 | """ 221 | 222 | cpu = torch.device('cpu') 223 | for item in items: 224 | item[gid] = item[gid].to(cpu) -------------------------------------------------------------------------------- /trojan/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zhaohan-xi/GraphBackdoor/3d975d78813f2a4a4960f92f9b66847dc19413a8/trojan/__init__.py -------------------------------------------------------------------------------- /trojan/input.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | 4 | def gen_input(args, datareader, bkd_gids): 5 | """ 6 | Prepare inputs for GTN, topo input and feat input together. 7 | 8 | About inputs (of this function): 9 | - args: control adapt-input type 10 | 11 | Note: Extend input size as (N, N) / (N, F) where N is max node num among all graphs 12 | """ 13 | As = {} 14 | Xs = {} 15 | for gid in bkd_gids: 16 | if gid not in As: As[gid] = torch.tensor(datareader.data['adj_list'][gid], dtype=torch.float) 17 | if gid not in Xs: Xs[gid] = torch.tensor(datareader.data['features'][gid], dtype=torch.float) 18 | Ainputs = {} 19 | Xinputs = {} 20 | 21 | if args.gtn_input_type == '1hop': 22 | for gid in bkd_gids: 23 | if gid not in Ainputs: Ainputs[gid] = As[gid].clone().detach() 24 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid]) 25 | 26 | elif args.gtn_input_type == '2hop': 27 | for gid in bkd_gids: 28 | As[gid] = torch.add(As[gid], torch.mm(As[gid], As[gid])) 29 | As[gid] = torch.where(As[gid]>0, torch.tensor(1.0, requires_grad=True), 30 | torch.tensor(0.0, requires_grad=True)) 31 | As[gid].fill_diagonal_(0.0) 32 | 33 | for gid in bkd_gids: 34 | if gid not in Ainputs: Ainputs[gid] = As[gid].clone().detach() 35 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid]) 36 | 37 | 38 | elif args.gtn_input_type == '1hop_degree': 39 | rowsums = [torch.add(torch.sum(As[gid], dim=1), 1e-6) for gid in bkd_gids] 40 | re_Ds = [torch.diag(torch.pow(rowsum, -1)) for rowsum in rowsums] 41 | 42 | for i in range(len(bkd_gids)): 43 | gid = bkd_gids[i] 44 | if gid not in Ainputs: Ainputs[gid] = torch.mm(re_Ds[i], As[gid]) 45 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid]) 46 | 47 | 48 | elif args.gtn_input_type == '2hop_degree': 49 | for gid in bkd_gids: 50 | As[gid] = torch.add(As[gid], torch.mm(As[gid], As[gid])) 51 | As[gid] = torch.where(As[gid]>0, torch.tensor(1.0, requires_grad=True), 52 | torch.tensor(0.0, requires_grad=True)) 53 | As[gid].fill_diagonal_(0.0) 54 | 55 | rowsums = [torch.add(torch.sum(As[gid], dim=1), 1e-6) for gid in bkd_gids] 56 | re_Ds = [torch.diag(torch.pow(rowsum, -1)) for rowsum in rowsums] 57 | 58 | for i in range(len(bkd_gids)): 59 | gid = bkd_gids[i] 60 | if gid not in Ainputs: Ainputs[gid] = torch.mm(re_Ds[i], As[gid]) 
61 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid]) 62 | 63 | else: raise NotImplementedError('not support other types of aggregated inputs') 64 | 65 | # pad each input into maxi possible size (N, N) / (N, F) 66 | NodeMax = int(datareader.data['n_node_max']) 67 | FeatDim = np.array(datareader.data['features'][0]).shape[1] 68 | for gid in Ainputs.keys(): 69 | a_input = Ainputs[gid] 70 | x_input = Xinputs[gid] 71 | 72 | add_dim = NodeMax - a_input.shape[0] 73 | Ainputs[gid] = np.pad(a_input, ((0, add_dim), (0, add_dim))).tolist() 74 | Xinputs[gid] = np.pad(x_input, ((0, add_dim), (0, 0))).tolist() 75 | Ainputs[gid] = torch.tensor(Ainputs[gid]) 76 | Xinputs[gid] = torch.tensor(Xinputs[gid]) 77 | 78 | return Ainputs, Xinputs 79 | -------------------------------------------------------------------------------- /trojan/prop.py: -------------------------------------------------------------------------------- 1 | import sys, os 2 | sys.path.append(os.path.abspath('..')) 3 | 4 | import torch 5 | import torch.nn as nn 6 | import torch.optim as optim 7 | import torch.nn.functional as F 8 | import torch.optim.lr_scheduler as lr_scheduler 9 | from torch.utils.data import DataLoader 10 | 11 | from utils.datareader import GraphData, DataReader 12 | from utils.batch import collate_batch 13 | 14 | # run on CUDA 15 | def forwarding(args, bkd_dr: DataReader, model, gids, criterion): 16 | assert torch.cuda.is_available(), "no GPU available" 17 | cuda = torch.device('cuda') 18 | 19 | gdata = GraphData(bkd_dr, gids) 20 | loader = DataLoader(gdata, 21 | batch_size=args.batch_size, 22 | shuffle=False, 23 | collate_fn=collate_batch) 24 | 25 | if not next(model.parameters()).is_cuda: 26 | model.to(cuda) 27 | model.eval() 28 | all_loss, n_samples = 0.0, 0.0 29 | for batch_idx, data in enumerate(loader): 30 | # assert batch_idx == 0, "In AdaptNet Train, we only need one GNN pass, batch-size=len(all trainset)" 31 | for i in range(len(data)): 32 | data[i] = data[i].to(cuda) 33 | output = model(data) 34 | 35 | if len(output.shape)==1: 36 | output = output.unsqueeze(0) 37 | 38 | loss = criterion(output, data[4]) # only calculate once 39 | all_loss = torch.add(torch.mul(loss, len(output)), all_loss) # cannot be loss.item() 40 | n_samples += len(output) 41 | 42 | all_loss = torch.div(all_loss, n_samples) 43 | return all_loss 44 | 45 | 46 | def train_model(args, dr_train: DataReader, model, pset, nset): 47 | assert torch.cuda.is_available(), "no GPU available" 48 | cuda = torch.device('cuda') 49 | cpu = torch.device('cpu') 50 | 51 | model.to(cuda) 52 | gids = {'pos': pset, 'neg': nset} 53 | gdata = {} 54 | loader = {} 55 | for key in ['pos', 'neg']: 56 | gdata[key] = GraphData(dr_train, gids[key]) 57 | loader[key] = DataLoader(gdata[key], 58 | batch_size=args.batch_size, 59 | shuffle=False, 60 | collate_fn=collate_batch) 61 | 62 | train_params = list(filter(lambda p: p.requires_grad, model.parameters())) 63 | optimizer = optim.Adam(train_params, lr=args.lr, weight_decay=args.weight_decay, betas=(0.5, 0.999)) 64 | scheduler = lr_scheduler.MultiStepLR(optimizer, args.lr_decay_steps, gamma=0.1) 65 | loss_fn = F.cross_entropy 66 | 67 | model.train() 68 | for epoch in range(args.train_epochs): 69 | optimizer.zero_grad() 70 | 71 | losses = {'pos': 0.0, 'neg': 0.0} 72 | n_samples = {'pos': 0.0, 'neg': 0.0} 73 | for key in ['pos', 'neg']: 74 | for batch_idx, data in enumerate(loader[key]): 75 | for i in range(len(data)): 76 | data[i] = data[i].to(cuda) 77 | output = model(data) 78 | if len(output.shape)==1: 
79 | output = output.unsqueeze(0) 80 | losses[key] += loss_fn(output, data[4])*len(output) 81 | n_samples[key] += len(output) 82 | 83 | for i in range(len(data)): 84 | data[i] = data[i].to(cpu) 85 | 86 | losses[key] = torch.div(losses[key], n_samples[key]) 87 | loss = losses['pos'] + args.lambd*losses['neg'] 88 | loss.backward() 89 | optimizer.step() 90 | scheduler.step() 91 | model.to(cpu) 92 | 93 | 94 | # def TrainGNN_v2(args, 95 | # dr_train, 96 | # model, 97 | # fold_id, 98 | # train_gids, 99 | # use_optim='Adam', 100 | # need_print=False): 101 | # assert torch.cuda.is_available(), "no GPU available" 102 | # cuda = torch.device('cuda') 103 | # cpu = torch.device('cpu') 104 | 105 | # model.to(cuda) 106 | 107 | # gdata = GraphData(dr_train, 108 | # fold_id, 109 | # 'train', 110 | # train_gids) 111 | # loader = DataLoader(gdata, 112 | # batch_size=args.batch_size, 113 | # shuffle=False, 114 | # collate_fn=collate_batch) 115 | 116 | # train_params = list(filter(lambda p: p.requires_grad, model.parameters())) 117 | # if use_optim=='Adam': 118 | # optimizer = optim.Adam(train_params, lr=args.lr, weight_decay=args.weight_decay, betas=(0.5, 0.999)) 119 | # else: 120 | # optimizer = optim.SGD(train_params, lr=args.lr) 121 | # predict_fn = lambda output: output.max(1, keepdim=True)[1].detach().cpu() 122 | # loss_fn = F.cross_entropy 123 | 124 | # model.train() 125 | # for epoch in range(args.epochs): 126 | # optimizer.zero_grad() 127 | 128 | # loss = 0.0 129 | # n_samples = 0 130 | # correct = 0 131 | # for batch_idx, data in enumerate(loader): 132 | # for i in range(len(data)): 133 | # data[i] = data[i].to(cuda) 134 | # output = model(data) 135 | # if len(output.shape)==1: 136 | # output = output.unsqueeze(0) 137 | # loss += loss_fn(output, data[4])*len(output) 138 | # n_samples += len(output) 139 | 140 | # for i in range(len(data)): 141 | # data[i] = data[i].to(cpu) 142 | # torch.cuda.empty_cache() 143 | 144 | # pred = predict_fn(output) 145 | # correct += pred.eq(data[4].detach().cpu().view_as(pred)).sum().item() 146 | # acc = 100. * correct / n_samples 147 | # loss = torch.div(loss, n_samples) 148 | 149 | # if need_print and epoch%5==0: 150 | # print("Epoch {} | Loss {:.4f} | Train Accuracy {:.4f}".format(epoch, loss.item(), acc)) 151 | # loss.backward() 152 | # optimizer.step() 153 | # model.to(cpu) 154 | 155 | 156 | 157 | def evaluate(args, dr_test: DataReader, model, gids): 158 | # separate bkd_test/clean_test gids 159 | softmax = torch.nn.Softmax(dim=1) 160 | 161 | model.cuda() 162 | gdata = GraphData(dr_test, gids) 163 | loader = DataLoader(gdata, 164 | batch_size=args.batch_size, 165 | shuffle=False, 166 | collate_fn=collate_batch) 167 | 168 | loss_fn = F.cross_entropy 169 | predict_fn = lambda output: output.max(1, keepdim=True)[1].detach().cpu() 170 | 171 | model.eval() 172 | test_loss, correct, n_samples, confidence = 0, 0, 0, 0 173 | for batch_idx, data in enumerate(loader): 174 | for i in range(len(data)): 175 | data[i] = data[i].cuda() 176 | output = model(data) # not softmax yet 177 | if len(output.shape)==1: 178 | output = output.unsqueeze(0) 179 | loss = loss_fn(output, data[4], reduction='sum') 180 | test_loss += loss.item() 181 | n_samples += len(output) 182 | pred = predict_fn(output) 183 | 184 | correct += pred.eq(data[4].detach().cpu().view_as(pred)).sum().item() 185 | confidence += torch.sum(torch.max(softmax(output), dim=1)[0]).item() 186 | acc = 100. 
* correct / n_samples 187 | confidence = confidence / n_samples 188 | 189 | print('Test set: Average loss: %.4f, Accuracy: %d/%d (%.2f%s), Average Confidence %.4f' % ( 190 | test_loss / n_samples, correct, n_samples, acc, '%', confidence)) 191 | model.cpu() 192 | return acc -------------------------------------------------------------------------------- /utils/batch.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | 4 | 5 | def collate_batch(batch): 6 | ''' 7 | function: Creates a batch of same size graphs by zero-padding node features and adjacency matrices 8 | up to the maximum number of nodes in the CURRENT batch rather than in the entire dataset. 9 | param batch: [node_features*batch_size, A*batch_size, label*batch_size] 10 | return: [padded feature matrices, padded adjecency matrices, non-padding positions, nodenums, labels] 11 | ''' 12 | B = len(batch) 13 | nodenums = [len(batch[b][1]) for b in range(B)] 14 | if len(batch[0][0].shape)==2: 15 | C = batch[0][0].shape[1] # C is feature dim 16 | else: 17 | C = batch[0][0].shape[0] 18 | n_node_max = int(np.max(nodenums)) 19 | 20 | graph_support = torch.zeros(B, n_node_max) 21 | A = torch.zeros(B, n_node_max, n_node_max) 22 | X = torch.zeros(B, n_node_max, C) 23 | for b in range(B): 24 | X[b, :nodenums[b]] = batch[b][0] # store original values in top (no need to pad feat dim, node dim only) 25 | A[b, :nodenums[b], :nodenums[b]] = batch[b][1] # store original values in top-left corner 26 | graph_support[b][:nodenums[b]] = 1 # mask with values of 0 for dummy (zero padded) nodes, otherwise 1 27 | 28 | nodenums = torch.from_numpy(np.array(nodenums)).long() 29 | labels = torch.from_numpy(np.array([batch[b][2] for b in range(B)])).long() 30 | return [X, A, graph_support, nodenums, labels] 31 | 32 | 33 | # Note: here mask "graph_support" is only a 1D mask for each graph instance. 34 | # When use this mask for 2D work, should first extend into 2D. 35 | -------------------------------------------------------------------------------- /utils/bkdcdd.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | from utils.datareader import DataReader 4 | sys.path.append('/home/zxx5113/BackdoorGNN/') 5 | 6 | import numpy as np 7 | import copy 8 | 9 | 10 | # return 1D list 11 | def select_cdd_graphs(args, data: list, adj_list: list, subset: str): 12 | ''' 13 | Given a data (train/test), (randomly or determinately) 14 | pick up some graph to put backdoor information, return ids. 
15 | ''' 16 | rs = np.random.RandomState(args.seed) 17 | graph_sizes = [np.array(adj).shape[0] for adj in adj_list] 18 | bkd_graph_ratio = args.bkd_gratio_train if subset == 'train' else args.bkd_gratio_test 19 | bkd_num = int(np.ceil(bkd_graph_ratio * len(data))) 20 | 21 | assert len(data)>bkd_num, "Graph Instances are not enough" 22 | picked_ids = [] 23 | 24 | # Randomly pick up graphs as backdoor candidates from data 25 | remained_set = copy.deepcopy(data) 26 | loopcount = 0 27 | while bkd_num-len(picked_ids) >0 and len(remained_set)>0 and loopcount<=50: 28 | loopcount += 1 29 | 30 | cdd_ids = rs.choice(remained_set, bkd_num-len(picked_ids), replace=False) 31 | for gid in cdd_ids: 32 | if bkd_num-len(picked_ids) <=0: 33 | break 34 | gsize = graph_sizes[gid] 35 | if gsize >= 3*args.bkd_size*args.bkd_num_pergraph: 36 | picked_ids.append(gid) 37 | 38 | if len(remained_set) <= bkd_num-len(picked_ids): # not enough strictly large graphs left, relax the size requirement 39 | for gid in remained_set: 40 | if bkd_num-len(picked_ids) <= 0: 41 | break 42 | gsize = graph_sizes[gid] 43 | if gsize >= 1.5*args.bkd_size*args.bkd_num_pergraph and gid not in picked_ids: 44 | picked_ids.append(gid) 45 | 46 | if len(remained_set) <= bkd_num-len(picked_ids): # still not enough, relax the size requirement further 47 | for gid in remained_set: 48 | if bkd_num-len(picked_ids) <= 0: 49 | break 50 | gsize = graph_sizes[gid] 51 | if gsize >= 1.0*args.bkd_size*args.bkd_num_pergraph and gid not in picked_ids: 52 | picked_ids.append(gid) 53 | 54 | picked_ids = list(set(picked_ids)) 55 | remained_set = list(set(remained_set) - set(picked_ids)) 56 | if len(remained_set)==0 and bkd_num>len(picked_ids): 57 | print("no more graph to pick, return insufficient candidate graphs, try smaller bkd-pattern or graph size") 58 | 59 | return picked_ids 60 | 61 | 62 | def select_cdd_nodes(args, graph_cdd_ids, adj_list): 63 | ''' 64 | Given graph instances, based on a pre-determined standard, 65 | find the nodes that should carry the backdoor information and return 66 | their ids. 67 | 68 | return: same sequence as bkd-gids 69 | (1) a 2D list - bkd nodes under each graph 70 | (2) and a 3D list - bkd node groups under each graph 71 | (in case each graph has multiple triggers) 72 | ''' 73 | rs = np.random.RandomState(args.seed) 74 | 75 | # step1: find backdoor nodes 76 | picked_nodes = [] # 2D, save all cdd graphs 77 | 78 | for gid in graph_cdd_ids: 79 | node_ids = [i for i in range(len(adj_list[gid]))] 80 | assert len(node_ids)==len(adj_list[gid]), 'node number in graph {} mismatch'.format(gid) 81 | 82 | bkd_node_num = int(args.bkd_num_pergraph*args.bkd_size) 83 | assert bkd_node_num <= len(adj_list[gid]), "error in SelectCddGraphs, candidate graph too small" 84 | cur_picked_nodes = rs.choice(node_ids, bkd_node_num, replace=False) 85 | picked_nodes.append(cur_picked_nodes) 86 | 87 | # step2: match nodes 88 | assert len(picked_nodes)==len(graph_cdd_ids), "backdoor graphs & node groups mismatch, check SelectCddGraphs/SelectCddNodes" 89 | 90 | node_groups = [] # 3D, grouped trigger nodes 91 | for i in range(len(graph_cdd_ids)): # for each graph, divide candidate nodes into groups 92 | gid = graph_cdd_ids[i] 93 | nids = picked_nodes[i] 94 | 95 | assert len(nids)%args.bkd_size==0.0, "Backdoor nodes cannot be equally divided, check SelectCddNodes-STEP1" 96 | 97 | # groups within each graph 98 | groups = np.array_split(nids, len(nids)//args.bkd_size) 99 | # np.array_split returns list[array([..]), array([...]), ...] 100 | # thus transfer each internal np.array into a list 101 | # store groups as a 2D list.
102 | groups = np.array(groups).tolist() 103 | node_groups.append(groups) 104 | 105 | assert len(picked_nodes)==len(node_groups), "groups of bkd-nodes mismatch, check SelectCddNodes-STEP2" 106 | return picked_nodes, node_groups 107 | 108 | -------------------------------------------------------------------------------- /utils/datareader.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file processes a TU dataset and saves it in a 'DataReader' class; 3 | the DataReader objects are then transferred into 'GraphData' before training. 4 | 5 | Specifically used to process datasets from 6 | https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets 7 | """ 8 | 9 | import os 10 | import torch 11 | import numpy as np 12 | 13 | def split_ids(args, gids, rs): 14 | ''' 15 | single fold 16 | gids: 0-based graph id list. 17 | ''' 18 | train_gids = list(rs.choice(gids, int(args.train_ratio * len(gids)), replace=False)) 19 | test_gids = list(set(gids)-set(train_gids)) 20 | return train_gids, test_gids 21 | 22 | 23 | #! All files should end with .txt 24 | class DataReader(): 25 | """ 26 | Will contain keys ['adj_list', 'nlabels', 'labels', 'attr', 'features', 27 | 'splits', 'n_node_max', 'num_features', 'num_classes'] 28 | - 'adj_list': generated by 'read_graph_adj' from '_A.txt', which represents 29 | a list of adj matrices, whose shapes may differ. Stored in the 30 | same order as the graph indicator. 31 | In list format, each element is a np.array that refers to a square and 32 | symmetric adj matrix. 33 | 34 | use: datareader.data['adj_list'][gid] - a 2D adj matrix 35 | 36 | - 'nlabels': generated by 'read_node_features' from '_node_labels.txt', which 37 | represents node labels within each graph instance. Stored in the 38 | same sequence as '_node_labels.txt' or '_graph_indicator.txt'. 39 | Stored in list format, 2D. Each internal dim refers to a feature 40 | list (node label list) for a graph instance. 41 | 42 | - 'attr': generated by 'read_node_features' from "_node_attributes.txt" if available, 43 | which represents the original features of each node within a graph instance. 44 | Order is the same as above. 45 | Stored as a 2D list of 1D np.array. Each internal list is a series of original 46 | node feature vectors for a graph instance. Each internal element is a 1D np.array 47 | that represents the (possibly floating-point) feature vector for a specific node. 48 | More exactly, it is [ 49 | [array(f1, f2, ..), array(f1, f2, ..), array(f1, f2, ..)], 50 | [similar node feature vectors in 2nd graph instance], 51 | [similar node feature vectors in 3rd graph instance], 52 | ... 53 | ] 54 | 55 | - 'features': combination of 'nlabels' and 'attr', where 'nlabels' is 56 | converted into one-hot format to show which label belongs to 57 | a specific node. The overall sequence is the same as above. 58 | Each onehot feature matrix has shape (N, D1+D2), where N is the 59 | number of nodes within a specific graph instance, and D1, D2 are the 60 | number of possible labels and the node feature vector length within 61 | this graph, respectively. D2 is optional. 62 | 63 | use : datareader.data['features'][gid] - a 2D (N, D1+D2) matrix of constructed features 64 | 65 | - 'labels': concrete label for each graph instance, in the same order as 'graph_labels.txt'. 66 | Stored as a list of np.int64. 67 | 68 | use : datareader.data['labels'][gid] - a single int label 69 | 70 | - 'splits': a split of train/test sets. 71 | { 72 | 'train': [list of train graph ids, in int], 73 | 'test': [list of test graph ids, in int] 74 | }.
75 | 76 | use : datareader.data['splits']['train/test'] - a list of int gids 77 | 78 | 79 | 80 | - 'n_node_max': max num of nodes within a graph instance among all graphs. Single int. 81 | - 'num_features': size of concatenate features in 'features'. Single int. 82 | - 'num_classes': num of graph classes. Single int. 83 | """ 84 | 85 | def __init__(self, args): 86 | 87 | # self.args = args 88 | assert args.use_nlabel_asfeat or args.use_org_node_attr or args.use_degree_asfeat, \ 89 | 'need at least one source to construct node features' 90 | 91 | self.data_path = os.path.join(args.data_path, args.dataset) 92 | self.rnd_state = np.random.RandomState(args.seed) 93 | files = os.listdir(self.data_path) 94 | data = {} 95 | 96 | """ 97 | Load raw graphs, nodes, record in 2 dicts. 98 | Load adj list for each graph with sequence of graph indicator. 99 | Load node labels for each graph with sequence of graph indicator. 100 | Load graph labels for each graph with sequence of graph indicator. 101 | """ 102 | nodes, graphs = self.read_graph_nodes_relations( 103 | list(filter(lambda f: f.find('graph_indicator') >= 0, files))[0]) 104 | data['adj_list'] = self.read_graph_adj( # in case of Tox21_Axx_... 105 | list(filter(lambda f: f.find('_A.') >= 0, files))[0], nodes, graphs) 106 | 107 | node_labels_file = list(filter(lambda f: f.find('node_labels') >= 0, files)) 108 | if len(node_labels_file) == 1: 109 | data['nlabels'] = self.read_node_features( 110 | node_labels_file[0], nodes, graphs, fn=lambda s: int(s.strip())) 111 | else: 112 | data['nlabels'] = None 113 | 114 | data['labels'] = np.array( 115 | self.parse_txt_file( 116 | list(filter(lambda f: f.find('graph_labels') >= 0 or f.find('graph_attributes') >= 0, files))[0], 117 | line_parse_fn=lambda s: int(float(s.strip())))) 118 | 119 | if args.use_org_node_attr: 120 | data['attr'] = self.read_node_features(list(filter(lambda f: f.find('node_attributes') >= 0, files))[0], 121 | nodes, graphs, 122 | fn=lambda s: np.array(list(map(float, s.strip().split(','))))) 123 | 124 | '''also include this part into GetFinalFeatures() 125 | ''' 126 | # In each graph sample, treat node labels (if have) as feature for one graph. 
127 | nlabels, n_edges, degrees = [], [], [] 128 | for sample_id, adj in enumerate(data['adj_list']): 129 | N = len(adj) # number of nodes 130 | 131 | # some verifications 132 | if data['nlabels'] is not None: 133 | assert N == len(data['nlabels'][sample_id]), (N, len(data['nlabels'][sample_id])) 134 | # if not np.allclose(adj, adj.T): 135 | # print(sample_id, 'not symmetric') # not symm is okay, maybe direct graph 136 | n = np.sum(adj) # total sum of edges 137 | # assert n % 2 == 0, n 138 | 139 | n_edges.append(int(n / 2)) # undirected edges, so need to divide by 2 140 | degrees.extend(list(np.sum(adj, 1))) 141 | if data['nlabels'] is not None: 142 | nlabels.append(np.array(data['nlabels'][sample_id])) 143 | 144 | # Create nlabels over graphs as one-hot vectors for each node 145 | if data['nlabels'] is not None: 146 | nlabels_all = np.concatenate(nlabels) 147 | nlabels_min = nlabels_all.min() 148 | num_nlabels = int(nlabels_all.max() - nlabels_min + 1) # number of possible values 149 | 150 | 151 | 152 | #--------- Generate onehot-feature ---------# 153 | features = GetFinalFeatures(args, data) 154 | 155 | # final graph feature dim 156 | num_features = features[0].shape[1] 157 | 158 | shapes = [len(adj) for adj in data['adj_list']] 159 | labels = data['labels'] # graph class labels, np.ndarray 160 | labels -= np.min(labels) # to start from 0 161 | 162 | classes = np.unique(labels) 163 | num_classes = len(classes) 164 | 165 | """ 166 | Test whether labels are successive, e.g., 0,1,2,3,4,..i, i+1,.. 167 | If not, make them successive. New labels still store in "labels". 168 | """ 169 | if not np.all(np.diff(classes) == 1): 170 | print('making labels sequential, otherwise pytorch might crash') 171 | labels_new = np.zeros(labels.shape, dtype=labels.dtype) - 1 172 | for lbl in range(num_classes): 173 | labels_new[labels == classes[lbl]] = lbl 174 | labels = labels_new 175 | classes = np.unique(labels) 176 | assert len(np.unique(labels)) == num_classes, np.unique(labels) 177 | 178 | 179 | def stats(x): 180 | return (np.mean(x), np.std(x), np.min(x), np.max(x)) 181 | 182 | print('N nodes avg/std/min/max: \t%.2f/%.2f/%d/%d' % stats(shapes)) 183 | print('N edges avg/std/min/max: \t%.2f/%.2f/%d/%d' % stats(n_edges)) 184 | print('Node degree avg/std/min/max: \t%.2f/%.2f/%d/%d' % stats(degrees)) 185 | print('Node features dim: \t\t%d' % num_features) 186 | print('N classes: \t\t\t%d' % num_classes) 187 | print('Classes: \t\t\t%s' % str(classes)) 188 | 189 | for lbl in classes: 190 | print('Class %d: \t\t\t%d samples' % (lbl, np.sum(labels == lbl))) 191 | 192 | if args.data_verbose: 193 | if data['nlabels'] is not None: 194 | for u in np.unique(nlabels_all): 195 | print('nlabels {}, count {}/{}'.format(u, np.count_nonzero(nlabels_all == u), len(nlabels_all))) 196 | 197 | # some datasets like "Fingerprint" may lack graph in _indicator.txt 198 | # N_graphs = len(labels) # number of samples (graphs) in data 199 | # assert N_graphs == len(data['adj_list']) == len(features), 'invalid data' 200 | N_graphs = len(data['adj_list']) 201 | 202 | # Create train/test sets 203 | train_gids, test_gids = split_ids(args, self.rnd_state.permutation(N_graphs), self.rnd_state) 204 | splits = {'train': train_gids, 205 | 'test': test_gids} 206 | 207 | data['features'] = features 208 | data['labels'] = labels 209 | data['splits'] = splits 210 | data['n_node_max'] = np.max(shapes) # max number of nodes 211 | data['num_features'] = num_features 212 | data['num_classes'] = num_classes 213 | 214 | self.data = data 215 | 216 | # 
print(len(data['features']), len(data['adj_list']), len(data['labels'])) 217 | assert len(data['features'])==len(data['adj_list'])==len(data['labels']), \ 218 | "Graph Number Mismatch, Possible Reason: due to insuccessive graph indicator, \ 219 | some gids are not existed in original indicator files, only thing is filtering graph labels. \ 220 | Remember that insuccessive graph indicator is okay, graph labels-graphs are corresponding by \ 221 | stored index in data['xxx']." 222 | print() 223 | 224 | def parse_txt_file(self, fpath, line_parse_fn=None): 225 | """ 226 | Read a file, split each line by pre-defined pattern (e.g., ','), 227 | save results in list. Transferring data into Int is done outside. 228 | """ 229 | with open(os.path.join(self.data_path, fpath), 'r') as f: 230 | lines = f.readlines() 231 | data = [line_parse_fn(s) if line_parse_fn is not None else s for s in lines] 232 | return data 233 | 234 | 235 | def read_graph_nodes_relations(self, fpath): 236 | """ 237 | From graph_indicator.txt file, find { node_id: graph_id } and { graph_id:[nodes] }. 238 | """ 239 | graph_ids = self.parse_txt_file(fpath, 240 | line_parse_fn=lambda s: int(s.rstrip())) 241 | nodes, graphs = {}, {} 242 | for node_id, graph_id in enumerate(graph_ids): 243 | if graph_id not in graphs: 244 | graphs[graph_id] = [] 245 | graphs[graph_id].append(node_id) 246 | nodes[node_id] = graph_id 247 | graph_ids = np.unique(list(graphs.keys())) 248 | for graph_id in graph_ids: 249 | graphs[graph_id] = np.array(graphs[graph_id]) 250 | return nodes, graphs 251 | 252 | 253 | # for direct graph, row is source nodes 254 | def read_graph_adj(self, fpath, nodes, graphs): 255 | edges = self.parse_txt_file(fpath, 256 | line_parse_fn=lambda s: s.split(',')) 257 | 258 | adj_dict = {} 259 | for edge in edges: 260 | # Note: TU-datasets are all 1 based node id 261 | node1 = int(edge[0].strip()) - 1 # -1 because of zero-indexing in our code 262 | node2 = int(edge[1].strip()) - 1 263 | graph_id = nodes[node1] 264 | 265 | # both nodes in edge side should in a same graph 266 | assert graph_id == nodes[node2], ('invalid data', graph_id, nodes[node2]) 267 | if graph_id not in adj_dict: 268 | n = len(graphs[graph_id]) 269 | adj_dict[graph_id] = np.zeros((n, n)) 270 | 271 | ind1 = np.where(graphs[graph_id] == node1)[0] 272 | ind2 = np.where(graphs[graph_id] == node2)[0] 273 | assert len(ind1) == len(ind2) == 1, (ind1, ind2) 274 | adj_dict[graph_id][ind1, ind2] = 1 275 | 276 | # no-connection graph may not included on code above, 277 | # should specially add it, e.g., graph-291 in Fingerprint 278 | # data set only have single node 1477 (1-based index), 279 | # which is not in edge file since it has no connection. 280 | # But still, we should add it to ensure the consistent. 281 | # some graphs in Tox21 also only have isolated nodes. 282 | adj_list = [] 283 | for gid in sorted(list(graphs.keys())): 284 | if gid in adj_dict: 285 | adj_list.append(adj_dict[gid]) 286 | else: 287 | adj_list.append(np.zeros((len(graphs[gid]), len(graphs[gid])))) 288 | return adj_list 289 | 290 | 291 | def read_node_features(self, fpath, nodes, graphs, fn): 292 | ''' 293 | Return 'feature' graph by graph. 
294 | here 'feature' may refer to (1) node attributes; (2) node labels; (3) node degrees 295 | ''' 296 | node_features_all = self.parse_txt_file(fpath, line_parse_fn=fn) 297 | node_features = {} 298 | for node_id, x in enumerate(node_features_all): 299 | graph_id = nodes[node_id] 300 | if graph_id not in node_features: 301 | node_features[graph_id] = [None] * len(graphs[graph_id]) 302 | ind = np.where(graphs[graph_id] == node_id)[0] # find the exact index 303 | assert len(ind) == 1, ind 304 | assert node_features[graph_id][ind[0]] is None, node_features[graph_id][ind[0]] 305 | node_features[graph_id][ind[0]] = x 306 | node_features_lst = [node_features[graph_id] for graph_id in sorted(list(graphs.keys()))] 307 | return node_features_lst 308 | 309 | 310 | def GetFinalFeatures(args, data): 311 | ''' 312 | Construct features for each graph instance; they may come from 3 parts. 313 | Each element in 'features' refers to the constructed feature matrix 314 | of a graph. This feature matrix has shape (Ni, Di), where Ni is the number 315 | of nodes in graph_i, and Di is the combined feature dimension, which may come 316 | from node labels, node attributes and degrees. 317 | ''' 318 | 319 | # In each graph sample, treat node labels (if available) as features for one graph. 320 | nlabels, n_edges, degrees = [], [], [] 321 | for sample_id, adj in enumerate(data['adj_list']): 322 | N = len(adj) # number of nodes 323 | n = np.sum(adj) # total sum of edges 324 | 325 | n_edges.append(int(n / 2)) # undirected edges, so need to divide by 2 326 | degrees.extend(list(np.sum(adj, 1))) 327 | if data['nlabels'] is not None: 328 | nlabels.append(np.array(data['nlabels'][sample_id])) 329 | 330 | # Create features over graphs as one-hot vectors for each node 331 | if data['nlabels'] is not None: 332 | nlabels_all = np.concatenate(nlabels) 333 | nlabels_min = nlabels_all.min() 334 | num_nlabels = int(nlabels_all.max() - nlabels_min + 1) # number of possible values 335 | 336 | final_features = [] 337 | max_degree = int(np.max(degrees)) # maximum node degree among all graphs 338 | for sample_id, adj in enumerate(data['adj_list']): 339 | N = adj.shape[0] 340 | 341 | # OneHot Feature: (N, D), where D is the number of all possible feature values 342 | # among nodes within a graph. Each position is 0/1 to show 343 | # whether the node has a corresponding feature there. E.g., if the 344 | # original features (also original node labels) range from 3~8, 345 | # then D = 6 (8-3+1), and feature "3" maps to position "0", even 346 | # though there may be multiple "3"s in the original feature vector. 347 | 348 | # This is done inside each single graph.
349 | 350 | 351 | # part 1: one-hot nlabels as feature 352 | if args.use_nlabel_asfeat: 353 | if data['nlabels'] is not None: 354 | x = data['nlabels'][sample_id] 355 | nlabels_onehot = np.zeros((len(x), num_nlabels)) 356 | for node, value in enumerate(x): 357 | if value is not None: 358 | nlabels_onehot[node, value - nlabels_min] = 1 359 | else: 360 | nlabels_onehot = np.empty((N, 0)) 361 | else: 362 | nlabels_onehot = np.empty((N, 0)) 363 | 364 | # part 2 (optional, not always have): original node features 365 | if args.use_org_node_attr: 366 | if args.dataset in ['COLORS-3', 'TRIANGLES']: 367 | # first column corresponds to node attention and shouldn't be used as node features 368 | feature_attr = np.array(data['attr'][sample_id])[:, 1:] 369 | else: 370 | feature_attr = np.array(data['attr'][sample_id]) 371 | else: 372 | feature_attr = np.empty((N, 0)) 373 | 374 | # part 3 (optinal): node degree 375 | if args.use_degree_asfeat: 376 | degree_onehot = np.zeros((N, max_degree + 1)) 377 | degree_onehot[np.arange(N), np.sum(adj, 1).astype(np.int32)] = 1 378 | else: 379 | degree_onehot = np.empty((N, 0)) 380 | 381 | node_features = np.concatenate((nlabels_onehot, feature_attr, degree_onehot), axis=1) 382 | if node_features.shape[1] == 0: 383 | # dummy features for datasets without node labels/attributes 384 | # node degree features can be used instead 385 | node_features = np.ones((N, 1)) 386 | final_features.append(node_features) 387 | 388 | return final_features 389 | 390 | 391 | class GraphData(torch.utils.data.Dataset): 392 | def __init__(self, datareader: DataReader, gids: list): 393 | self.idx = gids 394 | self.rnd_state = datareader.rnd_state 395 | self.set_fold(datareader.data) 396 | 397 | def set_fold(self, data): 398 | self.total = len(data['labels']) 399 | self.n_node_max = data['n_node_max'] 400 | self.num_classes = data['num_classes'] 401 | self.num_features = data['num_features'] 402 | self.labels = [data['labels'][i] for i in self.idx] 403 | self.adj_list = [data['adj_list'][i] for i in self.idx] 404 | self.features = [data['features'][i] for i in self.idx] 405 | # print('%s: %d/%d' % (self.split_name.upper(), len(self.labels), len(data['labels']))) 406 | 407 | def __len__(self): 408 | return len(self.labels) 409 | 410 | def __getitem__(self, index): 411 | # convert to torch 412 | return [torch.as_tensor(self.features[index], dtype=torch.float), # node features 413 | torch.as_tensor(self.adj_list[index], dtype=torch.float), # adj matrices 414 | int(self.labels[index])] 415 | 416 | -------------------------------------------------------------------------------- /utils/graph.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import dgl 3 | import networkx as nx 4 | 5 | def numpy_to_graph(A,type_graph='dgl',node_features=None, to_cuda=True): 6 | '''Convert numpy arrays to graph 7 | 8 | Parameters 9 | ---------- 10 | A : mxm array 11 | Adjacency matrix 12 | type_graph : str 13 | 'dgl' or 'nx' 14 | node_features : dict 15 | Optional, dictionary with key=feature name, value=list of size m 16 | Allows user to specify node features 17 | 18 | Returns 19 | 20 | ------- 21 | Graph of 'type_graph' specification 22 | ''' 23 | 24 | G = nx.from_numpy_array(A) 25 | 26 | if node_features != None: 27 | for n in G.nodes(): 28 | for k,v in node_features.items(): 29 | G.nodes[n][k] = v[n] 30 | 31 | if type_graph == 'nx': 32 | return G 33 | 34 | G = G.to_directed() 35 | 36 | if node_features != None: 37 | node_attrs = list(node_features.keys()) 
38 | else: 39 | node_attrs = [] 40 | 41 | g = dgl.from_networkx(G, node_attrs=node_attrs, edge_attrs=['weight']) 42 | if to_cuda: 43 | g = g.to(torch.device('cuda')) 44 | return g -------------------------------------------------------------------------------- /utils/mask.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import copy 4 | 5 | def gen_mask(datareader, bkd_gids, bkd_nid_groups): 6 | """ 7 | Input a datareader and a list of backdoor candidate nodes (train/test), 8 | generate 2 list of masks (2D) to each of them, for topology and feature, 9 | respectively. 10 | 11 | Here a adj mask is (N, N), and feat mask is (N, F), where N is maximum 12 | num of nodes among all graphs in a dataset, F is fixed feat dim value. 13 | 14 | About how to use the mask: Topo- and Feat-mask are used in a same manner: 15 | (1) After the padding input (N, N/F) pass though its corresponding AdaptNet, 16 | we get a (N, N/F) result for one graph instance. 17 | (2) Simply do element-wise torch.mul with mask and this result, since we 18 | only want to keep mutual information inside of a backdoor pattern. 19 | (3) After masking redundant information, remember to remove additional dim 20 | in row/col, recover this masked result back to original dim same with 21 | corresponding graph instance. 22 | (4) Simply add recovered result with initialized adj / feat matrix. 23 | 24 | About inputs: 25 | - bkd_gids: 1D list 26 | - bkd_node_groups: 3D list 27 | """ 28 | nodenums = [len(adj) for adj in datareader.data['adj_list']] 29 | N = max(nodenums) 30 | F = np.array(datareader.data['features'][0]).shape[1] 31 | topomask = {} 32 | featmask = {} 33 | 34 | for i in range(len(bkd_gids)): 35 | gid = bkd_gids[i] 36 | groups = bkd_nid_groups[i] 37 | if gid not in topomask: topomask[gid] = torch.zeros(N, N) 38 | if gid not in featmask: featmask[gid] = torch.zeros(N, F) 39 | 40 | for group in groups: 41 | for nid in group: 42 | topomask[gid][nid][group] = 1 43 | topomask[gid][nid][nid] = 0 44 | featmask[gid][nid][::] = 1 45 | 46 | return topomask, featmask 47 | 48 | 49 | def recover_mask(Ni, mask, for_whom): 50 | """ 51 | Step3 of the mask usage, recover each masked result back to original: 52 | topomask[gid]: (N, N) --> (Ni, Ni) 53 | featmask[gid]: (N, F) --> (Ni, F) 54 | 55 | Not change original mask 56 | 57 | About mask: 58 | topomask: contains all topo masks in train/test set, dict. 59 | featmask: contains all feat masks in train/test set, dict. 60 | Return: mask for single graph instance 61 | """ 62 | recovermask = copy.deepcopy(mask) 63 | 64 | if for_whom == 'topo': 65 | recovermask = recovermask[:Ni, :Ni] 66 | elif for_whom == 'feat': 67 | recovermask = recovermask[:Ni] 68 | 69 | return recovermask 70 | --------------------------------------------------------------------------------
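The utilities above are only loosely coupled across the source files, so the sketch below shows one plausible way to wire them together — roughly the flow that `./main/attack.py` automates. It is a minimal, illustrative sketch rather than the repository's attack script: the `args` values are illustrative placeholders for the options defined in `config.py`, `TinySurrogate` is a stand-in for the DGL-based GNNs in `./model/` (whose constructors are not shown in this section), the explicit relabeling of poisoned graphs is an assumption about what the attack script does, and a CUDA device is required because `trojan/prop.py` asserts one.

```python
import argparse, copy
import torch
import torch.nn as nn

from utils.datareader import DataReader
from utils.bkdcdd import select_cdd_graphs, select_cdd_nodes
from utils.mask import gen_mask
from trojan.input import gen_input
from trojan.GTA import GraphTrojanNet, train_gtn
from trojan.prop import train_model, evaluate


class TinySurrogate(nn.Module):
    """Stand-in for the DGL-based GNNs in ./model/, kept here only to make the sketch
    self-contained: one dense propagation step over the padded batch produced by
    utils/batch.py, then masked mean pooling and a linear classifier."""
    def __init__(self, in_dim, hidden_dim, n_classes):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, n_classes)

    def forward(self, data):
        x, a, support = data[0], data[1], data[2]          # (B,N,F), (B,N,N), (B,N)
        h = torch.relu(self.fc1(torch.bmm(a, x)))          # keeps gradients w.r.t. both A and X
        pooled = (h * support.unsqueeze(-1)).sum(1) / support.sum(1, keepdim=True).clamp(min=1)
        return self.fc2(pooled)


# Illustrative option values; the real option set and defaults live in config.py.
args = argparse.Namespace(
    seed=123, dataset='AIDS', data_path='./dataset', data_verbose=False,
    use_nlabel_asfeat=True, use_org_node_attr=True, use_degree_asfeat=True,
    train_ratio=0.5, batch_size=16, train_epochs=40, lr=0.01,
    lr_decay_steps=[25, 35], weight_decay=5e-4, lambd=1.0,
    bkd_gratio_train=0.1, bkd_gratio_test=0.5, bkd_size=5, bkd_num_pergraph=1,
    gtn_input_type='2hop', gtn_lr=0.01, gtn_epochs=20,
    topo_thrd=0.5, topo_activation='sigmoid', feat_thrd=0.0, feat_activation='relu',
    target_class=1)

dr = DataReader(args)                                      # load and preprocess one TU dataset
train_gids = dr.data['splits']['train']
test_gids = dr.data['splits']['test']

# 1. pick victim graphs and the node groups that will host each trigger
bkd_gids = select_cdd_graphs(args, train_gids, dr.data['adj_list'], 'train')
bkd_nids, bkd_groups = select_cdd_nodes(args, bkd_gids, dr.data['adj_list'])
clean_gids = list(set(train_gids) - set(bkd_gids))

# 2. masks that confine the generated trigger to the selected regions
topomasks, featmasks = gen_mask(dr, bkd_gids, bkd_groups)

# 3. aggregated input-space representations fed to the trigger generators
Ainputs, Xinputs = gen_input(args, dr, bkd_gids)

# 4. one generator for topology (N x N) and one for features (N x F)
toponet = GraphTrojanNet(int(dr.data['n_node_max']))
featnet = GraphTrojanNet(int(dr.data['num_features']))

# 5. fit the generators against a surrogate GNN, writing perturbed graphs into bkd_dr
model = TinySurrogate(dr.data['num_features'], 64, dr.data['num_classes'])
bkd_dr = copy.deepcopy(dr)
toponet, featnet = train_gtn(args, model, toponet, featnet,
                             bkd_gids, clean_gids, topomasks, featmasks,
                             dr, bkd_dr, Ainputs, Xinputs)

# 6. relabel the poisoned graphs to the target class and retrain the GNN on the poisoned data
bkd_dr.data['labels'][bkd_gids] = args.target_class
train_model(args, bkd_dr, model, bkd_gids, clean_gids)

# 7. clean accuracy on the untouched test split; measuring the attack success rate would
#    additionally require injecting triggers into test graphs first (same steps, subset='test')
evaluate(args, bkd_dr, model, test_gids)
```

In the released pipeline this block additionally sits inside an outer loop: per the `train_gtn` docstring, `init_dr` stays untouched across resampling rounds while `bkd_dr` is regenerated, alternating generator updates with GNN retraining.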