├── .gitignore
├── README.md
├── config.py
├── dataset
│   └── AIDS
│       ├── AIDS_A.txt
│       ├── AIDS_edge_labels.txt
│       ├── AIDS_graph_indicator.txt
│       ├── AIDS_graph_labels.txt
│       ├── AIDS_label_readme.txt
│       ├── AIDS_node_attributes.txt
│       └── AIDS_node_labels.txt
├── main
│   ├── attack.py
│   ├── benign.py
│   └── example.sh
├── model
│   ├── gat.py
│   ├── gcn.py
│   └── sage.py
├── trojan
│   ├── GTA.py
│   ├── __init__.py
│   ├── input.py
│   └── prop.py
└── utils
    ├── batch.py
    ├── bkdcdd.py
    ├── datareader.py
    ├── graph.py
    └── mask.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Configuration files
2 | .vscode
3 | *.pyc
4 |
5 | # Temp files
6 | __pycache__
7 | .ipynb_checkpoints
8 |
9 | # Scripts
10 | *.log
11 | archive
12 | config
13 | utils_org
14 | prepare
15 | save
16 | main/android
17 | trojan/transGTA.py
18 | # git rm -rf --cached .
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # GraphBackdoor
2 |
3 | This is a lightweight implementation of our **USENIX Security'21** paper **[Graph Backdoor](https://arxiv.org/abs/2006.11890)**. For the convenience of related projects, we simplify the following components and improve running efficiency:
4 |
5 | - **GNNs**: we now use a DGL-based framework to implement our GNNs, which gives lower memory usage and faster training. For more information about DGL, see **Useful resources**.
6 | - **graph encoding**: using a pretrained attention network incurs additional time cost. We find that directly aggregating input-space (feature/topology) matrices also yields a good input representation. Please see `./trojan/input.py` and the sketch after this list.
7 | - **blending function**: re-searching a subgraph to blend the trigger is costly, especially on large graphs. Instead, one can always blend a generated trigger into a fixed region.
8 | - **optimization objective**: we find that output-end optimization (based on labels) achieves attack efficacy similar to optimizing intermediate activations, while significantly simplifying the implementation. Thus we switch to a label-level objective.
9 |
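As a rough illustration of the simplified graph encoding, the sketch below (the function name and the dense-`numpy` setup are illustrative, not the project API) mirrors the 2-hop variant in `./trojan/input.py`: the trigger generator's topology input is the binarized `A + A^2`, and its feature input is that matrix multiplied with the node features.

```python
import numpy as np

def two_hop_inputs(A: np.ndarray, X: np.ndarray):
    """Sketch: build the trigger generator's inputs from one dense graph.

    A: (N, N) adjacency matrix, X: (N, F) node-feature matrix.
    """
    A2 = A + A @ A                       # nodes reachable within 2 hops
    A2 = (A2 > 0).astype(np.float32)     # binarize
    np.fill_diagonal(A2, 0.0)            # drop self-loops
    X_agg = A2 @ X                       # aggregate features over the 2-hop neighborhood
    return A2, X_agg
```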
10 | If you aim to compare your novel attacks against this work, or to develop a defense against it, feel free to use this release in your work: it is easier to access and more efficient.
11 |
12 | ## Guide
13 |
14 | We organize the structure of our files as follows:
15 | ```latex
16 | .
17 | ├── dataset/ # keep all original dataset you may use
18 | ├── main/
19 | │ ├── attack.py # end-to-end attack codes
20 | │ ├── benign.py # benign training/evaluation codes
21 | │ └── example.sh # examples of running commands
22 | ├── model/
23 | │ ├── gcn.py # dgl-based GCN
24 | │ └── sage.py # dgl-based GraphSAGE
25 | ├── save/ # temporary dir to save your trained models/perturbed data
26 | ├── utils/
27 | │ ├── batch.py # collate_batch function
28 | │ ├── bkdcdd.py # codes to select victim graphs and trigger regions
29 | │   ├── datareader.py       # data loader codes
30 | │   ├── graph.py            # simple utility function(s) related to graph processing
31 | │   └── mask.py             # mask functions to pad graphs to a common size and to undo the padding
32 | └── config.py # all configurations
33 |
34 | ```
35 |
36 | ## Required packages
37 | - torch 1.5.1
38 | - dgl 0.4.2
39 |
40 |
41 | ## Useful resources:
42 | - [TU graph set](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets): most of our datasets come from this source. In some cases, we need to modify a graph set, e.g., removing classes without enough instances or removing graphs with too few nodes.
43 | - [DGL](https://docs.dgl.ai): we use DGL to implement our GNNs in this release because it provides efficient implementations such as [GCN](https://docs.dgl.ai/en/0.6.x/tutorials/models/1_gnn/1_gcn.html), [GAT](https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/9_gat.html), and [GraphSAGE](https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/model.py).
44 | - [TU graph datareader](https://github.com/bknyaz/graph_nn/blob/master/graph_unet.py): this repo implements a data loader that processes TU graph datasets in their raw storage format (a minimal loading sketch follows this list). Our `./utils/datareader.py` and `./utils/batch.py` contain the modified codes, and we appreciate the authors' efforts!
45 |
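For readers new to the TU raw storage format, here is a minimal, hedged sketch of how the AIDS files under `./dataset/AIDS/` fit together (`*_A.txt` lists 1-indexed edges, `*_graph_indicator.txt` maps each node to its graph, `*_graph_labels.txt` gives one label per graph). The project's actual loader is `./utils/datareader.py`; the function below is only illustrative.

```python
import numpy as np
from collections import defaultdict

def load_tu_graph_labels(prefix="dataset/AIDS/AIDS"):
    """Minimal sketch of reading the TU raw format (not the project's DataReader)."""
    graph_of_node = np.loadtxt(f"{prefix}_graph_indicator.txt", dtype=int)  # node -> graph id (1-indexed)
    graph_labels = np.loadtxt(f"{prefix}_graph_labels.txt", dtype=int)      # one label per graph
    edges = np.loadtxt(f"{prefix}_A.txt", delimiter=",", dtype=int)         # (num_edges, 2), 1-indexed

    # group edges by the graph their endpoints belong to
    edges_per_graph = defaultdict(list)
    for u, v in edges:
        edges_per_graph[graph_of_node[u - 1]].append((u, v))
    return graph_labels, edges_per_graph
```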
46 |
47 | ## Run the code
48 | You can directly run the attack with `python -u ./main/attack.py --use_org_node_attr --train_verbose --dataset <dataset> --target_class <label>` (e.g., `--dataset AIDS --target_class 0`). We put some example commands in `./main/example.sh`.
49 |
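Equivalently, the attack can be driven programmatically. The following is a rough sketch of the entry point in `./main/attack.py`; it assumes the repository root is importable, whereas the scripts themselves adjust `sys.path` and are meant to be launched from `./main/`.

```python
from config import parse_args          # all CLI flags are defined in config.py
from main.attack import GraphBackdoor

if __name__ == '__main__':
    args = parse_args()                # e.g. --dataset AIDS --target_class 0
    attack = GraphBackdoor(args)
    attack.run()                       # trains a benign GNN, then runs the bi-level trigger optimization
```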
50 |
51 | ## Cite
52 | Please cite our paper if it is helpful in your own work:
53 | ```
54 | @inproceedings{xi2021graph,
55 | title={Graph backdoor},
56 | author={Xi, Zhaohan and Pang, Ren and Ji, Shouling and Wang, Ting},
57 |   booktitle={30th {USENIX} Security Symposium ({USENIX} Security 21)},
58 | year={2021}
59 | }
60 | ```
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | def add_data_group(group):
4 | group.add_argument('--seed', type=int, default=123)
5 | group.add_argument('--dataset', type=str, default='AIDS', help="used dataset")
6 | group.add_argument('--data_path', type=str, default='../dataset', help="the directory used to save dataset")
7 | group.add_argument('--use_nlabel_asfeat', action='store_true', help="use node labels as (part of) node features")
8 | group.add_argument('--use_org_node_attr', action='store_true', help="use node attributes as (part of) node features")
9 | group.add_argument('--use_degree_asfeat', action='store_true', help="use node degrees as (part of) node features")
10 | group.add_argument('--data_verbose', action='store_true', help="print detailed dataset info")
11 | group.add_argument('--save_data', action='store_true')
12 |
13 |
14 | def add_model_group(group):
15 | group.add_argument('--model', type=str, default='gcn', help="used model")
16 | group.add_argument('--train_ratio', type=float, default=0.5, help="ratio of trainset from whole dataset")
17 | group.add_argument('--hidden_dim', nargs='+', default=[64, 16], type=int, help="hidden layer dimensions of the GNN")
18 | group.add_argument('--num_head', type=int, default=2, help="GAT head number")
19 |
20 | group.add_argument('--batch_size', type=int, default=16)
21 | group.add_argument('--train_epochs', type=int, default=40)
22 | group.add_argument('--lr', type=float, default=0.01)
23 | group.add_argument('--lr_decay_steps', nargs='+', default=[25, 35], type=int)
24 | group.add_argument('--weight_decay', type=float, default=5e-4)
25 | group.add_argument('--dropout', type=float, default=0.5)
26 | group.add_argument('--train_verbose', action='store_true', help="print training details")
27 | group.add_argument('--log_every', type=int, default=1, help='print every x epoch')
28 | group.add_argument('--eval_every', type=int, default=5, help='evaluate every x epoch')
29 |
30 | group.add_argument('--clean_model_save_path', type=str, default='../save/model/clean')
31 | group.add_argument('--save_clean_model', action='store_true')
32 |
33 | def add_atk_group(group):
34 | group.add_argument('--bkd_gratio_train', type=float, default=0.1, help="backdoor graph ratio in trainset")
35 | group.add_argument('--bkd_gratio_test', type=float, default=0.5, help="backdoor graph ratio in testset")
36 | group.add_argument('--bkd_num_pergraph', type=int, default=1, help="number of backdoor triggers per graph")
37 | group.add_argument('--bkd_size', type=int, default=5, help="number of nodes for each trigger")
38 | group.add_argument('--target_class', type=int, default=None, help="the targeted node/graph label")
39 |
40 | group.add_argument('--gtn_layernum', type=int, default=3, help="layer number of GraphTrojanNet")
41 | group.add_argument('--pn_rate', type=float, default=1, help="ratio between trigger-embedded graphs (positive) and benign ones (negative)")
42 | group.add_argument('--gtn_input_type', type=str, default='2hop', help="how to process org graphs before inputting to GTN")
43 |
44 | group.add_argument('--resample_steps', type=int, default=3, help="# iterations to re-select graph samples")
45 | group.add_argument('--bilevel_steps', type=int, default=4, help="# bi-level optimization iterations")
46 | group.add_argument('--gtn_lr', type=float, default=0.01)
47 | group.add_argument('--gtn_epochs', type=int, default=20, help="# attack epochs")
48 | group.add_argument('--topo_activation', type=str, default='sigmoid', help="activation function for topology generator")
49 | group.add_argument('--feat_activation', type=str, default='relu', help="activation function for feature generator")
50 | group.add_argument('--topo_thrd', type=float, default=0.5, help="threshold for topology generator")
51 | group.add_argument('--feat_thrd', type=float, default=0, help="threshold for feature generator (only useful for binary feature)")
52 |
53 | group.add_argument('--lambd', type=float, default=1, help="a hyperparameter to balance attack loss components")
54 | # group.add_argument('--atk_verbose', action='store_true', help="print attack details")
55 | group.add_argument('--save_bkd_model', action='store_true')
56 | group.add_argument('--bkd_model_save_path', type=str, default='../save/model/bkd')
57 |
58 | def parse_args():
59 | parser = argparse.ArgumentParser()
60 | data_group = parser.add_argument_group(title="Data-related configuration")
61 | model_group = parser.add_argument_group(title="Model-related configuration")
62 | atk_group = parser.add_argument_group(title="Attack-related configuration")
63 |
64 | add_data_group(data_group)
65 | add_model_group(model_group)
66 | add_atk_group(atk_group)
67 |
68 | return parser.parse_args()
69 |
--------------------------------------------------------------------------------
/dataset/AIDS/AIDS_graph_labels.txt:
--------------------------------------------------------------------------------
1 | 0
2 | 1
3 | 1
4 | 1
5 | 0
6 | 1
7 | 1
8 | 1
9 | 0
10 | 1
11 | 0
12 | 1
13 | 1
14 | 1
15 | 0
16 | 1
17 | 1
18 | 1
19 | 0
20 | 1
21 | 1
22 | 1
23 | 1
24 | 1
25 | 1
26 | 1
27 | 1
28 | 0
29 | 1
30 | 1
31 | 1
32 | 1
33 | 1
34 | 1
35 | 1
36 | 0
37 | 1
38 | 1
39 | 1
40 | 1
41 | 1
42 | 1
43 | 0
44 | 0
45 | 1
46 | 1
47 | 1
48 | 1
49 | 1
50 | 1
51 | 1
52 | 1
53 | 1
54 | 1
55 | 1
56 | 1
57 | 0
58 | 1
59 | 0
60 | 1
61 | 1
62 | 1
63 | 1
64 | 0
65 | 1
66 | 1
67 | 1
68 | 1
69 | 0
70 | 1
71 | 1
72 | 1
73 | 1
74 | 1
75 | 1
76 | 1
77 | 0
78 | 1
79 | 1
80 | 1
81 | 1
82 | 1
83 | 1
84 | 1
85 | 1
86 | 1
87 | 1
88 | 1
89 | 1
90 | 1
91 | 1
92 | 1
93 | 1
94 | 1
95 | 1
96 | 1
97 | 1
98 | 1
99 | 0
100 | 1
101 | 1
102 | 1
103 | 1
104 | 1
105 | 1
106 | 1
107 | 1
108 | 1
109 | 1
110 | 1
111 | 1
112 | 0
113 | 1
114 | 0
115 | 1
116 | 1
117 | 1
118 | 1
119 | 1
120 | 1
121 | 0
122 | 1
123 | 1
124 | 1
125 | 1
126 | 0
127 | 1
128 | 1
129 | 1
130 | 1
131 | 1
132 | 0
133 | 1
134 | 1
135 | 1
136 | 1
137 | 1
138 | 0
139 | 1
140 | 0
141 | 1
142 | 0
143 | 1
144 | 1
145 | 1
146 | 0
147 | 1
148 | 1
149 | 1
150 | 1
151 | 0
152 | 1
153 | 1
154 | 1
155 | 1
156 | 0
157 | 1
158 | 1
159 | 1
160 | 0
161 | 1
162 | 1
163 | 0
164 | 1
165 | 1
166 | 1
167 | 0
168 | 0
169 | 1
170 | 1
171 | 1
172 | 0
173 | 0
174 | 0
175 | 1
176 | 1
177 | 1
178 | 1
179 | 1
180 | 1
181 | 1
182 | 1
183 | 1
184 | 1
185 | 0
186 | 1
187 | 1
188 | 0
189 | 0
190 | 1
191 | 1
192 | 1
193 | 1
194 | 1
195 | 0
196 | 1
197 | 1
198 | 1
199 | 1
200 | 1
201 | 1
202 | 1
203 | 1
204 | 0
205 | 1
206 | 0
207 | 1
208 | 1
209 | 0
210 | 1
211 | 1
212 | 1
213 | 1
214 | 1
215 | 1
216 | 1
217 | 1
218 | 1
219 | 1
220 | 1
221 | 0
222 | 1
223 | 1
224 | 1
225 | 1
226 | 1
227 | 1
228 | 0
229 | 1
230 | 1
231 | 0
232 | 1
233 | 1
234 | 1
235 | 1
236 | 1
237 | 0
238 | 1
239 | 0
240 | 1
241 | 1
242 | 0
243 | 1
244 | 1
245 | 1
246 | 0
247 | 1
248 | 1
249 | 1
250 | 1
251 | 1
252 | 1
253 | 1
254 | 1
255 | 1
256 | 1
257 | 1
258 | 1
259 | 0
260 | 1
261 | 1
262 | 0
263 | 1
264 | 1
265 | 1
266 | 1
267 | 0
268 | 1
269 | 1
270 | 1
271 | 1
272 | 1
273 | 1
274 | 1
275 | 1
276 | 1
277 | 1
278 | 1
279 | 1
280 | 1
281 | 1
282 | 1
283 | 1
284 | 1
285 | 1
286 | 1
287 | 0
288 | 1
289 | 1
290 | 1
291 | 0
292 | 1
293 | 1
294 | 1
295 | 1
296 | 1
297 | 1
298 | 1
299 | 1
300 | 1
301 | 1
302 | 1
303 | 1
304 | 1
305 | 1
306 | 0
307 | 0
308 | 1
309 | 1
310 | 1
311 | 1
312 | 1
313 | 1
314 | 1
315 | 1
316 | 1
317 | 1
318 | 1
319 | 1
320 | 1
321 | 1
322 | 1
323 | 1
324 | 1
325 | 1
326 | 1
327 | 1
328 | 1
329 | 1
330 | 1
331 | 0
332 | 1
333 | 1
334 | 0
335 | 1
336 | 1
337 | 1
338 | 1
339 | 1
340 | 0
341 | 1
342 | 1
343 | 0
344 | 0
345 | 1
346 | 1
347 | 1
348 | 1
349 | 1
350 | 1
351 | 1
352 | 1
353 | 1
354 | 1
355 | 1
356 | 0
357 | 1
358 | 1
359 | 1
360 | 1
361 | 0
362 | 1
363 | 1
364 | 0
365 | 0
366 | 1
367 | 1
368 | 1
369 | 1
370 | 1
371 | 1
372 | 1
373 | 1
374 | 1
375 | 1
376 | 1
377 | 1
378 | 1
379 | 1
380 | 1
381 | 1
382 | 1
383 | 0
384 | 0
385 | 1
386 | 1
387 | 1
388 | 1
389 | 1
390 | 1
391 | 0
392 | 1
393 | 0
394 | 1
395 | 1
396 | 1
397 | 1
398 | 1
399 | 0
400 | 1
401 | 1
402 | 1
403 | 1
404 | 1
405 | 0
406 | 1
407 | 0
408 | 0
409 | 1
410 | 1
411 | 0
412 | 1
413 | 1
414 | 1
415 | 1
416 | 1
417 | 0
418 | 1
419 | 1
420 | 1
421 | 1
422 | 0
423 | 1
424 | 1
425 | 0
426 | 1
427 | 1
428 | 1
429 | 1
430 | 0
431 | 1
432 | 1
433 | 1
434 | 0
435 | 1
436 | 0
437 | 1
438 | 1
439 | 1
440 | 1
441 | 1
442 | 1
443 | 1
444 | 0
445 | 1
446 | 1
447 | 1
448 | 1
449 | 1
450 | 1
451 | 1
452 | 1
453 | 1
454 | 0
455 | 1
456 | 0
457 | 1
458 | 1
459 | 0
460 | 1
461 | 1
462 | 0
463 | 1
464 | 1
465 | 1
466 | 1
467 | 1
468 | 1
469 | 1
470 | 0
471 | 0
472 | 1
473 | 0
474 | 1
475 | 1
476 | 1
477 | 1
478 | 1
479 | 1
480 | 1
481 | 1
482 | 1
483 | 1
484 | 1
485 | 1
486 | 1
487 | 1
488 | 1
489 | 1
490 | 0
491 | 1
492 | 1
493 | 0
494 | 1
495 | 1
496 | 1
497 | 1
498 | 1
499 | 1
500 | 1
501 | 0
502 | 1
503 | 1
504 | 1
505 | 0
506 | 1
507 | 1
508 | 1
509 | 0
510 | 1
511 | 1
512 | 1
513 | 1
514 | 1
515 | 1
516 | 1
517 | 1
518 | 0
519 | 1
520 | 1
521 | 1
522 | 0
523 | 1
524 | 1
525 | 1
526 | 1
527 | 1
528 | 0
529 | 1
530 | 1
531 | 1
532 | 1
533 | 1
534 | 1
535 | 1
536 | 1
537 | 0
538 | 0
539 | 1
540 | 0
541 | 1
542 | 1
543 | 1
544 | 1
545 | 1
546 | 0
547 | 1
548 | 1
549 | 1
550 | 1
551 | 1
552 | 1
553 | 0
554 | 1
555 | 1
556 | 0
557 | 1
558 | 1
559 | 1
560 | 1
561 | 1
562 | 1
563 | 1
564 | 1
565 | 1
566 | 1
567 | 1
568 | 1
569 | 1
570 | 1
571 | 0
572 | 0
573 | 1
574 | 1
575 | 0
576 | 1
577 | 1
578 | 1
579 | 1
580 | 1
581 | 1
582 | 1
583 | 1
584 | 0
585 | 0
586 | 1
587 | 1
588 | 1
589 | 1
590 | 0
591 | 1
592 | 1
593 | 1
594 | 1
595 | 1
596 | 1
597 | 1
598 | 1
599 | 1
600 | 0
601 | 1
602 | 0
603 | 1
604 | 0
605 | 1
606 | 1
607 | 0
608 | 1
609 | 1
610 | 1
611 | 1
612 | 0
613 | 1
614 | 1
615 | 1
616 | 1
617 | 1
618 | 1
619 | 0
620 | 0
621 | 1
622 | 0
623 | 1
624 | 0
625 | 0
626 | 0
627 | 1
628 | 1
629 | 1
630 | 0
631 | 1
632 | 0
633 | 1
634 | 0
635 | 1
636 | 1
637 | 1
638 | 1
639 | 1
640 | 1
641 | 1
642 | 1
643 | 1
644 | 0
645 | 0
646 | 1
647 | 0
648 | 1
649 | 1
650 | 1
651 | 1
652 | 0
653 | 1
654 | 1
655 | 1
656 | 1
657 | 1
658 | 1
659 | 1
660 | 1
661 | 0
662 | 1
663 | 0
664 | 1
665 | 1
666 | 1
667 | 0
668 | 1
669 | 1
670 | 1
671 | 1
672 | 0
673 | 1
674 | 1
675 | 1
676 | 1
677 | 1
678 | 1
679 | 1
680 | 1
681 | 1
682 | 1
683 | 1
684 | 1
685 | 1
686 | 1
687 | 1
688 | 1
689 | 0
690 | 1
691 | 1
692 | 1
693 | 1
694 | 1
695 | 1
696 | 1
697 | 1
698 | 1
699 | 1
700 | 1
701 | 1
702 | 0
703 | 1
704 | 1
705 | 0
706 | 1
707 | 1
708 | 1
709 | 1
710 | 0
711 | 1
712 | 1
713 | 1
714 | 1
715 | 1
716 | 1
717 | 1
718 | 1
719 | 1
720 | 1
721 | 0
722 | 0
723 | 1
724 | 1
725 | 1
726 | 1
727 | 1
728 | 1
729 | 1
730 | 1
731 | 0
732 | 0
733 | 0
734 | 0
735 | 0
736 | 1
737 | 1
738 | 1
739 | 1
740 | 1
741 | 1
742 | 1
743 | 1
744 | 1
745 | 1
746 | 1
747 | 1
748 | 1
749 | 1
750 | 1
751 | 1
752 | 0
753 | 0
754 | 1
755 | 1
756 | 1
757 | 1
758 | 1
759 | 1
760 | 1
761 | 1
762 | 1
763 | 1
764 | 1
765 | 1
766 | 1
767 | 1
768 | 0
769 | 1
770 | 1
771 | 1
772 | 1
773 | 0
774 | 1
775 | 1
776 | 1
777 | 1
778 | 1
779 | 1
780 | 1
781 | 1
782 | 1
783 | 1
784 | 1
785 | 1
786 | 1
787 | 0
788 | 1
789 | 1
790 | 1
791 | 1
792 | 1
793 | 1
794 | 1
795 | 1
796 | 0
797 | 1
798 | 1
799 | 1
800 | 1
801 | 0
802 | 1
803 | 1
804 | 1
805 | 1
806 | 0
807 | 1
808 | 1
809 | 1
810 | 1
811 | 1
812 | 1
813 | 1
814 | 0
815 | 0
816 | 0
817 | 1
818 | 0
819 | 0
820 | 1
821 | 1
822 | 1
823 | 1
824 | 1
825 | 1
826 | 0
827 | 0
828 | 1
829 | 1
830 | 1
831 | 0
832 | 1
833 | 0
834 | 1
835 | 1
836 | 1
837 | 1
838 | 1
839 | 1
840 | 0
841 | 1
842 | 1
843 | 1
844 | 0
845 | 1
846 | 1
847 | 1
848 | 0
849 | 0
850 | 1
851 | 1
852 | 0
853 | 1
854 | 1
855 | 1
856 | 1
857 | 1
858 | 1
859 | 1
860 | 0
861 | 1
862 | 1
863 | 1
864 | 0
865 | 1
866 | 1
867 | 0
868 | 0
869 | 1
870 | 1
871 | 1
872 | 0
873 | 1
874 | 1
875 | 1
876 | 1
877 | 0
878 | 1
879 | 1
880 | 1
881 | 1
882 | 1
883 | 1
884 | 1
885 | 1
886 | 1
887 | 1
888 | 0
889 | 1
890 | 1
891 | 1
892 | 0
893 | 1
894 | 1
895 | 0
896 | 1
897 | 1
898 | 1
899 | 1
900 | 0
901 | 1
902 | 0
903 | 1
904 | 1
905 | 1
906 | 0
907 | 1
908 | 0
909 | 1
910 | 1
911 | 1
912 | 1
913 | 0
914 | 1
915 | 1
916 | 1
917 | 1
918 | 1
919 | 1
920 | 0
921 | 1
922 | 1
923 | 1
924 | 0
925 | 1
926 | 1
927 | 1
928 | 1
929 | 1
930 | 1
931 | 1
932 | 1
933 | 0
934 | 1
935 | 1
936 | 1
937 | 1
938 | 1
939 | 1
940 | 0
941 | 1
942 | 0
943 | 0
944 | 1
945 | 0
946 | 1
947 | 1
948 | 1
949 | 1
950 | 1
951 | 1
952 | 1
953 | 0
954 | 1
955 | 1
956 | 1
957 | 1
958 | 1
959 | 1
960 | 0
961 | 1
962 | 0
963 | 0
964 | 0
965 | 1
966 | 1
967 | 1
968 | 1
969 | 1
970 | 1
971 | 1
972 | 1
973 | 1
974 | 1
975 | 1
976 | 0
977 | 1
978 | 1
979 | 0
980 | 1
981 | 1
982 | 1
983 | 0
984 | 1
985 | 1
986 | 1
987 | 1
988 | 1
989 | 1
990 | 1
991 | 1
992 | 1
993 | 1
994 | 1
995 | 1
996 | 0
997 | 1
998 | 1
999 | 0
1000 | 0
1001 | 1
1002 | 0
1003 | 1
1004 | 1
1005 | 1
1006 | 1
1007 | 1
1008 | 1
1009 | 1
1010 | 0
1011 | 0
1012 | 1
1013 | 1
1014 | 1
1015 | 1
1016 | 1
1017 | 1
1018 | 1
1019 | 1
1020 | 1
1021 | 1
1022 | 1
1023 | 1
1024 | 1
1025 | 0
1026 | 1
1027 | 1
1028 | 0
1029 | 1
1030 | 1
1031 | 1
1032 | 1
1033 | 1
1034 | 1
1035 | 1
1036 | 1
1037 | 1
1038 | 1
1039 | 1
1040 | 1
1041 | 1
1042 | 1
1043 | 1
1044 | 1
1045 | 1
1046 | 1
1047 | 1
1048 | 1
1049 | 1
1050 | 1
1051 | 1
1052 | 1
1053 | 1
1054 | 1
1055 | 0
1056 | 0
1057 | 0
1058 | 1
1059 | 1
1060 | 1
1061 | 1
1062 | 1
1063 | 1
1064 | 1
1065 | 1
1066 | 1
1067 | 1
1068 | 1
1069 | 1
1070 | 1
1071 | 1
1072 | 1
1073 | 1
1074 | 1
1075 | 1
1076 | 1
1077 | 0
1078 | 1
1079 | 1
1080 | 1
1081 | 1
1082 | 1
1083 | 0
1084 | 1
1085 | 1
1086 | 1
1087 | 1
1088 | 1
1089 | 0
1090 | 1
1091 | 1
1092 | 0
1093 | 0
1094 | 0
1095 | 1
1096 | 1
1097 | 1
1098 | 0
1099 | 1
1100 | 1
1101 | 0
1102 | 1
1103 | 1
1104 | 0
1105 | 1
1106 | 1
1107 | 0
1108 | 1
1109 | 1
1110 | 0
1111 | 1
1112 | 1
1113 | 1
1114 | 1
1115 | 1
1116 | 1
1117 | 0
1118 | 1
1119 | 0
1120 | 0
1121 | 1
1122 | 1
1123 | 1
1124 | 0
1125 | 1
1126 | 1
1127 | 1
1128 | 1
1129 | 1
1130 | 1
1131 | 1
1132 | 1
1133 | 1
1134 | 1
1135 | 1
1136 | 1
1137 | 1
1138 | 1
1139 | 0
1140 | 1
1141 | 1
1142 | 1
1143 | 1
1144 | 1
1145 | 1
1146 | 1
1147 | 1
1148 | 1
1149 | 0
1150 | 0
1151 | 1
1152 | 1
1153 | 1
1154 | 0
1155 | 1
1156 | 1
1157 | 1
1158 | 0
1159 | 1
1160 | 1
1161 | 1
1162 | 0
1163 | 1
1164 | 1
1165 | 1
1166 | 0
1167 | 1
1168 | 0
1169 | 1
1170 | 0
1171 | 0
1172 | 1
1173 | 1
1174 | 1
1175 | 1
1176 | 0
1177 | 1
1178 | 1
1179 | 1
1180 | 1
1181 | 1
1182 | 1
1183 | 1
1184 | 1
1185 | 1
1186 | 0
1187 | 1
1188 | 1
1189 | 1
1190 | 0
1191 | 1
1192 | 1
1193 | 1
1194 | 1
1195 | 0
1196 | 1
1197 | 1
1198 | 1
1199 | 0
1200 | 1
1201 | 1
1202 | 1
1203 | 1
1204 | 1
1205 | 1
1206 | 1
1207 | 1
1208 | 1
1209 | 1
1210 | 1
1211 | 0
1212 | 1
1213 | 1
1214 | 1
1215 | 1
1216 | 1
1217 | 1
1218 | 1
1219 | 1
1220 | 1
1221 | 1
1222 | 1
1223 | 1
1224 | 1
1225 | 0
1226 | 1
1227 | 1
1228 | 1
1229 | 0
1230 | 1
1231 | 1
1232 | 0
1233 | 1
1234 | 1
1235 | 1
1236 | 1
1237 | 1
1238 | 0
1239 | 1
1240 | 0
1241 | 1
1242 | 1
1243 | 1
1244 | 1
1245 | 0
1246 | 1
1247 | 1
1248 | 1
1249 | 1
1250 | 1
1251 | 1
1252 | 0
1253 | 0
1254 | 1
1255 | 1
1256 | 0
1257 | 1
1258 | 1
1259 | 0
1260 | 1
1261 | 1
1262 | 1
1263 | 1
1264 | 1
1265 | 0
1266 | 1
1267 | 1
1268 | 0
1269 | 1
1270 | 1
1271 | 0
1272 | 0
1273 | 1
1274 | 1
1275 | 1
1276 | 1
1277 | 1
1278 | 0
1279 | 1
1280 | 1
1281 | 1
1282 | 1
1283 | 1
1284 | 1
1285 | 1
1286 | 0
1287 | 1
1288 | 1
1289 | 1
1290 | 1
1291 | 1
1292 | 1
1293 | 1
1294 | 1
1295 | 0
1296 | 1
1297 | 1
1298 | 1
1299 | 0
1300 | 0
1301 | 0
1302 | 1
1303 | 1
1304 | 1
1305 | 1
1306 | 1
1307 | 1
1308 | 1
1309 | 1
1310 | 1
1311 | 1
1312 | 1
1313 | 1
1314 | 1
1315 | 0
1316 | 1
1317 | 1
1318 | 1
1319 | 1
1320 | 1
1321 | 1
1322 | 1
1323 | 1
1324 | 0
1325 | 1
1326 | 1
1327 | 1
1328 | 1
1329 | 1
1330 | 0
1331 | 1
1332 | 1
1333 | 0
1334 | 0
1335 | 0
1336 | 0
1337 | 1
1338 | 1
1339 | 1
1340 | 1
1341 | 1
1342 | 1
1343 | 1
1344 | 1
1345 | 1
1346 | 0
1347 | 1
1348 | 1
1349 | 1
1350 | 1
1351 | 1
1352 | 1
1353 | 0
1354 | 1
1355 | 1
1356 | 1
1357 | 1
1358 | 0
1359 | 1
1360 | 0
1361 | 1
1362 | 1
1363 | 1
1364 | 1
1365 | 1
1366 | 1
1367 | 0
1368 | 0
1369 | 1
1370 | 0
1371 | 0
1372 | 0
1373 | 1
1374 | 1
1375 | 1
1376 | 0
1377 | 1
1378 | 1
1379 | 1
1380 | 1
1381 | 1
1382 | 1
1383 | 0
1384 | 1
1385 | 0
1386 | 1
1387 | 1
1388 | 1
1389 | 1
1390 | 1
1391 | 1
1392 | 1
1393 | 1
1394 | 1
1395 | 1
1396 | 1
1397 | 1
1398 | 1
1399 | 0
1400 | 1
1401 | 1
1402 | 1
1403 | 1
1404 | 1
1405 | 1
1406 | 1
1407 | 0
1408 | 1
1409 | 1
1410 | 1
1411 | 1
1412 | 1
1413 | 1
1414 | 1
1415 | 1
1416 | 1
1417 | 0
1418 | 1
1419 | 1
1420 | 1
1421 | 1
1422 | 1
1423 | 0
1424 | 1
1425 | 1
1426 | 1
1427 | 1
1428 | 1
1429 | 1
1430 | 1
1431 | 1
1432 | 1
1433 | 0
1434 | 1
1435 | 1
1436 | 1
1437 | 1
1438 | 0
1439 | 1
1440 | 1
1441 | 1
1442 | 0
1443 | 1
1444 | 1
1445 | 1
1446 | 1
1447 | 0
1448 | 1
1449 | 1
1450 | 1
1451 | 1
1452 | 1
1453 | 0
1454 | 1
1455 | 0
1456 | 1
1457 | 1
1458 | 0
1459 | 1
1460 | 1
1461 | 1
1462 | 1
1463 | 1
1464 | 1
1465 | 1
1466 | 0
1467 | 1
1468 | 1
1469 | 1
1470 | 1
1471 | 1
1472 | 1
1473 | 1
1474 | 0
1475 | 1
1476 | 1
1477 | 1
1478 | 1
1479 | 1
1480 | 0
1481 | 1
1482 | 0
1483 | 1
1484 | 1
1485 | 1
1486 | 1
1487 | 1
1488 | 1
1489 | 0
1490 | 1
1491 | 1
1492 | 0
1493 | 1
1494 | 0
1495 | 1
1496 | 1
1497 | 1
1498 | 1
1499 | 0
1500 | 1
1501 | 1
1502 | 1
1503 | 1
1504 | 1
1505 | 1
1506 | 0
1507 | 1
1508 | 1
1509 | 1
1510 | 1
1511 | 1
1512 | 1
1513 | 0
1514 | 0
1515 | 1
1516 | 0
1517 | 0
1518 | 1
1519 | 1
1520 | 1
1521 | 1
1522 | 1
1523 | 0
1524 | 1
1525 | 0
1526 | 1
1527 | 1
1528 | 0
1529 | 1
1530 | 1
1531 | 1
1532 | 0
1533 | 1
1534 | 1
1535 | 0
1536 | 1
1537 | 1
1538 | 0
1539 | 0
1540 | 1
1541 | 1
1542 | 0
1543 | 1
1544 | 1
1545 | 1
1546 | 1
1547 | 1
1548 | 1
1549 | 1
1550 | 1
1551 | 1
1552 | 0
1553 | 1
1554 | 1
1555 | 1
1556 | 0
1557 | 0
1558 | 1
1559 | 0
1560 | 1
1561 | 1
1562 | 1
1563 | 0
1564 | 0
1565 | 1
1566 | 0
1567 | 1
1568 | 1
1569 | 0
1570 | 1
1571 | 1
1572 | 0
1573 | 1
1574 | 1
1575 | 1
1576 | 0
1577 | 1
1578 | 1
1579 | 1
1580 | 1
1581 | 1
1582 | 1
1583 | 1
1584 | 0
1585 | 0
1586 | 0
1587 | 1
1588 | 1
1589 | 1
1590 | 1
1591 | 1
1592 | 1
1593 | 0
1594 | 1
1595 | 0
1596 | 1
1597 | 1
1598 | 1
1599 | 1
1600 | 1
1601 | 1
1602 | 0
1603 | 1
1604 | 1
1605 | 1
1606 | 1
1607 | 1
1608 | 0
1609 | 1
1610 | 1
1611 | 1
1612 | 1
1613 | 1
1614 | 1
1615 | 1
1616 | 0
1617 | 1
1618 | 1
1619 | 1
1620 | 1
1621 | 1
1622 | 1
1623 | 1
1624 | 1
1625 | 1
1626 | 1
1627 | 0
1628 | 1
1629 | 1
1630 | 1
1631 | 1
1632 | 1
1633 | 1
1634 | 0
1635 | 0
1636 | 1
1637 | 1
1638 | 1
1639 | 1
1640 | 1
1641 | 1
1642 | 1
1643 | 1
1644 | 0
1645 | 1
1646 | 1
1647 | 1
1648 | 1
1649 | 0
1650 | 1
1651 | 1
1652 | 1
1653 | 0
1654 | 1
1655 | 1
1656 | 1
1657 | 0
1658 | 1
1659 | 1
1660 | 1
1661 | 1
1662 | 0
1663 | 1
1664 | 1
1665 | 1
1666 | 1
1667 | 1
1668 | 1
1669 | 1
1670 | 1
1671 | 1
1672 | 1
1673 | 0
1674 | 0
1675 | 0
1676 | 1
1677 | 1
1678 | 0
1679 | 1
1680 | 1
1681 | 0
1682 | 1
1683 | 1
1684 | 0
1685 | 1
1686 | 1
1687 | 0
1688 | 1
1689 | 1
1690 | 1
1691 | 1
1692 | 0
1693 | 0
1694 | 1
1695 | 1
1696 | 1
1697 | 1
1698 | 1
1699 | 0
1700 | 1
1701 | 1
1702 | 0
1703 | 1
1704 | 1
1705 | 0
1706 | 1
1707 | 1
1708 | 1
1709 | 1
1710 | 1
1711 | 1
1712 | 1
1713 | 1
1714 | 1
1715 | 0
1716 | 1
1717 | 1
1718 | 1
1719 | 1
1720 | 1
1721 | 1
1722 | 0
1723 | 1
1724 | 1
1725 | 1
1726 | 1
1727 | 1
1728 | 1
1729 | 1
1730 | 1
1731 | 1
1732 | 1
1733 | 1
1734 | 1
1735 | 1
1736 | 1
1737 | 1
1738 | 1
1739 | 1
1740 | 0
1741 | 0
1742 | 1
1743 | 1
1744 | 1
1745 | 0
1746 | 1
1747 | 1
1748 | 1
1749 | 0
1750 | 1
1751 | 0
1752 | 1
1753 | 1
1754 | 1
1755 | 1
1756 | 1
1757 | 1
1758 | 1
1759 | 1
1760 | 1
1761 | 1
1762 | 1
1763 | 0
1764 | 1
1765 | 1
1766 | 1
1767 | 1
1768 | 1
1769 | 1
1770 | 1
1771 | 0
1772 | 1
1773 | 1
1774 | 1
1775 | 1
1776 | 0
1777 | 1
1778 | 1
1779 | 1
1780 | 0
1781 | 1
1782 | 1
1783 | 0
1784 | 0
1785 | 0
1786 | 1
1787 | 1
1788 | 1
1789 | 1
1790 | 1
1791 | 1
1792 | 1
1793 | 0
1794 | 1
1795 | 1
1796 | 1
1797 | 0
1798 | 1
1799 | 1
1800 | 1
1801 | 1
1802 | 1
1803 | 1
1804 | 0
1805 | 1
1806 | 1
1807 | 1
1808 | 1
1809 | 1
1810 | 0
1811 | 1
1812 | 1
1813 | 1
1814 | 1
1815 | 1
1816 | 1
1817 | 0
1818 | 0
1819 | 1
1820 | 1
1821 | 1
1822 | 1
1823 | 1
1824 | 0
1825 | 1
1826 | 0
1827 | 1
1828 | 0
1829 | 1
1830 | 1
1831 | 1
1832 | 1
1833 | 0
1834 | 1
1835 | 1
1836 | 0
1837 | 1
1838 | 0
1839 | 1
1840 | 1
1841 | 1
1842 | 1
1843 | 1
1844 | 1
1845 | 1
1846 | 1
1847 | 1
1848 | 0
1849 | 0
1850 | 1
1851 | 1
1852 | 1
1853 | 1
1854 | 1
1855 | 1
1856 | 1
1857 | 1
1858 | 1
1859 | 0
1860 | 1
1861 | 1
1862 | 1
1863 | 0
1864 | 1
1865 | 1
1866 | 0
1867 | 1
1868 | 1
1869 | 1
1870 | 0
1871 | 1
1872 | 1
1873 | 0
1874 | 1
1875 | 1
1876 | 1
1877 | 0
1878 | 1
1879 | 1
1880 | 0
1881 | 1
1882 | 1
1883 | 1
1884 | 1
1885 | 1
1886 | 1
1887 | 1
1888 | 1
1889 | 0
1890 | 1
1891 | 0
1892 | 1
1893 | 1
1894 | 1
1895 | 1
1896 | 1
1897 | 0
1898 | 0
1899 | 1
1900 | 1
1901 | 1
1902 | 0
1903 | 1
1904 | 1
1905 | 1
1906 | 1
1907 | 1
1908 | 1
1909 | 1
1910 | 1
1911 | 1
1912 | 1
1913 | 0
1914 | 1
1915 | 0
1916 | 1
1917 | 1
1918 | 1
1919 | 0
1920 | 1
1921 | 1
1922 | 0
1923 | 1
1924 | 1
1925 | 1
1926 | 1
1927 | 1
1928 | 1
1929 | 1
1930 | 1
1931 | 1
1932 | 1
1933 | 1
1934 | 1
1935 | 1
1936 | 1
1937 | 1
1938 | 1
1939 | 1
1940 | 1
1941 | 1
1942 | 1
1943 | 0
1944 | 1
1945 | 1
1946 | 1
1947 | 1
1948 | 0
1949 | 0
1950 | 1
1951 | 1
1952 | 1
1953 | 1
1954 | 1
1955 | 1
1956 | 1
1957 | 1
1958 | 1
1959 | 0
1960 | 1
1961 | 1
1962 | 1
1963 | 1
1964 | 1
1965 | 0
1966 | 1
1967 | 1
1968 | 1
1969 | 1
1970 | 0
1971 | 1
1972 | 1
1973 | 1
1974 | 1
1975 | 1
1976 | 1
1977 | 1
1978 | 0
1979 | 0
1980 | 1
1981 | 0
1982 | 1
1983 | 1
1984 | 1
1985 | 0
1986 | 1
1987 | 1
1988 | 1
1989 | 1
1990 | 1
1991 | 1
1992 | 0
1993 | 1
1994 | 1
1995 | 0
1996 | 1
1997 | 1
1998 | 0
1999 | 0
2000 | 1
2001 |
--------------------------------------------------------------------------------
/dataset/AIDS/AIDS_label_readme.txt:
--------------------------------------------------------------------------------
1 | Node labels: [symbol]
2 |
3 | Node attributes: [chem, charge, x, y]
4 |
5 | Edge labels: [valence]
6 |
7 | Node labels were converted to integer values using this map:
8 |
9 | Component 0:
10 | 0 C
11 | 1 O
12 | 2 N
13 | 3 Cl
14 | 4 F
15 | 5 S
16 | 6 Se
17 | 7 P
18 | 8 Na
19 | 9 I
20 | 10 Co
21 | 11 Br
22 | 12 Li
23 | 13 Si
24 | 14 Mg
25 | 15 Cu
26 | 16 As
27 | 17 B
28 | 18 Pt
29 | 19 Ru
30 | 20 K
31 | 21 Pd
32 | 22 Au
33 | 23 Te
34 | 24 W
35 | 25 Rh
36 | 26 Zn
37 | 27 Bi
38 | 28 Pb
39 | 29 Ge
40 | 30 Sb
41 | 31 Sn
42 | 32 Ga
43 | 33 Hg
44 | 34 Ho
45 | 35 Tl
46 | 36 Ni
47 | 37 Tb
48 |
49 |
50 |
51 | Edge labels were converted to integer values using this map:
52 |
53 | Component 0:
54 | 0 1
55 | 1 2
56 | 2 3
57 |
58 |
59 |
60 | Class labels were converted to integer values using this map:
61 |
62 | 0 a
63 | 1 i
64 |
65 |
66 |
--------------------------------------------------------------------------------
/main/attack.py:
--------------------------------------------------------------------------------
1 | import sys, os
2 | sys.path.append(os.path.abspath('..'))
3 |
4 | import copy
5 | import numpy as np
6 | from tqdm import tqdm
7 | import torch
8 | import torch.nn as nn
9 | import torch.optim as optim
10 | import torch.nn.functional as F
11 | import torch.optim.lr_scheduler as lr_scheduler
12 |
13 | from utils.datareader import DataReader
14 | from utils.bkdcdd import select_cdd_graphs, select_cdd_nodes
15 | from utils.mask import gen_mask, recover_mask
16 | import main.benign as benign
17 | import trojan.GTA as gta
18 | from trojan.input import gen_input
19 | from trojan.prop import train_model, evaluate
20 | from config import parse_args
21 |
22 | class GraphBackdoor:
23 | def __init__(self, args) -> None:
24 | self.args = args
25 |
26 | assert torch.cuda.is_available(), 'no GPU available'
27 | self.cpu = torch.device('cpu')
28 | self.cuda = torch.device('cuda')
29 |
30 | def run(self):
31 | # train a benign GNN
32 | self.benign_dr, self.benign_model = benign.run(self.args)
33 | model = copy.deepcopy(self.benign_model).to(self.cuda)
34 | # pick up initial candidates
35 | bkd_gids_test, bkd_nids_test, bkd_nid_groups_test = self.bkd_cdd('test')
36 |
37 | nodenums = [adj.shape[0] for adj in self.benign_dr.data['adj_list']]
38 | nodemax = max(nodenums)
39 | featdim = np.array(self.benign_dr.data['features'][0]).shape[1]
40 |
41 | # init two generators for topo/feat
42 | toponet = gta.GraphTrojanNet(nodemax, self.args.gtn_layernum)
43 | featnet = gta.GraphTrojanNet(featdim, self.args.gtn_layernum)
44 |
45 |
46 | # init test data
47 | # NOTE: for data that can only add perturbation on features, only init the topo value
48 | init_dr_test = self.init_trigger(
49 | self.args, copy.deepcopy(self.benign_dr), bkd_gids_test, bkd_nid_groups_test, 0.0, 0.0)
50 | bkd_dr_test = copy.deepcopy(init_dr_test)
51 |
52 | topomask_test, featmask_test = gen_mask(
53 | init_dr_test, bkd_gids_test, bkd_nid_groups_test)
54 | Ainput_test, Xinput_test = gen_input(self.args, init_dr_test, bkd_gids_test)
55 |
56 | for rs_step in range(self.args.resample_steps): # for each step, choose different sample
57 |
58 | # randomly select new graph backdoor samples
59 | bkd_gids_train, bkd_nids_train, bkd_nid_groups_train = self.bkd_cdd('train')
60 |
61 | # positive/negtive sample set
62 | pset = bkd_gids_train
63 | nset = list(set(self.benign_dr.data['splits']['train'])-set(pset))
64 |
65 | if self.args.pn_rate != None:
66 | if len(pset) > len(nset):
67 | repeat = int(np.ceil(len(pset)/(len(nset)*self.args.pn_rate)))
68 | nset = list(nset) * repeat
69 | else:
70 | repeat = int(np.ceil((len(nset)*self.args.pn_rate)/len(pset)))
71 | pset = list(pset) * repeat
72 |
73 | # init train data
74 | # NOTE: for data that can only add perturbation on features, only init the topo value
75 | init_dr_train = self.init_trigger(
76 | self.args, copy.deepcopy(self.benign_dr), bkd_gids_train, bkd_nid_groups_train, 0.0, 0.0)
77 | bkd_dr_train = copy.deepcopy(init_dr_train)
78 |
79 | topomask_train, featmask_train = gen_mask(
80 | init_dr_train, bkd_gids_train, bkd_nid_groups_train)
81 | Ainput_train, Xinput_train = gen_input(self.args, init_dr_train, bkd_gids_train)
82 |
83 | for bi_step in range(self.args.bilevel_steps):
84 | print("Resampling step %d, bi-level optimization step %d" % (rs_step, bi_step))
85 |
86 | toponet, featnet = gta.train_gtn(
87 | self.args, model, toponet, featnet,
88 | pset, nset, topomask_train, featmask_train,
89 | init_dr_train, bkd_dr_train, Ainput_train, Xinput_train)
90 |
91 | # get new backdoor datareader for training based on well-trained generators
92 | for gid in bkd_gids_train:
93 | rst_bkdA = toponet(
94 | Ainput_train[gid], topomask_train[gid], self.args.topo_thrd,
95 | self.cpu, self.args.topo_activation, 'topo')
96 | # rst_bkdA = recover_mask(nodenums[gid], topomask_train[gid], 'topo')
97 | # bkd_dr_train.data['adj_list'][gid] = torch.add(rst_bkdA, init_dr_train.data['adj_list'][gid])
98 | bkd_dr_train.data['adj_list'][gid] = torch.add(
99 | rst_bkdA[:nodenums[gid], :nodenums[gid]].detach().cpu(),
100 | init_dr_train.data['adj_list'][gid])
101 |
102 | rst_bkdX = featnet(
103 | Xinput_train[gid], featmask_train[gid], self.args.feat_thrd,
104 | self.cpu, self.args.feat_activation, 'feat')
105 | # rst_bkdX = recover_mask(nodenums[gid], featmask_train[gid], 'feat')
106 | # bkd_dr_train.data['features'][gid] = torch.add(rst_bkdX, init_dr_train.data['features'][gid])
107 | bkd_dr_train.data['features'][gid] = torch.add(
108 | rst_bkdX[:nodenums[gid]].detach().cpu(), init_dr_train.data['features'][gid])
109 |
110 | # train GNN
111 | train_model(self.args, bkd_dr_train, model, list(set(pset)), list(set(nset)))
112 |
113 | #----------------- Evaluation -----------------#
114 | for gid in bkd_gids_test:
115 | rst_bkdA = toponet(
116 | Ainput_test[gid], topomask_test[gid], self.args.topo_thrd,
117 | self.cpu, self.args.topo_activation, 'topo')
118 | # rst_bkdA = recover_mask(nodenums[gid], topomask_test[gid], 'topo')
119 | # bkd_dr_test.data['adj_list'][gid] = torch.add(rst_bkdA,
120 | # torch.as_tensor(copy.deepcopy(init_dr_test.data['adj_list'][gid])))
121 | bkd_dr_test.data['adj_list'][gid] = torch.add(
122 | rst_bkdA[:nodenums[gid], :nodenums[gid]],
123 | torch.as_tensor(copy.deepcopy(init_dr_test.data['adj_list'][gid])))
124 |
125 | rst_bkdX = featnet(
126 | Xinput_test[gid], featmask_test[gid], self.args.feat_thrd,
127 | self.cpu, self.args.feat_activation, 'feat')
128 | # rst_bkdX = recover_mask(nodenums[gid], featmask_test[gid], 'feat')
129 | # bkd_dr_test.data['features'][gid] = torch.add(
130 | # rst_bkdX, torch.as_tensor(copy.deepcopy(init_dr_test.data['features'][gid])))
131 | bkd_dr_test.data['features'][gid] = torch.add(
132 | rst_bkdX[:nodenums[gid]], torch.as_tensor(copy.deepcopy(init_dr_test.data['features'][gid])))
133 |
134 | # graph originally in target label
135 | yt_gids = [gid for gid in bkd_gids_test
136 | if self.benign_dr.data['labels'][gid]==self.args.target_class]
137 | # graphs originally not in the target label
138 | yx_gids = list(set(bkd_gids_test) - set(yt_gids))
139 | clean_graphs_test = list(set(self.benign_dr.data['splits']['test'])-set(bkd_gids_test))
140 |
141 | # feed into GNN, test success rate
142 | bkd_acc = evaluate(self.args, bkd_dr_test, model, bkd_gids_test)
143 | flip_rate = evaluate(self.args, bkd_dr_test, model, yx_gids)
144 | clean_acc = evaluate(self.args, bkd_dr_test, model, clean_graphs_test)
145 |
146 | # save gnn
147 | if rs_step == 0 and (bi_step==self.args.bilevel_steps-1 or abs(bkd_acc-100) <1e-4):
148 | if self.args.save_bkd_model:
149 | save_path = self.args.bkd_model_save_path
150 | os.makedirs(save_path, exist_ok=True)
151 | save_path = os.path.join(save_path, '%s-%s-%f-%f-%d-%d.t7' % (
152 | self.args.model, self.args.dataset, self.args.train_ratio,
153 | self.args.bkd_gratio_train, self.args.bkd_num_pergraph, self.args.bkd_size))
154 |
155 | torch.save({'model': model.state_dict(),
156 | 'asr': bkd_acc,
157 | 'flip_rate': flip_rate,
158 | 'clean_acc': clean_acc,
159 | }, save_path)
160 | print("Trojaning model is saved at: ", save_path)
161 |
162 | if abs(bkd_acc-100) <1e-4:
163 | # bkd_dr_tosave = copy.deepcopy(bkd_dr_test)
164 | print("Early Termination for 100% Attack Rate")
165 | break
166 | print('Done')
167 |
168 |
169 | def bkd_cdd(self, subset: str):
170 | # - subset: 'train', 'test'
171 | # find graphs to add trigger (not modify now)
172 | bkd_gids = select_cdd_graphs(
173 | self.args, self.benign_dr.data['splits'][subset], self.benign_dr.data['adj_list'], subset)
174 | # find trigger nodes per graph
175 | # same sequence with selected backdoored graphs
176 | bkd_nids, bkd_nid_groups = select_cdd_nodes(
177 | self.args, bkd_gids, self.benign_dr.data['adj_list'])
178 |
179 | assert len(bkd_gids)==len(bkd_nids)==len(bkd_nid_groups)
180 |
181 | return bkd_gids, bkd_nids, bkd_nid_groups
182 |
183 |
184 | @staticmethod
185 | def init_trigger(args, dr: DataReader, bkd_gids: list, bkd_nid_groups: list, init_edge: float, init_feat: float):
186 | if init_feat == None:
187 | init_feat = - 1
188 | print('init feat == None, transferred into -1')
189 |
190 | # (in place) datareader trigger injection
191 | for i in tqdm(range(len(bkd_gids)), desc="initializing trigger..."):
192 | gid = bkd_gids[i]
193 | for group in bkd_nid_groups[i] :
194 | # change adj in-place
195 | src, dst = [], []
196 | for v1 in group:
197 | for v2 in group:
198 | if v1!=v2:
199 | src.append(v1)
200 | dst.append(v2)
201 | a = np.array(dr.data['adj_list'][gid])
202 | a[src, dst] = init_edge
203 | dr.data['adj_list'][gid] = a.tolist()
204 |
205 | # change features in-place
206 | featdim = len(dr.data['features'][0][0])
207 | a = np.array(dr.data['features'][gid])
208 | a[group] = np.ones((len(group), featdim)) * init_feat
209 | dr.data['features'][gid] = a.tolist()
210 |
211 | # change graph labels
212 | assert args.target_class is not None
213 | dr.data['labels'][gid] = args.target_class
214 |
215 | return dr
216 |
217 | if __name__ == '__main__':
218 | args = parse_args()
219 | attack = GraphBackdoor(args)
220 | attack.run()
--------------------------------------------------------------------------------
/main/benign.py:
--------------------------------------------------------------------------------
1 | import sys, os
2 | sys.path.append(os.path.abspath('..'))
3 |
4 | import time
5 | import pickle
6 | import numpy as np
7 |
8 | import torch
9 | import torch.optim as optim
10 | import torch.nn.functional as F
11 | from torch.utils.data import DataLoader
12 | import torch.optim.lr_scheduler as lr_scheduler
13 |
14 | from utils.datareader import GraphData, DataReader
15 | from utils.batch import collate_batch
16 | from model.gcn import GCN
17 | from model.gat import GAT
18 | from model.sage import GraphSAGE
19 | from config import parse_args
20 |
21 | def run(args):
22 | assert torch.cuda.is_available(), 'no GPU available'
23 | cpu = torch.device('cpu')
24 | cuda = torch.device('cuda')
25 |
26 | # load data into DataReader object
27 | dr = DataReader(args)
28 |
29 | loaders = {}
30 | for split in ['train', 'test']:
31 | if split=='train':
32 | gids = dr.data['splits']['train']
33 | else:
34 | gids = dr.data['splits']['test']
35 | gdata = GraphData(dr, gids)
36 | loader = DataLoader(gdata,
37 | batch_size=args.batch_size,
38 | shuffle=False,
39 | collate_fn=collate_batch)
40 | # data in loaders['train/test'] is saved as returned format of collate_batch()
41 | loaders[split] = loader
42 | print('train %d, test %d' % (len(loaders['train'].dataset), len(loaders['test'].dataset)))
43 |
44 | # prepare model
45 | in_dim = loaders['train'].dataset.num_features
46 | out_dim = loaders['train'].dataset.num_classes
47 | if args.model == 'gcn':
48 | model = GCN(in_dim, out_dim, hidden_dim=args.hidden_dim, dropout=args.dropout)
49 | elif args.model == 'gat':
50 | model = GAT(in_dim, out_dim, hidden_dim=args.hidden_dim, dropout=args.dropout, num_head=args.num_head)
51 | elif args.model=='sage':
52 | model = GraphSAGE(in_dim, out_dim, hidden_dim=args.hidden_dim, dropout=args.dropout)
53 | else:
54 | raise NotImplementedError(args.model)
55 |
56 | # print('\nInitialize model')
57 | # print(model)
58 | train_params = list(filter(lambda p: p.requires_grad, model.parameters()))
59 | # print('N trainable parameters:', np.sum([p.numel() for p in train_params]))
60 |
61 | # training
62 | loss_fn = F.cross_entropy
63 | predict_fn = lambda output: output.max(1, keepdim=True)[1].detach().cpu()
64 | optimizer = optim.Adam(train_params, lr=args.lr, weight_decay=args.weight_decay, betas=(0.5, 0.999))
65 | scheduler = lr_scheduler.MultiStepLR(optimizer, args.lr_decay_steps, gamma=0.1)
66 |
67 | model.to(cuda)
68 | for epoch in range(args.train_epochs):
69 | model.train()
70 | start = time.time()
71 | train_loss, n_samples = 0, 0
72 | for batch_id, data in enumerate(loaders['train']):
73 | for i in range(len(data)):
74 | data[i] = data[i].to(cuda)
75 | # if args.use_cont_node_attr:
76 | # data[0] = norm_features(data[0])
77 | optimizer.zero_grad()
78 | output = model(data)
79 | if len(output.shape)==1:
80 | output = output.unsqueeze(0)
81 | loss = loss_fn(output, data[4])
82 | loss.backward()
83 | optimizer.step()
84 | scheduler.step()
85 |
86 | time_iter = time.time() - start
87 | train_loss += loss.item() * len(output)
88 | n_samples += len(output)
89 |
90 | if args.train_verbose and (epoch % args.log_every == 0 or epoch == args.train_epochs - 1):
91 | print('Train Epoch: %d\tLoss: %.4f (avg: %.4f) \tsec/iter: %.2f' % (
92 | epoch + 1, loss.item(), train_loss / n_samples, time_iter / (batch_id + 1)))
93 |
94 | if (epoch + 1) % args.eval_every == 0 or epoch == args.train_epochs-1:
95 | model.eval()
96 | start = time.time()
97 | test_loss, correct, n_samples = 0, 0, 0
98 | for batch_id, data in enumerate(loaders['test']):
99 | for i in range(len(data)):
100 | data[i] = data[i].to(cuda)
101 | # if args.use_org_node_attr:
102 | # data[0] = norm_features(data[0])
103 | output = model(data)
104 | if len(output.shape)==1:
105 | output = output.unsqueeze(0)
106 | loss = loss_fn(output, data[4], reduction='sum')
107 | test_loss += loss.item()
108 | n_samples += len(output)
109 | pred = predict_fn(output)
110 |
111 | correct += pred.eq(data[4].detach().cpu().view_as(pred)).sum().item()
112 |
113 | eval_acc = 100. * correct / n_samples
114 | print('Test set (epoch %d): Average loss: %.4f, Accuracy: %d/%d (%.2f%s) \tsec/iter: %.2f' % (
115 | epoch + 1, test_loss / n_samples, correct, n_samples,
116 | eval_acc, '%', (time.time() - start) / len(loaders['test'])))
117 |
118 | model.to(cpu)
119 |
120 | if args.save_clean_model:
121 | save_path = args.clean_model_save_path
122 | os.makedirs(save_path, exist_ok=True)
123 | save_path = os.path.join(save_path, '%s-%s-%s.t7' % (args.model, args.dataset, str(args.train_ratio)))
124 |
125 | torch.save({
126 | 'model': model.state_dict(),
127 | 'lr': args.lr,
128 | 'batch_size': args.batch_size,
129 | 'eval_acc': eval_acc,
130 | }, save_path)
131 | print('Clean trained GNN saved at: ', os.path.abspath(save_path))
132 |
133 | return dr, model
134 |
135 |
136 | if __name__ == '__main__':
137 | args = parse_args()
138 | run(args)
--------------------------------------------------------------------------------
/main/example.sh:
--------------------------------------------------------------------------------
1 | python benign.py --use_org_node_attr --train_verbose
2 |
3 | nohup python -u attack.py --use_org_node_attr --train_verbose --target_class 0 --train_epochs 20 > ../attack.log 2>&1 &
--------------------------------------------------------------------------------
/model/gat.py:
--------------------------------------------------------------------------------
1 | import dgl
2 | import torch
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | from utils.graph import numpy_to_graph
6 |
7 | # implemented from https://arxiv.org/abs/1710.10903
8 |
9 | class GATLayer(nn.Module):
10 | def __init__(self, in_dim, out_dim):
11 | super(GATLayer, self).__init__()
12 | # equation (1)
13 | self.fc = nn.Linear(in_dim, out_dim, bias=False)
14 | # equation (2)
15 | self.attn_fc = nn.Linear(2 * out_dim, 1, bias=False)
16 |
17 | def edge_attention(self, edges):
18 | # edge UDF for equation (2)
19 | z2 = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
20 | a = self.attn_fc(z2)
21 | return {'e': F.leaky_relu(a)}
22 |
23 | def message_func(self, edges):
24 | # message UDF for equation (3) & (4)
25 | return {'z': edges.src['z'], 'e': edges.data['e']}
26 |
27 | def reduce_func(self, nodes):
28 | # reduce UDF for equation (3) & (4)
29 | # equation (3)
30 | alpha = F.softmax(nodes.mailbox['e'], dim=1)
31 | # equation (4)
32 | h = torch.sum(alpha * nodes.mailbox['z'], dim=1)
33 | return {'h': h}
34 |
35 | def forward(self, g, h):
36 | # equation (1)
37 | z = self.fc(h)
38 | g.ndata['z'] = z
39 | # equation (2)
40 | g.apply_edges(self.edge_attention)
41 | # equation (3) & (4)
42 | g.update_all(self.message_func, self.reduce_func)
43 | return g.ndata.pop('h')
44 |
45 |
46 | class MultiHeadGATLayer(nn.Module):
47 | def __init__(self, in_dim, out_dim, num_head, merge='cat'):
48 | super(MultiHeadGATLayer, self).__init__()
49 | self.heads = nn.ModuleList()
50 | for i in range(num_head):
51 | self.heads.append(GATLayer(in_dim, out_dim))
52 | self.merge = merge
53 |
54 | def forward(self, g, h):
55 | head_outs = [attn_head(g, h) for attn_head in self.heads]
56 | if self.merge == 'cat':
57 | # concat on the output feature dimension (dim=1)
58 | return torch.cat(head_outs, dim=1)
59 | else:
60 | # merge using average
61 | return torch.mean(torch.stack(head_outs), dim=0)
62 |
63 |
64 | class GAT(nn.Module):
65 | def __init__(self, in_dim, out_dim,
66 | hidden_dim=[64, 32],
67 | dropout=0.2,
68 | num_head=2):
69 | super(GAT, self).__init__()
70 |
71 | self.layers = nn.ModuleList()
72 |
73 | self.layers.append(MultiHeadGATLayer(in_dim, hidden_dim[0], num_head, merge='mean'))
74 | for i in range(len(hidden_dim) - 1):
75 | self.layers.append(MultiHeadGATLayer(hidden_dim[i], hidden_dim[i+1], num_head, merge='mean'))
76 |
77 | fc = []
78 | if dropout > 0:
79 | fc.append(nn.Dropout(p=dropout))
80 | fc.append(nn.Linear(hidden_dim[-1], out_dim))
81 | self.fc = nn.Sequential(*fc)
82 |
83 | def forward(self, data):
84 | batch_g = []
85 | for adj in data[1]:
86 | batch_g.append(numpy_to_graph(adj.cpu().detach().T.numpy(), to_cuda=adj.is_cuda))
87 | batch_g = dgl.batch(batch_g)
88 |
89 | mask = data[2]
90 | if len(mask.shape) == 2:
91 | mask = mask.unsqueeze(2) # (B,N,1)
92 |
93 | B,N,F = data[0].shape[:3]
94 | x = data[0].reshape(B*N, F)
95 | mask = mask.reshape(B*N, 1)
96 | for layer in self.layers:
97 | x = layer(batch_g, x)
98 | x = x * mask
99 |
100 | F_prime = x.shape[-1]
101 | x = x.reshape(B, N, F_prime)
102 | x = torch.max(x, dim=1)[0].squeeze() # max pooling over nodes (usually performs better than average)
103 | # x = torch.mean(x, dim=1).squeeze()
104 | x = self.fc(x)
105 | return x
--------------------------------------------------------------------------------
/model/gcn.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 | import dgl
6 | import dgl.function as fn
7 | from utils.graph import numpy_to_graph
8 |
9 | gcn_msg = fn.copy_src(src='h', out='m')
10 | gcn_reduce = fn.sum(msg='m', out='h')
11 |
12 | # Used for inductive case (graph classification) by default.
13 | class GCNLayer(nn.Module):
14 | def __init__(self, in_feats, out_feats):
15 | super(GCNLayer, self).__init__()
16 | self.linear = nn.Linear(in_feats, out_feats)
17 |
18 | def forward(self, g, feature):
19 | # Creating a local scope so that all the stored ndata and edata
20 | # (such as the `'h'` ndata below) are automatically popped out
21 | # when the scope exits.
22 | with g.local_scope():
23 | g.ndata['h'] = feature
24 | g.update_all(gcn_msg, gcn_reduce)
25 | h = g.ndata['h']
26 | return self.linear(h)
27 |
28 |
29 | # 2 layers by default
30 | class GCN(nn.Module):
31 | def __init__(self, in_dim, out_dim,
32 | hidden_dim=[64, 32], # GNN layers + 1 layer MLP
33 | dropout=0.2,
34 | activation=F.relu):
35 | super(GCN, self).__init__()
36 | self.layers = nn.ModuleList()
37 |
38 | self.layers.append(GCNLayer(in_dim, hidden_dim[0]))
39 | for i in range(len(hidden_dim) - 1):
40 | self.layers.append(GCNLayer(hidden_dim[i], hidden_dim[i+1]))
41 |
42 | fc = []
43 | if dropout > 0:
44 | fc.append(nn.Dropout(p=dropout))
45 | fc.append(nn.Linear(hidden_dim[-1], out_dim))
46 | self.fc = nn.Sequential(*fc)
47 |
48 |
49 | def forward(self, data):
50 | batch_g = []
51 | for adj in data[1]:
52 | batch_g.append(numpy_to_graph(adj.cpu().detach().T.numpy(), to_cuda=adj.is_cuda))
53 | batch_g = dgl.batch(batch_g)
54 |
55 | mask = data[2]
56 | if len(mask.shape) == 2:
57 | mask = mask.unsqueeze(2) # (B,N,1)
58 |
59 | B,N,F = data[0].shape[:3]
60 | x = data[0].reshape(B*N, F)
61 | mask = mask.reshape(B*N, 1)
62 | for layer in self.layers:
63 | x = layer(batch_g, x)
64 | x = x * mask
65 |
66 | F_prime = x.shape[-1]
67 | x = x.reshape(B, N, F_prime)
68 | x = torch.max(x, dim=1)[0].squeeze() # max pooling over nodes (usually performs better than average)
69 | # x = torch.mean(x, dim=1).squeeze()
70 | x = self.fc(x)
71 | return x
72 |
73 |
--------------------------------------------------------------------------------
/model/sage.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 | import dgl
6 | from dgl import DGLGraph, transform
7 | from dgl.nn.pytorch.conv import SAGEConv
8 | from utils.graph import numpy_to_graph
9 |
10 | # Used for inductive case (graph classification) by default.
11 | class GraphSAGE(nn.Module):
12 | def __init__(self, in_dim, out_dim,
13 | hidden_dim=[64, 32], # GNN layers + 1 layer MLP
14 | dropout=0.2,
15 | activation=F.relu,
16 | aggregator_type='gcn'): # mean/gcn/pool/lstm
17 | super(GraphSAGE, self).__init__()
18 | self.layers = nn.ModuleList()
19 |
20 | # input layer
21 | self.layers.append(SAGEConv(in_dim, hidden_dim[0], aggregator_type, feat_drop=dropout, activation=activation))
22 | # hidden layers
23 | for i in range(len(hidden_dim) - 1):
24 | self.layers.append(SAGEConv(hidden_dim[i], hidden_dim[i+1], aggregator_type, feat_drop=dropout, activation=activation))
25 |
26 | fc = []
27 | if dropout > 0:
28 | fc.append(nn.Dropout(p=dropout))
29 | fc.append(nn.Linear(hidden_dim[-1], out_dim))
30 | self.fc = nn.Sequential(*fc)
31 |
32 |
33 | def forward(self, data):
34 | batch_g = []
35 | for adj in data[1]:
36 | # DGLGraph cannot be initialized directly from a tensor, so convert via numpy first
37 | batch_g.append(numpy_to_graph(adj.cpu().T.numpy(), to_cuda=adj.is_cuda))
38 | batch_g = dgl.batch(batch_g)
39 |
40 | mask = data[2]
41 | if len(mask.shape) == 2:
42 | mask = mask.unsqueeze(2) # (B,N,1)
43 |
44 | B,N,F = data[0].shape[:3]
45 | x = data[0].reshape(B*N, F)
46 | mask = mask.reshape(B*N, 1)
47 | for layer in self.layers:
48 | x = layer(batch_g, x)
49 | x = x * mask
50 |
51 | F_prime = x.shape[-1]
52 | x = x.reshape(B, N, F_prime)
53 | x = torch.max(x, dim=1)[0].squeeze() # max pooling over nodes (usually performs better than average)
54 | # x = torch.mean(x, dim=1).squeeze()
55 | x = self.fc(x)
56 | return x
--------------------------------------------------------------------------------
/trojan/GTA.py:
--------------------------------------------------------------------------------
1 | import sys, os
2 | from utils.datareader import DataReader
3 | sys.path.append(os.path.abspath('..'))
4 |
5 | import numpy as np
6 | from tqdm import tqdm
7 | import torch
8 | import torch.nn as nn
9 | import torch.optim as optim
10 | import torch.nn.functional as F
11 |
12 | from utils.mask import recover_mask
13 | from trojan.prop import forwarding
14 |
15 | class GradWhere(torch.autograd.Function):
16 | """
17 | We can implement our own custom autograd Functions by subclassing
18 | torch.autograd.Function and implementing the forward and backward passes
19 | which operate on Tensors.
20 | """
21 |
22 | @staticmethod
23 | def forward(ctx, input, thrd, device):
24 | """
25 | In the forward pass we receive a Tensor containing the input and return
26 | a Tensor containing the output. ctx is a context object that can be used
27 | to stash information for backward computation. You can cache arbitrary
28 | objects for use in the backward pass using the ctx.save_for_backward method.
29 | """
30 | ctx.save_for_backward(input)
31 | rst = torch.where(input>thrd, torch.tensor(1.0, device=device, requires_grad=True),
32 | torch.tensor(0.0, device=device, requires_grad=True))
33 | return rst
34 |
35 | @staticmethod
36 | def backward(ctx, grad_output):
37 | """
38 | In the backward pass we receive a Tensor containing the gradient of the loss
39 | with respect to the output, and we need to compute the gradient of the loss
40 | with respect to the input.
41 | """
42 | input, = ctx.saved_tensors
43 | grad_input = grad_output.clone()
44 |
45 | """
46 | The number of returned values must match the inputs of .forward (besides ctx);
47 | for each input, return a corresponding gradient (None for thrd and device, which need none).
48 | """
49 | return grad_input, None, None
50 |
51 |
52 |
53 | class GraphTrojanNet(nn.Module):
54 | def __init__(self, sq_dim, layernum=1, dropout=0.05):
55 | super(GraphTrojanNet, self).__init__()
56 |
57 | layers = []
58 | if dropout > 0:
59 | layers.append(nn.Dropout(p=dropout))
60 | for l in range(layernum-1):
61 | layers.append(nn.Linear(sq_dim, sq_dim))
62 | layers.append(nn.ReLU(inplace=True))
63 | if dropout > 0:
64 | layers.append(nn.Dropout(p=dropout))
65 | layers.append(nn.Linear(sq_dim, sq_dim))
66 |
67 | self.layers = nn.Sequential(*layers)
68 |
69 | def forward(self, input, mask, thrd,
70 | device=torch.device('cpu'),
71 | activation='relu',
72 | for_whom='topo',
73 | binaryfeat=False):
74 |
75 | """
76 | "input", "mask" and "thrd" should already be on CUDA before being passed to this function.
77 | If using a sparse format, the corresponding tensors should already be in sparse format
78 | before being passed in.
79 | """
80 | GW = GradWhere.apply
81 |
82 | bkdmat = self.layers(input)
83 | if activation=='relu':
84 | bkdmat = F.relu(bkdmat)
85 | elif activation=='sigmoid':
86 | bkdmat = torch.sigmoid(bkdmat) # nn.Functional.sigmoid is deprecated
87 |
88 | if for_whom == 'topo': # not consider direct yet
89 | bkdmat = torch.div(torch.add(bkdmat, bkdmat.transpose(0, 1)), 2.0)
90 | if for_whom == 'topo' or (for_whom == 'feat' and binaryfeat):
91 | bkdmat = GW(bkdmat, thrd, device)
92 | bkdmat = torch.mul(bkdmat, mask)
93 |
94 | return bkdmat
95 |
96 |
97 | def train_gtn(args, model, toponet: GraphTrojanNet, featnet: GraphTrojanNet,
98 | pset, nset, topomasks, featmasks,
99 | init_dr: DataReader, bkd_dr: DataReader, Ainputs, Xinputs):
100 | """
101 | All matrix/array-like inputs should already be in torch.tensor format.
102 | All tensor parameters and models should initially stay on the CPU when
103 | fed into this function.
104 |
105 | About the inputs of this function:
106 | - pset/nset: positive/negative graph ids (gids) in the trainset
107 | - init_dr: initial datareader, kept unmodified within each resampling step
108 | - bkd_dr: stores temporary adaptive adj/features, obtained as init_dr + GTN(inputs)
109 | """
110 | if torch.cuda.is_available():
111 | cuda = torch.device('cuda')
112 | cpu = torch.device('cpu')
113 |
114 | init_As = init_dr.data['adj_list']
115 | init_Xs = init_dr.data['features']
116 | bkd_As = bkd_dr.data['adj_list']
117 | bkd_Xs = bkd_dr.data['features']
118 |
119 | nodenums = [len(adj) for adj in init_As]
120 | glabels = torch.LongTensor(init_dr.data['labels']).to(cuda)
121 | glabels[pset] = args.target_class
122 | allset = np.concatenate((pset, nset))
123 |
124 | optimizer_topo = optim.Adam(toponet.parameters(),
125 | lr=args.gtn_lr,
126 | weight_decay=5e-4)
127 | optimizer_feat = optim.Adam(featnet.parameters(),
128 | lr=args.gtn_lr,
129 | weight_decay=5e-4)
130 |
131 |
132 | #----------- training topo generator -----------#
133 | toponet.to(cuda)
134 | model.to(cuda)
135 | topo_thrd = torch.tensor(args.topo_thrd).to(cuda)
136 | criterion = nn.CrossEntropyLoss()
137 |
138 | toponet.train()
139 | for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"):
140 | optimizer_topo.zero_grad()
141 | # generate new adj_list by dr.data['adj_list']
142 | for gid in pset:
143 | SendtoCUDA(gid, [init_As, Ainputs, topomasks]) # only send the used graph items to cuda
144 | rst_bkdA = toponet(
145 | Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo')
146 | # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo')
147 | # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid])
148 | bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid]) # only current position in cuda
149 | SendtoCPU(gid, [init_As, Ainputs, topomasks])
150 |
151 | loss = forwarding(args, bkd_dr, model, allset, criterion)
152 | loss.backward()
153 | optimizer_topo.step()
154 | torch.cuda.empty_cache()
155 |
156 | toponet.eval()
157 | toponet.to(cpu)
158 | model.to(cpu)
159 | for gid in pset:
160 | SendtoCPU(gid, [bkd_dr.data['adj_list']])
161 | del topo_thrd
162 | torch.cuda.empty_cache()
163 |
164 |
165 | #----------- training feat generator -----------#
166 | featnet.to(cuda)
167 | model.to(cuda)
168 | feat_thrd = torch.tensor(args.feat_thrd).to(cuda)
169 | criterion = nn.CrossEntropyLoss()
170 |
171 | featnet.train()
172 | for epoch in tqdm(range(args.gtn_epochs), desc="training feature generator"):
173 | optimizer_feat.zero_grad()
174 | # generate new features by dr.data['features']
175 | for gid in pset:
176 | SendtoCUDA(gid, [init_Xs, Xinputs, featmasks]) # only send the used graph items to cuda
177 | rst_bkdX = featnet(
178 | Xinputs[gid], featmasks[gid], feat_thrd, cuda, args.feat_activation, 'feat')
179 | # rst_bkdX = recover_mask(nodenums[gid], featmasks[gid], 'feat')
180 | # bkd_dr.data['features'][gid] = torch.add(rst_bkdX, init_Xs[gid])
181 | bkd_dr.data['features'][gid] = torch.add(rst_bkdX[:nodenums[gid]], init_Xs[gid]) # only current position in cuda
182 | SendtoCPU(gid, [init_Xs, Xinputs, featmasks])
183 |
184 | # generate DataLoader
185 | loss = forwarding(
186 | args, bkd_dr, model, allset, criterion)
187 | loss.backward()
188 | optimizer_feat.step()
189 | torch.cuda.empty_cache()
190 |
191 | featnet.eval()
192 | featnet.to(cpu)
193 | model.to(cpu)
194 | for gid in pset:
195 | SendtoCPU(gid, [bkd_dr.data['features']])
196 | del feat_thrd
197 | torch.cuda.empty_cache()
198 |
199 | return toponet, featnet
200 |
201 | #----------------------------------------------------------------
202 | def SendtoCUDA(gid, items):
203 | """
204 | - items: a list of dict / full-graphs list,
205 | used as item[gid] in items
206 | - gid: int
207 | """
208 | cuda = torch.device('cuda')
209 | for item in items:
210 | item[gid] = torch.as_tensor(item[gid], dtype=torch.float32).to(cuda)
211 |
212 |
213 | def SendtoCPU(gid, items):
214 | """
215 |     Used after SendtoCUDA; the target objects must be torch.Tensors already on CUDA.
216 |
217 | - items: a list of dict / full-graphs list,
218 | used as item[gid] in items
219 | - gid: int
220 | """
221 |
222 | cpu = torch.device('cpu')
223 | for item in items:
224 | item[gid] = item[gid].to(cpu)
--------------------------------------------------------------------------------
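The forward pass of `GraphTrojanNet` above, together with the blending step inside `train_gtn` (`rst_bkdA[:nodenums[gid], :nodenums[gid]] + init_As[gid]`), is the heart of the trigger generation: symmetrize the generator output, binarize it through the straight-through `GradWhere` op, keep only the masked trigger region, then add it onto the clean adjacency. Below is a minimal, self-contained sketch of that flow; `HardThreshold` and `blend_topo_trigger` are hypothetical stand-ins (the straight-through backward is an assumption about how `GradWhere` behaves), not the repository's own implementation.

```python
import torch

class HardThreshold(torch.autograd.Function):
    """Hypothetical stand-in for GradWhere: binarize in forward,
    pass gradients straight through in backward (an assumption)."""
    @staticmethod
    def forward(ctx, x, thrd):
        return torch.where(x > thrd, torch.ones_like(x), torch.zeros_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # straight-through w.r.t. x, no grad for thrd

def blend_topo_trigger(gen_out, init_A, mask, thrd=0.5):
    """Symmetrize -> binarize -> mask -> crop -> add, mirroring the topo branch."""
    sym = (gen_out + gen_out.t()) / 2.0
    bkd = HardThreshold.apply(sym, thrd) * mask
    n = init_A.shape[0]
    return bkd[:n, :n] + init_A

# toy usage: a 6x6 generator output, a 4-node clean graph, trigger on nodes 0-2
gen_out = torch.rand(6, 6, requires_grad=True)
init_A = torch.zeros(4, 4)
mask = torch.zeros(6, 6)
mask[:3, :3] = 1.0
mask.fill_diagonal_(0.0)
print(blend_topo_trigger(gen_out, init_A, mask).shape)  # torch.Size([4, 4])
```

The feature branch follows the same pattern, minus the symmetrization and with an (N, F) mask, and is only thresholded when the features are binary.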
/trojan/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhaohan-xi/GraphBackdoor/3d975d78813f2a4a4960f92f9b66847dc19413a8/trojan/__init__.py
--------------------------------------------------------------------------------
/trojan/input.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 |
4 | def gen_input(args, datareader, bkd_gids):
5 |     """
6 |     Prepare inputs for GTN: the topo input and the feat input together.
7 | 
8 |     About inputs (of this function):
9 |     - args: controls the adaptive-input type (args.gtn_input_type)
10 | 
11 |     Note: each input is extended to size (N, N) / (N, F), where N is the max node num among all graphs
12 |     """
13 | As = {}
14 | Xs = {}
15 | for gid in bkd_gids:
16 | if gid not in As: As[gid] = torch.tensor(datareader.data['adj_list'][gid], dtype=torch.float)
17 | if gid not in Xs: Xs[gid] = torch.tensor(datareader.data['features'][gid], dtype=torch.float)
18 | Ainputs = {}
19 | Xinputs = {}
20 |
21 | if args.gtn_input_type == '1hop':
22 | for gid in bkd_gids:
23 | if gid not in Ainputs: Ainputs[gid] = As[gid].clone().detach()
24 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid])
25 |
26 | elif args.gtn_input_type == '2hop':
27 | for gid in bkd_gids:
28 | As[gid] = torch.add(As[gid], torch.mm(As[gid], As[gid]))
29 | As[gid] = torch.where(As[gid]>0, torch.tensor(1.0, requires_grad=True),
30 | torch.tensor(0.0, requires_grad=True))
31 | As[gid].fill_diagonal_(0.0)
32 |
33 | for gid in bkd_gids:
34 | if gid not in Ainputs: Ainputs[gid] = As[gid].clone().detach()
35 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid])
36 |
37 |
38 | elif args.gtn_input_type == '1hop_degree':
39 | rowsums = [torch.add(torch.sum(As[gid], dim=1), 1e-6) for gid in bkd_gids]
40 | re_Ds = [torch.diag(torch.pow(rowsum, -1)) for rowsum in rowsums]
41 |
42 | for i in range(len(bkd_gids)):
43 | gid = bkd_gids[i]
44 | if gid not in Ainputs: Ainputs[gid] = torch.mm(re_Ds[i], As[gid])
45 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid])
46 |
47 |
48 | elif args.gtn_input_type == '2hop_degree':
49 | for gid in bkd_gids:
50 | As[gid] = torch.add(As[gid], torch.mm(As[gid], As[gid]))
51 | As[gid] = torch.where(As[gid]>0, torch.tensor(1.0, requires_grad=True),
52 | torch.tensor(0.0, requires_grad=True))
53 | As[gid].fill_diagonal_(0.0)
54 |
55 | rowsums = [torch.add(torch.sum(As[gid], dim=1), 1e-6) for gid in bkd_gids]
56 | re_Ds = [torch.diag(torch.pow(rowsum, -1)) for rowsum in rowsums]
57 |
58 | for i in range(len(bkd_gids)):
59 | gid = bkd_gids[i]
60 | if gid not in Ainputs: Ainputs[gid] = torch.mm(re_Ds[i], As[gid])
61 | if gid not in Xinputs: Xinputs[gid] = torch.mm(Ainputs[gid], Xs[gid])
62 |
63 |     else: raise NotImplementedError('unsupported type of aggregated inputs')
64 |
65 | # pad each input into maxi possible size (N, N) / (N, F)
66 | NodeMax = int(datareader.data['n_node_max'])
67 | FeatDim = np.array(datareader.data['features'][0]).shape[1]
68 | for gid in Ainputs.keys():
69 | a_input = Ainputs[gid]
70 | x_input = Xinputs[gid]
71 |
72 | add_dim = NodeMax - a_input.shape[0]
73 | Ainputs[gid] = np.pad(a_input, ((0, add_dim), (0, add_dim))).tolist()
74 | Xinputs[gid] = np.pad(x_input, ((0, add_dim), (0, 0))).tolist()
75 | Ainputs[gid] = torch.tensor(Ainputs[gid])
76 | Xinputs[gid] = torch.tensor(Xinputs[gid])
77 |
78 | return Ainputs, Xinputs
79 |
--------------------------------------------------------------------------------
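To make the `'1hop_degree'` branch of `gen_input` concrete, the sketch below builds the degree-normalized topology input D^-1 A and the aggregated feature input D^-1 A X for one toy graph, then zero-pads both to a dataset-wide maximum node number, mirroring the padding loop at the end of the function. The sizes here are made up for illustration.

```python
import numpy as np
import torch

# toy graph: 3 nodes, feature dim 2, dataset-wide max node number 5 (illustrative)
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
X = torch.rand(3, 2)

re_D = torch.diag(torch.pow(A.sum(dim=1) + 1e-6, -1))  # D^-1 with the same epsilon
A_input = torch.mm(re_D, A)                            # '1hop_degree' topo input
X_input = torch.mm(A_input, X)                         # aggregated feature input

NodeMax = 5
add_dim = NodeMax - A_input.shape[0]
A_input = torch.tensor(np.pad(A_input.numpy(), ((0, add_dim), (0, add_dim))))
X_input = torch.tensor(np.pad(X_input.numpy(), ((0, add_dim), (0, 0))))
print(A_input.shape, X_input.shape)  # torch.Size([5, 5]) torch.Size([5, 2])
```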
/trojan/prop.py:
--------------------------------------------------------------------------------
1 | import sys, os
2 | sys.path.append(os.path.abspath('..'))
3 |
4 | import torch
5 | import torch.nn as nn
6 | import torch.optim as optim
7 | import torch.nn.functional as F
8 | import torch.optim.lr_scheduler as lr_scheduler
9 | from torch.utils.data import DataLoader
10 |
11 | from utils.datareader import GraphData, DataReader
12 | from utils.batch import collate_batch
13 |
14 | # run on CUDA
15 | def forwarding(args, bkd_dr: DataReader, model, gids, criterion):
16 | assert torch.cuda.is_available(), "no GPU available"
17 | cuda = torch.device('cuda')
18 |
19 | gdata = GraphData(bkd_dr, gids)
20 | loader = DataLoader(gdata,
21 | batch_size=args.batch_size,
22 | shuffle=False,
23 | collate_fn=collate_batch)
24 |
25 | if not next(model.parameters()).is_cuda:
26 | model.to(cuda)
27 | model.eval()
28 | all_loss, n_samples = 0.0, 0.0
29 | for batch_idx, data in enumerate(loader):
30 | # assert batch_idx == 0, "In AdaptNet Train, we only need one GNN pass, batch-size=len(all trainset)"
31 | for i in range(len(data)):
32 | data[i] = data[i].to(cuda)
33 | output = model(data)
34 |
35 | if len(output.shape)==1:
36 | output = output.unsqueeze(0)
37 |
38 | loss = criterion(output, data[4]) # only calculate once
39 | all_loss = torch.add(torch.mul(loss, len(output)), all_loss) # cannot be loss.item()
40 | n_samples += len(output)
41 |
42 | all_loss = torch.div(all_loss, n_samples)
43 | return all_loss
44 |
45 |
46 | def train_model(args, dr_train: DataReader, model, pset, nset):
47 | assert torch.cuda.is_available(), "no GPU available"
48 | cuda = torch.device('cuda')
49 | cpu = torch.device('cpu')
50 |
51 | model.to(cuda)
52 | gids = {'pos': pset, 'neg': nset}
53 | gdata = {}
54 | loader = {}
55 | for key in ['pos', 'neg']:
56 | gdata[key] = GraphData(dr_train, gids[key])
57 | loader[key] = DataLoader(gdata[key],
58 | batch_size=args.batch_size,
59 | shuffle=False,
60 | collate_fn=collate_batch)
61 |
62 | train_params = list(filter(lambda p: p.requires_grad, model.parameters()))
63 | optimizer = optim.Adam(train_params, lr=args.lr, weight_decay=args.weight_decay, betas=(0.5, 0.999))
64 | scheduler = lr_scheduler.MultiStepLR(optimizer, args.lr_decay_steps, gamma=0.1)
65 | loss_fn = F.cross_entropy
66 |
67 | model.train()
68 | for epoch in range(args.train_epochs):
69 | optimizer.zero_grad()
70 |
71 | losses = {'pos': 0.0, 'neg': 0.0}
72 | n_samples = {'pos': 0.0, 'neg': 0.0}
73 | for key in ['pos', 'neg']:
74 | for batch_idx, data in enumerate(loader[key]):
75 | for i in range(len(data)):
76 | data[i] = data[i].to(cuda)
77 | output = model(data)
78 | if len(output.shape)==1:
79 | output = output.unsqueeze(0)
80 | losses[key] += loss_fn(output, data[4])*len(output)
81 | n_samples[key] += len(output)
82 |
83 | for i in range(len(data)):
84 | data[i] = data[i].to(cpu)
85 |
86 | losses[key] = torch.div(losses[key], n_samples[key])
87 | loss = losses['pos'] + args.lambd*losses['neg']
88 | loss.backward()
89 | optimizer.step()
90 | scheduler.step()
91 | model.to(cpu)
92 |
93 |
94 | # def TrainGNN_v2(args,
95 | # dr_train,
96 | # model,
97 | # fold_id,
98 | # train_gids,
99 | # use_optim='Adam',
100 | # need_print=False):
101 | # assert torch.cuda.is_available(), "no GPU available"
102 | # cuda = torch.device('cuda')
103 | # cpu = torch.device('cpu')
104 |
105 | # model.to(cuda)
106 |
107 | # gdata = GraphData(dr_train,
108 | # fold_id,
109 | # 'train',
110 | # train_gids)
111 | # loader = DataLoader(gdata,
112 | # batch_size=args.batch_size,
113 | # shuffle=False,
114 | # collate_fn=collate_batch)
115 |
116 | # train_params = list(filter(lambda p: p.requires_grad, model.parameters()))
117 | # if use_optim=='Adam':
118 | # optimizer = optim.Adam(train_params, lr=args.lr, weight_decay=args.weight_decay, betas=(0.5, 0.999))
119 | # else:
120 | # optimizer = optim.SGD(train_params, lr=args.lr)
121 | # predict_fn = lambda output: output.max(1, keepdim=True)[1].detach().cpu()
122 | # loss_fn = F.cross_entropy
123 |
124 | # model.train()
125 | # for epoch in range(args.epochs):
126 | # optimizer.zero_grad()
127 |
128 | # loss = 0.0
129 | # n_samples = 0
130 | # correct = 0
131 | # for batch_idx, data in enumerate(loader):
132 | # for i in range(len(data)):
133 | # data[i] = data[i].to(cuda)
134 | # output = model(data)
135 | # if len(output.shape)==1:
136 | # output = output.unsqueeze(0)
137 | # loss += loss_fn(output, data[4])*len(output)
138 | # n_samples += len(output)
139 |
140 | # for i in range(len(data)):
141 | # data[i] = data[i].to(cpu)
142 | # torch.cuda.empty_cache()
143 |
144 | # pred = predict_fn(output)
145 | # correct += pred.eq(data[4].detach().cpu().view_as(pred)).sum().item()
146 | # acc = 100. * correct / n_samples
147 | # loss = torch.div(loss, n_samples)
148 |
149 | # if need_print and epoch%5==0:
150 | # print("Epoch {} | Loss {:.4f} | Train Accuracy {:.4f}".format(epoch, loss.item(), acc))
151 | # loss.backward()
152 | # optimizer.step()
153 | # model.to(cpu)
154 |
155 |
156 |
157 | def evaluate(args, dr_test: DataReader, model, gids):
158 | # separate bkd_test/clean_test gids
159 | softmax = torch.nn.Softmax(dim=1)
160 |
161 | model.cuda()
162 | gdata = GraphData(dr_test, gids)
163 | loader = DataLoader(gdata,
164 | batch_size=args.batch_size,
165 | shuffle=False,
166 | collate_fn=collate_batch)
167 |
168 | loss_fn = F.cross_entropy
169 | predict_fn = lambda output: output.max(1, keepdim=True)[1].detach().cpu()
170 |
171 | model.eval()
172 | test_loss, correct, n_samples, confidence = 0, 0, 0, 0
173 | for batch_idx, data in enumerate(loader):
174 | for i in range(len(data)):
175 | data[i] = data[i].cuda()
176 | output = model(data) # not softmax yet
177 | if len(output.shape)==1:
178 | output = output.unsqueeze(0)
179 | loss = loss_fn(output, data[4], reduction='sum')
180 | test_loss += loss.item()
181 | n_samples += len(output)
182 | pred = predict_fn(output)
183 |
184 | correct += pred.eq(data[4].detach().cpu().view_as(pred)).sum().item()
185 | confidence += torch.sum(torch.max(softmax(output), dim=1)[0]).item()
186 | acc = 100. * correct / n_samples
187 | confidence = confidence / n_samples
188 |
189 | print('Test set: Average loss: %.4f, Accuracy: %d/%d (%.2f%s), Average Confidence %.4f' % (
190 | test_loss / n_samples, correct, n_samples, acc, '%', confidence))
191 | model.cpu()
192 | return acc
--------------------------------------------------------------------------------
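`train_model` above optimizes a weighted sum of two cross-entropy terms: one over the trigger-embedded ('pos') graphs, whose labels have been flipped to the target class, and one over the clean ('neg') graphs, balanced by `args.lambd`. A minimal sketch of that objective with dummy logits (all shapes and values are illustrative):

```python
import torch
import torch.nn.functional as F

lambd = 1.0  # plays the role of args.lambd

# dummy outputs standing in for model(data) on the two subsets
pos_logits = torch.randn(4, 2, requires_grad=True)   # trigger-embedded graphs
neg_logits = torch.randn(8, 2, requires_grad=True)   # clean graphs
pos_labels = torch.ones(4, dtype=torch.long)          # all flipped to the target class
neg_labels = torch.randint(0, 2, (8,))

loss = F.cross_entropy(pos_logits, pos_labels) + lambd * F.cross_entropy(neg_logits, neg_labels)
loss.backward()
print(float(loss))
```

Keeping the two terms separate and reweighting them is what lets the attack trade off attack success on 'pos' graphs against clean accuracy on 'neg' graphs.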
/utils/batch.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 |
4 |
5 | def collate_batch(batch):
6 | '''
7 | function: Creates a batch of same size graphs by zero-padding node features and adjacency matrices
8 | up to the maximum number of nodes in the CURRENT batch rather than in the entire dataset.
9 | param batch: [node_features*batch_size, A*batch_size, label*batch_size]
10 |     return: [padded feature matrices, padded adjacency matrices, non-padding positions, nodenums, labels]
11 | '''
12 | B = len(batch)
13 | nodenums = [len(batch[b][1]) for b in range(B)]
14 | if len(batch[0][0].shape)==2:
15 | C = batch[0][0].shape[1] # C is feature dim
16 | else:
17 | C = batch[0][0].shape[0]
18 | n_node_max = int(np.max(nodenums))
19 |
20 | graph_support = torch.zeros(B, n_node_max)
21 | A = torch.zeros(B, n_node_max, n_node_max)
22 | X = torch.zeros(B, n_node_max, C)
23 | for b in range(B):
24 | X[b, :nodenums[b]] = batch[b][0] # store original values in top (no need to pad feat dim, node dim only)
25 | A[b, :nodenums[b], :nodenums[b]] = batch[b][1] # store original values in top-left corner
26 | graph_support[b][:nodenums[b]] = 1 # mask with values of 0 for dummy (zero padded) nodes, otherwise 1
27 |
28 | nodenums = torch.from_numpy(np.array(nodenums)).long()
29 | labels = torch.from_numpy(np.array([batch[b][2] for b in range(B)])).long()
30 | return [X, A, graph_support, nodenums, labels]
31 |
32 |
33 | # Note: here mask "graph_support" is only a 1D mask for each graph instance.
34 | # When using this mask for 2D work, first extend it into 2D.
35 |
--------------------------------------------------------------------------------
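A quick usage sketch for `collate_batch`: wrap a few variable-size toy graphs in a `DataLoader` and inspect the padded shapes. This assumes the repository root is on `PYTHONPATH`; the toy graphs themselves are made up.

```python
import torch
from torch.utils.data import DataLoader
from utils.batch import collate_batch  # assumes the repo root is on PYTHONPATH

# three toy graphs of different sizes: (node_features, adj, label)
samples = [(torch.rand(n, 4), torch.ones(n, n), 0) for n in (3, 5, 2)]
loader = DataLoader(samples, batch_size=3, shuffle=False, collate_fn=collate_batch)

X, A, graph_support, nodenums, labels = next(iter(loader))
print(X.shape, A.shape, graph_support.shape)  # (3, 5, 4), (3, 5, 5), (3, 5)
print(nodenums.tolist(), labels.tolist())     # [3, 5, 2] [0, 0, 0]
```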
/utils/bkdcdd.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | from utils.datareader import DataReader
4 | sys.path.append('/home/zxx5113/BackdoorGNN/')
5 |
6 | import numpy as np
7 | import copy
8 |
9 |
10 | # return 1D list
11 | def select_cdd_graphs(args, data: list, adj_list: list, subset: str):
12 | '''
13 |     Given a data split (train/test), (randomly or deterministically)
14 |     pick some graphs to carry backdoor information, and return their ids.
15 | '''
16 | rs = np.random.RandomState(args.seed)
17 | graph_sizes = [np.array(adj).shape[0] for adj in adj_list]
18 | bkd_graph_ratio = args.bkd_gratio_train if subset == 'train' else args.bkd_gratio_test
19 | bkd_num = int(np.ceil(bkd_graph_ratio * len(data)))
20 |
21 | assert len(data)>bkd_num , "Graph Instances are not enough"
22 | picked_ids = []
23 |
24 | # Randomly pick up graphs as backdoor candidates from data
25 | remained_set = copy.deepcopy(data)
26 | loopcount = 0
27 | while bkd_num-len(picked_ids) >0 and len(remained_set)>0 and loopcount<=50:
28 | loopcount += 1
29 |
30 | cdd_ids = rs.choice(remained_set, bkd_num-len(picked_ids), replace=False)
31 | for gid in cdd_ids:
32 | if bkd_num-len(picked_ids) <=0:
33 | break
34 | gsize = graph_sizes[gid]
35 | if gsize >= 3*args.bkd_size*args.bkd_num_pergraph:
36 | picked_ids.append(gid)
37 |
38 |         if len(remained_set) < bkd_num-len(picked_ids):  # not enough large graphs, loosen the size threshold
39 |             for gid in remained_set:
40 |                 if bkd_num-len(picked_ids) <= 0:
41 |                     break
42 |                 gsize = graph_sizes[gid]
43 |                 if gsize >= 1.5*args.bkd_size*args.bkd_num_pergraph and gid not in picked_ids:
44 | picked_ids.append(gid)
45 |
46 |         if len(remained_set) < bkd_num-len(picked_ids):  # still not enough, loosen the size threshold again
47 |             for gid in remained_set:
48 |                 if bkd_num-len(picked_ids) <= 0:
49 |                     break
50 |                 gsize = graph_sizes[gid]
51 |                 if gsize >= 1.0*args.bkd_size*args.bkd_num_pergraph and gid not in picked_ids:
52 | picked_ids.append(gid)
53 |
54 | picked_ids = list(set(picked_ids))
55 | remained_set = list(set(remained_set) - set(picked_ids))
56 | if len(remained_set)==0 and bkd_num>len(picked_ids):
57 | print("no more graph to pick, return insufficient candidate graphs, try smaller bkd-pattern or graph size")
58 |
59 | return picked_ids
60 |
61 |
62 | def select_cdd_nodes(args, graph_cdd_ids, adj_list):
63 |     '''
64 |     Given candidate graph instances, based on a pre-determined standard,
65 |     find the nodes where backdoor information should be put, and return
66 |     their ids.
67 | 
68 |     return: in the same sequence as bkd-gids,
69 |         (1) a 2D list - bkd nodes under each graph
70 |         (2) and a 3D list - bkd node groups under each graph
71 |             (in case each graph has multiple triggers)
72 |     '''
73 | rs = np.random.RandomState(args.seed)
74 |
75 | # step1: find backdoor nodes
76 | picked_nodes = [] # 2D, save all cdd graphs
77 |
78 | for gid in graph_cdd_ids:
79 | node_ids = [i for i in range(len(adj_list[gid]))]
80 | assert len(node_ids)==len(adj_list[gid]), 'node number in graph {} mismatch'.format(gid)
81 |
82 | bkd_node_num = int(args.bkd_num_pergraph*args.bkd_size)
83 | assert bkd_node_num <= len(adj_list[gid]), "error in SelectCddGraphs, candidate graph too small"
84 | cur_picked_nodes = rs.choice(node_ids, bkd_node_num, replace=False)
85 | picked_nodes.append(cur_picked_nodes)
86 |
87 | # step2: match nodes
88 | assert len(picked_nodes)==len(graph_cdd_ids), "backdoor graphs & node groups mismatch, check SelectCddGraphs/SelectCddNodes"
89 |
90 | node_groups = [] # 3D, grouped trigger nodes
91 |     for i in range(len(graph_cdd_ids)): # for each graph, divide candidate nodes into groups
92 | gid = graph_cdd_ids[i]
93 | nids = picked_nodes[i]
94 |
95 | assert len(nids)%args.bkd_size==0.0, "Backdoor nodes cannot equally be divided, check SelectCddNodes-STEP1"
96 |
97 | # groups within each graph
98 | groups = np.array_split(nids, len(nids)//args.bkd_size)
99 | # np.array_split return list[array([..]), array([...]), ]
100 | # thus transfer internal np.array into list
101 | # store groups as a 2D list.
102 | groups = np.array(groups).tolist()
103 | node_groups.append(groups)
104 |
105 | assert len(picked_nodes)==len(node_groups), "groups of bkd-nodes mismatch, check SelectCddNodes-STEP2"
106 | return picked_nodes, node_groups
107 |
108 |
--------------------------------------------------------------------------------
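To make the selection thresholds concrete: with `bkd_size=3` and `bkd_num_pergraph=2`, a graph must have at least `3*3*2 = 18` nodes to pass the strictest check in `select_cdd_graphs`, and `select_cdd_nodes` then samples 6 nodes and splits them into 2 trigger groups of 3 via `np.array_split`. A toy sketch with made-up sizes:

```python
import numpy as np

bkd_size, bkd_num_pergraph = 3, 2
print(3 * bkd_size * bkd_num_pergraph)        # 18: strictest graph-size threshold

rs = np.random.RandomState(42)
node_ids = np.arange(20)                      # a candidate graph with 20 nodes
picked = rs.choice(node_ids, bkd_size * bkd_num_pergraph, replace=False)
groups = np.array_split(picked, len(picked) // bkd_size)
print([g.tolist() for g in groups])           # two trigger-node groups of size 3
```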
/utils/datareader.py:
--------------------------------------------------------------------------------
1 | """
2 | This file processes a TU dataset and saves it in a 'DataReader' class;
3 | the DataReader object is then converted into 'GraphData' before training.
4 |
5 | Specifically used to process dataset from
6 | https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets
7 | """
8 |
9 | import os
10 | import torch
11 | import numpy as np
12 |
13 | def split_ids(args, gids, rs):
14 | '''
15 | single fold
16 | gids: 0-based graph id list.
17 | '''
18 | train_gids = list(rs.choice(gids, int(args.train_ratio * len(gids)), replace=False))
19 | test_gids = list(set(gids)-set(train_gids))
20 | return train_gids, test_gids
21 |
22 |
23 | #! All files should end with .txt
24 | class DataReader():
25 | """
26 |     Will contain keys ['adj_list', 'nlabels', 'labels', 'attr', 'features',
27 | 'splits', 'n_node_max', 'num_features', 'num_classes']
28 | - 'adj_list': generated by 'read_graph_adj' from '_A.txt', which represents
29 | a list of adj matrices, whose shape may be different. Stored in
30 | same order of graph indicator.
31 | In list format, each element is a np.array, refers to a square and
32 | symmetric adj matrix.
33 |
34 | use: datareader.data['adj_list'][gid] - a 2D adj matrix
35 |
36 | - 'nlabels': generated by 'read_node_features' from '_node_labels.txt', which
37 |         represents node labels within each graph instance. Stored in the
38 | same sequence of '_node_labels.txt' or '_graph_indicator.txt'.
39 | Stored in list format, 2D. Each internal dim refers to a feature
40 | list (node label list) for a graph instance.
41 |
42 |     - 'attr': generated by 'read_node_features' from "_node_attributes.txt" if present,
43 |         which represents the original features of each node within a graph instance.
44 |         Order is the same as above.
45 |         Stored as a 2D list of 1D np.arrays. Each internal list is a series of original
46 |         node feature vectors for a graph instance, and each internal element is a 1D np.array
47 |         representing the (possibly floating-point) feature vector for a specific node.
48 |         More exactly, it is [
49 | [array(f1, f2, ..), array(f1, f2, ..), array(f1, f2, ..)],
50 | [similar node feature vectors in 2nd graph instance],
51 | [similar node feature vectors in 3rd graph instance],
52 | ...
53 | ]
54 |
55 | - 'features': combination of 'nlabel' and 'attr'. Where 'nlabel' are
56 | transferred as one-hot format to show which label belongs to
57 | a specific node. Overall sequence is same with previous.
58 | Each onehot feature matrix has shape (N, D1+D2), where N is
59 | number of nodes within a specific graph instance, D1, D2 are
60 | number of possible labels and node feature vector length within
61 | this graph, respectively. D2 is optional.
62 |
63 | use : datareader.data['features'][gid] - a 2D (N, D1+D2) matrix of constructed features
64 |
65 | - 'labels': concrete label for each graph instance, with same order of 'graph_labels.txt'.
66 | Stored as a list of np.int64.
67 |
68 | use : datareader.data['labels'][gid] - a single int label
69 |
70 | - 'splits': a split of train/test sets.
71 | {
72 | 'train': [list of train graph ids, in int],
73 | 'test': [list of test graph ids, in int]
74 | }.
75 |
76 | use : datareader.data['splits']['train/test'] - a list of int gids
77 |
78 |
79 |
80 | - 'n_node_max': max num of nodes within a graph instance among all graphs. Single int.
81 | - 'num_features': size of concatenate features in 'features'. Single int.
82 | - 'num_classes': num of graph classes. Single int.
83 | """
84 |
85 | def __init__(self, args):
86 |
87 | # self.args = args
88 | assert args.use_nlabel_asfeat or args.use_org_node_attr or args.use_degree_asfeat, \
89 | 'need at least one source to construct node features'
90 |
91 | self.data_path = os.path.join(args.data_path, args.dataset)
92 | self.rnd_state = np.random.RandomState(args.seed)
93 | files = os.listdir(self.data_path)
94 | data = {}
95 |
96 | """
97 | Load raw graphs, nodes, record in 2 dicts.
98 | Load adj list for each graph with sequence of graph indicator.
99 | Load node labels for each graph with sequence of graph indicator.
100 | Load graph labels for each graph with sequence of graph indicator.
101 | """
102 | nodes, graphs = self.read_graph_nodes_relations(
103 | list(filter(lambda f: f.find('graph_indicator') >= 0, files))[0])
104 | data['adj_list'] = self.read_graph_adj( # in case of Tox21_Axx_...
105 | list(filter(lambda f: f.find('_A.') >= 0, files))[0], nodes, graphs)
106 |
107 | node_labels_file = list(filter(lambda f: f.find('node_labels') >= 0, files))
108 | if len(node_labels_file) == 1:
109 | data['nlabels'] = self.read_node_features(
110 | node_labels_file[0], nodes, graphs, fn=lambda s: int(s.strip()))
111 | else:
112 | data['nlabels'] = None
113 |
114 | data['labels'] = np.array(
115 | self.parse_txt_file(
116 | list(filter(lambda f: f.find('graph_labels') >= 0 or f.find('graph_attributes') >= 0, files))[0],
117 | line_parse_fn=lambda s: int(float(s.strip()))))
118 |
119 | if args.use_org_node_attr:
120 | data['attr'] = self.read_node_features(list(filter(lambda f: f.find('node_attributes') >= 0, files))[0],
121 | nodes, graphs,
122 | fn=lambda s: np.array(list(map(float, s.strip().split(',')))))
123 |
124 | '''also include this part into GetFinalFeatures()
125 | '''
126 | # In each graph sample, treat node labels (if have) as feature for one graph.
127 | nlabels, n_edges, degrees = [], [], []
128 | for sample_id, adj in enumerate(data['adj_list']):
129 | N = len(adj) # number of nodes
130 |
131 | # some verifications
132 | if data['nlabels'] is not None:
133 | assert N == len(data['nlabels'][sample_id]), (N, len(data['nlabels'][sample_id]))
134 | # if not np.allclose(adj, adj.T):
135 | # print(sample_id, 'not symmetric') # not symm is okay, maybe direct graph
136 | n = np.sum(adj) # total sum of edges
137 | # assert n % 2 == 0, n
138 |
139 | n_edges.append(int(n / 2)) # undirected edges, so need to divide by 2
140 | degrees.extend(list(np.sum(adj, 1)))
141 | if data['nlabels'] is not None:
142 | nlabels.append(np.array(data['nlabels'][sample_id]))
143 |
144 | # Create nlabels over graphs as one-hot vectors for each node
145 | if data['nlabels'] is not None:
146 | nlabels_all = np.concatenate(nlabels)
147 | nlabels_min = nlabels_all.min()
148 | num_nlabels = int(nlabels_all.max() - nlabels_min + 1) # number of possible values
149 |
150 |
151 |
152 | #--------- Generate onehot-feature ---------#
153 | features = GetFinalFeatures(args, data)
154 |
155 | # final graph feature dim
156 | num_features = features[0].shape[1]
157 |
158 | shapes = [len(adj) for adj in data['adj_list']]
159 | labels = data['labels'] # graph class labels, np.ndarray
160 | labels -= np.min(labels) # to start from 0
161 |
162 | classes = np.unique(labels)
163 | num_classes = len(classes)
164 |
165 | """
166 | Test whether labels are successive, e.g., 0,1,2,3,4,..i, i+1,..
167 | If not, make them successive. New labels still store in "labels".
168 | """
169 | if not np.all(np.diff(classes) == 1):
170 | print('making labels sequential, otherwise pytorch might crash')
171 | labels_new = np.zeros(labels.shape, dtype=labels.dtype) - 1
172 | for lbl in range(num_classes):
173 | labels_new[labels == classes[lbl]] = lbl
174 | labels = labels_new
175 | classes = np.unique(labels)
176 | assert len(np.unique(labels)) == num_classes, np.unique(labels)
177 |
178 |
179 | def stats(x):
180 | return (np.mean(x), np.std(x), np.min(x), np.max(x))
181 |
182 | print('N nodes avg/std/min/max: \t%.2f/%.2f/%d/%d' % stats(shapes))
183 | print('N edges avg/std/min/max: \t%.2f/%.2f/%d/%d' % stats(n_edges))
184 | print('Node degree avg/std/min/max: \t%.2f/%.2f/%d/%d' % stats(degrees))
185 | print('Node features dim: \t\t%d' % num_features)
186 | print('N classes: \t\t\t%d' % num_classes)
187 | print('Classes: \t\t\t%s' % str(classes))
188 |
189 | for lbl in classes:
190 | print('Class %d: \t\t\t%d samples' % (lbl, np.sum(labels == lbl)))
191 |
192 | if args.data_verbose:
193 | if data['nlabels'] is not None:
194 | for u in np.unique(nlabels_all):
195 | print('nlabels {}, count {}/{}'.format(u, np.count_nonzero(nlabels_all == u), len(nlabels_all)))
196 |
197 | # some datasets like "Fingerprint" may lack graph in _indicator.txt
198 | # N_graphs = len(labels) # number of samples (graphs) in data
199 | # assert N_graphs == len(data['adj_list']) == len(features), 'invalid data'
200 | N_graphs = len(data['adj_list'])
201 |
202 | # Create train/test sets
203 | train_gids, test_gids = split_ids(args, self.rnd_state.permutation(N_graphs), self.rnd_state)
204 | splits = {'train': train_gids,
205 | 'test': test_gids}
206 |
207 | data['features'] = features
208 | data['labels'] = labels
209 | data['splits'] = splits
210 | data['n_node_max'] = np.max(shapes) # max number of nodes
211 | data['num_features'] = num_features
212 | data['num_classes'] = num_classes
213 |
214 | self.data = data
215 |
216 | # print(len(data['features']), len(data['adj_list']), len(data['labels']))
217 | assert len(data['features'])==len(data['adj_list'])==len(data['labels']), \
218 |             "Graph Number Mismatch. Possible reason: with a non-consecutive graph indicator, \
219 |             some gids do not exist in the original indicator files; the only fix needed is filtering the graph labels. \
220 |             Remember that a non-consecutive graph indicator is okay: graph labels and graphs correspond via their \
221 |             stored index in data['xxx']."
222 | print()
223 |
224 | def parse_txt_file(self, fpath, line_parse_fn=None):
225 | """
226 | Read a file, split each line by pre-defined pattern (e.g., ','),
227 | save results in list. Transferring data into Int is done outside.
228 | """
229 | with open(os.path.join(self.data_path, fpath), 'r') as f:
230 | lines = f.readlines()
231 | data = [line_parse_fn(s) if line_parse_fn is not None else s for s in lines]
232 | return data
233 |
234 |
235 | def read_graph_nodes_relations(self, fpath):
236 | """
237 | From graph_indicator.txt file, find { node_id: graph_id } and { graph_id:[nodes] }.
238 | """
239 | graph_ids = self.parse_txt_file(fpath,
240 | line_parse_fn=lambda s: int(s.rstrip()))
241 | nodes, graphs = {}, {}
242 | for node_id, graph_id in enumerate(graph_ids):
243 | if graph_id not in graphs:
244 | graphs[graph_id] = []
245 | graphs[graph_id].append(node_id)
246 | nodes[node_id] = graph_id
247 | graph_ids = np.unique(list(graphs.keys()))
248 | for graph_id in graph_ids:
249 | graphs[graph_id] = np.array(graphs[graph_id])
250 | return nodes, graphs
251 |
252 |
253 |     # for directed graphs, rows are source nodes
254 | def read_graph_adj(self, fpath, nodes, graphs):
255 | edges = self.parse_txt_file(fpath,
256 | line_parse_fn=lambda s: s.split(','))
257 |
258 | adj_dict = {}
259 | for edge in edges:
260 | # Note: TU-datasets are all 1 based node id
261 | node1 = int(edge[0].strip()) - 1 # -1 because of zero-indexing in our code
262 | node2 = int(edge[1].strip()) - 1
263 | graph_id = nodes[node1]
264 |
265 | # both nodes in edge side should in a same graph
266 | assert graph_id == nodes[node2], ('invalid data', graph_id, nodes[node2])
267 | if graph_id not in adj_dict:
268 | n = len(graphs[graph_id])
269 | adj_dict[graph_id] = np.zeros((n, n))
270 |
271 | ind1 = np.where(graphs[graph_id] == node1)[0]
272 | ind2 = np.where(graphs[graph_id] == node2)[0]
273 | assert len(ind1) == len(ind2) == 1, (ind1, ind2)
274 | adj_dict[graph_id][ind1, ind2] = 1
275 |
276 |         # a graph with no connections may not be included by the code above,
277 |         # so we add it explicitly. E.g., graph-291 in the Fingerprint
278 |         # dataset only has the single node 1477 (1-based index),
279 |         # which does not appear in the edge file since it has no connection.
280 |         # We still add it to keep the graph list consistent.
281 | # some graphs in Tox21 also only have isolated nodes.
282 | adj_list = []
283 | for gid in sorted(list(graphs.keys())):
284 | if gid in adj_dict:
285 | adj_list.append(adj_dict[gid])
286 | else:
287 | adj_list.append(np.zeros((len(graphs[gid]), len(graphs[gid]))))
288 | return adj_list
289 |
290 |
291 | def read_node_features(self, fpath, nodes, graphs, fn):
292 | '''
293 | Return 'feature' graph by graph.
294 | here 'feature' may refer to (1) node attributes; (2) node labels; (3) node degrees
295 | '''
296 | node_features_all = self.parse_txt_file(fpath, line_parse_fn=fn)
297 | node_features = {}
298 | for node_id, x in enumerate(node_features_all):
299 | graph_id = nodes[node_id]
300 | if graph_id not in node_features:
301 | node_features[graph_id] = [None] * len(graphs[graph_id])
302 |             ind = np.where(graphs[graph_id] == node_id)[0] # find exactly one index
303 | assert len(ind) == 1, ind
304 | assert node_features[graph_id][ind[0]] is None, node_features[graph_id][ind[0]]
305 | node_features[graph_id][ind[0]] = x
306 | node_features_lst = [node_features[graph_id] for graph_id in sorted(list(graphs.keys()))]
307 | return node_features_lst
308 |
309 |
310 | def GetFinalFeatures(args, data):
311 | '''
312 |     Construct features for each graph instance; they may come from 3 parts.
313 |     Each element in 'features' refers to the constructed feature matrix
314 |     of a graph. This feature matrix has shape (Ni, Di), where Ni is the number
315 |     of nodes in graph_i and Di is the combined feature dimension, which may come
316 |     from node labels, node attributes and degrees.
317 | '''
318 |
319 | # In each graph sample, treat node labels (if have) as feature for one graph.
320 | nlabels, n_edges, degrees = [], [], []
321 | for sample_id, adj in enumerate(data['adj_list']):
322 | N = len(adj) # number of nodes
323 | n = np.sum(adj) # total sum of edges
324 |
325 | n_edges.append(int(n / 2)) # undirected edges, so need to divide by 2
326 | degrees.extend(list(np.sum(adj, 1)))
327 | if data['nlabels'] is not None:
328 | nlabels.append(np.array(data['nlabels'][sample_id]))
329 |
330 | # Create features over graphs as one-hot vectors for each node
331 | if data['nlabels'] is not None:
332 | nlabels_all = np.concatenate(nlabels)
333 | nlabels_min = nlabels_all.min()
334 | num_nlabels = int(nlabels_all.max() - nlabels_min + 1) # number of possible values
335 |
336 | final_features = []
337 | max_degree = int(np.max(degrees)) # maximum node degree among all graphs
338 | for sample_id, adj in enumerate(data['adj_list']):
339 | N = adj.shape[0]
340 |
341 |         # OneHot feature: (N, D), where D is the number of possible feature values
342 |         # among nodes within a graph. Each position is 0/1 to show
343 |         # whether the node has a corresponding feature value. E.g., if the
344 |         # original features (also original node labels) range from 3~8,
345 |         # then D = 6 (8-3+1) and feature "3" maps to position "0", even
346 |         # though there may be multiple "3"s in the original feature vector.
347 | 
348 |         # This is done inside one single graph.
349 |
350 |
351 | # part 1: one-hot nlabels as feature
352 | if args.use_nlabel_asfeat:
353 | if data['nlabels'] is not None:
354 | x = data['nlabels'][sample_id]
355 | nlabels_onehot = np.zeros((len(x), num_nlabels))
356 | for node, value in enumerate(x):
357 | if value is not None:
358 | nlabels_onehot[node, value - nlabels_min] = 1
359 | else:
360 | nlabels_onehot = np.empty((N, 0))
361 | else:
362 | nlabels_onehot = np.empty((N, 0))
363 |
364 | # part 2 (optional, not always have): original node features
365 | if args.use_org_node_attr:
366 | if args.dataset in ['COLORS-3', 'TRIANGLES']:
367 | # first column corresponds to node attention and shouldn't be used as node features
368 | feature_attr = np.array(data['attr'][sample_id])[:, 1:]
369 | else:
370 | feature_attr = np.array(data['attr'][sample_id])
371 | else:
372 | feature_attr = np.empty((N, 0))
373 |
374 | # part 3 (optinal): node degree
375 | if args.use_degree_asfeat:
376 | degree_onehot = np.zeros((N, max_degree + 1))
377 | degree_onehot[np.arange(N), np.sum(adj, 1).astype(np.int32)] = 1
378 | else:
379 | degree_onehot = np.empty((N, 0))
380 |
381 | node_features = np.concatenate((nlabels_onehot, feature_attr, degree_onehot), axis=1)
382 | if node_features.shape[1] == 0:
383 | # dummy features for datasets without node labels/attributes
384 | # node degree features can be used instead
385 | node_features = np.ones((N, 1))
386 | final_features.append(node_features)
387 |
388 | return final_features
389 |
390 |
391 | class GraphData(torch.utils.data.Dataset):
392 | def __init__(self, datareader: DataReader, gids: list):
393 | self.idx = gids
394 | self.rnd_state = datareader.rnd_state
395 | self.set_fold(datareader.data)
396 |
397 | def set_fold(self, data):
398 | self.total = len(data['labels'])
399 | self.n_node_max = data['n_node_max']
400 | self.num_classes = data['num_classes']
401 | self.num_features = data['num_features']
402 | self.labels = [data['labels'][i] for i in self.idx]
403 | self.adj_list = [data['adj_list'][i] for i in self.idx]
404 | self.features = [data['features'][i] for i in self.idx]
405 | # print('%s: %d/%d' % (self.split_name.upper(), len(self.labels), len(data['labels'])))
406 |
407 | def __len__(self):
408 | return len(self.labels)
409 |
410 | def __getitem__(self, index):
411 | # convert to torch
412 | return [torch.as_tensor(self.features[index], dtype=torch.float), # node features
413 | torch.as_tensor(self.adj_list[index], dtype=torch.float), # adj matrices
414 | int(self.labels[index])]
415 |
416 |
--------------------------------------------------------------------------------
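A minimal sketch of driving `DataReader` and `GraphData` directly. The argument fields below are exactly the ones referenced inside this file (the full set lives in `config.py`); the values are illustrative, not the repository defaults, and the AIDS files are assumed to sit under `./dataset` as in the repo layout.

```python
from types import SimpleNamespace
from utils.datareader import DataReader, GraphData  # assumes the repo root is on PYTHONPATH

# only the fields referenced in datareader.py; values are illustrative
args = SimpleNamespace(data_path='./dataset', dataset='AIDS', seed=42,
                       use_nlabel_asfeat=True, use_org_node_attr=True,
                       use_degree_asfeat=True, train_ratio=0.8, data_verbose=False)

dr = DataReader(args)
train_set = GraphData(dr, dr.data['splits']['train'])
print(len(train_set), dr.data['num_features'], dr.data['num_classes'])
```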
/utils/graph.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import dgl
3 | import networkx as nx
4 |
5 | def numpy_to_graph(A,type_graph='dgl',node_features=None, to_cuda=True):
6 | '''Convert numpy arrays to graph
7 |
8 | Parameters
9 | ----------
10 | A : mxm array
11 | Adjacency matrix
12 | type_graph : str
13 | 'dgl' or 'nx'
14 | node_features : dict
15 | Optional, dictionary with key=feature name, value=list of size m
16 | Allows user to specify node features
17 |
18 |     Returns
19 |     -------
20 | 
21 | Graph of 'type_graph' specification
22 | '''
23 |
24 | G = nx.from_numpy_array(A)
25 |
26 | if node_features != None:
27 | for n in G.nodes():
28 | for k,v in node_features.items():
29 | G.nodes[n][k] = v[n]
30 |
31 | if type_graph == 'nx':
32 | return G
33 |
34 | G = G.to_directed()
35 |
36 | if node_features != None:
37 | node_attrs = list(node_features.keys())
38 | else:
39 | node_attrs = []
40 |
41 | g = dgl.from_networkx(G, node_attrs=node_attrs, edge_attrs=['weight'])
42 | if to_cuda:
43 | g = g.to(torch.device('cuda'))
44 | return g
--------------------------------------------------------------------------------
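A short usage example for `numpy_to_graph`, kept on CPU so it runs without a GPU (it assumes a DGL version that provides `dgl.from_networkx`, which this helper already relies on); the adjacency and node features are made up:

```python
import numpy as np
from utils.graph import numpy_to_graph  # assumes the repo root is on PYTHONPATH

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
g = numpy_to_graph(A, type_graph='dgl',
                   node_features={'feat': [0.1, 0.2, 0.3]},
                   to_cuda=False)
print(g.number_of_nodes(), g.number_of_edges())  # 3 nodes, 4 directed edges
```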
/utils/mask.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import copy
4 |
5 | def gen_mask(datareader, bkd_gids, bkd_nid_groups):
6 | """
7 | Input a datareader and a list of backdoor candidate nodes (train/test),
8 | generate 2 list of masks (2D) to each of them, for topology and feature,
9 | respectively.
10 |
11 |     Here an adj mask is (N, N) and a feat mask is (N, F), where N is the maximum
12 |     number of nodes among all graphs in a dataset and F is the fixed feat dim value.
13 |
14 |     About how to use the mask: Topo- and Feat-masks are used in the same manner:
15 | (1) After the padding input (N, N/F) pass though its corresponding AdaptNet,
16 | we get a (N, N/F) result for one graph instance.
17 | (2) Simply do element-wise torch.mul with mask and this result, since we
18 | only want to keep mutual information inside of a backdoor pattern.
19 |     (3) After masking out redundant information, remember to remove the additional
20 |         rows/cols, recovering the masked result back to the original dims of the
21 |         corresponding graph instance.
22 | (4) Simply add recovered result with initialized adj / feat matrix.
23 |
24 | About inputs:
25 | - bkd_gids: 1D list
26 |     - bkd_nid_groups: 3D list
27 | """
28 | nodenums = [len(adj) for adj in datareader.data['adj_list']]
29 | N = max(nodenums)
30 | F = np.array(datareader.data['features'][0]).shape[1]
31 | topomask = {}
32 | featmask = {}
33 |
34 | for i in range(len(bkd_gids)):
35 | gid = bkd_gids[i]
36 | groups = bkd_nid_groups[i]
37 | if gid not in topomask: topomask[gid] = torch.zeros(N, N)
38 | if gid not in featmask: featmask[gid] = torch.zeros(N, F)
39 |
40 | for group in groups:
41 | for nid in group:
42 | topomask[gid][nid][group] = 1
43 | topomask[gid][nid][nid] = 0
44 | featmask[gid][nid][::] = 1
45 |
46 | return topomask, featmask
47 |
48 |
49 | def recover_mask(Ni, mask, for_whom):
50 | """
51 | Step3 of the mask usage, recover each masked result back to original:
52 | topomask[gid]: (N, N) --> (Ni, Ni)
53 | featmask[gid]: (N, F) --> (Ni, F)
54 |
55 |     Does not change the original mask.
56 |
57 | About mask:
58 | topomask: contains all topo masks in train/test set, dict.
59 | featmask: contains all feat masks in train/test set, dict.
60 | Return: mask for single graph instance
61 | """
62 | recovermask = copy.deepcopy(mask)
63 |
64 | if for_whom == 'topo':
65 | recovermask = recovermask[:Ni, :Ni]
66 | elif for_whom == 'feat':
67 | recovermask = recovermask[:Ni]
68 |
69 | return recovermask
70 |
--------------------------------------------------------------------------------
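Finally, a toy run of `gen_mask` / `recover_mask` with a two-graph stand-in for the DataReader (all shapes are made up): the generated masks come out padded to the dataset-wide (N, N) / (N, F), and `recover_mask` crops one back to a single graph's size.

```python
import torch
from utils.mask import gen_mask, recover_mask  # assumes the repo root is on PYTHONPATH

class ToyReader:  # minimal stand-in exposing only .data, like DataReader
    data = {'adj_list': [torch.zeros(6, 6), torch.zeros(4, 4)],
            'features': [torch.rand(6, 3), torch.rand(4, 3)]}

bkd_gids = [1]                 # put the trigger in the second (4-node) graph
bkd_nid_groups = [[[0, 1, 2]]] # one trigger group of three nodes

topomask, featmask = gen_mask(ToyReader(), bkd_gids, bkd_nid_groups)
print(topomask[1].shape, featmask[1].shape)        # (6, 6) (6, 3): padded to N_max, F
print(recover_mask(4, topomask[1], 'topo').shape)  # (4, 4) after cropping back
```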