├── docs
    ├── arch.png
    ├── 01_installation.md
    └── 03_reproduce.md
├── data
    ├── pos_enc
    │   ├── README.md
    │   ├── products_node2vec.py
    │   └── snap_patents_node2vec.py
    └── download_data.sh
├── .gitignore
├── README.md
├── codebook.py
├── local_module.py
├── data.py
├── data_utils.py
├── model.py
├── main.py
└── LICENSE
/docs/arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/snap-research/LargeGT/HEAD/docs/arch.png
--------------------------------------------------------------------------------
/data/pos_enc/README.md:
--------------------------------------------------------------------------------
1 | # Positional Encodings using Node2Vec
2 |
3 | This directory contains (optional) files to prepare the pos_enc embedding files for `snap-patents` and `ogbn-products` datasets, using node2vec. For the embeddings for `ogbn-papers100M`, we refer to [OGB](https://github.com/snap-stanford/ogb/tree/master/examples/nodeproppred/papers100M).
4 |
5 | The prepared files can directly be downloaded using [this script](../download_data.sh).
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *.pyo
5 | *.so
6 |
7 | # Virtual environment
8 | venv/
9 | .env
10 |
11 | # Jupyter Notebook files
12 | .ipynb_checkpoints/
13 |
14 | # PyTorch files
15 | *.pt
16 | *.pth
17 |
18 | # Logs
19 | logs/
20 |
21 | # Data
22 | *.csv
23 | *.tsv
24 | *.pkl
25 | *.mat
26 | data/ogbn_papers100M/
27 | data/ogbn_products/
28 | pos_enc/
29 |
30 | # Config files
31 | config.yaml
32 | config.json
33 |
34 | # IDE-specific files (VS Code, PyCharm, etc.)
35 | .vscode/
36 | .idea/
37 |
38 | # Miscellaneous
39 | .DS_Store
40 | *.swp
41 | *.swo
42 | wandb/
43 | run_expt.sh
44 |
--------------------------------------------------------------------------------
/docs/01_installation.md:
--------------------------------------------------------------------------------
1 | # Installation
2 |
3 |
4 |
5 |
6 |
7 | ## 1. Setup Conda
8 |
9 | ```
10 | # Conda installation
11 |
12 | # For Linux
13 | curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
14 |
15 | # For OSX
16 | curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
17 |
18 | chmod +x ~/miniconda.sh
19 | ./miniconda.sh
20 |
21 | source ~/.bashrc # For Linux
22 | source ~/.bash_profile # For OSX
23 | ```
24 |
25 |
26 |
27 |
28 | ## 2. Setup Python environment for GPU (CUDA 11.7)
29 |
30 | ```
31 | # Clone GitHub repo
32 | conda install git
33 | git clone https://github.com/snap-research/LargeGT.git
34 | cd LargeGT
35 |
36 | # Install python environment
37 | conda create -n gt python=3.10
38 | conda activate gt
39 | conda install -y pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
40 |
41 | pip install wandb absl-py tensorboard einops matplotlib progressbar
42 | pip install kmeans-pytorch torchviz fastcluster opentsne networkx pandas ogb kmedoids numba scikit-network
43 | pip install torch_geometric
44 | pip install pyg-lib -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
45 | pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
46 | pip install dgl -f https://data.dgl.ai/wheels/cu117/repo.html
47 |
48 | conda clean --all
49 | ```
--------------------------------------------------------------------------------
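After following docs/01_installation.md, a quick check that the main packages are importable can save a failed run later. The sketch below is only an illustration: the list of import names is an assumption about which modules the installed packages expose, and it uses nothing beyond the standard library.

```python
from importlib.util import find_spec

# Assumed import names for the packages installed in docs/01_installation.md.
REQUIRED_MODULES = ["torch", "torch_geometric", "ogb", "dgl", "einops", "wandb"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for large packages.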
/README.md:
--------------------------------------------------------------------------------
1 | # LargeGT: Graph Transformers for Large Graphs
2 |
3 | Source code for the paper **[Graph Transformers for Large Graphs](https://arxiv.org/abs/2312.11109)**
4 | >by [Vijay Prakash Dwivedi](http://vijaydwivedi.com.np), [Yozen Liu](https://research.snap.com/team/team-member.html#yozen-liu), [Anh Tuan Luu](https://tuanluu.github.io), [Xavier Bresson](https://scholar.google.com/citations?user=9pSK04MAAAAJ&hl=en), [Neil Shah](https://nshah.net) and [Tong Zhao](https://tzhao.io).
5 |
6 | The paper proposes LargeGT, a scalable Graph Transformer framework for large-scale graphs that combines fast neighborhood sampling with a local-global attention mechanism.
7 |
8 |
9 |
10 | ## 1. Installation
11 |
12 | To set up the Python environment with conda, [follow these instructions](./docs/01_installation.md).
13 |
14 | ## 2. Download data
15 |
16 | Download preprocessed data by [running this script](./data/download_data.sh) as:
17 | ```
18 | cd data
19 | bash download_data.sh
20 | ```
21 |
22 | ## 3. Run experiments
23 |
24 | To run an experiment, run the command:
25 |
26 | ```
27 | python main.py --dataset <dataset_name> --sample_node_len <sample_node_len>
28 | ```
29 |
30 | For example:
31 | ```
32 | python main.py --dataset ogbn-products --sample_node_len 100
33 | ```
34 |
35 | To reproduce results, [follow these steps](./docs/03_reproduce.md).
36 |
37 | ## 4. Acknowledgement
38 |
39 | This code repository leverages the open-source codebases released by [GOAT](https://github.com/devnkong/GOAT) and [NAGphormer](https://github.com/JHL-HUST/NAGphormer).
40 |
41 | ## 5. Reference
42 |
43 | :page_with_curl: Paper [on arXiv](https://arxiv.org/abs/2312.11109)
44 |
45 | ```bibtex
46 | @article{dwivedi2023graph,
47 | title={Graph Transformers for Large Graphs},
48 | author={Dwivedi, Vijay Prakash and Liu, Yozen and Luu, Anh Tuan and Bresson, Xavier and Shah, Neil and Zhao, Tong},
49 | journal={arXiv preprint arXiv:2312.11109},
50 | year={2023}
51 | }
52 |
53 | ```
54 |
55 | ## 6. Contact
56 |
57 | Please contact vijaypra001@e.ntu.edu.sg for any questions.
58 |
--------------------------------------------------------------------------------
/data/pos_enc/products_node2vec.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | import torch
4 | from torch_geometric.nn import Node2Vec
5 |
6 | from ogb.nodeproppred import PygNodePropPredDataset
7 |
8 |
9 | def save_embedding(model, dim, dataset_name):
10 |     root = '.'
11 |     torch.save(model.embedding.weight.data.cpu(), f'{root}/{dataset_name}_embedding_{dim}.pt')
12 |
13 |
14 | def main():
15 |     parser = argparse.ArgumentParser(description='OGBN-Products (Node2Vec)')
16 |     parser.add_argument('--device', type=int, default=0)
17 |     parser.add_argument('--embedding_dim', type=int, default=256)
18 |     parser.add_argument('--walk_length', type=int, default=40)
19 |     parser.add_argument('--context_size', type=int, default=20)
20 |     parser.add_argument('--walks_per_node', type=int, default=10)
21 |     parser.add_argument('--batch_size', type=int, default=256)
22 |     parser.add_argument('--lr', type=float, default=0.01)
23 |     parser.add_argument('--epochs', type=int, default=1)
24 |     parser.add_argument('--log_steps', type=int, default=1)
25 |     args = parser.parse_args()
26 |
27 |     device = f'cuda:{args.device}' if torch.cuda.is_available() else 'cpu'
28 |     device = torch.device(device)
29 |
30 |     dataset_name = 'ogbn-products'
31 |     dataset = PygNodePropPredDataset(name='ogbn-products', root='../')
32 |     data = dataset[0]
33 |
34 |     model = Node2Vec(data.edge_index, args.embedding_dim, args.walk_length,
35 |                      args.context_size, args.walks_per_node,
36 |                      sparse=True).to(device)
37 |
38 |     loader = model.loader(batch_size=args.batch_size, shuffle=True,
39 |                           num_workers=4)
40 |     optimizer = torch.optim.SparseAdam(list(model.parameters()), lr=args.lr)
41 |
42 |     model.train()
43 |     for epoch in range(1, args.epochs + 1):
44 |         for i, (pos_rw, neg_rw) in enumerate(loader):
45 |             optimizer.zero_grad()
46 |             loss = model.loss(pos_rw.to(device), neg_rw.to(device))
47 |             loss.backward()
48 |             optimizer.step()
49 |
50 |             if (i + 1) % args.log_steps == 0:
51 |                 print(f'Epoch: {epoch:02d}, Step: {i+1:03d}/{len(loader)}, '
52 |                       f'Loss: {loss:.4f}')
53 |
54 |             if (i + 1) % 100 == 0:  # Save model every 100 steps.
55 |                 save_embedding(model, args.embedding_dim, dataset_name)
56 |         save_embedding(model, args.embedding_dim, dataset_name)
57 |
58 |
59 | if __name__ == "__main__":
60 |     main()
--------------------------------------------------------------------------------
/data/pos_enc/snap_patents_node2vec.py:
--------------------------------------------------------------------------------
1 |
2 | import argparse
3 |
4 | import torch
5 | from torch_geometric.nn import Node2Vec
6 |
7 | from ogb.nodeproppred import PygNodePropPredDataset
8 | import scipy.io
9 |
10 |
11 | def save_embedding(model, dim, dataset_name):
12 |     root = '.'
13 |     torch.save(model.embedding.weight.data.cpu(), f'{root}/{dataset_name}_embedding_{dim}.pt')
14 |
15 |
16 | def main():
17 |     parser = argparse.ArgumentParser(description='SNAP-Patents (Node2Vec)')
18 |     parser.add_argument('--device', type=int, default=0)
19 |     parser.add_argument('--embedding_dim', type=int, default=64)
20 |     parser.add_argument('--walk_length', type=int, default=40)
21 |     parser.add_argument('--context_size', type=int, default=20)
22 |     parser.add_argument('--walks_per_node', type=int, default=10)
23 |     parser.add_argument('--batch_size', type=int, default=256)
24 |     parser.add_argument('--lr', type=float, default=0.01)
25 |     parser.add_argument('--epochs', type=int, default=1)
26 |     parser.add_argument('--log_steps', type=int, default=1)
27 |     args = parser.parse_args()
28 |
29 |     device = f'cuda:{args.device}' if torch.cuda.is_available() else 'cpu'
30 |     device = torch.device(device)
31 |
32 |     dataset_name = 'snap-patents'
33 |
34 |     fulldata = scipy.io.loadmat('../snap_patents.mat')
35 |     edge_index = torch.tensor(fulldata['edge_index'], dtype=torch.long)
36 |     node_feat = torch.tensor(fulldata['node_feat'].todense(), dtype=torch.float)
37 |     num_nodes = int(fulldata['num_nodes'])
38 |
39 |
40 |     model = Node2Vec(edge_index, args.embedding_dim, args.walk_length,
41 |                      args.context_size, args.walks_per_node,
42 |                      sparse=True).to(device)
43 |
44 |     loader = model.loader(batch_size=args.batch_size, shuffle=True,
45 |                           num_workers=4)
46 |     optimizer = torch.optim.SparseAdam(list(model.parameters()), lr=args.lr)
47 |
48 |     model.train()
49 |     for epoch in range(1, args.epochs + 1):
50 |         for i, (pos_rw, neg_rw) in enumerate(loader):
51 |             optimizer.zero_grad()
52 |             loss = model.loss(pos_rw.to(device), neg_rw.to(device))
53 |             loss.backward()
54 |             optimizer.step()
55 |
56 |             if (i + 1) % args.log_steps == 0:
57 |                 print(f'Epoch: {epoch:02d}, Step: {i+1:03d}/{len(loader)}, '
58 |                       f'Loss: {loss:.4f}')
59 |
60 |             if (i + 1) % 1000 == 0:  # Save model every 1000 steps.
61 |                 save_embedding(model, args.embedding_dim, dataset_name)
62 |         save_embedding(model, args.embedding_dim, dataset_name)
63 |
64 |
65 | if __name__ == "__main__":
66 |     main()
--------------------------------------------------------------------------------
/docs/03_reproduce.md:
--------------------------------------------------------------------------------
1 | # Reproducing results
2 |
3 | Run the following commands from the repository root. Note that many of the hyperparameters are not extensively tuned and are adapted from the [GOAT repo](https://github.com/devnkong/GOAT).
4 |
5 | ## 1. `ogbn-products`
6 | ```
7 | python main.py \
8 | --dataset ogbn-products \
9 | --sample_node_len 100 \
10 | --lr 1e-3 \
11 | --batch_size 1024 \
12 | --test_batch_size 256 \
13 | --hidden_dim 256 \
14 | --global_dim 64 \
15 | --num_workers 4 \
16 | --conv_type local \
17 | --num_heads 2 \
18 | --num_centroids 4096
19 |
20 | python main.py \
21 | --dataset ogbn-products \
22 | --sample_node_len 100 \
23 | --lr 1e-3 \
24 | --batch_size 1024 \
25 | --test_batch_size 256 \
26 | --hidden_dim 256 \
27 | --global_dim 64 \
28 | --num_workers 4 \
29 | --conv_type full \
30 | --num_heads 2 \
31 | --num_centroids 4096
32 | ```
33 |
34 | ## 2. `snap-patents`
35 | ```
36 | python main.py \
37 | --dataset snap-patents \
38 | --sample_node_len 50 \
39 | --lr 1e-3 \
40 | --batch_size 2048 \
41 | --test_batch_size 1024 \
42 | --hidden_dim 128 \
43 | --global_dim 64 \
44 | --num_workers 4 \
45 | --conv_type local \
46 | --num_heads 2 \
47 | --num_centroids 4096
48 |
49 | python main.py \
50 | --dataset snap-patents \
51 | --sample_node_len 50 \
52 | --lr 1e-3 \
53 | --batch_size 2048 \
54 | --test_batch_size 1024 \
55 | --hidden_dim 128 \
56 | --global_dim 64 \
57 | --num_workers 4 \
58 | --conv_type full \
59 | --num_heads 2 \
60 | --num_centroids 4096
61 | ```
62 |
63 | ## 3. `ogbn-papers100M`
64 | ```
65 | python main.py \
66 | --dataset ogbn-papers100M \
67 | --sample_node_len 100 \
68 | --lr 1e-3 \
69 | --batch_size 1024 \
70 | --test_batch_size 1024 \
71 | --hidden_dim 512 \
72 | --global_dim 128 \
73 | --num_workers 4 \
74 | --conv_type full \
75 | --num_heads 2 \
76 | --num_centroids 4096
77 | ```
78 |
79 | ## 4. Additional experiments
80 |
81 | ```
82 | sample_node_len_values=(20 40 50 60 80 100 150 200)
83 |
84 | for sample_node_len in "${sample_node_len_values[@]}"; do
85 | echo "Running with sample_node_len = $sample_node_len"
86 |
87 | python main.py \
88 | --dataset ogbn-products \
89 | --sample_node_len $sample_node_len \
90 | --lr 1e-3 \
91 | --batch_size 1024 \
92 | --test_batch_size 256 \
93 | --hidden_dim 256 \
94 | --global_dim 64 \
95 | --num_workers 4 \
96 | --conv_type full \
97 | --num_heads 2 \
98 | --num_centroids 4096
99 |
100 | # sleep 2s
101 | done
102 | ```
103 |
104 | ```
105 | sample_node_len_values=(20 40 50 60 80 100 150 200)
106 |
107 | for sample_node_len in "${sample_node_len_values[@]}"; do
108 | echo "Running with sample_node_len = $sample_node_len"
109 |
110 | python main.py \
111 | --dataset snap-patents \
112 | --sample_node_len $sample_node_len \
113 | --lr 1e-3 \
114 | --batch_size 2048 \
115 | --test_batch_size 1024 \
116 | --hidden_dim 128 \
117 | --global_dim 64 \
118 | --num_workers 4 \
119 | --conv_type full \
120 | --num_heads 2 \
121 | --num_centroids 4096
122 |
123 | # sleep 2s
124 | done
125 | ```
--------------------------------------------------------------------------------
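The `sample_node_len` sweep over `ogbn-products` in docs/03_reproduce.md can also be driven from Python. The following is a minimal sketch, assuming the flags listed in that file are accepted by `main.py` as written; the helper names `build_command` and `build_sweep` are hypothetical and not part of the repository. It only prints the commands (a dry run).

```python
import shlex

# Fixed flags copied from the ogbn-products loop in docs/03_reproduce.md.
BASE_FLAGS = {
    "--dataset": "ogbn-products",
    "--lr": "1e-3",
    "--batch_size": "1024",
    "--test_batch_size": "256",
    "--hidden_dim": "256",
    "--global_dim": "64",
    "--num_workers": "4",
    "--conv_type": "full",
    "--num_heads": "2",
    "--num_centroids": "4096",
}
SAMPLE_NODE_LEN_VALUES = [20, 40, 50, 60, 80, 100, 150, 200]


def build_command(sample_node_len, base_flags=BASE_FLAGS):
    """Build the argument list for one run (hypothetical helper)."""
    cmd = ["python", "main.py"]
    for flag, value in base_flags.items():
        cmd += [flag, value]
    cmd += ["--sample_node_len", str(sample_node_len)]
    return cmd


def build_sweep(values=SAMPLE_NODE_LEN_VALUES):
    """One command per sample_node_len value, in the listed order."""
    return [build_command(v) for v in values]


if __name__ == "__main__":
    for cmd in build_sweep():
        # Dry run: print only. To execute, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Swapping `BASE_FLAGS` for the `snap-patents` values in the same file would give the second sweep.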
/codebook.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import torch
4 | from torch import nn
5 | import torch.nn.functional as F
6 |
7 |
8 | class VectorQuantizerEMA(nn.Module):
9 |     """
10 |     Vector Quantizer with Exponential Moving Average (EMA) for the codebook.
11 |     Adapted from https://github.com/devnkong/GOAT
12 |
13 |     Args:
14 |         num_embeddings (int): The number of embeddings in the codebook.
15 |         embedding_dim (int): The base dimensionality; each codebook row has 2 * embedding_dim entries.
16 |         decay (float, optional): The decay rate for the EMA. Defaults to 0.99.
17 |
18 |     Attributes:
19 |         _embedding_dim (int): The base dimensionality of each embedding.
20 |         _num_embeddings (int): The number of embeddings in the codebook.
21 |         _decay (float): The decay rate for the EMA.
22 |         _embedding (torch.Tensor): The codebook buffer in batch-normalized space.
23 |         _ema_cluster_size (torch.Tensor): The exponential moving average of the cluster sizes.
24 |         _ema_w (torch.Tensor): The exponential moving average of the embedding updates.
25 |     """
26 |
27 |     def __init__(self, num_embeddings, embedding_dim, decay=0.99):
28 |         super(VectorQuantizerEMA, self).__init__()
29 |
30 |         self._embedding_dim = embedding_dim
31 |         self._num_embeddings = num_embeddings
32 |
33 |         self.register_buffer(
34 |             "_embedding", torch.randn(self._num_embeddings, self._embedding_dim * 2)
35 |         )
36 |         self.register_buffer(
37 |             "_embedding_output",
38 |             torch.randn(self._num_embeddings, self._embedding_dim * 2),
39 |         )
40 |         self.register_buffer("_ema_cluster_size", torch.zeros(num_embeddings))
41 |         self.register_buffer(
42 |             "_ema_w", torch.randn(self._num_embeddings, self._embedding_dim * 2)
43 |         )
44 |
45 |         self._decay = decay
46 |         self.bn = torch.nn.BatchNorm1d(self._embedding_dim * 2, affine=False)
47 |
48 |     def get_k(self):
49 |         """
50 |         Returns the key tensor of the embedding matrix.
51 |         """
52 |         return self._embedding_output
53 |
54 |     def get_v(self):
55 |         """
56 |         Returns the value tensor of the embedding matrix.
57 |         """
58 |         return self._embedding_output[:, : self._embedding_dim]
59 |
60 |     def update(self, x):
61 |         inputs_normalized = self.bn(x)
62 |         embedding_normalized = self._embedding
63 |
64 |         # Calculate distances
65 |         distances = (
66 |             torch.sum(inputs_normalized**2, dim=1, keepdim=True)
67 |             + torch.sum(embedding_normalized**2, dim=1)
68 |             - 2 * torch.matmul(inputs_normalized, embedding_normalized.t())
69 |         )
70 |
71 |         # Encoding
72 |         encoding_indices = torch.argmin(distances, dim=1).unsqueeze(1)
73 |         encodings = torch.zeros(
74 |             encoding_indices.shape[0], self._num_embeddings, device=x.device
75 |         )
76 |         encodings.scatter_(1, encoding_indices, 1)
77 |
78 |         # Use EMA to update the embedding vectors
79 |         if self.training:
80 |             self._ema_cluster_size.data = self._ema_cluster_size * self._decay + (
81 |                 1 - self._decay
82 |             ) * torch.sum(encodings, 0)
83 |
84 |             # Laplace smoothing of the cluster size
85 |             n = torch.sum(self._ema_cluster_size.data)
86 |             self._ema_cluster_size.data = (
87 |                 (self._ema_cluster_size + 1e-5) / (n + self._num_embeddings * 1e-5) * n
88 |             )
89 |
90 |             dw = torch.matmul(encodings.t(), inputs_normalized)
91 |             self._ema_w.data = self._ema_w * self._decay + (1 - self._decay) * dw
92 |             self._embedding.data = self._ema_w / self._ema_cluster_size.unsqueeze(1)
93 |
94 |             running_std = torch.sqrt(self.bn.running_var + 1e-5).unsqueeze(dim=0)
95 |             running_mean = self.bn.running_mean.unsqueeze(dim=0)
96 |             self._embedding_output.data = self._embedding * running_std + running_mean
97 |
98 |         return encoding_indices
99 |
--------------------------------------------------------------------------------
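The `update` method in codebook.py above assigns each input to its nearest codebook row and then moves the rows with an exponential moving average. The following is a simplified, dependency-free sketch of that idea on plain Python lists. It is an illustration rather than the repository's implementation: it omits the batch normalization step, and the function names `nearest_code` and `ema_update` are hypothetical.

```python
def nearest_code(x, codebook):
    """Index of the codebook row closest to x in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def ema_update(codebook, ema_size, ema_sum, batch, decay=0.99, eps=1e-5):
    """One EMA step. Mutates codebook, ema_size and ema_sum; returns assignments."""
    num_codes, dim = len(codebook), len(codebook[0])
    assign = [nearest_code(x, codebook) for x in batch]

    # Per-code counts and per-code sums of the inputs assigned in this batch.
    counts = [0.0] * num_codes
    sums = [[0.0] * dim for _ in range(num_codes)]
    for x, k in zip(batch, assign):
        counts[k] += 1.0
        for d in range(dim):
            sums[k][d] += x[d]

    # EMA of cluster sizes, then Laplace smoothing so that no size is zero.
    for k in range(num_codes):
        ema_size[k] = decay * ema_size[k] + (1 - decay) * counts[k]
    n = sum(ema_size)
    for k in range(num_codes):
        ema_size[k] = (ema_size[k] + eps) / (n + num_codes * eps) * n

    # EMA of summed inputs; each code becomes the ratio of the two averages.
    for k in range(num_codes):
        for d in range(dim):
            ema_sum[k][d] = decay * ema_sum[k][d] + (1 - decay) * sums[k][d]
            codebook[k][d] = ema_sum[k][d] / ema_size[k]
    return assign
```

The smoothing step rescales the sizes so their total is unchanged, which keeps the division in the last loop well defined even for codes that received no inputs.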
/data/download_data.sh:
--------------------------------------------------------------------------------
1 | # The ogbn-products and ogbn-papers100M datasets are downloaded by the ogb package
2 |
3 | ## 1. Mandatory download files (the code does not prepare these automatically)
4 |
5 | # snap-patents dataset
6 | curl "https://www.dropbox.com/scl/fi/upsn08zx20nsxcyhfwc8d/snap_patents.mat?rlkey=4efpj1sg9s2fe5gjf7755tg81&dl=1" -o snap_patents.mat -J -L -k
7 |
8 | # pos_enc pre-processed files
9 | curl "https://www.dropbox.com/scl/fi/k3pabro7veyufburmmqkn/ogbn-products_embedding_64.pt?rlkey=ac84eb75nj741ueic7l69pbab&dl=1" -o ogbn-products_embedding_64.pt -J -L -k
10 | curl "https://www.dropbox.com/scl/fi/ig4t4kbk2454gteuel1xg/snap-patents_embedding_64.pt?rlkey=gduuevqbd1qsnrlat07mx7zka&dl=1" -o snap-patents_embedding_64.pt -J -L -k
11 | curl "https://www.dropbox.com/scl/fi/361vcdkk459kpj19tsqz2/ogbn-papers100M_data_dict.pt?rlkey=0ptndnsybxoptxfvxia9ncylf&dl=1" -o ogbn-papers100M_data_dict.pt -J -L -k
12 |
13 | ## 2. Optional download files (prepared automatically by the code if missing) | For main experiments
14 | curl "https://www.dropbox.com/scl/fi/puy19h1mx6bqy0k7g21sv/snap-patents_new_tokenizer_duplicates_50sample_node_len_2hop.pt?rlkey=tudi1p32t273xowr2uxpt5rr0&dl=1" -o snap-patents_sample_node_len_50.pt -J -L -k
15 | curl "https://www.dropbox.com/scl/fi/c8vy6lumnge89krv0wsv0/ogbn-products_new_tokenizer_duplicates_100sample_node_len_2hop.pt?rlkey=sflq2gs6bs1n6qydm8fab1upm&dl=1" -o ogbn-products_sample_node_len_100.pt -J -L -k
16 | curl "https://www.dropbox.com/scl/fi/s89fe40rpubz4wci5bhq0/ogbn-papers100M_new_tokenizer_duplicates_100sample_node_len_2hop.pt?rlkey=rak2qnbft7hacgsabapbxlsrn&dl=1" -o ogbn-papers100M_sample_node_len_100.pt -J -L -k
17 | curl "https://www.dropbox.com/scl/fi/bj8oi70urszrk5ngophr8/ogbn-papers100M_new_tokenizer_duplicates_100sample_node_len_2hop_hop2token_feats.pt?rlkey=pifpa9qok19yn5ya7m19jz5rk&dl=1" -o ogbn-papers100M_sample_node_len_100_hop2token_feats.pt -J -L -k
18 |
19 | ## 3. Optional download files (prepared automatically by the code if missing) | For ablation studies
20 |
21 | # curl "https://www.dropbox.com/scl/fi/zvc8zh983pu8j60lxjw58/snap-patents_new_tokenizer_duplicates_20sample_node_len_2hop.pt?rlkey=r4gd4l28yd2k0f5y066pho0p1&dl=1" -o snap-patents_sample_node_len_20.pt -J -L -k
22 | # curl "https://www.dropbox.com/scl/fi/7c2tb1lopwk4ctnfliffc/snap-patents_new_tokenizer_duplicates_40sample_node_len_2hop.pt?rlkey=54jlqn4fian5sstvi1w76z65q&dl=1" -o snap-patents_sample_node_len_40.pt -J -L -k
23 | # curl "https://www.dropbox.com/scl/fi/17cg49vuc0q0k3lv9o64e/snap-patents_new_tokenizer_duplicates_60sample_node_len_2hop.pt?rlkey=kdcom0hgde1davlsbehentiet&dl=1" -o snap-patents_sample_node_len_60.pt -J -L -k
24 | # curl "https://www.dropbox.com/scl/fi/78ipdkdvb8bh4bxfauz08/snap-patents_new_tokenizer_duplicates_80sample_node_len_2hop.pt?rlkey=6vip7gi8nlnen9uyiwzme376k&dl=1" -o snap-patents_sample_node_len_80.pt -J -L -k
25 | # curl "https://www.dropbox.com/scl/fi/hi99mql9cy8aryq5uzvyt/snap-patents_new_tokenizer_duplicates_100sample_node_len_2hop.pt?rlkey=tqniki8g9nn3ob5ylmu8njy41&dl=1" -o snap-patents_sample_node_len_100.pt -J -L -k
26 | # curl "https://www.dropbox.com/scl/fi/fom1bx36meib7o1kh5uhn/snap-patents_new_tokenizer_duplicates_150sample_node_len_2hop.pt?rlkey=8bqis8xth3frqyg3zqfsjjn6r&dl=1" -o snap-patents_sample_node_len_150.pt -J -L -k
27 | # curl "https://www.dropbox.com/scl/fi/5j33m1tu9jpj6jba585lf/snap-patents_new_tokenizer_duplicates_200sample_node_len_2hop.pt?rlkey=unr06j8jbc9m7i3omr4j8il3t&dl=1" -o snap-patents_sample_node_len_200.pt -J -L -k
28 |
29 | # curl "https://www.dropbox.com/scl/fi/ux3j7jvopnmk1m5jy4e0l/ogbn-products_new_tokenizer_duplicates_20sample_node_len_2hop.pt?rlkey=9c80gtuh54n6alcj71crvzol5&dl=1" -o ogbn-products_sample_node_len_20.pt -J -L -k
30 | # curl "https://www.dropbox.com/scl/fi/lvvxs2l6td6bxc55kn71j/ogbn-products_new_tokenizer_duplicates_40sample_node_len_2hop.pt?rlkey=bx03569smrhth98rhpjhrfy55&dl=1" -o ogbn-products_sample_node_len_40.pt -J -L -k
31 | # curl "https://www.dropbox.com/scl/fi/1avmqsyb0o5xtmshazwqi/ogbn-products_new_tokenizer_duplicates_60sample_node_len_2hop.pt?rlkey=mvjgchtnyp8bazdxmvznjaw0l&dl=1" -o ogbn-products_sample_node_len_60.pt -J -L -k
32 | # curl "https://www.dropbox.com/scl/fi/zxjnwe8g5uemhh04jngfh/ogbn-products_new_tokenizer_duplicates_80sample_node_len_2hop.pt?rlkey=7ostgb5tuwscdeawzv9wfwxfu&dl=1" -o ogbn-products_sample_node_len_80.pt -J -L -k
33 | # curl "https://www.dropbox.com/scl/fi/heuiotzx3kector1nkf2a/ogbn-products_new_tokenizer_duplicates_150sample_node_len_2hop.pt?rlkey=p5tkv7xyxdu0rfp8as25nes9h&dl=1" -o ogbn-products_sample_node_len_150.pt -J -L -k
34 | # curl "https://www.dropbox.com/scl/fi/e88i28lvgr61or9m5erkq/ogbn-products_new_tokenizer_duplicates_200sample_node_len_2hop.pt?rlkey=lxa3i3we7gklw07ie0xef2z0o&dl=1" -o ogbn-products_sample_node_len_200.pt -J -L -k
35 |
36 | # curl "https://www.dropbox.com/scl/fi/ei93gn3z6nwtxo7dmyj06/ogbn-papers100M_new_tokenizer_duplicates_50sample_node_len_2hop.pt?rlkey=0oa42z2lxe2qgq0jceb6dtpw1&dl=1" -o ogbn-papers100M_sample_node_len_50.pt -J -L -k
--------------------------------------------------------------------------------
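The `-o` targets in data/download_data.sh follow a `<dataset>_sample_node_len_<N>.pt` pattern, and data.py looks for `data/<name>.pt` and recovers the dataset with `name.split("_")[0]`. A small sketch of that naming convention is given below. How `main.py` builds `name` is not shown here, so treating `name` as `<dataset>_sample_node_len_<N>` is an assumption, and all three helper names are hypothetical.

```python
from pathlib import PurePosixPath


def token_file_name(dataset, sample_node_len):
    """File name pattern used by the -o targets in download_data.sh."""
    return f"{dataset}_sample_node_len_{sample_node_len}.pt"


def dataset_from_name(name):
    """Mirror of name.split("_")[0] in data.py; dataset names contain no underscore."""
    return name.split("_")[0]


def token_file_path(dataset, sample_node_len, data_dir="data"):
    """Relative path that data.py would check, assuming name follows the pattern above."""
    return str(PurePosixPath(data_dir) / token_file_name(dataset, sample_node_len))
```

For example, `token_file_path("ogbn-products", 100)` gives the relative path of the file that the second section of the script saves as `ogbn-products_sample_node_len_100.pt`.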
/local_module.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import math
3 | import torch.nn as nn
4 | import numpy as np
5 | import torch.nn.functional as F
6 |
7 |
8 | def init_params(module, n_layers):
9 |     if isinstance(module, nn.Linear):
10 |         module.weight.data.normal_(mean=0.0, std=0.02 / math.sqrt(n_layers))
11 |         if module.bias is not None:
12 |             module.bias.data.zero_()
13 |     if isinstance(module, nn.Embedding):
14 |         module.weight.data.normal_(mean=0.0, std=0.02)
15 |
16 |
17 | class LocalModule(nn.Module):
18 |     def __init__(
19 |         self,
20 |         seq_len,
21 |         input_dim,
22 |         node_only_readout=False,
23 |         n_layers=1,
24 |         num_heads=8,
25 |         hidden_dim=64,
26 |         dropout_rate=0.3,
27 |         attention_dropout_rate=0,
28 |     ):
29 |         super().__init__()
30 |
31 |         self.seq_len = seq_len
32 |         self.node_only_readout = node_only_readout
33 |         self.input_dim = input_dim
34 |         self.hidden_dim = hidden_dim
35 |         self.ffn_dim = 2 * hidden_dim
36 |         self.num_heads = num_heads
37 |
38 |         self.n_layers = n_layers
39 |
40 |         self.dropout_rate = dropout_rate
41 |         self.attention_dropout_rate = attention_dropout_rate
42 |
43 |         self.att_embeddings_nope = nn.Linear(self.input_dim, self.hidden_dim)
44 |
45 |         encoders = [
46 |             EncoderLayer(
47 |                 self.hidden_dim,
48 |                 self.ffn_dim,
49 |                 self.dropout_rate,
50 |                 self.attention_dropout_rate,
51 |                 self.num_heads,
52 |             )
53 |             for _ in range(self.n_layers)
54 |         ]
55 |         self.layers = nn.ModuleList(encoders)
56 |         self.final_ln = nn.LayerNorm(hidden_dim * num_heads)
57 |
58 |         self.out_proj = nn.Linear(self.ffn_dim, int(self.ffn_dim / 2))
59 |         self.attn_layer = nn.Linear(2 * self.hidden_dim * num_heads, 1)
60 |
61 |         self.apply(lambda module: init_params(module, n_layers=n_layers))
62 |
63 |     def forward(self, batched_data):
64 |         tensor = self.att_embeddings_nope(batched_data)
65 |
66 |         # transformer encoder
67 |         for enc_layer in self.layers:
68 |             tensor = enc_layer(tensor)
69 |
70 |         output = self.final_ln(tensor)
71 |
72 |         _target = output[:, 0, :].unsqueeze(1).repeat(1, self.seq_len - 1, 1)
73 |         split_tensor = torch.split(output, [1, self.seq_len - 1], dim=1)
74 |
75 |         node_tensor = split_tensor[0]
76 |         _neighbor_tensor = split_tensor[1]
77 |
78 |         if self.node_only_readout:
79 |             # only slicing the indices that belong to nodes and not the 1-hop and 2-hop feats
80 |             indices = torch.arange(3, self.seq_len, 3)
81 |             neighbor_tensor = _neighbor_tensor[:, indices]
82 |             target = _target[:, indices]
83 |         else:
84 |             target = _target
85 |             neighbor_tensor = _neighbor_tensor
86 |
87 |         layer_atten = self.attn_layer(torch.cat((target, neighbor_tensor), dim=2))
88 |         layer_atten = F.softmax(layer_atten, dim=1)
89 |
90 |         neighbor_tensor = neighbor_tensor * layer_atten
91 |         neighbor_tensor = torch.sum(neighbor_tensor, dim=1, keepdim=True)
92 |
93 |         output = (node_tensor + neighbor_tensor).squeeze()
94 |
95 |         return output
96 |
97 |
98 | class FeedForwardNetwork(nn.Module):
99 |     def __init__(self, hidden_size, ffn_size, dropout_rate):
100 |         super(FeedForwardNetwork, self).__init__()
101 |
102 |         self.layer1 = nn.Linear(hidden_size, ffn_size)
103 |         self.gelu = nn.GELU()
104 |         self.layer2 = nn.Linear(ffn_size, hidden_size)
105 |
106 |     def forward(self, x):
107 |         x = self.layer1(x)
108 |         x = self.gelu(x)
109 |         x = self.layer2(x)
110 |         return x
111 |
112 |
113 | class MultiHeadAttention(nn.Module):
114 |     def __init__(self, hidden_size, attention_dropout_rate, num_heads):
115 |         super(MultiHeadAttention, self).__init__()
116 |
117 |         self.num_heads = num_heads
118 |
119 |         self.att_size = att_size = hidden_size  # // num_heads
120 |         self.scale = att_size**-0.5
121 |
122 |         self.linear_q = nn.Linear(hidden_size, num_heads * att_size)
123 |         self.linear_k = nn.Linear(hidden_size, num_heads * att_size)
124 |         self.linear_v = nn.Linear(hidden_size, num_heads * att_size)
125 |         self.att_dropout = nn.Dropout(attention_dropout_rate)
126 |
127 |         self.output_layer = nn.Linear(num_heads * att_size, num_heads * att_size)
128 |
129 |     def forward(self, q, k, v, attn_bias=None):
130 |         d_k = self.att_size
131 |         d_v = self.att_size
132 |         batch_size = q.size(0)
133 |
134 |         q = self.linear_q(q).view(batch_size, -1, self.num_heads, d_k)
135 |         k = self.linear_k(k).view(batch_size, -1, self.num_heads, d_k)
136 |         v = self.linear_v(v).view(batch_size, -1, self.num_heads, d_v)
137 |
138 |         q = q.transpose(1, 2)  # [b, h, q_len, d_k]
139 |         v = v.transpose(1, 2)  # [b, h, v_len, d_v]
140 |         k = k.transpose(1, 2).transpose(2, 3)  # [b, h, d_k, k_len]
141 |
142 |         q = q * self.scale
143 |         x = torch.matmul(q, k)  # [b, h, q_len, k_len]
144 |         if attn_bias is not None:
145 |             x = x + attn_bias
146 |
147 |         x = torch.softmax(x, dim=3)
148 |         x = self.att_dropout(x)
149 |         x = x.matmul(v)  # [b, h, q_len, attn]
150 |
151 |         x = x.transpose(1, 2).contiguous()  # [b, q_len, h, attn]
152 |         x = x.view(batch_size, -1, self.num_heads * d_v)
153 |
154 |         x = self.output_layer(x)
155 |
156 |         return x
157 |
158 |
159 | class EncoderLayer(nn.Module):
160 |     def __init__(
161 |         self, hidden_size, ffn_size, dropout_rate, attention_dropout_rate, num_heads
162 |     ):
163 |         super(EncoderLayer, self).__init__()
164 |
165 |         self.self_attention_norm = nn.LayerNorm(hidden_size)
166 |         self.self_attention = MultiHeadAttention(
167 |             hidden_size, attention_dropout_rate, num_heads
168 |         )
169 |         self.self_attention_dropout = nn.Dropout(dropout_rate)
170 |         self.res_proj = nn.Linear(hidden_size, hidden_size * num_heads)
171 |         self.ffn_dropout = nn.Dropout(dropout_rate)
172 |
173 |     def forward(self, x, attn_bias=None):
174 |         y = self.self_attention_norm(x)
175 |         y = self.self_attention(y, y, y, attn_bias)
176 |         y = self.self_attention_dropout(y)
177 |         x = self.res_proj(x) + y
178 |
179 |         x = self.ffn_dropout(x)
180 |         return x
181 |
--------------------------------------------------------------------------------
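The readout at the end of `LocalModule.forward` in local_module.py above scores each neighbor token against the target token, normalizes the scores with a softmax over the neighbors, and adds the weighted neighbor sum to the target token. The sketch below reproduces that step for a single sequence on plain Python lists. It is an illustration only: `attention_readout` is a hypothetical name, and `weight` and `bias` stand in for the parameters of the learned `attn_layer`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(tokens, weight, bias=0.0):
    """Pool one token sequence into a single vector.

    tokens: list of seq_len vectors; tokens[0] is the target node.
    weight: list of length 2 * dim that scores the concatenation [target, neighbor].
    """
    target, neighbors = tokens[0], tokens[1:]
    # One scalar score per neighbor, from the concatenated pair.
    scores = [
        sum(w * v for w, v in zip(weight, target + nb)) + bias
        for nb in neighbors
    ]
    alphas = softmax(scores)
    dim = len(target)
    # Attention-weighted sum of the neighbor tokens.
    pooled = [sum(a * nb[d] for a, nb in zip(alphas, neighbors)) for d in range(dim)]
    # Residual-style combination with the target token.
    return [t + p for t, p in zip(target, pooled)]
```

With an all-zero `weight`, every neighbor receives the same attention, so the pooled term reduces to the mean of the neighbor tokens.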
/data.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | import numpy as np
4 | import os
5 | import os.path
6 | import time
7 |
8 | from data_utils import (
9 |     get_ogbn_products_with_splits,
10 |     get_snap_patents_with_splits,
11 |     get_ogbn_papers100M_with_splits,
12 |     get_data_pt_file,
13 | )
14 |
15 |
16 | def rand_train_test_idx(label, train_prop=0.5, valid_prop=0.25, ignore_negative=True):
17 |     """randomly splits label into train/valid/test splits"""
18 |     if ignore_negative:
19 |         labeled_nodes = torch.where(label != -1)[0]
20 |     else:
21 |         labeled_nodes = label
22 |
23 |     n = labeled_nodes.shape[0]
24 |     train_num = int(n * train_prop)
25 |     valid_num = int(n * valid_prop)
26 |
27 |     perm = torch.as_tensor(np.random.permutation(n))
28 |
29 |     train_indices = perm[:train_num]
30 |     val_indices = perm[train_num : train_num + valid_num]
31 |     test_indices = perm[train_num + valid_num :]
32 |
33 |     if not ignore_negative:
34 |         return train_indices, val_indices, test_indices
35 |
36 |     train_idx = labeled_nodes[train_indices]
37 |     valid_idx = labeled_nodes[val_indices]
38 |     test_idx = labeled_nodes[test_indices]
39 |
40 |     return train_idx, valid_idx, test_idx
41 |
42 |
43 | def even_quantile_labels(vals, nclasses=5, verbose=True):
44 |     """partitions vals into nclasses by a quantile based split,
45 |     where the first class is less than the 1/nclasses quantile,
46 |     second class is less than the 2/nclasses quantile, and so on
47 |
48 |     vals is np array
49 |     returns an np array of int class labels
50 |     """
51 |     label = -1 * np.ones(vals.shape[0], dtype=int)
52 |     interval_lst = []
53 |     lower = -np.inf
54 |     for k in range(nclasses - 1):
55 |         upper = np.nanquantile(vals, (k + 1) / nclasses)
56 |         interval_lst.append((lower, upper))
57 |         inds = (vals >= lower) * (vals < upper)
58 |         label[inds] = k
59 |         lower = upper
60 |     label[vals >= lower] = nclasses - 1
61 |     interval_lst.append((lower, np.inf))
62 |     if verbose:
63 |         print("Class Label Intervals:")
64 |         for class_idx, interval in enumerate(interval_lst):
65 |             print(f"Class {class_idx}: [{interval[0]}, {interval[1]})")
66 |     return label
67 |
68 |
69 | def get_dataset_with_splits(dataset):
70 |     if dataset == "ogbn-products":
71 |         (
72 |             adj,
73 |             features,
74 |             labels,
75 |             idx_train,
76 |             idx_val,
77 |             idx_test,
78 |         ) = get_ogbn_products_with_splits()
79 |     elif dataset == "snap-patents":
80 |         (
81 |             adj,
82 |             features,
83 |             labels,
84 |             idx_train,
85 |             idx_val,
86 |             idx_test,
87 |         ) = get_snap_patents_with_splits()
88 |     elif dataset == "ogbn-papers100M":
89 |         (
90 |             adj,
91 |             features,
92 |             labels,
93 |             idx_train,
94 |             idx_val,
95 |             idx_test,
96 |         ) = get_ogbn_papers100M_with_splits()
97 |
98 |     return (adj, features, labels, idx_train, idx_val, idx_test)
99 |
100 |
101 | class LargeGTTokens(torch.utils.data.Dataset):
102 |     """
103 |     Prepares the input data used by the local module of LargeGT,
104 |     or loads it from a file if it has already been processed. The
105 |     class also provides the collate_new function for the dataloader,
106 |     which returns the input tokens for a mini-batch of nodes.
107 |
108 |     Args:
109 |         name (str): The name of the dataset.
110 |         sample_node_len (int, optional): Length of the sequence built for
111 |             each node: the node itself plus sampled 1-hop and 2-hop neighbors.
112 |         seed (int, optional): Not used by the constructor.
113 |     """
114 |
115 | def __init__(self, name, sample_node_len=50, seed=0):
116 | super(LargeGTTokens, self).__init__()
117 |
118 | start = time.time()
119 | print("[I] Loading dataset %s..." % (name))
120 | self.name = name
121 | self.sample_node_len = sample_node_len
122 |
123 | file_path = "data/" + self.name + ".pt"
124 |
125 | if os.path.exists(file_path):
126 | print("processed data file exists, loading...")
127 | data_list = torch.load(file_path)
128 | else:
129 |             print("processed data file does not exist, preparing...")
130 | data_list = get_data_pt_file(
131 | name, get_dataset_with_splits(name.split("_")[0]), self.sample_node_len
132 | )
133 |
134 | try:
135 | self.nodes_in_seq = torch.tensor(data_list[0])
136 | self.X = torch.tensor(data_list[1], dtype=torch.float32)
137 | self.hop2token_feats = torch.tensor(data_list[2], dtype=torch.float32)
138 | except KeyError:
139 | # for ogbn-papers100M with sample_node_len=100, the total data is too large
140 | # hence we save the data in multiple files
141 | self.nodes_in_seq = torch.tensor(data_list["nodes_in_seq"])
142 | self.X = torch.tensor(data_list["node_feat"])
143 | self.hop2token_feats = torch.load(
144 | file_path.replace(".pt", "_hop2token_feats.pt")
145 | )
146 |
147 | self.y = torch.tensor(data_list["label"])
148 | self.split_idx = data_list["split_idx"]
149 |
150 | self.token_len = self.nodes_in_seq.shape[1]
151 | self.input_dim = self.X.shape[-1]
152 |
153 | self.data_token_len = self.token_len * 3
154 |
155 | del data_list
156 | print("[I] Data load time: {:.4f}s".format(time.time() - start))
157 |
158 | def collate(self, samples, original_X=None):
159 | return self.collate_new(samples, original_X)
160 |
161 | def collate_new_slow(self, batch):
162 | """
163 | The function implements the Algorithm InputTokens in the paper.
164 | Slow version -- not used in the main code.
165 | """
166 | mini_batch_size = len(batch)
167 |
168 | seq = torch.empty(mini_batch_size, self.token_len * 3, self.input_dim)
169 |
170 | for i, node in enumerate(batch):
171 | j = 0
172 | for sampled_node in self.nodes_in_seq[node]:
173 | seq[i, j] = self.X[sampled_node]
174 | seq[i, j + 1] = self.hop2token_feats[sampled_node, 0]
175 | seq[i, j + 2] = self.hop2token_feats[sampled_node, 1]
176 | j += 3
177 |
178 | return seq, torch.tensor(batch)
179 |
180 | def collate_new(self, batch, original_X):
181 | """
182 | The function implements the Algorithm InputTokens in the paper.
183 | Efficient version -- used in the main code.
184 | """
185 | mini_batch_size = len(batch)
186 | seq = torch.empty(mini_batch_size, self.token_len * 3, self.input_dim)
187 |
188 | sampled_nodes = torch.stack([self.nodes_in_seq[node] for node in batch])
189 |
190 | i, j = torch.meshgrid(
191 | torch.arange(mini_batch_size), torch.arange(self.token_len), indexing="ij"
192 | )
193 |
194 | i = i.flatten()
195 | j = j.flatten() * 3
196 | sampled_nodes = sampled_nodes.flatten()
197 |
198 | if original_X is not None:
199 | seq[i, j] = original_X[sampled_nodes]
200 | else:
201 | seq[i, j] = self.X[sampled_nodes]
202 | seq[i, j + 1] = self.hop2token_feats[sampled_nodes, 0]
203 | seq[i, j + 2] = self.hop2token_feats[sampled_nodes, 1]
204 |
205 | return seq, torch.tensor(batch)
206 |
--------------------------------------------------------------------------------
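The two collate functions in `data.py` above lay out three consecutive tokens per sampled node: the raw feature, the 1-hop feature, and the 2-hop feature. The following is a minimal sketch of that layout, assuming plain Python lists in place of the tensors; the helper names `build_tokens_loop` and `build_tokens_indexed` and all toy values are hypothetical and not part of the repository.

```python
# Illustrative sketch only: plain lists stand in for the tensors used by
# LargeGTTokens, and every value below is made up.


def build_tokens_loop(batch, nodes_in_seq, X, hop_feats):
    """Append tokens node by node, mirroring collate_new_slow."""
    out = []
    for node in batch:
        seq = []
        for sampled in nodes_in_seq[node]:
            seq.append(X[sampled])             # raw node feature
            seq.append(hop_feats[sampled][0])  # 1-hop propagated feature
            seq.append(hop_feats[sampled][1])  # 2-hop propagated feature
        out.append(seq)
    return out


def build_tokens_indexed(batch, nodes_in_seq, X, hop_feats):
    """Fill positions (i, 3*j + offset), mirroring the layout of collate_new."""
    token_len = len(nodes_in_seq[batch[0]])
    out = [[None] * (3 * token_len) for _ in batch]
    for i, node in enumerate(batch):
        for j, sampled in enumerate(nodes_in_seq[node]):
            out[i][3 * j] = X[sampled]
            out[i][3 * j + 1] = hop_feats[sampled][0]
            out[i][3 * j + 2] = hop_feats[sampled][1]
    return out


# Toy data: 3 nodes, 1-dimensional features, sequences of length 2.
X = [[0.0], [1.0], [2.0]]
hop_feats = [[[0.1], [0.2]], [[1.1], [1.2]], [[2.1], [2.2]]]
nodes_in_seq = [[0, 1], [1, 2], [2, 0]]
batch = [0, 2]

tokens = build_tokens_indexed(batch, nodes_in_seq, X, hop_feats)
```

Each row of `tokens` has `3 * token_len` entries, which matches the `self.token_len * 3` sequence dimension allocated in both collate functions.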
/data_utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from tqdm import tqdm
3 | import random
4 | import time
5 | import numpy as np
6 | import scipy.sparse as sp
7 | from multiprocessing import Pool, cpu_count
8 |
9 | from ogb.nodeproppred import DglNodePropPredDataset
10 |
11 | # global vars used in data sampling steps in create_node_ids()
12 | all_1hop_indices = None
13 | all_2hop_indices = None
14 | all_nodes_set = None
15 | seq_length = None
16 |
17 |
18 | def get_ogbn_products_with_splits():
19 | dataset = DglNodePropPredDataset(name="ogbn-products")
20 |
21 | g = dataset[0][0]
22 | split_idx = dataset.get_idx_split()
23 | adj = g.adj_external(scipy_fmt="csr")
24 | features = g.ndata["feat"]
25 | labels = torch.nn.functional.one_hot(
26 | dataset[0][1].view(-1), dataset[0][1].max() + 1
27 | )
28 | idx_train = split_idx["train"]
29 | idx_val = split_idx["valid"]
30 | idx_test = split_idx["test"]
31 |
32 | return adj, features, labels, idx_train, idx_val, idx_test
33 |
34 |
35 | def get_snap_patents_with_splits():
36 |     import scipy.io  # loadmat lives in the scipy.io submodule
37 |
38 | fulldata = scipy.io.loadmat(f"data/snap_patents.mat")
39 | edge_index = torch.tensor(fulldata["edge_index"], dtype=torch.long)
40 |
41 | num_nodes = int(fulldata["num_nodes"])
42 | features = torch.tensor(fulldata["node_feat"].todense(), dtype=torch.float)
43 |
44 | adj = sp.csr_matrix(
45 | (torch.ones(edge_index.shape[1]), edge_index),
46 | shape=(num_nodes, num_nodes),
47 | dtype=np.int64,
48 | )
49 |
50 | labels, idx_train, idx_val, idx_test = (
51 | torch.rand(1),
52 | torch.rand(1),
53 | torch.rand(1),
54 | torch.rand(1),
55 | )
56 | return adj, features, labels, idx_train, idx_val, idx_test
57 |
58 |
59 | def get_ogbn_papers100M_with_splits():
60 | dataset = DglNodePropPredDataset(name="ogbn-papers100M")
61 |
62 | g = dataset[0][0]
63 | split_idx = dataset.get_idx_split()
64 | adj = g.adj_external(scipy_fmt="csr")
65 | features = g.ndata["feat"]
66 |
67 | labels = torch.rand(1) # dummy value as it is not required for GOAT2 code
68 | idx_train = split_idx["train"]
69 | idx_val = split_idx["valid"]
70 | idx_test = split_idx["test"]
71 |
72 | return adj, features, labels, idx_train, idx_val, idx_test
73 |
74 |
75 | def sparse_mx_to_torch_sparse_tensor(sparse_mx):
76 | """Convert a scipy sparse matrix to a torch sparse tensor."""
77 | sparse_mx = sparse_mx.tocoo().astype(np.float32)
78 | indices = torch.from_numpy(
79 | np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64)
80 | )
81 | values = torch.from_numpy(sparse_mx.data)
82 | shape = torch.Size(sparse_mx.shape)
83 |     return torch.sparse_coo_tensor(indices, values, shape)
84 |
85 |
86 | # for one node; to be executed in parallel
87 | def get_node_ids_for_all_seq(i):
88 | """Returns node ids for all seqs, sampled from 1/2 hop neighborhood"""
89 | # i, hop1indices, hop2indices, all_nodes_set, seq_length = args
90 | global all_1hop_indices, all_2hop_indices, all_nodes_set, seq_length
91 |
92 | hop1_neighbors = all_1hop_indices[i].tolist()
93 | hop2_neighbors = all_2hop_indices[i].tolist()
94 |
95 | all_applicable_neighbors = hop1_neighbors + hop2_neighbors
96 |
97 | if len(all_applicable_neighbors) < seq_length - 1:
98 | if len(all_applicable_neighbors) == 0:
99 | sampled_neighbors = random.sample(all_nodes_set, seq_length - 1)
100 | else:
101 | repeat_count = ((seq_length - 1) // len(all_applicable_neighbors)) + 1
102 | repeated_neighbor_list = all_applicable_neighbors * repeat_count
103 | sampled_neighbors = repeated_neighbor_list[: seq_length - 1]
104 | else:
105 |         # for cases where the node has >= seq_length - 1 hop1 and hop2 neighbors
106 | sampled_neighbors = random.sample(all_applicable_neighbors, seq_length - 1)
107 |
108 | return [i] + sampled_neighbors
109 |
110 |
111 | def create_node_ids(X, adj_matrix, for_nagphormer=False, sample_node_len=0):
112 | global all_1hop_indices, all_2hop_indices, all_nodes_set, seq_length
113 |
114 | N = X.size(0)
115 | all_nodes_set = list(set(range(N)))
116 | seq_length = sample_node_len
117 |
118 | print("multiset length for sampling: ", seq_length)
119 |
120 | t0 = time.time()
121 | print("multiplying csr matrix to itself...")
122 |
123 | adj_matrix_2hop = adj_matrix @ adj_matrix
124 | print("Done", time.time() - t0)
125 | print("getting all 1 hop indices...")
126 | all_1hop_indices = np.split(adj_matrix.indices, adj_matrix.indptr)[1:-1]
127 | print("Done\ngetting all 2 hop indices...")
128 | all_2hop_indices = np.split(adj_matrix_2hop.indices, adj_matrix_2hop.indptr)[1:-1]
129 | print("Done!")
130 |
131 | # Parallelize the loop and collect the results
132 | with Pool(cpu_count() - 1) as pool:
133 | args_list = [i for i in range(N)]
134 | node_ids_for_all_seq = list(
135 | tqdm(pool.imap(get_node_ids_for_all_seq, args_list), total=N)
136 | )
137 |
138 | print(
139 | "Retrieved node ids for all seqs, now preparing hop2token feats for hop=2... "
140 | )
141 |
142 | t0 = time.time()
143 |
144 | if for_nagphormer:
145 | hop2token_range = 10
146 | else:
147 | hop2token_range = 3
148 |
149 | # Use 1-hop 2-hop 3-hop feature information from hop2token of nagphormer
150 | hop2token_feats = torch.empty(X.size(0), 1, hop2token_range, X.size(1))
151 |
152 | renormalize = True # required for preparing hop2token feats
153 | if renormalize:
154 | adj_matrix = adj_matrix + sp.eye(adj_matrix.shape[0])
155 | D1 = np.array(adj_matrix.sum(axis=1)) ** (-0.5)
156 | D2 = np.array(adj_matrix.sum(axis=0)) ** (-0.5)
157 | D1 = sp.diags(D1[:, 0], format="csr")
158 | D2 = sp.diags(D2[0, :], format="csr")
159 |
160 | A = adj_matrix.dot(D1)
161 | A = D2.dot(A)
162 | adj_matrix = A
163 |
164 | adj_matrix = sparse_mx_to_torch_sparse_tensor(adj_matrix)
165 | tmp = X + torch.zeros_like(X)
166 | for i in range(hop2token_range):
167 |         # only preparing 3-hop for hop2token feats
168 |         # experiments could use up to 2-hop only
169 |         tmp = torch.matmul(adj_matrix, tmp)
170 |         # copy the features of hop i + 1 for all nodes at once
171 |         hop2token_feats[:, 0, i, :] = tmp
172 | hop2token_feats = hop2token_feats.squeeze()
173 |
174 | print("DONE!", time.time() - t0)
175 | del adj_matrix, adj_matrix_2hop, tmp
176 |
177 | print("Saving now ...")
178 | return node_ids_for_all_seq, hop2token_feats
179 |
180 |
181 | def get_data_pt_file(name, data_args, sample_node_len):
182 | # adj_matrix, X, labels, idx_train, idx_val, idx_test = data_args
183 | adj_matrix = data_args[0]
184 | X = torch.tensor(data_args[1], dtype=torch.float32)
185 | labels = torch.tensor(data_args[2])
186 | idx_train = torch.tensor(data_args[3])
187 | idx_val = torch.tensor(data_args[4])
188 | idx_test = torch.tensor(data_args[5])
189 |
190 | if name.split("_")[1] == "nagphormer":
191 | file_save_name = "dataset/" + name + ".pt"
192 | for_nagphormer = True
193 | else:
194 | for_nagphormer = False
195 | if name.split("_")[0] == "ogbn-products":
196 | file_save_name = (
197 | "data/ogbn-products"
198 | + "_sample_node_len_"
199 | + str(sample_node_len)
200 | + ".pt"
201 | )
202 | elif name.split("_")[0] == "snap-patents":
203 | file_save_name = (
204 | "data/snap-patents" + "_sample_node_len_" + str(sample_node_len) + ".pt"
205 | )
206 | elif name.split("_")[0] == "ogbn-papers100M":
207 | file_save_name = (
208 | "data/ogbn-papers100M"
209 | + "_sample_node_len_"
210 | + str(sample_node_len)
211 | + ".pt"
212 | )
213 | else:
214 |             raise ValueError(f"Unknown dataset name: {name}")
215 |
216 | t0 = time.time()
217 | node_ids_for_all_seq, hop2token_feats = create_node_ids(
218 | X, adj_matrix, for_nagphormer, sample_node_len
219 | )
220 | torch.save(
221 | (
222 | node_ids_for_all_seq,
223 | X,
224 | hop2token_feats,
225 | adj_matrix,
226 | labels,
227 | idx_train,
228 | idx_val,
229 | idx_test,
230 | ),
231 | file_save_name,
232 | pickle_protocol=4,
233 | )
234 | print("total time taken: ", time.time() - t0)
235 |
236 | return (
237 | node_ids_for_all_seq,
238 | X,
239 | hop2token_feats,
240 | adj_matrix,
241 | labels,
242 | idx_train,
243 | idx_val,
244 | idx_test,
245 | )
246 |
--------------------------------------------------------------------------------
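The sampling rule in `get_node_ids_for_all_seq` above has three cases: enough neighbors, too few neighbors, and no neighbors at all. The following is a minimal standalone sketch of the same rule, assuming explicit arguments instead of the module-level globals; the function name `sample_sequence` and its signature are hypothetical.

```python
import random


def sample_sequence(node, hop1, hop2, num_nodes, seq_length, rng=random):
    """Return [node] followed by seq_length - 1 sampled neighbor ids."""
    neighbors = list(hop1) + list(hop2)
    need = seq_length - 1
    if len(neighbors) >= need:
        # enough neighbors: sample without replacement
        sampled = rng.sample(neighbors, need)
    elif neighbors:
        # too few neighbors: repeat the list cyclically, then truncate
        repeats = need // len(neighbors) + 1
        sampled = (neighbors * repeats)[:need]
    else:
        # isolated node: fall back to random nodes from the whole graph
        sampled = rng.sample(range(num_nodes), need)
    return [node] + sampled


rng = random.Random(0)  # seeded only to make the sketch repeatable
padded = sample_sequence(0, [1, 2], [3], num_nodes=10, seq_length=6, rng=rng)
isolated = sample_sequence(4, [], [], num_nodes=10, seq_length=6, rng=rng)
dense = sample_sequence(5, [1, 2, 3], [6, 7, 8, 9], num_nodes=10, seq_length=4, rng=rng)
```

Passing the neighbor lists as arguments avoids the shared globals, at the cost of sending more data to each worker if the function were used with `multiprocessing.Pool`; that trade-off is presumably why the original keeps them global.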
/model.py:
--------------------------------------------------------------------------------
1 | import math
2 |
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.functional as F
6 |
7 | from torch_geometric.nn.conv import MessagePassing
8 | from torch_geometric.nn.dense.linear import Linear
9 |
10 | from codebook import VectorQuantizerEMA
11 | from einops import rearrange
12 | from local_module import LocalModule
13 |
14 |
15 | class LargeGTLayer(MessagePassing):
16 | def __init__(
17 | self,
18 | in_channels,
19 | out_channels,
20 | global_dim,
21 | num_nodes,
22 | heads=1,
23 | concat=True,
24 | beta=False,
25 | dropout=0.0,
26 | edge_dim=None,
27 | bias=True,
28 | skip=True,
29 | conv_type="local",
30 | num_centroids=None,
31 | sample_node_len=100,
32 | **kwargs,
33 | ):
34 | kwargs.setdefault("aggr", "add")
35 | super(LargeGTLayer, self).__init__(node_dim=0, **kwargs)
36 |
37 | self.in_channels = in_channels
38 | self.out_channels = out_channels
39 | self.heads = heads
40 | self.beta = beta and skip
41 | self.skip = skip
42 | self.concat = concat
43 | self.dropout = dropout
44 | self.edge_dim = edge_dim
45 | self.conv_type = conv_type
46 | self.num_centroids = num_centroids
47 | self._alpha = None
48 |
49 | self.sample_node_len = sample_node_len
50 |
51 | self.lin_key = Linear(in_channels, heads * out_channels)
52 | self.lin_query = Linear(in_channels, heads * out_channels)
53 | self.lin_value = Linear(in_channels, heads * out_channels)
54 |
55 | if concat:
56 | self.lin_skip = Linear(in_channels, heads * out_channels, bias=bias)
57 | if self.beta:
58 | self.lin_beta = Linear(3 * heads * out_channels, 1, bias=False)
59 | else:
60 | self.lin_beta = self.register_parameter("lin_beta", None)
61 | else:
62 | self.lin_skip = Linear(in_channels, out_channels, bias=bias)
63 | if self.beta:
64 | self.lin_beta = Linear(3 * out_channels, 1, bias=False)
65 | else:
66 | self.lin_beta = self.register_parameter("lin_beta", None)
67 |
68 | self.local_module = LocalModule(
69 | seq_len=self.sample_node_len * 3,
70 | input_dim=in_channels,
71 | n_layers=1,
72 | num_heads=heads,
73 | hidden_dim=out_channels,
74 | )
75 |
76 | if self.conv_type != "local":
77 | self.vq = VectorQuantizerEMA(num_centroids, global_dim, decay=0.99)
78 | c = torch.randint(0, num_centroids, (num_nodes,), dtype=torch.short)
79 | self.register_buffer("c_idx", c)
80 | self.attn_fn = F.softmax
81 |
82 | self.lin_proj_g = Linear(in_channels, global_dim)
83 | self.lin_key_g = Linear(global_dim * 2, heads * out_channels)
84 | self.lin_query_g = Linear(global_dim * 2, heads * out_channels)
85 | self.lin_value_g = Linear(global_dim, heads * out_channels)
86 |
87 | self.reset_parameters()
88 |
89 | def reset_parameters(self):
90 | self.lin_key.reset_parameters()
91 | self.lin_query.reset_parameters()
92 | self.lin_value.reset_parameters()
93 | self.lin_skip.reset_parameters()
94 | if self.beta:
95 | self.lin_beta.reset_parameters()
96 |
97 | def forward(self, seq, x, pos_enc=None, batch_idx=None):
98 | if self.conv_type == "local":
99 | out = self.local_forward(seq)
100 |
101 | elif self.conv_type == "global":
102 | out = self.global_forward(x[: len(batch_idx)], pos_enc, batch_idx)
103 |
104 | elif self.conv_type == "full":
105 | out_local = self.local_forward(seq)
106 | out_global = self.global_forward(x[: len(batch_idx)], pos_enc, batch_idx)
107 | out = torch.cat([out_local, out_global], dim=1)
108 |
109 | else:
110 | raise NotImplementedError
111 |
112 | return out
113 |
114 | def global_forward(self, x, pos_enc, batch_idx):
115 | d, h = self.out_channels, self.heads
116 | scale = 1.0 / math.sqrt(d)
117 |
118 | q_x = torch.cat([self.lin_proj_g(x), pos_enc], dim=1)
119 |
120 | k_x = self.vq.get_k()
121 | v_x = self.vq.get_v()
122 |
123 | q = self.lin_query_g(q_x)
124 | k = self.lin_key_g(k_x)
125 | v = self.lin_value_g(v_x)
126 |
127 | q, k, v = map(lambda t: rearrange(t, "n (h d) -> h n d", h=h), (q, k, v))
128 | dots = torch.einsum("h i d, h j d -> h i j", q, k) * scale
129 |
130 | c, c_count = self.c_idx.unique(return_counts=True)
131 |
132 | centroid_count = torch.zeros(self.num_centroids, dtype=torch.long).to(x.device)
133 | centroid_count[c.to(torch.long)] = c_count
134 |
135 | dots += torch.log(centroid_count.view(1, 1, -1))
136 |
137 | attn = self.attn_fn(dots, dim=-1)
138 | attn = F.dropout(attn, p=self.dropout, training=self.training)
139 |
140 | out = torch.einsum("h i j, h j d -> h i d", attn, v)
141 | out = rearrange(out, "h n d -> n (h d)")
142 |
143 | # Update the centroids
144 | if self.training:
145 | x_idx = self.vq.update(q_x)
146 | self.c_idx[batch_idx] = x_idx.squeeze().to(torch.short)
147 |
148 | return out
149 |
150 | def local_forward(self, seq):
151 | return self.local_module(seq)
152 |
153 | def __repr__(self) -> str:
154 | return (
155 | f"{self.__class__.__name__}({self.in_channels}, "
156 | f"{self.out_channels}, heads={self.heads})"
157 | )
158 |
159 |
160 | class LargeGT(torch.nn.Module):
161 | def __init__(
162 | self,
163 | num_nodes,
164 | in_channels,
165 | hidden_channels,
166 | out_channels,
167 | global_dim,
168 | num_layers,
169 | heads,
170 | ff_dropout,
171 | attn_dropout,
172 | skip,
173 | conv_type,
174 | num_centroids,
175 | no_bn,
176 | norm_type,
177 | sample_node_len,
178 | ):
179 | super(LargeGT, self).__init__()
180 |
181 | if norm_type == "batch_norm":
182 | norm_func = nn.BatchNorm1d
183 | elif norm_type == "layer_norm":
184 | norm_func = nn.LayerNorm
185 |
186 | if no_bn:
187 | self.fc_in = nn.Sequential(
188 | nn.Linear(in_channels, hidden_channels),
189 | nn.ReLU(),
190 | nn.Dropout(ff_dropout),
191 | nn.Linear(hidden_channels, hidden_channels),
192 | )
193 | self.fc_in_seq = nn.Sequential(
194 | nn.Linear(in_channels, hidden_channels),
195 | nn.ReLU(),
196 | nn.Dropout(ff_dropout),
197 | nn.Linear(hidden_channels, hidden_channels),
198 | )
199 | else:
200 | self.fc_in = nn.Sequential(
201 | nn.Linear(in_channels, hidden_channels),
202 | norm_func(hidden_channels),
203 | nn.ReLU(),
204 | nn.Dropout(ff_dropout),
205 | nn.Linear(hidden_channels, hidden_channels),
206 | )
207 | self.fc_in_seq = nn.Sequential(
208 | nn.Linear(in_channels, hidden_channels),
209 | # norm_func(hidden_channels),
210 | nn.ReLU(),
211 | nn.Dropout(ff_dropout),
212 | nn.Linear(hidden_channels, hidden_channels),
213 | )
214 | self.convs = torch.nn.ModuleList()
215 | self.ffs = torch.nn.ModuleList()
216 |
217 | assert num_layers == 1
218 | for _ in range(num_layers):
219 | self.convs.append(
220 | LargeGTLayer(
221 | in_channels=hidden_channels,
222 | out_channels=hidden_channels,
223 | global_dim=global_dim,
224 | num_nodes=num_nodes,
225 | heads=heads,
226 | dropout=attn_dropout,
227 | skip=skip,
228 | conv_type=conv_type,
229 | num_centroids=num_centroids,
230 | sample_node_len=sample_node_len,
231 | )
232 | )
233 | h_times = 2 if conv_type == "full" else 1
234 |
235 | if no_bn:
236 | self.ffs.append(
237 | nn.Sequential(
238 | nn.Linear(
239 | h_times * hidden_channels * heads, hidden_channels * heads
240 | ),
241 | nn.ReLU(),
242 | nn.Dropout(ff_dropout),
243 | nn.Linear(hidden_channels * heads, hidden_channels),
244 | nn.ReLU(),
245 | nn.Dropout(ff_dropout),
246 | )
247 | )
248 | else:
249 | self.ffs.append(
250 | nn.Sequential(
251 | nn.Linear(
252 | h_times * hidden_channels * heads, hidden_channels * heads
253 | ),
254 | norm_func(hidden_channels * heads),
255 | nn.ReLU(),
256 | nn.Dropout(ff_dropout),
257 | nn.Linear(hidden_channels * heads, hidden_channels),
258 | norm_func(hidden_channels),
259 | nn.ReLU(),
260 | nn.Dropout(ff_dropout),
261 | )
262 | )
263 |
264 | self.fc_out = torch.nn.Linear(hidden_channels, out_channels)
265 |
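266 |     def reset_parameters(self):
267 |         # nn.Sequential defines no reset_parameters(), so reset its submodules instead
268 |         for conv in self.convs:
269 |             conv.reset_parameters()
270 |         for block in (self.fc_in, self.fc_in_seq, *self.ffs, [self.fc_out]):
271 |             for layer in block:
272 |                 getattr(layer, "reset_parameters", lambda: None)()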
273 |
274 | def forward(self, seq, x, pos_enc, batch_idx):
275 | x = self.fc_in(x)
276 | seq = self.fc_in_seq(seq)
277 |
278 | for i, conv in enumerate(self.convs):
279 | x = conv(seq, x, pos_enc, batch_idx)
280 | x = self.ffs[i](x)
281 | x = self.fc_out(x)
282 | return x
283 |
284 | def global_forward(self, x, pos_enc, batch_idx):
285 | x = self.fc_in(x)
286 | for i, conv in enumerate(self.convs):
287 | x = conv.global_forward(x, pos_enc, batch_idx)
288 | x = self.ffs[i](x)
289 | x = self.fc_out(x)
290 | return x
291 |
--------------------------------------------------------------------------------
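In `global_forward` above, `torch.log(centroid_count)` is added to the attention scores before the softmax. One way to read this term: attending to a centroid with bias `log(count)` gives the same total weight as attending separately to `count` nodes that all share that centroid's score. The sketch below checks this identity with the standard library only; the scores and counts are made-up values, and counts are assumed to be positive (in the model, a count of zero yields `-inf` and therefore zero attention weight).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scaled query-key scores for three centroids and the
# number of nodes currently assigned to each of them.
scores = [0.5, -1.0, 2.0]
counts = [3, 1, 2]

# Centroid-level attention with the log-count bias.
biased = softmax([s + math.log(c) for s, c in zip(scores, counts)])

# Node-level attention: each centroid's score is repeated once per node.
expanded = softmax([s for s, c in zip(scores, counts) for _ in range(c)])

# Sum the node-level weights back into one weight per centroid.
grouped, start = [], 0
for c in counts:
    grouped.append(sum(expanded[start:start + c]))
    start += c
```

Here `biased` and `grouped` agree up to floating-point error, which is why the bias lets a small codebook stand in for attention over all nodes.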
/main.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import torch
3 | import torch.nn.functional as F
4 |
5 | from ogb.nodeproppred import PygNodePropPredDataset, Evaluator
6 |
7 | from model import LargeGT
8 | from data import LargeGTTokens, rand_train_test_idx, even_quantile_labels
9 |
10 | import sys
11 | import time
12 | import datetime
13 | import scipy.io
14 | from numpy import mean as npmean
15 | from numpy import std as npstd
16 | import wandb
17 | from multiprocessing import cpu_count
18 |
19 |
20 | def train(model, loader, x, pos_enc, y, optimizer, device, conv_type, evaluator=None):
21 | model.train()
22 |
23 | counter = 1
24 | total_loss, total_correct, total_count = 0, 0, 0
25 |
26 | if conv_type == "global":
27 | for node_idx in loader:
28 | batch_size = len(node_idx)
29 |
30 | feat = x[node_idx] if torch.is_tensor(x) else x(node_idx)
31 | input = feat.to(device), pos_enc[node_idx].to(device), node_idx
32 |
33 | optimizer.zero_grad()
34 | out = model.to(device).global_forward(*input)
35 | loss = F.cross_entropy(out, y[node_idx].to(device))
36 | loss.backward()
37 | optimizer.step()
38 |
39 | total_loss += loss.item() * batch_size
40 | total_correct += out.argmax(dim=-1).cpu().eq(y[node_idx]).sum().item()
41 | total_count += batch_size
42 |
43 | counter += 1
44 |
45 | else:
46 | y_pred, y_true = [], []
47 | for seq, node_idx in loader:
48 | batch_size = len(node_idx)
49 |
50 | feat = x[node_idx] if torch.is_tensor(x) else x(node_idx)
51 | input = (
52 | seq.to(device),
53 | feat.to(device),
54 | pos_enc[node_idx].to(device),
55 | node_idx,
56 | )
57 |
58 | optimizer.zero_grad()
59 | out = model.to(device)(*input)
60 | loss = F.cross_entropy(out, y[node_idx].long().to(device))
61 | loss.backward()
62 | optimizer.step()
63 |
64 | total_loss += loss.item() * batch_size
65 | total_correct += out.argmax(dim=-1).cpu().eq(y[node_idx]).sum().item()
66 | total_count += batch_size
67 |
68 | counter += 1
69 |
70 | y_pred.append(torch.argmax(out, dim=1, keepdim=True).cpu())
71 | y_true.append(y[node_idx].unsqueeze(1))
72 |
73 | if evaluator is not None:
74 | acc = evaluator.eval(
75 | {
76 | "y_true": torch.cat(y_true, dim=0),
77 | "y_pred": torch.cat(y_pred, dim=0),
78 | }
79 | )["acc"]
80 |
81 | return total_loss / total_count, acc
82 |
83 | return total_loss / total_count, total_correct / total_count
84 |
85 |
86 | def test(
87 | model, loader, x, pos_enc, y, device, conv_type, fast_eval=False, evaluator=None
88 | ):
89 | model.eval()
90 |
91 | counter = 1
92 | total_correct, total_count = 0, 0
93 |
94 | if conv_type == "global":
95 | for node_idx in loader:
96 | batch_size = len(node_idx)
97 |
98 | feat = x[node_idx] if torch.is_tensor(x) else x(node_idx)
99 | out = model.to(device).global_forward(
100 | feat.to(device), pos_enc[node_idx].to(device), node_idx
101 | )
102 |
103 | total_correct += out.argmax(dim=-1).cpu().eq(y[node_idx]).sum().item()
104 | total_count += batch_size
105 |
106 | if fast_eval and counter == len(loader) // 10:
107 | if total_correct / total_count < 0.8:
108 | return 0
109 | counter += 1
110 |
111 | else:
112 | y_pred, y_true = [], []
113 | for seq, node_idx in loader:
114 | batch_size = len(node_idx)
115 |
116 | feat = x[node_idx] if torch.is_tensor(x) else x(node_idx)
117 | out = model.to(device)(
118 | seq.to(device), feat.to(device), pos_enc[node_idx].to(device), node_idx
119 | )
120 |
121 | total_correct += out.argmax(dim=-1).cpu().eq(y[node_idx]).sum().item()
122 | total_count += batch_size
123 |
124 | if fast_eval and counter == len(loader) // 10:
125 | if total_correct / total_count < 0.8:
126 | return 0
127 | counter += 1
128 |
129 | y_pred.append(torch.argmax(out, dim=1, keepdim=True).cpu())
130 | y_true.append(y[node_idx].unsqueeze(1))
131 |
132 | if evaluator is not None:
133 | acc = evaluator.eval(
134 | {
135 | "y_true": torch.cat(y_true, dim=0),
136 | "y_pred": torch.cat(y_pred, dim=0),
137 | }
138 | )["acc"]
139 |
140 | return acc
141 |
142 | return total_correct / total_count
143 |
144 |
145 | def create_run_name_with_timestamp(args, timestamp):
146 | run_name = "largegt_"
147 |
148 | for arg_name, arg_value in vars(args).items():
149 | run_name += f"{arg_name}_{arg_value}_"
150 |
151 | run_name += "timestamp_" + timestamp
152 | return run_name
153 |
154 |
155 | def main(tstamp=0):
156 | parser = argparse.ArgumentParser(description="large")
157 |
158 | # data loading
159 | parser.add_argument(
160 | "--dataset",
161 | type=str,
162 | default="ogbn-products",
163 | choices=["ogbn-products", "snap-patents", "ogbn-papers100M"],
164 | )
165 | parser.add_argument("--data_root", type=str, default="data")
166 |
167 | # training
168 | parser.add_argument("--hetero_train_prop", type=float, default=0.5)
169 | parser.add_argument("--device", type=int, default=0)
170 | parser.add_argument("--lr", type=float, default=1e-3)
171 | parser.add_argument("--epochs", type=int, default=500)
172 | parser.add_argument("--batch_size", type=int, default=1024)
173 | parser.add_argument("--test_batch_size", type=int, default=256)
174 | parser.add_argument("--test_freq", type=int, default=1)
175 | parser.add_argument("--num_workers", type=int, default=cpu_count() - 1)
176 |
177 | # network
178 | parser.add_argument(
179 | "--conv_type", type=str, default="full", choices=["local", "global", "full"]
180 | )
181 | parser.add_argument("--hidden_dim", type=int, default=256)
182 | parser.add_argument("--global_dim", type=int, default=64)
183 | parser.add_argument("--num_layers", type=int, default=1)
184 | parser.add_argument("--num_heads", type=int, default=1)
185 | parser.add_argument("--attn_dropout", type=float, default=0)
186 | parser.add_argument("--ff_dropout", type=float, default=0.5)
187 | parser.add_argument("--skip", action="store_true")
188 | parser.add_argument("--num_centroids", type=int, default=4096)
189 | parser.add_argument("--no_bn", action="store_true")
190 | parser.add_argument("--norm_type", type=str, default="batch_norm")
191 |
192 | # eval
193 | parser.add_argument("--eval", action="store_true")
194 | parser.add_argument("--eval_epoch", type=int, default=100)
195 | parser.add_argument("--save_ckpt", action="store_true")
196 | parser.add_argument("--save_path", type=str, default="checkpoints")
197 |
198 | parser.add_argument("--sample_node_len", type=int, default=100)
199 | parser.add_argument("--project_name", default="test")
200 | parser.add_argument("--budget_hour", type=int, default=48)
201 |
202 | args = parser.parse_args()
203 |
204 | print(args)
205 |
206 | run_name = create_run_name_with_timestamp(args, tstamp)
207 | wandb.init(
208 | project=args.project_name,
209 | config=args,
210 | name=run_name,
211 | resume="allow",
212 | id=wandb.util.generate_id(),
213 | settings=wandb.Settings(start_method="fork"),
214 | )
215 |
216 | device = f"cuda:{args.device}" if torch.cuda.is_available() else "cpu"
217 | device = torch.device(device)
218 |
219 | if args.eval:
220 | ckpt = torch.load(
221 | f"checkpoints/ckpt_epoch{args.eval_epoch}.pt", map_location=device
222 | )
223 |
224 | data_root = args.data_root
225 |
226 | if args.dataset.startswith("ogbn"):
227 | dataset = PygNodePropPredDataset(name=args.dataset, root=data_root)
228 | dataset_new_tokenizer = LargeGTTokens(
229 | args.dataset + "_sample_node_len_" + str(args.sample_node_len),
230 | sample_node_len=args.sample_node_len,
231 | )
232 | num_classes = dataset.num_classes
233 | data = dataset[0]
234 |
235 | try:
236 | split_idx = dataset_new_tokenizer.split_idx
237 | x = dataset_new_tokenizer.X
238 | y = dataset_new_tokenizer.y.squeeze()
239 | num_nodes = y.shape[0]
240 | original_X = data.x
241 |         except Exception:
242 | split_idx = dataset.get_idx_split()
243 | x = data.x
244 | y = data.y.squeeze()
245 | num_nodes = data.num_nodes
246 |
247 | # Convert split indices to boolean masks and add them to `data`.
248 | for key, idx in split_idx.items():
249 | mask = torch.zeros(num_nodes, dtype=torch.bool)
250 | mask[idx] = True
251 | data[f"{key}_mask"] = mask
252 |
253 | assert args.batch_size <= len(split_idx["train"])
254 |
255 | if args.dataset == "ogbn-papers100M":
256 | evaluator = Evaluator(name="ogbn-papers100M")
257 | else:
258 | evaluator = None
259 |
260 | elif args.dataset == "snap-patents":
261 | dataset_new_tokenizer = LargeGTTokens(
262 | args.dataset + "_sample_node_len_" + str(args.sample_node_len),
263 | sample_node_len=args.sample_node_len,
264 | )
265 | num_classes = 5
266 | fulldata = scipy.io.loadmat(f"data/snap_patents.mat")
267 | edge_index = torch.tensor(fulldata["edge_index"], dtype=torch.long)
268 |
269 | num_nodes = int(fulldata["num_nodes"])
270 | node_feat = torch.tensor(fulldata["node_feat"].todense(), dtype=torch.float)
271 |
272 | years = fulldata["years"].flatten()
273 | label = even_quantile_labels(years, num_classes, verbose=False)
274 | label = torch.tensor(label, dtype=torch.long)
275 |
276 | class MyObject:
277 | pass
278 |
279 | data = MyObject()
280 | x = data.x = node_feat
281 | y = data.y = label
282 | data.num_features = data.x.shape[-1]
283 |
284 | data.edge_index = edge_index
285 | data.num_nodes = num_nodes
286 |
287 | train_idx, valid_idx, test_idx = rand_train_test_idx(
288 | y, train_prop=args.hetero_train_prop
289 | )
290 | split_idx = {"train": train_idx, "valid": valid_idx, "test": test_idx}
291 | evaluator = None
292 |
293 | if args.dataset == "ogbn-papers100M":
294 | try:
295 | data.num_nodes = num_nodes
296 | except:
297 | pass
298 |
299 | model = LargeGT(
300 | num_nodes=data.num_nodes,
301 | in_channels=data.num_features,
302 | hidden_channels=args.hidden_dim,
303 | out_channels=num_classes,
304 | global_dim=args.global_dim,
305 | num_layers=args.num_layers,
306 | heads=args.num_heads,
307 | ff_dropout=args.ff_dropout,
308 | attn_dropout=args.attn_dropout,
309 | skip=args.skip,
310 | conv_type=args.conv_type,
311 | num_centroids=args.num_centroids,
312 | no_bn=args.no_bn,
313 | norm_type=args.norm_type,
314 | sample_node_len=args.sample_node_len,
315 | )
316 |
317 | print("total params:", sum(p.numel() for p in model.parameters()))
318 |
319 | if args.conv_type == "local":
320 | pos_enc = x
321 | else:
322 | dataset_name_input = args.dataset
323 |
324 | if dataset_name_input == "ogbn-papers100M" and args.sample_node_len == 50:
325 | pos_enc = torch.randn(data.num_nodes, args.global_dim)
326 | ogb_node2vec = torch.load(
327 | f"data/{dataset_name_input}_data_dict.pt", map_location="cpu"
328 | ) # 128 dim
329 | node2vec_embd = ogb_node2vec["node2vec_embedding"]
330 |
331 | # https://github.com/snap-stanford/ogb/blob/master/examples/nodeproppred/papers100M/node2vec.py
332 | # Using the mapping from ogb node2vec example to assign pos_enc only to the labeled nodes
333 | all_original_split_idx = torch.cat(
334 | (split_idx["train"], split_idx["valid"], split_idx["test"])
335 | ).tolist()
336 | for i, idx in enumerate(all_original_split_idx):
337 | pos_enc[idx] = node2vec_embd[i]
338 | elif dataset_name_input == "ogbn-papers100M" and args.sample_node_len == 100:
339 | ogb_node2vec = torch.load(
340 | f"data/{dataset_name_input}_data_dict.pt", map_location="cpu"
341 | ) # 128 dim
342 | pos_enc = ogb_node2vec["node2vec_embedding"]
343 | else:
344 | pos_enc = torch.load(
345 | f"data/{dataset_name_input}_embedding_{args.global_dim}.pt",
346 | map_location="cpu",
347 | )
348 |
349 | if args.conv_type == "global":
350 | train_loader = torch.utils.data.DataLoader(
351 | split_idx["train"],
352 | batch_size=args.batch_size,
353 | shuffle=True,
354 | num_workers=args.num_workers,
355 | )
356 | valid_loader = torch.utils.data.DataLoader(
357 | split_idx["valid"],
358 | batch_size=args.test_batch_size,
359 | shuffle=False,
360 | num_workers=args.num_workers,
361 | )
362 | test_loader = torch.utils.data.DataLoader(
363 | split_idx["test"],
364 | batch_size=args.test_batch_size,
365 | shuffle=False,
366 | num_workers=args.num_workers,
367 | )
368 | else:
369 | if args.dataset == "ogbn-papers100M" and args.sample_node_len == 100:
370 | from functools import partial
371 |
372 | custom_collate = partial(
373 | dataset_new_tokenizer.collate, original_X=original_X
374 | )
375 | else:
376 | custom_collate = dataset_new_tokenizer.collate
377 |
378 | train_loader = torch.utils.data.DataLoader(
379 | split_idx["train"],
380 | batch_size=args.batch_size,
381 | shuffle=True,
382 | num_workers=args.num_workers,
383 | collate_fn=custom_collate,
384 | )
385 | valid_loader = torch.utils.data.DataLoader(
386 | split_idx["valid"],
387 | batch_size=args.test_batch_size,
388 | shuffle=False,
389 | num_workers=args.num_workers,
390 | collate_fn=custom_collate,
391 | )
392 | test_loader = torch.utils.data.DataLoader(
393 | split_idx["test"],
394 | batch_size=args.test_batch_size,
395 | shuffle=False,
396 | num_workers=args.num_workers,
397 | collate_fn=custom_collate,
398 | )
399 |
400 | optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
401 |
402 | test_start_epoch = 0
403 |
404 | valid_acc_final, test_acc_final, test_acc_highest = 0, 0, 0
405 |
406 | whole_start = time.time()
407 | for epoch in range(1, 1 + args.epochs):
408 | if time.time() - whole_start >= args.budget_hour * 60 * 60:
409 | print("Budget runtime has passed. Exiting.")
410 | sys.exit(0) # Exit the program
411 | start = time.time()
412 |
413 | train_loss, train_acc = train(
414 | model,
415 | train_loader,
416 | x,
417 | pos_enc,
418 | y,
419 | optimizer,
420 | device,
421 | args.conv_type,
422 | evaluator,
423 | )
424 | train_time = time.time() - start
425 | print(
426 | f"Epoch: {epoch}, Train loss:{train_loss:.4f}, Train acc:{100*train_acc:.2f}, Epoch time: {train_time:.4f}, Train Mem:{torch.cuda.max_memory_allocated(device=device)/1e6:.0f} MB"
427 | )
428 |
429 | wandb.log(
430 | {"loss_train": train_loss, "acc_train": train_acc, "time": train_time}
431 | )
432 |
433 | if epoch > test_start_epoch and epoch % args.test_freq == 0:
434 | if args.save_ckpt:
435 | ckpt = {}
436 | ckpt["model"] = model.state_dict()
437 |
438 | torch.save(
439 | ckpt, f"{args.save_path}/{args.dataset}_ckpt_epoch{epoch}.pt"
440 | )
441 | # ckpt = model.load_state_dict(torch.load('model.pt'))
442 |
443 | else:
444 | start = time.time()
445 | valid_acc = test(
446 | model,
447 | valid_loader,
448 | x,
449 | pos_enc,
450 | y,
451 | device,
452 | args.conv_type,
453 | False,
454 | evaluator,
455 | )
456 |
457 | wandb.log({"acc_val": valid_acc})
458 |
459 | if args.dataset == "ogbn-products" and valid_acc < 0.0:
460 | pass
461 | else:
462 | # Fast evaluation is currently disabled: always evaluate on the full test set.
463 | fast_eval_flag = False
464 |
465 | test_acc = test(
466 | model,
467 | test_loader,
468 | x,
469 | pos_enc,
470 | y,
471 | device,
472 | args.conv_type,
473 | fast_eval_flag,
474 | evaluator,
475 | )
476 | test_time = time.time() - start
477 | print(
478 | f"Test acc: {100 * test_acc:.2f}, Val+Test time used: {test_time:.4f}"
479 | )
480 |
481 | if valid_acc > valid_acc_final:
482 | valid_acc_final = valid_acc
483 | test_acc_final = test_acc
484 | if test_acc > test_acc_highest:
485 | test_acc_highest = test_acc
486 |
487 | wandb.log(
488 | {
489 | "acc_test": test_acc,
490 | "acc_test_best": test_acc_final,
491 | "acc_test_highest": test_acc_highest,
492 | }
493 | )
494 |
495 | wandb.finish()
496 | return valid_acc_final, test_acc_final, time.time() - whole_start
497 |
498 |
499 | if __name__ == "__main__":
500 | # Repeat the full experiment several times to report mean and s.d.
501 | all_valid_acc = []
502 | all_test_acc = []
503 | all_time = []
504 |
505 | total_runs = 4
506 | timestamp = datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
507 |
508 | for run_number in range(total_runs):
509 | val_score, test_score, time_run = main(timestamp)
510 | all_valid_acc.append(val_score)
511 | all_test_acc.append(test_score)
512 | all_time.append(time_run)
513 |
514 | print("Mean valid acc: ", npmean(all_valid_acc), "s.d.: ", npstd(all_valid_acc))
515 | print("Mean test acc: ", npmean(all_test_acc), "s.d.: ", npstd(all_test_acc))
516 | print(f"Avg time taken over {total_runs} runs: ", npmean(all_time))
517 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2023 Snap Inc.
2 |
3 | MIT License
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6 |
7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 |
11 | The following sets forth attribution notices for third party software that may be included or may be required to use portions of this source code.
12 |
13 | pytorch:
14 | From PyTorch:
15 |
16 | Copyright (c) 2016- Facebook, Inc (Adam Paszke)
17 | Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
18 | Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
19 | Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
20 | Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
21 | Copyright (c) 2011-2013 NYU (Clement Farabet)
22 | Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
23 | Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
24 | Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
25 |
26 | From Caffe2:
27 |
28 | Copyright (c) 2016-present, Facebook Inc. All rights reserved.
29 |
30 | All contributions by Facebook:
31 | Copyright (c) 2016 Facebook Inc.
32 |
33 | All contributions by Google:
34 | Copyright (c) 2015 Google Inc.
35 | All rights reserved.
36 |
37 | All contributions by Yangqing Jia:
38 | Copyright (c) 2015 Yangqing Jia
39 | All rights reserved.
40 |
41 | All contributions by Kakao Brain:
42 | Copyright 2019-2020 Kakao Brain
43 |
44 | All contributions by Cruise LLC:
45 | Copyright (c) 2022 Cruise LLC.
46 | All rights reserved.
47 |
48 | All contributions from Caffe:
49 | Copyright(c) 2013, 2014, 2015, the respective contributors
50 | All rights reserved.
51 |
52 | All other contributions:
53 | Copyright(c) 2015, 2016 the respective contributors
54 | All rights reserved.
55 |
56 | Caffe2 uses a copyright model similar to Caffe: each contributor holds
57 | copyright over their contributions to Caffe2. The project versioning records
58 | all such contribution and copyright details. If a contributor wants to further
59 | mark their specific copyright on a particular contribution, they should
60 | indicate their copyright solely in the commit message of the change when it is
61 | committed.
62 |
63 | All rights reserved.
64 |
65 | Redistribution and use in source and binary forms, with or without
66 | modification, are permitted provided that the following conditions are met:
67 |
68 | 1. Redistributions of source code must retain the above copyright
69 | notice, this list of conditions and the following disclaimer.
70 |
71 | 2. Redistributions in binary form must reproduce the above copyright
72 | notice, this list of conditions and the following disclaimer in the
73 | documentation and/or other materials provided with the distribution.
74 |
75 | 3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
76 | and IDIAP Research Institute nor the names of its contributors may be
77 | used to endorse or promote products derived from this software without
78 | specific prior written permission.
79 |
80 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
81 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
82 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
83 | ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
84 | LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
85 | CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
86 | SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
87 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
88 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
89 | ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
90 | POSSIBILITY OF SUCH DAMAGE.
91 |
92 |
93 | WANDB:
94 | MIT License
95 |
96 | Copyright (c) 2021 Weights and Biases, Inc.
97 |
98 | Permission is hereby granted, free of charge, to any person obtaining a copy
99 | of this software and associated documentation files (the "Software"), to deal
100 | in the Software without restriction, including without limitation the rights
101 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
102 | copies of the Software, and to permit persons to whom the Software is
103 | furnished to do so, subject to the following conditions:
104 |
105 | The above copyright notice and this permission notice shall be included in all
106 | copies or substantial portions of the Software.
107 |
108 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
109 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
110 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
111 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
112 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
113 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
114 | SOFTWARE.
115 |
116 | absl-py:
117 |
118 | Apache License
119 | Version 2.0, January 2004
120 | http://www.apache.org/licenses/
121 |
122 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
123 |
124 | 1. Definitions.
125 |
126 | "License" shall mean the terms and conditions for use, reproduction,
127 | and distribution as defined by Sections 1 through 9 of this document.
128 |
129 | "Licensor" shall mean the copyright owner or entity authorized by
130 | the copyright owner that is granting the License.
131 |
132 | "Legal Entity" shall mean the union of the acting entity and all
133 | other entities that control, are controlled by, or are under common
134 | control with that entity. For the purposes of this definition,
135 | "control" means (i) the power, direct or indirect, to cause the
136 | direction or management of such entity, whether by contract or
137 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
138 | outstanding shares, or (iii) beneficial ownership of such entity.
139 |
140 | "You" (or "Your") shall mean an individual or Legal Entity
141 | exercising permissions granted by this License.
142 |
143 | "Source" form shall mean the preferred form for making modifications,
144 | including but not limited to software source code, documentation
145 | source, and configuration files.
146 |
147 | "Object" form shall mean any form resulting from mechanical
148 | transformation or translation of a Source form, including but
149 | not limited to compiled object code, generated documentation,
150 | and conversions to other media types.
151 |
152 | "Work" shall mean the work of authorship, whether in Source or
153 | Object form, made available under the License, as indicated by a
154 | copyright notice that is included in or attached to the work
155 | (an example is provided in the Appendix below).
156 |
157 | "Derivative Works" shall mean any work, whether in Source or Object
158 | form, that is based on (or derived from) the Work and for which the
159 | editorial revisions, annotations, elaborations, or other modifications
160 | represent, as a whole, an original work of authorship. For the purposes
161 | of this License, Derivative Works shall not include works that remain
162 | separable from, or merely link (or bind by name) to the interfaces of,
163 | the Work and Derivative Works thereof.
164 |
165 | "Contribution" shall mean any work of authorship, including
166 | the original version of the Work and any modifications or additions
167 | to that Work or Derivative Works thereof, that is intentionally
168 | submitted to Licensor for inclusion in the Work by the copyright owner
169 | or by an individual or Legal Entity authorized to submit on behalf of
170 | the copyright owner. For the purposes of this definition, "submitted"
171 | means any form of electronic, verbal, or written communication sent
172 | to the Licensor or its representatives, including but not limited to
173 | communication on electronic mailing lists, source code control systems,
174 | and issue tracking systems that are managed by, or on behalf of, the
175 | Licensor for the purpose of discussing and improving the Work, but
176 | excluding communication that is conspicuously marked or otherwise
177 | designated in writing by the copyright owner as "Not a Contribution."
178 |
179 | "Contributor" shall mean Licensor and any individual or Legal Entity
180 | on behalf of whom a Contribution has been received by Licensor and
181 | subsequently incorporated within the Work.
182 |
183 | 2. Grant of Copyright License. Subject to the terms and conditions of
184 | this License, each Contributor hereby grants to You a perpetual,
185 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
186 | copyright license to reproduce, prepare Derivative Works of,
187 | publicly display, publicly perform, sublicense, and distribute the
188 | Work and such Derivative Works in Source or Object form.
189 |
190 | 3. Grant of Patent License. Subject to the terms and conditions of
191 | this License, each Contributor hereby grants to You a perpetual,
192 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
193 | (except as stated in this section) patent license to make, have made,
194 | use, offer to sell, sell, import, and otherwise transfer the Work,
195 | where such license applies only to those patent claims licensable
196 | by such Contributor that are necessarily infringed by their
197 | Contribution(s) alone or by combination of their Contribution(s)
198 | with the Work to which such Contribution(s) was submitted. If You
199 | institute patent litigation against any entity (including a
200 | cross-claim or counterclaim in a lawsuit) alleging that the Work
201 | or a Contribution incorporated within the Work constitutes direct
202 | or contributory patent infringement, then any patent licenses
203 | granted to You under this License for that Work shall terminate
204 | as of the date such litigation is filed.
205 |
206 | 4. Redistribution. You may reproduce and distribute copies of the
207 | Work or Derivative Works thereof in any medium, with or without
208 | modifications, and in Source or Object form, provided that You
209 | meet the following conditions:
210 |
211 | (a) You must give any other recipients of the Work or
212 | Derivative Works a copy of this License; and
213 |
214 | (b) You must cause any modified files to carry prominent notices
215 | stating that You changed the files; and
216 |
217 | (c) You must retain, in the Source form of any Derivative Works
218 | that You distribute, all copyright, patent, trademark, and
219 | attribution notices from the Source form of the Work,
220 | excluding those notices that do not pertain to any part of
221 | the Derivative Works; and
222 |
223 | (d) If the Work includes a "NOTICE" text file as part of its
224 | distribution, then any Derivative Works that You distribute must
225 | include a readable copy of the attribution notices contained
226 | within such NOTICE file, excluding those notices that do not
227 | pertain to any part of the Derivative Works, in at least one
228 | of the following places: within a NOTICE text file distributed
229 | as part of the Derivative Works; within the Source form or
230 | documentation, if provided along with the Derivative Works; or,
231 | within a display generated by the Derivative Works, if and
232 | wherever such third-party notices normally appear. The contents
233 | of the NOTICE file are for informational purposes only and
234 | do not modify the License. You may add Your own attribution
235 | notices within Derivative Works that You distribute, alongside
236 | or as an addendum to the NOTICE text from the Work, provided
237 | that such additional attribution notices cannot be construed
238 | as modifying the License.
239 |
240 | You may add Your own copyright statement to Your modifications and
241 | may provide additional or different license terms and conditions
242 | for use, reproduction, or distribution of Your modifications, or
243 | for any such Derivative Works as a whole, provided Your use,
244 | reproduction, and distribution of the Work otherwise complies with
245 | the conditions stated in this License.
246 |
247 | 5. Submission of Contributions. Unless You explicitly state otherwise,
248 | any Contribution intentionally submitted for inclusion in the Work
249 | by You to the Licensor shall be under the terms and conditions of
250 | this License, without any additional terms or conditions.
251 | Notwithstanding the above, nothing herein shall supersede or modify
252 | the terms of any separate license agreement you may have executed
253 | with Licensor regarding such Contributions.
254 |
255 | 6. Trademarks. This License does not grant permission to use the trade
256 | names, trademarks, service marks, or product names of the Licensor,
257 | except as required for reasonable and customary use in describing the
258 | origin of the Work and reproducing the content of the NOTICE file.
259 |
260 | 7. Disclaimer of Warranty. Unless required by applicable law or
261 | agreed to in writing, Licensor provides the Work (and each
262 | Contributor provides its Contributions) on an "AS IS" BASIS,
263 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
264 | implied, including, without limitation, any warranties or conditions
265 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
266 | PARTICULAR PURPOSE. You are solely responsible for determining the
267 | appropriateness of using or redistributing the Work and assume any
268 | risks associated with Your exercise of permissions under this License.
269 |
270 | 8. Limitation of Liability. In no event and under no legal theory,
271 | whether in tort (including negligence), contract, or otherwise,
272 | unless required by applicable law (such as deliberate and grossly
273 | negligent acts) or agreed to in writing, shall any Contributor be
274 | liable to You for damages, including any direct, indirect, special,
275 | incidental, or consequential damages of any character arising as a
276 | result of this License or out of the use or inability to use the
277 | Work (including but not limited to damages for loss of goodwill,
278 | work stoppage, computer failure or malfunction, or any and all
279 | other commercial damages or losses), even if such Contributor
280 | has been advised of the possibility of such damages.
281 |
282 | 9. Accepting Warranty or Additional Liability. While redistributing
283 | the Work or Derivative Works thereof, You may choose to offer,
284 | and charge a fee for, acceptance of support, warranty, indemnity,
285 | or other liability obligations and/or rights consistent with this
286 | License. However, in accepting such obligations, You may act only
287 | on Your own behalf and on Your sole responsibility, not on behalf
288 | of any other Contributor, and only if You agree to indemnify,
289 | defend, and hold each Contributor harmless for any liability
290 | incurred by, or claims asserted against, such Contributor by reason
291 | of your accepting any such warranty or additional liability.
292 |
293 | END OF TERMS AND CONDITIONS
294 |
295 | APPENDIX: How to apply the Apache License to your work.
296 |
297 | To apply the Apache License to your work, attach the following
298 | boilerplate notice, with the fields enclosed by brackets "[]"
299 | replaced with your own identifying information. (Don't include
300 | the brackets!) The text should be enclosed in the appropriate
301 | comment syntax for the file format. We also recommend that a
302 | file or class name and description of purpose be included on the
303 | same "printed page" as the copyright notice for easier
304 | identification within third-party archives.
305 |
306 | Copyright [yyyy] [name of copyright owner]
307 |
308 | Licensed under the Apache License, Version 2.0 (the "License");
309 | you may not use this file except in compliance with the License.
310 | You may obtain a copy of the License at
311 |
312 | http://www.apache.org/licenses/LICENSE-2.0
313 |
314 | Unless required by applicable law or agreed to in writing, software
315 | distributed under the License is distributed on an "AS IS" BASIS,
316 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
317 | See the License for the specific language governing permissions and
318 | limitations under the License.
319 |
320 |
321 | tensorboard:
322 | Copyright 2017 The TensorFlow Authors. All rights reserved.
323 |
324 | Apache License
325 | Version 2.0, January 2004
326 | http://www.apache.org/licenses/
327 |
328 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
329 |
330 | 1. Definitions.
331 |
332 | "License" shall mean the terms and conditions for use, reproduction,
333 | and distribution as defined by Sections 1 through 9 of this document.
334 |
335 | "Licensor" shall mean the copyright owner or entity authorized by
336 | the copyright owner that is granting the License.
337 |
338 | "Legal Entity" shall mean the union of the acting entity and all
339 | other entities that control, are controlled by, or are under common
340 | control with that entity. For the purposes of this definition,
341 | "control" means (i) the power, direct or indirect, to cause the
342 | direction or management of such entity, whether by contract or
343 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
344 | outstanding shares, or (iii) beneficial ownership of such entity.
345 |
346 | "You" (or "Your") shall mean an individual or Legal Entity
347 | exercising permissions granted by this License.
348 |
349 | "Source" form shall mean the preferred form for making modifications,
350 | including but not limited to software source code, documentation
351 | source, and configuration files.
352 |
353 | "Object" form shall mean any form resulting from mechanical
354 | transformation or translation of a Source form, including but
355 | not limited to compiled object code, generated documentation,
356 | and conversions to other media types.
357 |
358 | "Work" shall mean the work of authorship, whether in Source or
359 | Object form, made available under the License, as indicated by a
360 | copyright notice that is included in or attached to the work
361 | (an example is provided in the Appendix below).
362 |
363 | "Derivative Works" shall mean any work, whether in Source or Object
364 | form, that is based on (or derived from) the Work and for which the
365 | editorial revisions, annotations, elaborations, or other modifications
366 | represent, as a whole, an original work of authorship. For the purposes
367 | of this License, Derivative Works shall not include works that remain
368 | separable from, or merely link (or bind by name) to the interfaces of,
369 | the Work and Derivative Works thereof.
370 |
371 | "Contribution" shall mean any work of authorship, including
372 | the original version of the Work and any modifications or additions
373 | to that Work or Derivative Works thereof, that is intentionally
374 | submitted to Licensor for inclusion in the Work by the copyright owner
375 | or by an individual or Legal Entity authorized to submit on behalf of
376 | the copyright owner. For the purposes of this definition, "submitted"
377 | means any form of electronic, verbal, or written communication sent
378 | to the Licensor or its representatives, including but not limited to
379 | communication on electronic mailing lists, source code control systems,
380 | and issue tracking systems that are managed by, or on behalf of, the
381 | Licensor for the purpose of discussing and improving the Work, but
382 | excluding communication that is conspicuously marked or otherwise
383 | designated in writing by the copyright owner as "Not a Contribution."
384 |
385 | "Contributor" shall mean Licensor and any individual or Legal Entity
386 | on behalf of whom a Contribution has been received by Licensor and
387 | subsequently incorporated within the Work.
388 |
389 | 2. Grant of Copyright License. Subject to the terms and conditions of
390 | this License, each Contributor hereby grants to You a perpetual,
391 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
392 | copyright license to reproduce, prepare Derivative Works of,
393 | publicly display, publicly perform, sublicense, and distribute the
394 | Work and such Derivative Works in Source or Object form.
395 |
396 | 3. Grant of Patent License. Subject to the terms and conditions of
397 | this License, each Contributor hereby grants to You a perpetual,
398 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
399 | (except as stated in this section) patent license to make, have made,
400 | use, offer to sell, sell, import, and otherwise transfer the Work,
401 | where such license applies only to those patent claims licensable
402 | by such Contributor that are necessarily infringed by their
403 | Contribution(s) alone or by combination of their Contribution(s)
404 | with the Work to which such Contribution(s) was submitted. If You
405 | institute patent litigation against any entity (including a
406 | cross-claim or counterclaim in a lawsuit) alleging that the Work
407 | or a Contribution incorporated within the Work constitutes direct
408 | or contributory patent infringement, then any patent licenses
409 | granted to You under this License for that Work shall terminate
410 | as of the date such litigation is filed.
411 |
412 | 4. Redistribution. You may reproduce and distribute copies of the
413 | Work or Derivative Works thereof in any medium, with or without
414 | modifications, and in Source or Object form, provided that You
415 | meet the following conditions:
416 |
417 | (a) You must give any other recipients of the Work or
418 | Derivative Works a copy of this License; and
419 |
420 | (b) You must cause any modified files to carry prominent notices
421 | stating that You changed the files; and
422 |
423 | (c) You must retain, in the Source form of any Derivative Works
424 | that You distribute, all copyright, patent, trademark, and
425 | attribution notices from the Source form of the Work,
426 | excluding those notices that do not pertain to any part of
427 | the Derivative Works; and
428 |
429 | (d) If the Work includes a "NOTICE" text file as part of its
430 | distribution, then any Derivative Works that You distribute must
431 | include a readable copy of the attribution notices contained
432 | within such NOTICE file, excluding those notices that do not
433 | pertain to any part of the Derivative Works, in at least one
434 | of the following places: within a NOTICE text file distributed
435 | as part of the Derivative Works; within the Source form or
436 | documentation, if provided along with the Derivative Works; or,
437 | within a display generated by the Derivative Works, if and
438 | wherever such third-party notices normally appear. The contents
439 | of the NOTICE file are for informational purposes only and
440 | do not modify the License. You may add Your own attribution
441 | notices within Derivative Works that You distribute, alongside
442 | or as an addendum to the NOTICE text from the Work, provided
443 | that such additional attribution notices cannot be construed
444 | as modifying the License.
445 |
446 | You may add Your own copyright statement to Your modifications and
447 | may provide additional or different license terms and conditions
448 | for use, reproduction, or distribution of Your modifications, or
449 | for any such Derivative Works as a whole, provided Your use,
450 | reproduction, and distribution of the Work otherwise complies with
451 | the conditions stated in this License.
452 |
453 | 5. Submission of Contributions. Unless You explicitly state otherwise,
454 | any Contribution intentionally submitted for inclusion in the Work
455 | by You to the Licensor shall be under the terms and conditions of
456 | this License, without any additional terms or conditions.
457 | Notwithstanding the above, nothing herein shall supersede or modify
458 | the terms of any separate license agreement you may have executed
459 | with Licensor regarding such Contributions.
460 |
461 | 6. Trademarks. This License does not grant permission to use the trade
462 | names, trademarks, service marks, or product names of the Licensor,
463 | except as required for reasonable and customary use in describing the
464 | origin of the Work and reproducing the content of the NOTICE file.
465 |
466 | 7. Disclaimer of Warranty. Unless required by applicable law or
467 | agreed to in writing, Licensor provides the Work (and each
468 | Contributor provides its Contributions) on an "AS IS" BASIS,
469 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
470 | implied, including, without limitation, any warranties or conditions
471 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
472 | PARTICULAR PURPOSE. You are solely responsible for determining the
473 | appropriateness of using or redistributing the Work and assume any
474 | risks associated with Your exercise of permissions under this License.
475 |
476 | 8. Limitation of Liability. In no event and under no legal theory,
477 | whether in tort (including negligence), contract, or otherwise,
478 | unless required by applicable law (such as deliberate and grossly
479 | negligent acts) or agreed to in writing, shall any Contributor be
480 | liable to You for damages, including any direct, indirect, special,
481 | incidental, or consequential damages of any character arising as a
482 | result of this License or out of the use or inability to use the
483 | Work (including but not limited to damages for loss of goodwill,
484 | work stoppage, computer failure or malfunction, or any and all
485 | other commercial damages or losses), even if such Contributor
486 | has been advised of the possibility of such damages.
487 |
488 | 9. Accepting Warranty or Additional Liability. While redistributing
489 | the Work or Derivative Works thereof, You may choose to offer,
490 | and charge a fee for, acceptance of support, warranty, indemnity,
491 | or other liability obligations and/or rights consistent with this
492 | License. However, in accepting such obligations, You may act only
493 | on Your own behalf and on Your sole responsibility, not on behalf
494 | of any other Contributor, and only if You agree to indemnify,
495 | defend, and hold each Contributor harmless for any liability
496 | incurred by, or claims asserted against, such Contributor by reason
497 | of your accepting any such warranty or additional liability.
498 |
499 | END OF TERMS AND CONDITIONS
500 |
501 | APPENDIX: How to apply the Apache License to your work.
502 |
503 | To apply the Apache License to your work, attach the following
504 | boilerplate notice, with the fields enclosed by brackets "[]"
505 | replaced with your own identifying information. (Don't include
506 | the brackets!) The text should be enclosed in the appropriate
507 | comment syntax for the file format. We also recommend that a
508 | file or class name and description of purpose be included on the
509 | same "printed page" as the copyright notice for easier
510 | identification within third-party archives.
511 |
512 | Copyright 2017, The TensorFlow Authors.
513 |
514 | Licensed under the Apache License, Version 2.0 (the "License");
515 | you may not use this file except in compliance with the License.
516 | You may obtain a copy of the License at
517 |
518 | http://www.apache.org/licenses/LICENSE-2.0
519 |
520 | Unless required by applicable law or agreed to in writing, software
521 | distributed under the License is distributed on an "AS IS" BASIS,
522 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
523 | See the License for the specific language governing permissions and
524 | limitations under the License.
525 |
526 |
527 | einops:
528 | MIT License
529 |
530 | Copyright (c) 2018 Alex Rogozhnikov
531 |
532 | Permission is hereby granted, free of charge, to any person obtaining a copy
533 | of this software and associated documentation files (the "Software"), to deal
534 | in the Software without restriction, including without limitation the rights
535 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
536 | copies of the Software, and to permit persons to whom the Software is
537 | furnished to do so, subject to the following conditions:
538 |
539 | The above copyright notice and this permission notice shall be included in all
540 | copies or substantial portions of the Software.
541 |
542 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
543 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
544 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
545 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
546 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
547 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
548 | SOFTWARE.
549 |
550 |
551 | matplotlib:
552 | License agreement for matplotlib versions 1.3.0 and later
553 | =========================================================
554 |
555 | 1. This LICENSE AGREEMENT is between the Matplotlib Development Team
556 | ("MDT"), and the Individual or Organization ("Licensee") accessing and
557 | otherwise using matplotlib software in source or binary form and its
558 | associated documentation.
559 |
560 | 2. Subject to the terms and conditions of this License Agreement, MDT
561 | hereby grants Licensee a nonexclusive, royalty-free, world-wide license
562 | to reproduce, analyze, test, perform and/or display publicly, prepare
563 | derivative works, distribute, and otherwise use matplotlib
564 | alone or in any derivative version, provided, however, that MDT's
565 | License Agreement and MDT's notice of copyright, i.e., "Copyright (c)
566 | 2012- Matplotlib Development Team; All Rights Reserved" are retained in
567 | matplotlib alone or in any derivative version prepared by
568 | Licensee.
569 |
570 | 3. In the event Licensee prepares a derivative work that is based on or
571 | incorporates matplotlib or any part thereof, and wants to
572 | make the derivative work available to others as provided herein, then
573 | Licensee hereby agrees to include in any such work a brief summary of
574 | the changes made to matplotlib .
575 |
576 | 4. MDT is making matplotlib available to Licensee on an "AS
577 | IS" basis. MDT MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
578 | IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, MDT MAKES NO AND
579 | DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
580 | FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF MATPLOTLIB
581 | WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
582 |
583 | 5. MDT SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF MATPLOTLIB
584 | FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR
585 | LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING
586 | MATPLOTLIB , OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF
587 | THE POSSIBILITY THEREOF.
588 |
589 | 6. This License Agreement will automatically terminate upon a material
590 | breach of its terms and conditions.
591 |
592 | 7. Nothing in this License Agreement shall be deemed to create any
593 | relationship of agency, partnership, or joint venture between MDT and
594 | Licensee. This License Agreement does not grant permission to use MDT
595 | trademarks or trade name in a trademark sense to endorse or promote
596 | products or services of Licensee, or any third party.
597 |
598 | 8. By copying, installing or otherwise using matplotlib ,
599 | Licensee agrees to be bound by the terms and conditions of this License
600 | Agreement.
601 |
602 | License agreement for matplotlib versions prior to 1.3.0
603 | ========================================================
604 |
605 | 1. This LICENSE AGREEMENT is between John D. Hunter ("JDH"), and the
606 | Individual or Organization ("Licensee") accessing and otherwise using
607 | matplotlib software in source or binary form and its associated
608 | documentation.
609 |
610 | 2. Subject to the terms and conditions of this License Agreement, JDH
611 | hereby grants Licensee a nonexclusive, royalty-free, world-wide license
612 | to reproduce, analyze, test, perform and/or display publicly, prepare
613 | derivative works, distribute, and otherwise use matplotlib
614 | alone or in any derivative version, provided, however, that JDH's
615 | License Agreement and JDH's notice of copyright, i.e., "Copyright (c)
616 | 2002-2011 John D. Hunter; All Rights Reserved" are retained in
617 | matplotlib alone or in any derivative version prepared by
618 | Licensee.
619 |
620 | 3. In the event Licensee prepares a derivative work that is based on or
621 | incorporates matplotlib or any part thereof, and wants to
622 | make the derivative work available to others as provided herein, then
623 | Licensee hereby agrees to include in any such work a brief summary of
624 | the changes made to matplotlib.
625 |
626 | 4. JDH is making matplotlib available to Licensee on an "AS
627 | IS" basis. JDH MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
628 | IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, JDH MAKES NO AND
629 | DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
630 | FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF MATPLOTLIB
631 | WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
632 |
633 | 5. JDH SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF MATPLOTLIB
634 | FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR
635 | LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING
636 | MATPLOTLIB , OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF
637 | THE POSSIBILITY THEREOF.
638 |
639 | 6. This License Agreement will automatically terminate upon a material
640 | breach of its terms and conditions.
641 |
642 | 7. Nothing in this License Agreement shall be deemed to create any
643 | relationship of agency, partnership, or joint venture between JDH and
644 | Licensee. This License Agreement does not grant permission to use JDH
645 | trademarks or trade name in a trademark sense to endorse or promote
646 | products or services of Licensee, or any third party.
647 |
648 | 8. By copying, installing or otherwise using matplotlib,
649 | Licensee agrees to be bound by the terms and conditions of this License
650 | Agreement.
651 |
652 |
653 | kmeans-pytorch:
654 | MIT License
655 |
656 | Copyright (c) 2020 subhadarshi
657 |
658 | Permission is hereby granted, free of charge, to any person obtaining a copy
659 | of this software and associated documentation files (the "Software"), to deal
660 | in the Software without restriction, including without limitation the rights
661 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
662 | copies of the Software, and to permit persons to whom the Software is
663 | furnished to do so, subject to the following conditions:
664 |
665 | The above copyright notice and this permission notice shall be included in all
666 | copies or substantial portions of the Software.
667 |
668 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
669 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
670 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
671 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
672 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
673 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
674 | SOFTWARE.
675 |
676 | networkx:
677 | NetworkX is distributed with the 3-clause BSD license.
678 |
679 | ::
680 |
681 | Copyright (C) 2004-2023, NetworkX Developers
682 | Aric Hagberg
683 | Dan Schult
684 | Pieter Swart
685 | All rights reserved.
686 |
687 | Redistribution and use in source and binary forms, with or without
688 | modification, are permitted provided that the following conditions are
689 | met:
690 |
691 | * Redistributions of source code must retain the above copyright
692 | notice, this list of conditions and the following disclaimer.
693 |
694 | * Redistributions in binary form must reproduce the above
695 | copyright notice, this list of conditions and the following
696 | disclaimer in the documentation and/or other materials provided
697 | with the distribution.
698 |
699 | * Neither the name of the NetworkX Developers nor the names of its
700 | contributors may be used to endorse or promote products derived
701 | from this software without specific prior written permission.
702 |
703 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
704 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
705 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
706 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
707 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
708 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
709 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
710 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
711 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
712 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
713 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
714 |
715 | pandas:
716 | BSD 3-Clause License
717 |
718 | Copyright (c) 2008-2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
719 | All rights reserved.
720 |
721 | Copyright (c) 2011-2023, Open source contributors.
722 |
723 | Redistribution and use in source and binary forms, with or without
724 | modification, are permitted provided that the following conditions are met:
725 |
726 | * Redistributions of source code must retain the above copyright notice, this
727 | list of conditions and the following disclaimer.
728 |
729 | * Redistributions in binary form must reproduce the above copyright notice,
730 | this list of conditions and the following disclaimer in the documentation
731 | and/or other materials provided with the distribution.
732 |
733 | * Neither the name of the copyright holder nor the names of its
734 | contributors may be used to endorse or promote products derived from
735 | this software without specific prior written permission.
736 |
737 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
738 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
739 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
740 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
741 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
742 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
743 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
744 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
745 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
746 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
747 |
748 | ogb:
749 | MIT License
750 |
751 | Copyright (c) 2019 OGB Team
752 |
753 | Permission is hereby granted, free of charge, to any person obtaining a copy
754 | of this software and associated documentation files (the "Software"), to deal
755 | in the Software without restriction, including without limitation the rights
756 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
757 | copies of the Software, and to permit persons to whom the Software is
758 | furnished to do so, subject to the following conditions:
759 |
760 | The above copyright notice and this permission notice shall be included in all
761 | copies or substantial portions of the Software.
762 |
763 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
764 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
765 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
766 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
767 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
768 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
769 | SOFTWARE.
770 |
771 | numba:
772 | Copyright (c) 2012, Anaconda, Inc.
773 | All rights reserved.
774 |
775 | Redistribution and use in source and binary forms, with or without
776 | modification, are permitted provided that the following conditions are
777 | met:
778 |
779 | Redistributions of source code must retain the above copyright notice,
780 | this list of conditions and the following disclaimer.
781 |
782 | Redistributions in binary form must reproduce the above copyright
783 | notice, this list of conditions and the following disclaimer in the
784 | documentation and/or other materials provided with the distribution.
785 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
786 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
787 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
788 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
789 | HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
790 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
791 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
792 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
793 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
794 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
795 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
796 |
797 | scikit-network:
798 | BSD License
799 |
800 | Copyright (c) 2018, Scikit-network Developers
801 | Bertrand Charpentier
802 | Thomas Bonald
803 | All rights reserved.
804 |
805 | Redistribution and use in source and binary forms, with or without modification,
806 | are permitted provided that the following conditions are met:
807 |
808 | * Redistributions of source code must retain the above copyright notice, this
809 | list of conditions and the following disclaimer.
810 |
811 | * Redistributions in binary form must reproduce the above copyright notice, this
812 | list of conditions and the following disclaimer in the documentation and/or
813 | other materials provided with the distribution.
814 |
815 | * Neither the name of the copyright holder nor the names of its
816 | contributors may be used to endorse or promote products derived from this
817 | software without specific prior written permission.
818 |
819 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
820 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
821 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
822 | IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
823 | INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
824 | BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
825 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
826 | OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
827 | OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
828 | OF THE POSSIBILITY OF SUCH DAMAGE.
829 |
830 | pytorch-geometric:
831 | Copyright (c) 2023 PyG Team
832 |
833 | Permission is hereby granted, free of charge, to any person obtaining a copy
834 | of this software and associated documentation files (the "Software"), to deal
835 | in the Software without restriction, including without limitation the rights
836 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
837 | copies of the Software, and to permit persons to whom the Software is
838 | furnished to do so, subject to the following conditions:
839 |
840 | The above copyright notice and this permission notice shall be included in
841 | all copies or substantial portions of the Software.
842 |
843 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
844 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
845 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
846 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
847 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
848 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
849 | THE SOFTWARE.
850 |
851 |
852 | dgl:
853 | Apache License
854 | Version 2.0, January 2004
855 | http://www.apache.org/licenses/
856 |
857 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
858 |
859 | 1. Definitions.
860 |
861 | "License" shall mean the terms and conditions for use, reproduction,
862 | and distribution as defined by Sections 1 through 9 of this document.
863 |
864 | "Licensor" shall mean the copyright owner or entity authorized by
865 | the copyright owner that is granting the License.
866 |
867 | "Legal Entity" shall mean the union of the acting entity and all
868 | other entities that control, are controlled by, or are under common
869 | control with that entity. For the purposes of this definition,
870 | "control" means (i) the power, direct or indirect, to cause the
871 | direction or management of such entity, whether by contract or
872 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
873 | outstanding shares, or (iii) beneficial ownership of such entity.
874 |
875 | "You" (or "Your") shall mean an individual or Legal Entity
876 | exercising permissions granted by this License.
877 |
878 | "Source" form shall mean the preferred form for making modifications,
879 | including but not limited to software source code, documentation
880 | source, and configuration files.
881 |
882 | "Object" form shall mean any form resulting from mechanical
883 | transformation or translation of a Source form, including but
884 | not limited to compiled object code, generated documentation,
885 | and conversions to other media types.
886 |
887 | "Work" shall mean the work of authorship, whether in Source or
888 | Object form, made available under the License, as indicated by a
889 | copyright notice that is included in or attached to the work
890 | (an example is provided in the Appendix below).
891 |
892 | "Derivative Works" shall mean any work, whether in Source or Object
893 | form, that is based on (or derived from) the Work and for which the
894 | editorial revisions, annotations, elaborations, or other modifications
895 | represent, as a whole, an original work of authorship. For the purposes
896 | of this License, Derivative Works shall not include works that remain
897 | separable from, or merely link (or bind by name) to the interfaces of,
898 | the Work and Derivative Works thereof.
899 |
900 | "Contribution" shall mean any work of authorship, including
901 | the original version of the Work and any modifications or additions
902 | to that Work or Derivative Works thereof, that is intentionally
903 | submitted to Licensor for inclusion in the Work by the copyright owner
904 | or by an individual or Legal Entity authorized to submit on behalf of
905 | the copyright owner. For the purposes of this definition, "submitted"
906 | means any form of electronic, verbal, or written communication sent
907 | to the Licensor or its representatives, including but not limited to
908 | communication on electronic mailing lists, source code control systems,
909 | and issue tracking systems that are managed by, or on behalf of, the
910 | Licensor for the purpose of discussing and improving the Work, but
911 | excluding communication that is conspicuously marked or otherwise
912 | designated in writing by the copyright owner as "Not a Contribution."
913 |
914 | "Contributor" shall mean Licensor and any individual or Legal Entity
915 | on behalf of whom a Contribution has been received by Licensor and
916 | subsequently incorporated within the Work.
917 |
918 | 2. Grant of Copyright License. Subject to the terms and conditions of
919 | this License, each Contributor hereby grants to You a perpetual,
920 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
921 | copyright license to reproduce, prepare Derivative Works of,
922 | publicly display, publicly perform, sublicense, and distribute the
923 | Work and such Derivative Works in Source or Object form.
924 |
925 | 3. Grant of Patent License. Subject to the terms and conditions of
926 | this License, each Contributor hereby grants to You a perpetual,
927 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
928 | (except as stated in this section) patent license to make, have made,
929 | use, offer to sell, sell, import, and otherwise transfer the Work,
930 | where such license applies only to those patent claims licensable
931 | by such Contributor that are necessarily infringed by their
932 | Contribution(s) alone or by combination of their Contribution(s)
933 | with the Work to which such Contribution(s) was submitted. If You
934 | institute patent litigation against any entity (including a
935 | cross-claim or counterclaim in a lawsuit) alleging that the Work
936 | or a Contribution incorporated within the Work constitutes direct
937 | or contributory patent infringement, then any patent licenses
938 | granted to You under this License for that Work shall terminate
939 | as of the date such litigation is filed.
940 |
941 | 4. Redistribution. You may reproduce and distribute copies of the
942 | Work or Derivative Works thereof in any medium, with or without
943 | modifications, and in Source or Object form, provided that You
944 | meet the following conditions:
945 |
946 | (a) You must give any other recipients of the Work or
947 | Derivative Works a copy of this License; and
948 |
949 | (b) You must cause any modified files to carry prominent notices
950 | stating that You changed the files; and
951 |
952 | (c) You must retain, in the Source form of any Derivative Works
953 | that You distribute, all copyright, patent, trademark, and
954 | attribution notices from the Source form of the Work,
955 | excluding those notices that do not pertain to any part of
956 | the Derivative Works; and
957 |
958 | (d) If the Work includes a "NOTICE" text file as part of its
959 | distribution, then any Derivative Works that You distribute must
960 | include a readable copy of the attribution notices contained
961 | within such NOTICE file, excluding those notices that do not
962 | pertain to any part of the Derivative Works, in at least one
963 | of the following places: within a NOTICE text file distributed
964 | as part of the Derivative Works; within the Source form or
965 | documentation, if provided along with the Derivative Works; or,
966 | within a display generated by the Derivative Works, if and
967 | wherever such third-party notices normally appear. The contents
968 | of the NOTICE file are for informational purposes only and
969 | do not modify the License. You may add Your own attribution
970 | notices within Derivative Works that You distribute, alongside
971 | or as an addendum to the NOTICE text from the Work, provided
972 | that such additional attribution notices cannot be construed
973 | as modifying the License.
974 |
975 | You may add Your own copyright statement to Your modifications and
976 | may provide additional or different license terms and conditions
977 | for use, reproduction, or distribution of Your modifications, or
978 | for any such Derivative Works as a whole, provided Your use,
979 | reproduction, and distribution of the Work otherwise complies with
980 | the conditions stated in this License.
981 |
982 | 5. Submission of Contributions. Unless You explicitly state otherwise,
983 | any Contribution intentionally submitted for inclusion in the Work
984 | by You to the Licensor shall be under the terms and conditions of
985 | this License, without any additional terms or conditions.
986 | Notwithstanding the above, nothing herein shall supersede or modify
987 | the terms of any separate license agreement you may have executed
988 | with Licensor regarding such Contributions.
989 |
990 | 6. Trademarks. This License does not grant permission to use the trade
991 | names, trademarks, service marks, or product names of the Licensor,
992 | except as required for reasonable and customary use in describing the
993 | origin of the Work and reproducing the content of the NOTICE file.
994 |
995 | 7. Disclaimer of Warranty. Unless required by applicable law or
996 | agreed to in writing, Licensor provides the Work (and each
997 | Contributor provides its Contributions) on an "AS IS" BASIS,
998 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
999 | implied, including, without limitation, any warranties or conditions
1000 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
1001 | PARTICULAR PURPOSE. You are solely responsible for determining the
1002 | appropriateness of using or redistributing the Work and assume any
1003 | risks associated with Your exercise of permissions under this License.
1004 |
1005 | 8. Limitation of Liability. In no event and under no legal theory,
1006 | whether in tort (including negligence), contract, or otherwise,
1007 | unless required by applicable law (such as deliberate and grossly
1008 | negligent acts) or agreed to in writing, shall any Contributor be
1009 | liable to You for damages, including any direct, indirect, special,
1010 | incidental, or consequential damages of any character arising as a
1011 | result of this License or out of the use or inability to use the
1012 | Work (including but not limited to damages for loss of goodwill,
1013 | work stoppage, computer failure or malfunction, or any and all
1014 | other commercial damages or losses), even if such Contributor
1015 | has been advised of the possibility of such damages.
1016 |
1017 | 9. Accepting Warranty or Additional Liability. While redistributing
1018 | the Work or Derivative Works thereof, You may choose to offer,
1019 | and charge a fee for, acceptance of support, warranty, indemnity,
1020 | or other liability obligations and/or rights consistent with this
1021 | License. However, in accepting such obligations, You may act only
1022 | on Your own behalf and on Your sole responsibility, not on behalf
1023 | of any other Contributor, and only if You agree to indemnify,
1024 | defend, and hold each Contributor harmless for any liability
1025 | incurred by, or claims asserted against, such Contributor by reason
1026 | of your accepting any such warranty or additional liability.
1027 |
1028 | END OF TERMS AND CONDITIONS
1029 |
1030 | APPENDIX: How to apply the Apache License to your work.
1031 |
1032 | To apply the Apache License to your work, attach the following
1033 | boilerplate notice, with the fields enclosed by brackets "[]"
1034 | replaced with your own identifying information. (Don't include
1035 | the brackets!) The text should be enclosed in the appropriate
1036 | comment syntax for the file format. We also recommend that a
1037 | file or class name and description of purpose be included on the
1038 | same "printed page" as the copyright notice for easier
1039 | identification within third-party archives.
1040 |
1041 | Copyright [yyyy] [name of copyright owner]
1042 |
1043 | Licensed under the Apache License, Version 2.0 (the "License");
1044 | you may not use this file except in compliance with the License.
1045 | You may obtain a copy of the License at
1046 |
1047 | http://www.apache.org/licenses/LICENSE-2.0
1048 |
1049 | Unless required by applicable law or agreed to in writing, software
1050 | distributed under the License is distributed on an "AS IS" BASIS,
1051 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1052 | See the License for the specific language governing permissions and
1053 | limitations under the License.
1054 |
1055 |
--------------------------------------------------------------------------------