`os`, `numpy`, `matplotlib`, `pickle`, `datetime`, `scipy.io`, `copy`, `torch`, `scipy`, `math`, and `sklearn`. Additionally, to handle the specific datasets listed below, the following are also required: `hdf5storage`, `urllib`, `zipfile`, `gzip` and `shutil`; and to handle tensorboard visualization, also include `glob`, `torchvision`, `operator` and `tensorboardX`.

### Datasets

The different datasets involving graph data that are available in this library are the following.

1. Authorship attribution dataset, available under `datasets/authorshipData` (note that the available `.rar` files have to be uncompressed into `authorshipData.mat` to be able to use that dataset with the provided code). When using this dataset, please cite

S. Segarra, M. Eisen, and A. Ribeiro, "Authorship attribution through function word adjacency networks," _IEEE Trans. Signal Process._, vol. 63, no. 20, pp. 5464-5478, Oct. 2015.

2. The MovieLens-100k dataset. When using this dataset, please cite

F. M. Harper and J. A. Konstan, "[The MovieLens datasets: History and Context](http://dl.acm.org/citation.cfm?id=2827872)", _ACM Trans. Interactive Intell. Syst._, vol. 5, no. 4, pp. 19:(1-19), Jan. 2016.

3. A source localization dataset. This source localization problem generates synthetic data at execution time. The data can be generated on synthetic graphs, such as the Small World graph or the Stochastic Block Model. It can also generate synthetic data on a real Facebook graph. When using the Facebook graph, please cite

J. McAuley and J. Leskovec, "[Learning to discover social circles in Ego networks](http://papers.nips.cc/paper/4532-learning-to-discover-social-circles-in-ego-networks)," in _26th Neural Inform. Process. Syst._ Stateline, TX: NeurIPS Foundation, 3-8 Dec. 2012.

4. A flocking dataset. The problem of flocking consists of controlling a robot swarm, initially flying at random, arbitrary velocities, so that it comes to fly together at the same velocity while avoiding collisions with each other. The task is to do so in a distributed and decentralized manner, where each agent (each robot) computes its control action at every time instant relying only on information obtained from communications with its immediate neighbors. The dataset is synthetic in that it generates different sample trajectories with random initializations. When using this dataset, please cite

F. Gama, E. Tolstaya, and A. Ribeiro, "[Graph Neural Networks for Decentralized Controllers](http://arxiv.org/abs/2003.10280)," _arXiv:2003.10280v1 [cs.LG],_ 23 March 2020.

5. An epidemic dataset. In this problem, we track the spread of an epidemic on a high school friendship network. The epidemic data is generated by using the SIR model to simulate the spread of an infectious disease on the friendship network built from this SocioPatterns dataset. When using this dataset, please cite

L. Ruiz, F. Gama, and A. Ribeiro, "[Gated Graph Recurrent Neural Networks](http://arxiv.org/abs/2002.01038)," submitted to _IEEE Trans. Signal Process._

### Libraries

The `alegnn` package is split up into two sub-packages: `alegnn.modules` and `alegnn.utils`.

* `modules.architectures` contains the implementation of several standard architectures (as `nn.Module` subclasses) so that they can be readily initialized and trained. Details are provided in the [next section](#architectures).

* `modules.architecturesTime` contains the implementation of several standard architectures (as `nn.Module` subclasses) that handle time-dependent topologies, so that they can be readily initialized and trained. Details are provided in the [next section](#architectures).

* `modules.evaluation` contains functions that act as intermediaries between the model and the data in order to evaluate a trained architecture.

* `modules.loss` contains a wrapper for the loss function so that it can adapt to multiple scenarios, as well as the loss function for the F1 score.

* `modules.model` defines a `Model` class that binds together the three basic elements to construct a machine learning model: the (neural network) architecture, the loss function, and the optimizer. Additionally, it assigns a training handler and an evaluator, as well as a name for the model and a directory where to save the trained parameters of the architecture. It is the basic class that can train and evaluate a model, and it also offers methods to save and load parameters.

* `modules.training` contains classes that handle the training of each model, acting as intermediaries between the data and the specific architecture within the model being trained.

* `utils.dataTools` loads each of the datasets described [above](#datasets) as classes with several functionalities particular to each dataset. All the data classes have two methods: `.getSamples` to gather the corresponding samples for the training, validation, or testing sets, and `.evaluate` to compute the corresponding evaluation measure (a minimal usage sketch is shown after this list).

* `utils.graphML` is the main library, containing the implementation of all the available graph neural network layers (as `nn.Module` subclasses). This library is the analogue of `torch.nn`, but for graph-based operations. It contains the definition of the basic layers that need to be put together to build a graph neural network. Details are provided in the [next section](#architectures).

* `utils.graphTools` defines the `Graph` class that handles graph-structure information, and offers several other tools to handle graphs.

* `utils.miscTools` defines some miscellaneous functions.

* `utils.visualTools` contains all the relevant classes and functions to handle visualization in tensorboard.
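
As a minimal sketch of this common data interface (the names `myDataset` and `archit` are hypothetical stand-ins for one of the dataset classes in `utils.dataTools` and a trained architecture; the constructor arguments differ per dataset):

```python
# Hypothetical usage of the shared data-class interface described above.
xTrain, yTrain = myDataset.getSamples('train')  # also 'valid' or 'test'
xTest, yTest = myDataset.getSamples('test')
yHatTest = archit(xTest)                        # forward pass of a trained model
testCost = myDataset.evaluate(yHatTest, yTest)  # dataset-specific measure
```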

### Architectures

In what follows, we describe several ways of parameterizing the filters _Hlfg(S)_ that are implemented in this library.

* ___Convolutional Graph Neural Networks (via Selection)___. The most popular graph neural network (GNN) is the one that parameterizes _Hlfg(S)_ by a linear shift-invariant graph filter, giving rise to a __graph convolution__ (a conceptual sketch of this operation is shown after this discussion). The `nn.Module` subclass that implements the graph filter (convolutional) layer can be found in `utils.graphML.GraphFilter`. This layer is the basic linear layer in the Selection GNN architecture (which also adds the pointwise activation function and the zero-padding pooling operation), which is already implemented in `modules.architectures.SelectionGNN` and shown in several examples. For more details on this graph convolutional layer or its architecture, and whenever using it, please cite the following paper

F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "[Convolutional Neural Network Architectures for Signals Supported on Graphs](http://ieeexplore.ieee.org/document/8579589)," _IEEE Trans. Signal Process._, vol. 67, no. 4, pp. 1034–1049, Feb. 2019.

The `modules.architectures.SelectionGNN` also has a flag called `coarsening` that allows the pooling to be done in terms of graph coarsening, following the Graclus algorithm. This part of the code was mainly adapted to PyTorch from this repository. For more details on graph coarsening, and whenever using the `SelectionGNN` with graph coarsening pooling, please cite the following [paper](http://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf). Also note that setting the number of filter taps (`nFilterTaps`) to `2` on every layer leads to this [architecture](http://openreview.net/forum?id=SJU4ayYgl). Finally, this other [architecture](https://openreview.net/forum?id=ryGs6iA5Km) is obtained by setting the number of filter taps to `1` for each designed fully-connected layer, and then setting it to `2` to complete the corresponding _GIN layer_. There is one further implementation that is entirely local (i.e., it only involves operations exchanging information with one-hop neighbors). This implementation essentially replaces the last fully-connected layer by a readout layer that only operates on the features obtained at each node. The implementation is dubbed `LocalGNN` and is used in the `MovieLens` example.
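
Conceptually, the graph convolution implemented by `utils.graphML.GraphFilter` is a polynomial in the graph shift operator applied to the signal. A minimal single-feature sketch (the actual layer handles multiple input and output features as well as edge features):

```python
import torch

# Sketch of a shift-invariant graph filter (graph convolution):
# y = sum_{k=0}^{K-1} h_k S^k x, for one scalar feature per node.
def graph_convolution(h, S, x):
    # h: (K,) filter taps; S: (N, N) graph shift operator; x: (N,) graph signal
    y = torch.zeros_like(x)
    xk = x.clone()               # S^0 x
    for hk in h:
        y = y + hk * xk          # accumulate h_k S^k x
        xk = S @ xk              # next shift: S^{k+1} x
    return y
```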

* ___Convolutional Graph Neural Networks (via Spectrum)___. The spectral GNN is an early implementation of the convolutional GNN in the graph frequency domain. It does not scale to large graphs due to the cost of the eigendecomposition of the GSO. The spectral filtering layer is implemented as a `nn.Module` subclass in `utils.graphML.SpectralGF`, and the corresponding architecture with these linear layers, together with pointwise nonlinearities, is implemented in `modules.architectures.SpectralGNN`. For more details on the spectral graph filtering layer or its architecture, and whenever using it, please cite

J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "[Spectral networks and deep locally connected networks on graphs](http://openreview.net/forum?id=DQNsQf-UsoDBa)," in _Int. Conf. Learning Representations 2014_. Banff, AB: Assoc. Comput. Linguistics, 14-16 Apr. 2014, pp. 1–14.

* ___Convolutional Graph Neural Networks (via Aggregation)___. An alternative way of implementing a graph convolution is by means of building an aggregation sequence at each node. Instead of thinking of the graph signal as being diffused through the graph, with each diffusion being weighed separately (as is the case of a GCNN via Selection), we think of the signal as being aggregated at each node, by means of successive communications with the one-hop neighbors, where each communication is weighed by a separate filter tap. The key point is that these aggregation sequences exhibit a regular structure that simultaneously takes into account the underlying graph support, since each contiguous element in the sequence represents a contiguous neighborhood (a conceptual sketch follows the citation below). Once we have a regular sequence, we can apply a regular CNN to process its information. This idea is called an Aggregation GNN and is implemented in `modules.architectures.AggregationGNN`, since it relies on the regular convolution and pooling already defined in `torch.nn`. A more sophisticated and powerful variant of the Aggregation GNN, called the __Multi-Node Aggregation GNN__, is also available in `modules.architectures.MultiNodeAggregationGNN`. For more details on the Aggregation GNN, and whenever using it, please cite the following paper

F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "[Convolutional Neural Network Architectures for Signals Supported on Graphs](http://ieeexplore.ieee.org/document/8579589)," _IEEE Trans. Signal Process._, vol. 67, no. 4, pp. 1034–1049, Feb. 2019.
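
As a conceptual sketch of the aggregation sequence on which this architecture rests (the library's implementation handles features, batches, and, in the `_DB` variant, delayed exchanges):

```python
import torch

# Build the aggregation sequence at one node: successive neighborhood
# exchanges S^k x, collected at the chosen node, form a regular (ordered)
# sequence that a standard 1-D CNN can then process.
def aggregation_sequence(S, x, node, nExchanges):
    seq = [x[node]]              # zeroth exchange: the node's own value
    xk = x.clone()
    for _ in range(nExchanges):
        xk = S @ xk              # one more neighborhood exchange
        seq.append(xk[node])     # value aggregated at the node
    return torch.stack(seq)      # shape: (nExchanges + 1,)
```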

* ___Node-Variant Graph Neural Networks___. Parameterizing _Hlfg(S)_ with a node-variant graph filter (as opposed to a shift-invariant graph filter) yields a non-convolutional graph neural network architecture. A node-variant graph filter essentially lets each node learn its own weight for each neighborhood exchange (a conceptual sketch follows the citation below). In order to allow this architecture to scale (so that the number of learnable parameters does not depend on the size of the graph), we offer a hybrid node-variant GNN approach as well. The graph filtering layer using node-variant graph filters is defined in `utils.graphML.NodeVariantGF`, and an example of an architecture using these filters for the linear operation, combined with pointwise activation functions and zero-padding pooling, is available in `modules.architectures.NodeVariantGNN`. For more details on node-variant GNNs, and whenever using these filters or architecture, please cite the following paper

E. Isufi, F. Gama, and A. Ribeiro, "[EdgeNets: Edge Varying Graph Neural Networks](http://arxiv.org/abs/2001.07620)," submitted to _IEEE Trans. Pattern Analysis and Mach. Intell._
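
A conceptual single-feature sketch of the node-variant filter (the exact parameterization, including the hybrid variant, is in `utils.graphML.NodeVariantGF`): instead of one scalar tap per exchange, each tap is a vector with one learnable weight per node.

```python
import torch

# Node-variant graph filter sketch: y = sum_{k=0}^{K-1} diag(h_k) S^k x,
# where each tap h_k has one weight per node.
def node_variant_filter(H, S, x):
    # H: (K, N) per-node filter taps; S: (N, N); x: (N,)
    y = torch.zeros_like(x)
    xk = x.clone()
    for hk in H:                 # hk: (N,)
        y = y + hk * xk          # per-node weighting of the k-th exchange
        xk = S @ xk
    return y
```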

* ___ARMA Graph Neural Networks___. A convolutional architecture that is very flexible and has enlarged descriptive power. It replaces the graph convolution with an FIR filter (i.e., a polynomial of the shift operator) by a ratio of polynomials. This architecture offers a good trade-off between the number of parameters and the selectivity of the learnable filters. The ARMA graph filter layer can be found in `utils.graphML.GraphFilterARMA`. An example of an architecture with ARMA graph filters as the linear layer, together with pointwise activation functions and zero-padding pooling, is available in `modules.architectures.ARMAfilterGNN`. A `Local` version of this architecture is also available. For more details on ARMA GNNs, and whenever using these filters or architecture, please cite the following paper

E. Isufi, F. Gama, and A. Ribeiro, "[EdgeNets: Edge Varying Graph Neural Networks](http://arxiv.org/abs/2001.07620)," submitted to _IEEE Trans. Pattern Analysis and Mach. Intell._

* ___Edge-Variant Graph Neural Networks___. The most general parameterization we can make of a linear operation that also takes into account the underlying graph support is to let each node weigh each of its neighbors' information differently. This is achieved by means of an edge-variant graph filter (a conceptual sketch follows the citation below). Certainly, the edge-variant graph filter has a number of parameters that scales with the number of edges, so a hybrid approach is available. The edge-variant graph filter layer can be found in `utils.graphML.EdgeVariantGF`. An example of an architecture with edge-variant graph filters as the linear layer, together with pointwise activation functions and zero-padding pooling, is available in `modules.architectures.EdgeVariantGNN`. A `Local` version of this architecture is also available. For more details on edge-variant GNNs, and whenever using these filters or architecture, please cite the following paper

E. Isufi, F. Gama, and A. Ribeiro, "[EdgeNets: Edge Varying Graph Neural Networks](http://arxiv.org/abs/2001.07620)," submitted to _IEEE Trans. Pattern Analysis and Mach. Intell._
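
A conceptual sketch of one simple edge-variant formulation (the exact parameterization used by `utils.graphML.EdgeVariantGF`, including the hybrid variant, may differ): each tap is a full matrix masked to the support of the graph, so every node applies a different learnable weight to each of its neighbors.

```python
import torch

# Edge-variant graph filter sketch: successive exchanges where each edge
# carries its own learnable weight at each tap.
def edge_variant_filter(Phi, mask, x):
    # Phi: (K, N, N) learnable taps; mask: (N, N) 0/1 support of S (with
    # self-loops); x: (N,)
    y = torch.zeros_like(x)
    xk = x.clone()
    for PhiK in Phi:
        Sk = PhiK * mask         # keep weights only on existing edges
        y = y + Sk @ xk          # k-th term of the filter
        xk = Sk @ xk             # next (edge-varying) exchange
    return y
```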

* ___Graph Attention Networks___. A particular case of edge-variant graph filters (one that predates the use of more general edge-variant filters) that has been shown to be successful is the graph attention network (commonly known as GAT). The original implementation of GATs can be found in this repository. In this library, we offer a PyTorch adaptation of this code (which was originally written for TensorFlow). The GAT parameterizes the edge-variant graph filter by taking into account both the graph support and the data, yielding an architecture with a number of parameters that is independent of the size of the graph (a conceptual sketch follows the citation below). The graph attentional layer can be found in `utils.graphML.GraphAttentional`, and an example of this architecture in `modules.architectures.GraphAttentionNetwork`. For more details on GATs, and whenever using this code, please cite the following paper

P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "[Graph Attention Networks](http://openreview.net/forum?id=rJXMpikCZ)," in _6th Int. Conf. Learning Representations_. Vancouver, BC: Assoc. Comput. Linguistics, 30 Apr.-3 May 2018, pp. 1–12.
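
A conceptual single-head sketch of the attention mechanism (the library's multi-head, multi-feature version is in `utils.graphML.GraphAttentional`):

```python
import torch
import torch.nn.functional as F

# Single-head GAT attention sketch: scores are computed from the data,
# masked to the graph support, and normalized over each neighborhood.
def gat_attention(Wx, a, adj):
    # Wx: (N, Fout) linearly transformed features; a: (2*Fout,) attention
    # vector; adj: (N, N) adjacency with self-loops.
    Fout = Wx.shape[1]
    src = Wx @ a[:Fout]                        # per-node source contribution
    dst = Wx @ a[Fout:]                        # per-node destination contribution
    e = F.leaky_relu(src.unsqueeze(1) + dst.unsqueeze(0))  # (N, N) raw scores
    e = e.masked_fill(adj == 0, float('-inf'))             # respect the graph
    alpha = torch.softmax(e, dim=1)            # attention within each neighborhood
    return alpha @ Wx                          # attention-weighted aggregation
```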

* ___Local Activation Functions___. Local activation functions exploit the irregular neighborhoods that are inherent to arbitrary graphs. Instead of just applying a pointwise (node-wise) activation function, using a local activation function that carries out a nonlinear operation within a neighborhood has been shown to be effective as well (a conceptual sketch follows the citation below). The corresponding architecture is named `LocalActivationGNN` and is available under `modules/architectures.py`. In particular, in this code, the __median activation function__ is implemented in `utils.graphML.MedianLocalActivation` and the __max activation function__ is implemented in `utils.graphML.MaxLocalActivation`. For more details on local activation functions, and whenever using these operational layers, please cite the following paper

L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro, "[Invariance-Preserving Localized Activation Functions for Graph Neural Networks](https://ieeexplore.ieee.org/document/8911416)," _IEEE Trans. Signal Process._, vol. 68, no. 1, pp. 127-141, Jan. 2020.
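
A conceptual sketch of a local activation (the library's implementations in `utils.graphML.MaxLocalActivation` and `utils.graphML.MedianLocalActivation` are more general, handling multi-hop neighborhoods and features):

```python
import torch

# Max local activation sketch: the nonlinearity acts on each node's
# neighborhood instead of pointwise on each node.
def max_local_activation(x, neighbors):
    # x: (N,) graph signal; neighbors: list of index lists, one per node
    # (each including the node itself).
    return torch.stack([x[nb].max() for nb in neighbors])
```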

* ___Time-Varying Architectures___. The Selection and Aggregation GNNs have versions adapted to handling time-varying graph signals as well as time-varying shift operators, acting with a unit delay between communications with neighbors. These architectures can be found in `architecturesTime.LocalGNN_DB` and `architecturesTime.AggregationGNN_DB` (a usage sketch is shown after the citations below). For more details on these architectures, please see (and if used, please cite)

F. Gama, E. Tolstaya, and A. Ribeiro, "[Graph Neural Networks for Decentralized Controllers](http://arxiv.org/abs/2003.10280)," _arXiv:2003.10280v1 [cs.LG],_ 23 March 2020.

E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, and A. Ribeiro, "[Learning Decentralized Controllers for Robot Swarms with Graph Neural Networks](http://arxiv.org/abs/1903.10527)," in _Conf. Robot Learning 2019._ Osaka, Japan: Int. Found. Robotics Res., 30 Oct.-1 Nov. 2019.
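
A hypothetical instantiation following the `LocalGNN_DB` docstring in `architecturesTime.py` (shapes taken from the forward-call description there; the chosen numbers are arbitrary):

```python
import torch
import torch.nn as nn
from alegnn.modules.architecturesTime import LocalGNN_DB

# Two graph convolutional layers and a local readout on time-varying,
# batched GSOs with delayed communications.
archit = LocalGNN_DB([6, 32, 32],  # dimNodeSignals
                     [3, 3],       # nFilterTaps
                     True,         # bias
                     nn.Tanh,      # nonlinearity
                     [2],          # dimReadout
                     1)            # dimEdgeFeatures

B, T, N = 20, 100, 50              # batch size, time samples, nodes
x = torch.randn(B, T, 6, N)        # batchSize x timeSamples x dimFeatures x N
S = torch.randn(B, T, N, N)        # one GSO per sample and time instant
y = archit(x, S)                   # batchSize x timeSamples x dimReadout[-1] x N
```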

* ___Graph Recurrent Neural Networks___. A graph RNN approximates a time-varying graph process with a hidden Markov model, where the hidden state is learned from data. In a graph RNN, all linear transforms involved are graph filters that respect the graph (a conceptual sketch follows the citation below). This is a highly flexible architecture that exploits the graph structure as well as the time dependencies present in the data. For static graphs, the architecture can be found in `architectures.GraphRecurrentNN`, and in `architectures.GatedGraphRecurrentNN` for the time, node, and edge gated variations. For time-varying graphs, the architecture is `architecturesTime.GraphRecurrentNN_DB`. For more details please see, and when using this architecture please cite,

L. Ruiz, F. Gama, and A. Ribeiro, "[Gated Graph Recurrent Neural Networks](http://arxiv.org/abs/2002.01038)," submitted to _IEEE Trans. Signal Process._
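
Conceptually, the hidden-state recursion looks as follows (a single-tap sketch for brevity; in the library, both transforms are full graph filters, see the `HiddenState` layers in `graphML.py`):

```python
import torch

# GRNN step sketch: both the input-to-state and the state-to-state
# transforms respect the graph through the shift operator S.
def grnn_step(x_t, z_prev, S, a, b, sigma=torch.tanh):
    # x_t: (N,) input at time t; z_prev: (N,) previous hidden state;
    # a, b: scalar filter taps (full filters would use K taps each).
    return sigma(a * (S @ x_t) + b * (S @ z_prev))
```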

### Examples

We have included an in-depth [tutorial](tutorial.ipynb) on a [Jupyter Notebook](http://jupyter.org/). We have also included other examples involving all four datasets presented [above](#datasets), with examples of all the architectures [just](#architectures) discussed.

* [Tutorial](tutorial.ipynb): `tutorial.ipynb`. The tutorial covers the basic mathematical formulation of graph neural networks and considers a small synthetic problem of source localization. It implements the Aggregation and Selection GNNs (both with zero-padding and with graph coarsening). This tutorial explains, in depth, all the elements involved in the setup, training, and evaluation of the models, and serves as a skeleton for all the other examples.

* [Source Localization](examples/sourceLocGNN.py): `sourceLocGNN.py`. This example deals with the source localization problem on a 100-node, 5-community, randomly generated SBM graph (a conceptual sketch of the data generation is shown below). It can consider multiple graph and data realizations to account for randomness in data generation. Implementations of Selection and Aggregation GNNs with different node sampling criteria are presented.
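
A conceptual sketch of how such source-localization samples can be generated (the library's actual generator lives in `utils.dataTools`; the names and details here are illustrative):

```python
import numpy as np

# Diffuse a delta placed at a random source node for a random number of
# steps; the task is to identify the source from the diffused signal.
def source_loc_sample(S, sources, tMax, rng):
    # S: (N, N) graph shift operator; sources: candidate source nodes.
    s = rng.choice(sources)                # pick a source node
    t = rng.integers(0, tMax)              # random diffusion time
    x0 = np.zeros(S.shape[0])
    x0[s] = 1.
    x = np.linalg.matrix_power(S, t) @ x0  # diffused observation
    return x, s
```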

* [MovieLens](examples/movieGNN.py): `movieGNN.py`. This example has the objective of predicting the rating a user would give to a movie, based on the movies they have rated before (following the MovieLens-100k dataset). In this case we present a one- and two-layer Selection GNN with no padding, as well as the one- and two-layer local implementation available in `LocalGNN`.

* [Authorship Attribution](examples/authorshipGNN.py): `authorshipGNN.py`. This example addresses the problem of authorship attribution, by which a text has to be assigned to some author according to their stylometric signature (based on the underlying word adjacency network; details here). In this case, we test different local activation functions (median, max, and pointwise).

* [Flocking](examples/flockingGNN.py): `flockingGNN.py`. This is an example of controlling a robot swarm so that it flies together at the same velocity while avoiding collisions. It is a synthetic dataset where time-dependent architectures can be tested. In particular, we test the use of a linear filter, a Local GNN, an Aggregation GNN, and a GRNN, considering not only samples of the form (S_t, x_t) for each t, but also delayed communications where the information observed from further-away neighbors is actually delayed.

* [Epidemic Tracking](examples/epidemicGRNN.py): `epidemicGRNN.py`. In this example, we compare GRNNs and gated GRNNs in a binary node classification problem modeling the spread of an epidemic on a high school friendship network (a conceptual sketch of the data generation is shown below). The disease is first recorded on day t=0, when each individual node is infected with probability p_seed=0.05. On the days that follow, an infected student can spread the disease to each of their susceptible friends with probability p_inf=0.3 each day. Infected students become immune after 4 days, at which point they can no longer spread or contract the disease. Given the state of each node at some point in time (susceptible, infected, or recovered), the binary node classification problem is to predict whether each node in the network will have the disease (i.e., be infected) 8 days ahead.
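
A minimal sketch of the SIR simulation described above (this is illustrative, not the library's actual generator in `utils.dataTools`; `A` is the 0/1 friendship adjacency matrix):

```python
import numpy as np

# States: 0 = susceptible, 1 = infected, 2 = recovered (immune).
def simulate_sir(A, nDays, pSeed=0.05, pInf=0.3, tImmune=4, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    N = A.shape[0]
    infected = rng.random(N) < pSeed             # day t = 0 seeding
    recovered = np.zeros(N, dtype=bool)
    daysInfected = np.zeros(N, dtype=int)
    states = [np.where(recovered, 2, infected.astype(int))]
    for _ in range(nDays):
        # Each infected friend independently transmits with probability pInf.
        nInfNeighbors = A @ infected.astype(float)
        pCatch = 1. - (1. - pInf) ** nInfNeighbors
        newInf = (rng.random(N) < pCatch) & ~infected & ~recovered
        daysInfected[infected] += 1
        newRec = infected & (daysInfected >= tImmune)
        infected = (infected | newInf) & ~newRec
        recovered |= newRec
        states.append(np.where(recovered, 2, infected.astype(int)))
    return np.stack(states)                      # (nDays + 1, N)
```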

## Version

* ___0.4 (March 5, 2021):___ Added the main file for the epidemic tracking experiment, `epidemicGRNN.py`. Added the edge list from which the graph used in this experiment is built. `dataTools.py` now has an `Epidemics` class which handles the abovementioned graph and the epidemic data. `loss.py` now has a new loss function, which computes the loss corresponding to the F1 score (1 - F1 score). `graphML.py` now has the functional `GatedGRNN` and the layers `HiddenState`, `TimeGatedHiddenState`, `NodeGatedHiddenState`, and `EdgeGatedHiddenState`, which are used to calculate the hidden state of (gated) GRNNs. `architectures.py` now has the architectures `GraphRecurrentNN` and `GatedGraphRecurrentNN`.

* ___0.3 (May 2, 2020):___ Added the time-dependent architectures that handle (graph, graph signal) batch data as well as delayed communications. These architectures can be found in `architecturesTime.py`. A new synthetic dataset has also been added, namely, the one used in the flocking problem. Made the `Model` class the central handler of the whole machine learning model. Training multiple models has been dropped in favor of training through the method offered in the `Model` class. Trainers and evaluators had to be added to act as effective intermediaries between the architectures and the data, especially in problems that are not classification ones (i.e., regression (interpolation) in the movie recommendation setting, and imitation learning in the flocking problem). This should give flexibility to carry over these architectures to new problems, as well as make prototyping easier, since training and evaluating have been greatly simplified. Minor modifications and eventual bug fixes have been made here and there.

* ___0.2 (Dec 16, 2019):___ Added new architectures: `LocalActivationGNN` and `LocalGNN`. Added a new loss module to handle the logic that gives flexibility to the loss function. Moved the node ordering from being external to the architecture to being internal to it. Added two new methods, `.splitForward()` and `.changeGSO()`, to separate the output of the graph layers from that of the MLP, and to change the GSO from training to test time, respectively. The `Model` class does not keep track of the order anymore. Got rid of MATLAB(R) support. Better memory management (do not move the entire dataset to memory, only the batch). Created methods to normalize data and change the data type. Deleted the 20News dataset, which is not supported anymore. Added the method `.expandDims()` to the `data` class for increased flexibility. Changed the evaluate method so that it is always a decreasing function. Totally revamped the `MovieLens` class. Corrected a bug in the `computeNeighborhood()` function (thanks to Bianca Iancu, A (dot) Iancu-1 (at) student (dot) tudelft (dot) nl and Gabriele Mazzola, G (dot) Mazzola (at) student (dot) tudelft (dot) nl for spotting it). Corrected bugs in the device handling of local activation functions. Updated the tutorial.

* ___0.1 (Jul 12, 2019):___ First released (beta) version of this graph neural network library. It includes the basic convolutional graph neural networks (selection, with both zero-padding and graph coarsening; spectral; and aggregation), and some non-convolutional graph neural networks as well (node-variant, edge-variant, and graph attention networks). It also includes local activation functions (max and median). In terms of examples, it considers the source localization problem (both in the tutorial and in a separate example), the movie recommendation problem, the authorship attribution problem, and the text categorization problem. In terms of structure, it sets the basis for data handling and training of multiple models.

--------------------------------------------------------------------------------
/aggGNN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alelab-upenn/graph-neural-networks/a84a39fabad5378bdcbaad20b5dbcff14b4eebcd/aggGNN.png
--------------------------------------------------------------------------------
/alegnn/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alelab-upenn/graph-neural-networks/a84a39fabad5378bdcbaad20b5dbcff14b4eebcd/alegnn/__init__.py
--------------------------------------------------------------------------------
/alegnn/modules/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alelab-upenn/graph-neural-networks/a84a39fabad5378bdcbaad20b5dbcff14b4eebcd/alegnn/modules/__init__.py
--------------------------------------------------------------------------------
/alegnn/modules/architecturesTime.py:
--------------------------------------------------------------------------------
1 | # 2019/12/31~
2 | # Fernando Gama, fgama@seas.upenn.edu
3 | # Luana Ruiz, rubruiz@seas.upenn.edu
4 | # Kate Tolstaya, eig@seas.upenn.edu
5 | """
6 | architecturesTime.py Architectures module
7 |
8 | Definition of GNN architectures. The basic idea of these architectures is that
9 | the data comes in the form {(S_t, x_t)} where the shift operator as well as the
10 | signal change with time, and where each training point consists of a trajectory.
11 | Unlike architectures.py, where the shift operator S is fixed (although it can
12 | be changed after the architecture has been initialized) and the training set
13 | consists of a set of {x_b} with b=1,...,B for a total of B samples, here the
14 | training set is assumed to be a trajectory, and to include a different shift
15 | operator for each sample {(S_t, x_t)_{t=1}^{T}}_{b=1,...,B}. Also, all
16 | implementations consider a unit delay exchange (i.e. the S_t and x_t values
17 | get delayed by one unit of time for each neighboring exchange).
18 |
19 | LocalGNN_DB: implements the selection GNN architecture by means of local
20 | operations only
21 | GraphRecurrentNN_DB: implements the GRNN architecture
22 | AggregationGNN_DB: implements the aggregation GNN architecture
23 | """
24 |
25 | import numpy as np
26 | import torch
27 | import torch.nn as nn
28 |
29 | import alegnn.utils.graphML as gml
30 |
31 | zeroTolerance = 1e-9 # Absolute values below this number are considered zero.
32 |
33 | class LocalGNN_DB(nn.Module):
34 | """
35 | LocalGNN_DB: implements the local GNN architecture where all operations are
36 | implemented locally, i.e. by means of neighboring exchanges only. More
37 | specifically, it has graph convolutional layers, but the readout layer,
38 | instead of being an MLP for the entire graph signal, is a linear
39 | combination of the features at each node. It considers signals
40 | that change in time with batch GSOs.
41 |
42 | Initialization:
43 |
44 | LocalGNN_DB(dimNodeSignals, nFilterTaps, bias, # Graph Filtering
45 | nonlinearity, # Nonlinearity
46 | dimReadout, # Local readout layer
47 | dimEdgeFeatures) # Structure
48 |
49 | Input:
50 | /** Graph convolutional layers **/
51 | dimNodeSignals (list of int): dimension of the signals at each layer
52 | (i.e. number of features at each node, or size of the vector
53 | supported at each node)
54 | nFilterTaps (list of int): number of filter taps on each layer
55 | (i.e. nFilterTaps-1 is the extent of neighborhoods that are
56 | reached, for example K=2 is info from the 1-hop neighbors)
57 | bias (bool): include bias after graph filter on every layer
58 | >> Obs.: dimNodeSignals[0] is the number of features (the dimension
59 | of the node signals) of the data, where dimNodeSignals[l] is the
60 | dimension obtained at the output of layer l, l=1,...,L.
61 | Therefore, for L layers, len(dimNodeSignals) = L+1. Slightly
62 | different, nFilterTaps[l] is the number of filter taps for the
63 | filters implemented at layer l+1, thus len(nFilterTaps) = L.
64 |
65 | /** Activation function **/
66 | nonlinearity (torch.nn): module from torch.nn non-linear activations
67 |
68 | /** Readout layers **/
69 | dimReadout (list of int): number of output hidden units of a
70 | sequence of fully connected layers applied locally at each node
71 | (i.e. no exchange of information involved).
72 |
73 | /** Graph structure **/
74 | dimEdgeFeatures (int): number of edge features
75 |
76 | Output:
77 | nn.Module with a Local GNN architecture with the above specified
78 | characteristics that considers time-varying batch GSO and delayed
79 | signals
80 |
81 | Forward call:
82 |
83 | LocalGNN_DB(x, S)
84 |
85 | Input:
86 | x (torch.tensor): input data of shape
87 | batchSize x timeSamples x dimFeatures x numberNodes
88 | GSO (torch.tensor): graph shift operator; shape
89 | batchSize x timeSamples (x dimEdgeFeatures)
90 | x numberNodes x numberNodes
91 |
92 | Output:
93 | y (torch.tensor): output data after being processed by the GNN;
94 | batchSize x timeSamples x dimReadout[-1] x numberNodes
95 |
96 | Other methods:
97 |
98 | y, yGNN = .splitForward(x, S): gives the output of the entire GNN y,
99 | which has shape batchSize x timeSamples x dimReadout[-1] x numberNodes,
100 | as well as the output of all the GNN layers (i.e. before the readout
101 | layers), yGNN of shape batchSize x timeSamples x dimFeatures[-1]
102 | x numberNodes. This can be used to isolate the effect of the graph
103 | convolutions from the effect of the readout layer.
104 |
105 | y = .singleNodeForward(x, S, nodes): outputs the value of the last
106 | layer at a single node. x is the usual input of shape batchSize
107 | x timeSamples x dimFeatures x numberNodes. nodes is either a single
108 | node (int) or a collection of nodes (list or numpy.array) of length
109 | batchSize, where for each element in the batch, we get the output at
110 | the single specified node. The output y is of shape batchSize
111 | x timeSamples x dimReadout[-1].
112 | """
113 |
114 | def __init__(self,
115 | # Graph filtering
116 | dimNodeSignals, nFilterTaps, bias,
117 | # Nonlinearity
118 | nonlinearity,
119 | # MLP in the end
120 | dimReadout,
121 | # Structure
122 | dimEdgeFeatures):
123 | # Initialize parent:
124 | super().__init__()
125 | # dimNodeSignals should be a list of size 1 more than nFilterTaps.
126 | assert len(dimNodeSignals) == len(nFilterTaps) + 1
127 |
128 | # Store the values (using the notation in the paper):
129 | self.L = len(nFilterTaps) # Number of graph filtering layers
130 | self.F = dimNodeSignals # Features
131 | self.K = nFilterTaps # Filter taps
132 | self.E = dimEdgeFeatures # Number of edge features
133 | self.bias = bias # Boolean
134 | # Store the rest of the variables
135 | self.sigma = nonlinearity
136 | self.dimReadout = dimReadout
137 | # And now, we're finally ready to create the architecture:
138 | #\\\ Graph filtering layers \\\
139 | # OBS.: We could join this for loop with the one before, but we keep it
140 | # separate for clarity of code.
141 | gfl = [] # Graph Filtering Layers
142 | for l in range(self.L):
143 | #\\ Graph filtering stage:
144 | gfl.append(gml.GraphFilter_DB(self.F[l], self.F[l+1], self.K[l],
145 | self.E, self.bias))
146 | #\\ Nonlinearity
147 | gfl.append(self.sigma())
148 | # And now feed them into the sequential
149 | self.GFL = nn.Sequential(*gfl) # Graph Filtering Layers
150 | #\\\ MLP (Fully Connected Layers) \\\
151 | fc = []
152 | if len(self.dimReadout) > 0: # Maybe we don't want to readout anything
153 | # The first layer has to connect whatever was left of the graph
154 | # filtering stage to create the number of features required by
155 | # the readout layer
156 | fc.append(nn.Linear(self.F[-1], dimReadout[0], bias = self.bias))
157 | # The last linear layer cannot be followed by nonlinearity, because
158 | # usually, this nonlinearity depends on the loss function (for
159 | # instance, if we have a classification problem, this nonlinearity
160 | # is already handled by the cross entropy loss or we add a softmax.)
161 | for l in range(len(dimReadout)-1):
162 | # Add the nonlinearity because there's another linear layer
163 | # coming
164 | fc.append(self.sigma())
165 | # And add the linear layer
166 | fc.append(nn.Linear(dimReadout[l], dimReadout[l+1],
167 | bias = self.bias))
168 | # And we're done
169 | self.Readout = nn.Sequential(*fc)
170 | # so we finally have the architecture.
171 |
172 | def splitForward(self, x, S):
173 |
174 | # Check the dimensions of the input
175 | # S: B x T (x E) x N x N
176 | # x: B x T x F[0] x N
177 | assert len(S.shape) == 4 or len(S.shape) == 5
178 | if len(S.shape) == 4:
179 | S = S.unsqueeze(2)
180 | B = S.shape[0]
181 | T = S.shape[1]
182 | assert S.shape[2] == self.E
183 | N = S.shape[3]
184 | assert S.shape[4] == N
185 |
186 | assert len(x.shape) == 4
187 | assert x.shape[0] == B
188 | assert x.shape[1] == T
189 | assert x.shape[2] == self.F[0]
190 | assert x.shape[3] == N
191 |
192 | # Add the GSO at each layer
193 | for l in range(self.L):
194 | self.GFL[2*l].addGSO(S)
195 | # Let's call the graph filtering layer
196 | yGFL = self.GFL(x)
197 | # Change the order, for the readout
198 | y = yGFL.permute(0, 1, 3, 2) # B x T x N x F[-1]
199 | # And, feed it into the Readout layer
200 | y = self.Readout(y) # B x T x N x dimReadout[-1]
201 | # Reshape and return
202 | return y.permute(0, 1, 3, 2), yGFL
203 | # B x T x dimReadout[-1] x N, B x T x dimFeatures[-1] x N
204 |
205 | def forward(self, x, S):
206 |
207 | # Most of the time, we just need the actual, last output. But, since in
208 | # this case, we also want to compare with the output of the GNN itself,
209 | # we need to create this other forward function that takes both outputs
210 | # (the GNN and the MLP) and returns only the MLP output in the proper
211 | # forward function.
212 | output, _ = self.splitForward(x, S)
213 |
214 | return output
215 |
216 | def singleNodeForward(self, x, S, nodes):
217 |
218 | # x is of shape B x T x F[0] x N
219 | batchSize = x.shape[0]
220 | N = x.shape[3]
221 |
222 | # nodes is either an int, or a list/np.array of ints of size B
223 | assert type(nodes) is int \
224 | or type(nodes) is list \
225 | or type(nodes) is np.ndarray
226 |
227 | # Let us start by building the selection matrix
228 | # This selection matrix has to be a matrix of shape
229 | # B x 1 x N[-1] x 1
230 | # so that when multiplying with the output of the forward, we get a
231 | # B x T x dimReadout[-1] x 1
232 | # and we just squeeze the last dimension
233 |
234 | # TODO: The big question here is if multiplying by a matrix is faster
235 | # than doing torch.index_select
236 |
237 | # Let's always work with numpy arrays to make it easier.
238 | if type(nodes) is int:
239 | # Change the node number to accommodate the new order
240 | nodes = self.order.index(nodes)
241 | # If it's int, make it a list and an array
242 | nodes = np.array([nodes], dtype=int)
243 | # And repeat for the number of batches
244 | nodes = np.tile(nodes, batchSize)
245 | if type(nodes) is list:
246 | newNodes = [self.order.index(n) for n in nodes]
247 | nodes = np.array(newNodes, dtype = int)
248 | elif type(nodes) is np.ndarray:
249 | newNodes = np.array([np.where(np.array(self.order) == n)[0][0] \
250 | for n in nodes])
251 | nodes = newNodes.astype(int)
252 | # Now, nodes is an integer np.ndarray with shape batchSize
253 |
254 | # Build the selection matrix
255 | selectionMatrix = np.zeros([batchSize, 1, N, 1])
256 | selectionMatrix[np.arange(batchSize), 0, nodes, 0] = 1.
257 | # And convert it to a tensor
258 | selectionMatrix = torch.tensor(selectionMatrix,
259 | dtype = x.dtype,
260 | device = x.device)
261 |
262 | # Now compute the output
263 | y = self.forward(x, S)
264 | # This output is of size B x T x dimReadout[-1] x N
265 |
266 | # Multiply the output
267 | y = torch.matmul(y, selectionMatrix)
268 | # B x T x dimReadout[-1] x 1
269 |
270 | # Squeeze the last dimension and return
271 | return y.squeeze(3)
272 |
273 | class GraphRecurrentNN_DB(nn.Module):
274 | """
275 | GraphRecurrentNN_DB: implements the GRNN architecture on a time-varying GSO
276 | batch and delayed signals. It is a single-layer GRNN, and the hidden
277 | state is initialized at random, drawn from a standard Gaussian.
278 |
279 | Initialization:
280 |
281 | GraphRecurrentNN_DB(dimInputSignals, dimOutputSignals,
282 | dimHiddenSignals, nFilterTaps, bias, # Filtering
283 | nonlinearityHidden, nonlinearityOutput,
284 | nonlinearityReadout, # Nonlinearities
285 | dimReadout, # Local readout layer
286 | dimEdgeFeatures) # Structure
287 |
288 | Input:
289 | /** Graph convolutions **/
290 | dimInputSignals (int): dimension of the input signals
291 | dimOutputSignals (int): dimension of the output signals
292 | dimHiddenSignals (int): dimension of the hidden state
293 | nFilterTaps (list of int): a list with two elements, the first one
294 | is the number of filter taps for the filters in the hidden
295 | state equation, the second one is the number of filter taps
296 | for the filters in the output
297 | bias (bool): include bias after graph filter on every layer
298 |
299 | /** Activation functions **/
300 | nonlinearityHidden (torch.function): the nonlinearity to apply
301 | when computing the hidden state; it has to be a torch function,
302 | not a nn.Module
303 | nonlinearityOutput (torch.function): the nonlinearity to apply when
304 | computing the output signal; it has to be a torch function, not
305 | a nn.Module.
306 | nonlinearityReadout (nn.Module): the nonlinearity to apply at the
307 | end of the readout layer (if the readout layer has more than
308 | one layer); this one has to be a nn.Module, instead of just a
309 | torch function.
310 |
311 | /** Readout layer **/
312 | dimReadout (list of int): number of output hidden units of a
313 | sequence of fully connected layers applied locally at each node
314 | (i.e. no exchange of information involved).
315 |
316 | /** Graph structure **/
317 | dimEdgeFeatures (int): number of edge features
318 |
319 | Output:
320 | nn.Module with a GRNN architecture with the above specified
321 | characteristics that considers time-varying batch GSO and delayed
322 | signals
323 |
324 | Forward call:
325 |
326 | GraphRecurrentNN_DB(x, S)
327 |
328 | Input:
329 | x (torch.tensor): input data of shape
330 | batchSize x timeSamples x dimInputSignals x numberNodes
331 | GSO (torch.tensor): graph shift operator; shape
332 | batchSize x timeSamples (x dimEdgeFeatures)
333 | x numberNodes x numberNodes
334 |
335 | Output:
336 | y (torch.tensor): output data after being processed by the GRNN;
337 | batchSize x timeSamples x dimReadout[-1] x numberNodes
338 |
339 | Other methods:
340 |
341 | y, yGNN = .splitForward(x, S): gives the output of the entire GRNN y,
342 | which has shape batchSize x timeSamples x dimReadout[-1] x numberNodes,
343 | as well as the output of the GRNN (i.e. before the readout layers),
344 | yGNN of shape batchSize x timeSamples x dimInputSignals x numberNodes.
345 | This can be used to isolate the effect of the graph convolutions from
346 | the effect of the readout layer.
347 |
348 | y = .singleNodeForward(x, S, nodes): outputs the value of the last
349 | layer at a single node. x is the usual input of shape batchSize
350 | x timeSamples x dimInputSignals x numberNodes. nodes is either a single
351 | node (int) or a collection of nodes (list or numpy.array) of length
352 | batchSize, where for each element in the batch, we get the output at
353 | the single specified node. The output y is of shape batchSize
354 | x timeSamples x dimReadout[-1].
355 | """
356 | def __init__(self,
357 | # Graph filtering
358 | dimInputSignals,
359 | dimOutputSignals,
360 | dimHiddenSignals,
361 | nFilterTaps, bias,
362 | # Nonlinearities
363 | nonlinearityHidden,
364 | nonlinearityOutput,
365 | nonlinearityReadout, # nn.Module
366 | # Local MLP in the end
367 | dimReadout,
368 | # Structure
369 | dimEdgeFeatures):
370 | # Initialize parent:
371 | super().__init__()
372 |
373 | # A list of two ints: the number of filter taps for the filters in the
374 | # hidden state equation, and for the filters in the output equation
375 | assert len(nFilterTaps) == 2
376 |
377 | # Store the values (using the notation in the paper):
378 | self.F = dimInputSignals # Number of input features
379 | self.G = dimOutputSignals # Number of output features
380 | self.H = dimHiddenSignals # Number of hidden features
381 | self.K = nFilterTaps # Filter taps
382 | self.E = dimEdgeFeatures # Number of edge features
383 | self.bias = bias # Boolean
384 | # Store the rest of the variables
385 | self.sigma = nonlinearityHidden
386 | self.rho = nonlinearityOutput
387 | self.nonlinearityReadout = nonlinearityReadout
388 | self.dimReadout = dimReadout
389 | #\\\ Hidden State RNN \\\
390 | # Create the layer that generates the hidden state, and generate z0
391 | self.hiddenState = gml.HiddenState_DB(self.F, self.H, self.K[0],
392 | nonlinearity = self.sigma, E = self.E,
393 | bias = self.bias)
394 | #\\\ Output Graph Filters \\\
395 | self.outputState = gml.GraphFilter_DB(self.H, self.G, self.K[1],
396 | E = self.E, bias = self.bias)
397 | #\\\ MLP (Fully Connected Layers) \\\
398 | fc = []
399 | if len(self.dimReadout) > 0: # Maybe we don't want to readout anything
400 | # The first layer has to connect whatever was left of the graph
401 | # filtering stage to create the number of features required by
402 | # the readout layer
403 | fc.append(nn.Linear(self.G, dimReadout[0], bias = self.bias))
404 | # The last linear layer cannot be followed by nonlinearity, because
405 | # usually, this nonlinearity depends on the loss function (for
406 | # instance, if we have a classification problem, this nonlinearity
407 | # is already handled by the cross entropy loss or we add a softmax.)
408 | for l in range(len(dimReadout)-1):
409 | # Add the nonlinearity because there's another linear layer
410 | # coming
411 | fc.append(self.nonlinearityReadout())
412 | # And add the linear layer
413 | fc.append(nn.Linear(dimReadout[l], dimReadout[l+1],
414 | bias = self.bias))
415 | # And we're done
416 | self.Readout = nn.Sequential(*fc)
417 | # so we finally have the architecture.
418 |
419 | def splitForward(self, x, S):
420 |
421 | # Check the dimensions of the input
422 | # S: B x T (x E) x N x N
423 | # x: B x T x F[0] x N
424 | assert len(S.shape) == 4 or len(S.shape) == 5
425 | if len(S.shape) == 4:
426 | S = S.unsqueeze(2)
427 | B = S.shape[0]
428 | T = S.shape[1]
429 | assert S.shape[2] == self.E
430 | N = S.shape[3]
431 | assert S.shape[4] == N
432 |
433 | assert len(x.shape) == 4
434 | assert x.shape[0] == B
435 | assert x.shape[1] == T
436 | assert x.shape[2] == self.F
437 | assert x.shape[3] == N
438 |
439 | # This can be generated here or outside of here; it is not yet clear
440 | # which is the most coherent option
441 | z0 = torch.randn((B, self.H, N), device = x.device)
442 |
443 | # Add the GSO for each graph filter
444 | self.hiddenState.addGSO(S)
445 | self.outputState.addGSO(S)
446 |
447 | # Compute the trajectory of hidden states
448 | z, _ = self.hiddenState(x, z0)
449 | # Compute the output trajectory from the hidden states
450 | yOut = self.outputState(z)
451 | yOut = self.rho(yOut) # Don't forget the nonlinearity!
452 | # B x T x G x N
453 | # Change the order, for the readout
454 | y = yOut.permute(0, 1, 3, 2) # B x T x N x G
455 | # And, feed it into the Readout layer
456 | y = self.Readout(y) # B x T x N x dimReadout[-1]
457 | # Reshape and return
458 | return y.permute(0, 1, 3, 2), yOut
459 | # B x T x dimReadout[-1] x N, B x T x dimFeatures[-1] x N
460 |
461 | def forward(self, x, S):
462 |
463 | # Most of the time, we just need the actual, last output. But, since in
464 | # this case, we also want to compare with the output of the GNN itself,
465 | # we need to create this other forward function that takes both outputs
466 | # (the GNN and the MLP) and returns only the MLP output in the proper
467 | # forward function.
468 | output, _ = self.splitForward(x, S)
469 |
470 | return output
471 |
472 | def singleNodeForward(self, x, S, nodes):
473 |
474 | # x is of shape B x T x F[0] x N
475 | batchSize = x.shape[0]
476 | N = x.shape[3]
477 |
478 | # nodes is either an int, or a list/np.array of ints of size B
479 | assert type(nodes) is int \
480 | or type(nodes) is list \
481 | or type(nodes) is np.ndarray
482 |
483 | # Let us start by building the selection matrix
484 | # This selection matrix has to be a matrix of shape
485 | # B x 1 x N[-1] x 1
486 | # so that when multiplying with the output of the forward, we get a
487 | # B x T x dimReadout[-1] x 1
488 | # and we just squeeze the last dimension
489 |
490 | # TODO: The big question here is if multiplying by a matrix is faster
491 | # than doing torch.index_select
492 |
493 | # Let's always work with numpy arrays to make it easier.
494 | if type(nodes) is int:
495 | # Change the node number to accommodate the new order
496 | nodes = self.order.index(nodes)
497 | # If it's int, make it a list and an array
498 | nodes = np.array([nodes], dtype=int)
499 | # And repeat for the number of batches
500 | nodes = np.tile(nodes, batchSize)
501 | if type(nodes) is list:
502 | newNodes = [self.order.index(n) for n in nodes]
503 | nodes = np.array(newNodes, dtype = int)
504 | elif type(nodes) is np.ndarray:
505 | newNodes = np.array([np.where(np.array(self.order) == n)[0][0] \
506 | for n in nodes])
507 | nodes = newNodes.astype(int)
508 | # Now, nodes is an integer np.ndarray with shape batchSize
509 |
510 | # Build the selection matrix
511 | selectionMatrix = np.zeros([batchSize, 1, N, 1])
512 | selectionMatrix[np.arange(batchSize), 0, nodes, 0] = 1.
513 | # And convert it to a tensor
514 | selectionMatrix = torch.tensor(selectionMatrix,
515 | dtype = x.dtype,
516 | device = x.device)
517 |
518 | # Now compute the output
519 | y = self.forward(x, S)
520 | # This output is of size B x T x dimReadout[-1] x N
521 |
522 | # Multiply the output
523 | y = torch.matmul(y, selectionMatrix)
524 | # B x T x dimReadout[-1] x 1
525 |
526 | # Squeeze the last dimension and return
527 | return y.squeeze(3)
528 |
529 | class AggregationGNN_DB(nn.Module):
530 | """
531 | AggregationGNN_DB: implements the aggregation GNN architecture with delayed
532 | time structure and batch GSOs
533 |
534 | Initialization:
535 |
536 | Input:
537 | /** Regular convolutional layers **/
538 | dimFeatures (list of int): number of features on each layer
539 | nFilterTaps (list of int): number of filter taps on each layer
540 | bias (bool): include bias after graph filter on every layer
541 | >> Obs.: dimFeatures[0] is the number of features (the dimension
542 | of the node signals) of the data, where dimFeatures[l] is the
543 | dimension obtained at the output of layer l, l=1,...,L.
544 | Therefore, for L layers, len(dimFeatures) = L+1. Slightly
545 | different, nFilterTaps[l] is the number of filter taps for the
546 | filters implemented at layer l+1, thus len(nFilterTaps) = L.
547 |
548 | /** Activation function **/
549 | nonlinearity (torch.nn): module from torch.nn non-linear activations
550 |
551 | /** Pooling **/
552 | poolingFunction (torch.nn): module from torch.nn pooling layers
553 | poolingSize (list of int): size of the neighborhood to compute the
554 | summary from at each layer
555 |
556 | /** Readout layer **/
557 | dimReadout (list of int): number of output hidden units of a
558 | sequence of fully connected layers after the filters have
559 | been applied
560 |
561 | /** Graph structure **/
562 | dimEdgeFeatures (int): number of edge features
563 | nExchanges (int): maximum number of neighborhood exchanges
564 |
565 | Output:
566 | nn.Module with an Aggregation GNN architecture with the above
567 | specified characteristics.
568 |
569 | Forward call:
570 |
571 | Input:
572 | x (torch.tensor): input data of shape
573 | batchSize x timeSamples x dimFeatures x numberNodes
574 | GSO (torch.tensor): graph shift operator of shape
575 | batchSize x timeSamples (x dimEdgeFeatures)
576 | x numberNodes x numberNodes
577 |
578 | Output:
579 | y (torch.tensor): output data after being processed by the selection
580 | GNN; shape: batchSize x timeSamples x dimReadout[-1] x nNodes
581 | """
582 | def __init__(self,
583 | # Graph filtering
584 | dimFeatures, nFilterTaps, bias,
585 | # Nonlinearity
586 | nonlinearity,
587 | # Pooling
588 | poolingFunction, poolingSize,
589 | # MLP in the end
590 | dimReadout,
591 | # Structure
592 | dimEdgeFeatures, nExchanges):
593 | super().__init__()
594 | # dimFeatures should be a list of size 1 more than nFilterTaps.
595 | assert len(dimFeatures) == len(nFilterTaps) + 1
596 | # poolingSize also has to be a list of the same size
597 | assert len(poolingSize) == len(nFilterTaps)
598 | # Check whether the GSO has features or not. After that, always handle
599 | # it as a matrix of dimension E x N x N.
600 |
601 | # Store the values (using the notation in the paper):
602 | self.L = len(nFilterTaps) # Number of convolutional layers
603 | self.F = dimFeatures # Features
604 | self.K = nFilterTaps # Filter taps
605 | self.E = dimEdgeFeatures # Dimension of edge features
606 | self.bias = bias # Boolean
607 | self.sigma = nonlinearity
608 | self.rho = poolingFunction
609 | self.alpha = poolingSize # This acts as both the kernel_size and the
610 | # stride, so there is no overlap on the elements over which we take
611 | # the maximum (this is how it works as default)
612 | self.dimReadout = dimReadout
613 | self.nExchanges = nExchanges # Number of exchanges
614 | # Let's also record the number of nodes on each layer (L+1, actually)
615 | self.N = [self.nExchanges+1] # If we have one exchange, then we have
616 | # two entries in the collected vector (the zeroth exchange and the
617 | # first exchange)
618 | for l in range(self.L):
619 | # In pyTorch, the convolution is a valid correlation, instead of a
620 | # full one, which means that the output is smaller than the input.
621 | # Precisely, it is this much smaller (check the documentation for nn.Conv1d)
622 | outConvN = self.N[l] - (self.K[l] - 1) # Size of the conv output
623 | # The next equation to compute the number of nodes is obtained from
624 | # the maxPool1d help in the pytorch documentation
625 | self.N += [int(
626 | (outConvN - (self.alpha[l]-1) - 1)/self.alpha[l] + 1
627 | )]
628 | # int() on a float always applies floor()
629 |
630 | # And now, we're finally ready to create the architecture:
631 | #\\\ Graph filtering layers \\\
632 | # OBS.: We could join this for loop with the one before, but we keep it
633 | # separate for clarity of code.
634 | convl = [] # Convolutional Layers
635 | for l in range(self.L):
636 | #\\ Graph filtering stage:
637 | convl.append(nn.Conv1d(self.F[l]*self.E,
638 | self.F[l+1]*self.E,
639 | self.K[l],
640 | bias = self.bias))
641 | #\\ Nonlinearity
642 | convl.append(self.sigma())
643 | #\\ Pooling
644 | convl.append(self.rho(self.alpha[l]))
645 | # And now feed them into the sequential
646 | self.ConvLayers = nn.Sequential(*convl) # Convolutional layers
647 | #\\\ MLP (Fully Connected Layers) \\\
648 | fc = []
649 | if len(self.dimReadout) > 0: # Maybe we don't want to MLP anything
650 | # The first layer has to connect whatever was left of the graph
651 | # signal, flattened.
652 | dimInputReadout = self.N[-1] * self.F[-1] * self.E
653 | # (i.e., we have N[-1] nodes left, each one described by F[-1]
654 | # features which means this will be flattened into a vector of size
655 | # N[-1]*F[-1])
656 | fc.append(nn.Linear(dimInputReadout,dimReadout[0],bias=self.bias))
657 | # The last linear layer cannot be followed by nonlinearity, because
658 | # usually, this nonlinearity depends on the loss function (for
659 | # instance, if we have a classification problem, this nonlinearity
660 | # is already handled by the cross entropy loss or we add a softmax.)
661 | for l in range(len(dimReadout)-1):
662 | # Add the nonlinearity because there's another linear layer
663 | # coming
664 | fc.append(self.sigma())
665 | # And add the linear layer
666 | fc.append(nn.Linear(dimReadout[l], dimReadout[l+1],
667 | bias = self.bias))
668 | # And we're done within each node
669 | self.Readout = nn.Sequential(*fc)
670 |
671 | def forward(self, x, S):
672 |
673 | # Check the dimensions of the input first
674 | # S: B x T (x E) x N x N
675 | # x: B x T x F[0] x N
676 | assert len(S.shape) == 4 or len(S.shape) == 5
677 | if len(S.shape) == 4:
678 | # Then S is B x T x N x N
679 | S = S.unsqueeze(2) # And we want it B x T x 1 x N x N
680 | B = S.shape[0]
681 | T = S.shape[1]
682 | assert S.shape[2] == self.E
683 | N = S.shape[3]
684 | assert S.shape[4] == N
685 | # Check the dimensions of x
686 | assert len(x.shape) == 4
687 | assert x.shape[0] == B
688 | assert x.shape[1] == T
689 | assert x.shape[2] == self.F[0]
690 | assert x.shape[3] == N
691 |
692 | # Now we need to do the exchange to build the aggregation vector at
693 | # every node
694 | # z has to be of shape: B x T x F[0] x (nExchanges+1) x N
695 | # to be fed into conv1d it has to be (B*T*N) x F[0] x (nExchanges+1)
696 |
697 | # This vector is built by multiplying x with S, so we need to adapt x
698 | # to have a dimension that can be multiplied by S (we need to add the
699 | # E dimension)
700 | x = x.reshape([B, T, 1, self.F[0], N]).repeat(1, 1, self.E, 1, 1)
701 |
702 | # The first element of z is, precisely, this element (no exchanges)
703 | z = x.reshape([B, T, 1, self.E, self.F[0], N]) # The new dimension is
704 | # the one that accumulates the nExchanges
705 |
706 | # Now we start with the exchanges (multiplying by S)
707 | for k in range(1, self.nExchanges+1):
708 | # Across dim = 1 (time) we need to "displace the dimension down",
709 | # i.e. where it used to be t = 1 we now need it to be t=0 and so
710 | # on. For t=0 we add a "row" of zeros.
711 | x, _ = torch.split(x, [T-1, 1], dim = 1)
712 | # The second part is the most recent time instant which we do
713 | # not need anymore (it's used only once for the first value of K)
714 | # Now, we need to add a "row" of zeros at the beginning (for t = 0)
715 | zeroRow = torch.zeros(B, 1, self.E, self.F[0], N,
716 | dtype=x.dtype,device=x.device)
717 | x = torch.cat((zeroRow, x), dim = 1)
718 | # And now we multiply with S
719 | x = torch.matmul(x, S)
720 | # Add the dimension along K
721 | xS = x.reshape(B, T, 1, self.E, self.F[0], N)
722 | # And concatenate it with z
723 | z = torch.cat((z, xS), dim = 2)
724 |
725 | # Now, we have finally built the vector of delayed aggregations. This
726 | # vector has shape B x T x (nExchanges+1) x E x F[0] x N
727 | # To get rid of the edge features (dim E) we just sum through that
728 | # dimension
729 | z = torch.sum(z, dim = 3) # B x T x (nExchanges+1) x F[0] x N
730 | # It is, essentially, a matrix of N x (nExchanges+1) for each feature,
731 | # for each time instant, for each batch.
732 | # NOTE1: This is inconsequential if self.E = 1 (most of the cases)
733 | # NOTE2: Alternatively, not to lose information, we could concatenate
734 | # dim E after dim F[0] to get E*F[0] features; this increases the
735 | # dimensionality of the data (which could be fine) but it would need to be
736 | # adapted so that the first input in the conv1d takes self.E*self.F[0]
737 | # features instead of just self.F[0]
738 |
739 | # The operation conv1d takes tensors of shape
740 | # batchSize x nFeatures x nEntries
741 | # This means that the convolution takes place along nEntries with
742 | # a summation along nFeatures, for each of the elements along
743 | # batchSize. So we need to put (nExchanges+1) last since it is along
744 | # those elements that we want the convolution to be performed, and
745 | # we need to put F[0] as nFeatures since there is where we want the
746 | # features to be combined. The other three dimensions are different
747 | # elements (agents, time, batch) to which the convolution needs to be
748 | # applied.
749 | # Therefore, we want a vector z of shape
750 | # (B*T*N) x F[0] x (nExchanges+1)
751 |
752 | # Let's get started with this reorganization
753 | # First, we join B*T*N. Because we always join the last dimensions,
754 | # we need to permute first to put B, T, N as the last dimensions.
755 | # z: B x T x (nExchanges+1) x F[0] x N
756 | z = z.permute(3, 2, 0, 1, 4) # F[0] x (nExchanges+1) x B x T x N
757 | z = z.reshape([self.F[0], self.nExchanges+1, B*T*N])
758 | # F[0] x (nExchanges+1) x B*T*N
759 | # Second, we put it back at the beginning
760 | z = z.permute(2, 0, 1) # B*T*N x F[0] x (nExchanges+1)
761 |
762 | # Let's call the convolutional layers
763 | y = self.ConvLayers(z)
764 | # B*T*N x F[-1] x N[-1]
765 | # Flatten the output
766 | y = y.reshape([B*T*N, self.F[-1] * self.N[-1]])
767 | # And, feed it into the per node readout layers
768 | y = self.Readout(y) # (B*T*N) x dimReadout[-1]
769 | # And now we have to unpack it back for every node, i.e. to get it
770 | # back to shape B x T x N x dimReadout[-1]
771 | y = y.permute(1, 0) # dimReadout[-1] x (B*T*N)
772 | y = y.reshape(self.dimReadout[-1], B, T, N)
773 | # And finally put it back to the usual B x T x F x N
774 | y = y.permute(1, 2, 0, 3)
775 | return y
776 |
777 | def to(self, device):
778 | # Because only the filter taps and the weights are registered as
779 | # parameters, when we do a .to(device) operation it does not move the
780 | # GSOs. So we need to move them ourselves.
781 | # Call the parent .to() method (to move the registered parameters)
782 | super().to(device)
783 |
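
The comment blocks in the forward pass above do a fair amount of shape bookkeeping to feed the delayed aggregation sequence into `conv1d`. The following minimal sketch reproduces just that reorganization on random tensors, so the permute/reshape logic can be checked in isolation (all dimension sizes are made up for illustration):

```python
import torch

# Made-up sizes: batch, time, input features, nodes, exchanges
B, T, F0, N, K = 2, 5, 3, 10, 4

# A stand-in for the delayed aggregation sequence z built in the forward
# pass above, of shape B x T x (nExchanges+1) x F[0] x N
z = torch.randn(B, T, K + 1, F0, N)

# Reorganize into (B*T*N) x F[0] x (nExchanges+1), the shape conv1d
# expects: batchSize x nFeatures x nEntries
z = z.permute(3, 2, 0, 1, 4)          # F[0] x (K+1) x B x T x N
z = z.reshape(F0, K + 1, B * T * N)   # merge batch, time and node dims
z = z.permute(2, 0, 1)                # (B*T*N) x F[0] x (K+1)

# The convolution mixes features along the K+1 exchanges, independently
# for every (batch, time, node) element
conv = torch.nn.Conv1d(F0, 8, kernel_size=3, padding=1)
y = conv(z)
print(y.shape)                        # torch.Size([100, 8, 5])
```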
--------------------------------------------------------------------------------
/alegnn/modules/evaluation.py:
--------------------------------------------------------------------------------
1 | # 2020/02/25~
2 | # Fernando Gama, fgama@seas.upenn.edu
3 | # Luana Ruiz, rubruiz@seas.upenn.edu
4 | """
5 | evaluation.py Evaluation Module
6 |
7 | Methods for evaluating the models.
8 |
9 | evaluate: evaluate a model
10 | evaluateSingleNode: evaluate a model that has a single node forward
11 | evaluateFlocking: evaluate a model using the flocking cost
12 | """
13 |
14 | import os
15 | import torch
16 | import pickle
17 |
18 | def evaluate(model, data, **kwargs):
19 | """
20 | evaluate: evaluate a model using classification error
21 |
22 | Input:
23 | model (model class): class from Modules.model
24 | data (data class): a data class from the Utils.dataTools; it needs to
25 | have a getSamples method and an evaluate method.
26 |         doSaveVars (optional, bool; default: True): if True, saves evalVars
27 | 
28 |     Output:
29 |         evalVars (dict): 'costBest' contains the cost for the best model,
30 |             and 'costLast' contains the cost for the last model
31 | """
32 |
33 | # Get the device we're working on
34 | device = model.device
35 |
36 | if 'doSaveVars' in kwargs.keys():
37 | doSaveVars = kwargs['doSaveVars']
38 | else:
39 | doSaveVars = True
40 |
41 | ########
42 | # DATA #
43 | ########
44 |
45 | xTest, yTest = data.getSamples('test')
46 | xTest = xTest.to(device)
47 | yTest = yTest.to(device)
48 |
49 | ##############
50 | # BEST MODEL #
51 | ##############
52 |
53 | model.load(label = 'Best')
54 |
55 | with torch.no_grad():
56 | # Process the samples
57 | yHatTest = model.archit(xTest)
58 | # yHatTest is of shape
59 | # testSize x numberOfClasses
60 | # We compute the error
61 | costBest = data.evaluate(yHatTest, yTest)
62 |
63 | ##############
64 | # LAST MODEL #
65 | ##############
66 |
67 | model.load(label = 'Last')
68 |
69 | with torch.no_grad():
70 | # Process the samples
71 | yHatTest = model.archit(xTest)
72 | # yHatTest is of shape
73 | # testSize x numberOfClasses
74 | # We compute the error
75 | costLast = data.evaluate(yHatTest, yTest)
76 |
77 | evalVars = {}
78 | evalVars['costBest'] = costBest.item()
79 | evalVars['costLast'] = costLast.item()
80 |
81 | if doSaveVars:
82 | saveDirVars = os.path.join(model.saveDir, 'evalVars')
83 | if not os.path.exists(saveDirVars):
84 | os.makedirs(saveDirVars)
85 | pathToFile = os.path.join(saveDirVars, model.name + 'evalVars.pkl')
86 | with open(pathToFile, 'wb') as evalVarsFile:
87 | pickle.dump(evalVars, evalVarsFile)
88 |
89 | return evalVars
90 |
91 | def evaluateSingleNode(model, data, **kwargs):
92 | """
93 | evaluateSingleNode: evaluate a model that has a single node forward
94 |
95 | Input:
96 | model (model class): class from Modules.model, needs to have a
97 | 'singleNodeForward' method
98 |         data (data class): a data class from the Utils.dataTools; it needs
99 |             to have a getSamples method, an evaluate method, and a
100 |             'getLabelID' method
101 |         doSaveVars (optional, bool; default: True): if True, saves evalVars
102 | 
103 |     Output:
104 |         evalVars (dict): 'costBest' contains the cost for the best model,
105 |             and 'costLast' contains the cost for the last model
106 | """
107 |
108 | assert 'singleNodeForward' in dir(model.archit)
109 | assert 'getLabelID' in dir(data)
110 |
111 | # Get the device we're working on
112 | device = model.device
113 |
114 | if 'doSaveVars' in kwargs.keys():
115 | doSaveVars = kwargs['doSaveVars']
116 | else:
117 | doSaveVars = True
118 |
119 | ########
120 | # DATA #
121 | ########
122 |
123 | xTest, yTest = data.getSamples('test')
124 | xTest = xTest.to(device)
125 | yTest = yTest.to(device)
126 | targetIDs = data.getLabelID('test')
127 |
128 | ##############
129 | # BEST MODEL #
130 | ##############
131 |
132 | model.load(label = 'Best')
133 |
134 | with torch.no_grad():
135 | # Process the samples
136 | yHatTest = model.archit.singleNodeForward(xTest, targetIDs)
137 | # yHatTest is of shape
138 | # testSize x numberOfClasses
139 | # We compute the error
140 | costBest = data.evaluate(yHatTest, yTest)
141 |
142 | ##############
143 | # LAST MODEL #
144 | ##############
145 |
146 | model.load(label = 'Last')
147 |
148 | with torch.no_grad():
149 | # Process the samples
150 | yHatTest = model.archit.singleNodeForward(xTest, targetIDs)
151 | # yHatTest is of shape
152 | # testSize x numberOfClasses
153 | # We compute the error
154 | costLast = data.evaluate(yHatTest, yTest)
155 |
156 | evalVars = {}
157 | evalVars['costBest'] = costBest.item()
158 | evalVars['costLast'] = costLast.item()
159 |
160 | if doSaveVars:
161 | saveDirVars = os.path.join(model.saveDir, 'evalVars')
162 | if not os.path.exists(saveDirVars):
163 | os.makedirs(saveDirVars)
164 | pathToFile = os.path.join(saveDirVars, model.name + 'evalVars.pkl')
165 | with open(pathToFile, 'wb') as evalVarsFile:
166 | pickle.dump(evalVars, evalVarsFile)
167 |
168 | return evalVars
169 |
170 | def evaluateFlocking(model, data, **kwargs):
171 | """
172 |     evaluateFlocking: evaluate a model using the flocking cost (the
173 |     velocity variance of the team)
174 |
175 | Input:
176 | model (model class): class from Modules.model
177 | data (data class): the data class that generates the flocking data
178 | doPrint (optional; bool, default: True): if True prints results
179 | nVideos (optional; int, default: 3): number of videos to save
180 | graphNo (optional): identify the run with a number
181 | realizationNo (optional): identify the run with another number
182 |
183 | Output:
184 | evalVars (dict):
185 | 'costBestFull': cost of the best model over the full trajectory
186 | 'costBestEnd': cost of the best model at the end of the trajectory
187 | 'costLastFull': cost of the last model over the full trajectory
188 | 'costLastEnd': cost of the last model at the end of the trajectory
189 | """
190 |
191 | if 'doPrint' in kwargs.keys():
192 | doPrint = kwargs['doPrint']
193 | else:
194 | doPrint = True
195 |
196 | if 'nVideos' in kwargs.keys():
197 | nVideos = kwargs['nVideos']
198 | else:
199 | nVideos = 3
200 |
201 | if 'graphNo' in kwargs.keys():
202 | graphNo = kwargs['graphNo']
203 | else:
204 | graphNo = -1
205 |
206 | if 'realizationNo' in kwargs.keys():
207 | if 'graphNo' in kwargs.keys():
208 | realizationNo = kwargs['realizationNo']
209 | else:
210 | graphNo = kwargs['realizationNo']
211 | realizationNo = -1
212 | else:
213 | realizationNo = -1
214 |
215 | #\\\\\\\\\\\\\\\\\\\\
216 | #\\\ TRAJECTORIES \\\
217 | #\\\\\\\\\\\\\\\\\\\\
218 |
219 | ########
220 | # DATA #
221 | ########
222 |
223 | # Initial data
224 | initPosTest = data.getData('initPos', 'test')
225 | initVelTest = data.getData('initVel', 'test')
226 |
227 | ##############
228 | # BEST MODEL #
229 | ##############
230 |
231 | model.load(label = 'Best')
232 |
233 | if doPrint:
234 | print("\tComputing learned trajectory for best model...",
235 | end = ' ', flush = True)
236 |
237 | posTestBest, \
238 | velTestBest, \
239 | accelTestBest, \
240 | stateTestBest, \
241 | commGraphTestBest = \
242 | data.computeTrajectory(initPosTest, initVelTest, data.duration,
243 | archit = model.archit)
244 |
245 | if doPrint:
246 | print("OK")
247 |
248 | ##############
249 | # LAST MODEL #
250 | ##############
251 |
252 | model.load(label = 'Last')
253 |
254 | if doPrint:
255 | print("\tComputing learned trajectory for last model...",
256 | end = ' ', flush = True)
257 |
258 | posTestLast, \
259 | velTestLast, \
260 | accelTestLast, \
261 | stateTestLast, \
262 | commGraphTestLast = \
263 | data.computeTrajectory(initPosTest, initVelTest, data.duration,
264 | archit = model.archit)
265 |
266 | if doPrint:
267 | print("OK")
268 |
269 | ###########
270 | # PREVIEW #
271 | ###########
272 |
273 | learnedTrajectoriesDir = os.path.join(model.saveDir,
274 | 'learnedTrajectories')
275 |
276 | if not os.path.exists(learnedTrajectoriesDir):
277 | os.mkdir(learnedTrajectoriesDir)
278 |
279 | if graphNo > -1:
280 | learnedTrajectoriesDir = os.path.join(learnedTrajectoriesDir,
281 | '%03d' % graphNo)
282 | if not os.path.exists(learnedTrajectoriesDir):
283 | os.mkdir(learnedTrajectoriesDir)
284 | if realizationNo > -1:
285 | learnedTrajectoriesDir = os.path.join(learnedTrajectoriesDir,
286 | '%03d' % realizationNo)
287 | if not os.path.exists(learnedTrajectoriesDir):
288 | os.mkdir(learnedTrajectoriesDir)
289 |
290 | learnedTrajectoriesDir = os.path.join(learnedTrajectoriesDir, model.name)
291 |
292 | if not os.path.exists(learnedTrajectoriesDir):
293 | os.mkdir(learnedTrajectoriesDir)
294 |
295 | if doPrint:
296 | print("\tPreview data...",
297 | end = ' ', flush = True)
298 |
299 | data.saveVideo(os.path.join(learnedTrajectoriesDir,'Best'),
300 | posTestBest,
301 | nVideos,
302 | commGraph = commGraphTestBest,
303 | vel = velTestBest,
304 | videoSpeed = 0.5,
305 | doPrint = False)
306 |
307 | data.saveVideo(os.path.join(learnedTrajectoriesDir,'Last'),
308 | posTestLast,
309 | nVideos,
310 | commGraph = commGraphTestLast,
311 | vel = velTestLast,
312 | videoSpeed = 0.5,
313 | doPrint = False)
314 |
315 | if doPrint:
316 | print("OK", flush = True)
317 |
318 | #\\\\\\\\\\\\\\\\\\
319 | #\\\ EVALUATION \\\
320 | #\\\\\\\\\\\\\\\\\\
321 |
322 | evalVars = {}
323 | evalVars['costBestFull'] = data.evaluate(vel = velTestBest)
324 | evalVars['costBestEnd'] = data.evaluate(vel = velTestBest[:,-1:,:,:])
325 | evalVars['costLastFull'] = data.evaluate(vel = velTestLast)
326 | evalVars['costLastEnd'] = data.evaluate(vel = velTestLast[:,-1:,:,:])
327 |
328 | return evalVars
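
All three evaluators follow the same calling pattern: load the 'Best' and 'Last' checkpoints, run the test data through the architecture, and return a dictionary of costs. A hypothetical invocation, assuming `model` and `data` objects that follow the interfaces documented above:

```python
# Hypothetical usage; `model` is a Modules.model.Model with saved 'Best'
# and 'Last' checkpoints, `data` follows the Utils.dataTools interface.
evalVars = evaluate(model, data, doSaveVars=False)
print("Best model cost: %.4f" % evalVars['costBest'])
print("Last model cost: %.4f" % evalVars['costLast'])
```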
--------------------------------------------------------------------------------
/alegnn/modules/loss.py:
--------------------------------------------------------------------------------
1 | # 2021/03/04~
2 | # Fernando Gama, fgama@seas.upenn.edu
3 | # Luana Ruiz, rubruiz@seas.upenn.edu
4 | """
5 | loss.py Loss functions
6 |
7 | adaptExtraDimensionLoss: wrapper that handles extra dimensions
8 | F1Score: loss function corresponding to 1 - F1 score
9 | """
10 |
11 | import torch
12 | import torch.nn as nn
13 |
14 | # An arbitrary loss function handling penalties needs to have the following
15 | # conditions
16 | # .penaltyList attribute listing the names of the penalties
17 | #   .nPenalties attribute is an int with the number of penalties
18 | # Forward function has to output the actual loss, the main loss (with no
19 | # penalties), and a dictionary with the value of each of the penalties.
20 | # This will be standard procedure for all loss functions that have penalties.
21 | # Note: The existence of a penalty will be signaled by an attribute in the model
22 |
23 | class adaptExtraDimensionLoss(nn.modules.loss._Loss):
24 | """
25 | adaptExtraDimensionLoss: wrapper that handles extra dimensions
26 |
27 |     Some loss functions take vectors as inputs while others take scalars; if
28 |     we input a one-dimensional vector instead of a scalar, the loss function
29 |     could complain even though the two are virtually the same.
30 |
31 |     The output of the GNNs is, by default, a vector. And sometimes we want it
32 |     to still be a vector (e.g. crossEntropyLoss, where we output a one-hot
33 |     vector) and sometimes we want it to be treated as a scalar (e.g. MSELoss).
34 | Since we still have a single training function to train multiple models, we
35 | do not know whether we will have a scalar or a vector. So this wrapper
36 | adapts the input to the loss function seamlessly.
37 |
38 | Eventually, more loss functions could be added to the code below to better
39 | handle their dimensions.
40 |
41 | Initialization:
42 |
43 | Input:
44 | lossFunction (torch.nn loss function): desired loss function
45 | arguments: arguments required to initialize the loss function
46 | >> Obs.: The loss function gets initialized as well
47 |
48 | Forward:
49 | Input:
50 | estimate (torch.tensor): output of the GNN
51 | target (torch.tensor): target representation
52 | """
53 |
54 | # When we want to compare scalars, we will have a B x 1 output of the GNN,
55 | # since the number of features is always there. However, most of the scalar
56 | # comparative functions take just a B vector, so we have an extra 1 dim
57 | # that raises a warning. This container will simply get rid of it.
58 |
59 |     # This allows changing the loss from crossEntropy (class based, expecting
60 | # B x C input) to MSE or SmoothL1Loss (expecting B input)
61 |
62 | def __init__(self, lossFunction, *args):
63 | # The second argument is optional and it is if there are any extra
64 | # arguments with which we want to initialize the loss
65 |
66 | super().__init__()
67 |
68 | if len(args) > 0:
69 | self.loss = lossFunction(*args) # Initialize loss function
70 | else:
71 | self.loss = lossFunction()
72 |
73 | def forward(self, estimate, target):
74 |
75 | # What we're doing here is checking what kind of loss it is and
76 | # what kind of reshape we have to do on the estimate
77 |
78 | if 'CrossEntropyLoss' in repr(self.loss):
79 | # This is supposed to be a one-hot vector batchSize x nClasses
80 | assert len(estimate.shape) == 2
81 | elif 'SmoothL1Loss' in repr(self.loss) \
82 | or 'MSELoss' in repr(self.loss) \
83 | or 'L1Loss' in repr(self.loss):
84 | # In this case, the estimate has to be a batchSize tensor, so if
85 | # it has two dimensions, the second dimension has to be 1
86 | if len(estimate.shape) == 2:
87 | assert estimate.shape[1] == 1
88 | estimate = estimate.squeeze(1)
89 | assert len(estimate.shape) == 1
90 |
91 | return self.loss(estimate, target)
92 |
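A short usage sketch of the wrapper, showing how the same class serves a classification loss (which keeps the B x C estimate) and a regression loss (which squeezes the trailing singleton dimension); the tensor sizes are invented for illustration:

```python
import torch
import torch.nn as nn

# Classification: estimates stay B x C
classifLoss = adaptExtraDimensionLoss(nn.CrossEntropyLoss)
estimate = torch.randn(8, 5)             # B = 8 samples, C = 5 classes
target = torch.randint(0, 5, (8,))       # class indices
print(classifLoss(estimate, target))

# Regression: the B x 1 estimate is squeezed down to a B vector
regressLoss = adaptExtraDimensionLoss(nn.MSELoss)
estimate = torch.randn(8, 1)
target = torch.randn(8)
print(regressLoss(estimate, target))
```
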
93 | def F1Score(yHat, y):
94 | # Luana R. Ruiz, rubruiz@seas.upenn.edu, 2021/03/04
95 | dimensions = len(yHat.shape)
96 | C = yHat.shape[dimensions-2]
97 | N = yHat.shape[dimensions-1]
98 | yHat = yHat.reshape((-1,C,N))
99 | yHat = torch.nn.functional.log_softmax(yHat, dim=1)
100 | yHat = torch.exp(yHat)
101 | yHat = yHat[:,1,:]
102 | y = y.reshape((-1,N))
103 |
104 |     tp = torch.sum(y*yHat,1) # "soft" true positive count, per sample
105 |     #tn = torch.sum((1-y)*(1-yHat),1)
106 |     fp = torch.sum((1-y)*yHat,1) # "soft" false positives
107 |     fn = torch.sum(y*(1-yHat),1) # "soft" false negatives
108 | 
109 |     p = tp / (tp + fp) # precision
110 |     r = tp / (tp + fn) # recall
111 | 
112 |     idx_p = p!=p # NaN precision, i.e. a 0/0 division
113 |     idx_tp = tp==0
114 |     idx_p1 = idx_p*idx_tp # NaN precision with no true positives
115 |     p[idx_p] = 0
116 |     p[idx_p1] = 1 # nothing to detect and nothing predicted: perfect
117 |     idx_r = r!=r # NaN recall
118 |     idx_r1 = idx_r*idx_tp
119 |     r[idx_r] = 0
120 |     r[idx_r1] = 1
121 |
122 | f1 = 2*p*r / (p+r)
123 |     f1[f1!=f1] = 0 # define F1 = 0 whenever p + r = 0
124 |
125 | return 1 - torch.mean(f1)
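
A quick sanity check for F1Score: confident, correct predictions on a binary node classification problem should drive the loss 1 - F1 toward 0 (shapes and values are made up for illustration):

```python
import torch

B, N = 4, 6
y = torch.randint(0, 2, (B, N)).float()                # labels in {0, 1}
# B x 2 x N logits whose class-1 entry is large exactly where y = 1
yHat = torch.stack((-10 * y + 5, 10 * y - 5), dim=1)
print(F1Score(yHat, y))                                 # close to 0
```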
--------------------------------------------------------------------------------
/alegnn/modules/model.py:
--------------------------------------------------------------------------------
1 | # 2018/10/02~
2 | # Fernando Gama, fgama@seas.upenn.edu
3 | # Luana Ruiz, rubruiz@seas.upenn.edu
4 | """
5 | model.py Model Module
6 |
7 | Utilities useful for working on the model
8 |
9 | Model: binds together the architecture, the loss function, the optimizer,
10 | the trainer, and the evaluator.
11 | """
12 |
13 | import os
14 | import torch
15 |
16 | class Model:
17 | """
18 | Model: binds together the architecture, the loss function, the optimizer,
19 | the trainer, and the evaluator.
20 |
21 | Initialization:
22 |
23 | architecture (nn.Module)
24 | loss (nn.modules.loss._Loss)
25 | optimizer (nn.optim)
26 | trainer (Modules.training)
27 | evaluator (Modules.evaluation)
28 | device (string or device)
29 | name (string)
30 | saveDir (string or path)
31 |
32 |     .train(data, nEpochs, batchSize, **kwargs): train the model for nEpochs
33 |         epochs, using batches of size batchSize and running over the given
34 |         data class; see the specific selected trainer for extra options
35 | 
36 |     .evaluate(data): evaluate the model over the given data class; see the
37 |         specific selected evaluator for extra options
38 |
39 | .save(label = '', [saveDir=dirPath]): save the model parameters under the
40 | name given by label, if the saveDir is different from the one specified
41 | in the initialization, it needs to be specified now
42 |
43 | .load(label = '', [loadFiles=(architLoadFile, optimLoadFile)]): loads the
44 | model parameters under the specified name inside the specific saveDir,
45 | unless they are provided externally through the keyword 'loadFiles'.
46 |
47 | .getTrainingOptions(): get a dict with the options used during training; it
48 |         returns None if the model hasn't been trained yet.
49 | """
50 |
51 | def __init__(self,
52 | # Architecture (nn.Module)
53 | architecture,
54 | # Loss Function (nn.modules.loss._Loss)
55 | loss,
56 | # Optimization Algorithm (nn.optim)
57 | optimizer,
58 | # Training Algorithm (Modules.training)
59 | trainer,
60 | # Evaluating Algorithm (Modules.evaluation)
61 | evaluator,
62 | # Other
63 | device, name, saveDir):
64 |
65 | #\\\ ARCHITECTURE
66 | # Store
67 | self.archit = architecture
68 | # Move it to device
69 | self.archit.to(device)
70 | # Count parameters (doesn't work for EdgeVarying)
71 | self.nParameters = 0
72 | for param in list(self.archit.parameters()):
73 | if len(param.shape)>0:
74 | thisNParam = 1
75 | for p in range(len(param.shape)):
76 | thisNParam *= param.shape[p]
77 | self.nParameters += thisNParam
78 | else:
79 | pass
80 | #\\\ LOSS FUNCTION
81 | self.loss = loss
82 | #\\\ OPTIMIZATION ALGORITHM
83 | self.optim = optimizer
84 | #\\\ TRAINING ALGORITHM
85 | self.trainer = trainer
86 | #\\\ EVALUATING ALGORITHM
87 | self.evaluator = evaluator
88 | #\\\ OTHER
89 | # Device
90 | self.device = device
91 | # Model name
92 | self.name = name
93 | # Saving directory
94 | self.saveDir = saveDir
95 |
96 | def train(self, data, nEpochs, batchSize, **kwargs):
97 |
98 | self.trainer = self.trainer(self, data, nEpochs, batchSize, **kwargs)
99 |
100 | return self.trainer.train()
101 |
102 | def evaluate(self, data, **kwargs):
103 |
104 | return self.evaluator(self, data, **kwargs)
105 |
106 | def save(self, label = '', **kwargs):
107 | if 'saveDir' in kwargs.keys():
108 | saveDir = kwargs['saveDir']
109 | else:
110 | saveDir = self.saveDir
111 | saveModelDir = os.path.join(saveDir,'savedModels')
112 | # Create directory savedModels if it doesn't exist yet:
113 | if not os.path.exists(saveModelDir):
114 | os.makedirs(saveModelDir)
115 | saveFile = os.path.join(saveModelDir, self.name)
116 | torch.save(self.archit.state_dict(), saveFile+'Archit'+ label+'.ckpt')
117 | torch.save(self.optim.state_dict(), saveFile+'Optim'+label+'.ckpt')
118 |
119 | def load(self, label = '', **kwargs):
120 | if 'loadFiles' in kwargs.keys():
121 | (architLoadFile, optimLoadFile) = kwargs['loadFiles']
122 | else:
123 | saveModelDir = os.path.join(self.saveDir,'savedModels')
124 | architLoadFile = os.path.join(saveModelDir,
125 | self.name + 'Archit' + label +'.ckpt')
126 | optimLoadFile = os.path.join(saveModelDir,
127 | self.name + 'Optim' + label + '.ckpt')
128 | self.archit.load_state_dict(torch.load(architLoadFile))
129 | self.optim.load_state_dict(torch.load(optimLoadFile))
130 |
131 | def getTrainingOptions(self):
132 |
133 | return self.trainer.trainingOptions \
134 | if 'trainingOptions' in dir(self.trainer) \
135 | else None
136 |
137 | def __repr__(self):
138 | reprString = "Name: %s\n" % (self.name)
139 | reprString += "Number of learnable parameters: %d\n"%(self.nParameters)
140 | reprString += "\n"
141 | reprString += "Model architecture:\n"
142 | reprString += "----- -------------\n"
143 | reprString += "\n"
144 | reprString += repr(self.archit) + "\n"
145 | reprString += "\n"
146 | reprString += "Loss function:\n"
147 | reprString += "---- ---------\n"
148 | reprString += "\n"
149 | reprString += repr(self.loss) + "\n"
150 | reprString += "\n"
151 | reprString += "Optimizer:\n"
152 | reprString += "----------\n"
153 | reprString += "\n"
154 | reprString += repr(self.optim) + "\n"
155 | reprString += "Training algorithm:\n"
156 | reprString += "-------- ----------\n"
157 | reprString += "\n"
158 | reprString += repr(self.trainer) + "\n"
159 | reprString += "Evaluation algorithm:\n"
160 | reprString += "---------- ----------\n"
161 | reprString += "\n"
162 | reprString += repr(self.evaluator) + "\n"
163 | return reprString
164 |
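A hypothetical round trip illustrating the checkpoint naming convention: saving a model named 'myGNN' with label 'Best' writes savedModels/myGNNArchitBest.ckpt and savedModels/myGNNOptimBest.ckpt under saveDir, which .load() then reads back.

```python
# Hypothetical usage; `model` is an already initialized Model instance.
model.save(label='Best')   # writes ...ArchitBest.ckpt and ...OptimBest.ckpt
model.load(label='Best')   # restores both state dicts
print(model)               # __repr__ summarizes architecture, loss, optimizer
```
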
--------------------------------------------------------------------------------
/alegnn/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alelab-upenn/graph-neural-networks/a84a39fabad5378bdcbaad20b5dbcff14b4eebcd/alegnn/utils/__init__.py
--------------------------------------------------------------------------------
/alegnn/utils/miscTools.py:
--------------------------------------------------------------------------------
1 | # 2018/10/15~
2 | # Fernando Gama, fgama@seas.upenn.edu.
3 | # Luana Ruiz, rubruiz@seas.upenn.edu.
4 | """
5 | miscTools Miscellaneous Tools module
6 |
7 | num2filename: change a numerical value into a string usable as a filename
8 | saveSeed: save the random state of generators
9 | loadSeed: load the random state of generators
10 | writeVarValues: write the specified values in the specified txt file
11 | """
12 |
13 | import os
14 | import pickle
15 | import numpy as np
16 | import torch
17 |
18 | def num2filename(x,d):
19 | """
20 | Takes a number and returns a string with the value of the number, but in a
21 | format that is writable into a filename.
22 |
23 | s = num2filename(x,d) Gets rid of decimal points which are usually
24 | inconvenient to have in a filename.
25 | If the number x is an integer, then s = str(int(x)).
26 | If the number x is a decimal number, then it replaces the '.' by the
27 | character specified by d. Setting d = '' erases the decimal point,
28 | setting d = '.' simply returns a string with the exact same number.
29 |
30 | Example:
31 | >> num2filename(2,'d')
32 | >> '2'
33 |
34 | >> num2filename(3.1415,'d')
35 | >> '3d1415'
36 |
37 | >> num2filename(3.1415,'')
38 | >> '31415'
39 |
40 | >> num2filename(3.1415,'.')
41 | >> '3.1415'
42 | """
43 | if x == int(x):
44 | return str(int(x))
45 | else:
46 | return str(x).replace('.',d)
47 |
48 | def saveSeed(randomStates, saveDir):
49 | """
50 | Takes a list of dictionaries of random generator states of different modules
51 | and saves them in a .pkl format.
52 |
53 | Inputs:
54 | randomStates (list): The length of this list is equal to the number of
55 |             modules whose states are to be saved (torch, numpy, etc.). Each
56 | element in this list is a dictionary. The dictionary has three keys:
57 | 'module' with the name of the module in string format ('numpy' or
58 | 'torch', for example), 'state' with the saved generator state and,
59 |             if applicable, 'seed' with the specific seed for the generator
60 | (note that torch has both state and seed, but numpy only has state)
61 | saveDir (path): where to save the seed, it will be saved under the
62 | filename 'randomSeedUsed.pkl'
63 | """
64 | pathToSeed = os.path.join(saveDir, 'randomSeedUsed.pkl')
65 | with open(pathToSeed, 'wb') as seedFile:
66 | pickle.dump({'randomStates': randomStates}, seedFile)
67 |
68 | def loadSeed(loadDir):
69 | """
70 | Loads the states and seed saved in a specified path
71 |
72 | Inputs:
73 |         loadDir (path): where to look for the seed to load; it is expected that
74 | the appropriate file within loadDir is named 'randomSeedUsed.pkl'
75 |
76 | Obs.: The file 'randomSeedUsed.pkl' should contain a list structured as
77 | follows. The length of this list is equal to the number of modules whose
78 | states were saved (torch, numpy, etc.). Each element in this list is a
79 | dictionary. The dictionary has three keys: 'module' with the name of
80 | the module in string format ('numpy' or 'torch', for example), 'state'
81 |         with the saved generator state and, if applicable, 'seed' with the
82 | specific seed for the generator (note that torch has both state and
83 | seed, but numpy only has state)
84 | """
85 | pathToSeed = os.path.join(loadDir, 'randomSeedUsed.pkl')
86 | with open(pathToSeed, 'rb') as seedFile:
87 | randomStates = pickle.load(seedFile)
88 | randomStates = randomStates['randomStates']
89 | for module in randomStates:
90 | thisModule = module['module']
91 | if thisModule == 'numpy':
92 |             np.random.set_state(module['state']) # state of the global generator
93 | elif thisModule == 'torch':
94 | torch.set_rng_state(module['state'])
95 | torch.manual_seed(module['seed'])
96 |
97 |
98 | def writeVarValues(fileToWrite, varValues):
99 | """
100 | Write the value of several string variables specified by a dictionary into
101 | the designated .txt file.
102 |
103 | Input:
104 | fileToWrite (os.path): text file to save the specified variables
105 | varValues (dictionary): values to save in the text file. They are
106 | saved in the format "key = value".
107 | """
108 | with open(fileToWrite, 'a+') as file:
109 | for key in varValues.keys():
110 | file.write('%s = %s\n' % (key, varValues[key]))
111 | file.write('\n')
112 |
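A sketch of how saveSeed and loadSeed fit together, assuming a `saveDir` path; the list entries mirror the structure documented in the docstrings above (numpy only stores a state, torch stores both state and seed):

```python
import numpy as np
import torch

# Capture the current generator states at the start of a run
randomStates = [
    {'module': 'numpy', 'state': np.random.get_state()},
    {'module': 'torch', 'state': torch.get_rng_state(),
     'seed': torch.initial_seed()},
]
saveSeed(randomStates, saveDir)   # writes saveDir/randomSeedUsed.pkl

# ... later, to reproduce the run:
loadSeed(saveDir)
```
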
--------------------------------------------------------------------------------
/alegnn/utils/visualTools.py:
--------------------------------------------------------------------------------
1 | # 2019/01/21~2018/07/12
2 | # This function is taken almost verbatim from https://github.com/amaiasalvador
3 | # and all credit should go to Amaia Salvador.
4 |
5 | import os
6 | import glob
7 | import torchvision.utils as vutils
8 | from operator import itemgetter
9 | from tensorboardX import SummaryWriter
10 |
11 | class Visualizer():
12 | def __init__(self, checkpoints_dir, name):
13 | self.win_size = 256
14 | self.name = name
15 | self.saved = False
16 | self.checkpoints_dir = checkpoints_dir
17 | self.ncols = 4
18 |
19 | # remove existing
20 | for filename in glob.glob(self.checkpoints_dir+"/events*"):
21 | os.remove(filename)
22 | self.writer = SummaryWriter(checkpoints_dir)
23 |
24 | def reset(self):
25 | self.saved = False
26 |
27 |     # images: (b, c, h, w) tensor of images
28 | def image_summary(self, mode, epoch, images):
29 | images = vutils.make_grid(images, normalize=True, scale_each=True)
30 | self.writer.add_image('{}/Image'.format(mode), images, epoch)
31 |
32 | # figure (for matplotlib figures)
33 | def figure_summary(self, mode, epoch, fig):
34 | self.writer.add_figure('{}/Figure'.format(mode), fig, epoch)
35 |
36 | # text: type: ingredients/recipe
37 | def text_summary(self, mode, epoch, type, text, vocabulary, gt=True, max_length=20):
38 | for i, el in enumerate(text): # text_list
39 | if not gt: # we are printing a sample
40 | idx = el.nonzero().squeeze() + 1
41 | else:
42 | idx = el # we are printing the ground truth
43 |
44 | words_list = itemgetter(*idx)(vocabulary)
45 |
46 | if len(words_list) <= max_length:
47 | self.writer.add_text('{}/{}_{}_{}'.format(mode, type, i, 'gt' if gt else 'prediction'), ', '.join(filter(lambda x: x != '