├── Datasets ├── guangzhou_speed.npy └── portland_volume.npy ├── Examples ├── Toy examples.ipynb └── test.ipynb ├── Figures ├── algorithm1.png ├── algorithm2.png ├── missing patterns.png ├── norm_compare.png └── objective.png ├── Helper.py ├── Imputer.py ├── LICENSE └── README.md /Datasets/guangzhou_speed.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tongnie/tensorlib/1515a6307dea0c28c742d594c7eed013fb63cca1/Datasets/guangzhou_speed.npy -------------------------------------------------------------------------------- /Datasets/portland_volume.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tongnie/tensorlib/1515a6307dea0c28c742d594c7eed013fb63cca1/Datasets/portland_volume.npy -------------------------------------------------------------------------------- /Examples/Toy examples.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import pandas as pd\n", 11 | "import matplotlib.pyplot as plt\n", 12 | "import IPython\n", 13 | "import time\n", 14 | "import random\n", 15 | "import Helper as helper\n", 16 | "from Imputer import LRTC_TSpN\n", 17 | "\n", 18 | "plt.rcParams['figure.figsize'] = (10,6)\n", 19 | "%matplotlib inline\n", 20 | "IPython.display.set_matplotlib_formats('svg')" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "This notebook gives a toy example showing how to apply LRTC-TSpN (low-rank tensor completion based on the truncated tensor Schatten p-norm) to two small traffic data sets. Users can apply this model to any spatial-temporal traffic data. For a more detailed discussion of LRTC-TSpN, please see [1]. 
For more details, please refer to our GitHub repository [**tensorlib - GitHub**](https://github.com/tongnie/tensorlib).\n", 28 | "\n", 29 | "

\n", 30 | "\n", 31 | "[1] Tong Nie, Guoyang Qin, Jian Sun (2022). Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns. arXiv:2205.09390. \n", 32 | "\n", 33 | "

" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## Preparation\n", 41 | "### Third-order Tensor Structure\n", 42 | "\n", 43 | "We organize the multivariate traffic time series as a third-order\n", 44 | "tensor structure, i.e., $time~intervals×locations~(sensors)×days$. This three-dimensional data structure simultaneously\n", 45 | "captures the integrated spatial-temporal information, thus making it more efficient to impute missing values.\n", 46 | "\n", 47 | "### Spatial-temporal traffic sensor data\n", 48 | "\n", 49 | "In this notebook, we conduct data imputation on two small subsets of traffic speed and volume data sets; the original data can be found in our GitHub repository, [**tensorlib - GitHub**](https://github.com/tongnie/tensorlib).\n", 50 | "- **Guangzhou-small:** This is an urban traffic speed data set covering 214 road segments over two months (i.e., 61 days from August 1, 2016 to September 30, 2016) at 10-minute intervals in Guangzhou, China. We only use the speed data from the first 50 locations and the first 15 days. The size is (144 × 50 × 15). \n", 51 | "- **Portland-small:** This data set consists of link volumes collected from highways in Portland by 1156 loop detectors over one month at 15-minute intervals. Only the volume data from the first 80 locations and the first 15 days are used. The size is (96 × 80 × 15).\n", 52 | "\n", 53 | "### Complicated missing patterns\n", 54 | "In addition to the element-wise random missing case, we define three structured fiber mode-$n$ missing scenarios, which are generated through the two-by-two combinations of tensor mode-$n$ fibers. 
These scenarios can be described as follows: \n", 55 | "- **‘Intervals’ mode fiber-like missing (FM-0)**, a temporal missing pattern caused by adverse weather, breakdown of wireless connections, or equipment maintenance; \n", 56 | "- **‘Locations’ mode fiber-like missing (FM-1)**, a spatial missing pattern that can be explained by power loss across successive sensors or a malfunction of the Internet Data Center; \n", 57 | "- **‘Days’ mode fiber-like missing (FM-2)**, a mixed spatial-temporal missing pattern in which specific sensors are offline (do not operate) at regular time intervals every day." 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "Load the Guangzhou speed dataset" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 2, 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "name": "stdout", 74 | "output_type": "stream", 75 | "text": [ 76 | "Random missing rate of tensor is：51.00%\n" 77 | ] 78 | } 79 | ], 80 | "source": [ 81 | "#Random missing pattern\n", 82 | "speed_tensor = np.load('../Datasets/guangzhou_speed.npy')\n", 83 | "\n", 84 | "random.seed(123)\n", 85 | "speed_tensor_lost = helper.generate_tensor_random_missing(speed_tensor,lost_rate=0.5)\n", 86 | "tensor_miss_rate = helper.get_missing_rate(speed_tensor_lost)\n", 87 | "print(f'Random missing rate of tensor is：{100*tensor_miss_rate:.2f}%')" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "Generate three fiber-mode missing patterns" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 3, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "name": "stdout", 104 | "output_type": "stream", 105 | "text": [ 106 | "fiber-mode0 missing rate of tensor is：50.93%\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "#Fiber mode-0 missing\n", 112 | "random.seed(123)\n", 113 | "speed_tensor_lost_fiber0 = 
helper.generate_fiber_missing(speed_tensor,lost_rate=0.5,mode=0)" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 4, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "name": "stdout", 123 | "output_type": "stream", 124 | "text": [ 125 | "fiber-mode1 missing rate of tensor is：51.00%\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "#Fiber mode-1 missing\n", 131 | "random.seed(123)\n", 132 | "speed_tensor_lost_fiber1 = helper.generate_fiber_missing(speed_tensor,lost_rate=0.5,mode=1)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 5, 138 | "metadata": {}, 139 | "outputs": [ 140 | { 141 | "name": "stdout", 142 | "output_type": "stream", 143 | "text": [ 144 | "fiber-mode2 missing rate of tensor is：50.89%\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "#Fiber mode-2 missing\n", 150 | "random.seed(123)\n", 151 | "speed_tensor_lost_fiber2 = helper.generate_fiber_missing(speed_tensor,lost_rate=0.5,mode=2)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 6, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | "TSp_ADMM Iteration: \n", 164 | " Processing loop 52\n", 165 | " total iterations = 52 error=0.0008558335667463486\n", 166 | "LRTC-TSpN imptation MAE = 2.832\n", 167 | "LRTC-TSpN imputation RMSE = 4.261\n" 168 | ] 169 | }, 170 | { 171 | "data": { 172 | "image/svg+xml": [ 173 | "\n", 174 | "\n", 176 | "\n" 1069 | ], 1070 | "text/plain": [ 1071 | "