├── 3d_overview.jpg ├── DBF-Net.png ├── Data1D.png ├── Data3D.png ├── Env-Data.png ├── README.md └── read_TCND.py /3d_overview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xiaochengfuhuo/TropiCycloneNet-Dataset/b2bad7335f9d8b2698abe78579b3f28bb348f562/3d_overview.jpg -------------------------------------------------------------------------------- /DBF-Net.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xiaochengfuhuo/TropiCycloneNet-Dataset/b2bad7335f9d8b2698abe78579b3f28bb348f562/DBF-Net.png -------------------------------------------------------------------------------- /Data1D.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xiaochengfuhuo/TropiCycloneNet-Dataset/b2bad7335f9d8b2698abe78579b3f28bb348f562/Data1D.png -------------------------------------------------------------------------------- /Data3D.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xiaochengfuhuo/TropiCycloneNet-Dataset/b2bad7335f9d8b2698abe78579b3f28bb348f562/Data3D.png -------------------------------------------------------------------------------- /Env-Data.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/xiaochengfuhuo/TropiCycloneNet-Dataset/b2bad7335f9d8b2698abe78579b3f28bb348f562/Env-Data.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # TropiCycloneNet-Dataset 2 | 3 | ## Overview 4 | 5 | This project introduces the details of the **TropiCycloneNet Dataset** ($TCN_D$), a comprehensive dataset for studying tropical cyclones (TCs). 
It includes $Data_{1d}$, $Data_{3d}$, and **environmental data (*Env-Data*)** collected for tropical cyclones in the major ocean basins from 1950 to 2023. The dataset is intended to support tropical cyclone research and predictive modeling. 6 | 7 | ## Download 8 | 9 | We offer two download options for the **TropiCycloneNet Dataset** ($TCN_D$): 10 | 11 | - **Full Dataset**: Contains tropical cyclone data from 1950 to 2023, across six major ocean basins. The data size is approximately `25.7 GB`. 12 | - [Full Dataset](https://drive.google.com/file/d/1BUAab0OYyiArbraQHu2oMoj_jF-nNUxT/view?usp=sharing) 13 | 14 | - **Test Subset**: A smaller subset of data from 2017 to 2023, intended for testing purposes. The data size is approximately `3.34 GB`. 15 | - [Test Subset](https://drive.google.com/file/d/1Xx2rzH6ztSGLTUR9EZkfDz5mQHsvhHsi/view?usp=sharing) 16 | 17 | More download options will be added in the future, such as downloads by ocean basin or by year. 18 | 19 | ## Check Data 20 | 21 | We provide code to read and visualize $Data_{1d}$, $Data_{3d}$, and *Env-Data*. Researchers can use the provided scripts to explore the dataset and visualize each data type. 22 | 23 | ### Steps to Use the Dataset 24 | 25 | 1. **Download and Extract the Dataset**: 26 | - Download and extract the $TCN_D$ dataset to your desired path (`dataset_path`). 27 | 28 | 2. **Set Up the Environment**: 29 | - Install Python 3.7 and the necessary dependencies: 30 | 31 | ```bash 32 | pip install netCDF4==1.5.8 33 | pip install matplotlib==3.5.3 34 | pip install pandas==1.1.1 35 | pip install numpy==1.19.0 36 | ``` 37 | 38 | 3. **Run the Code**: 39 | - After setting up the environment, run the `read_TCND.py` script: 40 | 41 | ```bash 42 | python read_TCND.py dataset_path TC_name TC_date area train_val_test 43 | ``` 44 | 45 | Here: 46 | - `dataset_path` refers to the path where the dataset is located. 
47 | - `TC_name` is the name of the tropical cyclone you wish to examine. 48 | - `TC_date` is the specific date and time of the cyclone in `YYYYMMDDHH` format. 49 | - `area` specifies the ocean basin where the cyclone occurred (e.g., WP for Western Pacific, EP for Eastern Pacific, etc.). 50 | - `train_val_test` indicates whether the queried typhoon belongs to the training, validation, or test set (`train`, `val`, or `test`). 51 | 52 | - After running the script, you will find visualized images of **Data1D**, **Data3D**, and **Env-Data** in the current directory, named `Data1D.png`, `Data3D.png`, and `Env-Data.png`. 53 | 54 | ## Visualization Example 55 | 56 | ### Visualizing All Data 57 | 58 | 1. **Get Details for $Data_{3d}$**: 59 | ![Data3D Overview](3d_overview.jpg) 60 | 61 | The **3D data** covers the **20° x 20° region** around the tropical cyclone's center. The spatial resolution is **0.25°**, and the temporal resolution is **6 hours**. We collect **Geopotential Height (GPH)**, **U-component of wind**, and **V-component of wind** at the **200 hPa, 500 hPa, 850 hPa**, and **925 hPa** pressure levels. **Sea Surface Temperature (SST)** data is also included in the **Data3D** set. 62 | 63 | 2. **Example of $Data_{1d}$, $Data_{3d}$, and *Env-Data***: 64 | The following command visualizes the tropical cyclone data for a specific time (`2001101418` for `Haiyan` in the Western Pacific region): 65 | 66 | ```bash 67 | python read_TCND.py dataset_path Haiyan 2001101418 WP train 68 | ``` 69 | 70 | After running the script, you will see the corresponding **Data1D**, **Data3D**, and **Env-Data** visualizations for this cyclone. 71 | 72 | ### Examples of $Data_{3d}$: 73 | ![3D Data Example](Data3D.png) 74 | 75 | We crop the data covering a **20° x 20°** region around the TC center. The spatial resolution is **0.25°**, and the temporal resolution is **6 hours**. 
We collect **Geopotential Height (GPH)**, **U-component of wind**, and **V-component of wind** at the **200 hPa**, **500 hPa**, **850 hPa**, and **925 hPa** pressure levels. **Sea Surface Temperature (SST)** data is also included in the **Data3D** set. 76 | 77 | ### Examples of $Data_{1d}$: 78 | ![Data1D Example](Data1D.png) 79 | 80 | The bolded first row of the figure shows the record for typhoon Haiyan at 2001101418, the time point we want to examine. 81 | - **ID**: Time step of the TC. 82 | - **LONG**: Longitude of the TC center (°E, with a precision of **0.1°**). 83 | - **LAT**: Latitude of the TC center (°N, with a precision of **0.1°**). 84 | - **PRES**: Minimum pressure (hPa) near the TC center. 85 | - **WND**: Two-minute mean maximum sustained wind (MSW; m/s) near the TC center. 86 | - **YYYYMMDDHH**: Date and time (UTC) of the TC record. 87 | - **Name**: Name of the TC. 88 | 89 | **Data1D** is normalized using specific rules so that **deep learning (DL)** methods can extract useful information from it. 90 | 91 | 92 | ### Examples of *Env-Data*: 93 | ![Env-Data Example](Env-Data.png) 94 | 95 | The **Env-Data** includes the following attributes: 96 | 97 | - **Movement Velocity**: The speed at which the tropical cyclone is moving. 98 | - **Month**: Month of occurrence. 99 | - **Location Longitude and Latitude**: The longitude and latitude of the TC center. 100 | - **24-hour History of Direction**: The movement direction of the cyclone over the past 24 hours. 101 | - **24-hour History of Intensity Change**: The intensity change of the cyclone over the past 24 hours. 102 | - **Subtropical High Region**: Extracted from **500 hPa Geopotential Height (GPH)** data and processed to make it more suitable as input to **DL models**. 103 | 104 | ## Additional Experiments 105 | ![Comparison with DBF-Net](DBF-Net.png) 106 | 107 | ## License 108 | 109 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. 
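To illustrate the kind of processing mentioned for the **Subtropical High Region** attribute, the sketch below derives a binary subtropical-high mask from a 500 hPa geopotential field. This is a minimal, hypothetical sketch: the function name `subtropical_high_mask` and the exact threshold are our assumptions, not necessarily the rule used to build $TCN_D$; the 5880 gpm contour is simply a common meteorological convention for outlining the subtropical high.

```python
import numpy as np

G = 9.80665  # standard gravity (m/s^2), converts geopotential to geopotential height

def subtropical_high_mask(z_500hpa, threshold_gpm=5880.0):
    """Binary mask of the subtropical high region.

    z_500hpa: 2D array of 500 hPa geopotential (m^2/s^2, the usual reanalysis unit).
    threshold_gpm: geopotential height threshold; 5880 gpm is a common
    convention for outlining the western Pacific subtropical high.
    """
    gph = np.asarray(z_500hpa) / G  # geopotential -> geopotential height (gpm)
    return (gph >= threshold_gpm).astype(np.float32)

# Toy 2x2 field, written as geopotential height (gpm) and converted to geopotential
gph_example = np.array([[5900.0, 5870.0],
                        [5885.0, 5600.0]])
mask = subtropical_high_mask(gph_example * G)
# mask == [[1., 0.], [1., 0.]]
```

In practice one would apply this to the `z[0, idx_500hPa]` slice read by `read_TCND.py`, after confirming the units of the stored `z` variable.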
110 | 111 | ## Acknowledgments 112 | 113 | We would like to acknowledge the support of the research community and the institutions that contributed to the development of this dataset. The **TropiCycloneNet Dataset** has been designed for academic and research purposes in tropical cyclone studies. 114 | 115 | --- 116 | 117 | Feel free to reach out with any questions or comments regarding the **TropiCycloneNet Dataset** or how to use this project. 118 | 119 | 120 | ## Citing TropiCycloneNet 121 | 122 | ``` 123 | @article{TropiCycloneNet_under_review, 124 | author = {Huang, Cheng and Mu, Pan and Zhang, Jinglin and Chan, Sixian and Zhang, Shiqi and Yan, Hanting and Chen, Shengyong and Bai, Cong}, 125 | title = {TropiCycloneNet: A Benchmark Dataset and A Deep Learning Method for Global Tropical Cyclone Forecasting}, 126 | journal = {Nature Communications}, 127 | volume = {under_review}, 128 | number = {under_review}, 129 | pages = {under_review}, 130 | doi = {under_review}, 131 | url = {under_review}, 132 | year = {under_review} 133 | } 134 | ``` 135 | -------------------------------------------------------------------------------- /read_TCND.py: -------------------------------------------------------------------------------- 1 | from netCDF4 import Dataset 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | import os 5 | import pandas as pd 6 | import argparse 7 | 8 | 9 | def read_3D(dataset_path, TC_name, TC_date, area): 10 | # Build the path to the Data3D NetCDF file 11 | year = TC_date[:4] 12 | file_path = os.path.join(dataset_path, 'Data3D', area, year, TC_name, f"TCND_{TC_name}_{TC_date}_sst_z_u_v.nc") 13 | 14 | # Open the NetCDF file 15 | nc_data = Dataset(file_path, mode='r') 16 | 17 | # Retrieve data from the NetCDF file 18 | pressure_levels = nc_data.variables['pressure_level'][:] # Pressure levels 19 | u = nc_data.variables['u'][:] # U component 20 | v = nc_data.variables['v'][:] # V component 21 | z = nc_data.variables['z'][:] # Geopotential 22 | sst = 
nc_data.variables['sst'][:] # Sea Surface Temperature 23 | 24 | latitude = nc_data.variables['latitude'][:] # Latitude 25 | longitude = nc_data.variables['longitude'][:] # Longitude 26 | 27 | # Close the NetCDF file 28 | nc_data.close() 29 | 30 | # Plot all pressure levels (U, V, Z, SST) 31 | plot_all_pressure_levels(u, v, z, sst, pressure_levels, latitude, longitude) 32 | 33 | 34 | def read_env(dataset_path, TC_name, TC_date, area): 35 | year = TC_date[:4] 36 | # Build the file path for environmental data 37 | path = os.path.join(dataset_path, 'Env-Data', area, year, TC_name, TC_date + '.npy') 38 | env_data = np.load(path, allow_pickle=True).item() 39 | 40 | # Read the corresponding NetCDF file for the pressure levels and geopotential data 41 | file_path = os.path.join(dataset_path, 'Data3D', area, year, TC_name, f"TCND_{TC_name}_{TC_date}_sst_z_u_v.nc") 42 | nc_data = Dataset(file_path, mode='r') 43 | 44 | z = nc_data.variables['z'][:] # Geopotential 45 | latitude = nc_data.variables['latitude'][:] # Latitude 46 | longitude = nc_data.variables['longitude'][:] # Longitude 47 | pressure_levels = nc_data.variables['pressure_level'][:] # Pressure levels 48 | print("Pressure levels:", pressure_levels) # Print the pressure levels 49 | 50 | # Find the index of 500 hPa 51 | pressure_500hPa = 500 52 | idx_500hPa = np.abs(pressure_levels - pressure_500hPa).argmin() # Find the closest index to 500 hPa 53 | print(f"Index of 500 hPa: {idx_500hPa}") 54 | 55 | print(env_data) 56 | 57 | # Plot the environmental data along with the geopotential data 58 | plot_env(env_data, z, latitude, longitude, pressure_levels, idx_500hPa) 59 | 60 | 61 | def plot_env(env_data, z, latitude, longitude, pressure_levels, idx_500hPa): 62 | # Keys of the environmental data dictionary to display 63 | real_key = [ 64 | 'move_velocity', 65 | 'month', 66 | 'location_long', 67 | 'location_lat', 68 | 'history_direction24', 69 | 'history_inte_change24' 70 | ] 71 | 72 | data = { 73 | "Key": [ 74 | "Moving Velocity", 
75 | "Month", 76 | "Location Longitude", 77 | "Location Latitude", 78 | "History Direction (24 h)", 79 | "History Intensity Change (24 h)", 80 | "Subtropical High" 81 | ], 82 | "Value": [str(env_data[x]) for x in real_key] 83 | } 84 | data['Value'].append('GPH') # The 'Subtropical High' row refers to the 500 hPa GPH map on the right 85 | 86 | # Convert the data to a pandas DataFrame for easy display 87 | df = pd.DataFrame(data) 88 | 89 | # Create the figure for plotting 90 | fig, ax = plt.subplots(figsize=(14, 6), nrows=1, ncols=2) 91 | 92 | # First part: Display the GPH image for 500 hPa 93 | ax[1].set_title("500 hPa GPH") 94 | gph_500 = z[0, idx_500hPa, :, :] # Use the index for 500 hPa 95 | contour = ax[1].contourf(longitude, latitude, gph_500, cmap='viridis') 96 | fig.colorbar(contour, ax=ax[1], orientation='vertical', label='Geopotential (gpm)') 97 | ax[1].set_xlabel('Longitude') 98 | ax[1].set_ylabel('Latitude') 99 | 100 | # Second part: Display the table with environmental data 101 | ax[0].axis('tight') 102 | ax[0].axis('off') 103 | 104 | # Create the table 105 | table = ax[0].table( 106 | cellText=df.values, 107 | colLabels=df.columns, 108 | cellLoc='center', 109 | loc='center' 110 | ) 111 | 112 | # Set font size and column width 113 | table.auto_set_font_size(False) 114 | table.set_fontsize(10) 115 | 116 | # Auto-fit each column width to its contents 117 | num_cols = len(df.columns) 118 | for i in range(num_cols): 119 | table.auto_set_column_width(col=[i]) # Fit column to its contents 120 | 121 | # Set cell height 122 | for key, cell in table.get_celld().items(): 123 | if key[0] == 0: # Header cells 124 | cell.set_height(0.1) # Set header row height 125 | else: # Data cells 126 | cell.set_height(0.1) # Set data cell height 127 | 128 | # Adjust layout so the table and plot are well spaced 129 | plt.subplots_adjust(wspace=0.3) # Adjust space between the two subplots 130 | 131 | # Show the plot and table 132 | plt.tight_layout() 133 | 134 | plt.savefig('Env-Data.png') 
135 | plt.show() 136 | 137 | 138 | 139 | # Plotting function to visualize the U, V, Z, and SST fields at all pressure levels 140 | def plot_all_pressure_levels(u_data, v_data, z_data, sst_data, pressure_levels, lat, lon): 141 | # Use the first time step for visualization 142 | time_index = 0 143 | # Set up the figure with multiple subplots: U, V, Z, and SST for each pressure level 144 | num_levels = len(pressure_levels) 145 | fig, axes = plt.subplots(num_levels, 4, figsize=(22, num_levels * 6)) # Extra column for SST 146 | 147 | # Iterate over each pressure level and plot U, V, Z, and SST 148 | for i, level in enumerate(pressure_levels): 149 | # Extract the specific pressure level data for U, V, Z, and SST 150 | u_level = u_data[time_index, i, :, :] 151 | v_level = v_data[time_index, i, :, :] 152 | z_level = z_data[time_index, i, :, :] 153 | sst_level = sst_data[time_index] if sst_data.ndim == 3 else sst_data # SST field (drop the time dimension if present) 154 | 155 | # Plot U component 156 | ax1 = axes[i, 0] 157 | c1 = ax1.contourf(lon, lat, u_level, cmap='coolwarm') 158 | ax1.set_title(f"U Component at {level} hPa") 159 | ax1.set_xlabel("Longitude") 160 | ax1.set_ylabel("Latitude") 161 | fig.colorbar(c1, ax=ax1) 162 | 163 | # Plot V component 164 | ax2 = axes[i, 1] 165 | c2 = ax2.contourf(lon, lat, v_level, cmap='coolwarm') 166 | ax2.set_title(f"V Component at {level} hPa") 167 | ax2.set_xlabel("Longitude") 168 | ax2.set_ylabel("Latitude") 169 | fig.colorbar(c2, ax=ax2) 170 | 171 | # Plot Z component 172 | ax3 = axes[i, 2] 173 | c3 = ax3.contourf(lon, lat, z_level, cmap='coolwarm') 174 | ax3.set_title(f"Geopotential at {level} hPa") 175 | ax3.set_xlabel("Longitude") 176 | ax3.set_ylabel("Latitude") 177 | fig.colorbar(c3, ax=ax3) 178 | 179 | # Plot the SST data (only once, in the first row) 180 | if i == 0: 181 | ax4 = axes[i, 3] 182 | c4 = ax4.contourf(lon, lat, sst_level, cmap='viridis') 183 | ax4.set_title("SST") 184 | ax4.set_xlabel("Longitude") 185 | ax4.set_ylabel("Latitude") 186 | fig.colorbar(c4, ax=ax4) 187 | else: 188 | ax4 = axes[i, 3] 189 | 
ax4.set_axis_off() 190 | 191 | plt.tight_layout() 192 | 193 | plt.savefig('Data3D.png') 194 | plt.show() 195 | 196 | 197 | # Main function for reading the 1D best-track data and plotting it 198 | def read_1d(dataset_path, TC_name, TC_date, area, train_val_test): 199 | # Build the file path for the 1D data 200 | year = TC_date[:4] 201 | path = os.path.join(dataset_path, 'Data1D', area, train_val_test, f"{area}{year}BST{TC_name.upper()}.txt") 202 | 203 | # Read the file content 204 | try: 205 | data = pd.read_csv(path, delimiter='\t', header=None, 206 | names=['ID', 'LONG', 'LAT', 'PRES', 'WND', 'YYYYMMDDHH', 'Name']) 207 | except FileNotFoundError: 208 | print(f"File not found: {path}") 209 | return 210 | 211 | print("Data loaded successfully!") 212 | print(data.head()) # Print the first few rows of data 213 | 214 | # Filter the data for the specified time 215 | target_time = int(TC_date) 216 | selected_data = data[data['YYYYMMDDHH'] == target_time] 217 | 218 | # If data for the target time is found, plot 5 data points 219 | if not selected_data.empty: 220 | index = selected_data.index[0] # Look up the row index only after confirming a match exists 221 | print(f"Found data for {target_time}") 222 | # Get 5 data points, starting from the specified time point 223 | data_to_plot = data[index:index + 5] 224 | 225 | # Visualize the data 226 | plot_data1d(data_to_plot) 227 | 228 | else: 229 | print(f"No data found for the time point: {target_time}") 230 | 231 | 232 | # Function to plot the 1D data in a table format 233 | def plot_data1d(data): 234 | # Create a new figure for the table 235 | fig, ax = plt.subplots(figsize=(10, 6)) # Adjust size 236 | 237 | # Set the table title 238 | ax.set_title("Data1D Example") 239 | 240 | # Create the table with selected columns 241 | table = ax.table( 242 | cellText=data[['ID', 'LONG', 'LAT', 'PRES', 'WND', 'YYYYMMDDHH', 'Name']].values, 243 | colLabels=['ID', 'LONG', 'LAT', 'PRES', 'WND', 'YYYYMMDDHH', 'Name'], 244 | cellLoc='center', 245 | loc='center' 246 | ) 247 
| 248 | # Set font size and column width 249 | table.auto_set_font_size(False) 250 | table.set_fontsize(10) 251 | 252 | # Auto-fit each of the seven columns to its contents 253 | num_cols = 7 # ID, LONG, LAT, PRES, WND, YYYYMMDDHH, Name 254 | for i in range(num_cols): 255 | table.auto_set_column_width(col=[i]) # Fit column to its contents 256 | 257 | # Set cell height 258 | for key, cell in table.get_celld().items(): 259 | if key[0] == 0 or key[0] == 1: # Header row and the first data row (the queried record) 260 | cell.set_height(0.1) # Set row height 261 | cell.set_text_props(weight='bold') # Bold the header and the queried record 262 | else: # Data cells 263 | cell.set_height(0.08) # Set data cell height 264 | 265 | # Hide the axes 266 | ax.axis('off') 267 | 268 | # Adjust layout to make the plot and table well spaced 269 | plt.tight_layout() 270 | plt.savefig('Data1D.png') 271 | plt.show() 272 | 273 | 274 | def read_all(dataset_path, TC_name, TC_date, area, train_val_test): 275 | read_3D(dataset_path, TC_name, TC_date, area) 276 | read_env(dataset_path, TC_name, TC_date, area) 277 | read_1d(dataset_path, TC_name, TC_date, area, train_val_test) 278 | 279 | if __name__ == '__main__': 280 | parser = argparse.ArgumentParser(description='Process TCND data') 281 | parser.add_argument('dataset_path', type=str, help='Path where the dataset is located', nargs='?', 282 | default='J:\\TropiCycloneNet_dataset\\TCND') 283 | parser.add_argument('TC_name', type=str, help='Name of the tropical cyclone to examine', nargs='?', 284 | default='Haiyan') 285 | parser.add_argument('TC_date', type=str, help='Specific date and time of the cyclone in YYYYMMDDHH format', 286 | nargs='?', default='2001101406') 287 | parser.add_argument('area', type=str, help='Ocean basin where the cyclone occurred (e.g., WP for Western Pacific)', 288 | nargs='?', default='WP') 289 | parser.add_argument('train_val_test', type=str, 290 | help='Indicates whether the queried typhoon belongs to the training, validation, or test set', 291 | nargs='?', default='train') 292 | 293 | args = parser.parse_args() 294 | 
read_all(args.dataset_path, args.TC_name, args.TC_date, args.area, args.train_val_test) 295 | 296 | 297 | # Example: python read_TCND.py J:\\TropiCycloneNet_dataset\\TCND LINGLING 2019090618 WP test --------------------------------------------------------------------------------