├── Overview.png
├── 2404.00424v1.pdf
├── 2404.00424v2.pdf
├── 2404.00424v3.pdf
├── README.md
└── Quantformer.ipynb

/Overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangmordred/QuantFormer/HEAD/Overview.png
--------------------------------------------------------------------------------
/2404.00424v1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangmordred/QuantFormer/HEAD/2404.00424v1.pdf
--------------------------------------------------------------------------------
/2404.00424v2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangmordred/QuantFormer/HEAD/2404.00424v2.pdf
--------------------------------------------------------------------------------
/2404.00424v3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangmordred/QuantFormer/HEAD/2404.00424v3.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Quantformer
How can transformers be utilized in quantitative financial trading? Here we provide a new model, Quantformer, based on the transformer architecture.

The official implementation of this work is now available!

![Overview](./Overview.png)

## Data collection
The training and backtesting data are collected from [AKShare](https://github.com/akfamily/akshare) and [TuShare](https://github.com/waditu/tushare) and cover 2010 to 2019. For each stock, the **adjusted** cumulative return and cumulative turnover rate over the chosen timestamp are collected (training the model on raw returns directly may bias the results).

## Model implementation
The model code is provided in [Quantformer.ipynb](./Quantformer.ipynb). If necessary, we will also upload the code as a .py file.

The model was run with Python 3.8.3 (64-bit), torch 2.1.0+cpu, and numpy 1.23.1. We are not sure whether it works properly with older versions.

## Backtest
### Trading strategy
Before the first trade date of timestamp $t$, all sequences $\chi^{t}_{i,k}$ from the stock set $S^t$ are fed into the model to obtain a list of outputs. The stocks are then ranked by the first element of the output, and the top $\frac{1}{q}$ % of stocks are added to the stock pool. If a stock was already in the pool at the previous timestamp, it is held; if a stock is in the predicted pool but not in the previous pool, it is bought with an equal proportion of the whole account. Stocks that are not in the predicted pool are sold. The same procedure is repeated in the subsequent periods. The backtest starts from January 2020; in other words, the model outputs for the sequences from May 2018 to December 2019 determine the first stock pool to trade. A minimal sketch of this selection-and-rebalancing rule is given below.

If backtesting feels difficult, [JoinQuant](https://www.joinquant.com/) is a platform worth considering for the computation. By importing the IDs of the selected stocks, JoinQuant can simulate the trading and show you the results.
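For reference, the selection-and-rebalancing rule described above can be sketched as follows. This is only a minimal illustration, not the exact backtest code used in the paper: the names (`rebalance`, `scores`, `prev_pool`, `q`) are hypothetical, and it assumes you already have, for each stock in $S^t$, the first element of the model output at timestamp $t$.

```python
def rebalance(scores, prev_pool, q):
    """One rebalancing step (sketch).

    scores:    dict mapping stock ID -> first element of the model output at timestamp t
    prev_pool: set of stock IDs held during the previous timestamp
    q:         pool parameter; the top 1/q % of ranked stocks are selected
    """
    # rank all candidate stocks by the first element of the model output, descending
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_selected = max(1, int(len(ranked) * (1 / q) / 100))  # top 1/q percent
    new_pool = set(ranked[:n_selected])

    hold = new_pool & prev_pool   # already in the pool: keep holding
    buy = new_pool - prev_pool    # newly selected: buy in
    sell = prev_pool - new_pool   # dropped from the pool: sell out

    # every stock in the new pool receives the same proportion of the whole account
    weight = 1.0 / len(new_pool) if new_pool else 0.0
    target_weights = {stock: weight for stock in new_pool}
    return target_weights, hold, buy, sell
```

The same step is applied at every subsequent timestamp, with `prev_pool` set to the keys of the previous `target_weights`.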

### Other settings
**Transaction fee**: 0.3% for each long or short trade

**Trading period**: 01/2020-05/2023

**Adjustment time**: 9:30 am BJT (UTC+08)

## Further collaboration or questions
We are open to collaboration and discussion with anyone interested in this topic. If you would like to connect, you can contact the corresponding author of the arXiv paper by email at [zhangzf@umich.edu](mailto:zhangzf@umich.edu).

## Citation
Our [paper](https://arxiv.org/abs/2404.00424), *Quantformer: from attention to profit with a quantitative transformer trading strategy* (previously titled *From attention to profit: quantitative trading strategy based on transformer*), is available on arXiv.
```bibtex
@unpublished{zhang2024attention,
  title={Quantformer: from attention to profit with a quantitative transformer trading strategy},
  author={Zhang, Zhaofeng and Chen, Banghao and Zhu, Shengxin and Langren{\'e}, Nicolas},
  note={arXiv:2404.00424},
  year={2024}
}
```
--------------------------------------------------------------------------------
/Quantformer.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "from torch.nn import functional as F\n",
    "import numpy as np\n",
    "import pickle\n",
    "import re\n",
    "import csv\n",
    "from torch.utils.data import DataLoader, TensorDataset\n",
    "import ast\n",
    "from tqdm import tqdm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MultiHeadAttention(nn.Module):\n",
    "    def __init__(self, d_model, num_heads):\n",
    "        super(MultiHeadAttention, self).__init__()\n",
    "        self.num_heads = num_heads\n",
    "        self.d_model = d_model\n",
    "        self.depth = d_model // num_heads\n",
    "\n",
    "        self.W_Q = nn.Linear(d_model, d_model)\n",
    "        self.W_K = nn.Linear(d_model, d_model)\n",
    "        self.W_V = nn.Linear(d_model, d_model)\n",
    "        self.W_O = nn.Linear(d_model, d_model)\n",
    "\n",
    "    def forward(self, Q, K, V):\n",
    "        Q = self.W_Q(Q)\n",
    "        K = self.W_K(K)\n",
    "        V = self.W_V(V)\n",
    "\n",
    "        Q = self._split_heads(Q)\n",
    "        K = self._split_heads(K)\n",
    "        V = self._split_heads(V)\n",
    "\n",
    "        # scaled dot-product attention\n",
    "        attention_weights = torch.matmul(Q, K.transpose(-1, -2)) / torch.sqrt(torch.tensor(self.depth, dtype=torch.float32))\n",
    "        attention_weights = torch.softmax(attention_weights, dim=-1)\n",
    "\n",
    "        output = torch.matmul(attention_weights, V)\n",
    "        output = self._combine_heads(output)\n",
    "\n",
    "        output = self.W_O(output)\n",
    "        return output\n",
    "\n",
    "    def _split_heads(self, tensor):\n",
    "        tensor = tensor.view(tensor.size(0), -1, self.num_heads, self.depth)\n",
    "        return tensor.transpose(1, 2)\n",
    "\n",
    "    def _combine_heads(self, tensor):\n",
    "        tensor = tensor.transpose(1, 2).contiguous()\n",
    "        return tensor.view(tensor.size(0), -1, self.num_heads * self.depth)\n",
    "\n",
    "\n",
    "class EncoderLayer(nn.Module):\n",
    "    def __init__(self, d_model, num_heads):\n",
    "        super(EncoderLayer, self).__init__()\n",
    "        self.attention = MultiHeadAttention(d_model, num_heads)\n",
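    "        # position-wise feed-forward sublayer: expand to 4 * d_model, apply ReLU, project back to d_model\n",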
| " self.feedforward = nn.Sequential(\n", 71 | " nn.Linear(d_model, 4 * d_model),\n", 72 | " nn.ReLU(),\n", 73 | " nn.Linear(4 * d_model, d_model)\n", 74 | " )\n", 75 | " self.norm1 = nn.LayerNorm(d_model)\n", 76 | " self.norm2 = nn.LayerNorm(d_model)\n", 77 | " \n", 78 | " def forward(self, x):\n", 79 | " attention_output = self.attention(x, x, x)\n", 80 | " attention_output = self.norm1(x + attention_output)\n", 81 | "\n", 82 | " feedforward_output = self.feedforward(attention_output)\n", 83 | " output = self.norm2(attention_output + feedforward_output)\n", 84 | " return output\n", 85 | "\n", 86 | "\n", 87 | "class DecoderLayer(nn.Module):\n", 88 | " def __init__(self, d_model, num_heads):\n", 89 | " super(DecoderLayer, self).__init__()\n", 90 | " self.self_attention = MultiHeadAttention(d_model, num_heads)\n", 91 | " self.encoder_attention = MultiHeadAttention(d_model, num_heads)\n", 92 | " self.feedforward = nn.Sequential(\n", 93 | " nn.Linear(d_model, 4 * d_model),\n", 94 | " nn.ReLU(),\n", 95 | " nn.Linear(4 * d_model, d_model)\n", 96 | " )\n", 97 | " self.norm1 = nn.LayerNorm(d_model)\n", 98 | " self.norm2 = nn.LayerNorm(d_model)\n", 99 | " self.norm3 = nn.LayerNorm(d_model)\n", 100 | " \n", 101 | " def forward(self, x, encoder_output):\n", 102 | " self_attention_output = self.self_attention(x, x, x)\n", 103 | " self_attention_output = self.norm1(x + self_attention_output)\n", 104 | "\n", 105 | " encoder_attention_output = self.encoder_attention(self_attention_output, encoder_output, encoder_output)\n", 106 | " encoder_attention_output = self.norm2(self_attention_output + encoder_attention_output)\n", 107 | "\n", 108 | " feedforward_output = self.feedforward(encoder_attention_output)\n", 109 | " output = self.norm3(encoder_attention_output + feedforward_output)\n", 110 | " return output\n", 111 | "\n", 112 | "class Transformer(nn.Module):\n", 113 | " def __init__(self, input_dim, hidden_dim, num_heads, num_layers, output_dim):\n", 114 | " super(Transformer, self).__init__()\n", 115 | " self.input_layer = nn.Linear(input_dim, hidden_dim)\n", 116 | " self.encoder_layers = nn.ModuleList([EncoderLayer(hidden_dim, num_heads) for _ in range(num_layers)])\n", 117 | " self.decoder_layers = nn.ModuleList([DecoderLayer(hidden_dim, num_heads) for _ in range(num_layers)])\n", 118 | " self.output_layer = nn.Linear(hidden_dim, output_dim)\n", 119 | " \n", 120 | " def forward(self, x):\n", 121 | " x = self.input_layer(x)\n", 122 | "\n", 123 | " encoder_output = x.transpose(0, 1)\n", 124 | " for layer in self.encoder_layers:\n", 125 | " encoder_output = layer(encoder_output)\n", 126 | "\n", 127 | " decoder_output = encoder_output\n", 128 | " for layer in self.decoder_layers:\n", 129 | " decoder_output = layer(decoder_output, encoder_output)\n", 130 | "\n", 131 | " decoder_output = decoder_output[-1, :, :]\n", 132 | "\n", 133 | " output = self.output_layer(decoder_output)\n", 134 | " return output\n", 135 | "\n", 136 | "model = Transformer(input_dim=2, hidden_dim=64, num_heads=8, num_layers=6, output_dim = 3)" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "def read_data(input_file, output_file, num_samples):\n", 146 | " with open(input_file, 'r') as f_input, open(output_file, 'r') as f_output:\n", 147 | " for _ in range(num_samples):\n", 148 | " input_line = f_input.readline().strip()\n", 149 | " output_line = f_output.readline().strip()\n", 150 | "\n", 151 | " if not input_line or not 
    "                continue\n",
    "\n",
    "            try:\n",
    "                input_data = np.array(ast.literal_eval(input_line), dtype=np.float32)\n",
    "                output_data = np.array(ast.literal_eval(output_line), dtype=np.float32)\n",
    "            except SyntaxError:\n",
    "                continue\n",
    "\n",
    "            yield input_data, output_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_samples = #number of samples in your import data\n",
    "data_generator = read_data('training_input.txt', 'training_output.txt', num_samples)\n",
    "\n",
    "inputs = []\n",
    "outputs = []\n",
    "for _ in range(num_samples):\n",
    "    input_data, output_data = next(data_generator, (None, None))\n",
    "    if input_data is not None and output_data is not None:\n",
    "        inputs.append(input_data)\n",
    "        outputs.append(output_data)\n",
    "\n",
    "inputs = torch.tensor(inputs, dtype=torch.float32)\n",
    "outputs = torch.tensor(outputs, dtype=torch.float32)\n",
    "\n",
    "print(inputs.shape)\n",
    "print(outputs.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = TensorDataset(inputs, outputs)\n",
    "\n",
    "batch_size = #set as you want\n",
    "\n",
    "data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n",
    "\n",
    "print(f'Number of batches: {len(data_loader)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Training Implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Transformer(input_dim=2, hidden_dim=64, num_heads=8, num_layers=6, output_dim = 3)\n",
    "\n",
    "criterion = nn.MSELoss()\n",
    "\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)\n",
    "\n",
    "num_epochs = #set as you want\n",
    "\n",
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "model = model.to(device)\n",
    "criterion = criterion.to(device)\n",
    "\n",
    "for epoch in range(num_epochs):\n",
    "    running_loss = 0.0\n",
    "    for i, data in enumerate(data_loader, 0):\n",
    "        inputs, labels = data\n",
    "        inputs = inputs.to(device)\n",
    "        labels = labels.to(device)\n",
    "        optimizer.zero_grad()\n",
    "        outputs = model(inputs)\n",
    "        loss = criterion(outputs, labels)\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        running_loss += loss.item()\n",
    "\n",
    "    print('Epoch [%d/%d], Loss: %.4f' % (epoch+1, num_epochs, running_loss / len(data_loader)))\n",
    "\n",
    "torch.save(model.state_dict(), 'your_model_pth_name')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_params_path = 'your_model_pth_name'\n",
    "output_dim = #your setting output dim\n",
    "model = Transformer(input_dim=2, hidden_dim=64, num_heads=8, num_layers=6, output_dim = output_dim)\n",
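    "# load the trained weights into a freshly built model with the same hyperparameters as training\n",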
    "model.load_state_dict(torch.load(model_params_path, map_location=torch.device('cpu')))\n",
    "model = model.to(device)  # move the model to the same device as the input tensor below\n",
    "model.eval()\n",
    "\n",
    "new_input = backtest_input  # a backtest sequence prepared in the same format as the training inputs\n",
    "\n",
    "new_input_tensor = torch.tensor(new_input, dtype=torch.float32).to(device)\n",
    "new_input_tensor = new_input_tensor.unsqueeze(0)\n",
    "\n",
    "with torch.no_grad():\n",
    "    output = model(new_input_tensor)\n",
    "\n",
    "output_values = output[0].tolist()\n",
    "formatted_output = [format(x, '.10f') for x in output_values]\n",
    "\n",
    "print(formatted_output)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
--------------------------------------------------------------------------------