├── docs
│   ├── install_kdb_Ubuntu1804.md
│   ├── images
│   │   ├── logo-kdb.jpg
│   │   ├── logo-influxdb.png
│   │   ├── logo-tdengine.png
│   │   ├── logo-dolphindb.jpg
│   │   ├── taos_commandline.png
│   │   ├── tdengine-architeture.png
│   │   ├── influxdb-architecture.png
│   │   └── tdengine-write-process.png
│   ├── install_influxdb_Ubuntu1804.md
│   ├── install_DolphinDB_Ubuntu1804.md
│   ├── install_TDEngine_Ubuntu1804.md
│   ├── QueryPerformance.md
│   ├── import_data_into_DolphinDB.md
│   ├── import_data_into_TDEngine.md
│   ├── import_data_into_InfluxDB.md
│   └── index.md
├── README.md
├── code
│   ├── ConvertDateTime2Int
│   │   ├── ConvertDateTime2Int.csproj
│   │   └── Program.cs
│   ├── Nasdaq.Data
│   │   ├── Nasdaq.Data.csproj
│   │   └── StockData.cs
│   ├── Nasdaq2InfluxDB
│   │   ├── Nasdaq2InfluxDB.csproj
│   │   └── Program.cs
│   ├── Nasdaq2TDengine
│   │   ├── Nasdaq2TDengine.csproj
│   │   └── Program.cs
│   ├── Nasdaq2DolphinDB
│   │   ├── Nasdaq2DolphinDB.csproj
│   │   └── Program.cs
│   ├── NasdaqImport2InfluxDB
│   │   ├── NasdaqImport2InfluxDB.csproj
│   │   └── Program.cs
│   └── NasdaqDataConvert.sln
├── .gitignore
├── data
│   └── README.md
└── LICENSE
/docs/install_kdb_Ubuntu1804.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/images/logo-kdb.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/logo-kdb.jpg
--------------------------------------------------------------------------------
/docs/images/logo-influxdb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/logo-influxdb.png
--------------------------------------------------------------------------------
/docs/images/logo-tdengine.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/logo-tdengine.png
--------------------------------------------------------------------------------
/docs/images/logo-dolphindb.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/logo-dolphindb.jpg
--------------------------------------------------------------------------------
/docs/images/taos_commandline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/taos_commandline.png
--------------------------------------------------------------------------------
/docs/images/tdengine-architeture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/tdengine-architeture.png
--------------------------------------------------------------------------------
/docs/images/influxdb-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/influxdb-architecture.png
--------------------------------------------------------------------------------
/docs/images/tdengine-write-process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/micli/timeseriesdb-benchmarks/HEAD/docs/images/tdengine-write-process.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # timeseriesdb-benchmarks
2 |
3 | This project is used to seek a lightweight, fast time-series database. DolphinDB, InfluxDB, KDB+ and TDEngine are the targets of this project.
4 |
5 | You can start reading from [**HERE**](./docs/index.md).
--------------------------------------------------------------------------------
/code/ConvertDateTime2Int/ConvertDateTime2Int.csproj:
--------------------------------------------------------------------------------
1 | <Project Sdk="Microsoft.NET.Sdk">
2 |
3 |   <PropertyGroup>
4 |     <OutputType>Exe</OutputType>
5 |     <TargetFramework>netcoreapp3.1</TargetFramework>
6 |   </PropertyGroup>
7 |
8 | </Project>
9 |
--------------------------------------------------------------------------------
/code/Nasdaq.Data/Nasdaq.Data.csproj:
--------------------------------------------------------------------------------
1 | <Project Sdk="Microsoft.NET.Sdk">
2 |
3 |   <PropertyGroup>
4 |     <TargetFramework>netstandard2.1</TargetFramework>
5 |   </PropertyGroup>
6 |
7 |   <ItemGroup>
8 |     <!-- version assumed; the library's JSON parsing uses Newtonsoft.Json -->
9 |     <PackageReference Include="Newtonsoft.Json" Version="12.0.3" />
10 |   </ItemGroup>
11 |
12 | </Project>
--------------------------------------------------------------------------------
/code/Nasdaq2InfluxDB/Nasdaq2InfluxDB.csproj:
--------------------------------------------------------------------------------
1 | <Project Sdk="Microsoft.NET.Sdk">
2 |
3 |   <PropertyGroup>
4 |     <OutputType>Exe</OutputType>
5 |     <TargetFramework>netcoreapp3.1</TargetFramework>
6 |   </PropertyGroup>
7 |
8 |   <ItemGroup>
9 |     <ProjectReference Include="..\Nasdaq.Data\Nasdaq.Data.csproj" />
10 |   </ItemGroup>
11 |
12 | </Project>
--------------------------------------------------------------------------------
/code/Nasdaq2TDengine/Nasdaq2TDengine.csproj:
--------------------------------------------------------------------------------
1 | <Project Sdk="Microsoft.NET.Sdk">
2 |
3 |   <PropertyGroup>
4 |     <OutputType>Exe</OutputType>
5 |     <TargetFramework>netcoreapp3.1</TargetFramework>
6 |   </PropertyGroup>
7 |
8 |   <ItemGroup>
9 |     <ProjectReference Include="..\Nasdaq.Data\Nasdaq.Data.csproj" />
10 |   </ItemGroup>
11 |
12 | </Project>
--------------------------------------------------------------------------------
/code/Nasdaq2DolphinDB/Nasdaq2DolphinDB.csproj:
--------------------------------------------------------------------------------
1 | <Project Sdk="Microsoft.NET.Sdk">
2 |
3 |   <PropertyGroup>
4 |     <OutputType>Exe</OutputType>
5 |     <TargetFramework>netcoreapp3.1</TargetFramework>
6 |   </PropertyGroup>
7 |
8 |   <ItemGroup>
9 |     <ProjectReference Include="..\Nasdaq.Data\Nasdaq.Data.csproj" />
10 |   </ItemGroup>
11 |
12 | </Project>
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Binaries for programs and plugins
2 | *.exe
3 | *.exe~
4 | *.dll
5 | *.so
6 | *.dylib
7 | *.pdb
8 | /bin
9 | /obj
10 |
11 | # Test binary, built with `go test -c`
12 | *.test
13 |
14 | # Output of the go coverage tool, specifically when used with LiteIDE
15 | *.out
16 |
17 | # Dependency directories (remove the comment below to include it)
18 | # vendor/
19 | .DS_Store
20 |
--------------------------------------------------------------------------------
/code/NasdaqImport2InfluxDB/NasdaqImport2InfluxDB.csproj:
--------------------------------------------------------------------------------
1 | <Project Sdk="Microsoft.NET.Sdk">
2 |
3 |   <PropertyGroup>
4 |     <OutputType>Exe</OutputType>
5 |     <TargetFramework>netcoreapp3.1</TargetFramework>
6 |   </PropertyGroup>
7 |
8 |   <ItemGroup>
9 |     <!-- versions assumed; the program uses the InfluxDB.Client and InfluxDB.LineProtocol packages -->
10 |     <PackageReference Include="InfluxDB.Client" Version="1.11.0" />
11 |     <PackageReference Include="InfluxDB.LineProtocol" Version="1.1.1" />
12 |   </ItemGroup>
13 |
14 |   <ItemGroup>
15 |     <ProjectReference Include="..\Nasdaq.Data\Nasdaq.Data.csproj" />
16 |   </ItemGroup>
17 |
18 | </Project>
--------------------------------------------------------------------------------
/docs/install_influxdb_Ubuntu1804.md:
--------------------------------------------------------------------------------
1 | # Install Influxdb database on Ubuntu 18.04
2 |
3 | In this document you will learn how to install InfluxDB database on Ubuntu 18.04.
4 |
5 | ## Install Manually
6 |
7 | 1. Download InfluxDB from the official website
8 |
9 | ```shell
10 | mkdir influxdb
11 | cd influxdb
12 | wget https://dl.influxdata.com/influxdb/releases/influxdb_1.8.2_amd64.deb
13 | ```
14 |
15 | 2. Run the installation with dpkg
16 |
17 | ```shell
18 | sudo dpkg -i influxdb_1.8.2_amd64.deb
19 | ```
20 |
21 | 3. Ensure the influxdb service is managed by systemd.
22 |
23 | ```shell
24 |
25 | sudo systemctl enable --now influxdb
26 |
27 | ```
28 |
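To confirm the installation, you can check the service status and the CLI version (assuming the default package layout):

```shell
sudo systemctl status influxdb
influx -version
```
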
29 | ## Reference
30 |
31 | + [Influxdb 1.8.2 download page](https://portal.influxdata.com/downloads/)
32 | + [Influxdb sample data](https://docs.influxdata.com/influxdb/v1.7/query_language/data_download/)
--------------------------------------------------------------------------------
/code/ConvertDateTime2Int/Program.cs:
--------------------------------------------------------------------------------
1 | using System;
2 |
3 | namespace ConvertDateTime2Int
4 | {
5 | class Program
6 | {
7 | static void Main(string[] args)
8 | {
9 | Console.WriteLine("Please input DateTime value:");
10 | string strVal = Console.ReadLine();
11 | DateTime dt = DateTime.MinValue;
12 | if (DateTime.TryParse(strVal, out dt))
13 | {
14 | int ret = DateTime2Int(dt);
15 | Console.WriteLine("Value = {0}", ret);
16 | }
17 | else
18 | {
19 | Console.WriteLine("Invalid DateTime format.");
20 | }
21 | }
22 |
23 | private static int DateTime2Int(System.DateTime time)
24 | {
25 | DateTime start = DateTime.Parse("1970-01-01").ToLocalTime();
26 | return (int)(time - start).TotalSeconds;
27 | }
28 | }
29 | }
30 |
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | # Testing Data
2 |
3 | The original data cannot be imported directly into the databases,
4 | because each database has its own data format. We need to convert the data before importing.
5 |
6 | + The original data is located [**HERE**](https://dxact.blob.core.chinacloudapi.cn/testdata/nasdaq.zip).
7 |
8 | + The DolphinDB data can be downloaded from [**HERE**](https://dxact.blob.core.chinacloudapi.cn/testdata/nasdaq_dolphin_data.zip).
9 |
10 | + The data for importing into TDEngine is located [**HERE**](https://dxact.blob.core.chinacloudapi.cn/testdata/tdengine_data.zip).
11 |
12 | + To import data into InfluxDB, please use the .NET Core app NasdaqImport2InfluxDB, which is located at /code/NasdaqImport2InfluxDB/ (see the example below).
13 |
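A minimal way to run the importer, assuming the .NET Core 3.1 SDK is installed (note that the data folder path and the InfluxDB endpoint are currently hardcoded in Program.cs):

```shell
cd code/NasdaqImport2InfluxDB
dotnet run -c Release
```
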
14 | # Installation Packages
15 |
16 | + [DolphinDB 64-bits](https://dxact.blob.core.chinacloudapi.cn/testdata/DolphinDB_Linux64_V1.00.24.zip)
17 |
18 | + [KDB+](https://dxact.blob.core.chinacloudapi.cn/testdata/kdb-linux_x86.zip)
19 |
20 | + [TDEngine](https://dxact.blob.core.chinacloudapi.cn/testdata/TDengine-server-2.0.3.0-Linux-x64.deb)
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Michael Li
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/docs/install_DolphinDB_Ubuntu1804.md:
--------------------------------------------------------------------------------
1 | # Install DolphinDB on Ubuntu 18.04
2 |
3 | In this document you will learn how to install and configure DolphinDB on Ubuntu 18.04.
4 |
5 | ## Manual Installation
6 |
7 | 1. Install unzip for extracting the binary files.
8 |
9 | ```shell
10 |
11 | sudo apt-get update
12 | sudo apt-get install unzip
13 |
14 | ```
15 |
16 | 2. Download DolphinDB Community Edition from the official site.
17 |
18 | ```shell
19 |
20 | wget http://www.dolphindb.cn/downLinux64-ABI.php
21 |
22 | ```
23 | The DolphinDB 32-bit version does not support running on 64-bit Linux. Please carefully select the right version when downloading.
24 |
25 | 3. Create a folder and unzip DolphinDB into it.
26 |
27 | ```shell
28 |
29 | mkdir DolphinDB
30 | cd DolphinDB
31 | mv ~/DolphinDB_Linux64_V1.10.14_ABI.zip ./
32 | unzip DolphinDB_Linux64_V1.10.14_ABI.zip
33 |
34 | ```
35 |
36 | 4. Start the service
37 |
38 | To start the service, run the dolphindb executable (create a service configuration file first if you need to customize settings):
39 |
40 | ```
41 | ./dolphindb -maxMemSize 16
42 |
43 | ```
44 |
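If you want the server to keep running after you close the terminal, one common pattern (a sketch, assuming the standalone Linux server binary extracted above) is to start it in the background:

```shell
nohup ./dolphindb -console 0 -maxMemSize 16 > dolphindb.log 2>&1 &
```
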
45 |
46 | Notes:
47 |
48 | ***
49 |
50 | To quit the DolphinDB interactive session, type 'quit' and press Enter. quit is a command, not a function.
51 |
52 | ***
--------------------------------------------------------------------------------
/docs/install_TDEngine_Ubuntu1804.md:
--------------------------------------------------------------------------------
1 | # Install TDEngine database on Ubuntu 18.04
2 |
3 | In this document, you will learn how to install TDEngine database on Ubuntu 18.04
4 |
5 | # Manual Installation
6 |
7 | 1. The installation package of TDengine is very small (2.7 MB~4.5 MB). To get the install package, you need to enter a valid email address at the link below:
8 |
9 | https://www.taosdata.com/en/getting-started/
10 | + TDengine-server-2.0.3.0-Linux-x64.rpm (4.2M)
11 | + TDengine-server-2.0.3.0-Linux-x64.deb (2.7M)
12 | + TDengine-server-2.0.3.0-Linux-x64.tar.gz (4.5M)
13 |
14 | The download link may land in your **junk mail folder**, so check there.
15 |
16 | 2. Log in to the Ubuntu server and download the package into a local folder.
17 |
18 | ```shell
19 | mkdir TDEngine
20 | cd TDEngine
21 | wget https://dxact.blob.core.chinacloudapi.cn/21mfilms/TDengine-server-2.0.3.0-Linux-x64.deb
22 | ```
23 |
24 | 3. Run the installation with the dpkg command.
25 |
26 | ```shell
27 | sudo dpkg -i TDengine-server-2.0.3.0-Linux-x64.deb
28 | ```
29 |
30 | 4. Make sure the taosd service is managed by systemd. taosd is the core service of the TDEngine database.
31 |
32 | ```shell
33 | sudo systemctl enable --now taosd
34 | ```
35 |
36 | 5. TDEngine supports command-line interaction through the taos shell:
37 |
38 | 
39 |
40 |
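To double-check the installation, you can verify that the service is running and open the interactive shell (assuming the default package install):

```shell
sudo systemctl status taosd
taos
```
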
--------------------------------------------------------------------------------
/code/Nasdaq2DolphinDB/Program.cs:
--------------------------------------------------------------------------------
1 | using System;
2 | using System.IO;
3 | using System.Collections.Generic;
4 | using Nasdaq.Data;
5 |
6 | namespace Nasdaq2DolphinDB
7 | {
8 | class Program
9 | {
10 | static void Main(string[] args)
11 | {
12 | string path = "/Users/micl/Documents/Nasdaq/us/nasdaq/";
13 |
14 | using (StreamWriter sw = new StreamWriter("nasdaq.csv"))
15 | {
16 | sw.WriteLine("date,code,opening_price,highest_price,lowest_price,closing_price,adjusted_closing_price,trade_volume");
17 | sw.Flush();
18 |
19 | string[] subdirectiores = System.IO.Directory.GetDirectories(path);
20 | int subFolderCount = subdirectiores.Length;
21 | Console.WriteLine("Total found {0} sub folders", subFolderCount);
22 | int currentFolder = 1;
23 | foreach (string folder in subdirectiores)
24 | {
25 | Console.WriteLine("Now handling {0} / {1} folder, Symbol = {2}",
26 | currentFolder, subFolderCount, folder.Substring(folder.LastIndexOf("/") + 1, folder.Length - folder.LastIndexOf("/") - 1));
27 | string[] files = Directory.GetFiles(folder, "*.json");
28 | foreach (string f in files)
29 | {
30 | Console.WriteLine("Now reading file : {0}", f);
31 | var stock = new Stock(f);
32 | stock.LoadData();
33 | Console.WriteLine("File loaded, total {0} items", stock.Data.Count);
34 | foreach (StockData d in stock.Data)
35 | {
36 | sw.WriteLine(d.ToString("dolphindb"));
37 | }
38 | sw.Flush();
39 |
40 | }
41 | currentFolder++;
42 | }
43 | }
44 |
45 | }
46 | }
47 | }
48 |
--------------------------------------------------------------------------------
/docs/QueryPerformance.md:
--------------------------------------------------------------------------------
1 | # Query Performance
2 |
3 |
4 | ## Table scan
5 |
6 |
7 | DolphinDB:
8 |
9 | ```sql
10 | timer select count(*) from tb_nasdaq;
11 | ```
12 | time spent: 0.528 ms
13 |
14 | InfluxDB:
15 |
16 | ```sql
17 | select count(*) from tb_nasdaq;
18 | ```
19 | reports error **ERR: no data received**
20 | During calculation, CPU usage is nearly 100% and memory usage is nearly 100%.
21 |
22 | TDEngine:
23 |
24 | ```sql
25 | select count(*) from tb_nasdaq;
26 | ```
27 | time spent: 4.928847s
28 |
29 |
30 | ## Query Min/Max value grouped by code
31 |
32 |
33 |
34 | DolphinDB:
35 |
36 | ```sql
37 | timer select min(opening_price) from tb_nasdaq group by code;
38 | ```
39 | time spent: 43.113 ms
40 |
41 | InfluxDB:
42 |
43 | ```sql
44 | select min(opening_price) from tb_nasdaq group by code;
45 | ```
46 | time spent: 44.37s
47 |
48 | TDEngine:
49 |
50 | ```sql
51 | select min(opening_price) from tb_nasdaq group by code;
52 | ```
53 | time spent: 7.648549s
54 |
55 |
56 | ## Query by symbol code
57 |
58 |
59 | DolphinDB:
60 |
61 | ```sql
62 | timer select sum(trade_volume) from tb_nasdaq where code='AAPL';
63 | ```
64 |
65 | time spent: 6.847 ms
66 |
67 | InfluxDB:
68 |
69 | ```sql
70 | select sum(trade_volume) from tb_nasdaq where code='AAPL';
71 | ```
72 |
73 | time spent: 0.41s
74 |
75 | TDEngine:
76 |
77 | ```sql
78 | select sum(trade_volume) from tb_nasdaq where code='AAPL';
79 | ```
80 |
81 | time spent: 0.068869s
82 |
83 |
84 | ## Query by symbol code and time
85 |
86 | DolphinDB:
87 |
88 | ```sql
89 | timer select count(*) from tb_nasdaq where code='AAPL' and date < 2017.01.01;
90 | ```
91 |
92 | Time spent: 4.681 ms
93 |
94 | InfluxDB:
95 |
96 | ```sql
97 | select count(*) from tb_nasdaq where code='AAPL' and time < 1478188800000000000;
98 | ```
99 | Time spent: 0.7s
100 |
101 | TDEngine:
102 |
103 | ```sql
104 | select count(*) from tb_nasdaq where code='AAPL' and date < '2017-01-01 00:00:00';
105 | ```
106 | Time spent: 0.037594s
107 |
--------------------------------------------------------------------------------
/code/Nasdaq2InfluxDB/Program.cs:
--------------------------------------------------------------------------------
1 | using System;
2 | using System.IO;
3 | using Nasdaq.Data;
4 |
5 | namespace Nasdaq2InfluxDB
6 | {
7 | class Program
8 | {
9 | static void Main(string[] args)
10 | {
11 | if (args.Length < 2)
12 | {
13 | Console.WriteLine("Measurement name or data folder path not specified.");
14 | return;
15 | }
16 | var path = args[0];
17 | var measurement = args[1];
18 | using (StreamWriter sw = new StreamWriter("nasdaq_influx_batch.csv"))
19 | {
20 | //sw.WriteLine("measurement,code,opening_price,highest_price,lowest_price,closing_price,adjusted_closing_price,trade_volume,date");
21 | //sw.Flush();
22 | string[] subdirectiores = System.IO.Directory.GetDirectories(path);
23 | int subFolderCount = subdirectiores.Length;
24 | Console.WriteLine("Total found {0} sub folders", subFolderCount);
25 | int currentFolder = 1;
26 | foreach (string folder in subdirectiores)
27 | {
28 | Console.WriteLine("Now handling {0} / {1} folder, Symbol = {2}",
29 | currentFolder, subFolderCount, folder.Substring(folder.LastIndexOf("/") + 1, folder.Length - folder.LastIndexOf("/") - 1));
30 | string[] files = Directory.GetFiles(folder, "*.json");
31 | foreach (string f in files)
32 | {
33 | Console.WriteLine("Now reading file : {0}", f);
34 | var stock = new Stock(f);
35 | stock.LoadData();
36 | Console.WriteLine("File loaded, total {0} items", stock.Data.Count);
37 | foreach (StockData d in stock.Data)
38 | {
39 | sw.WriteLine("{0},{1}", measurement, d.ToString("influxdb"));
40 | }
41 | sw.Flush();
42 |
43 | }
44 | currentFolder++;
45 | }
46 | }
47 | }
48 | }
49 | }
50 |
--------------------------------------------------------------------------------
/code/NasdaqImport2InfluxDB/Program.cs:
--------------------------------------------------------------------------------
1 | using System;
2 | using System.IO;
3 | using InfluxDB;
4 | using InfluxDB.Client;
5 | using InfluxDB.Client.Core;
6 | using InfluxDB.LineProtocol.Payload;
7 | using InfluxDB.LineProtocol.Client;
8 |
9 | using Nasdaq.Data;
10 | using System.Collections.Generic;
11 |
12 | namespace NasdaqImport2InfluxDB
13 | {
14 | class Program
15 | {
16 | static void Main(string[] args)
17 | {
18 | var path = "/Users/micl/Documents/Nasdaq/us/nasdaq/";
19 | var database = "nasdaq";
20 |
21 | LineProtocolClient client = new LineProtocolClient(new Uri("http://40.73.35.55:8086"), database);
22 |
23 | string[] subdirectiores = System.IO.Directory.GetDirectories(path);
24 | int subFolderCount = subdirectiores.Length;
25 | Console.WriteLine("Total found {0} sub folders", subFolderCount);
26 | int currentFolder = 1;
27 | DateTime start = DateTime.Now;
28 | foreach (string folder in subdirectiores)
29 | {
30 | Console.WriteLine("Now handling {0} / {1} folder, Symbol = {2}",
31 | currentFolder, subFolderCount, folder.Substring(folder.LastIndexOf("/") + 1, folder.Length - folder.LastIndexOf("/") - 1));
32 | string[] files = Directory.GetFiles(folder, "*.json");
33 | foreach (string f in files)
34 | {
35 | Console.WriteLine("Now reading file : {0}", f);
36 | var stock = new Stock(f);
37 | stock.LoadData();
38 | Console.WriteLine("File loaded, total {0} items", stock.Data.Count);
39 |
40 | LineProtocolPayload payload = new LineProtocolPayload();
41 | foreach (StockData d in stock.Data)
42 | {
43 | LineProtocolPoint point = new LineProtocolPoint(
44 | "tb_nasdaq",
45 | new Dictionary<string, object>
46 | {
47 | { "opening_price", d.opening_price},
48 | { "highest_price", d.highest_price},
49 | { "lowest_price", d.lowest_price},
50 | { "closing_price", d.closing_price},
51 | { "adjusted_closing_price", d.adjusted_closing_price},
52 | { "trade_volume", d.trade_volume},
53 | },
54 | new Dictionary<string, string>
55 | {
56 | { "code", d.code }
57 | },
58 | DateTime.Parse(d.date).ToUniversalTime());
59 | payload.Add(point);
60 |
61 | }
62 | var result = client.WriteAsync(payload).GetAwaiter().GetResult();
63 | if (!result.Success)
64 | {
65 | Console.WriteLine(result.ErrorMessage);
66 | }
67 | }
68 | currentFolder++;
69 | }
70 |
71 | DateTime end = DateTime.Now;
72 | Console.WriteLine("Total spent: {0} seconds", (end - start).TotalSeconds);
73 | }
74 | }
75 | }
76 |
--------------------------------------------------------------------------------
/code/NasdaqDataConvert.sln:
--------------------------------------------------------------------------------
1 |
2 | Microsoft Visual Studio Solution File, Format Version 12.00
3 | # Visual Studio 15
4 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Nasdaq.Data", "Nasdaq.Data\Nasdaq.Data.csproj", "{AEC78C33-9F5D-46E3-BB6A-6BFA7F14858B}"
5 | EndProject
6 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Nasdaq2TDengine", "Nasdaq2TDengine\Nasdaq2TDengine.csproj", "{235AB5D7-084D-4FDC-8EF8-527354D5094C}"
7 | EndProject
8 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Nasdaq2DolphinDB", "Nasdaq2DolphinDB\Nasdaq2DolphinDB.csproj", "{44C79FBB-1373-4E47-8E8D-EE390CB449F4}"
9 | EndProject
10 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Nasdaq2InfluxDB", "Nasdaq2InfluxDB\Nasdaq2InfluxDB.csproj", "{B5E15B7C-1E22-47C4-88B3-A0BEB8773822}"
11 | EndProject
12 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "NasdaqImport2InfluxDB", "NasdaqImport2InfluxDB\NasdaqImport2InfluxDB.csproj", "{A1DE068A-B77E-476C-9921-D0ABA32AB565}"
13 | EndProject
14 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ConvertDateTime2Int", "ConvertDateTime2Int\ConvertDateTime2Int.csproj", "{B819DDEE-065E-4012-8CC8-D1F4CF716AA2}"
15 | EndProject
16 | Global
17 | GlobalSection(SolutionConfigurationPlatforms) = preSolution
18 | Debug|Any CPU = Debug|Any CPU
19 | Release|Any CPU = Release|Any CPU
20 | EndGlobalSection
21 | GlobalSection(ProjectConfigurationPlatforms) = postSolution
22 | {AEC78C33-9F5D-46E3-BB6A-6BFA7F14858B}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
23 | {AEC78C33-9F5D-46E3-BB6A-6BFA7F14858B}.Debug|Any CPU.Build.0 = Debug|Any CPU
24 | {AEC78C33-9F5D-46E3-BB6A-6BFA7F14858B}.Release|Any CPU.ActiveCfg = Release|Any CPU
25 | {AEC78C33-9F5D-46E3-BB6A-6BFA7F14858B}.Release|Any CPU.Build.0 = Release|Any CPU
26 | {235AB5D7-084D-4FDC-8EF8-527354D5094C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
27 | {235AB5D7-084D-4FDC-8EF8-527354D5094C}.Debug|Any CPU.Build.0 = Debug|Any CPU
28 | {235AB5D7-084D-4FDC-8EF8-527354D5094C}.Release|Any CPU.ActiveCfg = Release|Any CPU
29 | {235AB5D7-084D-4FDC-8EF8-527354D5094C}.Release|Any CPU.Build.0 = Release|Any CPU
30 | {44C79FBB-1373-4E47-8E8D-EE390CB449F4}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
31 | {44C79FBB-1373-4E47-8E8D-EE390CB449F4}.Debug|Any CPU.Build.0 = Debug|Any CPU
32 | {44C79FBB-1373-4E47-8E8D-EE390CB449F4}.Release|Any CPU.ActiveCfg = Release|Any CPU
33 | {44C79FBB-1373-4E47-8E8D-EE390CB449F4}.Release|Any CPU.Build.0 = Release|Any CPU
34 | {B5E15B7C-1E22-47C4-88B3-A0BEB8773822}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
35 | {B5E15B7C-1E22-47C4-88B3-A0BEB8773822}.Debug|Any CPU.Build.0 = Debug|Any CPU
36 | {B5E15B7C-1E22-47C4-88B3-A0BEB8773822}.Release|Any CPU.ActiveCfg = Release|Any CPU
37 | {B5E15B7C-1E22-47C4-88B3-A0BEB8773822}.Release|Any CPU.Build.0 = Release|Any CPU
38 | {A1DE068A-B77E-476C-9921-D0ABA32AB565}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
39 | {A1DE068A-B77E-476C-9921-D0ABA32AB565}.Debug|Any CPU.Build.0 = Debug|Any CPU
40 | {A1DE068A-B77E-476C-9921-D0ABA32AB565}.Release|Any CPU.ActiveCfg = Release|Any CPU
41 | {A1DE068A-B77E-476C-9921-D0ABA32AB565}.Release|Any CPU.Build.0 = Release|Any CPU
42 | {B819DDEE-065E-4012-8CC8-D1F4CF716AA2}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
43 | {B819DDEE-065E-4012-8CC8-D1F4CF716AA2}.Debug|Any CPU.Build.0 = Debug|Any CPU
44 | {B819DDEE-065E-4012-8CC8-D1F4CF716AA2}.Release|Any CPU.ActiveCfg = Release|Any CPU
45 | {B819DDEE-065E-4012-8CC8-D1F4CF716AA2}.Release|Any CPU.Build.0 = Release|Any CPU
46 | EndGlobalSection
47 | EndGlobal
48 |
--------------------------------------------------------------------------------
/docs/import_data_into_DolphinDB.md:
--------------------------------------------------------------------------------
1 | # Import data into DolphinDB
2 |
3 | In this document, you will learn how to create a database and a partitioned table, and how to import the Nasdaq data into the database.
4 |
5 | # Creating Database
6 |
7 | DolphinDB interacts with users through functions. To create a database, use the database function as shown below:
8 |
9 | ```shell
10 | database(directory, [partitionType], [partitionScheme], [locations])
11 | ```
12 | To optimize I/O, users can specify the partitionType parameter so that data is stored in different partitions. At a minimum, a database directory must be specified to create a database.
13 |
14 | The function returns a dbHandle object for later use.
15 |
16 | ```shell
17 | database("dfs://nasdaq")
18 | ```
19 |
20 | # Creating Partitioned Table
21 |
22 | A partitioned table stores data in different partitions to achieve maximum I/O performance. The function declaration is as follows:
23 |
24 | ```shell
25 | createPartitionedTable(dbHandle, table, [tableName], partitionColumns)
26 | ```
27 |
28 | Before calling createPartitionedTable, call the table function to declare a table.
29 |
30 | ```shell
31 | table(capacity:size, colNames, colTypes)
32 | ```
33 |
34 | capacity is the number of rows for which memory is initially allocated for this table; if the data grows beyond capacity, the table automatically allocates more memory. size is the initial number of rows in the table.
35 |
36 | ```shell
37 |
38 | table(20:0, `date`code`opening_price`highest_price`lowest_price`closing_price`adjusted_closing_price`trade_volume, [TIMESTAMP,STRING,DOUBLE,DOUBLE,DOUBLE,DOUBLE,DOUBLE,DOUBLE])
39 |
40 | ```
41 |
42 | To build both database and partitioned table, please use below code:
43 |
44 | ```shell
45 | yearRange=date(2011.01M + 12*0..22)
46 | dbPath="dfs://nasdaqdb"
47 | login("admin","123456")
48 | if(existsDatabase(dbPath)){
49 | dropDatabase(dbPath)
50 | }
51 | db=database(dbPath,VALUE,yearRange)
52 | saveDatabase(db);
53 | tb_nasdaq=table(20:0, `date`code`opening_price`highest_price`lowest_price`closing_price`adjusted_closing_price`trade_volume, [TIMESTAMP,SYMBOL,DOUBLE,DOUBLE,DOUBLE,DOUBLE,DOUBLE,INT])
54 | pdt=createPartitionedTable(db, tb_nasdaq, 'tb_nasdaq', 'date');
55 | ```
56 |
57 | # Importing data
58 |
59 | There are four ways to retrieve data from a CSV file.
60 |
61 | + loadText: Import a text file as a memory table.
62 | + ploadText: Import text files in parallel as partitioned in-memory tables.
63 | + loadTextEx: Import text files into databases, including distributed databases, local disk databases or memory databases.
64 | + textChunkDS: Divide the text file into multiple small data sources, then use the mr function for flexible data processing.
65 |
66 | ploadText is the best fit in the current scenario. Before starting the DolphinDB session, please make sure the CSV file is located in the current directory. To measure the time cost and check for data loss, we output the execution time (timer) and the loaded item count (count(*)).
67 |
68 | ```shell
69 | dbPath="dfs://nasdaqdb"
70 | login("admin","123456")
71 | db=database(dbPath)
72 | tb_nasdaq=loadTable(db,'tb_nasdaq')
73 | timer tb_nasdaq=ploadText('nasdaq_dolphin_data.csv')
74 | select count(*) from tb_nasdaq
75 | pdt=loadTable(db,'tb_nasdaq')
76 | pdt.append!(tb_nasdaq);
77 | ```
78 |
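To verify that all rows landed in the partitioned table after the append, a quick check (assuming the database path and table name used above) is:

```shell
select count(*) from loadTable("dfs://nasdaqdb", `tb_nasdaq)
```
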
79 | # Experiment Result
80 |
81 | |Number of experiments|Time cost(ms)|
82 | |:-|:-|
83 | |1st|2026.2 ms|
84 | |2nd|1998.19 ms|
85 | |3rd|1993.18 ms|
86 | |4th|1949.8 ms|
87 |
88 | # Reference
89 | + [DolphinDB text data loading tutorial (in Chinese)](https://gitee.com/dolphindb/Tutorials_CN/blob/master/import_csv.md)
90 | + [DolphinDB database tutorial for text data (in Chinese)](https://zhuanlan.zhihu.com/p/46299595)
--------------------------------------------------------------------------------
/code/Nasdaq2TDengine/Program.cs:
--------------------------------------------------------------------------------
1 | using System;
2 | using System.IO;
3 | using Nasdaq.Data;
4 |
5 | namespace Nasdaq2TDengine
6 | {
7 | class Program
8 | {
9 | static void Main(string[] args)
10 | {
11 | if(args.Length < 1)
12 | {
13 | Console.WriteLine("Please specify Nasdaq data folder path.");
14 | return;
15 | }
16 | string path = args[0]; //"/Users/micl/Documents/Nasdaq/us/nasdaq/";
17 | if (!Directory.Exists(path))
18 | {
19 | Console.WriteLine("The path {0} does not exist.", path);
20 | return;
21 | }
22 |
23 | // Delete history data folder and re-create.
24 | if (Directory.Exists("data"))
25 | {
26 | Directory.Delete("data", true);
27 | }
28 | Directory.CreateDirectory("data");
29 |
30 | string[] subdirectiores = System.IO.Directory.GetDirectories(path);
31 | int subFolderCount = subdirectiores.Length;
32 | Console.WriteLine("Total found {0} sub folders", subFolderCount);
33 | int currentFolder = 1;
34 | foreach (string folder in subdirectiores)
35 | {
36 | Console.WriteLine("Now handling {0} / {1} folder, Symbol = {2}",
37 | currentFolder, subFolderCount, folder.Substring(folder.LastIndexOf("/") + 1, folder.Length - folder.LastIndexOf("/") - 1));
38 | string[] files = Directory.GetFiles(folder, "*.json");
39 | foreach (string f in files)
40 | {
41 | var filename = "./data/" + Path.GetFileName(folder) + ".csv";
42 | if (File.Exists(filename))
43 | File.Delete(filename);
44 | using (StreamWriter sw = new StreamWriter(filename))
45 | {
46 | Console.WriteLine("Now reading file : {0}", f);
47 | var stock = new Stock(f);
48 | stock.LoadData();
49 | Console.WriteLine("File loaded, total {0} items", stock.Data.Count);
50 | foreach (StockData d in stock.Data)
51 | {
52 | sw.WriteLine(d.ToString("tdengine"));
53 | }
54 | sw.Flush();
55 | }
56 |
57 | }
58 | currentFolder++;
59 | }
60 |
61 | // Create batch files
62 | StreamWriter swCreate = new StreamWriter("create_table_tdengine.txt");
63 | StreamWriter swImport = new StreamWriter("import_table_tdengine.txt");
64 | swImport.WriteLine("use nasdaq;");
65 | swCreate.WriteLine("use nasdaq;");
66 | foreach (string folder in subdirectiores)
67 | {
68 | Console.WriteLine("Now handling {0} / {1} folder, Symbol = {2}",
69 | currentFolder, subFolderCount, folder.Substring(folder.LastIndexOf("/") + 1, folder.Length - folder.LastIndexOf("/") - 1));
70 | var symbol = Path.GetFileName(folder);
71 | swCreate.WriteLine(string.Format("CREATE TABLE {0} USING tb_nasdaq TAGS (\"{0}\");", symbol));
72 | swImport.WriteLine(string.Format("INSERT INTO {0} FILE \'{0}.csv\';", symbol));
73 | }
74 |
75 | swCreate.Close();
76 | swImport.Close();
77 | }
78 | }
79 | }
80 |
--------------------------------------------------------------------------------
/docs/import_data_into_TDEngine.md:
--------------------------------------------------------------------------------
1 | # Import data into TDengine database
2 |
3 | In this document, you will learn how to import Nasdaq stock data into TDEngine database.
4 |
5 | ## 1. Create a database in TDEngine service.
6 |
7 | TDEngine uses a SQL-like language to interact with users. You can use the commands below to create a database named nasdaq.
8 |
9 | ```shell
10 | CREATE DATABASE nasdaq [KEEP 3650];
11 | USE nasdaq;
12 | ```
13 | KEEP specifies data retention: any data older than the KEEP days will be deleted from the database. The default value is 3650 days (10 years). The USE command switches databases in the TDEngine service.
14 |
15 | ## 2. Creating a super table.
16 |
17 | In TDEngine, every device or identified object should have its own data table, and all data tables of the same kind share the same data structure. The super table is a template that defines the data structure for a kind of device or identified object.
18 |
19 | ```sql
20 | create table tb_nasdaq(date TIMESTAMP,opening_price float,highest_price float,lowest_price float,closing_price float,adjusted_closing_price float,trade_volume float) tags (code BINARY(10));
21 | ```
22 |
23 | In this table, the date column is the time series index; it is required. The 'code' column identifies which company the data belongs to; it is used for group-by calculations on the super table.
24 |
25 | The super table is similar to a view in an RDBMS. It doesn't store any data physically, but offers a way to query the physical data tables with common statements.
26 |
27 | TDEngine only supports 10 kinds of data types:
28 | |Data Type| Bytes|Note|
29 | |:-|:-|:-|
30 | |TINYINT|1|A nullable integer type with a range of [-127, 127]|
31 | |SMALLINT|2|A nullable integer type with a range of [-32767, 32767]|
32 | |INT|4|A nullable integer type with a range of [-2^31+1, 2^31-1 ]|
33 | |BIGINT|8|A nullable integer type with a range of [-2^59, 2^59 ]|
34 | |FLOAT|4|A standard nullable float type with 6 -7 significant digits and a range of [-3.4E38, 3.4E38]|
35 | |DOUBLE|8|A standard nullable double float type with 15-16 significant digits and a range of [-1.7E308, 1.7E308]|
36 | |BOOL|1|A nullable boolean type, [true, false]|
37 | |TIMESTAMP|8|A nullable timestamp type with the same usage as the primary column timestamp|
38 | |BINARY(M)|M|A nullable string type whose length is M; an error should be thrown for exceeded chars. The maximum length of M is 65526, but as the maximum row size is 64K bytes, the actual upper limit will generally be less than 65526. This type of string only supports ASCII encoded chars.|
39 | |NCHAR(M)|4 * M|A nullable string type whose length is M; an error should be thrown for exceeded chars. The NCHAR type supports Unicode encoded chars.|
40 |
41 | The symbol column holds string data, so it has to be declared as type BINARY(10).
42 |
43 |
44 | ## 3. Creating data table for each code name.
45 |
46 | In the Nasdaq stock data, the code is a company's ticker abbreviation; it identifies which company the stock data belongs to. We need to create a specific data table for each company, using tb_nasdaq as the template. For example:
47 |
48 | ```sql
49 | CREATE TABLE LANC USING tb_nasdaq TAGS ("LANC");
50 | ```
51 | To import the data into the tables, we created a batch file that contains thousands of SQL commands, one CREATE TABLE statement per code name.
52 |
53 | The file can be found [HERE](https://dxact.blob.core.chinacloudapi.cn/21mfilms/create_table_tdengine.txt)
54 |
55 |
56 | ## 4. Using the insert into clause to import data.
57 |
58 | TDEngine does not support importing data directly into a super table; it only supports importing into physical tables. That means all of the Nasdaq stock data has to be separated into thousands of files by code name, and each CSV file is imported by its own SQL statement.
59 |
60 | ```shell
61 | insert into [code] file '[code].csv';
62 | ```
63 | The data file can be found at [HERE](https://dxact.blob.core.chinacloudapi.cn/21mfilms/tdengine_data.zip).
64 |
65 | The zip file also contains import_tdendine_data.txt for batch execution of the insert commands.
66 |
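One way to run these batch files is from inside the taos shell with the source command (a sketch, assuming the generated .csv files and batch files sit in the directory where taos is started; the conversion tool in /code/Nasdaq2TDengine names them create_table_tdengine.txt and import_table_tdengine.txt):

```shell
taos
# inside the taos shell:
source create_table_tdengine.txt;
source import_table_tdengine.txt;
```
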
67 | Notice:
68 | ***
69 | The timestamp column 'date' must be at least accurate to the second.
70 | The code column should be surrounded by double quotes.
71 | Sample data is shown below:
72 | '2012-01-03 00:00:00',69.49,71.58,69.04,70.47,52.8236,107000,"LANC"
73 | '2012-01-04 00:00:00',68.92,69.65,68.59,69.08,52.39031,67100,"LANC"
74 | ***
--------------------------------------------------------------------------------
/code/Nasdaq.Data/StockData.cs:
--------------------------------------------------------------------------------
1 | using System;
2 | using System.IO;
3 | using System.Collections.Generic;
4 | using Newtonsoft.Json.Linq;
5 |
6 | namespace Nasdaq.Data
7 | {
8 | public class StockData
9 | {
10 | public string date { get; set; }
11 | public string code { get; set; }
12 | public string name { get; set; }
13 | public float opening_price { get; set; }
14 | public float highest_price { get; set; }
15 | public float lowest_price { get; set; }
16 | public float closing_price { get; set; }
17 | public float adjusted_closing_price { get; set; }
18 | public float trade_volume { get; set; }
19 |
20 |
21 | public string ToString(string dbType)
22 | {
23 | switch(dbType.ToLower())
24 | {
25 | case "influxdb":
26 | return string.Format("code=\"{0}\",opening_price=\"{1}\",highest_price=\"{2}\",lowest_price=\"{3}\",closing_price=\"{4}\",adjusted_closing_price=\"{5}\",trade_volume=\"{6}\" {7}",
27 | code, opening_price, highest_price, lowest_price, closing_price, adjusted_closing_price, trade_volume, DateTime2Int(DateTime.Parse(date)));
28 | // return string.Format("{0},{1},{2},{3},{4},{5},{6},{7}", code, opening_price, highest_price, lowest_price, closing_price, adjusted_closing_price, trade_volume, DateTime2Int(DateTime.Parse(date)));
29 | case "tdengine":
30 | return string.Format("\'{0} 00:00:00.000\',{2},{3},{4},{5},{6},{7},\"{1}\"",
31 | date, code, opening_price, highest_price, lowest_price,
32 | closing_price, adjusted_closing_price, trade_volume);
33 | case "dolphindb":
34 | return string.Format("{0} 00:00:00.000,{1},{2},{3},{4},{5},{6},{7}",
35 | date.Replace("-", "."), code, opening_price, highest_price, lowest_price,
36 | closing_price, adjusted_closing_price, trade_volume);
37 | default:
38 | return string.Empty;
39 | }
40 | }
41 |
42 |
43 | private static int DateTime2Int(System.DateTime time)
44 | {
45 | DateTime start = DateTime.Parse("1970-01-01").ToLocalTime();
46 | return (int)(time - start).TotalSeconds;
47 | }
48 | }
49 |
50 | public class Stock
51 | {
52 | public string JsonFilename { get; set; }
53 |
54 | public List<StockData> Data { get; private set; }
55 |
56 | public DateTime StartTime { get; private set; }
57 |
58 | public DateTime EndTime { get; private set; }
59 |
60 | public string Code { get; private set; }
61 |
62 | public string Name { get; private set; }
63 |
64 | private JObject jsonObj = null;
65 |
66 |
67 | public Stock(string jsonFilename)
68 | {
69 | JsonFilename = jsonFilename;
70 | }
71 |
72 | public void LoadData()
73 | {
74 | if (string.IsNullOrEmpty(JsonFilename) || !File.Exists(JsonFilename))
75 | return;
76 | using (StreamReader sr = new StreamReader(JsonFilename))
77 | {
78 | var content = sr.ReadToEnd();
79 | jsonObj = JObject.Parse(content);
80 | }
81 | if (null == jsonObj)
82 | return;
83 | StartTime = jsonObj.ContainsKey("oldest") ? DateTime.Parse(jsonObj["oldest"].ToString()) : DateTime.MinValue;
84 | EndTime = jsonObj.ContainsKey("latest") ? DateTime.Parse(jsonObj["latest"].ToString()) : DateTime.MinValue;
85 |
86 | if (StartTime != DateTime.MinValue && EndTime != DateTime.MinValue)
87 | {
88 | try
89 | {
90 | Data = new List<StockData>(jsonObj.Count);
91 | for (DateTime date = StartTime; date <= EndTime; date = date.AddDays(1))
92 | {
93 | var key = date.ToString("yyyy-MM-dd");
94 | if (jsonObj.ContainsKey(key))
95 | {
96 | Data.Add(jsonObj[key].ToObject<StockData>());
97 | }
98 | }
99 | if (Data.Count > 0)
100 | {
101 | Code = Data[0].code;
102 | Name = Data[0].name;
103 | }
104 | }
105 | catch (Exception ex)
106 | {
107 | Console.WriteLine(ex.ToString());
108 | }
109 | }
110 | }
111 |
112 | }
113 | }
114 |
--------------------------------------------------------------------------------
/docs/import_data_into_InfluxDB.md:
--------------------------------------------------------------------------------
1 | # Import data into InfluxDB database
2 |
3 | In this document you will learn how to import a CSV file into InfluxDB.
4 |
5 |
6 | ## Creating Database
7 |
8 | Before importing data into a database, you have to create one first. We need a database with a 3650-day retention policy.
9 |
10 | ```shell
11 | > influx
12 |
13 | CREATE DATABASE nasdaq WITH DURATION 3650d
14 | use nasdaq
15 |
16 | ```
17 |
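From the same influx shell, you can confirm that the database and its retention policy were created:

```shell
SHOW DATABASES
SHOW RETENTION POLICIES ON nasdaq
```
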
18 |
19 | ## Install Telegraf
20 |
21 | Telegraf is an agent written in Go for collecting metrics and writing them into InfluxDB or other possible outputs. This guide will get you up and running with Telegraf. It walks you through the download, installation, and configuration processes, and it shows how to use Telegraf to get data into InfluxDB. Telegraf can also be used to import CSV data into InfluxDB.
22 |
23 | Requirements:
24 |
25 | + Telegraf 1.8.0 or higher
26 | + InfluxDB 1.7.0 or higher
27 |
28 | Download and install Telegraf directly:
29 |
30 | ```shell
31 | wget https://dl.influxdata.com/telegraf/releases/telegraf_1.15.3-1_amd64.deb
32 | sudo dpkg -i telegraf_1.15.3-1_amd64.deb
33 | ```
34 |
35 | ## Configuring Telegraf
36 |
37 | Configuration file location by installation type
38 |
39 | Linux debian and RPM packages: /etc/telegraf/telegraf.conf
40 |
41 | Creating and editing the configuration file
42 |
43 | Before starting the Telegraf server you need to edit and/or create an initial configuration that specifies your desired inputs (where the metrics come from) and outputs (where the metrics go). There are several ways to create and edit the configuration file. Here, we’ll generate a configuration file and simultaneously specify the desired inputs with the -input-filter flag and the desired output with the -output-filter flag.
44 |
45 | Now need to generate a configure file for Telegraf:
46 |
47 | ```shell
48 | telegraf -sample-config -input-filter file -output-filter influxdb > file.conf
49 | ```
50 |
51 | Use nano open file.conf and do some edit action.
52 |
53 | ```shell
54 |
55 | nano file.conf
56 |
57 | ```
58 | At line 116, set database to "nasdaq":
59 |
60 | ```shell
61 | database = "nasdaq"
62 | ```
63 |
64 | In the Input Plugins section, add the lines below to describe the CSV file content:
65 |
66 | ```shell
67 |
68 | ###############################################################################
69 | # INPUT PLUGINS #
70 | ###############################################################################
71 |
72 | # Reload and gather from file[s] on telegraf's interval.
73 | [[inputs.file]]
74 | ## Files to parse each interval.
75 | ## These accept standard unix glob matching rules, but with the addition of
76 | ## ** as a "super asterisk". ie:
77 | ## /var/log/**.log -> recursively find all .log files in /var/log
78 | ## /var/log/*/*.log -> find all .log files with a parent dir in /var/log
79 | ## /var/log/apache.log -> only read the apache log file
80 | files = ["example.csv"]
81 |
82 | ## The dataformat to be read from files
83 | ## Each data format has its own unique set of configuration options, read
84 | ## more about them here:
85 | ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
86 | data_format = "csv"
87 | csv_header_row_count = 1
88 | csv_comment = "#"
89 | csv_measurement_column = "measurement"
90 | csv_tag_columns = ["code"]
91 | csv_timestamp_column = "date"
92 | csv_timestamp_format = "unix_ns"
93 |
94 | ```
95 |
96 | ## Starting Telegraf
97 |
98 | Put the CSV data file and file.conf together in the current folder, then start Telegraf with the specified configuration file.
99 |
100 | ```shell
101 |
102 | telegraf --config file.conf
103 |
104 | ```
105 |
106 | CSV sample as below:
107 |
108 | ```csv
109 |
110 | measurement,code,opening_price,highest_price,lowest_price,closing_price,adjusted_closing_price,trade_volume,date
111 | tb_nasdaq,LANC,69.49,71.58,69.04,70.47,52.8236,107000,1325520000
112 | tb_nasdaq,LANC,68.92,69.65,68.59,69.08,52.39031,67100,1325606400
113 | tb_nasdaq,LANC,69.17,69.3,67.9,68.72,52.580345,97300,1325692800
114 | tb_nasdaq,LANC,68.7,69.33,68.55,68.98,52.223064,81700,1325779200
115 | tb_nasdaq,LANC,69.35,69.51,68.55,69.03,52.71719,125700,1326038400
116 |
117 | ```
118 |
119 | ## Result
120 |
121 | Total time cost > 10 hours.
122 |
123 | Telegraf is not an efficient way to import data, even though it is the officially suggested approach.
124 |
125 | ## Importing data by HTTP service
126 |
127 | Since Telegraf is too slow, we have to find another way to import data. In this section, we will use an InfluxDB client to access the REST API with the line protocol to import data into InfluxDB.
128 |
129 | ## Install package
130 |
131 | ```shell
132 |
133 | Install-Package InfluxDB.LineProtocol
134 |
135 | ```
136 | The code searches all subdirectories that contain Nasdaq trading data for each company. In each subdirectory, it retrieves the trading data from the .json file. It then builds a set of LineProtocolPoint objects that are sent in a LineProtocolPayload object through an HTTP connection.
137 |
138 | InfluxDB receives all of the points and writes them to the nasdaq database.
139 |
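Each LineProtocolPoint serializes to one line of InfluxDB line protocol. For the sample rows shown earlier, a written record would look roughly like this (field order may differ; the timestamp is in nanoseconds):

```
tb_nasdaq,code=LANC opening_price=69.49,highest_price=71.58,lowest_price=69.04,closing_price=70.47,adjusted_closing_price=52.8236,trade_volume=107000 1325520000000000000
```
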
140 | ```csharp
141 |
142 | static void Main(string[] args)
143 | {
144 | var path = "/Users/micl/Documents/Nasdaq/us/nasdaq/";
145 | var database = "nasdaq";
146 |
147 | LineProtocolClient client = new LineProtocolClient(new Uri("http://{server}:8086"), database);
148 |
149 | string[] subdirectiores = System.IO.Directory.GetDirectories(path);
150 | int subFolderCount = subdirectiores.Length;
151 | Console.WriteLine("Total found {0} sub folders", subFolderCount);
152 | int currentFolder = 1;
153 | DateTime start = DateTime.Now;
154 | foreach (string folder in subdirectiores)
155 | {
156 | Console.WriteLine("Now handling {0} / {1} folder, Symbol = {2}",
157 | currentFolder, subFolderCount, folder.Substring(folder.LastIndexOf("/") + 1, folder.Length - folder.LastIndexOf("/") - 1));
158 | string[] files = Directory.GetFiles(folder, "*.json");
159 | foreach (string f in files)
160 | {
161 | Console.WriteLine("Now reading file : {0}", f);
162 | var stock = new Stock(f);
163 | stock.LoadData();
164 | Console.WriteLine("File loaded, total {0} items", stock.Data.Count);
165 |
166 | LineProtocolPayload payload = new LineProtocolPayload();
167 | foreach (StockData d in stock.Data)
168 | {
169 | LineProtocolPoint point = new LineProtocolPoint(
170 | "tb_nasdaq",
171 | new Dictionary<string, object>
172 | {
173 | { "opening_price", d.opening_price},
174 | { "highest_price", d.highest_price},
175 | { "lowest_price", d.lowest_price},
176 | { "closing_price", d.closing_price},
177 | { "adjusted_closing_price", d.adjusted_closing_price},
178 | { "trade_volume", d.trade_volume},
179 | },
180 | new Dictionary<string, string>
181 | {
182 | { "code", d.code }
183 | },
184 | DateTime.Parse(d.date).ToUniversalTime());
185 | payload.Add(point);
186 |
187 | }
188 | var result = client.WriteAsync(payload).GetAwaiter().GetResult();
189 | if (!result.Success)
190 | {
191 | Console.WriteLine(result.ErrorMessage);
192 | }
193 | }
194 | currentFolder++;
195 | }
196 |
197 | DateTime end = DateTime.Now;
198 | Console.WriteLine("Total spent: {0} seconds", (end - start).TotalSeconds);
199 | }
200 |
201 | ```
202 | In this way, the time spent was 2068.861 seconds (about 35 minutes).
203 |
204 |
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | # Time Series Database Selection
2 |
3 | ## Executive Summary
4 |
5 | We are looking for a fast, lightweight, easy-to-deploy time series database for future use. After reviewing the [db-engines.com](https://db-engines.com/en/ranking/time+series+dbms) ranking, communities, and GitHub, we selected four time series databases for this assessment. They are:
6 |
7 | | ||||
8 | |:-:|:-:|:-:|:-:|
9 | |[DolphinDB](https://www.dolphindb.com/)|[InfluxDB](https://www.influxdata.com/)|[KDB+](https://www.kx.com/)|[TDEngine](http://www.taosdata.com/en/)|
10 |
11 |
12 | We noticed that there are other options such as OpenTSDB and TimescaleDB. Most of them are based on a relational database, so they are not lightweight or easy to deploy. To test these databases, we downloaded the Nasdaq daily stock trading summary (period: 2011.01.01~2020.09, 3065+ companies) as testing data. We evaluate each database on data importing, operation, query time, and other criteria.
13 |
14 | Besides the data tests, we also consider licensing type, architecture, ecosystem, and support.
15 |
16 | ### Conclusion
17 |
18 | Having completed the whole evaluation, we can state the following facts:
19 |
20 | + All of the databases support Docker and Kubernetes deployment.
21 | + DolphinDB has an overwhelming performance advantage among these four databases.
22 | + InfluxDB is schema free.
23 | + TDEngine and DolphinDB are schema-defined databases.
24 | + DolphinDB has the best performance, but it is under a commercial license.
25 |
26 |
27 | Summarizing the above information, we recommend the **** database as the best choice.
28 |
29 | ## Scoping
30 |
31 | The testing focuses only on these four time series databases: DolphinDB, InfluxDB, KDB+, and TDEngine. The tests run on a specific virtual machine environment with the same operating system, Ubuntu 18.04.
32 |
33 | ### Environment
34 |
35 | Regarding the testing resources, we selected a **Dsv3**-series virtual machine. Details as below:
36 |
37 | |Series|vCPU|Memory (GiB)|Temporary Storage (GiB)|Max data disks|Max cached IOPS/MBps (cache size in GiB)|
38 | |:-|:-|:-|:-|:-|:-|
39 | |Standard_D4s_v3|4|16|32|8|8000/64 (100)|
40 |
41 | This kind of virtual machine has a good balance of Azure quota and pricing, and Azure has deployed a large number of these resources. All VMs run on Premium SSD for the best disk performance.
42 |
43 | Operating System is **Ubuntu 18.04**.
44 |
45 | ### Version of Time Series database
46 |
47 | |Database|Version|Comments|
48 | |:-|:-|:-|
49 | |DolphinDB|1.00.24 64-bit||
50 | |InfluxDB|1.8.2 64-bit|Production version, although 2.0 is in beta.|
51 | |KDB+||To avoid legal issues, we only use the 32-bit version.|
52 | |TDEngine|2.0.3.0 64-bit||
53 |
54 |
55 |
56 | ## Data for Testing
57 |
58 | The testing data comes from [Yahoo finance](https://finance.yahoo.com/) via the Python library [yfinance](https://pypi.org/project/yfinance/). The data is daily information covering 3086 stock symbols, spanning 2011.01.03~2020.09.02. The total number of lines is **4467799**.
59 |
60 | Data contains below fields:
61 |
62 | |Fields|Meaning|
63 | |:-|:-|
64 | |date|The trading date.|
65 | |opening_price|The price at which a security first trades upon the opening of an exchange on a trading day.|
66 | |highest_price|Today's high is the highest price at which a stock traded during the course of the trading day.|
67 | |lowest_price|Today's low is the lowest price at which a stock traded during the course of the trading day.|
68 | |closing_price|It refers to the last price at which a stock trades during a regular trading day.|
69 | |adjusted_closing_price|The price that is quoted at the end of the trading day is the price of the last lot of stock that was traded for the day. This is referred to as the stock's closing price.|
70 | |trade_volume|It is a measure of how much of a given financial asset has traded in a period of time. For stocks, volume is measured in the number of shares traded and, for futures and options, it is based on how many contracts have changed hands.|
71 |
72 | One of sample:
73 |
74 | ```json
75 | "2012-01-05": {
76 | "date": "2012-01-05",
77 | "code": "AAPL",
78 | "name": "Apple Inc. - Common Stock",
79 | "opening_price": 14.929642677307129,
80 | "highest_price": 14.948214530944824,
81 | "lowest_price": 14.738214492797852,
82 | "closing_price": 14.819643020629883,
83 | "adjusted_closing_price": 12.90129280090332,
84 | "trade_volume": 271269600.0
85 | }
86 | ```
87 | The total number of items is **5205952**. Different databases require different data formats, so the data has to be converted into a compatible format before importing.
88 |
89 |
90 | ## Licensing Considerations
91 |
92 | **DolphinDB** has two kinds of licenses: Community and Enterprise. The Community Edition has many limitations:
93 |
94 | + Not for commercial use
95 | + Up to 2 nodes with 2 cores and 4GB RAM per node
96 |
97 | The Community Edition is not suitable for huge datasets, because DolphinDB needs extra memory during querying and calculation for optimization; the memory limitation becomes a bottleneck when operating on huge data.
98 |
99 | Enterprise Edition:
100 | + Payment required, price unknown; you have to contact sales to get pricing.
101 | + No resource limitation.
102 |
103 | **InfluxDB** has three editions: InfluxDB Cloud, InfluxDB open source, and InfluxDB Enterprise. InfluxDB Cloud wraps InfluxDB Enterprise as a PaaS service for users. The Enterprise edition has the full feature set. The open source edition no longer offers the cluster feature since v0.13.
104 |
105 | Since v1.0, clustering has been a separate feature. Some community contributors created a clustering feature and merged it into a fork, but it does not seem well tested.
106 |
107 | The open source version is released under the [MIT License](https://github.com/influxdata/influxdb/blob/master/LICENSE). It means that anyone can legally copy, modify, merge, publish, distribute, sublicense, and/or sell copies.
108 |
109 | **KDB+** is commercial software in the fintech field. KDB+ has 7 editions: Academic, 32-bit Personal, 64-bit Personal, Non-Expiring Cores, Subscription Cores, On Demand, and OEM. Only the Academic, 32-bit Personal, and 64-bit Personal editions are free. The 64-bit Personal edition requires an always-on connection to KDB+ to submit information and keep the license alive. None of the free editions are allowed in business scenarios.
110 |
111 | **TDEngine** has three editions: Community Edition, Enterprise Edition, and Cloud Edition. The Community Edition is free without commercial usage limitations and is under the [GNU Affero General Public License v3.0](https://github.com/taosdata/TDengine/blob/develop/LICENSE), which gives everyone legal permission to copy, distribute and/or modify the software.
112 |
113 | ### Summary
114 |
115 | Here is ranking of software license agreement friendliness:
116 |
117 | 1. InfluxDB (you can do anything, and you can decide whether to open source your changes).
118 | 2. TDEngine (you can do anything, but you have to open source your changes).
119 | 3. DolphinDB & KDB+ (commercial software, not open source; you have to buy a license).
120 |
121 | DolphinDB and KDB+ do not allow their free versions to run in business scenarios. KDB+ collects usage data to keep the license alive.
122 |
123 |
124 | ## Architecture Overview
125 |
126 | Since DolphinDB and KDB+ are not open source software, they do not expose any architecture information. This section will only discuss InfluxDB and TDengine.
127 |
128 | ### InfluxDB
129 |
130 | 
131 |
132 | InfluxDB can be divided into a data persistence layer, an internal component layer, and an external service layer. Let's analyze it from the lowest layer upward. In terms of storage, InfluxDB has two types of storage: META storage and data storage. First, let's look at the directory structure on disk.
133 |
134 | META storage generates a meta.db file in the data directory, which mainly stores InfluxDB's meta information, such as the names of the created databases and their retention policies. Data storage is divided into two parts: persistent data and the write-ahead log (WAL). Anyone familiar with storage systems will recognize this pattern of synchronous write-ahead logging with asynchronous flushing; it solves the problems of write reliability and of keeping distributed data storage in sync. The data in the data and wal directories is stored through the TSM engine. Under the data directory, the second-level directory is the database name, the third-level directory is the retention policy name, the fourth-level directory is the shard id, and the next level holds the actual data files. You will also notice that the wal directory structure mirrors the data directory exactly, including file names. This is because the smallest logical storage unit in the TSM engine is the shard, and each shard builds both its data and its WAL according to its own configuration, so they share the same directory layout.
135 |
136 | The internal component layer encapsulates the underlying components of InfluxDB. For example, every scenario that needs to manipulate meta information references MetaClient, scenarios that write data reference PointsWriter by default, and scenarios that query data reference QueryExecutor. These internal components are referenced by the upper Service layer, so we won't go into their details here.
137 |
138 | ### TDengine
139 |
140 | There are two main modules in TDengine server as shown in Picture 1: Management Module (MGMT) and Data Module(DNODE). The whole TDengine architecture also includes a TDengine Client Module.
141 |
142 | 
143 |
144 | **MGMT Module**
145 |
146 | The MGMT module deals with the storage and querying on metadata, which includes information about users, databases, and tables. Applications will connect to the MGMT module at first when connecting the TDengine server. When creating/dropping databases/tables, The request is sent to the MGMT module at first to create/delete metadata. Then the MGMT module will send requests to the data module to allocate/free resources required. In the case of writing or querying, applications still need to visit the MGMT module to get meta data, according to which, then access the DNODE module.
147 |
148 | **DNODE Module**
149 |
150 | The DNODE module is responsible for storing and querying data. For the sake of future scaling and high-efficient resource usage, TDengine applies virtualization on resources it uses. TDengine introduces the concept of a virtual node (vnode), which is the unit of storage, resource allocation and data replication (enterprise edition). As is shown in Picture 2, TDengine treats each data node as an aggregation of vnodes.
151 |
152 | When a DB is created, the system will allocate a vnode. Each vnode contains multiple tables, but a table belongs to only one vnode. Each DB has one or more vnodes, but one vnode belongs to only one DB. Each vnode contains all the data in a set of tables. Vnodes have their own cache and directory to store data. Resources between different vnodes are exclusive with each other, no matter cache or file directory. However, resources in the same vnode are shared between all the tables in it. Through virtualization, TDengine can distribute resources reasonably to each vnode and improve resource usage and concurrency. The number of vnodes on a dnode is configurable according to its hardware resources.
153 |
154 | **Writing Process**
155 |
156 | TDengine uses the Write-Ahead Log (WAL) strategy to assure data security and integrity. Data received from the client is written to the commit log first. When TDengine recovers from crashes caused by power loss or other situations, the commit log is used to recover data. After writing to the commit log, data will be written to the corresponding vnode cache, and then an acknowledgment is sent to the application. There are two mechanisms that can flush data in the cache to disk for persistent storage:
157 |
158 | 
159 |
160 | Flush driven by timer: There is a backend timer which flushes data in cache periodically to disks. The period is configurable via parameter commitTime in system configuration file taos.cfg.
161 | Flush driven by data: Data in the cache is also flushed to disks when the left buffer size is below a threshold. Flush driven by data can reset the timer of flush driven by the timer.
162 |
163 |
164 | ## Performance Considerations
165 |
166 |
167 | ### Data Import Performance
168 |
169 | In this part of the evaluation, we import all of the Nasdaq data into one data table and record the time spent. Since each database requires its own data format, the downloaded data has to be converted to fit the database. We do not count the time for data preparation; the measurement only covers the data import process.
170 |
171 | The result: DolphinDB holds the best record, spending about 2.0 s to import all of the data.
172 | TDEngine spent almost 20 seconds to complete the import.
173 |
174 | The poorest one is InfluxDB with Telegraf, at more than 10 hours.
175 |
176 |
177 | For importing details, please see the documents below:
178 |
179 | + [Import data into DolphinDB](./import_data_into_DolphinDB.md)
180 | + [Import data into InfluxDB](./import_data_into_InfluxDB.md)
181 | + [Import data into KDB+]()
182 | + [Import data into TDEngine](./import_data_into_TDEngine.md)
183 |
184 | ### Query Performance
185 |
186 | By testing several SQL statements, we obtained the following ranking for query performance:
187 |
188 | 1. DolphinDB
189 | 2. TDEngine
190 | 3. InfluxDB
191 |
192 | DolphinDB always completes queries in several milliseconds. TDEngine usually takes 100 times longer than DolphinDB to complete a query. InfluxDB is the slowest.
193 |
194 | For details, please view: [Query Performance](./QueryPerformance.md).
195 |
196 | ## Support and Community
197 |
198 | + [DolphinDB documentation](https://www.dolphindb.cn/cn/help/index.html)
199 | + [KDB+](https://code.kx.com/q/)
200 | + [InfluxDB](https://www.docs.influxdata.com/influxdb/v1.8)
201 | + [TDEngine](https://www.taosdata.com/cn/documentation/)
202 |
203 | Notes:
204 |
205 | ***
206 |
207 | + InfluxDB has a poor interactive experience. For example, if you select a wrong column name that is not in the measurement, it won't tell you anything about it; it just returns the prompt without any message.
208 | + There are a lot of differences between the English and Chinese versions of the DolphinDB documents.
209 | ***
210 |
--------------------------------------------------------------------------------