# 📝MSDS Dataset

## Directory

- [Usage & Download](#usage&download)
- [Description](#description)
- [Collection](#collection)
- [Responsible Use](#responsible-use)
- [Experimental Result](#experimental-result)
- [License](#license)
- [Directory Format](#directory-format)
- [Copyright](#copyright)

## 🖥️Usage & Download

- The MSDS dataset may be used only for non-commercial research purposes. Scholars or organizations who want to use the MSDS dataset should first fill in the [Application Form](./application-form/Application-Form-for-Using-MSDS.docx), sign the [Legal Commitment](./application-form/Legal-Commitment.docx), and email both to us. When submitting the application form, please list or attach one or two of your publications from the last 6 years to show that you (or your team) do research in fields related to handwriting verification, handwriting analysis and recognition, document image processing, and so on.
- We will send you the download link and the decompression password after your application has been received and approved.
- All users must follow all conditions of use; otherwise, the authorization will be revoked.

## 📖Description

The MSDS dataset is a handwriting verification benchmark consisting of two subsets: MSDS-ChS (Chinese signatures) and MSDS-TDS (Token Digit Strings). Each subset contains 16080 samples from 402 users, with 20 genuine samples and 20 skilled forgeries per user. The details are presented below:

| Subset   | Content            | Online | Offline | User | Genuine Sample              | Skilled Forgery             | Features        |
| -------- | ------------------ | :----: | :-----: | ---- | --------------------------- | --------------------------- | --------------- |
| MSDS-ChS | Chinese Signature  |   ✓    |    ✓    | 402  | $402\times(10 + 10) = 8040$ | $402\times(10 + 10) = 8040$ | $X,Y,P,T,I_r,U$ |
| MSDS-TDS | Token Digit String |   ✓    |    ✓    | 402  | $402\times(10 + 10) = 8040$ | $402\times(10 + 10) = 8040$ | $X,Y,P,T,I_r,U$ |

$X,Y,P,T,I_r,U$ respectively denote the $x, y$ coordinates, pressure, timestamps, rendered static images, and pen-up/pen-down information. The pen-up/pen-down information is represented by one of four codes: 0 indicates a point that is neither a pen-down nor a pen-up point, 1 indicates a pen-down point, 2 indicates a pen-up point, and 3 indicates an isolated point that is both a pen-down and a pen-up point.

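The pen-up/pen-down codes above can be used to segment a recording into strokes. The following is a minimal sketch, not the authors' tooling: the names `PenState` and `split_strokes` are hypothetical, and the function works on a plain sequence of $U$ codes because the column layout of the `.txt` files is not described here.

```python
from enum import IntEnum
from typing import List, Sequence


class PenState(IntEnum):
    """Pen-up/pen-down codes as described for the U feature."""
    NONE = 0      # neither a pen-down nor a pen-up point
    PEN_DOWN = 1  # first point of a stroke
    PEN_UP = 2    # last point of a stroke
    ISOLATED = 3  # both pen-down and pen-up: a single-point stroke


def split_strokes(u_codes: Sequence[int]) -> List[List[int]]:
    """Group point indices into strokes according to their U codes.

    Assumption: a stroke runs from a pen-down point to the next pen-up
    point. Points that never reach a pen-up are discarded.
    """
    strokes: List[List[int]] = []
    current: List[int] = []
    for i, code in enumerate(u_codes):
        state = PenState(code)  # raises ValueError for an unknown code
        if state is PenState.ISOLATED:
            strokes.append([i])
            current = []
        elif state is PenState.PEN_DOWN:
            current = [i]
        elif state is PenState.PEN_UP:
            current.append(i)
            strokes.append(current)
            current = []
        else:
            current.append(i)
    return strokes
```

For example, `split_strokes([1, 0, 0, 2, 3, 1, 2])` groups the seven points into a four-point stroke, an isolated point, and a two-point stroke.
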
The contributions of MSDS include:

- [x] MSDS-ChS is the largest publicly available Chinese signature dataset for signature verification, at least eight times larger than existing ones.
- [x] MSDS-TDS is the first dataset that covers Token Digit Strings, which offer a new and effective biometric for handwriting verification.
- [x] The experimental results indicate that the Token Digit String is a more effective biometric than the Chinese signature, which is an inspiring and promising finding.

## 🧬Collection

The MSDS data were acquired with two types of Android tablets, each with a specific stylus. We developed an Android app specifically for the collection; its user interface is shown below. Users wrote directly on the tablets with the styluses, and the app automatically recorded the resulting information.



The data acquisition process was divided into two separate sessions with a time interval of at least 21 days. In each session, users wrote according to the same procedure: 10 genuine signatures → 10 genuine phone numbers → 10 forged signatures → 10 forged phone numbers.

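As a quick check of the procedure above, the per-session counts can be multiplied out. This is only a sketch of the arithmetic; the constant names are illustrative, and it assumes every one of the 402 users completed both sessions in full.

```python
USERS = 402
SESSIONS = 2

# Per-session writing order, as (label, count) pairs.
SESSION_PROCEDURE = [
    ("genuine signatures", 10),
    ("genuine phone numbers", 10),
    ("forged signatures", 10),
    ("forged phone numbers", 10),
]

# Samples written by one user in one session.
per_user_per_session = sum(count for _, count in SESSION_PROCEDURE)

# Samples of a single kind (e.g. genuine signatures) over all users and sessions.
per_kind_total = USERS * SESSIONS * 10

# All samples over all users, sessions, and kinds.
total_samples = USERS * SESSIONS * per_user_per_session

print(per_user_per_session, per_kind_total, total_samples)
```

The per-kind total of 8040 agrees with the genuine and forgery counts listed for each subset, and the overall total of 32160 is twice the 16080 samples per subset.
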
## ⚒️Responsible Use

MSDS was collected for handwriting-based identity verification. Specifically, the MSDS-ChS subset can be used for online/offline Chinese signature verification, and the MSDS-TDS subset is intended for online/offline identity verification with Token Digit Strings. In addition, MSDS can be used for writer identification.

## 🔭Experimental Result



Experimental results show that all models perform better on MSDS-TDS than on MSDS-ChS. This finding is inspiring: because the two subsets were collected simultaneously, it indicates that TDS verification is more accurate than Chinese signature verification. Token Digit Strings could therefore be a more effective biometric trait than Chinese signatures for high-accuracy online verification.

## 📄License

MSDS should be used and distributed under the [Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License](https://creativecommons.org/licenses/by-nc-nd/4.0/) for non-commercial research purposes.

## 📁Directory Format

The dataset is organized in the following directory format:

77 | ```bash
78 | ├─MSDS
79 | │ ├─MSDS-ChS
80 | │ │ ├─session1
81 | │ │ │ ├─0
82 | │ │ │ │ ├─images
83 | │ │ │ │ │ ├─f_0_0.png
84 | │ │ │ │ │ ├─f_0_1.png
85 | │ │ │ │ │ ├─...
86 | │ │ │ │ │ ├─g_0_0.png
87 | │ │ │ │ │ ├─g_0_1.png
88 | │ │ │ │ │ └─...
89 | │ │ │ │ └─series
90 | │ │ │ │ │ ├─f_0_0.txt
91 | │ │ │ │ │ ├─f_0_1.txt
92 | │ │ │ │ │ ├─...
93 | │ │ │ │ │ ├─g_0_0.txt
94 | │ │ │ │ │ ├─g_0_1.txt
95 | │ │ │ │ │ └─...
96 | │ │ │ ├─1
97 | │ │ │ │ ├─images
98 | │ │ │ │ │ ├─f_0_0.png
99 | │ │ │ │ │ ├─f_0_1.png
100 | │ │ │ │ │ ├─...
101 | │ │ │ │ │ ├─g_0_0.png
102 | │ │ │ │ │ ├─g_0_1.png
103 | │ │ │ │ │ └─...
104 | │ │ │ │ └─series
105 | │ │ │ │ │ ├─f_0_0.txt
106 | │ │ │ │ │ ├─f_0_1.txt
107 | │ │ │ │ │ ├─...
108 | │ │ │ │ │ ├─g_0_0.txt
109 | │ │ │ │ │ ├─g_0_1.txt
110 | │ │ │ │ │ └─...
111 | │ │ │ └─...
112 | │ │ └─session2
113 | │ │ ├─...
114 | │ └─MSDS-TDS
115 | │ │ ├─session1
116 | │ │ │ ├─0
117 | │ │ │ │ ├─images
118 | │ │ │ │ │ ├─f_0_0.png
119 | │ │ │ │ │ ├─f_0_1.png
120 | │ │ │ │ │ ├─...
121 | │ │ │ │ │ ├─g_0_0.png
122 | │ │ │ │ │ ├─g_0_1.png
123 | │ │ │ │ │ └─...
124 | │ │ │ │ └─series
125 | │ │ │ │ │ ├─f_0_0.txt
126 | │ │ │ │ │ ├─f_0_1.txt
127 | │ │ │ │ │ ├─...
128 | │ │ │ │ │ ├─g_0_0.txt
129 | │ │ │ │ │ ├─g_0_1.txt
130 | │ │ │ │ │ └─...
131 | │ │ │ ├─1
132 | │ │ │ │ ├─images
133 | │ │ │ │ │ ├─f_0_0.png
134 | │ │ │ │ │ ├─f_0_1.png
135 | │ │ │ │ │ ├─...
136 | │ │ │ │ │ ├─g_0_0.png
137 | │ │ │ │ │ ├─g_0_1.png
138 | │ │ │ │ │ └─...
139 | │ │ │ │ └─series
140 | │ │ │ │ │ ├─f_0_0.txt
141 | │ │ │ │ │ ├─f_0_1.txt
142 | │ │ │ │ │ ├─...
143 | │ │ │ │ │ ├─g_0_0.txt
144 | │ │ │ │ │ ├─g_0_1.txt
145 | │ │ │ │ │ └─...
146 | │ │ │ └─...
147 | │ └─session2
148 | │ │ ├─...
149 | ```

- The `MSDS-ChS` folder contains Chinese signatures and the `MSDS-TDS` folder contains Token Digit Strings (TDS).
- Each of them contains the data from two sessions, stored in `session1` and `session2`.
- Users are numbered from `0` to `401`. For each user, the online dynamic time series are provided in `series` as `.txt` files and the offline static images in `images` as `.png` files.
- Every file name follows the same format: `flag_user_index`.
  - `flag` is `f` or `g`: `f` marks a skilled forgery and `g` a genuine sample.
  - `user` is the number of the user the file belongs to.
  - `index` is the number of the file (`.txt` or `.png`) in the current folder.
  - For example, `f_0_0.txt` is the first file (time series) among the skilled forgeries of user `0`, and `g_5_6.png` is the seventh file (image) among the genuine samples of user `5`.

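The naming rule and folder layout above can be turned into two small helpers. This is a hedged sketch rather than an official loader: `SampleName`, `parse_sample_name`, and `sample_path` are hypothetical names, and the path builder assumes that the user number in a file name matches the number of its user folder, as the `flag_user_index` rule states.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# flag_user_index followed by the extension used in `series` or `images`.
_NAME_RE = re.compile(
    r"^(?P<flag>[fg])_(?P<user>\d+)_(?P<index>\d+)\.(?P<ext>txt|png)$"
)


@dataclass(frozen=True)
class SampleName:
    flag: str   # "f" = skilled forgery, "g" = genuine sample
    user: int   # user number
    index: int  # zero-based file number within the folder
    ext: str    # "txt" (time series) or "png" (image)

    @property
    def is_genuine(self) -> bool:
        return self.flag == "g"


def parse_sample_name(filename: str) -> SampleName:
    """Split a file name such as 'g_5_6.png' into its parts."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a flag_user_index file name: {filename!r}")
    return SampleName(
        match["flag"], int(match["user"]), int(match["index"]), match["ext"]
    )


def sample_path(root: str, subset: str, session: int, user: int,
                flag: str, index: int, kind: str) -> PurePosixPath:
    """Build the relative path of one sample; `kind` is 'series' or 'images'."""
    ext = {"series": "txt", "images": "png"}[kind]
    return (PurePosixPath(root) / subset / f"session{session}" / str(user)
            / kind / f"{flag}_{user}_{index}.{ext}")
```

For instance, `parse_sample_name("g_5_6.png")` yields a genuine sample of user 5 with index 6, and `sample_path("MSDS", "MSDS-ChS", 1, 0, "f", 0, "series")` gives `MSDS/MSDS-ChS/session1/0/series/f_0_0.txt`.
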
## :bookmark_tabs:Citation

```
@inproceedings{zhang2022msds,
  author = {Zhang, Peirong and Jiang, Jiajia and Liu, Yuliang and Jin, Lianwen},
  booktitle = {{Advances in Neural Information Processing Systems (NeurIPS)}},
  pages = {36507--36519},
  title = {{MSDS: A Large-Scale Chinese Signature and Token Digit String Dataset for Handwriting Verification}},
  volume = {35},
  year = {2022}
}
```

## :palm_tree:Copyright

For commercial use, please contact Prof. Lianwen Jin: eelwjin@scut.edu.cn.

Copyright 2022-2025, [Deep Learning and Vision Computing Lab](http://www.dlvc-lab.net), South China University of Technology.