├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
├── eval.log
├── lzh_kyoto-ud-dev.conllu
├── lzh_kyoto-ud-test.conllu
├── lzh_kyoto-ud-train.conllu
├── not-to-release
│   ├── deprel.lzh
│   └── feat_val.lzh
└── stats.xml
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 |
3 | Please do not make pull requests against master; any such pull requests will be
4 | closed. Instead, make them against the dev branch.
5 |
6 | For full details on the branch policy see
7 | [here](http://universaldependencies.org/release_checklist.html#repository-branches).
8 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | The treebank is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
2 |
3 | The complete license text is available at:
4 | http://creativecommons.org/licenses/by-sa/4.0/legalcode
5 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Summary
2 |
3 | Classical Chinese Universal Dependencies Treebank, annotated and converted by the Institute for Research in Humanities, Kyoto University.
4 |
5 | # Introduction
6 |
7 | This treebank is drawn from the full text of [論語](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR1h0004), [孟子](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR1h0001), [禮記](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR1d0052), [十八史略](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/18shilue), [楚辭](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR4a0001), [戰國策](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR2e0003), and others. Classical Chinese was written without spaces or punctuation between words or sentences, so the treebank files contain neither. The texts are distributed across the files as follows:
8 |
9 | * lzh_kyoto-ud-test.conllu
10 | - 學而篇第一 為政篇第二 and 八佾篇第三 from 論語
11 | - 梁惠王上 and 梁惠王下 from 孟子
12 | - 中庸 from 禮記
13 | - 春秋戰國 from 十八史略
14 | - 離騷 from 楚辭
15 | - [摩訶般若波羅蜜大明呪經](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR6c0127)
16 | - 東周 from 戰國策
17 |
18 | * lzh_kyoto-ud-dev.conllu
19 | - 顏淵篇第十二 子路篇第十三 and 憲問篇第十四 from 論語
20 | - 告子上 and 告子下 from 孟子
21 | - 大學 from 禮記
22 | - 唐 from 十八史略
23 | - 遠遊 from 楚辭
24 | - [金剛般若波羅蜜經](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR6c0023)
25 | - 西周 from 戰國策
26 |
27 | * lzh_kyoto-ud-train.conllu
28 | - 論語 (except for 學而篇第一 為政篇第二 八佾篇第三 顏淵篇第十二 子路篇第十三 憲問篇第十四)
29 | - 孟子 (except for 梁惠王上 梁惠王下 告子上 告子下)
30 | - 禮記 (except for 中庸 大學)
31 | - 十八史略 (except for 春秋戰國 唐)
32 | - 九歌 天問 九章 卜居 漁父 九辯 and 招魂 from 楚辭
33 | - [唐詩三百首](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR4h0169)
34 | - [佛說阿彌陀經](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR6f0082)
35 | - 戰國策 (except for 東周 西周)
36 |
37 | # References
38 |
39 | * Koichi Yasuoka: [Universal Dependencies Treebank of the Four Books in Classical Chinese](http://hdl.handle.net/2433/245217), DADH2019: 10th International Conference of Digital Archives and Digital Humanities (December 2019), pp.20-28.
40 | * Koichi Yasuoka, Christian Wittern, Tomohiko Morioka, Takumi Ikeda, Naoki Yamazaki, Yoshihiro Nikaido, Shingo Suzuki, Shigeki Moro, Kazunori Fujita: [Designing Universal Dependencies for Classical Chinese and Its Application](http://id.nii.ac.jp/1001/00216242/), Journal of Information Processing Society of Japan, Vol.63, No.2 (February 2022), pp.355-363.
41 |
42 | # Changelog
43 |
44 | * 2025-05-15 v2.16
45 | * bug fix and Gloss addition.
46 |
47 | * 2024-11-15 v2.15
48 | * bug fix and Gloss addition.
49 |
50 | * 2024-05-15 v2.14
51 | * bug fix and Gloss addition.
52 |
53 | * 2023-11-15 v2.13
54 | * bug fix for `fixed`.
55 |
56 | * 2023-05-15 v2.12
57 | * 戰國策 added.
58 |
59 | * 2022-11-15 v2.11
60 | * `nsubj:outer` and `csubj:outer` supported.
61 | * 唐詩三百首 added.
62 |
63 | * 2022-05-15 v2.10
64 | * 摩訶般若波羅蜜大明呪經 added.
65 | * 金剛般若波羅蜜經 added.
66 | * 佛說阿彌陀經 added.
67 |
68 | * 2021-11-15 v2.9
69 | * 9 poems from 楚辭 added.
70 |
71 | * 2021-05-15 v2.8
72 | * 2 eras from 十八史略 added.
73 |
74 | * 2020-11-15 v2.7
75 | * 7 volumes from 禮記 added.
76 | * 17 eras from 十八史略 added.
77 |
78 | * 2020-05-15 v2.6
79 | * 19 volumes from 禮記 added.
80 |
81 | * 2019-11-15 v2.5
82 | * 22 volumes from 禮記 added.
83 |
84 | * 2019-05-15 v2.4
85 | * Initial release in Universal Dependencies.
86 |
87 |
88 | === Machine-readable metadata (DO NOT REMOVE!) ================================
89 | Data available since: UD v2.4
90 | License: CC BY-SA 4.0
91 | Includes text: yes
92 | Genre: nonfiction poetry
93 | Lemmas: converted with corrections
94 | UPOS: converted with corrections
95 | XPOS: converted with corrections
96 | Features: converted with corrections
97 | Relations: manual native
98 | Contributors: Yasuoka, Koichi; Wittern, Christian; Morioka, Tomohiko; Ikeda, Takumi; Yamazaki, Naoki; Nikaido, Yoshihiro; Suzuki, Shingo; Moro, Shigeki; Li, Yuan; Shirasu, Hiroyuki; Fujita, Kazunori
99 | Contributing: elsewhere
100 | Contact: yasuoka@kanji.zinbun.kyoto-u.ac.jp
101 | ===============================================================================
102 |
103 |
--------------------------------------------------------------------------------
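The README assigns each source text to one of three CoNLL-U files. As a rough illustration of how such a file can be inspected, the following sketch counts sentences and syntactic words using only the Python standard library. The function name `count_conllu` is hypothetical, and `SAMPLE` is a made-up two-sentence fragment in CoNLL-U layout (ten tab-separated columns), not text copied from the treebank.

```python
import io

# Made-up fragment in CoNLL-U layout; the annotation is illustrative only.
SAMPLE = (
    "# sent_id = example_1\n"
    "1\t子\t子\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\t曰\t曰\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = example_2\n"
    "1\t學\t學\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)


def count_conllu(stream):
    """Return (sentences, words) for CoNLL-U text read line by line."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            # A blank line terminates the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# sent_id = ...".
            continue
        token_id = line.split("\t")[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (1-2) and empty nodes (1.1).
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
    return sentences, words


print(count_conllu(io.StringIO(SAMPLE)))
```

To run it on a real file, pass an open handle instead of the `StringIO` wrapper, for example `open("lzh_kyoto-ud-dev.conllu", encoding="utf-8")`.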
/eval.log:
--------------------------------------------------------------------------------
1 | Running the following version of UD tools:
2 | commit ecbbdff44b15c9b6de4a691e3499c1286459ab2e
3 | Author: Dan Zeman
4 | Date: Fri May 9 21:07:42 2025 +0200
5 | Evaluating the following revision of UD_Classical_Chinese-Kyoto:
6 | commit c32426535f4863cb4b8eae9a59c68fc6b0225f42
7 | Merge: dbf956f 779e363
8 | Author: Dan Zeman
9 | Size: counted 433168 of 433168 words (nodes).
10 | Size: min(0, log((N/1000)**2)) = 12.1422512870313.
11 | Size: maximum value 13.815511 is for 1000000 words or more.
12 | Split: Found more than 10000 training words.
13 | Split: Found at least 10000 development words.
14 | Split: Found at least 10000 test words.
15 | Lemmas: source of annotation (from README) factor is 0.9.
16 | Universal POS tags: 14 out of 17 found in the corpus.
17 | Universal POS tags: source of annotation (from README) factor is 0.9.
18 | Features: 171622 out of 433168 total words have one or more features.
19 | Features: source of annotation (from README) factor is 0.9.
20 | Universal relations: 33 out of 37 found in the corpus.
21 | Universal relations: source of annotation (from README) factor is 1.
22 | Udapi:
23 | TOTAL 132445
24 | Udapi: found 132445 bugs.
25 | Udapi: worst expected case (threshold) is one bug per 10 words. There are 433168 words.
26 | Genres: found 2 out of 18 known.
27 | /net/work/people/zeman/unidep/tools/validate.py --lang lzh --max-err=10 UD_Classical_Chinese-Kyoto/lzh_kyoto-ud-dev.conllu
28 | [Line 940 Sent KR1h0004_012_par12_3-13 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
29 | [Line 1033 Sent KR1h0004_012_par15_11-17 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
30 | [Line 1907 Sent KR1h0004_013_par3_41-51 Node 8]: [L3 Warning fixed-without-extpos] Fixed expression '闕 如' does not have the 'ExtPos' feature
31 | [Line 2397 Sent KR1h0004_013_par11_9-16 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
32 | [Line 2544 Sent KR1h0004_013_par15_4-10 Node 4]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
33 | [Line 2564 Sent KR1h0004_013_par15_17-25 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
34 | [Line 2638 Sent KR1h0004_013_par15_65-73 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
35 | [Line 3003 Sent KR1h0004_013_par20_59-61 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '硜 然' does not have the 'ExtPos' feature
36 | [Line 3003 Sent KR1h0004_013_par20_59-61 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '硜 * 然'
37 | [Line 3017 Sent KR1h0004_013_par20_65-71 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
38 | ...suppressing further errors regarding Warning
39 | Warnings: 73
40 | *** PASSED ***
41 | /net/work/people/zeman/unidep/tools/validate.py --lang lzh --max-err=10 UD_Classical_Chinese-Kyoto/lzh_kyoto-ud-test.conllu
42 | [Line 1238 Sent KR1h0004_002_par9_18-21 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '足 以' does not have the 'ExtPos' feature
43 | [Line 1306 Sent KR1h0004_002_par11_8-12 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
44 | [Line 2408 Sent KR1h0004_003_par14_8-12 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '郁 乎' does not have the 'ExtPos' feature
45 | [Line 2408 Sent KR1h0004_003_par14_8-12 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '郁 * 乎'
46 | [Line 2858 Sent KR1h0004_003_par23_15-17 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '翕 如' does not have the 'ExtPos' feature
47 | [Line 2869 Sent KR1h0004_003_par23_20-22 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '純 如' does not have the 'ExtPos' feature
48 | [Line 2875 Sent KR1h0004_003_par23_23-25 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '皦 如' does not have the 'ExtPos' feature
49 | [Line 2881 Sent KR1h0004_003_par23_26-28 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '繹 如' does not have the 'ExtPos' feature
50 | [Line 3073 Sent KR1h0001_001_par1_16-23 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '有 以' does not have the 'ExtPos' feature
51 | [Line 3641 Sent KR1h0001_001_par3_77-80 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '填 然' does not have the 'ExtPos' feature
52 | ...suppressing further errors regarding Warning
53 | Warnings: 103
54 | *** PASSED ***
55 | /net/work/people/zeman/unidep/tools/validate.py --lang lzh --max-err=10 UD_Classical_Chinese-Kyoto/lzh_kyoto-ud-train.conllu
56 | [Line 180 Sent KR1h0004_004_par5_43-46 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '惡 乎' does not have the 'ExtPos' feature
57 | [Line 2030 Sent KR1h0004_005_par22_16-19 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '斐 然' does not have the 'ExtPos' feature
58 | [Line 3271 Sent KR1h0004_006_par21_7-11 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
59 | [Line 3287 Sent KR1h0004_006_par21_16-21 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
60 | [Line 3546 Sent KR1h0004_006_par27_13-19 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
61 | [Line 3841 Sent KR1h0004_007_par4_5-8 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '申 如' does not have the 'ExtPos' feature
62 | [Line 3841 Sent KR1h0004_007_par4_5-8 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '申 * 如'
63 | [Line 3848 Sent KR1h0004_007_par4_9-12 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '夭 如' does not have the 'ExtPos' feature
64 | [Line 3848 Sent KR1h0004_007_par4_9-12 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '夭 * 如'
65 | [Line 4348 Sent KR1h0004_007_par17_11-17 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature
66 | ...suppressing further errors regarding Warning
67 | Warnings: 821
68 | *** PASSED ***
69 | Validity: 1
70 | (weight=0.0769230769230769) * (score{features}=0.45) = 0.0346153846153846
71 | (weight=0.0769230769230769) * (score{genres}=0.111111111111111) = 0.00854700854700855
72 | (weight=0.0769230769230769) * (score{lemmas}=0.9) = 0.0692307692307692
73 | (weight=0.256410256410256) * (score{size}=0.878885455306727) = 0.225355244950443
74 | (weight=0.0512820512820513) * (score{split}=1) = 0.0512820512820513
75 | (weight=0.0769230769230769) * (score{tags}=0.741176470588235) = 0.0570135746606335
76 | (weight=0.307692307692308) * (score{udapi}=0.01) = 0.00307692307692308
77 | (weight=0.0769230769230769) * (score{udeprels}=0.891891891891892) = 0.0686070686070686
78 | (TOTAL score=0.517728024970282) * (availability=1) * (validity=1) = 0.517728024970282
79 | STARS = 2.5
80 | UD_Classical_Chinese-Kyoto 0.517728024970282 2.5
81 |
--------------------------------------------------------------------------------
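The arithmetic at the end of eval.log can be re-derived from the figures the log itself prints. The sketch below is a reconstruction for checking purposes, not the UD evaluation script. It assumes the weights are fractions of 39 (which reproduces the logged decimals) and that the star rating is the total on a 0 to 5 scale rounded to the nearest half star. The `features` score of 0.45 is copied from the log as-is, since the log does not show how it is derived.

```python
import math

N_WORDS = 433168  # "counted 433168 of 433168 words" in eval.log

# Size: natural log of (N/1000)^2, clamped below at 0, relative to the
# value reached at 1,000,000 words (13.815511 in the log).
size_raw = max(0.0, math.log((N_WORDS / 1000) ** 2))
size_max = math.log((1_000_000 / 1000) ** 2)
size_score = min(1.0, size_raw / size_max)

# name: (weight, score). Weights as fractions of 39 are an assumption that
# matches the logged decimals (3/39 = 0.0769..., 10/39 = 0.2564..., ...).
COMPONENTS = {
    "features": (3 / 39, 0.45),            # copied from the log
    "genres":   (3 / 39, 2 / 18),          # 2 of 18 known genres
    "lemmas":   (3 / 39, 0.9),             # README source factor
    "size":     (10 / 39, size_score),
    "split":    (2 / 39, 1.0),
    "tags":     (3 / 39, 14 / 17 * 0.9),   # 14 of 17 UPOS tags, factor 0.9
    "udapi":    (12 / 39, 0.01),           # copied from the log
    "udeprels": (3 / 39, 33 / 37 * 1.0),   # 33 of 37 relations, factor 1
}

# Availability and validity are both 1 in the log, so they do not change the total.
total = sum(weight * score for weight, score in COMPONENTS.values())

# Assumption: nearest half star on a 0-5 scale.
stars = math.floor(total * 10 + 0.5) / 2

print(f"size score = {size_score:.6f}")
print(f"total      = {total:.6f}")
print(f"stars      = {stars}")
```

The largest weight (12/39) belongs to the Udapi check, where the treebank scores 0.01, so that component contributes far more to the 2.5-star result than corpus size or annotation coverage.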
/not-to-release/deprel.lzh:
--------------------------------------------------------------------------------
1 | compound:redup
2 | csubj:pass
3 | discourse:sp
4 | flat:vv
5 | nsubj:pass
6 | obl:lmod
7 | obl:tmod
8 |
--------------------------------------------------------------------------------
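Each entry in deprel.lzh has the form `relation:subtype`. The following sketch shows one way to split such labels and test them against the seven entries listed in the file. It is an illustration, not the validator's logic: the helper names are hypothetical, and subtypes permitted elsewhere (the README changelog mentions `nsubj:outer`, for instance) are deliberately not modelled.

```python
# The seven entries of not-to-release/deprel.lzh, copied verbatim.
LISTED_SUBTYPES = {
    "compound:redup",
    "csubj:pass",
    "discourse:sp",
    "flat:vv",
    "nsubj:pass",
    "obl:lmod",
    "obl:tmod",
}


def split_deprel(label):
    """Split 'obl:tmod' into ('obl', 'tmod'); a bare 'obl' gives ('obl', None)."""
    base, _, subtype = label.partition(":")
    return base, (subtype or None)


def subtype_ok(label):
    """True if the label has no subtype or appears in the list above."""
    _, subtype = split_deprel(label)
    return subtype is None or label in LISTED_SUBTYPES


for label in ("nsubj", "obl:tmod", "obl:agent"):
    print(label, split_deprel(label), subtype_ok(label))
```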
/not-to-release/feat_val.lzh:
--------------------------------------------------------------------------------
1 | AdvType=Cau
2 | AdvType=Deg
3 | AdvType=Tim
4 | NameType=Geo
5 | NameType=Giv
6 | NameType=Nat
7 | NameType=Prs
8 | NameType=Sur
9 | NounType=Class
10 | VerbType=Cop
11 |
--------------------------------------------------------------------------------
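The entries in feat_val.lzh are `Feature=Value` pairs of the kind that appear in the FEATS column of a CoNLL-U file, joined by `|`. As a small sketch, the code below parses a FEATS string and reports which of its pairs appear in this file's list. The helper names are hypothetical and the input string is an invented example, not taken from the treebank.

```python
# The ten entries of not-to-release/feat_val.lzh, copied verbatim.
LISTED_FEATS = {
    "AdvType=Cau", "AdvType=Deg", "AdvType=Tim",
    "NameType=Geo", "NameType=Giv", "NameType=Nat",
    "NameType=Prs", "NameType=Sur",
    "NounType=Class", "VerbType=Cop",
}


def parse_feats(feats):
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def listed_pairs(feats):
    """Return the Feature=Value pairs of a FEATS string found in the list above."""
    pairs = (f"{name}={value}" for name, value in parse_feats(feats).items())
    return sorted(p for p in pairs if p in LISTED_FEATS)


# Invented FEATS value for illustration.
print(listed_pairs("NameType=Sur|Polarity=Neg"))
```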
/stats.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
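7 | <size>
8 | <total><sentences>86239</sentences><tokens>433168</tokens><words>433168</words><fused>0</fused></total>
9 | <train><sentences>74609</sentences><tokens>374559</tokens><words>374559</words><fused>0</fused></train>
10 | <dev><sentences>6102</sentences><tokens>31043</tokens><words>31043</words><fused>0</fused></dev>
11 | <test><sentences>5528</sentences><tokens>27566</tokens><words>27566</words><fused>0</fused></test>
12 | </size>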
13 |
14 |
15 |
16 |
17 |
18 | 8650
19 | 42224
20 | 7517
21 | 7623
22 | 131
23 | 123065
24 | 8133
25 | 21383
26 | 20798
27 | 45765
28 | 5
29 | 9150
30 | 1364
31 | 137360
32 |
33 |
34 |
35 | 650
36 | 210
37 | 6435
38 | 811
39 | 42761
40 | 8850
41 | 25
42 | 1901
43 | 26911
44 | 154
45 | 1505
46 | 176
47 | 2949
48 | 5815
49 | 15455
50 | 13277
51 | 3517
52 | 7700
53 | 913
54 | 141
55 | 1916
56 | 737
57 | 11102
58 | 14405
59 | 4197
60 | 1362
61 | 15239
62 | 1041
63 | 630
64 | 402
65 | 94
66 | 13475
67 | 12996
68 | 2790
69 | 97
70 |
71 |
72 |
73 | 4866
74 | 4486
75 | 43318
76 | 14372
77 | 1
78 | 4527
79 | 16902
80 | 5948
81 | 6776
82 | 2069
83 | 5352
84 | 634
85 | 18336
86 | 2790
87 | 2981
88 | 62
89 | 1
90 | 5370
91 | 825
92 | 12965
93 | 269
94 | 423
95 | 764
96 | 11499
97 | 64
98 | 5759
99 | 1252
100 | 116
101 | 2779
102 | 30099
103 | 50565
104 | 1414
105 | 17
106 | 5420
107 | 59136
108 | 5608
109 | 5019
110 | 4186
111 | 2
112 | 8444
113 | 86239
114 | 202
115 | 1311
116 |
117 |
118 |
--------------------------------------------------------------------------------