├── CONTRIBUTING.md ├── LICENSE.txt ├── README.md ├── eval.log ├── lzh_kyoto-ud-dev.conllu ├── lzh_kyoto-ud-test.conllu ├── lzh_kyoto-ud-train.conllu ├── not-to-release ├── deprel.lzh └── feat_val.lzh └── stats.xml /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | Please do not make pull requests against master; any such pull requests will be 4 | closed. Instead, make them against the dev branch. 5 | 6 | For full details on the branch policy see 7 | [here](http://universaldependencies.org/release_checklist.html#repository-branches). 8 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | The treebank is licensed under the Creative Commons License Attribution-ShareAlike 4.0 International. 2 | 3 | The complete license text is available at: 4 | http://creativecommons.org/licenses/by-sa/4.0/legalcode 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | Classical Chinese Universal Dependencies Treebank annotated and converted by the Institute for Research in Humanities, Kyoto University. 
4 | 5 | # Introduction 6 | 7 | This treebank is drawn from the full text of [論語](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR1h0004), [孟子](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR1h0001), [禮記](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR1d0052), [十八史略](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/18shilue), [楚辭](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR4a0001), [戰國策](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR2e0003), and others. Classical Chinese was written without spaces or punctuation between words or sentences, so the treebank files contain neither: 8 | 9 | * lzh_kyoto-ud-test.conllu 10 | - 學而篇第一 為政篇第二 and 八佾篇第三 from 論語 11 | - 梁惠王上 and 梁惠王下 from 孟子 12 | - 中庸 from 禮記 13 | - 春秋戰國 from 十八史略 14 | - 離騷 from 楚辭 15 | - [摩訶般若波羅蜜大明呪經](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR6c0127) 16 | - 東周 from 戰國策 17 | 18 | * lzh_kyoto-ud-dev.conllu 19 | - 顏淵篇第十二 子路篇第十三 and 憲問篇第十四 from 論語 20 | - 告子上 and 告子下 from 孟子 21 | - 大學 from 禮記 22 | - 唐 from 十八史略 23 | - 遠遊 from 楚辭 24 | - [金剛般若波羅蜜經](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR6c0023) 25 | - 西周 from 戰國策 26 | 27 | * lzh_kyoto-ud-train.conllu 28 | - 論語 (except for 學而篇第一 為政篇第二 八佾篇第三 顏淵篇第十二 子路篇第十三 憲問篇第十四) 29 | - 孟子 (except for 梁惠王上 梁惠王下 告子上 告子下) 30 | - 禮記 (except for 中庸 大學) 31 | - 十八史略 (except for 春秋戰國 唐) 32 | - 九歌 天問 九章 卜居 漁父 九辯 and 招魂 from 楚辭 33 | - [唐詩三百首](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR4h0169) 34 | - [佛說阿彌陀經](https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-kanbun/-/tree/master/kanripo/kR6f0082) 35 | - 戰國策 (except for 東周 西周) 36 | 37 | # References 38 | 39 | * Koichi Yasuoka: 
[Universal Dependencies Treebank of the Four Books in Classical Chinese](http://hdl.handle.net/2433/245217), DADH2019: 10th International Conference of Digital Archives and Digital Humanities (December 2019), pp.20-28. 40 | * Koichi Yasuoka, Christian Wittern, Tomohiko Morioka, Takumi Ikeda, Naoki Yamazaki, Yoshihiro Nikaido, Shingo Suzuki, Shigeki Moro, Kazunori Fujita: [Designing Universal Dependencies for Classical Chinese and Its Application](http://id.nii.ac.jp/1001/00216242/), Journal of Information Processing Society of Japan, Vol.63, No.2 (February 2022), pp.355-363. 41 | 42 | # Changelog 43 | 44 | * 2025-05-15 v2.16 45 | * bug fix and Gloss addition. 46 | 47 | * 2024-11-15 v2.15 48 | * bug fix and Gloss addition. 49 | 50 | * 2024-05-15 v2.14 51 | * bug fix and Gloss addition. 52 | 53 | * 2023-11-15 v2.13 54 | * bug fix for `fixed`. 55 | 56 | * 2023-05-15 v2.12 57 | * 戰國策 added. 58 | 59 | * 2022-11-15 v2.11 60 | * `nsubj:outer` and `csubj:outer` supported. 61 | * 唐詩三百首 added. 62 | 63 | * 2022-05-15 v2.10 64 | * 摩訶般若波羅蜜大明呪經 added. 65 | * 金剛般若波羅蜜經 added. 66 | * 佛說阿彌陀經 added. 67 | 68 | * 2021-11-15 v2.9 69 | * 9 poems from 楚辭 added. 70 | 71 | * 2021-05-15 v2.8 72 | * 2 eras from 十八史略 added. 73 | 74 | * 2020-11-15 v2.7 75 | * 7 volumes from 禮記 added. 76 | * 17 eras from 十八史略 added. 77 | 78 | * 2020-05-15 v2.6 79 | * 19 volumes from 禮記 added. 80 | 81 | * 2019-11-15 v2.5 82 | * 22 volumes from 禮記 added. 83 | 84 | * 2019-05-15 v2.4 85 | * Initial release in Universal Dependencies. 86 | 87 |
 88 | === Machine-readable metadata (DO NOT REMOVE!) ================================
 89 | Data available since: UD v2.4
 90 | License: CC BY-SA 4.0
 91 | Includes text: yes
 92 | Genre: nonfiction poetry
 93 | Lemmas: converted with corrections
 94 | UPOS: converted with corrections
 95 | XPOS: converted with corrections
 96 | Features: converted with corrections
 97 | Relations: manual native
 98 | Contributors: Yasuoka, Koichi; Wittern, Christian; Morioka, Tomohiko; Ikeda, Takumi; Yamazaki, Naoki; Nikaido, Yoshihiro; Suzuki, Shingo; Moro, Shigeki; Li, Yuan; Shirasu, Hiroyuki; Fujita, Kazunori
 99 | Contributing: elsewhere
100 | Contact: yasuoka@kanji.zinbun.kyoto-u.ac.jp
101 | ===============================================================================
102 | 
103 | -------------------------------------------------------------------------------- /eval.log: -------------------------------------------------------------------------------- 1 | Running the following version of UD tools: 2 | commit ecbbdff44b15c9b6de4a691e3499c1286459ab2e 3 | Author: Dan Zeman 4 | Date: Fri May 9 21:07:42 2025 +0200 5 | Evaluating the following revision of UD_Classical_Chinese-Kyoto: 6 | commit c32426535f4863cb4b8eae9a59c68fc6b0225f42 7 | Merge: dbf956f 779e363 8 | Author: Dan Zeman 9 | Size: counted 433168 of 433168 words (nodes). 10 | Size: min(0, log((N/1000)**2)) = 12.1422512870313. 11 | Size: maximum value 13.815511 is for 1000000 words or more. 12 | Split: Found more than 10000 training words. 13 | Split: Found at least 10000 development words. 14 | Split: Found at least 10000 test words. 15 | Lemmas: source of annotation (from README) factor is 0.9. 16 | Universal POS tags: 14 out of 17 found in the corpus. 17 | Universal POS tags: source of annotation (from README) factor is 0.9. 18 | Features: 171622 out of 433168 total words have one or more features. 19 | Features: source of annotation (from README) factor is 0.9. 20 | Universal relations: 33 out of 37 found in the corpus. 21 | Universal relations: source of annotation (from README) factor is 1. 22 | Udapi: 23 | TOTAL 132445 24 | Udapi: found 132445 bugs. 25 | Udapi: worst expected case (threshold) is one bug per 10 words. There are 433168 words. 26 | Genres: found 2 out of 18 known. 
27 | /net/work/people/zeman/unidep/tools/validate.py --lang lzh --max-err=10 UD_Classical_Chinese-Kyoto/lzh_kyoto-ud-dev.conllu 28 | [Line 940 Sent KR1h0004_012_par12_3-13 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 29 | [Line 1033 Sent KR1h0004_012_par15_11-17 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 30 | [Line 1907 Sent KR1h0004_013_par3_41-51 Node 8]: [L3 Warning fixed-without-extpos] Fixed expression '闕 如' does not have the 'ExtPos' feature 31 | [Line 2397 Sent KR1h0004_013_par11_9-16 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 32 | [Line 2544 Sent KR1h0004_013_par15_4-10 Node 4]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 33 | [Line 2564 Sent KR1h0004_013_par15_17-25 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 34 | [Line 2638 Sent KR1h0004_013_par15_65-73 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 35 | [Line 3003 Sent KR1h0004_013_par20_59-61 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '硜 然' does not have the 'ExtPos' feature 36 | [Line 3003 Sent KR1h0004_013_par20_59-61 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '硜 * 然' 37 | [Line 3017 Sent KR1h0004_013_par20_65-71 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 38 | ...suppressing further errors regarding Warning 39 | Warnings: 73 40 | *** PASSED *** 41 | /net/work/people/zeman/unidep/tools/validate.py --lang lzh --max-err=10 UD_Classical_Chinese-Kyoto/lzh_kyoto-ud-test.conllu 42 | [Line 1238 Sent KR1h0004_002_par9_18-21 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '足 以' does not have the 'ExtPos' feature 43 | [Line 1306 Sent KR1h0004_002_par11_8-12 Node 1]: [L3 Warning 
fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 44 | [Line 2408 Sent KR1h0004_003_par14_8-12 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '郁 乎' does not have the 'ExtPos' feature 45 | [Line 2408 Sent KR1h0004_003_par14_8-12 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '郁 * 乎' 46 | [Line 2858 Sent KR1h0004_003_par23_15-17 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '翕 如' does not have the 'ExtPos' feature 47 | [Line 2869 Sent KR1h0004_003_par23_20-22 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '純 如' does not have the 'ExtPos' feature 48 | [Line 2875 Sent KR1h0004_003_par23_23-25 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '皦 如' does not have the 'ExtPos' feature 49 | [Line 2881 Sent KR1h0004_003_par23_26-28 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '繹 如' does not have the 'ExtPos' feature 50 | [Line 3073 Sent KR1h0001_001_par1_16-23 Node 3]: [L3 Warning fixed-without-extpos] Fixed expression '有 以' does not have the 'ExtPos' feature 51 | [Line 3641 Sent KR1h0001_001_par3_77-80 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '填 然' does not have the 'ExtPos' feature 52 | ...suppressing further errors regarding Warning 53 | Warnings: 103 54 | *** PASSED *** 55 | /net/work/people/zeman/unidep/tools/validate.py --lang lzh --max-err=10 UD_Classical_Chinese-Kyoto/lzh_kyoto-ud-train.conllu 56 | [Line 180 Sent KR1h0004_004_par5_43-46 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '惡 乎' does not have the 'ExtPos' feature 57 | [Line 2030 Sent KR1h0004_005_par22_16-19 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '斐 然' does not have the 'ExtPos' feature 58 | [Line 3271 Sent KR1h0004_006_par21_7-11 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 59 | [Line 3287 Sent KR1h0004_006_par21_16-21 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does 
not have the 'ExtPos' feature 60 | [Line 3546 Sent KR1h0004_006_par27_13-19 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 61 | [Line 3841 Sent KR1h0004_007_par4_5-8 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '申 如' does not have the 'ExtPos' feature 62 | [Line 3841 Sent KR1h0004_007_par4_5-8 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '申 * 如' 63 | [Line 3848 Sent KR1h0004_007_par4_9-12 Node 1]: [L3 Warning fixed-without-extpos] Fixed expression '夭 如' does not have the 'ExtPos' feature 64 | [Line 3848 Sent KR1h0004_007_par4_9-12 Node 1]: [L3 Warning fixed-gap] Gaps in fixed expression [1, 3] '夭 * 如' 65 | [Line 4348 Sent KR1h0004_007_par17_11-17 Node 2]: [L3 Warning fixed-without-extpos] Fixed expression '可 以' does not have the 'ExtPos' feature 66 | ...suppressing further errors regarding Warning 67 | Warnings: 821 68 | *** PASSED *** 69 | Validity: 1 70 | (weight=0.0769230769230769) * (score{features}=0.45) = 0.0346153846153846 71 | (weight=0.0769230769230769) * (score{genres}=0.111111111111111) = 0.00854700854700855 72 | (weight=0.0769230769230769) * (score{lemmas}=0.9) = 0.0692307692307692 73 | (weight=0.256410256410256) * (score{size}=0.878885455306727) = 0.225355244950443 74 | (weight=0.0512820512820513) * (score{split}=1) = 0.0512820512820513 75 | (weight=0.0769230769230769) * (score{tags}=0.741176470588235) = 0.0570135746606335 76 | (weight=0.307692307692308) * (score{udapi}=0.01) = 0.00307692307692308 77 | (weight=0.0769230769230769) * (score{udeprels}=0.891891891891892) = 0.0686070686070686 78 | (TOTAL score=0.517728024970282) * (availability=1) * (validity=1) = 0.517728024970282 79 | STARS = 2.5 80 | UD_Classical_Chinese-Kyoto 0.517728024970282 2.5 81 | -------------------------------------------------------------------------------- /not-to-release/deprel.lzh: -------------------------------------------------------------------------------- 1 | compound:redup 2 | 
csubj:pass 3 | discourse:sp 4 | flat:vv 5 | nsubj:pass 6 | obl:lmod 7 | obl:tmod 8 | -------------------------------------------------------------------------------- /not-to-release/feat_val.lzh: -------------------------------------------------------------------------------- 1 | AdvType=Cau 2 | AdvType=Deg 3 | AdvType=Tim 4 | NameType=Geo 5 | NameType=Giv 6 | NameType=Nat 7 | NameType=Prs 8 | NameType=Sur 9 | NounType=Class 10 | VerbType=Cop 11 | -------------------------------------------------------------------------------- /stats.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 7 | 8 | 862394331684331680 9 | 746093745593745590 10 | 610231043310430 11 | 552827566275660 12 | 13 | 14 | 15 | 16 | 17 | 18 | 8650 19 | 42224 20 | 7517 21 | 7623 22 | 131 23 | 123065 24 | 8133 25 | 21383 26 | 20798 27 | 45765 28 | 5 29 | 9150 30 | 1364 31 | 137360 32 | 33 | 34 | 35 | 650 36 | 210 37 | 6435 38 | 811 39 | 42761 40 | 8850 41 | 25 42 | 1901 43 | 26911 44 | 154 45 | 1505 46 | 176 47 | 2949 48 | 5815 49 | 15455 50 | 13277 51 | 3517 52 | 7700 53 | 913 54 | 141 55 | 1916 56 | 737 57 | 11102 58 | 14405 59 | 4197 60 | 1362 61 | 15239 62 | 1041 63 | 630 64 | 402 65 | 94 66 | 13475 67 | 12996 68 | 2790 69 | 97 70 | 71 | 72 | 73 | 4866 74 | 4486 75 | 43318 76 | 14372 77 | 1 78 | 4527 79 | 16902 80 | 5948 81 | 6776 82 | 2069 83 | 5352 84 | 634 85 | 18336 86 | 2790 87 | 2981 88 | 62 89 | 1 90 | 5370 91 | 825 92 | 12965 93 | 269 94 | 423 95 | 764 96 | 11499 97 | 64 98 | 5759 99 | 1252 100 | 116 101 | 2779 102 | 30099 103 | 50565 104 | 1414 105 | 17 106 | 5420 107 | 59136 108 | 5608 109 | 5019 110 | 4186 111 | 2 112 | 8444 113 | 86239 114 | 202 115 | 1311 116 | 117 | 118 | --------------------------------------------------------------------------------
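The score arithmetic at the end of eval.log can be checked by hand. The following is a minimal sketch, not the UD evaluation tool itself: the weights and per-component scores are copied from the log lines above, while the names `components`, `total_score`, and `size_score` are hypothetical helpers introduced only for this illustration. The size formula assumes the log's stated cap of 1000000 words and a natural logarithm, which reproduces the logged value 12.1422512870313 for 433168 words.

```python
import math

# (weight, score) pairs copied from the eval.log listing above.
components = {
    "features": (0.0769230769230769, 0.45),
    "genres": (0.0769230769230769, 0.111111111111111),
    "lemmas": (0.0769230769230769, 0.9),
    "size": (0.256410256410256, 0.878885455306727),
    "split": (0.0512820512820513, 1.0),
    "tags": (0.0769230769230769, 0.741176470588235),
    "udapi": (0.307692307692308, 0.01),
    "udeprels": (0.0769230769230769, 0.891891891891892),
}


def total_score(parts, availability=1.0, validity=1.0):
    """Weighted sum of component scores, scaled by availability and validity."""
    weighted = sum(weight * score for weight, score in parts.values())
    return weighted * availability * validity


def size_score(n_words, cap_words=1_000_000):
    """Size component: log((N/1000)**2), capped at the value for cap_words
    and normalised to the range 0..1 (assumed from the log's description)."""
    value = math.log((n_words / 1000) ** 2)
    cap = math.log((cap_words / 1000) ** 2)
    return min(value, cap) / cap


if __name__ == "__main__":
    print(f"size score : {size_score(433168):.6f}")
    print(f"total score: {total_score(components):.6f}")
```

Under these assumptions the two helpers agree with the logged size score (0.878885...) and TOTAL score (0.517728...) to within floating-point rounding. The mapping from the total score to STARS is not reproduced here because the log does not state it.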