# CyberCan

Text mining has been a dominant approach to extracting useful information from the massive amount of unstructured data available online. However, existing tools for Chinese word segmentation are not well suited to processing Cantonese social media text. This project developed CyberCan, a lexicon of contemporary Cantonese built from more than 100 million pieces of internet text. Details of how the lexicon was created can be found here: https://osf.io/preprints/socarxiv/tyjr7

The lexicon is distributed in this repository as `CyberCan.xlsx` (https://raw.githubusercontent.com/shenfei1010/CyberCan/HEAD/CyberCan.xlsx).

Citation:

Shen, F., Yu, W., Min, C., Ye, Q., Xia, C., Wang, T., & Wu, Y. (n.d.). *CyberCan: A New Dictionary for Cantonese Social Media Text Segmentation*. https://osf.io/preprints/socarxiv/tyjr7
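CyberCan is intended to support word segmentation of Cantonese text. As a rough illustration of how a segmentation lexicon is commonly used, here is a minimal sketch of forward maximum matching (FMM), a standard dictionary-based segmentation technique. This is an assumption-laden example: the README does not say how CyberCan should be plugged into a segmenter, so FMM is a swapped-in technique and not the authors' method. The function names `build_lexicon` and `fmm_segment` are hypothetical, and the sample words are illustrative rather than taken from `CyberCan.xlsx`. The spreadsheet's column layout is not described, so loading it (for example with pandas `read_excel`) is left out.

```python
# Hypothetical sketch: dictionary-based forward maximum matching (FMM).
# Not confirmed to be the CyberCan authors' method; it only shows how a
# lexicon such as CyberCan could drive word segmentation.


def build_lexicon(words):
    """Return the lexicon as a set plus the length of its longest entry."""
    lexicon = {w.strip() for w in words if w.strip()}
    max_len = max((len(w) for w in lexicon), default=1)
    return lexicon, max_len


def fmm_segment(text, lexicon, max_len):
    """Greedily take the longest lexicon word starting at each position.

    Characters not covered by any lexicon entry are emitted one at a time.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Illustrative entries only; they are NOT taken from CyberCan.xlsx.
    lexicon, max_len = build_lexicon(["今日", "好攰", "返工", "唔想"])
    print(fmm_segment("今日好攰唔想返工", lexicon, max_len))
```

Pure greedy matching is shown only because it is short. In practice, a lexicon like this is often supplied to an existing segmenter as a user dictionary (for example, jieba's `load_userdict`), which combines dictionary lookup with frequency-based scoring.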