# TC32: Multi-Class Classification Dataset for Turkish
Text Classification Dataset for Turkish Language

* Benchmark dataset for Turkish text classification
* It contains 430K lines in 32 categories
* Each category has roughly 13K comments
* Data is collected from Turkish websites
* The data consists of product comments and their product categories
* Baseline algorithm: Naive Bayes reaches an F1 score of 84%

Download Link
https://www.kaggle.com/savasy/multiclass-classification-data-for-turkish-tc32
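
The Naive Bayes baseline mentioned above can be illustrated with a minimal sketch. This is not the code that produced the reported score. It is a plain multinomial Naive Bayes with Laplace smoothing, written with the standard library only. The whitespace tokenizer, the `alpha` parameter, the macro-averaged F1, and the four toy comments with their `telefon` / `kitap` labels are all assumptions made for illustration. The toy comments are not taken from TC32, and the README does not state which F1 averaging was used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase + whitespace split. Python's str.lower() is not
    # Turkish-locale-aware, so real data may need dedicated preprocessing.
    return text.lower().split()


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for wc in self.word_counts.values() for w in wc}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / n_docs)
            denom = self.total_words[label] + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 (assumed averaging scheme).
    scores = []
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Invented toy comments, NOT taken from TC32; labels are hypothetical categories.
train_texts = [
    "telefon ekranı çok güzel",
    "telefon bataryası zayıf",
    "kitap çok sürükleyici",
    "kitap sayfaları kaliteli",
]
train_labels = ["telefon", "telefon", "kitap", "kitap"]

model = MultinomialNB().fit(train_texts, train_labels)
test_texts = ["telefon bataryası", "kitap sürükleyici"]
test_labels = ["telefon", "kitap"]
predictions = model.predict(test_texts)
print(predictions, macro_f1(test_labels, predictions))
```

To try this on the actual dataset, replace the toy lists with the comments and categories loaded from the Kaggle download. The file layout is not described here, so the loading step is left out.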