├── imgs
│   └── kjt.png
├── requirements.txt
├── examples
│   └── finetune_edubert.sh
├── LICENSE
├── docs
│   └── 使用文档.md
├── README.md
└── src
    └── finetune.py


/imgs/kjt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tal-tech/edu-bert/HEAD/imgs/kjt.png


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | datasets==1.1.2
 2 | numpy==1.19.4
 3 | pandas==1.1.4
 4 | pyOpenSSL==19.1.0
 5 | scikit-learn==0.23.2
 6 | scipy==1.5.4
 7 | torch==1.4.0
 8 | transformers==3.5.1
 9 | dataclasses==0.8
10 | 


--------------------------------------------------------------------------------
/examples/finetune_edubert.sh:
--------------------------------------------------------------------------------
 1 | export MODEL_PATH=../../path_to_EduBERT # To load a TensorFlow model, the model directory name must end with .ckpt, as in the next line
 2 | #export MODEL_PATH=../edu-bert.ckpt # example directory name for loading a TensorFlow model
 3 | export DATA_DIR=../data/
 4 | export TASK_NAME=CoLA
 5 | python ../src/finetune.py \
 6 |   --model_name_or_path $MODEL_PATH \
 7 |   --task_name $TASK_NAME \
 8 |   --data_dir ${DATA_DIR} \
 9 |   --do_train \
10 |   --do_eval \
11 |   --max_seq_length 128 \
12 |   --per_device_train_batch_size 32 \
13 |   --learning_rate 2e-5 \
14 |   --num_train_epochs 3.0 \
15 |   --output_dir ./output \
16 |   --overwrite_output_dir
17 | 
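In `src/finetune.py`, the choice between PyTorch and TensorFlow weights comes from the single expression `from_tf=bool(".ckpt" in model_args.model_name_or_path)`, which is why a TensorFlow model directory needs `.ckpt` in its name. A minimal sketch isolating that test; `uses_tf_weights` is a hypothetical helper name, not part of the repository. Because it is a plain substring check, a misspelled suffix such as `.cpkt` quietly selects PyTorch loading, and any path that merely contains `.ckpt` somewhere selects TensorFlow loading.

```python
def uses_tf_weights(model_name_or_path: str) -> bool:
    """Mirror the substring test in src/finetune.py (hypothetical helper name).

    Any path containing ".ckpt" anywhere is treated as a TensorFlow checkpoint;
    everything else is loaded as PyTorch weights.
    """
    return ".ckpt" in model_name_or_path


# The two MODEL_PATH values from examples/finetune_edubert.sh, plus a misspelling.
for path in ["../../path_to_EduBERT", "../edu-bert.ckpt", "../edu-bert.cpkt"]:
    print(path, "-> from_tf =", uses_tf_weights(path))
```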


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 好未来技术
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/docs/使用文档.md:
--------------------------------------------------------------------------------
 1 | 
 2 | # EduBERT
 3 | 
 4 | ## Environment Setup
 5 | 
 6 | <br>
 7 | 
 8 | ```
 9 | conda create -n py36_test python=3.6 --yes
10 | conda activate py36_test
11 | pip install -r requirements.txt
12 | ```
13 | 
14 | 
15 | <br>
16 | 
17 | ## Code Structure
18 | 
19 | <br>
20 | 
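21 | To make EduBERT easier to use, we have also open-sourced the code for fine-tuning it on downstream tasks, along with sample input data (data/train.tsv and data/dev.tsv). Because of data privacy concerns, the samples contain only 10 synthetic examples and serve purely as a format reference. Feel free to try it out.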
22 | 
23 | <br>
24 | 
25 | [data](../data/): sample input data for fine-tuning
26 | 
27 | <br>
28 | 
29 | ```
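30 | Data format (one example per line):
31 | \t[label]\t\t[text]
32 | 
33 | [label]: the label of the text (for the CoLA task used in the example script, "0" or "1")
34 | [text]: the content of the text
35 | \t: a tab character; the first and third columns are left empty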
36 | ```
37 | 
38 | 
39 | <br>
40 | 
41 | [examples](../examples/): usage example
42 | 
43 | <br>
44 | 
45 | ```
46 | # ./examples/finetune_edubert.sh
47 | 
48 | export MODEL_PATH=/YourPath/EduBERT
49 | # MODEL_PATH is the directory where the downloaded model is stored
50 | 
51 | export DATA_DIR=../data/
52 | # Input data directory for fine-tuning; defaults to the sample data directory
53 | 
54 | export TASK_NAME=CoLA
55 | python ../src/finetune.py \
56 |   --model_name_or_path $MODEL_PATH \
57 |   --task_name $TASK_NAME \
58 |   --data_dir ${DATA_DIR} \
59 |   --do_train \
60 |   --do_eval \
61 |   --max_seq_length 128 \
62 |   --per_device_train_batch_size 32 \
63 |   --learning_rate 2e-5 \
64 |   --num_train_epochs 3.0 \
65 |   --output_dir ./output \
66 |   --overwrite_output_dir
67 | 
68 | ```
69 | 
70 | <br>
71 | 
72 | [src](../src): source code
73 | 
74 | <br>
75 | 
76 | 
77 | 
78 | 
79 | 
80 | 
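The data format above (`\t[label]\t\t[text]`) is the four-column CoLA layout that `GlueDataset` reads when the script runs with `--task_name CoLA`: in train/dev files the label is taken from the second column and the text from the fourth, and the first and third columns are ignored. A minimal sketch of writing and reading rows in that layout; `write_rows` and `read_rows` are hypothetical helper names that only imitate the column positions, not the transformers code itself.

```python
from typing import List, Tuple

# CoLA-style train/dev layout: columns 0 and 2 are ignored,
# column 1 holds the label and column 3 holds the text.
LABEL_COL, TEXT_COL = 1, 3
COLA_LABELS = ("0", "1")  # label set of the CoLA task


def write_rows(rows: List[Tuple[str, str]]) -> str:
    """Serialize (label, text) pairs as '\\t[label]\\t\\t[text]' lines."""
    lines = []
    for label, text in rows:
        if label not in COLA_LABELS:
            raise ValueError(f"label must be one of {COLA_LABELS}, got {label!r}")
        if "\t" in text or "\n" in text:
            raise ValueError("text must not contain tabs or newlines")
        lines.append("\t".join(["", label, "", text]))
    return "\n".join(lines) + "\n"


def read_rows(tsv: str) -> List[Tuple[str, str]]:
    """Recover (label, text) pairs using the same column positions."""
    rows = []
    for line in tsv.splitlines():
        cols = line.split("\t")
        rows.append((cols[LABEL_COL], cols[TEXT_COL]))
    return rows


sample = write_rows([("1", "first example text"), ("0", "second example text")])
print(repr(sample))
print(read_rows(sample))
```

Train and dev files carry no header row in this layout, so `write_rows` does not emit one.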


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
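  1 | # TAL Education Group Open-Sources TAL-EduBERT, the First Chinese Pre-trained Model for Online Teaching in Education
  2 | 
  3 | ## I. Background and Download Links
  4 | 
  5 | ### 1. Background
  6 | 
  7 | The outbreak of COVID-19 in early 2020 had a considerable impact on every industry. Traditional education, which relied mainly on in-person classes, was hit especially hard in the short term, and more people began to see the value technology brings to the education market. Online education became the best option for teaching during this period and reached schools and families at scale. Its rapid growth produced massive amounts of automatic speech recognition (ASR) transcripts of online classes, which has greatly advanced technology in the education domain.
  8 | 
  9 | Data is one of the most central and valuable resources in any industry, and it is the foundation on which natural language processing (NLP) is applied and developed in each domain. Online-education text has properties that distinguish it from general-domain text, and these pose major challenges for NLP research and applications in the field. First, text transcribed from audio and video contains many ASR errors, which can significantly hurt text-processing tasks. Second, the data contains a large number of education-specific terms, and existing general-domain open-source word embeddings and pre-trained language models (such as Google BERT Base<sup>[1]</sup> and RoBERTa<sup>[2]</sup>) have limited ability to represent the meaning of these terms, which in turn hurts downstream tasks.
 10 | 
 11 | To help address these two problems, the machine learning team of TAL's AI Platform collected more than 20 million education-domain Chinese ASR text entries (about 380 million tokens) from multiple sources, used them to build TAL-EduBERT, the first Chinese pre-trained model for online teaching in the education domain, and open-sourced it.
 12 | 
 13 | Since Google released BERT in 2018, pre-trained language models led by BERT have achieved state-of-the-art (SOTA) results on a wide range of NLP tasks. Because BERT is a general-purpose pre-trained language model, NLP engineers no longer need to make heavy changes to the network architecture: fine-tuning directly on a downstream task already beats earlier deep learning methods. This greatly reduces the work of redesigning model architectures and lowers the cost of applying algorithms, and pre-trained language models have become an indispensable foundational technology.
 14 | 
 15 | However, the Chinese deep pre-trained models released so far mostly target general-domain applications, and we have not seen corresponding open-source models for many vertical domains, including education. Compared with Google BERT Base and the open-source Chinese RoBERTa model, **TAL-EduBERT achieves higher accuracy and F1 on all four education-domain downstream tasks we evaluated**. With this release, TAL hopes to help advance the application of NLP in education, and we welcome colleagues to download and use the model.
 16 | 
 17 | ### 2. Model Download
 18 | 
 19 | Download links:
 20 | 
 21 | PyTorch version: [https://ai.100tal.com/download/TAL-EduBERT.zip](https://ai.100tal.com/download/TAL-EduBERT.zip)
 22 | 
 23 | TensorFlow version: [https://ai.100tal.com/download/TAL-EduBERT-TF.zip](https://ai.100tal.com/download/TAL-EduBERT-TF.zip)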
 24 | 
 25 | 
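 26 | ## II. Model Architecture and Training Data
 27 | 
 28 | ### 1. Model Architecture
 29 | TAL-EduBERT uses the same network architecture as Google BERT Base: a 12-layer Transformer encoder with 768 hidden units and 12 attention heads. We chose the BERT Base architecture because it is convenient and widely used in practice; we plan to open-source further education-domain ASR pre-trained language models later.
 30 | 
 31 | ### 2. Training Corpus
 32 | The pre-training corpus of TAL-EduBERT comes mainly from TAL's large internal collection of teachers' classroom speech, transcribed into text by ASR. After filtering and preprocessing, we selected more than 20 million education ASR text entries, containing about 380 million tokens.
 33 | 
 34 | ### 3. Pre-training Method
 35 | 
 36 | ![TAL-EduBERT pre-training tasks](imgs/kjt.png?raw=true)
 37 | 
 38 | As shown in the figure above, TAL-EduBERT is pre-trained with the same two tasks as BERT: the character-level Masked Language Modeling (MLM) task on education-domain text and the sentence-level Next Sentence Prediction (NSP) task. Together, these tasks let TAL-EduBERT capture character-, word-, and sentence-level syntactic and semantic information in education ASR text.
 39 | 
 40 | ## III. Downstream Task Results
 41 | To evaluate TAL-EduBERT on downstream tasks, we extracted datasets for four typical teaching-behavior prediction tasks in online education from real business data (see [3][4] for details). On these datasets we compared it with Google BERT Base, the most widely used model for Chinese, and with RoBERTa, which also performs well. The results show that TAL-EduBERT performs better on these education ASR downstream tasks.
 42 | 
 43 | ### 1. Experiment Overview: Teacher Behavior Prediction
 44 | This task comes from our work on automatically evaluating teachers' teaching behavior. Specifically, we evaluated four teacher behaviors: guiding students to summarize at the end of the lesson (Conclude), taking notes together with students (Note), praising students (Praise), and asking students questions (QA). Classifying teaching behaviors lets us attach behavior labels to teachers, which makes their teaching easier to analyze and, in turn, helps them teach better and improves teaching quality.
 45 | 
 46 | ### 2. Results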
 47 | <table>
 48 |     <tr>
 49 |         <th colspan="2">Task\Model</th><th>Conclude</th><th>Note</th><th>Praise</th><th>QA</th>
 50 |     </tr>
 51 |     <tr>
 52 |         <td rowspan="2">Google BERT</td><td>Acc</td><td>0.7036</td><td>0.8436</td><td>0.8652</td><td>0.8948</td>
 53 |     </tr>
 54 |     <tr>
 55 |         <td>F1</td><td>0.6404</td><td>0.8356</td><td>0.8683</td><td>0.8469</td>
 56 |     </tr>
 57 |     <tr>
 58 |         <td rowspan="2">RoBERTa</td><td>Acc</td><td>0.7097</td><td>0.8558</td><td>0.8689</td><td>0.8979</td>
 59 |     </tr>
 60 |     <tr>
 61 |         <td>F1</td><td>0.6382</td><td>0.8464</td><td>0.8668</td><td>0.8433</td>
 62 |     </tr>
 63 |     <tr>
 64 |         <td rowspan="2">TAL-EduBERT</td><td>Acc</td><td>0.7270</td><td>0.8638</td><td>0.8731</td><td>0.9147</td>
 65 |     </tr>
 66 |     <tr>
 67 |         <td>F1</td><td>0.6486</td><td>0.8549</td><td>0.8688</td><td>0.8721</td>
 68 |     </tr>
 69 | </table>
 70 | 
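 71 | ## IV. Scope, Usage, and Example
 72 | ### 1. Scope
 73 | Compared with Google BERT Base and RoBERTa, TAL-EduBERT is trained on a large amount of education ASR text, so it is more robust to ASR recognition errors and performs better on downstream tasks in education scenarios. We therefore recommend the model to NLP engineers who work in education on ASR-related text, and we hope this release helps advance NLP applications in education.
 74 | 
 75 | ### 2. Usage
 76 | TAL-EduBERT is used in the same way as Google's original BERT and is supported by the transformers package, so you only need to replace the model path with the directory of the downloaded TAL-EduBERT model.
 77 | 
 78 | ### 3. Example
 79 | ```python
 80 | from transformers import BertTokenizer, BertModel
 81 | import torch
 82 | 
 83 | path_to_TAL_EduBERT = "/YourPath/TAL-EduBERT/"  # directory of the unzipped PyTorch model
 84 | 
 85 | tokenizer = BertTokenizer.from_pretrained(path_to_TAL_EduBERT)
 86 | model = BertModel.from_pretrained(path_to_TAL_EduBERT)
 87 | 
 88 | sentence = "让我们来看一下这道题，这个题的也是一种比较经典类型的这个数列题目他呢，有个特点就是前面的是an+1，后面是一个an的式子加上一个根号下an的，一个二次的一个式子。"
 89 | inputs = tokenizer(sentence, return_tensors="pt")
 90 | outputs = model(**inputs)
 91 | last_hidden_states = outputs[0]  # (batch, seq_len, 768); works with both tuple (transformers 3.x) and ModelOutput (4.x) returns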
 92 | ```
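 93 | ## V. Summary
 94 | To demonstrate TAL-EduBERT's advantages on education-domain downstream tasks, we ran comparison experiments on four types of business problems and data from education scenarios. Compared with the two general-domain pre-trained models Google BERT Base and RoBERTa, TAL-EduBERT achieves higher accuracy and F1 on every task, with an F1 gain of up to about 3 percentage points (QA: 0.8721 versus 0.8433 for RoBERTa). Practitioners who want to explore NLP in education can therefore use TAL-EduBERT directly to build more specialized educational-technology applications.
 95 | 
 96 | This document has presented the background of the TAL-EduBERT release, its training data, and the comparison results. Going forward, TAL's AI Platform will continue its theoretical innovation and practical exploration and open up more of its work. We warmly welcome colleagues in related fields to contribute more and richer comparison experiments and real-world use cases, so that together we can advance NLP in education and bring new momentum to education in China.
 97 | 
 98 | 
 99 | ## References
100 | - [1] Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
101 | - [2] Liu, Yinhan, et al. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." arXiv preprint arXiv:1907.11692 (2019).
102 | - [3] Huang, Gale Yan, et al. "Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms." International Conference on Artificial Intelligence in Education. Springer, Cham, 2020.
103 | - [4] Xu, Shiting, Wenbiao Ding, and Zitao Liu. "Automatic Dialogic Instruction Detection for K-12 Online One-on-one Classes." International Conference on Artificial Intelligence in Education. Springer, Cham, 2020.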
104 | 
105 | 
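The architecture described in the README (12 encoder layers, 768 hidden units, 12 attention heads) fixes the size of each Transformer layer. A worked count in Python, assuming the standard BERT Base feed-forward (intermediate) size of 3072, i.e. 4 × 768, which the README does not state explicitly; embeddings and the pooler are excluded because TAL-EduBERT's vocabulary size is not given.

```python
def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Count weights and biases in one BERT encoder layer (no embeddings, no pooler)."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections
    attention_norm = 2 * hidden                 # LayerNorm gain and bias
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    ffn_norm = 2 * hidden                       # LayerNorm after the feed-forward block
    return attention + attention_norm + ffn + ffn_norm


HIDDEN, LAYERS, HEADS = 768, 12, 12  # from the README's architecture description
INTERMEDIATE = 4 * HIDDEN            # assumption: standard BERT Base value (3072)

head_dim = HIDDEN // HEADS
per_layer = encoder_layer_params(HIDDEN, INTERMEDIATE)
print(head_dim, per_layer, LAYERS * per_layer)  # 64 7087872 85054464
```

So each attention head works in a 64-dimensional subspace, and the 12-layer encoder stack alone holds roughly 85 million parameters under this assumption.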

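The MLM objective in the README's pre-training section can be illustrated with the token-corruption recipe from the original BERT paper: select about 15% of the non-special positions, replace 80% of the selected tokens with `[MASK]`, 10% with a random token, and keep 10% unchanged, then predict the original token only at the selected positions. The README does not state TAL-EduBERT's exact masking settings, so these rates are assumptions carried over from BERT, and the token ids below are hypothetical. The `-100` label follows PyTorch's default `ignore_index` for cross-entropy.

```python
import random
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # positions with this label are skipped by the MLM loss


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Sequence[int],
    rng: random.Random,
    mask_prob: float = 0.15,
) -> Tuple[List[int], List[int]]:
    """Apply BERT-style MLM corruption; return (corrupted inputs, labels)."""
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never select special tokens such as [CLS] and [SEP].
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


# Hypothetical ids: 1 = [CLS], 2 = [SEP], 3 = [MASK]; 4..23 stand in for characters.
rng = random.Random(0)
ids = [1] + list(range(4, 24)) + [2]
corrupted, labels = mask_tokens(ids, mask_id=3, vocab_size=1000, special_ids=(1, 2), rng=rng)
print(corrupted)
print(labels)
```

The 10% "keep unchanged" branch still contributes to the loss, which is why it sets a label but leaves the input alone.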

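The "up to about 3 percentage points" F1 gain stated in the README's summary can be checked directly against its results table. A small Python sketch that copies the published F1 row of each model and computes TAL-EduBERT's margin over each baseline per task; it only does arithmetic on the reported figures and does not re-measure anything.

```python
# F1 scores copied from the README's results table.
F1 = {
    "Google BERT": {"Conclude": 0.6404, "Note": 0.8356, "Praise": 0.8683, "QA": 0.8469},
    "RoBERTa": {"Conclude": 0.6382, "Note": 0.8464, "Praise": 0.8668, "QA": 0.8433},
    "TAL-EduBERT": {"Conclude": 0.6486, "Note": 0.8549, "Praise": 0.8688, "QA": 0.8721},
}
BASELINES = ("Google BERT", "RoBERTa")


def f1_gains_in_points(table):
    """Per (task, baseline): TAL-EduBERT's F1 minus the baseline's F1, in percentage points."""
    gains = {}
    for task, ours in table["TAL-EduBERT"].items():
        for base in BASELINES:
            gains[(task, base)] = round(100 * (ours - table[base][task]), 2)
    return gains


gains = f1_gains_in_points(F1)
best = max(gains, key=gains.get)
print(best, gains[best])  # ('QA', 'RoBERTa') 2.88
print(min(gains.values()))  # 0.05
```

The largest margin is 2.88 points (QA versus RoBERTa) and the smallest is 0.05 points (Praise versus Google BERT), so every margin is positive but the size of the gain varies considerably by task.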
--------------------------------------------------------------------------------
/src/finetune.py:
--------------------------------------------------------------------------------
  1 | """ Finetuning the library models for sequence classification on GLUE (Bert, XLM, XLNet, RoBERTa, Albert, XLM-RoBERTa)."""
  2 | 
  3 | 
  4 | import dataclasses
  5 | import logging
  6 | import os
  7 | import sys
  8 | from dataclasses import dataclass, field
  9 | from typing import Dict, Optional
 10 | 
 11 | import numpy as np
 12 | 
 13 | from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer, EvalPrediction, GlueDataset
 14 | from transformers import GlueDataTrainingArguments as DataTrainingArguments
 15 | from transformers import (
 16 |     HfArgumentParser,
 17 |     Trainer,
 18 |     TrainingArguments,
 19 |     glue_compute_metrics,
 20 |     glue_output_modes,
 21 |     glue_tasks_num_labels,
 22 |     set_seed,
 23 | )
 24 | 
 25 | 
 26 | logger = logging.getLogger(__name__)
 27 | 
 28 | 
 29 | @dataclass
 30 | class ModelArguments:
 31 |     """
 32 |     Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
 33 |     """
 34 | 
 35 |     model_name_or_path: str = field(
 36 |         metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
 37 |     )
 38 |     config_name: Optional[str] = field(
 39 |         default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
 40 |     )
 41 |     tokenizer_name: Optional[str] = field(
 42 |         default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
 43 |     )
 44 |     cache_dir: Optional[str] = field(
 45 |         default=None, metadata={"help": "Where do you want to store the pretrained models downloaded from s3"}
 46 |     )
 47 | 
 48 | 
 49 | def main():
 50 |     # See all possible arguments in src/transformers/training_args.py
 51 |     # or by passing the --help flag to this script.
 52 |     # We now keep distinct sets of args, for a cleaner separation of concerns.
 53 | 
 54 |     parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
 55 | 
 56 |     if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
 57 |         # If we pass only one argument to the script and it's the path to a json file,
 58 |         # let's parse it to get our arguments.
 59 |         model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
 60 |     else:
 61 |         model_args, data_args, training_args = parser.parse_args_into_dataclasses()
 62 | 
 63 |     if (
 64 |         os.path.exists(training_args.output_dir)
 65 |         and os.listdir(training_args.output_dir)
 66 |         and training_args.do_train
 67 |         and not training_args.overwrite_output_dir
 68 |     ):
 69 |         raise ValueError(
 70 |             f"Output directory ({training_args.output_dir}) already exists and is not empty. Use --overwrite_output_dir to overcome."
 71 |         )
 72 | 
 73 |     # Setup logging
 74 |     logging.basicConfig(
 75 |         format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
 76 |         datefmt="%m/%d/%Y %H:%M:%S",
 77 |         level=logging.INFO if training_args.local_rank in [-1, 0] else logging.WARN,
 78 |     )
 79 |     logger.warning(
 80 |         "Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
 81 |         training_args.local_rank,
 82 |         training_args.device,
 83 |         training_args.n_gpu,
 84 |         bool(training_args.local_rank != -1),
 85 |         training_args.fp16,
 86 |     )
 87 |     logger.info("Training/evaluation parameters %s", training_args)
 88 | 
 89 |     # Set seed
 90 |     set_seed(training_args.seed)
 91 | 
 92 |     try:
 93 |         num_labels = glue_tasks_num_labels[data_args.task_name]
 94 |         output_mode = glue_output_modes[data_args.task_name]
 95 |     except KeyError:
 96 |         raise ValueError("Task not found: %s" % (data_args.task_name))
 97 | 
 98 |     # Load pretrained model and tokenizer
 99 |     #
100 |     # Distributed training:
101 |     # The .from_pretrained methods guarantee that only one local process can concurrently
102 |     # download model & vocab.
103 | 
104 |     config = AutoConfig.from_pretrained(
105 |         model_args.config_name if model_args.config_name else model_args.model_name_or_path,
106 |         num_labels=num_labels,
107 |         finetuning_task=data_args.task_name,
108 |         cache_dir=model_args.cache_dir,
109 |     )
110 |     tokenizer = AutoTokenizer.from_pretrained(
111 |         model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
112 |         cache_dir=model_args.cache_dir,
113 |     )
114 |     model = AutoModelForSequenceClassification.from_pretrained(
115 |         model_args.model_name_or_path,
116 |         from_tf=bool(".ckpt" in model_args.model_name_or_path),  # any path containing ".ckpt" is loaded as a TensorFlow checkpoint
117 |         config=config,
118 |         cache_dir=model_args.cache_dir,
119 |     )
120 | 
121 |     # Get datasets
122 |     train_dataset = GlueDataset(data_args, tokenizer=tokenizer) if training_args.do_train else None
123 |     eval_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode="dev") if training_args.do_eval else None
124 |     test_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode="test") if training_args.do_predict else None
125 | 
126 |     def compute_metrics(p: EvalPrediction) -> Dict:
127 |         if output_mode == "classification":
128 |             preds = np.argmax(p.predictions, axis=1)
129 |         elif output_mode == "regression":
130 |             preds = np.squeeze(p.predictions)
131 |         return glue_compute_metrics(data_args.task_name, preds, p.label_ids)
132 | 
133 |     # Initialize our Trainer
134 |     trainer = Trainer(
135 |         model=model,
136 |         args=training_args,
137 |         train_dataset=train_dataset,
138 |         eval_dataset=eval_dataset,
139 |         compute_metrics=compute_metrics,
140 |     )
141 | 
142 |     # Training
143 |     if training_args.do_train:
144 |         trainer.train(
145 |             model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
146 |         )
147 |         trainer.save_model()
148 |         # For convenience, we also re-save the tokenizer to the same directory,
149 |         # so that you can share your model easily on huggingface.co/models =)
150 |         if trainer.is_world_master():
151 |             tokenizer.save_pretrained(training_args.output_dir)
152 | 
153 |     # Evaluation
154 |     eval_results = {}
155 |     if training_args.do_eval:
156 |         logger.info("*** Evaluate ***")
157 | 
158 |         # Loop to handle MNLI double evaluation (matched, mis-matched)
159 |         eval_datasets = [eval_dataset]
160 |         if data_args.task_name == "mnli":
161 |             mnli_mm_data_args = dataclasses.replace(data_args, task_name="mnli-mm")
162 |             eval_datasets.append(GlueDataset(mnli_mm_data_args, tokenizer=tokenizer, mode="dev"))
163 | 
164 |         for eval_dataset in eval_datasets:
165 |             eval_result = trainer.evaluate(eval_dataset=eval_dataset)
166 | 
167 |             output_eval_file = os.path.join(
168 |                 training_args.output_dir, f"eval_results_{eval_dataset.args.task_name}.txt"
169 |             )
170 |             if trainer.is_world_master():
171 |                 with open(output_eval_file, "w") as writer:
172 |                     logger.info("***** Eval results {} *****".format(eval_dataset.args.task_name))
173 |                     for key, value in eval_result.items():
174 |                         logger.info("  %s = %s", key, value)
175 |                         writer.write("%s = %s\n" % (key, value))
176 | 
177 |             eval_results.update(eval_result)
178 | 
179 |     if training_args.do_predict:
180 |         logger.info("*** Test ***")
181 |         test_datasets = [test_dataset]
182 |         if data_args.task_name == "mnli":
183 |             mnli_mm_data_args = dataclasses.replace(data_args, task_name="mnli-mm")
184 |             test_datasets.append(GlueDataset(mnli_mm_data_args, tokenizer=tokenizer, mode="test"))
185 | 
186 |         for test_dataset in test_datasets:
187 |             predictions = trainer.predict(test_dataset=test_dataset).predictions
188 |             if output_mode == "classification":
189 |                 predictions = np.argmax(predictions, axis=1)
190 | 
191 |             output_test_file = os.path.join(
192 |                 training_args.output_dir, f"test_results_{test_dataset.args.task_name}.txt"
193 |             )
194 |             if trainer.is_world_master():
195 |                 with open(output_test_file, "w") as writer:
196 |                     logger.info("***** Test results {} *****".format(test_dataset.args.task_name))
197 |                     writer.write("index\tprediction\n")
198 |                     for index, item in enumerate(predictions):
199 |                         if output_mode == "regression":
200 |                             writer.write("%d\t%3.3f\n" % (index, item))
201 |                         else:
202 |                             item = test_dataset.get_labels()[item]
203 |                             writer.write("%d\t%s\n" % (index, item))
204 |     return eval_results
205 | 
206 | 
207 | def _mp_fn(index):
208 |     # For xla_spawn (TPUs)
209 |     main()
210 | 
211 | 
212 | if __name__ == "__main__":
213 |     main()
214 | 
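Besides command-line flags, `main()` in `src/finetune.py` accepts a single `.json` argument and parses it with `parser.parse_json_file`, where each key must match a field name of `ModelArguments`, `GlueDataTrainingArguments`, or `TrainingArguments`. A minimal sketch that writes the settings of `examples/finetune_edubert.sh` to such a file; the file name `finetune_args.json` is a hypothetical choice, and the paths are the same placeholders as in the shell script.

```python
import json

# Same settings as examples/finetune_edubert.sh; keys are the dataclass field names.
args = {
    "model_name_or_path": "../../path_to_EduBERT",
    "task_name": "CoLA",
    "data_dir": "../data/",
    "do_train": True,
    "do_eval": True,
    "max_seq_length": 128,
    "per_device_train_batch_size": 32,
    "learning_rate": 2e-5,
    "num_train_epochs": 3.0,
    "output_dir": "./output",
    "overwrite_output_dir": True,
}

with open("finetune_args.json", "w", encoding="utf-8") as f:
    json.dump(args, f, indent=2)
```

Run from the `examples` directory, the script would then be invoked as `python ../src/finetune.py finetune_args.json`. Keeping the settings in a file makes runs easier to reproduce than a long chain of flags.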

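With `--task_name CoLA`, `glue_compute_metrics` reports only the Matthews correlation coefficient (`mcc`), while the README's results table reports accuracy and F1. The README does not say how its Acc and F1 were computed; a minimal sketch of a replacement for the `compute_metrics` closure in `src/finetune.py`, assuming binary labels `0`/`1` and F1 of the positive class (the same definition as scikit-learn's default binary `f1_score`). It uses NumPy only, and `acc_and_binary_f1` is a hypothetical name; inside `main()` it would be called as `acc_and_binary_f1(p.predictions, p.label_ids)`.

```python
import numpy as np


def acc_and_binary_f1(logits: np.ndarray, labels: np.ndarray) -> dict:
    """Accuracy and positive-class F1 from classifier logits of shape (n, 2)."""
    preds = np.argmax(logits, axis=1)  # same reduction as compute_metrics in finetune.py
    labels = np.asarray(labels)
    acc = float((preds == labels).mean())
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"acc": acc, "f1": f1}


logits = np.array([[2.0, -1.0], [0.1, 0.9], [-0.5, 0.5], [1.2, 0.3]])
labels = np.array([0, 1, 0, 1])
print(acc_and_binary_f1(logits, labels))  # {'acc': 0.5, 'f1': 0.5}
```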
